The YASEP's registers and memory
yasep/doc/reg-mem.en.html version 2012-09-30

The YASEP does not use a Load-Store architecture, where memory is accessed through dedicated instructions. Unlike the vast majority of existing processors, it accesses memory through registers, somewhat like the CDC 6600 (except that the YASEP's registers are not dedicated to either reads or writes). This saves opcodes, increases the memory bandwidth per instruction and keeps the instructions orthogonal.

The YASEP is architected around 16 registers, each belonging to one of 4 types: the "normal" registers R1 to R5, the program counter PC, the "Address" registers A1 to A5 and the "Data" registers D1 to D5.

This is more complex than a traditional RISC register set, but it reduces the number of opcodes needed, because each opcode can perform several different actions depending on the registers it uses.

Only two status registers (Carry and Zero) are available, and they are accessed only through the conditional instructions. The method to save and restore them (should an exception occur) is not yet defined.


The "normal" registers

R1, R2, R3, R4 and R5 are normal registers, just like in other RISC architectures. They can be read and written without any implicit side effect.

; Example 1
 MOV 1234h R2   ; Set register R2 with the value 1234h
; Example 2
 ADD 4 R1       ; R1 <- R1 + 4
; Example 3
 ADD 32 R1 R3   ; R3 <- R1 + 32

These registers typically hold temporary results of computations, loop counters, function call parameters... The extended instruction form can also increment or decrement them (this feature is still under development).


The "PC" register

This register is less usual: PC points to the instruction currently being executed. It is automatically incremented after each instruction (by 2 or 4, depending on the instruction length), and it can be read and written by a program.

The YASEP's instructions are encoded on 2 or 4 bytes. All addresses have a byte granularity, but instructions are always aligned on even addresses, so the LSB of PC is always implicitly equal to 0.

However, do not rely on this LSB being ignored: writing an odd address to PC could freeze or hang the CPU, cause a software trap or even a reboot. The LSB is a reserved bit: "nothing" may happen, but always keep it clear!
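The PC update rule can be sketched in Python as follows. This is a minimal model, not the hardware definition: it assumes a 16-bit PC that wraps around (YASEP16), and the hypothetical `write_pc` helper rejects odd values only to enforce the "keep the LSB clear" rule, since the real behaviour of an odd PC is undefined.

```python
PC_MASK = 0xFFFF  # assumption: YASEP16, 16-bit PC that wraps around


def next_pc(pc: int, insn_len: int) -> int:
    """Address of the next sequential instruction (hypothetical helper)."""
    if insn_len not in (2, 4):
        raise ValueError("YASEP instructions are 2 or 4 bytes long")
    return (pc + insn_len) & PC_MASK


def write_pc(value: int) -> int:
    """Model a program writing PC: the LSB is reserved and must stay 0."""
    if value & 1:
        # Real hardware may hang, trap or reboot; this sketch just refuses.
        raise ValueError(f"odd PC value {value:#06x}: LSB must be clear")
    return value & PC_MASK


if __name__ == "__main__":
    pc = 0x0100
    pc = next_pc(pc, 2)   # short (2-byte) instruction
    pc = next_pc(pc, 4)   # long (4-byte) instruction
    print(hex(pc))        # 0x106
    print(hex(write_pc(0x2000)))
```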


The "Address" registers

A1, A2, A3, A4 and A5 are the "address registers". They contain the address where data will be read or written in memory. The extended instruction form can also increment or decrement them.


The "Data" registers

D1, D2, D3, D4 and D5 are the "Data registers", and they are closely related to the Address registers: A1 is bound to D1, A2 to D2, etc.

Each Data register contains the value of the memory word pointed to by its associated Address register: the CPU always preserves the property Dx=memory[Ax] for each pair.

Note that if a Data register is both a source and the destination of an instruction (read and written in the same cycle), the instruction effectively updates the memory location in place. This provides "RMW" ("read-modify-write") operations with a clean RISC core.

; Example 1
 MOV 1234h A1    ; Point A1 to the address 1234h ==> D1 contains the value at this address
 ADD D1 R3       ; Load the contents of [1234] and add it to register R3.
; Example 2
 MOV 1234h A3    ; Point A3 to the address 1234h ==> D3 contains the value at this address
 ADD D3 R2       ; Load the contents of [1234] and add it to register R2.
; Example 3
 ADD 1234h R1 A4 ; point A4 to the address R1+1234h ==> D4 contains the value at this address
 ADD R2 D4       ; load the contents of memory[R1+1234h],
                  ; add it to R2 and put the result back at the same address
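The architectural invariant Dx=memory[Ax] and the read-modify-write of Example 3 can be modelled in a few lines of Python. This toy model (the `IdealPairs` name and its methods are hypothetical) makes reading Dx read memory at Ax directly, so the invariant holds by construction; it assumes 16-bit values and sample contents R1 = 10h, R2 = 7, memory[1244h] = 5.

```python
class IdealPairs:
    """Architectural view of the A/D pairs: Dx always equals memory[Ax].

    Hypothetical toy model: memory is a dict of words indexed by address,
    and reading Dx simply reads memory at Ax.
    """

    def __init__(self):
        self.mem = {}          # address -> 16-bit word
        self.a = [0] * 6       # A1..A5 (index 0 unused)

    def read_d(self, n: int) -> int:
        return self.mem.get(self.a[n], 0)      # Dx = memory[Ax]

    def write_d(self, n: int, value: int) -> None:
        self.mem[self.a[n]] = value & 0xFFFF   # writing Dx stores to memory


# Example 3, assuming R1 = 10h, R2 = 7 and memory[1244h] = 5
cpu = IdealPairs()
cpu.mem[0x1244] = 5
r1, r2 = 0x10, 7
cpu.a[4] = 0x1234 + r1                  # ADD 1234h R1 A4
cpu.write_d(4, cpu.read_d(4) + r2)      # ADD R2 D4 (read-modify-write)
print(hex(cpu.mem[0x1244]))             # 0xc
```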

When used with the auto-decrement or auto-increment features, these registers can implement stacks. By convention, A5 is the stack pointer and D5 is the top of the stack. However, nothing prevents one from creating 2 or 3 stacks, or even moving the standard stack to other registers.
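A stack on the A5/D5 pair can be sketched as below. The auto-update forms are still being defined, so everything here is an assumption: the stack grows downwards, a push pre-decrements A5 by one 16-bit word, a pop post-increments it, and the `Stack` class is purely illustrative.

```python
class Stack:
    """Hypothetical stack built on the A5/D5 pair (YASEP16, 2-byte words).

    Assumptions: the stack grows downwards, a push pre-decrements A5 and
    a pop post-increments it.
    """

    WORD = 2

    def __init__(self, top_addr: int):
        self.mem = {}
        self.a5 = top_addr & 0xFFFF
        self.d5 = self.mem.get(self.a5, 0)   # D5 mirrors memory[A5]

    def _set_a5(self, addr: int) -> None:
        self.a5 = addr & 0xFFFF
        self.d5 = self.mem.get(self.a5, 0)   # writing A5 reloads D5

    def push(self, value: int) -> None:
        self._set_a5(self.a5 - self.WORD)    # auto-decrement A5
        self.d5 = value & 0xFFFF             # write D5 ...
        self.mem[self.a5] = self.d5          # ... which stores to memory

    def pop(self) -> int:
        value = self.d5                      # D5 is the top of the stack
        self._set_a5(self.a5 + self.WORD)    # auto-increment A5
        return value


s = Stack(0x1000)
s.push(1)
s.push(2)
print(s.d5, hex(s.a5))   # 2 0xffc
```

Because D5 always mirrors the word at A5, the top of the stack can be used directly as an operand without a separate load.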


Memory and pointer aliasing

When two (or more) Address registers point to the same location (or memory word), do not expect the Data registers to remain consistent after writes. For example, if A1=A2, then writing to D1 will likely not update D2 with the new value.

In the early implementations, it is not feasible to write up to 5 Data registers simultaneously while comparing 5 Address registers: the pipeline length and the gate count would increase too much. In the simplest cases, the Data registers act like small buffers, which are preserved in the register set through context switches.
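The aliasing problem can be reproduced with a toy model of the simplest implementation, where each Data register is a buffer loaded when its Address register is written and no address comparison is made between pairs. This is a hypothetical Python sketch (`BufferedPairs` is an illustrative name), not a description of any actual YASEP core.

```python
class BufferedPairs:
    """Simplest implementation: each Dx is a buffer loaded when Ax is written.

    No address comparison between pairs, so a write through one pair does
    not refresh another pair's Data register.
    """

    def __init__(self):
        self.mem = {}
        self.a = [0] * 6   # A1..A5, index 0 unused
        self.d = [0] * 6   # D1..D5, index 0 unused

    def set_a(self, n: int, addr: int) -> None:
        self.a[n] = addr
        self.d[n] = self.mem.get(addr, 0)   # load the buffer

    def write_d(self, n: int, value: int) -> None:
        self.d[n] = value                    # update this buffer ...
        self.mem[self.a[n]] = value          # ... and memory, nothing else


p = BufferedPairs()
p.mem[0x1234] = 1
p.set_a(1, 0x1234)
p.set_a(2, 0x1234)          # A1 == A2: aliasing
p.write_d(1, 99)            # write through D1
print(p.mem[0x1234], p.d[2])   # 99 1 -> D2 is stale
p.set_a(2, p.a[2])          # rewriting A2 reloads D2
print(p.d[2])               # 99
```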

However, it is possible (and likely) that more sophisticated implementations solve this problem, using very different structures.

In any case, when aliasing is expected or possible, use critical sections (see the CRIT opcode) and use a single Address/Data pair to access these words. As illustrated by the last code example of the "Data" registers section, read-modify-writes are simplest (and shortest) when they use just one register pair.


Memory alignment

Beware: the YASEP is a little-endian, byte-oriented architecture (any pointer can address a byte), but ALL memory accesses are aligned on a natural word boundary.

Unaligned accesses do not trigger an error or raise an exception: the Data registers always contain aligned data from memory, without any shift or adjustment.

In other words, the lowest bits of the address are ignored when memory is accessed: 1 bit on YASEP16 and 2 bits on YASEP32. These "lost bits" address one of the bytes in the memory (half-)word.
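A minimal Python sketch of how a byte address splits into the aligned word address that is actually accessed and the "lost bits" that select a byte inside that word. The `split_address` name is hypothetical; the word size is 2 bytes on YASEP16 and 4 bytes on YASEP32.

```python
def split_address(addr: int, word_bytes: int) -> tuple[int, int]:
    """Return (aligned word address, byte index inside the word).

    word_bytes is 2 on YASEP16 and 4 on YASEP32; the byte index is made of
    the "lost bits" that the memory access ignores.
    """
    lost = word_bytes - 1
    return addr & ~lost, addr & lost


for addr, width in ((0x1231, 2), (0x1232, 4), (0x1231, 4)):
    word, index = split_address(addr, width)
    print(f"{addr:#x} on a {8 * width}-bit YASEP -> word {word:#x}, byte {index}")
```

This matches the examples below: both 1231h and 1232h end up accessing the word at 1230h.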

; Example 1     (memory read)
 MOV 1231h A5    ; point A5 to the address 1231h (aligned on a byte boundary)
 MOV D5 R2       ; copy the word located at address 1230 into R2.
; Example 1 bis (YASEP32 only)
 MOV 1232h A1    ; point A1 to the address 1232h (aligned on a halfword boundary)
 MOV D1 R1       ; copy the word located at address 1230 into R1.

; Example 2     (write to memory)
 MOV 1231h A2    ; point A2 to the address 1231h (aligned on a byte boundary)
 MOV R2 D2       ; write the contents of R2 to memory location 1230
; Example 2 bis (YASEP32 only)
 MOV 1232h A3    ; point A3 to the address 1232h (aligned on a halfword boundary)
 MOV R4 D3       ; write the contents of R4 to memory location 1230

When bytes or half-words must be handled individually, dedicated instructions perform the adjustments:

; Example 1'
 MOV 1231h A5    ; point A5 to the address 1231h (aligned on a byte boundary)
 ESB D5 R2       ; align and sign-extend the byte at [1231], write the result to R2.
; Example 1' bis (YASEP32 only)
 MOV 1232h A1    ; point A1 to the address 1232h (aligned on a halfword boundary)
 ESH D1 R1       ; align and sign-extend the halfword at [1232], write the result to R1.

; Example 2'
 MOV 1231h A2    ; point A2 to the address 1231h (aligned on a byte boundary)
 IB R2 D2        ; take the lower byte of R2, align and insert the result into D2.
; Example 2' bis (YASEP32 only)
 MOV 1232h A3    ; point A3 to the address 1232h (aligned on a halfword boundary)
 IH R4 D3        ; take the lower half-word of R4, align and insert the result into D3
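What ESB and IB do to the data can be sketched in Python. The helpers `esb` and `ib` are hypothetical and only mirror the comments above; they assume little-endian byte order, a 16-bit word by default, and that the byte lane is selected by the "lost bits" of the Address register paired with the Data operand (as the [1231] comments suggest). ESH and IH follow the same idea with 16-bit lanes on YASEP32.

```python
def esb(word: int, addr: int, word_bytes: int = 2) -> int:
    """Sketch of ESB: extract and sign-extend the byte selected by addr."""
    lane = addr & (word_bytes - 1)               # "lost bits" = byte index
    byte = (word >> (8 * lane)) & 0xFF           # little-endian byte lane
    if byte & 0x80:
        byte -= 0x100                            # sign extension
    return byte & ((1 << (8 * word_bytes)) - 1)  # register-width result


def ib(src: int, word: int, addr: int, word_bytes: int = 2) -> int:
    """Sketch of IB: insert the low byte of src into word at addr's lane."""
    lane = addr & (word_bytes - 1)
    shift = 8 * lane
    cleared = word & ~(0xFF << shift)            # keep the other bytes
    return cleared | ((src & 0xFF) << shift)


# Word at 1230h holds 80FFh: byte [1230] = FFh, byte [1231] = 80h
print(hex(esb(0x80FF, 0x1231)))        # 0xff80
print(hex(ib(0xAB, 0x1234, 0x1231)))   # 0xab34
```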

Unaligned words or half-words must be reconstructed with instruction sequences.

(to be written)
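The YASEP instruction sequence is not given here, but the operation it has to perform can be sketched in Python for YASEP16. The `read_u16` helper is hypothetical: it assumes little-endian order and a memory made of aligned 16-bit words, and rebuilds a half-word at an odd address from the upper byte of one word and the lower byte of the next.

```python
def read_u16(mem: dict, addr: int) -> int:
    """Rebuild the 16-bit little-endian value at a possibly odd address.

    mem maps even (aligned) addresses to 16-bit words, as the hardware sees them.
    """
    base = addr & 0xFFFE                          # aligned word holding byte [addr]
    if (addr & 1) == 0:
        return mem.get(base, 0)                   # aligned: one access is enough
    low = (mem.get(base, 0) >> 8) & 0xFF          # byte [addr]: upper half of word
    high = mem.get((base + 2) & 0xFFFE, 0) & 0xFF # byte [addr+1]: lower half of next
    return (high << 8) | low


# Bytes 1230h..1233h = 12h, 34h, 56h, 78h
memory = {0x1230: 0x3412, 0x1232: 0x7856}
print(hex(read_u16(memory, 0x1231)))   # 0x5634
```

On the YASEP itself, the same result requires two Address/Data pairs (or two successive accesses) plus shifts and a merge, which is why unaligned data is best avoided.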


Register Parking

(added to the blog on Tuesday 8 November 2011, 16:29)

The YASEP architecture specifies 5 normal registers (R1-R5) and 5 pairs of Address/Data registers (A1/D1, A2/D2...), and it is difficult to find the right balance between the two: each application and approach has a different optimal number of registers.

When more registers are needed (say, R6 or R7), you could use D1 and D2 for this purpose, for example. However, you then have to set A1 and A2 to a safe location, otherwise chaos could propagate through the software (D1 and D2 would be written to random places). Another issue is that each write to these D registers will update the memory.

Another unwanted situation appears if the Ax registers are used as normal registers: each write triggers a memory read. In paged/protected memory systems, this would thrash the TLB, flushing it all the time and triggering an avalanche of page fault (and protection) exceptions...

This is "solved" with the "parking" system, which defines hardwired addresses and an internal behaviour that let the Data registers serve as extra registers with fewer side effects.

"Parking" addresses are defined as "negative" addresses (that is: all the MSBs are set to 1). This address range, at the "top" of the memory space, is normally unused, or reserved for special purposes, such as "fast constants" addressed by the short immediate values:

      MOV 6, A3 ; mem[6] contains a constant or a scratch value,
      MOV D3,... ;   whose address fits in 4 bits

To keep the "parking" system compatible with non-parked versions, the addresses are defined globally for all software. They are easy to remember, as the following code shows:

    ; Park all the registers
    MOV -1, A1
    MOV -2, A2
    MOV -3, A3
    MOV -4, A4
    MOV -5, A5

These will become macros or pseudo-instructions.

The internal numbering of the registers is changed to ease the hardware implementation. There is a direct match between the binary register number and the binary code of the address:

park address   addr. bits 3-0   reg. number (binary)   reg. number   register
     -1             1111               1111                 15           A1
     -2             1110               1101                 13           A2
     -3             1101               1011                 11           A3
     -4             1100               1001                  9           A4
     -5             1011               0111                  7           A5

Bits 0 to 2 of the parking address are identical to bits 1 to 3 of the register number (bit 0 of the register number is always 1 for an Address register). This is very easy to detect with logic gates.
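The bit correspondence in the table can be checked with a short Python sketch. `parked_register` is a hypothetical helper; it assumes that only -1 to -5 are parking addresses and that An is parked at -n, as listed in the table.

```python
def to_signed(addr: int, width: int = 16) -> int:
    """Interpret an address as a two's-complement number."""
    sign = 1 << (width - 1)
    return (addr & (sign - 1)) - (addr & sign)


def parked_register(addr: int, width: int = 16):
    """Return (name, register number) if addr is a parking address, else None."""
    value = to_signed(addr, width)
    if not -5 <= value <= -1:
        return None
    reg_number = ((addr & 0b111) << 1) | 1   # address bits 0-2 -> register bits 1-3
    return f"A{-value}", reg_number


for park in range(-1, -6, -1):
    print(park, parked_register(park & 0xFFFF))
```

In hardware, the same test is a comparison of a few bits, which is why the numbering was chosen this way.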

Architecturally, it does not change much. The Data registers can be "cached" by the register set. All the hardware parking system adds is an inhibition of the "data write" signal that would normally occur each time the core writes to a D register whose A register holds its parking address.
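The write inhibition can be modelled in Python as follows. This is a hypothetical sketch (`ParkingPairs` is an illustrative name) assuming YASEP16 addresses: pair n is parked when An holds its own parking address -n, the Data register is always updated in the register set, and only the memory write is suppressed. The memory read on an Ax write is kept, since only the write inhibition is described here.

```python
MASK16 = 0xFFFF  # assumption: YASEP16 addresses


class ParkingPairs:
    """Hypothetical model of the A/D pairs with the parking write inhibition."""

    def __init__(self):
        self.mem = {}          # word address -> value
        self.a = [0] * 6       # A1..A5 (index 0 unused)
        self.d = [0] * 6       # D1..D5, cached in the register set

    def is_parked(self, n: int) -> bool:
        return self.a[n] == (-n) & MASK16           # An holds its own -n address

    def set_a(self, n: int, addr: int) -> None:
        self.a[n] = addr & MASK16
        self.d[n] = self.mem.get(self.a[n], 0)      # Dx = memory[Ax]

    def write_d(self, n: int, value: int) -> None:
        self.d[n] = value & MASK16                  # the register is always updated
        if not self.is_parked(n):
            self.mem[self.a[n]] = self.d[n]         # data write, unless parked


cpu = ParkingPairs()
cpu.set_a(1, -1)           # MOV -1, A1 : park A1, D1 becomes a spare register
cpu.write_d(1, 1234)
print(cpu.d[1], cpu.mem)   # 1234 {}
```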

Aliasing: no alias detection is expected. If A4 points to -2 and D4 is written, D2 is not updated. Otherwise, the result bus would have to write to 5 registers in parallel, which is not reasonable.

Thread backup and restoration: the register set contains the cached version of the memory, so it must be refreshed when a thread is restored (swapped in). If an Ax register holds its parking address, memory doesn't need to be fetched to refresh the cached Dx. Another solution is to save the Dx register through another Ax/Dx pair, so there is nothing to test during restoration (but the memory read cycles cannot be spared).

This system, where the "parking" is defined by an auxiliary value (that is inherently preserved through context switches), is "cleaner" than a more radical approach where "status bits" (one per A/D pair) park the registers. The advantage of the radical approach is that two registers can be parked at once (instead of one), but it gets harder to use with a compiler or from user software (you can easily play with pointers in C or Pascal, though you won't be able to choose which pair is used). On top of that, adding status/control bits is usually a nightmare, since 5 more bits have to be saved and restored...

In the end, it is not as complex as it seems. The hardware cost is a few logic gates that detect the parking addresses and inhibit memory writes. For the software writer, it just means more registers on demand, and it works whether or not the YASEP has the parking hardware. You CAN have R6, R7 or R8, but then you have to restrict data access and give up A1/D1, A2/D2 and A3/D3. You make the choice!


The "Carry flag"

The carry flag is a 1-bit register that stores the carry or borrow bit of the last executed instruction that changes it, such as ADD or SUB. The carry flag is set when an unsigned addition overflows:

.profile YASEP16
 MOV 5678h R1
 ADD CDEFh R1 R2  ; R2 <- 5678h + CDEFh = 12467h > FFFFh, so R2 = 2467h and the carry is set.

 ADD 1234h R1 R2  ; R2 <- 5678h + 1234h = 68ACh <= FFFFh so the carry is cleared.
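The carry produced by ADD is the bit that does not fit in the register width. A minimal Python sketch reproducing the two YASEP16 examples above; `add_with_carry` is a hypothetical helper, not a YASEP mnemonic.

```python
def add_with_carry(a: int, b: int, width: int = 16) -> tuple[int, int]:
    """Return (result, carry) of an unsigned addition on `width` bits."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask)
    return total & mask, total >> width   # carry = bit that overflowed


print([hex(x) for x in add_with_carry(0x5678, 0xCDEF)])   # ['0x2467', '0x1']
print([hex(x) for x in add_with_carry(0x5678, 0x1234)])   # ['0x68ac', '0x0']
```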

This bit can then be tested by a conditional instruction :

 ADD 1234h R1    ; R1 <- R1 + 1234h (change the carry bit)
 ADD 1 R2 CARRY  ; IF the carry bit is 1, then add 1 to R2

The SUB opcode is based on the addition, so the borrow condition uses the same bit as the carry of ADD. However, with SUB the subtracted value is negated, so the carry bit is set when the subtraction did not borrow (underflow):

 MOV 4 R1
 SUB 3 R1 R2 ; R2 = 3 - 4 = -1 ==> carry=0
 SUB 4 R1 R2 ; R2 = 4 - 4 = 0  ==> carry=1
 SUB 5 R1 R2 ; R2 = 5 - 4 = 1  ==> carry=1
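The inverted meaning of the carry after SUB follows from computing a - b as a + (~b) + 1 on the adder. A minimal Python sketch, assuming this usual two's-complement implementation and YASEP16 width; `sub_with_carry` is a hypothetical helper whose first argument is the immediate (SUB 3 R1 R2 computes 3 - R1).

```python
def sub_with_carry(a: int, b: int, width: int = 16) -> tuple[int, int]:
    """Return (a - b, carry) computed as a + ~b + 1; carry = 1 means no borrow."""
    mask = (1 << width) - 1
    total = (a & mask) + (~b & mask) + 1
    return total & mask, total >> width


r1 = 4
for imm in (3, 4, 5):
    result, carry = sub_with_carry(imm, r1)
    print(f"{imm} - {r1} -> {result:#06x}, carry={carry}")
```

The three lines reproduce the table above: only 3 - 4 borrows, so it is the only case with carry=0.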

Only the instructions flagged as "CHANGE_CARRY" can change the carry bit. Besides ADD and SUB, these include CMPU and CMPS, which are similar to SUB except that the destination is not written (the write is inhibited). The 32-bit adjustment instructions ESH, EZH and IH also set or reset this flag to signal an out-of-word access. All the other operations preserve this bit.

The carry bit can be tested many cycles after these instructions are executed, even after function calls or returns. The best way to explicitly clear or set the carry flag is to use the CMPU/CMPS instructions with operands that affect the carry flag in a deterministic way:

; clear the carry flag :
 CMPU R1, R1   ; R1 equals R1 so the carry can't be set.
; set the flag :
 CMPU 0, PC   ; PC is (almost) always >0 so the carry is set.


The "Zero flag"

Like the carry flag described above, this second 1-bit register is updated from the result of the few opcodes flagged with CHANGE_ZERO.

It may look redundant with the "register zero" condition, since any register can be tested for having all its bits cleared. However, some instructions, such as CMPU and CMPS, don't write the result of their computation to a register.
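Why a separate flag is needed can be shown with a small Python sketch of a compare: the difference is computed but never written back, so the Zero flag is its only visible trace. The `compare` helper is hypothetical, assumes YASEP16 width, and deliberately models only the Zero flag (not the carry or the exact operand order of CMPU/CMPS).

```python
MASK16 = 0xFFFF  # assumption: YASEP16


def compare(a: int, b: int) -> int:
    """Hypothetical CMPU-like sketch: compute a - b, discard it, return Zero."""
    result = (a - b) & MASK16     # computed internally, never written back
    return int(result == 0)       # the Zero flag is the only visible trace


regs = {"R1": 0x1234, "R2": 0x1234, "R3": 0x0042}
print(compare(regs["R1"], regs["R2"]))   # 1: equal operands
print(compare(regs["R1"], regs["R3"]))   # 0: different operands
```

No register changes in either case, so a later conditional instruction can only rely on the flag.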