CS25: Computer Architecture, Lab 5

Making an EPIC processor and compiler.

by Jeff Kaufman

For this lab I designed an explicitly parallel instruction set, wrote a CPU design in VHDL to implement it, and then wrote a simple compiler that converts serial code to run efficiently on this processor.

An explicitly parallel instruction set makes sense because, while all modern CPUs execute multiple instructions at once, standard superscalar ones still have to present a serial interface to the outside world. This requires the processor to spend a lot of circuitry on dependency checking and instruction reordering. With an explicitly parallel system we present a parallel interface and shift the burden of converting serial code to parallel code onto the compiler or the programmer. This makes a lot of sense for several reasons:

The Instruction Set

The instruction set is fixed size and highly consistent to allow for fast decoding. Each instruction is 15 bits wide and takes one of the following forms:

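Bit positions (hex, 0 through E) are shown in brackets:

opp [0-3] | data [4-B] | pred [C-E]
opp [0-3] | reg [4-7] | reg [8-B] | pred [C-E]
"LDD" [0-3] | val [4-B] | pred [C-E]
"JMP" [0-3] | line [4-B] | pred [C-E]
"ISZ" [0-3] | reg [4-7] | X [8] | pred [9-B] | pred [C-E]

opp : Opcode
reg : General-purpose (GP) register
val : Value
line : Line number
X : Unused bit
pred : Predicate register
"..." : A specific opcode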

The first register is always a source and the second always a destination. The second may also be a source, for instructions such as "and" that take two operands. Because instructions are of fixed size with their components all of fixed size, decoding can be very fast.
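To make the fixed-width layout concrete, here is a minimal Python sketch of packing and unpacking one instruction. It is not the lab's code. The field widths are inferred from the register counts (4-bit opcode, 4-bit register fields, 3-bit predicate), and treating bit 0 as the most significant bit is an assumption. The function names are hypothetical.

```python
# Field widths are inferred, not quoted from the lab: 4-bit opcode,
# 4-bit register fields (16 registers), 3-bit predicate (8 predicates).
# Bit 0 is ASSUMED to be the most significant bit of the 15-bit word.

def _check(value, width):
    """Reject values that do not fit in an unsigned field of `width` bits."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")

def encode_rr(opcode, src, dst, pred):
    """Pack the two-register form: opp | reg | reg | pred."""
    for value, width in ((opcode, 4), (src, 4), (dst, 4), (pred, 3)):
        _check(value, width)
    return (opcode << 11) | (src << 7) | (dst << 3) | pred

def encode_data(opcode, data, pred):
    """Pack the data form used by LDD and JMP: opp | val or line | pred."""
    for value, width in ((opcode, 4), (data, 8), (pred, 3)):
        _check(value, width)
    return (opcode << 11) | (data << 3) | pred

def decode(word):
    """Split a 15-bit word into every field it might contain.

    Like the hardware described above, this extracts all fields
    unconditionally; the opcode decides later which ones matter.
    """
    return {
        "opcode": (word >> 11) & 0xF,
        "reg1": (word >> 7) & 0xF,
        "reg2": (word >> 3) & 0xF,
        "data": (word >> 3) & 0xFF,
        "pred": word & 0x7,
    }

# add: reg 2 gets reg 1 + reg 2, guarded by predicate register 0
word = encode_rr(0b0010, 1, 2, 0)
print(format(word, "015b"), decode(word))
```

Because every field sits at a fixed offset, decoding is nothing more than a few shifts and masks, which is the property the fixed-size format is meant to provide.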

The full instruction set is:

Opcode Mnemonic Effect
0000 nop No operation performed
0001 mov reg 2 gets reg 1 if pred
0010 add reg 2 gets reg 1 + reg 2 if pred
0011 and reg 2 gets reg 1 & reg 2 if pred
0100 ldd register A gets val if pred
0101 jmp jump to line if pred
1000 inv reg 2 gets the bitwise inverse of reg 1 if pred
1001 sio Output reg 1 to the display if pred
1010 lio Input reg 2 from the DIP switches if pred
1011 rsh reg 2 gets a right-shifted reg 1 if pred
1100 lsh reg 2 gets a left-shifted reg 1 if pred
1101 rsa reg 2 gets an arithmetically right-shifted reg 1 if pred
1111 isz pred gets (reg 1 == 0) if pred
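The register-to-register rows of this table can be sketched as a small Python model. This is an illustration only, not the VHDL from the lab. It assumes that add wraps around at eight bits and that each shift moves by exactly one bit position; neither detail is stated above. The names alu and step are hypothetical.

```python
MASK = 0xFF  # general-purpose registers are eight bits wide

def alu(mnemonic, r1, r2):
    """Return the value that reg 2 would receive.

    ASSUMPTIONS: add wraps modulo 256, and every shift is by one bit.
    """
    if mnemonic == "mov":
        return r1
    if mnemonic == "add":
        return (r1 + r2) & MASK
    if mnemonic == "and":
        return r1 & r2
    if mnemonic == "inv":
        return ~r1 & MASK
    if mnemonic == "rsh":
        return r1 >> 1                  # logical shift: top bit becomes 0
    if mnemonic == "lsh":
        return (r1 << 1) & MASK
    if mnemonic == "rsa":
        return (r1 >> 1) | (r1 & 0x80)  # arithmetic shift: sign bit is kept
    raise ValueError(f"not a register instruction: {mnemonic}")

def step(regs, preds, mnemonic, src, dst, pred):
    """Apply one instruction in place, but only if its predicate is 1."""
    if preds[pred]:
        regs[dst] = alu(mnemonic, regs[src], regs[dst])

regs = [0] * 16
preds = [0] * 8
preds[1] = 1
regs[1], regs[2] = 200, 100
step(regs, preds, "add", 1, 2, 1)  # predicate 1 is set, so reg 2 is updated
step(regs, preds, "mov", 1, 3, 0)  # predicate 0 is clear, so reg 3 is untouched
print(regs[2], regs[3])
```

The second call shows the point of predication: an instruction whose predicate is 0 still flows through the machine but changes nothing.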

Dependencies between instructions are not handled by the CPU; the programmer or the assembler must resolve them. The minimum spacing needed between dependent instructions is as follows:

Instruction Type RAW WAR WAW
Register 3 -2 1
IO 1 1 1
Predicate 2 -1 1
Jump 4 -3 1
Note 1: Negative spacing means that the second instruction can be reordered to precede the first.
Note 2: For jumps, dependencies must be checked on both sides of the branch.
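One way to use the table is as a lookup that an assembler consults before placing an instruction. The Python sketch below is a hypothetical helper, not part of the lab's tools. It assumes the spacing is counted in lines of instruction memory, which the table does not state explicitly.

```python
# Minimum spacing copied from the table above.
# ASSUMPTION: the unit is lines of instruction memory (one line per cycle).
SPACING = {
    "register":  {"RAW": 3, "WAR": -2, "WAW": 1},
    "io":        {"RAW": 1, "WAR": 1,  "WAW": 1},
    "predicate": {"RAW": 2, "WAR": -1, "WAW": 1},
    "jump":      {"RAW": 4, "WAR": -3, "WAW": 1},
}

def earliest_line(kind, hazard, first_line):
    """Earliest line on which the second instruction may be placed."""
    return first_line + SPACING[kind][hazard]

def violates(kind, hazard, first_line, second_line):
    """True if the second instruction sits too close to the first."""
    return second_line < earliest_line(kind, hazard, first_line)

# A register written on line 0 cannot be read before line 3.
print(violates("register", "RAW", 0, 2))  # too close
print(violates("register", "RAW", 0, 3))  # far enough
# A register read on line 5 may be overwritten from line 3 onward.
print(earliest_line("register", "WAR", 5))
```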

The processor takes the instructions three at a time, so each line of instruction memory should contain 3 fifteen-bit instructions.
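A line of instruction memory is therefore 45 bits wide. As a rough Python sketch (not the lab's compiler), three encoded instructions could be packed into one line as follows. Putting the first instruction in the most significant bits and padding short lines with nop are both assumptions, and pack_line is a hypothetical name.

```python
NOP = 0            # opcode 0000 is nop, so an all-zero word does nothing
INSTR_BITS = 15
LINE_WIDTH = 3     # instructions per line

def pack_line(instrs):
    """Pack up to three 15-bit instruction words into a 45-bit binary string.

    ASSUMPTION: the first instruction occupies the most significant bits.
    Lines with fewer than three instructions are padded with nop.
    """
    if len(instrs) > LINE_WIDTH:
        raise ValueError("a line holds at most three instructions")
    slots = list(instrs) + [NOP] * (LINE_WIDTH - len(instrs))
    line = 0
    for word in slots:
        if not 0 <= word < (1 << INSTR_BITS):
            raise ValueError(f"{word} is not a 15-bit instruction")
        line = (line << INSTR_BITS) | word
    return format(line, f"0{INSTR_BITS * LINE_WIDTH}b")

print(pack_line([0b001000010010000]))  # one add, padded with two nops
```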

The Processor

The processor has four pipeline stages with three simultaneous execution units. It has 16 eight-bit general-purpose registers and 8 one-bit predicate registers. Each instruction goes through the following stages:

Fetch

The next line of three instructions is brought down from memory into the instruction register (IR).

Evaluate Operands

For each instruction of the three in the IR, the operands are fetched. The processor will always look up the two possible register operands, even when only one or even none are needed.

Execute

The processor now looks at the opcode for the first time. Using the opcode and the register values fetched in the previous step, it figures out which register the instruction is supposed to set and what to set it to. The predicate is also evaluated here.

Commit

For instructions with predicates that evaluated to 1, the processor sets the given register to the given value. For ones with a 0 predicate it does nothing.
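The stage descriptions above suggest where the spacing numbers in the dependency table come from. The Python sketch below is a simplified timing model, not the actual VHDL. It assumes that the instruction on line n is in stage s during cycle n + s, that a value written in one cycle is visible to reads from the next cycle on, and that a jump writes the PC at Commit while Fetch reads it. The function names are hypothetical.

```python
FETCH, OPERANDS, EXECUTE, COMMIT = range(4)  # the four stages, in order

def raw_spacing(read_stage, write_stage=COMMIT):
    """Lines the reader must trail the writer to see the new value.

    The reader on line j reads in cycle j + read_stage and the writer on
    line i writes in cycle i + write_stage; the read must come strictly later.
    """
    return write_stage - read_stage + 1

def war_spacing(read_stage, write_stage=COMMIT):
    """Lines the writer must trail the reader; negative means it may go first.

    The write may land in the same cycle as the read, since the reader
    still sees the old value in that cycle (an assumption of this model).
    """
    return read_stage - write_stage

# Registers are read in Evaluate Operands, predicates in Execute,
# and the PC (for jumps) is ASSUMED to be read in Fetch.
for name, stage in (("register", OPERANDS), ("predicate", EXECUTE), ("jump", FETCH)):
    print(name, raw_spacing(stage), war_spacing(stage))
```

Under these assumptions the model gives RAW and WAR spacings of 3 and -2 for registers, 2 and -1 for predicates, and 4 and -3 for jumps, which agrees with the corresponding rows of the spacing table. It does not attempt to model the IO row.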

The structure of the CPU is:

[Figure: CPU layout]
OCU : Operation Commit Unit
EU : Execution Unit
OFU : Operand Fetch Unit
IR : Instruction Register
PC : Program Counter
DISPLAY : A 2-digit signed hexadecimal LED
DIP : An 8-bit input

The IO

The IO system is the same as the one in lab 3: an 8-bit input from DIP switches and an 8-bit output displayed as a signed hexadecimal number.

The Compiler

The compiler converts serial statements into the parallel statements that the CPU uses and outputs a binary MIF file in the format expected by the VHDL compiler. It also resolves dependencies between these instructions by spacing them so that they will not conflict. It does not reorder instructions, though that is a feature I would add if I had more time, as it looks like it could give major speedups.
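The spacing step can be sketched as a greedy in-order scheduler. The Python below is a minimal illustration, not the compiler written for this lab. It tracks only general-purpose register hazards (RAW spacing 3, WAW spacing 1) and ignores predicates, IO, and jumps. The Instr class and the assembly-like text such as "mov r1 r3" are hypothetical.

```python
from dataclasses import dataclass, field

RAW, WAW = 3, 1   # register spacings from the dependency table
WIDTH = 3         # instructions per line

@dataclass
class Instr:
    text: str                                # hypothetical assembly text
    reads: set = field(default_factory=set)  # register numbers read
    writes: set = field(default_factory=set) # register numbers written

def schedule(program):
    """Assign serial instructions to lines, in order, padding with nop.

    WAR hazards need no check here: their spacing is negative, so an
    in-order schedule always satisfies them.
    """
    lines = []        # lines[n] is the list of instruction texts on line n
    last_write = {}   # register number -> line of its most recent write
    current = 0       # never place an instruction before this line
    for ins in program:
        line = current
        for reg in ins.reads:
            if reg in last_write:
                line = max(line, last_write[reg] + RAW)
        for reg in ins.writes:
            if reg in last_write:
                line = max(line, last_write[reg] + WAW)
        if line < len(lines) and len(lines[line]) == WIDTH:
            line += 1  # the chosen line is full, so spill to the next one
        while len(lines) <= line:
            lines.append([])
        lines[line].append(ins.text)
        for reg in ins.writes:
            last_write[reg] = line
        current = line
    return [slot + ["nop"] * (WIDTH - len(slot)) for slot in lines]

program = [
    Instr("mov r1 r3", reads={1}, writes={3}),
    Instr("mov r2 r4", reads={2}, writes={4}),
    Instr("add r3 r4", reads={3, 4}, writes={4}),
    Instr("inv r4 r5", reads={4}, writes={5}),
]
for number, slot in enumerate(schedule(program)):
    print(number, slot)
```

In this example the two mov instructions share line 0, the add waits until line 3, and the inv waits until line 6, so most slots hold nop. Filling those slots with independent work is the kind of reordering that could recover the lost time.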

I had initially planned to take the Ultimate RISC instruction set and make it an EPIC one, but Ultimate RISC doesn't really make for Ultimate EPIC. I figured that I would need to warp it so much to get it remotely efficient that I had better just start over with an EPIC set.

Jeff Kaufman : 2005
cbr at sccs dot swarthmore dot spam edu. Remove spam.
