Next: Assembly Language Programming Up: Computer Architecture - CSC Previous: The Components of a Computer


The Central Processing Unit (CPU)


This chapter describes the structure of a computer from the point of view Level 1 - the Microprogramming level. First we will give an overview of how a processor and memory function together to execute a single machine instruction. Then, a sequence of machine instructions - a program.

We will then build up a real Central Processing Unit (CPU) from subsystems - ALU, Registers etc. We will consider a CPU to consist of three primary parts:

The internal registers, the ALU and the connecting buses - sometimes called the data path;

The input-output interface, which is the gateway through which data are sent and received from main memory and input-output devices. Quite often this interface is shown as part of the data path. No harm, as long as you understand its interface role;

The control part, which directs the activities of the data path and input-output interface, e.g. opening and closing access to buses, selecting ALU function, etc.

A fourth part, main memory, is never far from the CPU but from a logical point of view is best kept separate.

We will pay most attention to the data path part of the processor, and what must happen in it to cause useful things to happen - to cause program instructions to be executed.

Finally, we will briefly describe how the control part can be implemented by microprogram, i.e. how the decoding of a machine instruction can lead to the execution of a set of sequencing steps called a microprogram.

Note on terminology: the term `microprogram' came about before microprocessors were ever dreamt of. It refers to a program in which each instruction is concerned with a very small event - like the sending of the contents of a register to the ALU.

Before you continue, you should review the material on multiplexers, decoders, ALUs, registers, buses, etc. in the previous chapter.

The Architecture of Mic-1

Figure 5.1 shows the data path part of our hypothetical CPU from [Tanenbaum, 1990], page 170 onwards.

Figure 5.1: Mic-1 CPU, data path
insert a photocop...

Here, we briefly describe the components of Figure 5.1. Then we give a qualitative discussion of how it executes program instructions. Finally we describe the execution of instructions in some detail.


There are 16 identical 16-bit registers. But, they are not general purpose, each has a special use:

PC, program counter
The PC points to memory location that holds the next instruction to be executed;

AC, accumulator
The accumulator is like the display register in a calculator; most operations use it implicitly as an unmentioned input, and the result of any operation is placed in it. Macro-level (level 2) programmers do all arithmetic and logical operations through it; e.g. ADD, you must put operand 1 in AC, operand 2 must come from a memory location, and the result is put in AC. Of course, micro-level (level 1) programmers can treat all these registers in a uniform way;

SP, stack-pointer
Ignore for now;

IR, Instruction Register:
Holds the instruction (the actual instruction data) currently being executed.

TIR, Temporary Instruction Register
Holds temporary versions of the instruction while it is being decoded;

0, +1, -1
Constants; it is handy to have copies of them close by - avoids wasting time accessing main memory.

Another constant; used for masking (anding) the address part of the instruction; i.e. AMASK and IR $\rightarrow$ address.

ditto for Stack (relative) addresses.

A, B, ...F
: General purpose registers; but general purpose only for the microprogrammer, i.e. the assembly language cannot address them.

Internal Buses

There are three internal buses, A and B (source) buses and C (destination) bus.

External Buses

The address bus and the data bus.


A and B latches hold stable versions of A and B buses. There would be problems if, for example, AC was connected straight into the A input of the ALU and, meanwhile, the output of the ALU was connected to AC, i.e.. what version of AC to use; the answer would be continuously changing.


As described in Figure 4.2.2 above. Recall that the ALU may perform any of four functions:

A + B; note `plus', rather than or; (F1, F0) = (0, 0);

$A\ \mathbf{and}\ B$; (F1, F0) = (0, 1);

A straight through, B ignored; (F1, F0) = (1, 0);

$\mathbf{not}\ A$; (F1, F0) = (1, 1).

Any other functions have to be programmed.


The shifter is not a register - it passes the ALU output straight through: shifted left, shifted right or not shifted.

Memory Address Register (MAR) and Memory Buffer Register (MBR)

The MAR is a register which is used as a gateway - a `buffer' - onto the address bus. Likewise the MBR (it might be better to call this memory data register) for the data bus.

A-Multiplexer (AMUX)

The ALU input A can be fed with either:

The contents of the A latch, or
The contents of MBR, i.e. what was originally the contents of a memory location.


The memory is considered to be a collection of cells or locations, each of which can be addressed individually, and thus written to or read from. Effectively, memory is like an array in Java or any other high-level language. For brevity, we shall refer to this memory `array' as M and the address of a general cell as x and so, the contents of the cell at address x as M[x], or m[x].

To read from a memory cell, the controller must cause the following to happen:

Put an address, x, in MAR;

Requests read - by asserting a read control line;
At some time later, the contents of x, M[x] appear in MBR, from where, the controller can cause it to be ...

Transferred to the ALU or somewhere else.

To write to a memory cell, the controller must cause something similar to happen:

Put an address, x, in MAR;

Put the data in MBR;

Requests write - by asserting a write control line;
At some time later, the data arrive in memory cell x.

It is a feature of all general purpose computers that executable instructions and data occupy the same memory space. Often, programs are organised so that there are blocks of instructions and blocks of data. But, there is no fundamental reason, except tidiness and efficiency, why instructions and data cannot be mixed up together.

Register Transfer Language

To describe the details of operation of the CPU, we use a simple language called Register Transfer Language (RTL). The notation is as follows. M[x] denotes contents of location x; sometimes m[x], or even just [x]. Think of an envelope with 100 in it, and your address on it.

Reg denotes a register; Reg = PC, IR, AC, R1 or R2.

[M[x]] denotes contents of the address contained in M[x]. Think of an envelope containing another envelope.

We use $\leftarrow$ to denote transfer: $A \leftarrow B$. Pronounce this as `A gets B'. In the case of $A \leftarrow M[x]$, we say `A gets contents of x'.

Note: There is a world of difference between an address, 100, say, and data value contained in that address.

The Fetch-Execute Cycle

How does the CPU and its controller execute a sequence of instructions?

Let us start by considering the execution the instruction at location 100Hex; what follows is an endless loop of the so-called fetch-execute cycle:

read the next instruction and put it in the Instruction Register. Point to the next instruction, ready for the next Fetch.

decode and obey that instruction; if it is a JUMP type instruction, then revise the pointing to the jumped-to instruction. Goto Fetch.

Start off with PC = 100H - PC is the Program Counter, and is used to address the instruction (data) to be fetched and executed.


i.e. get the program instruction from memory.

Load the contents of the program counter into the memory address register, which is then put on the address bus. I.e.. $MAR \leftarrow PC$; MAR now holds 100;

Assert RD (read) from memory; this causes the data in cell 100 to be put on the data bus;

Instruct the memory buffer register (MBR) to read the data bus;

Transfer the value in MBR into the Instruction Register, $IR \leftarrow MBR$;

Point to the next instruction: $PC \leftarrow PC + 1$.


, or more precisely, decode and execute. For example, ADD Register +1 to AC; i.e. $AC \leftarrow AC + 1$.

Transfer the contents of +1 to the A input of the ALU, via the A bus; transfer the contents of AC to the A input of the ALU, via the B bus;

Set the ALU function (F0, F1) to ADD. Instruct it to perform the operation; at some time later the result of the ADD will appear on the output of the ALU and hence the C bus.

Transfer the data on the C bus into AC.


In previous versions of this course, I used to go into complete detail of the microprogram which controls the CPU - the microprogram implements the Fetch-(Decode)-Execute cycle. This time I'm going to leave it out. If interested, see section 5.9 appended to this chapter, or [Tanenbaum, 1990], section 4.2.2 onwards.

Instruction Set Model

We now examine the instruction set, by which Level 2 programmers can program the machine; if in doubt, we call these macroinstructions.

Incidentally, you could program in microcode - but life is too short, just like life is too short to program in assembly (macro) code, if you can do it in Java, C, or C++.

We will call the level 2 machine Mac-1a; Mac-1a is a restricted version of Tanenbaum's Mac-1). The main characteristics of Mac-1a are: data word length 16-bit; address size 12-bits.

Exercise: What is the maximum number of words we can have in the main memory of Mac-1a? (neglect memory mapped input-output). How many bytes?

There are two addressing modes : immediate and direct; we will neglect Tanenbaum's local and indirect for the meanwhile.

It is accumulator based: everything is done through AC; thus, `Add' is done as follows: put operand 1 in AC, add to memory location, result is put in AC.

The Mac-1a programmer has no access to the PC or other CPU registers. Also, for present purposes, assume that SP does not exist.

A limited version of the Mac-1 instruction set is shown in Figure 5.2.

The columns are as follows:

Binary code for instruction.
I.e. what the instruction looks like in computer memory, Machine code.

The name given to the instruction. Used when coding in assembly code.
Long name.
Descriptive name for instruction.

What the instruction actually does, described formally in register transfer language (RTL).

Figure 5.2: Mac-1a Instruction Set (limited version of Mac-1)

Memory-mapped Input-output



As stated earlier, there are no direct instructions for input- output; instead Mac-1a uses memory-mapped input-output, whereby some memory cells are mapped to input-output ports; for simplicity we assume that there are only two ports, one connected to a standard-input device, the other connected to a standard-output device:

We assume that each device works with bytes (i.e. 8-bits).

Input from standard-input device

A read from address 0FFCHex yields a 16-bit word, with the actual data byte in the lower order byte. There is no use in reading the input port until the connected device has put the data there: so 0FFDH is used to read the input status register; the top bit (sign) of 0FFDH is set when the input data is available (DAV).

Thus, a read routine should go into a tight loop, continuously reading 0FFDHex, until it goes negative; then 0FFCHex can be read to get the data. Reading 0FFC clears 0FFD again.

Output to the standard-output device

Output, to 0FFE, runs along the same lines as input. A write to 0FFE will send the lower order byte to the standard-output device. The sign bit of 0FFFH signifies that the device is in a ready to receive (RDY) state; again there is no use writing data to the output port until the device is ready to read it.

Microprogram Control versus Hardware Control

As discussed, control of the CPU - fetch, decode, execute - is done by a microcontoller which obeys a program of microinstructions.

We might think of the microcontroller as a `black-box' such as that shown in Figure 5.3.

You can think of it as a black-box which has a set of inputs and a set of outputs - just like any other circuit, ALU, multiplexer, etc. Therefore, instead of microprogramming, it can be made from logic hardware.

Figure 5.3: Controller Black-box, either Microcontroller or Logic
\begin{figure}\begin{tex2html_preform}\begin{verbatim}Op-Code eg. JUMP
Inputs: N...
...vert \vert \vert \vert \vert\end{verbatim}\end{tex2html_preform}\par\end{figure}

To design the circuit, all you have to do is prepare a truth-table (6 input columns - op-code (4 bits) and N, Z, 22 output columns), and generate the logic.

There is no reason why this hardware circuit could not decode an instruction in ONE clock period, i.e. a lot faster than the microcode solution.

The microprogrammed solution allows arbitrarily complex instructions to be built-up. It may also be more flexible, for example, there were many machines that users could microprogram themselves; and, there were computers which differed only by their microcode, perhaps one optimised for execution of C programs, another for COBOL programs.

On the other hand, if implemented on a chip, control store takes up a lot of chip space. And, as you can see by examining [Tanenbaum, 1990], microcode interpretation may be relatively slow -- and gets slower, the more instructions there are.

CISC versus RISC

Machines with large sets of complex (and perhaps slow) instructions (implemented with microcode), are called CISC - complex instruction set computer.

Those with small sets of relatively simple instructions, probably implemented in logic are called RISC - reduced instruction set computer.

Most early machines - before about 1965 - were RISC. Then the fashion switched to CISC. Now the fashion is switching back to RISC, albeit with some special go-faster features that were not present on early RISC.

CISC machines are easier to program in machine and assembly code (see next chapter), because they have a richer set of instructions. But, nowadays, less and less programmers use assembly code, and compilers are becoming better. It comes down to a tradeoff, complexity of `silicon' (microcode and CISC) or complexity of software (highly efficient optimising compilers and RISC).




To control the data path in Figure 5.1 we need 61 different signals:

If we have decoded a macro-instruction and know the values of the 61 signals then it is possible to execute the macro-instruction i.e.. perform one cycle of the data path. [From now on instruction = macro-instruction; micro-instruction will never be shortened.]

We could always work with a 61-bit micro-instruction register - whose outputs are connected to the right places. However, many savings are possible:

Only one scratchpad register may send data to the A bus (at any one time). Thus 16 signals can be encoded in 4-bits; the 4-bits can be decoded on the way out to the registers. 16-4 = 12 signals saved.

Ditto B bus.

12 saved.

The C bus is different - there may be many listeners; but, in practice, it is treated the same as A and B buses.

12 saved.

Latch signals L0 and L1 are ALWAYS needed, and always at the same time in every data path cycle, so you can connect one phase of the clock to them.

2 saved.

One additional signal: ENC - ENable C, ie. sometimes there is no need to put a result in the scratch-pad, e.g. for a compare instruction, ENC=0, the output of ALU goes nowhere.

1 extra.

RD (memory read) and WR (memory write) can be used to control access from MBR to the system data bus. Thus saving 2 signals.

2 saved.

Thus the size of micro-instruction register = 61 - (12+12+12+2+2-1) = 22 bits.

Microinstruction Format

Figure 5.4 shows microinstruction layout, with two new fields: ADDR and COND, which are used by the microcontroller, to control itself.

Figure 5.4: Microinstruction format
\begin{figure}\begin{tex2html_preform}\begin{verbatim}Bits 1 2 2 2 1 1 1 1 1 4 4...
...NC: Allcode 0 = No; 1 = Yes.\end{verbatim}\end{tex2html_preform}\par\end{figure}

Example. Here is the microinstruction which performs the execute part of a JUMP instruction. $amux = 1;\ alu = 01\ (A AND B);\ sh = 00;\
br = mar = rd = wr = 0;\ enc = 1;\ ... 0) <-
result;}\ b = 8, \text{i.e. Bbus <- AMASK;}\ a = 0 \text{(Abus <- PC)}$.

Microinstruction Timing

A four phase clock is used. There are four phases or subcycles:

Load next microinstruction into MIR, the MicroInstruction Register.

Gate registers onto A and B buses. Latch bus data into latch A and B. Both done by clock phase 2.

Load MAR if necessary. Wait for ALU to do its work. (No need to tell it what to do - the appropriate MIR bits are connected to it, all the time)

Shifter output is now stable, and on the C bus. Clock C bus data into scratchpad - if required. Load MBR from C bus - if required.

And that is it - one cycle of the data path operation; but note that it takes more than one of these cycles to execute a macro-instruction.

Figure 5.5 shows the completed microarchitecture.

Figure 5.5: Microarchitecture block diagram
insert a photocopy of...

The components of the microcontrol part are now explained.

Control Store:
High-speed memory store for microinstructions; it is 32-bits wide. The address uses 8-bits.

Control Store Interface:
Recall MAR and MBR for main memory.

MicroProgram Counter. Equivalent to the MAR in the main CPU - the MPC addresses the control store; so, it is also a bit like the main CPU PC register.

MicroInstruction Register. Equivalent to MBR/IR. MIR is loaded only by clock cycle 1.

Decoders are needed for coded A, B, C bus gating signals.

MicroInstruction Sequencing

Often micro-instructions follow in numerical order, i.e. $MPC \leftarrow
MPC+1$ does the sequencing.

But, as with conventional programs, may need to break normal sequence. I.e. jump . The COND field states conditions under which jump may take place (including never).

If a jump is required, $MPC \leftarrow ADDR\ \text{field}$ (via Mmux). The box Micro seq. logic actually controls Mmux, but for clarity, this connection is not shown in Figure 5.5. All this happens at clock phase 4, when ALU flags N, Z are stable.

Main memory read and write

We assume, as is usual, that it takes more than one micro- instruction cycle to Read or write to memory (assume 2 cycles). Thus if you start a Read in cycle n by asserting RD, you must ensure that RD is asserted in cycle n+1; in fact the whole n+1th microinstruction can be RD alone. But, that would be wasteful, and clever microprogrammers can usually find something useful to fill the time.

The Microprogram

Example. We require to add AC to A and store result in AC. This microinstruction would do that:

ENC=1, C=1, B=1, A=10(decimal); fields not mentioned = 0

This is one way of writing it. You could also write out the 32 bits in full (as they would appear in control store or in MIR). However, that would be tedious and error prone. It is better to use a high-level language notation. Thus, we will use RTL, supplemented with:

ALU functions:
$+,\ \mathbf{and},\ \leftarrow\ \text{assignment,
straight-through},\ {\bf not}$; $ALU \leftarrow REG$, means run contents of REG through the ALU, e.g. to test if zero.

Shifter functions:

Conditional statements:
IF cc THEN GOTO xx: where cc = ALU condition-code (N or Z) and xx = microinstruction number to jump to.
RD and WR:
to indicate memory read, write.

Figure 5.6 illustrates the resultant MicroAssembler Language or MAL, and how it translates into numeric microinstructions (we use decimal, instead of binary, to save space) .

Figure 5.6: Illustration of MAL instructions, plus their numeric code
\begin{figure}\begin{tex2html_preform}\begin{verbatim}A C
U N L ...

Figure 5.7 show the full microprogram that runs on Mic-1 and interprets (fetches-decodes-executes) Mac-1A instructions.

Figure 5.7: Illustration of MAL instructions, plus their numeric code


As described earlier, the microprogram continuously loops on:

Line 0 starts by fetching contents of PC; RD is asserted.

Line 1 just holds RD high (we need 2 cycles for memory access); it increments PC while it is waiting.

Line 2 gets the instruction into IR; it also passes IR through ALU to test bit 15; if bit 15 is set then we jump to 0 ( no instructions in Mac-1a has bit 15 set - check in Figure 5.4-4).

Note the subtraction algorithm (see Chapter 2 for discussion of twos complement etc.). If x and y are binary:

x - y = x + (-y) = x + (y' +1) = x + 1 + y'


Simulate, by tracing on paper, the execution of the STOD instruction: STOD is at location 100, it is storing to location 543. Assume PC = 100, AC = 4321.

(a) Write down the contents of all registers at the start.

(b) Ditto for all relevant memory locations, i.e. 100 and 543.

(c) Step through the microcode, from line: 0; note each line visited; write down the contents of any register changed by the microinstruction.

(d) When you get back to line: 0 write down the contents of all registers and all relevant memory locations.

(e) Use (a), (b) and (d) to verify that STOD did what it was meant to do.


(a) Write down the contents of all registers at the start.

PC = 100, AC = 4321, others don't care.

(b) Ditto for all relevant memory locations, ie. 100 and 

[100] = 1543; [543] = don't care.

(c) Step through the microcode, from line: 0; note each line 
visited; write down the contents of any register changed by 
the microinstruction.

0: mar<-pc;rd;         

     mar = 100; RD set;

1: pc<-pc+1;rd;
     pc = 101; RD set;

2: ir<-mbr;if n then goto 0; 

     mbr = 1543; ir = 1543; n=FALSE (0);
     1543 = 0001 0101 0100 0011

3: tir<-lshift(ir+ir); if n then goto 19;

     alu output = 0010 1010 1000 0110
     shf output = 0101 0101 0000 1100
     tir      = 0101 0101 0000 1100
     n = FALSE (0)

4: tir<-lshift(tir); if n then goto 11; 

     alu output = 0101 0101 0000 1100 (NB NOT shifted yet)
     shf output = 1010 1010 0001 1000
     tir      = 1010 1010 0001 1000
     n = FALSE (0) (because bit 15 of alu is 0)

5: alu<-tir; if n then goto 9; 
     alu output = 1010 1010 0001 1000
     Finally, n = TRUE (1),
     so, we go to 9.

9: mar<-ir;mbr<-ac;wr;

     mar = 0101 0100 0011 (543 - it only holds 12 bits)
     mbr = 4321;
     WR set.
10:wr; goto 0;
     WR set
     [543] = 4321 at the end of this microcycle.

(d) When you get back to line: 0 write down the contents of all registers 
    and all relevant memory locations. 

     pc = 101, ac = 4321, [543] = 4321,
     remainder don't care.

(e) Use (a), (b) and (d) to verify that STOD did what it was meant
to do.

I think so!


Consider the Mac-1a assembly code to do the equivalent of: a0 = a1 + a2:

   lodd a1
   addd a2
   stod a0

Taking into account the fetch-execute cycle, and that there is a controller which also uses MAR and MBR, and assuming that the program starts at 100Hex (lodd a1 is there), and that a0, a1, a2 are at 100Hex, 101Hex, and 102Hex, respectively, describe precisely, and in order, all the data travel along the bus, to and from memory. Distinguish addresses and data.

Which Mac-1a instructions make use of the N, Z condition flags in their execution (I say execution to distinguish it from decode).

Some machines allow assembly programmers access to registers such as A, B, C Figure 5.1. Why might programs be speeded up by using these registers? For example, let us assume that there are instructions such as LODDA address, LODDB address, ...which cause registers a, B, etc. to be loaded instead of AC; also, corresponding ADDDA, ADDDB, which cause registers A, B, etc. to be added to AC.

If access to main memory is a bottleneck (the `von Neumann bottleneck'), think of ways a alleviate the problem; hint: find a definition of (a) cache memory, (b) pipelining.

Using assembly language, show how to clear the accumulator on Mac-1A.

Many machines have a HALT instruction - which causes the machine to stop dead at the current PC address. Using assembly language, show how to halt Mac-1A, e.g. make it sit at PC = 100 effectively doing nothing.

Many machines have a NOOP - no-operation, i.e. the instruction wastes a bit of time and no other effect. Using assembly language, show how to program the same effect as a NOOP.

List and describe two major shortcomings of Mac-1A, i.e. the limited set of instructions in Figure 5.2.

Make a prioritized list of instructions you think would improve Mac-1A. Explain and give a rationale in each case.

Given that many machines have writable control store (i.e. you can write your own microcode to interpret your own macro-code), why do programmers hardly ever use this facility. List and describe advantages and disadvantages.

Describe some of the strengths and weaknesses of microprogramming and a general programming technique - compare it to assembly language, to machine language, and to Java.

Assuming that Mac-1a uses 16-bit twos complement to store signed integers, why is it impossible to LOCO 'load constant' minus 1, or indeed, any negative number. Suggest a method to circumvent this problem - which actually impacts positive numbers as well.

Put on your thinking cap. How, in qualitative terms, could you avoid the two cycle wait while Mac-1a reads the next instruction to be executed during the fetch-decode-execute cycle? - at least avoid most, if not all of them.

See the Exercise on tracing the operation of STOD. Count the number of microinstructions required to execute STOD.

If Mac-1A is being clocked at 100 MHz (overall cycle - don't worry about phases). How long does it take to execute a STOD.

Do the following Mic-1 instructions perform the same function (i.e. yield the same set of results):

a<-a+a; if n then goto 0;

a<-lshift(a); if n then goto 0;

Translate the following binary microinstruction into macroassembly language. Figure 5.4 may be a help.

1100 1000 0001 0001 1001 0000 0000 1000

Implement an AOAC ('add-one-to-accumulator') instruction in microcode. Use Binary 1111000000000000 as its machine code.

Implement a CLAC ('clear-accumulator') instruction in microcode. Use 0111000000000000 as its binary code. What is significant about that code. Analyse your implementation to see if it is any faster than the 'old' Mac-1A method.

Find the box marked Micro.seq. logic in Figure 5.5.

Compile a truth-table for the output bit; assume a 0 to the 'Mmux' causes the 'incremented MPC' to be selected; and, therefore, 1 causes ADDR to be selected.

Derive a circuit for Micro. seq. logic.

next up previous contents
Next: Assembly Language Programming Up: Computer Architecture - CSC Previous: The Components of a