Next: Operating Systems - Introduction Up: Computer Architecture - CSC Previous: Assembly Language Programming

Subsections

Further Assembly Programming

Introduction

The instruction set (Mac-1a) introduced in chapter 6 was severely limited - particularly in its inability to call subprograms. We now extend our coverage to the remainder of the Mac-1 instruction set, and attempt some more ambitious programs.

Firstly, we will introduce the additional instructions - especially those based on the stack; we will explain the purpose of a stack and its use for handling subprograms.

In addition, we will describe the memory addressing modes found on most general purpose computers. Next, we will show how some more real programs can be constructed, from simple three or four liners, to subroutines, input-output and interrupts.

Although this chapter is entirely about Mac-1, the presentation is such that the principles of general-purpose computers are emphasised. Thus, someone who follows this chapter will have little difficulty in understanding Motorola 68000, Intel 80x86 series, or virtually any conventional computer.

Addressing - General

 

In a computer, memory locations can hold instructions or data. In addition, as we shall see the data can be interpreted either as a plain value, e.g. 100, or as an address or reference to another data item. Those of who are familiar with C/C++ will recognise pointers; and Java people will recognise references.

In general, machine instructions usually take zero, one, or two operands (e.g. in lodd a0; a0 is the single operand; lodd is the operation;

Actually Mac-1 has no multi-operand instructions. For a start, it is an accumulator machine, i.e. in instructions like addd, lodd, the second - implicit - operand is AC, the accumulator.

Operands can be data, or can refer to data - i.e. address of data, or can be labels - which translate to addresses - of instructions, e.g. for jumps.

The question of addressing is concerned with how operands are interpreted. In the case of data the operand can be:

Addressing modes look complicated, but if you are careful to analyse what a construct means - by drawing a diagram, if necessary - then there are no real pitfalls.

Also, for those who are not specialists in assembly programming, you should keep to the simple modes and only use the complex modes when they are absolutely essential.

Mac-1 Instruction Set Extensions

Figure 7.1 shows the Mac-1 instruction set extended to the full repertoire given in [Tanenbaum, 1990]; we do not bother with the binary version of the instruction - as in Figure 5.2 and Figure 6.1, since we will not be assembling programs using the additional instructions, nor writing them in machine code.


  
Figure 7.1: Complete Mac-1 Instruction Set
\begin{figure}\begin{tex2html_preform}\begin{verbatim}Mnemonic Name Action(s)
--...
... address offset in range 0 - 255\end{verbatim}\end{tex2html_preform}\end{figure}

The Additional Jumps

jneg x    Jump on negative    if ac<0 : pc <- x
jnze x    Jump on nonzero     if ac!=0 : pc <- x

These save some of the trouble encountered using just jpos and jzer; however, as we have seen, they are not essential.

The Stack

 

In Mac-1, the register SP is the stack-pointer and is dedicated to maintaining the stack; the stack itself - the data pointed-to - is actually part of main-memory.

The stack is a memory into which values can be stored and from which they can be retrieved on a last-in-first-out (LIFO) basis. Ideally, you store with a PUSH and retrieve with a POP. It may help to think of an analogy such as a spring loaded canteen tray dispenser, or a bus conductor's coin dispenser; the main point is that you can only put on the top (PUSH), get from the top (TOP) or remove the value at the top (POP). In spite of its simplicity this device has a remarkably large impact on the computational capability of a computer. A stack gives us a sort-of indirect addressing and also a sort-of indexed addressing via a the stack pointer; but a stack does much more than that, it is the basis of the implementation of functions and procedures, and blocks in block-structured high-level languages. SP points to the top of the stack - i.e. to the memory location where the last value was pushed.

In the case of Mac-1, the stack grows from high memory towards low memory. PUSH increases the size of the stack by one and places a value in the new memory cell (at the top). POP exactly reverses the process, i.e.. retrieves the last value written (the top) and decreases the size of the stack. PUSH followed by POP has exactly NO effect. And, as usual with Mac-1 most things are done through the accumulator (AC); PUSH pushes the number in the AC, and POP retrieves the top of the stack into the AC.

PUSH operates as follows:

     PUSH      ;sp <- sp - 1  ;SP decremented, NB. this INCREASES
                              size of stack.
               ;m[sp] <- ac   ;put contents of AC into the memory
                              cell that SP POINTS TO

POP:

     POP       ;ac <- m[sp]   ;get contents of cell pointed to by
                              SP, into AC.
               ;sp <- sp + 1  ;decrease size of stack.

Note carefully again that the stack actually grows downwards, one word at a time - actually this is the case on a great many machines. Normally, in Mac-1 programs, we will assume that SP starts off pointing at memory cell 4020.

Example

. The tables below show how the state of the stack and memory cells change, in response to the following code fragment, (assume SP initially set to 4020, and that a0 is at 500, and contains 30, that a1 is at 501 and contains 91):

                    /(a)
          lodd a0   /ac <- [a0] (=30)
          push      /(b)
          lodd a1   /ac <- [a1] (=91)
          push      /(c)
          pop       /ac <- m[sp]; sp <- sp -1
          stod a0   /(d)
          pop       /
          stod a1   /(e)

In examples like this, to show the address of a memory cell and what it contains, we use the notation:

          address: contents
          500:      30

At the beginning (a):

     a0   500: 30
     a1   501: 91
          AC : ?                   4018:      ?
                                   4019:      ?
          SP: 4020 --points to---> 4020:      ?

At (b):

     a0   500: 30
     a1   501: 91
          AC : 30                  4018:      ?
                                +->4019:      30
          SP: 4019 --points to--+  4020:      ?

At (c):

     a0   500: 30
     a1   501: 91
          AC : 91               +-->4018:      91
                                |   4019:      30
          SP: 4018 --points to--+   4020:      ?

At (c):

     a0   500: 91
     a1   501: 91
          AC : 91                  4018:      ?
                                +->4019:      30
          SP: 4019 --points to--+  4020:      ?

At (e):

     a0   500: 91
     a1   501: 30
          AC : 30                  4018:      ?
                                   4019:      ?
          SP: 4020 --points to---->4020:      ?

Comments:

1.
The contents of A0 and a1 have been swapped; if we had wanted the same values POPped as PUSHed we would have had to POPped in the reverse order of PUSHing;

2.
Once the SP moves back (after POP) we show ? in the stack area; the value would probably remain, but it would be exceptionally foolish to rely on this happening - as we shall see later when we mention interrupts.

Stack Instructions

Direct Accumulator-Stack Instructions

push      Push onto stack     sp <- sp-1; m[sp] <- ac
pop       Pop off stack       ac <- m[sp]; sp <- sp+1

See section  7.4.

Indirect Accumulator-Stack Instructions

pshi      Push indirect       sp <- sp-1; m[sp]<-m[ac]
popi      Pop indirect        m[ac] <- m[sp]; sp <- sp+1

Thus, the AC is used `indirectly' - see indirect addressing mode, section 7.2; i.e. the value that is contents of the memory cell that is pointed-to by [ac] is pushed and popped.

Call and Return - CALL and RETN

CALL and RETN are used for CALLing subprograms (methods or functions in Java) and RETurning from them.

CALL

call x    Call procedure   sp<-sp-1; m[sp] <- pc; pc <- x
                          (get stack (save pc)   (put jump
                           ready)                 address
                                                  in pc)

CALL causes all of the following to happen:

1.
Decrement the stack pointer - so that we will not overwrite last thing put on stack,

2.
The contents of PC - which is pointing to NEXT instruction, the one just after the CALL - is pushed onto the stack, and,

3.
Jump to `x', which is the address of the start of the subprogram is put in the PC register, this is all a jump does. Thus, we go off to the subprogram - just as in JUMP label, but the important difference is that we remember where we were in the calling program, i.e. we must remember where we came from, so that we can get back there again.

RETN

retn      Return from proc.   pc <- m[sp]; sp <- sp + 1

RETN causes all of the following to happen:

1.
Pops the stack, to yield an address/label; if correctly used, the top of the stack will contain the address of the next instruction after the call from which we are returning; it is this instruction with which we want to resume in the calling program;

2.
Jump to the popped address, i.e. put the address into the PC register.

Conclusion

We have now introduced most of the new instructions; we wait for later sections to introduce the others. Also, we shall see in later sections more detail on how subprograms work.

Input-Output Instructions

This is mostly a repeat of section 5.6.

 

Introduction

As stated earlier, there are no direct instructions for input- output; instead Mac-1a uses memory-mapped input-output, whereby some memory cells are mapped to input-output ports; for simplicity we assume that there are only two ports, one connected to a standard-input device, the other connected to a standard-output device:

We assume that each device works with bytes (i.e. 8-bits).

Input from standard-input device

 

A read from address 0FFCHex yields a 16-bit word, with the actual data byte in the lower order byte. There is no use in reading the input port until the connected device has put the data there: so 0FFDH is used to read the input status register; the top bit (sign) of 0FFDH is set when the input data is available (DAV).

Thus, a read routine should go into a tight loop, continuously reading 0FFDHex, until it goes negative; then 0FFCHex can be read to get the data. Reading 0FFC clears 0FFD again.

Output to the standard-output device

 

Output, to 0FFE, runs along the same lines as input. A write to 0FFE will send the lower order byte to the standard-output device. The sign bit of 0FFFH signifies that the device is in a ready to receive (RDY) state; again there is no use writing data to the output port until the device is ready to read it.

Example

Write a fragment of program that will output the contents of the lower-order byte of address 500 to the standard output device mentioned in section 7.6.3.

          test:     lodd fff       /read status
                    jpos test      /not ready
                    jzer test      /not ready
          out:      lodd 500
                    stod ffe       /output

Exercise

Write a fragment of program that will read from the standard input device into 501.

Exercise

Write a program that will output the contents of the lower-order bytes of addresses a0 to a9 (say, 500 to 509) to the standard output device - use the earlier example, and the previous examples on loops (chapter 6) as your building blocks.

Exercise

Write a program which will:

1.
Continuously read from the standard input device, until:

2.
The number -1 (FFFFH) is received to signify END-OF-INPUT;

3.
Send what was read to the standard output device.

Exercise

Write a program which will:

1.
Continuously read from the standard input device, until:

2.
The number -1 (FFFFH) is received to signify END-OF-INPUT;

3.
Add 1 to each number just read;

4.
Send them to the standard output device.

Exercise

Write a program which will:

1.
Continuously read from the standard input device, until:

2.
The number -1 (FFFFH) is received to signify END-OF-INPUT;

3.
For each character read, check if it is in the range 'A' to 'Z', if so, make it lower-case, i.e. add 20 Hex to it;

4.
If it is not an upper case character, leave it alone;

5.
Send it to the standard output device;

Exercise

The example

          teststat: lodd fff       /read status
                    jpos test      /not ready
                    jzer test      /not ready
          out:      lodd 500
                    stod ffe       /output

given above, is unsatisfactory for many purposes; what happens, for example, if the output device is broken, or is switched off, and, as a consequence never becomes ready; the program would stay in the tight loop and the only way to stop it would be to reset/reboot.

Change the code to count its `not-ready' failures and, if this count ever reaches `maxcount' (e.g. maxcount = 100) to put -1 in AC and JUMP to label `exit'.

More Assembly Language Programming

This section further illustrates the use of assembly language with some simple examples.

Note: in none of the following examples have I tried to optimise the code or to be clever; assembly coding is difficult enough without that.

Multiply Two Numbers

Mac-1 has no multiply or divide - not even for integers - so we have to do it all with adds and subtracts.

Note. we will assume that the result will fit into 16-bits; Q. what general check on the two operands will ensure this? Ans. if the multiplication is z = x * y; then if x <= 127 and if y <= 127, the result will certainly be less than 32767, which is the largest positive number that can be stored in 16-bit twos complement.

We will also assume, without any loss of generality that both numbers are positive; working with negative numbers adds a little housekeeping work that need not concern us now.

To implement z = x * y; we use the algorithm shown in Figure 7.2, i.e. we zero z, then add y into it x times. Therefore, we can use most of the for loop mechanism described in chapter 6. The assembly code is shown in Figure 7.3. At the end (end:) the result is in z.


  
Figure 7.2: Multiplication - Algorithm
\begin{figure}\begin{tex2html_preform}\begin{verbatim}z = 0;
for(i=1; i<=x; i++){
z = z + y;
}\end{verbatim}\end{tex2html_preform}\end{figure}


  
Figure 7.3: Multiplication - Assembly code
\begin{figure}\begin{tex2html_preform}\begin{verbatim}init: loco 1
stod i
loco...
..., then end: will be 115H -- why?\end{verbatim}\end{tex2html_preform}\end{figure}

Divide

Again, we assume that both numbers are positive, and that it is sensible to divide them - again we will leave out some of the bullet-proofing details.

          z = x / y;

We use the algorithm in Figure 7.4, i.e. we count how many times we have to subtract 'y' from 'x' to get to a remainder less than 'y'. The assembly code is shown in Figure 7.5.


  
Figure 7.4: Division - Algorithm
\begin{figure}\begin{tex2html_preform}\begin{verbatim}z = 0;
rem = x; // remain...
...{
z = z + 1;
rem = rem - y;
}\end{verbatim}\end{tex2html_preform}\end{figure}


  
Figure 7.5: Division - Assembly code
\begin{figure}\begin{tex2html_preform}\begin{verbatim}init: loco 0
stod i
lodd...
...
stod rem
jump loop
end: .....\end{verbatim}\end{tex2html_preform}\end{figure}

At the end (end:) the result is in `z'. Of course, the result is `integer' divide, e.g. $\frac{9}{2} \rightarrow 4$. You could modify this code to give you % (modulo - or `remainder after division').

Exercise

. Do a dry-run on Figure 7.4, with y = 0, i.e. attempting to divide by zero (undefined), what happens?

Iterate over an Array

Assume that a0, a1, a2, ... a9 are stored contiguously in 500, 501, ...509, and we want to treat them as an array arr[10]; we require to sum them and put the result in sum. We will assume that all the numbers are sensible, i.e. that overflow isn't a problem.

We could address each of them individually:

init      loco 0
          addd a0
          addd a1
          ...
          addd a9
          stod sum

but that wouldn't be too clever for very long arrays.

We'll use a loop, see Figure 7.6. The assembly program is shown in Figure 7.7


  
Figure 7.6: Iteration over an array - Algorithm
\begin{figure}\begin{tex2html_preform}\begin{verbatim}ista = 0;
iend = 9;
sum ...
... = sum + number at (arr + i);
}\end{verbatim}\end{tex2html_preform}\end{figure}


  
Figure 7.7: Iteration over an array - assembly code
\begin{figure}\begin{tex2html_preform}\begin{verbatim}loco 9
stod iend
loco 0
...
...cr
stod i
jump loop
end: .....\end{verbatim}\end{tex2html_preform}\end{figure}

Exercise

Write a program that will write the 10 characters stored in a0..a9 out to the standard-output device.

Exercise

Write a program that will read 10 characters from the standard-input device into a0 ...a9.

Exercise

Write a program that will read (any number of) characters from the standard input device into a0 ...until `z' is input as end-of-string.

Exercise

Write a program that will write the characters stored in a0 ...out to the standard-output device, until the program arrives at a value 0 in the array - 0 is used as an string terminator in some languages.

Subroutines

Introduction

As mentioned above, CALL and RETN are used for CALLing subroutines and RETurNing from them. One objectives of subroutines is to avoid having to repeat large chunks of code.

Side-Note: On modern machines and particularly when block-structured languages like Pascal, C, or Modula-2 are used, subroutines via a stack are essential. Older machines often used a JUMP-and-STORE type of call: the return address was stored in a memory address somewhere amongst the code of the subroutine.

Subprograms without arguments

That is, we use global variables to pass values to and from the subprogram.

Recall the multiplication program above:

          z = x * y;

using the following algorithm:

          z = 0;
          for(i=1; i<=x; i++){
               z = z + y;
          }

This is fine, but have a look at the assembly code; it is not the sort of thing you want to be bothered with each time you want to multiply two numbers. The code for the multiply will never change, only the names of the variables (x, y, z).

Say you wanted to do the following:

          c = a * b;
          d = e * f;
          g = c + d;

A simplistic solution would be:

1.
Load two operands (a, b) into variables x, y respectively;

2.
Jump to 'mult';

3.
`Mult' will run and put the result in `z'; thus, `z' is a global variable - it has global scope;

4.
Copy `z' to `c', i.e. lodd z; stod c;;

5.
Then, similarly, compute 'd', etc.

startprog:
          lodd a    (assume at location 200)
          stod x    
          lodd b    
          stod y
          jump mult                     204
     **   lodd z                        205
          stod c                        206

But, without subprograms, the is a major problem: the program never gets to `**', because when it's finished the multiply program it continues on to the next instruction after `mult' (end: i.e. 115H), not 205H as desired.

Note, however, there is nothing to stop you (at 115H) JUMPing to 205 - but this means that `mult' program is only useful for this one multiply; later down at 20aH (say) when we want to multiply `e' and `f', we want to JUMP back to a different place.

This gets us to one of the crucial differences between JUMP and CALL (subprogram). With JUMP, it's a one way ticket, you don't ever come back! With CALL you can remember where you came from and JUMP back there (using RETN) when you're finished in the subprogram.

Figure 7.8 shows how we can make the multiply program into a a procedure `mult' (just add RETN at the end) and how to use it.


  
Figure 7.8: Subroutine Call and Return
\begin{figure}\begin{tex2html_preform}\begin{verbatim}calling program subprogram...
...ack when done
20d: stod d
...etc\end{verbatim}\end{tex2html_preform}\end{figure}

Note: there is no explicit use of the stack, all PUSHes and POPs are done implicitly by CALL, RETN.

Subprograms with Parameters

 

The procedure `mult' above has no parameters; effectively we used global variables x, y, z for the parameters and return value. In C++ this would look like Figure 7.9.


  
Figure 7.9: Motivation of Subprogram with Parameters
\begin{figure}\begin{tex2html_preform}\begin{verbatim}int x,y,z; // globalvoi...
... //no arguments
c =z ;
/...
}\end{verbatim}\end{tex2html_preform}\end{figure}

This is alright, but I'm sure your Programming courses up to now have warned you against the evils of global variables.

So, we now write `mult1' that looks like this:

          int mult(int x, int y)
          {
             return x * y;
          }

and is called like: c = mult(a, b);

To do this we need to make explicit use of the stack: we PUSH onto the stack the two values to be multiplied - the arguments, then CALL, finally when we RETN we retrieve (POP) the result from the stack. We can make the multiply program into a proper procedure `mult1' (just add RETN at the end) with parameters and return values, see Figure 7.10. We will also change `mult' slightly to ensure that it uses only two local variables - the only reason for this is to make it a bit more like the one in [Tanenbaum, 1990].


  
Figure 7.10: Subprogram with Parameters
\begin{figure}\begin{tex2html_preform}\begin{verbatim}calling program:200: lod...
...'local' variables z and i.
retn\end{verbatim}\end{tex2html_preform}\end{figure}

Subprograms with return values

Above, we used the trick of passing the result back in the AC. This is alright for a single 16-bit result, but, if the result was more than one word (e.g. 32-bit floating point stored in two words, or a record), we would have to use the stack.

The usual solution is to allocate space for the return values just before the return address.

Other Uses for the Stack

Most computers (eg. 80X86, 680X0, VAX) have a multitude of general purpose CPU registers - just as if Mac-1 allowed the assembly programmer access to registers A, B, ...F.

Therefore, as well as storing the return address - which we could term `storing the callers PC', it may be necessary to store these registers as well before going off to a procedure. This is because, inevitably, the procedure will overwrite some or all of the registers and on return, we want to be able to `restore' the registers and carry on where we left off. This pair of processes is called `saving' and `restoring' the 'environment'.

Alternatively, a more common practice is to leave it up to the procedure to save whatever registers it will overwrite - and restore them just before returning.

Here, of course, we are concentrating on subprograms; there is nothing to stop a programmer using the stack as a general temporary storage area, or for a purpose like the swapping mentioned above.

Stack Frame

In connection with procedures, there are four uses for the stack:

1.
Passing parameters:

(a)
Passing parameters to the subprogram;

(b)
Returning values from the subprogram.

2.
Storing the return address;

3.
Saving the environment (registers);

4.
Finally, all local variables are created on the stack.

Local Variables

     int a, b, c;

     a = 25;
     b = 159;
     c = mult1(a, b);
     a = 0
     (*...etc*)

See section 7.9.2 for `mult1' in assembly:

int mult1(int i, j);

Analysis of `Call' to mult1

The interesting action happens after `b:=159;'.

1.
Assume SP starts off at 4020, and instruction corresponding to 'a = 0;' is at address 210; assume the first executable instruction in 'mult1' is at address 100;

2.
The value a (=25) is pushed: [sp]<-401f, [401f]<-25;

3.
The value b (=159) is pushed: [sp]<-401e, [401e]<-159;

4.
210 - the return address is pushed: [sp]<-401d, [401d]<-210;

5.
Jump to 100 - start of 'mult1';

6.
Here, we could save the AC, just to make the general point of register saving, but to retain compatibility with [Tanenbaum, 1990], we'll leave it out. At this point the stack looks like:

     Addr.     Contents
     -----     --------
     4020      ?
     401f         25
     401e        159
     401d        210          <-- SP (SP now points at 401d)

But we're still not done. We must allocate space for the two local variables on the stack. We could do this by:

     loco 0
     push
     push

i.e. push two zeros onto the stack, but a quicker way is

     desp 2

i.e. decrement the SP by 2, in which case the stack looks like

     Addr.     Contents
     -----     --------
     4020      ?
     401f         25
     401e        159
     401d        210          
     401c        ?*
     401b        ?*      <-- SP (SP now points at 401b)

Note that the space for the local variables is now uninitialised, i.e. 401c and 401b contain whatever was last put in them. This is why you can never assume that a local variable is initialised to 0.

Thus, in general, the stack looks like Figure 7.11. This is called the subprogram's stack frame.


  
Figure 7.11: Stack Frame
\begin{figure}\begin{tex2html_preform}\begin{verbatim}4020
High Memory ^
\vert...
...n our example)
Local Variables.\end{verbatim}\end{tex2html_preform}\end{figure}

Exercise

Write a procedure `pow' that will calculate the first argument (x) raised to the power of the second argument (y). Hints: (a) multiply x by itself y times, (b) use 'mult1', ie. just call it, (c) model your answer on how we wrote 'mult1'. `pow' should have two parameters, passed by value, section 7.12, and should return its result in AC. Assume that the result will fit into 16-bits.

Recursive Subprograms

If you called the procedure mult1 recursively three times - or, indeed, there were three nested calls of different procedures - the situation would look like Figure 7.12


  
Figure 7.12: Stack Frames for Recursive or Repeated Procedure Calls
\begin{figure}\begin{tex2html_preform}\begin{verbatim}4020
High Memory ^
\vert...
...n our example)
Local Variables.\end{verbatim}\end{tex2html_preform}\end{figure}

In this way a procedure can call itself again and again, without one call interfering with the other; the only limit is the size of the stack.

Earlier machines, and languages, like FORTRAN, did not use a stack. Parameter passing, return addresses and local storage was all mixed in with the procedure code area - and there was only one copy - so any recursive calls, if they were allowed, would have overwritten the local data area etc. of the previous call.

Exercise

If you use a global variable as working storage in a procedure, e.g. if i - the counter in mult1 - was declared globally, you cannot allow recursive calls to the procedure. Explain why - use a stack-frame diagram if necessary.

Exercise

(a) Write an algorithm to do pow - see above - recursively. Express the algorithm in Java and get it working. It should take no more than three or four lines.

(b) Express the recursive pow algorithm in Mac-1 assembly code.

Parameters Passed By Value

 

In the scheme mentioned in the previous two subsections, parameters are passed by by value, or by copy. Thus, procedure mult1 can do whatever it likes to the memory location that contains the 25 - the copy of a (the parameter) and a itself (the argument) are separate and so the argument a - in the caller - will never change.

In Java parameters are, by default, pass by value; in C++ you have reference parameters which are passed by reference. or by access, so that if you change the parameter inside the procedure, that change is reflected outside the procedure. See section 7.2.

Exercise

Explain two methods of implementing `pass by reference' in Mac-1 assembly language.

A Better Example from Tanenbaum

The following - from [Tanenbaum, 1990], pages 179-183 - shows a much more realistic example. It shows a complete program, in Pascal. In particular, in contrast to some of our examples, it shows that all local variables reside on the stack. I gives a further example of how to handle arrays.

Insert p. 179 here

!
Insert pp. 180-81 here
!
!

!
Insert pp. 182-83 here
!
!

Macros

As indicated above, a non-procedure solution could have been used to repeat the mult code as many times as a multiply was required. As well as making the program bigger, this would have introduced the greater problem of three pieces of code to debug/maintain, instead of just one.

Now, if we are content to accept the increase in program size, use of a macro avoids the second problem (i.e. maintenance of separate copies of the same code).

Essentially, you declare a macro containing the working bits of the subroutine (no need for the housekeeping bits at the top and bottom) and then insert the macro code wherever the CALL appears.

Macros are used whenever you want to trade memory for speed - you waste no time PUSHing and POPping the stack; however, avoid if possible - they make a program difficult to read, think about and, consequently, hard to test.

Interrupts

[NB. Mac-1 has no interrupts - the following is modelled on interrupts on the 80X86].

The scheme of polled input and output indicated in sections 7.6.2 and 7.6.3 is fine for many cases, especially if the computer is just reading data from a single source.

However, consider the case of the keyboard and GUI interfaces with a mouse. In Windows and other operating systems the keyboard is read even when the computer is away running another part of the program. This is done with a special type of subroutine call - an interrupt.

When you hit a key on the keyboard, something like the following happens:

Hardware Actions:
(invisible to programmer)

1.
The keyboard controller detects the key hit;

2.
It asserts an interrupt line on the system bus;

3.
As soon as the CPU is prepared to handle the interrupt, it sends an acknowledge signal to the controller;

4.
The kb. controller sends a small integer - and interrupt vector to identify itself; note that there could be many interrupting devices;

5.
The CPU reads and stores the interrupt vector;

6.
The CPU PUSHes the PC and the flags (N,Z) onto the stack;

7.
The CPU constructs the address of the interrupt service routine from the interrupt vector;

8.
The interrupt service routine i CALLed (but implicitly).

Software Actions:

[We are now in the interrupt service routine]

1.
Save all registers on the stack;

2.
Read the input port - in the normal way; if things are designed properly the device will always be ready after it has generated an interrupt;

3.
Put the data in a buffer, and maintain the buffer pointer; all this means is `put the input character somewhere where the program can get at it';

4.
The interrupt service routine may need to tell the controller that you have completed servicing the interrupt;

5.
Restore (POP) all saved registers;

6.
Execute a return-from-interrupt (IRET on an 80x86); IRET restores flags and POPs return address.

A key factor is the transparency of the interrupt. The interrupt causes the service routine to run, but when that routine is finished, and IRET executed, the executing program should be none the wiser - except, maybe, it notices that an instruction took 20 or 30 microsecs to run, instead of just 1 microsec; and, of course, there will be another character in the input buffer.

Exercise

In a certain computer system the time taken for the processor to recognise and acknowledge an interrupt is 4 microsecs; it takes 10 microsecs to save the PC and flags register, ditto 10 microsecs. to restore them. If the execution time for the interrupt handler instructions for the peripheral device is 70 microsecs,

(a) what is the total time for each interrupt?

(b) estimate the highest interrupt frequency that may occur?

Assume that there are no other generators of interrupts.

Direct Memory Access (DMA)

Mass storage devices like disk or tape require very rapid data transfer of data from the device to memory and vice-versa. A typical hard disk transfers data at about 1 byte per 2 microsecs (1993); neither polled nor interrupt-initiated input-output could cope with this.

In addition, if you look at the system diagram in Figure 7.13, you will see that both the disk (for example) and memory are connected to the system bus. Hence, there may be little requirement for the CPU to get involved in a data transfer, except to initiate it. Typically, a third device will be connected to the bus - a DMA controller - which mediates between the two data transfer devices and ensures an orderly use of the bus.


  
Figure 7.13: System with DMA
\begin{figure}\begin{tex2html_preform}\begin{verbatim}C P U Memory Disk DMA Cont...
...--------------------------------\end{verbatim}\end{tex2html_preform}\end{figure}

In DMA, the only effect on the CPU and executing program is slight slowing down, because the DMA must steal bus cycles -- so called cycle-stealing -- for the data transfer. For example, recalling Mac-1a, and memory read/write: during a LODD, for example, Mac-1a would place the address to be read in MAR, and issue an RD; if a DMA byte transfer was happening, the DMA would have to be able to tell the CPU to wait for a cycle or two.

Exercises

1.
Comparison of the Mac-1 instruction set with those of other machines, e.g. Intel 80X86 (Pentium II is equivalent to 80686).

(a)
Mac-1 has instructions with, at most, one operand, on a computer like the 80X86, instructions often have two operands. Think about operations that could use 3 or more operands - write them down, with justification. Are such instructions a good idea - or bad; give a two or three point discussion.

(b)
Research and design an instruction set that has no operands at all, except for stack operations PUSH and POP. Write up your findings (one A4 page or less).

2.
Figure 7.1 gives a list of Mac-1 instructions. Do some research on the 8086/8088/80X86 or the 68000 series and choose SIX more of your favourite instructions that you want included in the Figure. List them with description and justification - what would you use them for and how they would help

3.
Describe using pictorial illustrations, and examples, the following addressing modes: (a) immediate; (b) direct, (c) indirect.

4.
Given the memory values below and a one-address machine with an accumulator, what values do the following instructions load into the accumulator? Illustrate your answer with picture(s).

addr. 20 contains 40
addr. 30 contains 50
addr. 40 contains 60
addr. 50 contains 70

(1) load immediate 20
(2) load direct 20
(3) load indirect 20
(4) load immediate 30
(5) load direct 30
(6) load indirect 30

5.
Calling subprograms.

(a)
Sketch the start of a subroutine that has 4 (16-bit integer) arguments passed to it. Also, show that CALL in the calling program and the few instructions preceding the CALL.

(b)
Draw a picture of the stack just before the subroutine gets down to its work.

6.
In a certain computer system the time taken for the processor to recognise and acknowledge an interrupt is 4 microsecs; it takes 10 microsecs to save OR restore the PC and flags register. If the execution time for the interrupt handler instructions for peripheral (device) X is 70 microsecs,

(a)
What is the total time for each interrupt?

(b)
Estimate the highest interrupt frequency that may occur?

Assume that there are other generators of interrupts.

6. Explain how an INDEX register may assist in the handling of arrays; use as an example the case where you wish to add 3 to the array of 10 numbers starting at address 500.

7. Multiplication etc.

(a)
Write a procedure 'mpyadd' that takes three integer arguments (x,y,z) and computes (x + y * z) and returns the result in the AC. Assume you can Call `mult1', as described in the notes text, to do the multiplication.

(b)
Draw a picture of the stack at various stages:

i.
just before the call: w = mpyadd(x, y, z);

ii.
just as `mpyadd' is entered;

iii.
just before `mpyadd' gets down to its work.

7.
Write a procedure `outch(c,maxtries)' that will write a character (the first argument) to the standard output device; the second argument should be `maxtries' - the number of `not-readies' that will be declared a `time-out' failure; on success, it should return 0 in the AC, on `timeout' -1 in AC; e.g.

c = 45H;
error = outch(c,1000);

8.
Use `outch' in the previous exercise in writing a procedure that has four character parameters and will write these to the standard-output device, i.e. called as: error = write4(a,b,c,d);.

9.
Use `outch' in writing a procedure that will write an array of characters, up to, but not including the NULL character (value 0). Pass the array as a single address (of the first character) and traverse it as in the example given in Figure 7.7 in the notes.

10.
Write a procedure `ch=inch(maxtries)' that reads from the standard-input device. `ch' is returned in AC. It `times-out' after `maxtries' not-readies and returns -2 in AC. If it gets END-OF-INPUT, it returns -1 in AC. END-OF-INPUT is a special character which you check for - assume that you have a variable set to its value, or, if it makes it easier, assume the special value is 0) it returns -1 in AC.


next up previous contents
Next: Operating Systems - Introduction Up: Computer Architecture - CSC Previous: Assembly Language Programming
jc
2000-11-13