80x86 Assembler, Part 3

Atrevida Game Programming Tutorial #14
Copyright 1997, Kevin Matz, All Rights Reserved.

Prerequisites:

Chapter 13: 80x86 Assembler, Part 2

In this chapter, we'll learn how to control the flow of our assembler programs. We'll learn how to jump to labels (like "goto" statements), how to make decisions (like "if...then" statements), and how to implement loops (like "while" or "do...while"/"repeat...until" loops).

Declaring labels in the code segment

You can declare labels in the code segment. Like variable names in the data segment, labels serve as markers for particular addresses, although labels alone do not reserve any space. We've used one label a few times already: "Start:".

To declare a label, you simply arrive at a reasonably descriptive name (naming conventions are the same for labels as they are for variables), and add a colon at the end. Labels normally begin in the very first column. Here's an example:

    ; Here's some (nonsense) code:
    INC AX
    DEC AX
    INC BX
    DEC BX

SetDate:
    ; Set the date to Apr. 26, 1986 using INT 21h, Service 2Bh:
    MOV AH, 02Bh
    MOV CX, 1986                       ; Year = 1986 (the full year)
    MOV DH, 4                          ; Month = 04
    MOV DL, 26                         ; Day of the month = 26
    INT 21h                            ; Set the date

And perhaps at the bottom of your program, you might have:

EndProgram:
    MOV AX, 04C00h
    INT 21h

END Start

Using the JMP instruction

Once you have declared one or more labels, you can use the JMP instruction. JMP will make the program execution "jump" or "branch" to the specified label, just like a goto statement in a high-level language:

    JMP SetDate                        ; Go to the SetDate label and
                                       ;  continue executing from there

You could use the following example to restart your program from the beginning:

    JMP Start                          ; Go to the Start label and
                                       ;  continue executing from there

JMP changes the instruction pointer, CS:IP, so that it points to the address of the specified label. If the label exists within the same code segment, only IP is modified, and CS remains the same. If the label exists outside of the current code segment (this is possible), then both CS and IP are updated. The former case (only IP is modified) is called an intrasegment or near jump; the latter case (CS and IP are modified) is called an intersegment or far jump.

A JMP jump is sometimes called an unconditional jump, because it always transfers program execution to the specified label. A conditional jump, on the other hand, would only transfer program execution to a label if a certain condition was true (or false) -- this is like an if statement in a high-level language.

An example program that uses the JMP instruction

Here's a short example program that uses the JMP instruction. Note that no one would ever jump around in this silly fashion in a real program. (One would also hope that better label names would be used.)

------- TEST4.ASM begins -------

%TITLE "Assembler Test Program #4 -- using JMP"

    IDEAL

    MODEL small
    STACK 256

    DATASEG

; No variables

    CODESEG

Start:
    ; Display the letter "C":
    MOV AH, 2
    MOV DL, 'C'
    INT 21h
    JMP MyLabel_2

MyLabel_1:
    ; Display the letter "T":
    MOV AH, 2
    MOV DL, 'T'
    INT 21h
    JMP EndProgram

MyLabel_2:
    ; Display the letter "A":
    MOV AH, 2
    MOV DL, 'A'
    INT 21h
    JMP MyLabel_1

EndProgram:
    ; Terminate program:
    MOV AX, 04C00h
    INT 21h
END Start

------- TEST4.ASM ends -------

Comparisons and conditional jumps

We'd like to use if...then-like constructs in our programs, so that different courses of action can be taken depending on whether some expression is true. With 80x86 assembler, there are generally two steps: first, you make a comparison using the CMP instruction. The CMP instruction will set or clear flags in the processor's FLAGS register. Then, you use a conditional jump instruction. The conditional jump instruction will examine the flags, and if the flags are in the correct state (the condition is true), execution will be transferred to some label that you specify. If the flags are not in the correct state (the condition is false), then no action is taken, and execution continues with the next instruction in sequence.

What does the compare instruction, CMP, do? It is very similar to the subtraction instruction, SUB. CMP takes two operands. It subtracts the second operand from the first, but then it throws away the result. The flag settings from the subtraction, particularly OF (overflow flag), SF (sign flag), ZF (zero flag), AF (auxiliary carry flag), PF (parity flag), and CF (carry flag), are kept, however. (Thankfully, for most comparisons, we don't need to know anything about how the flags are set or cleared.)
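
If you'd like to see the "subtract, then throw away the result" idea spelled out, here is a minimal C sketch of what a byte-sized CMP does to four of the flags. The names Flags and cmp8 are made up for this illustration, and AF and PF are left out to keep it short, so treat it as a rough model rather than a full description of the processor:

```c
#include <stdint.h>

/* Hypothetical model of four of the flags that CMP affects. */
typedef struct {
    int zf;  /* zero flag: the result was zero                  */
    int cf;  /* carry flag: an unsigned borrow was needed       */
    int sf;  /* sign flag: a copy of the result's top bit       */
    int of;  /* overflow flag: the signed result did not fit    */
} Flags;

/* Model of "CMP a, b" for byte operands: compute a - b, keep only flags. */
Flags cmp8(uint8_t a, uint8_t b)
{
    uint8_t result = (uint8_t)(a - b);   /* the result CMP throws away */
    Flags f;

    f.zf = (result == 0);
    f.cf = (a < b);
    f.sf = (result & 0x80) != 0;
    /* Signed overflow: the operands had different signs, and the sign
       of the result differs from the sign of the first operand. */
    f.of = (((a ^ b) & (a ^ result)) & 0x80) != 0;
    return f;
}
```

For example, cmp8(5, 5) sets ZF, which is exactly what JE looks for, and cmp8(3, 5) sets CF, which is what JB looks for.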

After a CMP instruction, we can then use a conditional jump instruction. Here's a list of them. (Don't be overwhelmed -- many of them are duplicates for our convenience.)

Instruction   Meaning                                 Flag Conditions
------------------------------------------------------------------------
JA            Jump if above (unsigned)                CF = 0 and ZF = 0
JNA           Jump if not above (unsigned)            CF = 1 or  ZF = 1

JAE           Jump if above or equal (unsigned)       CF = 0
JNAE          Jump if not above or equal (unsigned)   CF = 1

JB            Jump if below (unsigned)                CF = 1
JNB           Jump if not below (unsigned)            CF = 0

JBE           Jump if below or equal (unsigned)       CF = 1 or  ZF = 1
JNBE          Jump if not below or equal (unsigned)   CF = 0 and ZF = 0

JC            Jump if carry flag is set               CF = 1
JNC           Jump if carry flag is not set           CF = 0

JCXZ          Jump if CX equals zero                  (CX = 0.  An
                                                      exception; CX is
                                                      not a flag)

JE            Jump if equal                           ZF = 1
JNE           Jump if not equal                       ZF = 0

JG            Jump if greater (signed)                ZF = 0 and SF = OF
JNG           Jump if not greater (signed)            ZF = 1 or SF <> OF

JGE           Jump if greater or equal (signed)       SF = OF
JNGE          Jump if not greater or equal (signed)   SF <> OF

JL            Jump if less (signed)                   SF <> OF
JNL           Jump if not less (signed)               SF = OF

JLE           Jump if less or equal (signed)          ZF = 1 or SF <> OF
JNLE          Jump if not less or equal (signed)      ZF = 0 and SF = OF

JO            Jump if overflow flag is set            OF = 1
JNO           Jump if overflow flag is not set        OF = 0

JP            Jump if parity bit is set               PF = 1
JNP           Jump if parity bit is not set           PF = 0

JPE           Jump if parity is even                  PF = 1
JPO           Jump if parity is odd                   PF = 0

JS            Jump if sign flag is set                SF = 1
JNS           Jump if sign flag is not set            SF = 0

JZ            Jump if zero (i.e. zero flag is set)    ZF = 1
JNZ           Jump if not zero (i.e. ZF is cleared)   ZF = 0

Upon examination of the flag conditions column of the above table, we see that many of the instructions are duplicates of other instructions. At first this seems wasteful, but the duplicates become quite helpful in making assembler code more readable.

There is an important difference between the above/below and greater/less instructions. The above and below instructions are for use with unsigned numbers. The greater- and less-than instructions are for use with signed numbers. Why are two sets of instructions needed? Well, consider numbers such as 10010101 bin and 00110111 bin. If we were to consider these numbers unsigned (149 and 55 dec), then the first number would be greater than the second number. But if we were to consider them signed (-107 and 55 dec), then the first number would be negative (recall, the first bit is the sign bit), and so the first number would be less than the second number.
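
To check that the flag conditions in the table really give different answers for those two bit patterns, here is a small C sketch. It models a byte-sized CMP and then applies the JA and JG conditions from the table. The names Flags, cmp8, would_ja, and would_jg are invented for this example:

```c
#include <stdint.h>

/* Flags left behind by "CMP a, b" on bytes (hypothetical helper). */
typedef struct { int zf, cf, sf, of; } Flags;

static Flags cmp8(uint8_t a, uint8_t b)
{
    uint8_t r = (uint8_t)(a - b);
    Flags f;
    f.zf = (r == 0);
    f.cf = (a < b);
    f.sf = (r & 0x80) != 0;
    f.of = (((a ^ b) & (a ^ r)) & 0x80) != 0;
    return f;
}

/* JA: unsigned "above" -- CF = 0 and ZF = 0 */
int would_ja(uint8_t a, uint8_t b)
{
    Flags f = cmp8(a, b);
    return f.cf == 0 && f.zf == 0;
}

/* JG: signed "greater" -- ZF = 0 and SF = OF */
int would_jg(uint8_t a, uint8_t b)
{
    Flags f = cmp8(a, b);
    return f.zf == 0 && f.sf == f.of;
}
```

With a = 0x95 (10010101 bin) and b = 0x37 (00110111 bin), would_ja reports that the jump is taken and would_jg reports that it is not, even though the CMP and the resulting flags are identical in both cases.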

Okay, let's try using CMP and the conditional jump instructions. Let's say we want to convert this C code fragment to assembler:

if (x == y)
    z++;
else
    w--;

Let's assume that x, y, z, and w are declared in our data segment and are all word-sized. (Bytes work as well, of course.)

First, we need to use CMP to compare x and y. But CMP is like many instructions in that it cannot compare the contents of two memory addresses (variables) at once, so we'll temporarily use AX to store the contents of one of the variables. (If AX is being used for something important, use another register instead, or PUSH it to save it temporarily, and then POP it back later.)

    MOV AX, [x]                        ; Let AX = [x]
    CMP AX, [y]                        ; Compare AX with [y]

Now AX (which is equal to x) has been compared to y, which means that y has been subtracted from AX, and the flags have been set accordingly. Now we want to use the JE (jump if equal) instruction. If AX and y are equal, we'll jump to a label called "Case_x_and_y_are_equal". If they're not equal, execution will continue with the instruction immediately following the JE instruction, so that is where we can put the "else" case. But after we finish all the tasks required in the else case (in this case, decrementing variable w), we need to jump over the Case_x_and_y_are_equal section, because we don't want to execute that section. I've called the label following the Case_x_and_y_are_equal section "Bypass". This starts to look messy:

    JE Case_x_and_y_are_equal          ; If AX = [y], go to the "equal" case

    ; If execution reaches this point, AX must not have been equal to
    ;  [y], so this is where we can put the "else" case:

    DEC [w]                            ; Decrement [w]

    ; Now we need to bypass the Case_x_and_y_are_equal section:
    JMP Bypass

Case_x_and_y_are_equal:
    ; Perform the tasks required when x and y are equal:
    INC [z]                            ; Increment [z]

Bypass:
    ; The program continues here...

Trace through it to see what happens if x and y are equal, and then what happens if they are not equal.

Let's try another one. Perhaps we want to convert the following C code fragment to assembler:

if (x > 5) {
    x++;
    y -= 100;
}

Let's assume that x is signed. Here's my first attempt:

    CMP [x], 5                         ; Compare [x] and 5
    JG Case_x_is_greater_than_5        ; If [x] > 5, then go to this label

    ; If execution reaches this point, [x] must be equal to or less than
    ;  5.  There is no "else" case, so jump over the
    ;  "Case_x_is_greater_than_5" section:

    JMP Bypass_2

Case_x_is_greater_than_5:
    INC [x]
    SUB [y], 100

Bypass_2:                              ; We've already used the "Bypass"
                                       ;  label!
    ; The program continues here...

This works fine, but look how messy it is! But if we're willing to change the conditional jump instruction to a different one, we can clean up some of the mess. Here's an alternative to the above code fragment, which does the same thing:

    CMP [x], 5                         ; Compare [x] and 5
    JNG Bypass_3                       ; If [x] <= 5 (either JNG or JLE
                                       ;  would work), go to this label

    INC [x]
    SUB [y], 100

Bypass_3:
    ; The program continues here...

Much better! Notice that the condition was reversed: the JG (jump if greater) became a JNG (jump if NOT greater). A JLE instruction could also be used -- it's just a duplicate of JNG.
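
If you think in C, the reversed version amounts to testing x <= 5 and using a goto to skip the body. Here's a rough sketch of that idea, laid out the same way as the assembler fragment. The function name bump_if_greater_than_5 is made up for this illustration:

```c
#include <stdint.h>

/* Hypothetical helper: "if (x > 5) { x++; y -= 100; }" written the way
   the assembler version is laid out, with a reversed test and a
   forward jump over the body. */
void bump_if_greater_than_5(int16_t *x, int16_t *y)
{
    if (*x <= 5)          /* reversed condition: x is NOT greater than 5 */
        goto bypass;      /* skip the body                               */

    *x += 1;              /* INC [x]      */
    *y -= 100;            /* SUB [y], 100 */

bypass:
    return;               /* the program continues here */
}
```

Note that x equal to 5 must skip the body, which is why the reversed test is "less than or equal" and not just "less than".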

So, if you are writing a compare-jump sequence and it starts to look messy, check to see if it can be simplified by reversing the jump condition. You might want to add a comment somewhere to point out what you're really trying to test, because in some situations, the reversed case might not make sense to someone reading the code.

What can we do about all of these silly "Case" and "Bypass" labels? If we accidentally duplicate an existing label, we'll get an assembler error message. If we accidentally jump to an old label, then we'll end up with some very hard-to-find bugs!

Turbo Assembler gives us a partial solution to this problem. Labels that start with "@@" are treated as "local labels". Normal labels have global scope -- you can jump to them from anywhere in your program. But local labels have scope only between two global-scope labels. This lets us re-use label names. (A big program might have hundreds of compare-jump sequences. We don't want to have to make up new label names for every single sequence!)

GlobalLabel_1:                         ; "Boundary" of @@LocalLabel's scope
    .
    .
    JMP @@LocalLabel                   ; This is okay
    .
    .
@@LocalLabel:
    .
    .
    JMP @@LocalLabel                   ; This is okay
    .
    .
GlobalLabel_2:                         ; "Boundary" of @@LocalLabel's scope
    .
    .
    JMP @@LocalLabel                   ; I can't see @@LocalLabel: from
                                       ;  here -- there's a global label
                                       ;  in the way!

@@LocalLabel could then be used elsewhere in the program, as long as global labels exist between the different instances.

It's hard to come up with new label names for all of the "bypass" and "case"-type labels, so this reusability of names comes in handy. And since these particular labels really just get in the way of the "flow" of the program (you don't see "bypass" and "case" labels in a high level language listing), it's common just to give them short, non-distracting names like "@@10" or "@@20".

Conditional jumps without preceding CMP's

You can sometimes get away with leaving out the CMP instruction before doing a conditional jump. Many instructions like ADD, SUB, INC, DEC, AND, OR, etc. modify the flags when they operate. You can make use of these flag settings at any time, using a conditional jump that takes into account the flags you want. Check the table above.

An instruction set list should list which flags are modified for each instruction. However, I have yet to come across an instruction set list that actually explains in what way each instruction modifies the flags. But if you do know how an instruction modifies the flag settings, and if you can use that to your advantage, then you can certainly use conditional jumps without the preceding CMP.
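
One common case is counting down with DEC and then using JNZ directly, since DEC sets ZF when the result reaches zero. Here is a small C sketch of that pattern. The names dec16 and sum_down_from are invented for this example, and the sketch assumes the register is not zero when the loop is entered:

```c
#include <stdint.h>

/* Model of "DEC reg" on a 16-bit register: returns the new ZF.
   (DEC also updates SF, OF, AF and PF, but it leaves CF alone.) */
int dec16(uint16_t *reg)
{
    *reg = (uint16_t)(*reg - 1);
    return *reg == 0;
}

/* Model of:
       MOV AX, 0
   @@Again:
       ADD AX, BX
       DEC BX
       JNZ @@Again        ; no CMP needed: DEC already set or cleared ZF
   Assumes BX is not zero on entry. */
uint16_t sum_down_from(uint16_t bx)
{
    uint16_t ax = 0;
    int zf;

    do {
        ax = (uint16_t)(ax + bx);   /* ADD AX, BX */
        zf = dec16(&bx);            /* DEC BX     */
    } while (!zf);                  /* JNZ @@Again */

    return ax;
}
```

Calling sum_down_from(4) adds 4 + 3 + 2 + 1 and returns 10, with no compare instruction anywhere in the loop.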

An example program that uses conditional jumps for decision making

Here's a reasonably simple assembler program that uses a conditional jump. It's also the first interactive program, because it accepts keyboard input!

------- TEST5.ASM begins -------

%TITLE "Assembler Test Program #5 -- Decision Making w/ Cond. Jumps"

    IDEAL

    MODEL small
    STACK 256

    DATASEG

PromptMessage                     DB   "Do you want to hear a joke (Y/N)? $"
JokeMessage                       DB   10, 13, "Programmers wanted.  Some "
                                  DB   "assembly required.$"
NoJokeMessage                     DB   10, 13, "Okay.$"

    CODESEG

Start:
    ; Allow access to the data segment:
    MOV AX, @data
    MOV DS, AX

    ; Display the prompt message:
    MOV AH, 9
    MOV DX, OFFSET PromptMessage
    INT 21h

    ; Get a character from the keyboard using INT 21h, Service 8
    ;  (Console Input Without Echo, With ^C Check).  This service
    ;  acts like getch() in C -- it waits for a keypress:
    MOV AH, 8
    INT 21h
    ; The character that was entered is stored in AL.

    ; Determine which key was pressed.  If a capital "Y" was entered,
    ;  then display the joke message.  Any other keypresses are
    ;  regarded as "no" responses (this is for simplicity; you can add
    ;  more "if"-type sequences for better error checking).
    CMP AL, 'Y'
    JE @@DisplayJokeMessage

    ; Otherwise, display the "no joke" message, and then quit:
    MOV AH, 9
    MOV DX, OFFSET NoJokeMessage
    INT 21h
    JMP EndProgram
    
@@DisplayJokeMessage:
    ; Display the joke message:
    MOV AH, 9
    MOV DX, OFFSET JokeMessage
    INT 21h

EndProgram:
    ; Terminate program:
    MOV AX, 04C00h
    INT 21h
END Start

------- TEST5.ASM ends -------

Admittedly, the program's "interface" is clumsy (it only accepts the capital "Y" character as an affirmative response), and the joke is stupid.

One side note: if you used TASM and TLINK to assemble the program, and then you used Turbo Debugger to examine the program in action, you might have come across extra instructions that were inserted by the assembler. In particular, you probably saw "NOP" instructions. NOP means "no operation" -- this instruction really does nothing but waste a tiny amount of time. It also takes up one byte of space in your executable file.

TASM is called a one-pass assembler because it reads through your source code only once. But to do that, it has to make predictions for some types of jump instructions, and to make a long story short, it occasionally has to reserve space for longer jump instructions, which it does by adding NOP instructions. (Here's the secret for avoiding those extra NOP's: use "JMP SHORT" instead of "JMP" if you are certain that the label is no more than 127 bytes ahead or 128 bytes back from the instruction after the jump.)

Here's one use for NOP's: in the debugger, if you don't want to execute certain instructions, you can temporarily replace them with NOP's. You could also use NOP's to reserve space in the code segment (who knows why; perhaps you want to do something clever). And if you're writing mutating viruses, you can apparently insert extra NOP's during the virus' replication phase -- this can fool some types of virus scanners that simply search for certain byte patterns.

Conditional jump limitations

Conditional jumps (all of the J instructions except JMP) have a serious limitation: they can only jump forward 127 bytes, or backward 128 bytes (starting with the byte immediately following the conditional jump instruction). Many instructions assemble to three or four bytes in length (or more, or less, of course), so this doesn't allow much of a range.

If you try to conditional-jump to a label that is more than 127 bytes ahead or 128 bytes back (the distance from the instruction after the conditional-jump instruction to the label is called the displacement), the assembler will vigorously complain.
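
The range check itself is simple arithmetic: the displacement has to fit in one signed byte. Here's a tiny C sketch of the test. The function name short_jump_reaches and its parameters are made up for this illustration; the offsets are just offsets within the code segment:

```c
/* Hypothetical helper: can a conditional jump reach a label?
   next_ip is the offset of the instruction AFTER the conditional jump,
   target is the offset of the label.  The displacement must fit in a
   signed byte (-128 to +127). */
int short_jump_reaches(long next_ip, long target)
{
    long displacement = target - next_ip;
    return displacement >= -128 && displacement <= 127;
}
```

A conditional jump is two bytes long, so for a JE sitting at offset 0100h, next_ip is 0102h. The farthest labels it can reach are then at 0181h (127 bytes ahead) and 0082h (128 bytes back).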

Here's an example that would generate an error message:

    CMP AX, 1234h
    JE FarAwayLabel
    .
    .
    ; Put plenty of instructions (more than 127 bytes' worth) here...
    .
    .
FarAwayLabel:

Fortunately, the JMP instruction doesn't have a displacement restriction, so there is a way to solve this problem: re-write the conditional jump so that the reverse condition is tested for, and afterwards, use a JMP to the desired label, like this:

    CMP AX, 1234h
    JNE @@Bypass                       ; Reversed condition
    JMP FarAwayLabel
@@Bypass:
    .
    .
    ; Put plenty of instructions (more than 127 bytes' worth) here...
    .
    .
FarAwayLabel:

Very clumsy indeed. Add extra comments to explain if the logic becomes less than clear.

In many cases you can get away with avoiding these "gymnastics" -- in the above TEST5.ASM program, we didn't have any problem. Personally, I don't even think about this problem unless the assembler complains, in which case the code can be modified reasonably quickly.

The 386 and later processors actually permit conditional-jump displacements of 32767 bytes ahead and 32768 bytes back, but in order to use extended instruction sets (286, 386, etc.) you need to inform the assembler using special directives. Let's ignore this for now.

Loops using jumps

You can construct loops by using jumps to transfer execution to previously-occurring labels. Here's an infinite loop:

@@Loop:

    ; Do something here

    JMP @@Loop

It's usually convenient to be able to end a loop, however. You could put a compare-jump construction inside an infinite loop as a way of getting out of the infinite loop, but it would be more sensible just to use the conditional jump like this:

@@Loop:

    ; Do something here

    ; Test for some condition:
    CMP BX, 50
    JNE @@Loop

This loop would continue until BX contained 50 dec.

One very common type of loop involves using a counter, so that some portion of code is repeatedly executed some pre-determined number of times. In C, to display ten asterisks in a row, we could do this:

for (x = 0; x < 10; x++)
    printf("*");

In assembler, we are provided with an instruction called LOOP. In loops that use the LOOP instruction, register CX is used as the counter. Before entering the loop, you must load some value, such as 10, into CX. The LOOP instruction goes at the bottom of the loop, and it takes as an operand a label, just like a conditional jump instruction. When the LOOP instruction is executed, it first decrements CX. Then, if CX is zero, the loop ends; if CX is non-zero, execution jumps to the specified label.

Here's an assembler equivalent for the above C example:

    MOV CX, 10                         ; Start the LOOP counter at 10
@@MyLoop:
    ; Display an asterisk:
    MOV AH, 2
    MOV DL, '*'
    INT 21h

    LOOP @@MyLoop
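
Because LOOP decrements CX before it tests it, a loop built this way behaves like a C do...while loop, not like a for loop: the body always runs at least once. Here's a minimal C sketch of that behavior. The function name loop_passes is invented for this example:

```c
#include <stdint.h>

/* Model of:   MOV CX, start / @@MyLoop: <body> / LOOP @@MyLoop
   Returns how many times the loop body runs. */
unsigned long loop_passes(uint16_t start)
{
    uint16_t cx = start;
    unsigned long passes = 0;

    do {
        passes++;                    /* the loop body             */
        cx = (uint16_t)(cx - 1);     /* LOOP decrements CX first  */
    } while (cx != 0);               /* ...then jumps if CX <> 0  */

    return passes;
}
```

One consequence worth knowing: if CX happens to be zero when the loop is entered, the first decrement wraps around to 65535, and the body runs 65536 times. If the count might be zero, a JCXZ placed just before the loop can jump over it.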

An easy way to create a hard-to-find bug is to modify CX within the loop. You are certainly permitted to read the value of CX, and you are also allowed to modify it, to extend or shorten or end the loop. But if you're going to modify CX for some other reason, PUSH and POP the CX register like this:

    MOV CX, 10
@@MyLoop:
    PUSH CX
    ; Use CX for evil purposes here
    POP CX

    LOOP @@MyLoop

An example program that uses loops

Here's a quick program that uses loops. And it uses arrays, too: it will add up all of the values in an array. It doesn't display anything, so if you want to see it in action, try using a debugger. In Turbo Debugger, hit Control-F7 to watch a variable:

------- TEST6.ASM begins -------

%TITLE "Assembler Test Program #6 -- Using Loops and Arrays"

    IDEAL

    MODEL small
    STACK 256

    DATASEG

ArrayOfWords                      DW   10, 20, 50, 100, 563
NumberOfWords                     DW   5   ; The array has 5 elements
Sum                               DW   0

    CODESEG

Start:
    ; Make the data segment accessible:
    MOV AX, @data
    MOV DS, AX

    ; Let DS:SI initially point to the first element in the array:
    MOV SI, OFFSET ArrayOfWords

    ; Set up the LOOP counter:
    MOV CX, [NumberOfWords]
@@AdditionLoop:
    ; Get the word currently pointed to by DS:SI:
    MOV AX, [WORD DS:SI]

    ; Add that value to the Sum variable:
    ADD [Sum], AX

    ; Advance to the next word in the array (1 word = 2 bytes):
    ADD SI, 2

    LOOP @@AdditionLoop

    ; At this point, Sum should contain 743 dec (02E7 hex).

EndProgram:
    MOV AX, 04C00h
    INT 21h
END Start

------- TEST6.ASM ends -------

I made sure that the sum of the five values in the array would not exceed the limit for a word, 65535 dec. If you do find yourself in a situation where you need to add lists of bigger numbers (doublewords, perhaps), look up the instruction ADC, meaning "add with carry". For subtracting bigger numbers, a related instruction is SBB, meaning "subtract with borrow".
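
To give a feel for what ADC is for, here is a rough C sketch of adding two 32-bit numbers using only 16-bit pieces: the low words are added first, and the carry out of that addition is fed into the addition of the high words. The names Dword and add32 are made up for this illustration:

```c
#include <stdint.h>

/* A 32-bit value held as two 16-bit words, the way a 16-bit
   program would store it (hypothetical type name). */
typedef struct { uint16_t lo, hi; } Dword;

Dword add32(Dword a, Dword b)
{
    Dword r;
    uint32_t low = (uint32_t)a.lo + b.lo;     /* ADD: the low words      */
    int carry = low > 0xFFFFu;                /* CF after that ADD       */

    r.lo = (uint16_t)low;
    r.hi = (uint16_t)(a.hi + b.hi + carry);   /* ADC: high words plus CF */
    return r;
}
```

Adding 0000FFFFh and 00000001h this way gives a low word of 0000h and a high word of 0001h. A plain ADD on the high words would have lost the carry.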

Declaring variables in the code segment

Now that we know how to use jump instructions, we can declare variables in the code segment. Here's an example:

    CODESEG

Start:
    ; Put instructions here...
    JMP @@Temporary

MyWord                            DW   0ABCDh
MyByteArray                       DB   5, 6, 7, 8, 9, 10

@@Temporary:
    ; Put more instructions here...

You have to jump over the declared data. If you didn't, the processor would continue reading instructions from the code segment, including those declared bytes! The processor would try to execute the instructions meant by those bytes, which might cause some strange behavior!

If you use variables in the code segment, remember to use the segment override "CS:", like this...

    MOV AX, [CS:MyWord]
    MOV BL, [CS:MyByteArray + 3]

...because DS is still the default segment for data.

Of course, variables are not usually declared in the code segment. The data segment is a much better place for them. But if the data segment gets full (there is a 64K limit under the small memory model), then you can use this alternative.

Summary

In this chapter, we've been introduced to labels and unconditional jumps. We've learned how to make decisions by using comparisons and conditional jumps. We have also seen how to implement loops, both by using conditional jumps and by using the LOOP instruction. We have also wasted our time on such junk as NOP, avoiding the use of CMP, and putting variables in the code segment.

If you haven't tried writing your own assembler programs yet, now would be a good time to give it a shot. Review anything you're uncomfortable with, and then try playing around with some instructions and interrupt services. If you feel comfortable with loops, try writing a program to write out lines such as "I will not chew gum in class" fifty or a hundred times. Or fix up the TEST5.ASM sample program so that it accepts "Y" and "N", both upper case and lower case. Or modify TEST6.ASM in some way; perhaps it could find the sum of a list of bytes instead of a list of words.

In the next chapter, we'll learn more instructions, and we'll find out how to use procedures, which are just like functions in C.

A project