Binary Operations

Atrevida Game Programming Tutorial #2
Copyright 1997, Kevin Matz, All Rights Reserved.

"Prerequisites:"

Chapter 1: Introduction to Binary and Hexadecimal

Binary, as related to computer programming

This section will consider some of the terminology associated with binary and hexadecimal numbers in a computing context.

It is customary to refer to a digit in binary (either 1 or 0) as a bit, which supposedly stands for "Binary digIT". A byte is a set of eight bits. So, for example, 10101101 is a byte. A "nibble" (sometimes spelled "nybble", to mimic the intentional misspelling of "byte") is half of a byte, which is to say, it is a binary number with four bits.

A word is a sixteen-bit binary number. It is equivalent to two bytes, or four nibbles. And a double word, or "dword", is a 32-bit binary number, so a double word is equivalent to two words, or four bytes, or eight nibbles. Most commonly, the number of bits (or bytes) in a number type is referred to as that type's width.

Here are some examples of these types:

Bit             1 bin
Nibble          0110 bin
Byte            00100101 bin
Word            1110100001010110 bin
Double Word     10100110000111101110111011010101 bin

Now, a common question is: what is the range of each type? At this point, we will only be considering positive numbers (more often called unsigned numbers). We can assume that the lowest number will be zero, and so we can rephrase the question as, what is the maximum number possible for each type? That is, if all of the bits in one of the above types were ones, what would that number represent in decimal?

If you recall from the last chapter, we do have a method of converting binary numbers to decimal. For example, for the nibble, the maximum number would be 1111 bin, and for the decimal equivalent, we get:

                    3           2           1           0
Max. Value  =  1 * 2   +   1 * 2   +   1 * 2   +   1 * 2   =  15 dec

So for an unsigned nibble, the minimum value is zero, and the maximum value is 15 dec. Alternately, we could say that a nibble can hold sixteen distinct values.

Fine, but using this method to find the range becomes time-consuming for words and double words. Is there a faster way? Let us consider the fact that a bit can have two values or states, 1 and 0. And with a set of two bits, we can have four combinations: 0 and 0, 0 and 1, 1 and 0, or 1 and 1. And a set of three bits can have eight combinations: 000, 001, 010, 011, 100, 101, 110, and 111. And we have seen that a nibble, with four bits, can have sixteen combinations. Notice that this 2, 4, 8, 16 pattern is a geometric sequence. So we can use the "formulae"...

                  n
Combinations  =  2        , where n is the number of bits in the type

                  n 
Max. Value    =  2  -  1  , where n is the number of bits in the type

...to find the number of combinations and the maximum value for an unsigned type. For a byte, there are 2^8 = 256 dec possible combinations. Taking into account the zero, the maximum value for an unsigned byte is 255 dec (use the second formula). Using Pascal syntax, the range is 0..255.

To save you the trouble of calculating the ranges yourself, here is a table listing the number of combinations and the ranges for the unsigned types. Also included are the ranges for the signed types; we'll discuss these shortly. (All values listed here are in decimal.)

Type     (Width)  Combinations    Unsigned Range              Signed Range
----------------------------------------------------------------------------
Nibble       (4)            16             0..15                     -8..7
Byte         (8)           256            0..255                 -128..127
Word        (16)         65536          0..65535             -32768..32767
Double Word (32)    4294967296     0..4294967295   -2147483648..2147483647

Data in memory, and wraparound

As you probably already know, memory is divided into many equal-sized sections, ie. bytes. On the PC, we can access memory on a byte-by-byte basis. This means that we can store byte-sized numbers easily, as well as word- and double-word-sized numbers, but there is no direct way to store nibble- or bit-sized numbers. For this reason, we won't consider nibbles and bits to be considered actual types. (Nibbles were shown in the above table only to demonstrate the ranges. In another chapter, we will learn how to store and retrieve nibble- and bit-sized data within a byte or other type.)

For now, we are only concerned with the C aspect, so I will list the binary types we have discussed with the types present in C (particularly, the Borland and Turbo C and C++ compilers, although most C/C++ compilers on the PC use these type sizes as well; from this point on I'll assume that the information in this table is correct for your compiler):

                          Borland or Turbo
Type                      C/C++ Type                               Bytes
--------------------------------------------------------------------------
Byte (unsigned)           unsigned char                                1
Byte (signed)             char                                         1
Word (unsigned)           unsigned int                                 2
Word (signed)             int                                          2
Double Word (unsigned)    unsigned long int (or "unsigned long")       4
Double Word (signed)      long int          (or "long")                4

I bring up C at this point because there is a common occurrence in C that is directly related to the next topic of discussion, wraparound.

Wraparound reminds me of the odometer in an automobile; think of what happens as the number that the odometer digits represent reaches 1000000 miles or kilometres, or whatever the limit is. The odometer reads 999997, 999998, 999999, and then all of the digits "wrap around" to zeroes, and the odometer starts counting from 000000 again.

Imagine that we have a byte, and we are continually incrementing it (perhaps we are using "x++" in C). After enough increments, we'll count to 11111100 bin, 11111101 bin, 11111110 bin, 11111111 bin, and now you probably see where I am headed. As you might imagine, given the previous scenario, the digits in the byte wrap around to zeroes. So why does this happen? 11111111 bin + 1 bin = 100000000 bin. The byte only holds the eight least-significant bits (which are all zeroes), and so the most-significant bit, the one, is essentially discarded.

This also happens when we are repeatedly decrementing a byte or other type. We might have a byte that counts down until it reaches 00000011 bin, 00000010 bin, 00000001 bin, 00000000 bin, and then then byte wraps around to 11111111 bin. For that matter, wraparound happens when you do any addition, subtraction, or other operation that results in a value that falls outside the range of the given type. I'll give two examples; I'm sure many programmers have run into these situations while learning C or C++:

/* Example 1: */
unsigned int x;
for (x = 0; x <= 100000; x++) {
    /* do something */
}

In this example, the for loop becomes an infinite loop, because x is an unsigned int, with a range of 0..65535. When x is 65535, and the x++ instruction is executed, x wraps around to 0, and so the x <= 1000000 condition is always true.

/* Example 2: */
unsigned int y;
y = 30000 + 50000;
printf ("%d\n", y);

Instead of the expected 80000, you will probably get 14464. Again, this is due to wraparound. You can think of this as 30000, incremented 50000 times. You can correct these situations here by using a larger data type -- a long int or an unsigned long int (equivalent to double words) would both work here.

Bitwise logical operations

Very often we will need to manipulate binary numbers at the bit level. For some reason or another, we may need to turn on, turn off, or toggle (switch a 0 to 1 or a 1 to a 0) certain bits in a byte or word. For these tasks, we can use the bitwise logical operators AND, OR, XOR (exclusive-OR), and NOT. AND, OR, and XOR are called binary operators, not because they operate on binary-coded operands, but because they take two operands. The + operator, for addition, is a binary operator for this reason: in the expression "9 + 4", the two operands are 9 and 4. In the same way, AND, OR, and XOR, use two operands; for example, we could have an expression "9 XOR 4" (although this could be expressed better using the binary-coded equivalents of 9 and 4). NOT is a unary operator: it uses only one operand. The - operator, for negation (not for subtraction), is considered a unary operator: if you have a variable x, "-x" negates the variable x. So for NOT, "NOT x" is a valid expression. Let's now explore these operators and their uses, as well as how to use them in C or C++.

AND

AND is the operator that is used to turn off (set to zero) bits in a binary number. This is often called "masking off" bits. To see how to use AND, we need to consider AND's truth table, shown below:

0 AND 0 = 0
0 AND 1 = 0
1 AND 0 = 0
1 AND 1 = 1

You can think of 0 as FALSE and 1 as TRUE. You can memorize the table if you wish, or you can "think logically" when considering AND: imagine that someone has given you two statements in one sentence, and you need to decide whether the sentence is true overall. The sentence is true if both of the statements in it are true. If the sentence given to you was, "Fifty is greater than twenty, and Rome is a city in Italy", then you would say that yes, both statements are true, so the sentence is true. But if the first statement is false, or if the second statement is false, or both statements are false, then the sentence is false. (Part of the sentence may be true, but everything in the sentence is not true.) This is essentially how AND works.

So if we have the expression "1 AND 1", both operands are TRUE, and so the expression evaluates to 1, or TRUE. But if we have "1 AND 0", both operands are not TRUE, so the expression evaluates to 0, or FALSE.

Often, however, you will see operands where there are more than one binary digit, such as "01101101 AND 10101010" (for this chapter, assume that any operands for these operators are in binary, unless stated otherwise). Here, you pair up the operands and use the AND operator on corresponding digits in the two numbers. Here's how we would evaluate the above example:

        0 1 1 0 1 1 0 1
   AND  1 0 1 0 1 0 1 0
  -----------------------
        0 0 1 0 1 0 0 0

So, in the left-most column, we see that 0 AND 1 is 0, so the result is 0; in the next column, 1 AND 0 is also 0; in the next column, 1 AND 1 gives a result of 1, and so on.

Earlier it was mentioned that AND is often used to turn off bits in a number. To do this, we AND the number we wish to modify with a specially constructed number called a mask. The mask should contain 0 bits where the corresponding bits in the original number are to be turned off. The mask should contain 1 bits everywhere else. When the number and the mask are ANDed together, you will notice that the bits in the result that correspond with 1 bits in the mask are the same as the corresponding bits in the original number. In effect, they are "left alone". But the bits in the result that correspond with 0 bits in the mask will all be set to 0, or "masked off". This is because anything ANDed with 0 is 0 (see the truth table above).

Say that we have the mask "11110001 bin". If we AND any byte with this mask byte, the left-most four bits and the right-most bit will all remain intact in the result, and the remaining three bits will be zeroes. So if we AND the number "10110110 bin" with our above mask byte, we will get the result "10110000 bin". Can you confirm that this is correct?

OR

OR is basically the opposite of AND: OR is used to turn on bits (set them to 1). OR works the same way that AND does, except that it uses the following truth table:

0 OR 0 = 0
0 OR 1 = 1
1 OR 0 = 1
1 OR 1 = 1

Again, you can memorize the table, or you can think of it this way: given two statements in a sentence again, you must decide if the statement is at least partially true. So if both statements in the sentence are true, then the entire statement is true, and that meets our conditions. If only one statement is true (such as in the sentence, "A week contains seven days, or, all politicians are honest"), you can ask, is either the first or second statement true, and declare that, yes, at least one statement in the sentence is true. But if both statements are false, then the entire sentence is false.

Here's an example of ORing two bytes together:

        0 0 1 0 1 0 1 1
    OR  1 0 1 1 0 0 1 0
   ----------------------
        1 0 1 1 1 0 1 1

Notice that there are 1 bits everywhere in the result except where the corresponding bits in the operands are both zeroes.

To turn on certain bits in a number, we can OR the number with an "OR mask", which has 1 bits in the places in which the corresponding bits in the result are to be turned on. 0 bits should be left everywhere else, and those bits will be "left alone" (they will simply be copied from the original number to the result).

So, if we OR any binary number with the OR mask "00000011 bin", the result will have the least-significant two bits turned on. Try ORing the mask with "11010000 bin". Did you get "11010011 bin"? Also remember that the bits will remain turned on if a 1 is already in that position in the original number. So if we were to OR our above mask with "00101011 bin", the least-significant two bits will remain as ones.)

XOR

XOR, the exclusive-OR operator, can be used to toggle (turn a 0 into a 1 and turn a 1 into a 0) specified bits. (XOR has some fun properties, described later.) Here is the XOR truth table:

0 XOR 0 = 0
0 XOR 1 = 1
1 XOR 0 = 1
1 XOR 1 = 0

Given our example of evaluating the truthfulness of two statements in a sentence, think of XOR as "is one or the other statements true, but not both?"

To toggle a bit in a number, put a 1 in the corresponding place in a XOR mask. Leave zeroes everywhere else; these positions will be left alone. So, given the XOR mask "11110000 bin", and the number "10100110 bin", we get "01010110 bin"; notice how the first four digits have been toggled, and the remaining digits are the same as in the original number:

        1 0 1 0 0 1 1 0
   XOR  1 1 1 1 0 0 0 0
  -----------------------
        0 1 0 1 0 1 1 0

Now for XOR's interesting property: if you XOR a number, say the variable x, with a XOR mask, and then you XOR the result with the XOR mask again, you get x back as a result! Try it with our above example. If we XOR the result, "01010110 bin", with the XOR mask, "10100110 bin", we get "10100110 bin". Here are two applications of this property (I'll try to keep this brief):

We can use this property later when doing graphics: we can essentially XOR an image to the screen, and later XOR the image to the screen again, and the background will be restored. Unfortunately, this method cannot always be used, as we'll see when we get to the graphics chapters. (The colors in the image get distorted, as they become dependent on the colors that are on the background.)

Another interesting application is very simple cryptography. If we have a byte-sized "key" (like a password), we can XOR all of the bytes in a message with the key, and we get a somewhat garbled message ("somewhat garbled", because, for example, a key of "00000000 bin" will not change any of the bytes -- try it!). Then to decrypt our encrypted message, we can XOR all of the bytes with the secret key, and we get our original message back. Never use this by itself for any real cryptography, of course; it's just too easy to break (just try all of the keys from 0..255).

NOT

NOT is used to flip (toggle) all of the bits in a number. (You can also do this by XORing the number with enough ones.) NOT's truth table is:

NOT 0 = 1
NOT 1 = 0

Remember that NOT is an unary operator, so it only takes one operand. All of the ones in a binary number become zeroes, and all of the zeroes become ones. Here's an example:

   NOT  1 0 0 1 0 0 1 1
  ----------------------
        0 1 1 0 1 1 0 0

Just like with XOR, using NOT on a number and then using NOT on the result gives back the original number.

Using bitwise logical operators in C and C++

C and C++ unfortunately do not allow you to enter binary values directly into your program listings. You must first convert binary numbers into decimal or hexadecimal. Hexadecimal requires a different syntax than we are accustomed to: you must prefix hexadecimal numbers with "0x", like this: "0x3D7B" for 3D7B hex.

Listed in the table below are our operators and their C equivalents.

Bitwise
Logical Operator  C/C++ Op.   Sample
----------------------------------------------------------------------------
AND               &           x = 0xF2 & 0xEC;
OR                |           printf ("Result is %X hex\n", 0x30 | 19);
XOR               ^           y = my_char ^ my_key;

Notice in the OR code example, you can use "%X" as a "conversion command" with the printf() function. If you use "%x", the digits A through F are displayed in lower case. (I haven't found any information regarding the output of hex numbers using C++'s cout.) Notice that in the XOR example, both variables my_char and my_key should be the same type (chars or ints, for example), although if one variable is of a smaller type, all of the bits to the left of the smaller number are considered to be zeroes. And notice that C and C++ do not provide a bitwise NOT operator; to perform a NOT, recall that you can XOR a number with a series of 1's (eg. FF hex for a byte) to get the same effect.

A common error is to replace the & and | operators with && and ||. These operators have separate functions!

C and C++ provide the operators &=, |=, and ^=, which are used in the same way that other related operators such as += and /= do.

If you're somewhat uncomfortable with this topic, it might be helpful to take a few minutes to experiment with these operators in C or C++.

Negative numbers in binary, and two's complement negation

Now that we know all about how unsigned binary numbers are stored and handled, we can turn our attention to signed binary numbers. How does the computer represent negative binary numbers? To answer this question, let's first consider the fact that the number of possible digit combinations in a given number type remains the same, regardless of whether the type is signed or unsigned. For bytes, there are always 256 possible distinct values. Could we use one of the bits in the type to indicate whether the value is negative or positive? Yes, and in fact, this is the way in which signed numbers are stored. The most-significant (left-most) bit is used to keep track of the sign of the number. This bit is sometimes called the sign bit. Notice, however, that that one bit takes away from the number of bits in that type that can store an actual number. For example, in an unsigned byte, there are eight bits available to store a number; in a signed byte, there is one sign bit and only seven bits available to store the actual number, but of course, you can now use both positive and negative numbers.

The sign bit is zero if the number stored is to be considered positive; the sign bit is one if the number is to be considered negative. (One way of thinking of this is: ask the question, "is the number negative?" If the answer is TRUE, then the sign bit is 1 (TRUE), otherwise, the sign bit is 0 (FALSE).)

But the story is not over yet... So far, I have probably given you the impression that the number part of a signed type (that is, all of the bits except the sign bit) would be equal for both positive and negative numbers that had the same absolute value. Put another way, say that we had a number x, which was of a signed type, and we wanted to change the sign of the number that x represented. We would think that changing the sign would be as easy as changing the sign bit from 0 to 1 or 1 to 0. Unfortunately, it's not that easy: we have to use what is called two's complement negation. Fortunately, two's complements aren't too difficult to perform; in fact, there are only two steps.

First, we apply the bitwise NOT operator to our number. Then we add one to the result to get the original number's two's complement. The two's complement will have a 1 in the sign bit if the original number was positive (if it had a 0 in the sign bit), and vice versa, so the two's complement is the negative equivalent of the original number. This process is reversible, so we can use the same steps to switch a number back and forth between positive and negative states.

Perhaps this is easier to understand given a few examples. All of the binary numbers in the following chart are bytes:

Binary Number   Decimal Value        Two's Complement   Decimal Value
--------------------------------------------------------------------------
     00000000               0                00000000               0
     00000001               1                11111111              -1
     00000010               2                11111110              -2
     00000011               3                11111101              -3
     00000100               4                11111100              -4
     00000101               5                11111011              -5

     00100000              32                11100000             -32
     01111111             127                10000001            -127

     11111001              -5                00000111               5
     10000000            -128                10000000            -128 (!)

You may notice the pattern developing in the two's complement column in the first section of the table. We've seen the 1, 10, 11, 100, 101, etc. pattern when counting; here there are zeroes following that same pattern, surrounded by ones! But you'll notice a one-to-one connection with the original binary numbers does not exist; the two's complement pattern is actually "shifted down" on the chart by one row. (This is due to the fact that we are adding one to the result of the NOT operation.)

Also shown are some other positive and negative values. The final entry in the table seems surprising, until we recall the range of a signed byte, which is -128..127. There is no way to express the positive form of -128 (in a byte), so this minor inconsistancy results. (With the two's complement scheme, it looks like there are more negative numbers possible than positive numbers. This is true, but if we bend mathematics a little and consider zero to be a positive number, then the numbers of positive and negative numbers available are equal.)

Try converting some bytes to their two's complements. Then try converting some word-sized binary numbers. The steps are the same: a NOT, then an increment.

Briefly, why do we need to represent negative numbers using the two's complement? It allows us to have a continuous range of numbers. We can decrement or subtract past the zero on the number line, and the results (as long as they are within range) will be correct. Using a scheme where only the sign bit changes, decrementing a 0 byte would give -127, and for that matter, there would be two ways to represent zero: a "positive zero" and a "negative zero".

Subtraction by using addition

We can do subtraction easily with binary if we have two signed binary numbers of the same type (byte, word, etc.). With normal decimal numbers, we can rewrite "5 - 3" as "5 + (-3)", so that the expression uses addition instead of subtraction. We can do the same thing with binary: if we want to perform the subtraction "a - b", then we take the two's complement of b and add it to a (don't take the two's complement of a; leave it as it is). Here is an example, where where we will subtract "00101110 bin" from "11001010 bin":

11001010 bin - 00101110 bin  =  11001010 bin + 11010010 bin

     11001010 bin
   + 11010010 bin
  -----------------
   1 10011100 bin     --->  10011100 bin

We can disregard the carried 1. Let's check this subtraction by converting it into decimal:

  11001010 bin  =  -(00110110 bin)  =  -(32 + 16 + 4 + 2)  =  -52 dec

  00101110 bin  =  32 + 8 + 4 + 2  =  48 dec

  -52 dec - 48 dec  =  -100 dec  =  -(100 dec)  =  -01100100 bin
                                                =   10011100 bin  (yes!)

Summary

In this chapter, we've become acquainted with some of the terminology associated with binary and hexadecimal numbers, including the different number types (such as bytes and words). We've learned the ranges of the number types we will be using, and we're now aware of issues such as wraparound. AND, OR, XOR, and NOT were introduced, and their associated operators in C and C++ were given. Finally, we learned about storing and using negative binary numbers, by using the two's complement, and we learned a tricky way of performing subtraction.

We will be using essentially all of these topics in later chapters. For instance, in the next chapter, we will use the bitwise logical operators in conjunction with some other operators in order to efficiently manipulate bits in binary numbers.

A project