Chapter 12
C provides operators for direct manipulation of bits and bytes. These operators, and other facilities such as pointers and casts, enable C to perform low-level operations that are typically the domain of assembly languages. With care, many of these low-level functions can be implemented in a portable fashion. However, for some programs (e.g., embedded systems), it may be necessary to write non-portable specific code to interface with particular hardware devices. In these cases, it is critical to be able to control the individual bits corresponding to the device pins.
Consider a 16-bit (two-byte) integer with the decimal value 22425. This value is stored as the following bit pattern.
0101 0111 1001 1001
Since the number is in base 2, each digit from right-to-left represents a successive power of two (20, 21, . . . , 215). The right-most bit is called the least-significant bit (LSB) and the left-most bit is the most-significant bit (MSB). For signed types, a value is negative if the MSB is 1, and the MSB is termed the sign bit. Most machines store negative values in 2’s-complement form, such that the value −22425 is represented as
1010 1000 0110 0111
The conversion of a negative number to 2’s-complement form is straightforward. For example, the value −22425 may be converted to 2’s-complement binary as follows.
1. Take the positive value (22425) and subtract one (22424).
2. Represent this number in binary (0101 0111 1001 1000 = 0x5798).
3. Perform 1’s (bitwise) complement (1010 1000 0110 0111 = 0xa867 = −22425).
For unsigned types, all bits, including the MSB, contribute to the magnitude of a positive number.
Thus, unsigned variables may store values twice as large as signed variables of equivalent capacity.
In the example above, the value of the bit-pattern for an unsigned type is 43111.
The C language has no mechanism to represent binary numbers directly, but provides ways to represent values in decimal, octal, and hexadecimal. For most purposes, decimal numbers are most convenient, but for bitwise operations, a hexadecimal representation is usually more appropriate. Hexadecimal (base 16) numbers e ectively break binary values into blocks of four, making them easier to manage. With experience, hex tends to be more comprehensible than plain binary.
C provides six bitwise operators: AND &, OR |, exclusive OR (XOR) ^, NOT ~, left-shift <<, and right-shift >>. These operators may be applied only to integer operands, char, short, int, and long, which may be signed or unsigned; they may not be used with floating-point operands. The &, |, ^, <<, and >> operators each have corresponding assignment operators &=, |=, ^=, <<=, and >>=. These behave analogously to the arithmetic assignment operators. For example,
z &= x | y;
is equivalent to
z = z & (x | y);
Important. As mentioned in Section 2.9, the bitwise operators, &, |, and ~, are di erent from the logical operators AND &&, OR ||, and NOT !, and should not be confused with them. These di erences will be made clear in the discussion below.
The &, |, ^, and ~ operators permit Boolean logic operations on individual bits. The ~ operator is a unary operator, which simply converts a 0 to 1 and vice-versa. The other three are binary operators, which compare two bits according to the rules in the following table.
x y x & y x | y x ^ y
0 0 0 0 0
0 1 0 1 1
1 0 0 1 1
1 1 1 1 0
Consider the following 8-bit (1-byte) example. We define three unsigned variables
unsigned char x = 55; /* 55 (dec) = 0x37 (hex) = 0011 0111 (binary). */ unsigned char y = 25; /* 25 (dec) = 0x19 (hex) = 0001 1001 (binary). */ unsigned char z;
and perform a series of operations on them, storing the result in z. The first, z = x & y, makes z equal to 17.
0011 0111
0001 1001 &
0001 0001 /* result is 17 (dec) = 0x11 (hex) */
The bitwise & (AND) operator sets a bit in z to 1 only if both corresponding bits in x and y are one. This operator is typically used to reset bits to zero and to select certain bits for testing.
The second operation, z = x | y, makes z equal to 63.
0011 0111
0001 1001 |
0011 1111 /* result is 63 (dec) = 0x3f (hex) */
The bitwise | (inclusive OR) operator sets a bit to 1 if either or both corresponding bits in x and y are one. It is typically used to set selected bits to one.
The third operation, z = x ^ y, makes z equal to 46.
0011 0111
0001 1001 ^
0010 1110 /* result is 46 (dec) = 0x2e (hex) */
The bitwise ^ (exclusive OR) operator sets a bit to 1 if either, but not both, corresponding bits in x and y are one. It is typically used to toggle selected bits to the alternate state.
The fourth operation, z = ~x, makes z equal to 200.
0011 0111 ~
1100 1000 /* result is 200 (dec) = 0xc8 (hex) */
The bitwise ~ (one’s complement) operator converts a 1 bit to 0 and a 0 bit to 1. It is commonly used to negate a group of bits, and is often used in conjunction with the shift operators.
12.2.2 Right Shift and Left Shift
The shift operators, << and >>, shift the bits of an integer variable to the left or right, respectively. For a left-shift operation x << n, the bits in x are shifted n bit positions to the left, with the shifted-out bits being lost and the vacated bits being filled with zeros.
char x = 0x19; /* Equals 0001 1001 binary (25 decimal). */ char y = x << 2; /* Equals 0110 0100 binary (100 decimal). */
Shifting x by negative n, or by n greater than or equal to the width of x (in bits), results in an undefined value.
The right-shift operator is a little more complicated. If the variable x is unsigned, then x >> n behaves analogously to left-shift; the n right-most bits are shifted-out and the vacated bits are filled with zeros. However, if x is a signed variable, the expression is implementation dependent. Some machines perform “logical shift”, which fills the vacated bits with zeros (as for unsigned right-shift), while others perform “arithmetic shift”, which fills the vacated bits with the value of the sign bit. For example,
signed char x = -75; /* Equals 1011 0101 binary. */
signed char y = x >> 2; /* Equals 0010 1101 if logical shift. */
/* Equals 1110 1101 if arithmetic shift. */
Thus, on a machine that performs logical right-shift, the value of y is 45, while on a machine that performs arithmetic right-shift, the value of y is −19. In general, it is usually preferred to use unsigned types for bitwise operations to avoid non-portable behaviour.
Notice that a left-shift by n is equivalent to multiplication by 2n (e.g., x << 2 is equal to x * 4). Similarly, right-shift by n is equivalent to division by 2n. (More precisely, right-shift by n is equivalent to division by 2n if the operand is unsigned or non-negative, or if the variable is negative and the machine performs arithmetic right-shift.) Bitwise operations are sometimes used to perform arithmetic expressions if the right-hand operand is a power of two. For example, the following expressions are equivalent.
x * 2 is equivalent to x << 1
x / 16 is equivalent to x >> 4
x % 8 is equivalent to x & 7
Bitwise expressions tend to be faster than integer arithmetic, but such optimisations are generally redundant as modern compilers will tend to replace “power of two” arithmetic with bitwise operations automatically.
The precedence of the bitwise operators is lower than the arithmetic operators, with the exception of the unary ~ operator, which has equal precedence to the unary (logical) ! operator. The left and right shift operators have greater precedence than the relational operators < and >, but &, |, and ^ have lower precedence. The precedence of & is greater than ^, which is greater than |. All bitwise operators have greater precedence than the logical operators && and ||.
As with the arithmetic, relational and logical operators, it is unwise to rely on precedence in multi-operation expressions. Such practice tends to produce obscure and error-prone code. It is far better to make ones intent clear with parentheses.
Bitwise operations are commonly used for one of two purposes. The first is to conserve space by packing several variables into a single byte (e.g., binary “flags” that need only represent the values 0 or 1). The second is to interface with hardware, where a group of bits at a certain address correspond to the pins of some device. In both cases we want to be able to manipulate and test individual bits or groups of bits.
The following example presents some common idioms for dealing with bits, which allow us to turn bits on or o and to test their current state. These techniques are collectively known as “masking” as they enable particular bits to be selected according to a specified bit-mask. The first step in creating a mask is to define variables to represent each bit of the integer variable; (we will consider only the lowest 4 bits in this, rather contrived, example).
enum {
FIRST = 0x01, /* 0001 binary */
SECND = 0x02, /* 0010 binary */
THIRD = 0x04, /* 0100 binary */
FORTH = 0x08, /* 1000 binary */
ALL = 0x0f /* 1111 binary */
};
The constants are each powers of two, so that just one bit is set and the rest are zeros. The exception is ALL, which defines a mask to select all four bits. An equivalent way to define the above constants is to use the left-shift operator as follows.
enum {
FIRST = 1 << 0,
SECND = 1 << 1,
THIRD = 1 << 2,
FORTH = 1 << 3,
ALL = ~(~0 << 4)
};
The last of these is a trifle cryptic, but is a common technique for creating a specified number of ones. The key is to realise that ~0 creates a value where all bits are 1’s. For example, (assuming we are dealing with an 8-bit value)
1111 1111 /* ~0 */
1111 0000 /* ~0 << 4 */
0000 1111 /* ~(~0 << 4) */
The set of statements below depict a number of typical masking operations. By combining masks, we can select specific groups of bits to set, reset, or toggle, or to test as a conditional expression.
unsigned flags = 0;
flags |= SECND | THIRD | FORTH; /* Set bits 2,3,4 (1110). */
flags &= ~(FIRST | THIRD); /* Reset bits 1,3 (1010). */
flags ^= THIRD | FORTH; /* Toggle bits 3,4 (0110). */
if ((flags & (FIRST | FORTH)) == 0) /* TRUE if bits 1 and 4 are off. */
flags &= ~ALL; /* Reset all bits (0000). */
There are several points to note from this example. First, notice how the | operator is used to combine masks, and the ~ operator is used to negate this result so that all bits are one except the mask bits. Second, notice the use of the various assignment operators, which apply the masks to the left-hand operand, flags. The |= operator is used to set bits (i.e., turn bits on) and the &= operator is used to reset bits (i.e., turn bits o ). These operations determine the state of the specified bits in flags regardless of their current state. On the other hand, the ^= operator is used to toggle selected bits so that they become the opposite of their current state. Finally, notice in the second-last line how the bitwise operators can be used to create conditional expressions based on the state of particular bits. Individual bits, or groups of bits, may be selected via the & operator.
12.4 Bit-fields
C provides an alternative to bit-masking operations for packing several objects into a single machine word. This facility is called a bit-field, and it uses structure syntax to define several objects within a single variable. Thus, the example from the previous section may have been represented as
struct {
unsigned first : 1;
unsigned secnd : 1;
unsigned third : 1;
unsigned forth : 1;
} flags;
This defines a variable flags that contains four 1-bit fields. The number following the colon rep-resents the field width in bits. The individual fields are accessed in the same manner as structure members, so that individual bits may be manipulated as follows.
flags.secnd = flags.third = flags.forth = 1;
flags.first = flags.third = 0;
flags.third = !flags.third;
flags.forth = !flags.forth;
if (flags.first == 0 && flags.forth == 0)
flags.first = flags.secnd =
flags.third = flags.forth = 0;
Notice that these operations perform the same basic tasks as the bit-masking code in the previous section.
To novice C programmers, bit-fields may seem a more natural representation than the previous bit-masking operations, but this is a superficial advantage and bit-fields should be used with caution.
The problem is that almost everything about bit-fields is implementation dependent and, without due care, they can promote non-portable code. For example, di erent machines apply di erent restrictions on field widths; some machines impose a maximum width of 16-bits while others permit 32-bits. Also, the way the fields are packed within the structure (i.e., their alignment) varies between machines. Byte-order dependencies are another issue.
In general, bitwise operations and masking are preferred over bit-fields. They permit direct control of bit packing and, with a little experience, the masking idioms tend to be more convenient for managing groups of bits. When needing to manipulate individual bits, additional convenience can be attained via a few choice macros, such as the following (obtained from bitops.h in the Snippets source repository, www.snippets.org).
#define BitSet(arg,posn) ((arg) | (1L << (posn)))
#define BitClr(arg,posn) ((arg) & ~(1L << (posn)))
#define BitFlp(arg,posn) ((arg) ^ (1L << (posn)))
#define BitTst(arg,posn) ((arg) & (1L << (posn)))
These macros set, reset, toggle, and test, respectively, the posn bit of variable arg. They might be used as in the examples below.
enum { FIRST, SECND, THIRD, FORTH };
unsigned flags = 0;
flags = BitSet(flags, FIRST); /* Set first bit. */ flags = BitFlp(flags, THIRD); /* Toggle third bit. */ if (BitTst(flags, SECND) == 0) /* Test second bit. */
flags = 0;
Bitwise Operations
C provides operators for direct manipulation of bits and bytes. These operators, and other facilities such as pointers and casts, enable C to perform low-level operations that are typically the domain of assembly languages. With care, many of these low-level functions can be implemented in a portable fashion. However, for some programs (e.g., embedded systems), it may be necessary to write non-portable specific code to interface with particular hardware devices. In these cases, it is critical to be able to control the individual bits corresponding to the device pins.
12.1 Binary Representations
Computers store and manipulate information as bits—ones and zeros. These are grouped in cells or bytes, usually composed of eight bits, and each byte of memory is located by a particular address. In C, the various types each specify a certain number of bytes, and variables of a particular type are allocated the required amount of memory. The numerical value of a variable is stored as a bit-pattern within its allocated memory region.Consider a 16-bit (two-byte) integer with the decimal value 22425. This value is stored as the following bit pattern.
0101 0111 1001 1001
Since the number is in base 2, each digit from right-to-left represents a successive power of two (20, 21, . . . , 215). The right-most bit is called the least-significant bit (LSB) and the left-most bit is the most-significant bit (MSB). For signed types, a value is negative if the MSB is 1, and the MSB is termed the sign bit. Most machines store negative values in 2’s-complement form, such that the value −22425 is represented as
1010 1000 0110 0111
The conversion of a negative number to 2’s-complement form is straightforward. For example, the value −22425 may be converted to 2’s-complement binary as follows.
1. Take the positive value (22425) and subtract one (22424).
2. Represent this number in binary (0101 0111 1001 1000 = 0x5798).
3. Perform 1’s (bitwise) complement (1010 1000 0110 0111 = 0xa867 = −22425).
For unsigned types, all bits, including the MSB, contribute to the magnitude of a positive number.
Thus, unsigned variables may store values twice as large as signed variables of equivalent capacity.
In the example above, the value of the bit-pattern for an unsigned type is 43111.
The C language has no mechanism to represent binary numbers directly, but provides ways to represent values in decimal, octal, and hexadecimal. For most purposes, decimal numbers are most convenient, but for bitwise operations, a hexadecimal representation is usually more appropriate. Hexadecimal (base 16) numbers e ectively break binary values into blocks of four, making them easier to manage. With experience, hex tends to be more comprehensible than plain binary.
12.2 Bitwise Operators
C provides six bitwise operators: AND &, OR |, exclusive OR (XOR) ^, NOT ~, left-shift <<, and right-shift >>. These operators may be applied only to integer operands, char, short, int, and long, which may be signed or unsigned; they may not be used with floating-point operands. The &, |, ^, <<, and >> operators each have corresponding assignment operators &=, |=, ^=, <<=, and >>=. These behave analogously to the arithmetic assignment operators. For example,
z &= x | y;
is equivalent to
z = z & (x | y);
Important. As mentioned in Section 2.9, the bitwise operators, &, |, and ~, are di erent from the logical operators AND &&, OR ||, and NOT !, and should not be confused with them. These di erences will be made clear in the discussion below.
12.2.1 AND, OR, XOR, and NOT
The &, |, ^, and ~ operators permit Boolean logic operations on individual bits. The ~ operator is a unary operator, which simply converts a 0 to 1 and vice-versa. The other three are binary operators, which compare two bits according to the rules in the following table.
x y x & y x | y x ^ y
0 0 0 0 0
0 1 0 1 1
1 0 0 1 1
1 1 1 1 0
Consider the following 8-bit (1-byte) example. We define three unsigned variables
unsigned char x = 55; /* 55 (dec) = 0x37 (hex) = 0011 0111 (binary). */ unsigned char y = 25; /* 25 (dec) = 0x19 (hex) = 0001 1001 (binary). */ unsigned char z;
and perform a series of operations on them, storing the result in z. The first, z = x & y, makes z equal to 17.
0011 0111
0001 1001 &
0001 0001 /* result is 17 (dec) = 0x11 (hex) */
The bitwise & (AND) operator sets a bit in z to 1 only if both corresponding bits in x and y are one. This operator is typically used to reset bits to zero and to select certain bits for testing.
The second operation, z = x | y, makes z equal to 63.
0011 0111
0001 1001 |
0011 1111 /* result is 63 (dec) = 0x3f (hex) */
The bitwise | (inclusive OR) operator sets a bit to 1 if either or both corresponding bits in x and y are one. It is typically used to set selected bits to one.
The third operation, z = x ^ y, makes z equal to 46.
0011 0111
0001 1001 ^
0010 1110 /* result is 46 (dec) = 0x2e (hex) */
The bitwise ^ (exclusive OR) operator sets a bit to 1 if either, but not both, corresponding bits in x and y are one. It is typically used to toggle selected bits to the alternate state.
The fourth operation, z = ~x, makes z equal to 200.
0011 0111 ~
1100 1000 /* result is 200 (dec) = 0xc8 (hex) */
The bitwise ~ (one’s complement) operator converts a 1 bit to 0 and a 0 bit to 1. It is commonly used to negate a group of bits, and is often used in conjunction with the shift operators.
12.2.2 Right Shift and Left Shift
The shift operators, << and >>, shift the bits of an integer variable to the left or right, respectively. For a left-shift operation x << n, the bits in x are shifted n bit positions to the left, with the shifted-out bits being lost and the vacated bits being filled with zeros.
char x = 0x19; /* Equals 0001 1001 binary (25 decimal). */ char y = x << 2; /* Equals 0110 0100 binary (100 decimal). */
Shifting x by negative n, or by n greater than or equal to the width of x (in bits), results in an undefined value.
The right-shift operator is a little more complicated. If the variable x is unsigned, then x >> n behaves analogously to left-shift; the n right-most bits are shifted-out and the vacated bits are filled with zeros. However, if x is a signed variable, the expression is implementation dependent. Some machines perform “logical shift”, which fills the vacated bits with zeros (as for unsigned right-shift), while others perform “arithmetic shift”, which fills the vacated bits with the value of the sign bit. For example,
signed char x = -75; /* Equals 1011 0101 binary. */
signed char y = x >> 2; /* Equals 0010 1101 if logical shift. */
/* Equals 1110 1101 if arithmetic shift. */
Thus, on a machine that performs logical right-shift, the value of y is 45, while on a machine that performs arithmetic right-shift, the value of y is −19. In general, it is usually preferred to use unsigned types for bitwise operations to avoid non-portable behaviour.
Notice that a left-shift by n is equivalent to multiplication by 2n (e.g., x << 2 is equal to x * 4). Similarly, right-shift by n is equivalent to division by 2n. (More precisely, right-shift by n is equivalent to division by 2n if the operand is unsigned or non-negative, or if the variable is negative and the machine performs arithmetic right-shift.) Bitwise operations are sometimes used to perform arithmetic expressions if the right-hand operand is a power of two. For example, the following expressions are equivalent.
x * 2 is equivalent to x << 1
x / 16 is equivalent to x >> 4
x % 8 is equivalent to x & 7
Bitwise expressions tend to be faster than integer arithmetic, but such optimisations are generally redundant as modern compilers will tend to replace “power of two” arithmetic with bitwise operations automatically.
12.2.3 Operator Precedence
The precedence of the bitwise operators is lower than the arithmetic operators, with the exception of the unary ~ operator, which has equal precedence to the unary (logical) ! operator. The left and right shift operators have greater precedence than the relational operators < and >, but &, |, and ^ have lower precedence. The precedence of & is greater than ^, which is greater than |. All bitwise operators have greater precedence than the logical operators && and ||.
As with the arithmetic, relational and logical operators, it is unwise to rely on precedence in multi-operation expressions. Such practice tends to produce obscure and error-prone code. It is far better to make ones intent clear with parentheses.
12.3 Common Bitwise Operations
Bitwise operations are commonly used for one of two purposes. The first is to conserve space by packing several variables into a single byte (e.g., binary “flags” that need only represent the values 0 or 1). The second is to interface with hardware, where a group of bits at a certain address correspond to the pins of some device. In both cases we want to be able to manipulate and test individual bits or groups of bits.
The following example presents some common idioms for dealing with bits, which allow us to turn bits on or o and to test their current state. These techniques are collectively known as “masking” as they enable particular bits to be selected according to a specified bit-mask. The first step in creating a mask is to define variables to represent each bit of the integer variable; (we will consider only the lowest 4 bits in this, rather contrived, example).
enum {
FIRST = 0x01, /* 0001 binary */
SECND = 0x02, /* 0010 binary */
THIRD = 0x04, /* 0100 binary */
FORTH = 0x08, /* 1000 binary */
ALL = 0x0f /* 1111 binary */
};
The constants are each powers of two, so that just one bit is set and the rest are zeros. The exception is ALL, which defines a mask to select all four bits. An equivalent way to define the above constants is to use the left-shift operator as follows.
enum {
FIRST = 1 << 0,
SECND = 1 << 1,
THIRD = 1 << 2,
FORTH = 1 << 3,
ALL = ~(~0 << 4)
};
The last of these is a trifle cryptic, but is a common technique for creating a specified number of ones. The key is to realise that ~0 creates a value where all bits are 1’s. For example, (assuming we are dealing with an 8-bit value)
1111 1111 /* ~0 */
1111 0000 /* ~0 << 4 */
0000 1111 /* ~(~0 << 4) */
The set of statements below depict a number of typical masking operations. By combining masks, we can select specific groups of bits to set, reset, or toggle, or to test as a conditional expression.
unsigned flags = 0;
flags |= SECND | THIRD | FORTH; /* Set bits 2,3,4 (1110). */
flags &= ~(FIRST | THIRD); /* Reset bits 1,3 (1010). */
flags ^= THIRD | FORTH; /* Toggle bits 3,4 (0110). */
if ((flags & (FIRST | FORTH)) == 0) /* TRUE if bits 1 and 4 are off. */
flags &= ~ALL; /* Reset all bits (0000). */
There are several points to note from this example. First, notice how the | operator is used to combine masks, and the ~ operator is used to negate this result so that all bits are one except the mask bits. Second, notice the use of the various assignment operators, which apply the masks to the left-hand operand, flags. The |= operator is used to set bits (i.e., turn bits on) and the &= operator is used to reset bits (i.e., turn bits o ). These operations determine the state of the specified bits in flags regardless of their current state. On the other hand, the ^= operator is used to toggle selected bits so that they become the opposite of their current state. Finally, notice in the second-last line how the bitwise operators can be used to create conditional expressions based on the state of particular bits. Individual bits, or groups of bits, may be selected via the & operator.
12.4 Bit-fields
C provides an alternative to bit-masking operations for packing several objects into a single machine word. This facility is called a bit-field, and it uses structure syntax to define several objects within a single variable. Thus, the example from the previous section may have been represented as
struct {
unsigned first : 1;
unsigned secnd : 1;
unsigned third : 1;
unsigned forth : 1;
} flags;
This defines a variable flags that contains four 1-bit fields. The number following the colon rep-resents the field width in bits. The individual fields are accessed in the same manner as structure members, so that individual bits may be manipulated as follows.
flags.secnd = flags.third = flags.forth = 1;
flags.first = flags.third = 0;
flags.third = !flags.third;
flags.forth = !flags.forth;
if (flags.first == 0 && flags.forth == 0)
flags.first = flags.secnd =
flags.third = flags.forth = 0;
Notice that these operations perform the same basic tasks as the bit-masking code in the previous section.
To novice C programmers, bit-fields may seem a more natural representation than the previous bit-masking operations, but this is a superficial advantage and bit-fields should be used with caution.
The problem is that almost everything about bit-fields is implementation dependent and, without due care, they can promote non-portable code. For example, di erent machines apply di erent restrictions on field widths; some machines impose a maximum width of 16-bits while others permit 32-bits. Also, the way the fields are packed within the structure (i.e., their alignment) varies between machines. Byte-order dependencies are another issue.
In general, bitwise operations and masking are preferred over bit-fields. They permit direct control of bit packing and, with a little experience, the masking idioms tend to be more convenient for managing groups of bits. When needing to manipulate individual bits, additional convenience can be attained via a few choice macros, such as the following (obtained from bitops.h in the Snippets source repository, www.snippets.org).
#define BitSet(arg,posn) ((arg) | (1L << (posn)))
#define BitClr(arg,posn) ((arg) & ~(1L << (posn)))
#define BitFlp(arg,posn) ((arg) ^ (1L << (posn)))
#define BitTst(arg,posn) ((arg) & (1L << (posn)))
These macros set, reset, toggle, and test, respectively, the posn bit of variable arg. They might be used as in the examples below.
enum { FIRST, SECND, THIRD, FORTH };
unsigned flags = 0;
flags = BitSet(flags, FIRST); /* Set first bit. */ flags = BitFlp(flags, THIRD); /* Toggle third bit. */ if (BitTst(flags, SECND) == 0) /* Test second bit. */
flags = 0;
No comments:
Post a Comment