Course 1: Bits, Bytes and Integers
数字的位级表示(bit level representation)
Decimal system十进制;
In 1948, start to use binary system
low votage -> 0, high -> 1
store a digital value is easier than an analog value
Using bits can represent fractional, floating point value
小数点后从左到右权重逐渐增加,比如
0.2 = 0\times 2^{-1} + 0\times 2^{-2} + 1\times 2^{-3} + 1\times 2^{-4} + ...
0.2_{10}=0.[0011]_{...2}
32-bit / 64-bit float point values are annoying -> group collection into a time and use hexadecimal representation: 0 ~ 9 + A ~ F
1010 —— A , 1100 —— C,1111 —— F
C data type
some standards varies from machines.
USE the sizeof(*)
function if your program is sensitive to the bytes (我们的程序全程都是用了sizeof获取运行平台上的字节数定义)
Boolean Algebra(布尔代数)
A&B: "and"
A|B: "or"
^\simA: "not"
A^B: "Xor"/"Exclusive-Or"
Representation
We can use this to represents sets of values implicitly
——subsets of {0,1,2,3,4,5,6,7}, because one byte has 8 bits
0 1 1 0 0 1 1 0 -> {6,5,2,1}
{7,6,5,4,3,2,1,0}
一个字节上的布尔代数和全集{0,1,2,3,4,5,6,7}的子集之间的集合运算之间存在同构:
& - 并
| - 交
~ - 补
^ - 对称差(symmetric difference)
This operations would be applied to file I/O
&& and & / || and I / ~ and !(bang)
&&, ||, !
- view 0 as False
- other values as True
- always return 0(0x00) or 1(0x01)
Test a Null pointer before access: p && *p
(p值为0或者p分配空指针都是返回0)
Shift Operator
Left shift is always the same, while there are 2 right shift operators:
Left shift : x << n
Fill 0s at the right side
Right shift: x >> n
Logical shift : fill with 0s
Arithmetic shift(算数位移) : fill the signal bit (MY compiler does this, but C doesn't define the rule of using which right shift)
x = 01100010
<< 1 : 11000100
Log. >> 1 : 00110001
Arith. >> 1 : 00110001
x = 11100010
<< 2 : 10001000
Log. >> 2 : 00111000
Arith. >> 2 : 11110000
x >> n just like (int) (x / 2^n) (ignore the lowest n terms)
specially, (int) x >> sizeof(int) \equiv 0
原码,反码与补码
- 原码:即通常所说的二进制代码,非unsign型左起第一位为符号位,0表示正数,1表示负数,比如5(int)的二进制原码表示为: 00 .... 0101 (一共32b),而(int)-3的二进制原码表示为: 10 .... 0011 (一共32b)
- 反码:与按位取反~不同,反码分正负数,正数的反码是它本身,负数的反码是除符号位外的每一位按位取反,比如(int)5的反码表示为: 00 .... 0101 (一共32b),-1(int)的反码表示为: 11 .... 1110 (一共32b)。而(int)5按位取反的结果为: 11 .... 1101 (一共32b),(int)-1按位取反的结果为: 01 .... 1110 (一共32b)
- 补码:现代计算机语言中在内存中实际存储的二进制数均为补码,使用补码存储的最大好处是不需要设计减法器,直接用补码的加法可以同时表示整数的加法和减法。
- 其运算规则如下:正数的补码就是原码,负数的补码是反码+1比如(int)5的补码: 00 .... 0101 (一共32b),而(int)-1的补码为11 ... 1111 (一共32b)
- 负数补码+1的目的是可以利用加法器直接求解减法。比如1-1变为1+(-1)后的补码运算是
00...00001 + 11...11111
,加完之后进位会导致所有位都是0(数学的证明就略了)
- 几个tips:
- 任何数与(int)1 (内存中存储的bi: 0000....0001)做按位与,得到最后一位的二进制值(相当于模2)
- 任何数与(int)-1 (内存中存储的bi: 1111....1111)做按位异或后+1,得到它的相反数的补码()
- 任何数与(int)0做按位异或得到它自己
- 任何数与(int)-1做按位或得到-1
- 任何数与(int)-1做按位与得到他自己
Exercise: 给一个short型,正负未知,在仅用加减法和位运算并且ban掉if和switch语句的前提下如何写一个函数输出二进制原码
#include<iostream>
using namespace std;
const int BIT_SHORT = sizeof(short) * 8;
void binary(short m, char cptr[])
{
short sign;
sign = m >> (BIT_SHORT-1); //注意位运算的优先级后于加减,但是为了明确起见还是加了括号
//m位正数sign为0,为负数sign为-1
//short temp = (-1) * sign;
//-1的补码二进制每一位都是1,若m为正数,定义的sign直接为0
short n = (sign^m) - sign;
//若m为正数,则0和m取按位异或,仍然得到m;若m为负数,则取按位异或相当于取反码,之后再+1则变成了-m的原码。n始终为正数
cptr[BIT_SHORT] = '\0'; //加结束符,否则会读取出一些奇奇怪怪的东西出来
for(int i = 0; i<BIT_SHORT-1; ++i)
{
cptr[BIT_SHORT-1-i] = (char)((n&1) + '0'); //0的ASCII码为48,这里做加法强制转换为ASCII码计算
//n与1做按位与,只有最后一位为1时值为1,否则为0
n = n>>1; //注意,计算机中所有的数都是按补码存储的,不能直接位运算
//注意,正数的右移运算右侧补0,而负数的右移运算右侧是补1的
}
cptr[0] = -sign + '0'; //第一位为-sign
return;
}
int main()
{
cout << "The type \"short\" occupies " << BIT_SHORT << " bits." << endl;
short i = -10; //为了简洁,我们使用2个字节的short型演示
short j = 5;
char cptr_bi[BIT_SHORT + 1]; //没有初始化的字符串数组内的元素是随机的,所以需要手动加上结束符\0
char cptr_bj[BIT_SHORT + 1];
binary(i, cptr_bi);
binary(j, cptr_bj);
cout << "The binary representation of " << i << " is:" << cptr_bi << endl;
//输出1000000000001010
cout << "The binary representation of " << j << " is:" << cptr_bj << endl;
//输出0000000000000101
return 0;
}
Another way to understand the two's complement(补码)
Use w bits to represent an int
:
Two's complement:
B2T(x)=-x_{w-1}\cdot 2^{w-1} + \sum _{i=0}^{w-2} x_i\cdot 2^i
Where the x_is are called the two's complement. The first bit x_{w-1} is called sign bit.
We can verify that the two understandings are equivalent (a trivial mathematic problem, we omit it).
Maximum number:
0111111....111\implies 2^{w-1}-1
Minimum number:
1000000....000\implies -2^{w-1}
The two's complement representation starts from 10000...000, ending up with 01111...111. We can verify that the value increases progressively and linearly with the two's complement. —— The most important reason why the computer stores two's complement form.
Of course, such a mapping is invertible (幂级数展开的完备性,使用多项式环的理论可以证明)
Signed vs. Unsigned in C
- A mix of unsigned and signed in single expression (including mutiplication and division) --> signed value
int
implicitly cast to unsigned unsigned int
!!!
- To avoid bugs, I think it's better NOT to use unnecessary unsigned type
- when you put a "u" or "U" after the number, it would be cast to unsigned type
- consider the following examples
-1 < 0
, -1 > 0u
, 2147483647u < 2147483647 -1
, 2147483647 > 2147483647 -1
, 2147483647 > 2147483648u
, 2147483647 < (int)2147483648u
*2147483647 is T_{max}
**(int)2147483648u is cast to signed, and the sign bit is 1. The complier views its value as -2147483648
- A typical bug (endless drop) :
#include<iostream>
using namespace std;
int main()
{
unsigned int n = 10;
int i;
unsigned int j = 0;
//这样写不会出bug, unsigned值被强制转换为signed,左值优先级高
for(i = n;i>=0;i--)
{
j++;
cout << "i = " << i << endl;
if(j > n + 2)
{
cout << "bug!" << endl;
break;
}
}
j = 0;
i = 10;
//这样写会出Bug,而且编译器报警
for(n = i;n>=0;n--)
{
j++;
cout << "n = " << n << endl;
//输出完n = 0后会接着输出n = 4294967295
//把unsigned和sign进行比较会报警(同为正数没有问题,但是一正一负就会出bug)
if(j > i + 5)
{
cout << "bug!" << endl;
break;
}
}
j = 0;
n = 10;
//这样写会出bug! 编译器报警,sizeof返回的是一个unsigned值,因而判断表达式左侧为unsigned value!
for(i = n;i - sizeof(int) >= 0;i--)
{
j++;
cout << "i = " << i << endl;
if(j > n + 2)
{
cout << "bug!" << endl;
break;
}
}
j = 0;
//这样写不会出bug,但会报警,同为正数时,signed和unsigned比较没有问题
for(i = n;i >= sizeof(int);i--)
{
j++;
cout << "i = " << i << endl;
if(j > n + 2)
{
cout << "bug!" << endl;
break;
}
}
return 0;
}
an important tip : sizeof(*)
is an operator, not a function! And it returns an unsigned value!
Robert Seacord's standard
unsigned
类型也不是一无是处,它可以提供有保障的越界行为(mod 2n ),而符号数越界时输出的值是没有规律的
short l = -(pow(2,15));
cout << l << endl; //输出-32768
printf("%hd\n",l-1); //输出-32767
cout << l-1 << endl; //输出-32769
cout << (l<<1) << endl; //输出-65536
printf("%hd\n",l<<1); //输出0
//看起来cout输出会把short的输出类型直接换成int——自适应转换输出类型
int m = (int)(-pow(2,31));
cout << m << endl; //输出-2147483648
cout << m-1 << endl; //输出2147483647
cout << (m<<1) << endl; //输出0
int k = -(int)(pow(2,31));
cout << k << endl; //输出-2147483647
cout << k-1 << endl; //输出-2147483648
cout << (k<<1) << endl; //输出2
//上面这两个的区别是pow输出的是double型,类型转换引起了1的误差
//C语言规定,符号数的越界结果等于几都有可能,不保证结果
//unsigned的加减乘除越界行为都是mod 2^n的,因而unsign的存在意义之一就是提供了有规律的越界行为
Sign Extension
Aim:
- given w-bit signed integer x
- convert it to (w+k)-bit interger with the same value
Rule:
- make k copies of sign bit ----- A rather simple way to expand the type!
- X' = (x_{w-1},x_{w-1},...,x_{w-1},x_{w-2},...,x_0)
- It is also a trivial mathematic problem