xzw's Code-Monkey Transition Notes II - CSAPP

Tavior

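Last time I said that before digging into object-oriented programming in C++, I would first study a computer systems course. The textbook is CSAPP, paired with the authors' lecture videos on Bilibili. I'm not aiming for deep mastery, just an overall picture: the basic concepts and what the computer does when it compiles and runs a program, as groundwork for learning parallel programming later. Since the book's last three chapters are directly related to parallel computing, I inevitably have to go through the complete ICS+ course content......

A heads-up: starting this Monday, I plan to spend three days reading our group's theory papers from the past few years, so I'll go quiet for three days 😀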

Tavior

Lecture 1: Course Overview

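The examples in this lecture are about numeric overflow.
The professor's example 300 * 400 * 500 * 600 overflows, yet it still satisfies associativity. This is presumably because the arithmetic actually takes place in a ring of residue classes (the integers mod 2^32).
Floating-point addition behaves strangely. When a large number and a small number are added or subtracted, associativity fails: (1e20 - 1e20) + 3.14 == 3.14, while 1e20 - (1e20 + 3.14) == 0.
How exponents are read in English: 1e20 is read as "one E to the 20th".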
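Here is a small C++ sketch that shows both effects. The function names are my own and purely illustrative. It uses 32-bit `unsigned` arithmetic as a stand-in for the lecture's int example. Unsigned overflow is defined to wrap mod 2^32, while signed int overflow is undefined behavior, so the unsigned version is the safe way to demonstrate the ring property.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers. std::uint32_t arithmetic wraps mod 2^32 by definition,
// so these overflowing products are well defined (signed int overflow is not).
std::uint32_t product_left_to_right() {
    std::uint32_t a = 300, b = 400, c = 500, d = 600;
    return ((a * b) * c) * d;   // 36,000,000,000 reduced mod 2^32
}

std::uint32_t product_regrouped() {
    std::uint32_t a = 300, b = 400, c = 500, d = 600;
    return a * (b * (c * d));   // same residue class mod 2^32, so same result
}

// Floating-point addition is not associative: 3.14 is far smaller than the
// spacing between doubles near 1e20, so 1e20 + 3.14 rounds back to 1e20.
double float_left()  { return (1e20 - 1e20) + 3.14; }   // 3.14
double float_right() { return 1e20 - (1e20 + 3.14); }   // 0.0
```

Both product functions return 1640261632. That value is below 2^31, so a wrapping 32-bit int would print the same number.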

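Course outline

Representation of numbers
Binary encoding of instructions; reading the assembly code the compiler generates
The memory system (something I need to focus on)
- Here the professor gave an out-of-bounds array example, similar to the coincidence I once ran into where two ints overwrote a double
- This kind of error silently modifies the wrong data and is very hard to track down
Improving performance at the low level
- The example was row-major vs. column-major traversal
Network programming (something I need to focus on)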

小毛龙

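The "large number swallows small number" effect is also a classic problem in numerical computing www. Because of the constraints of the IEEE floating-point standard, algorithms are usually designed to avoid it as much as possible. You can also, regardless of cost, wrap a custom number class that trades performance for precision...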

Tavior

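小毛龙 Yeah www, got it! Mathematica claims to have arbitrarily many digits of precision, so that's presumably a custom implementation like that. Also, from today on I'm officially starting this series.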

Tavior

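A bit off-topic first. Since last Thursday, a positive case turned up next door at Rongke, and a large group of people at our institute got pop-up alerts on their Health Kit app. From that day on we've been working from home. On the afternoon of the first work-from-home day, rumors of a lockdown spread, and the press conference scheduled for 5 pm kept getting delayed. When I tried to buy groceries on Meituan, same-day delivery capacity was already full. I panicked a bit and hurriedly bought a pile of meat and ready-meal packs. I also asked the only one of my roommates without a pop-up to go to the supermarket for fruit and vegetables. He said the supermarket was packed wall to wall, and by the time he checked out basically everything had been cleared off the shelves. The press conference later dismissed the rumors, but the supermarket crowd didn't shrink much. (Funnily enough, mixed into the crowd were a few uncles and aunties who had come for their usual grocery run, completely bewildered and probably with no idea what was going on. One uncle froze at the sight, then after a long while pointed behind a crowd and timidly asked, "Is this where you weigh watermelons?" 🤣)

In the end there was no lockdown, so I returned most of the Meituan order. The groceries that had already arrived still had to be cooked, though, and I kept the pork rather than returning it. So these days I've been practicing cooking at home and making my own meals.

Since May, nucleic-acid testing has tightened from once every 2 days to daily mass screening for everyone. The queues are long; even at a less crowded site you queue for about half an hour. Slightly better, the testing sites have 2-meter lines painted on the ground, so most of the time people keep well apart. (A gripe: they don't disinfect ID cards at testing, and the throat swab is extremely perfunctory. If this test can catch a positive case, that person's mouth must be full of virus.)

Then yesterday afternoon I went to the residents' committee to clear my pop-up and sign a pledge that I hadn't visited any outbreak-related locations. Out of dislike for facial recognition, I didn't register my face for the compound's entry system. The face-recognition gate exists only at the most convenient entrance and the other entrances are always open, so honestly it stops no thieves and only annoys residents. The pop-up was cleared at 4 pm, and at 6 pm I got notice that we'd work from home for the entire week. (That morning they had announced we could come into the office if we signed a "two points, one line" pledge to travel only between home and work. A classic case of changing orders overnight. I didn't want to sign that thing anyway: it exists purely to shift blame, transferring all epidemic-prevention responsibility onto individuals.)

So starting today I'm working from home. Group meeting in the morning, then a nucleic-acid test, and now I'm starting to write and read ww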

Tavior

Damn, everything I spent ages typing just disappeared.

Tavior

Course 1: Bits, Bytes and Integers
Bit-level representation of numbers
Decimal system (base 10)
In 1948, computers started to use the binary system
low voltage -> 0, high voltage -> 1
storing a digital value is easier than storing an analog value

Bits can also represent fractional (floating-point) values

After the binary point, the weights decrease from left to right (each place is worth half of the one before it), e.g.
0.2 = 0\times 2^{-1} + 0\times 2^{-2} + 1\times 2^{-3} + 1\times 2^{-4} + \cdots
0.2_{10} = 0.\overline{0011}_{2}
Long 32-bit / 64-bit bit patterns are tedious to write -> group the bits 4 at a time and use hexadecimal representation: 0 ~ 9 + A ~ F
1010 -> A, 1100 -> C, 1111 -> F
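This sketch covers both ideas. It computes the binary digits of 0.2 = 1/5 by repeated doubling, using exact integer arithmetic so floating-point rounding cannot interfere. It also prints a value in hex. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: binary digits of the fraction num/den (0 <= num < den).
// Doubling moves the binary point one place right; the integer part that
// appears is the next bit (worth 2^-1, 2^-2, ... in the original fraction).
std::string binary_fraction(unsigned num, unsigned den, int digits) {
    std::string s = "0.";
    for (int i = 0; i < digits; ++i) {
        num *= 2;
        if (num >= den) { s += '1'; num -= den; }
        else            { s += '0'; }
    }
    return s;
}

// Hypothetical helper: each hex digit stands for exactly 4 bits.
std::string to_hex(unsigned x) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "%X", x);
    return buf;
}
```

For example, `binary_fraction(1, 5, 8)` returns "0.00110011", and `to_hex(172)` (binary 1010 1100) returns "AC".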

C data types

Some type sizes vary from machine to machine.
Use the sizeof operator if your program is sensitive to the number of bytes (our programs use sizeof throughout to get the byte sizes on the running platform)
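As an example, a short function can print the sizes the current compiler actually uses. The output is platform-dependent by design; of these, only `sizeof(char) == 1` is fixed by the standard. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: print the byte sizes used on this platform.
// E.g. long is 8 bytes on 64-bit Linux but 4 bytes on 64-bit Windows.
void print_type_sizes() {
    std::printf("char      %zu\n", sizeof(char));
    std::printf("short     %zu\n", sizeof(short));
    std::printf("int       %zu\n", sizeof(int));
    std::printf("long      %zu\n", sizeof(long));
    std::printf("long long %zu\n", sizeof(long long));
    std::printf("float     %zu\n", sizeof(float));
    std::printf("double    %zu\n", sizeof(double));
    std::printf("pointer   %zu\n", sizeof(void*));
}
```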

Boolean Algebra

A&B: "and"
A|B: "or"
~A: "not"
A^B: "xor" / "exclusive-or"

Representation

We can use this to represent sets of values implicitly:
subsets of {0,1,2,3,4,5,6,7}, because one byte has 8 bits
bit position: 7 6 5 4 3 2 1 0
bit value:    0 1 1 0 0 1 1 0 -> {6,5,2,1}

There is an isomorphism between Boolean algebra on one byte and set operations on subsets of the universe {0,1,2,3,4,5,6,7}:
& - intersection
| - union
~ - complement
^ - symmetric difference
These operations are also used in file I/O.
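This sketch shows the byte-as-set view using `std::uint8_t`. The wrapper names `intersect`, `unite`, `sym_diff`, `complement` and `contains` are hypothetical. The sample values 0x69 = {0,3,5,6} and 0x55 = {0,2,4,6} are just illustrative choices.

```cpp
#include <cassert>
#include <cstdint>

// A byte viewed as a subset of {0,...,7}: bit i is 1 <=> element i is in the set.
using Set8 = std::uint8_t;

// Hypothetical wrappers. Bitwise operators promote to int, so cast back to 8 bits.
constexpr Set8 intersect(Set8 a, Set8 b) { return static_cast<Set8>(a & b); }
constexpr Set8 unite(Set8 a, Set8 b)     { return static_cast<Set8>(a | b); }
constexpr Set8 sym_diff(Set8 a, Set8 b)  { return static_cast<Set8>(a ^ b); }
constexpr Set8 complement(Set8 a)        { return static_cast<Set8>(~a); }  // w.r.t. {0,...,7}
constexpr bool contains(Set8 s, int i)   { return ((s >> i) & 1) != 0; }
```

With a = 0x69 and b = 0x55:
- `intersect` gives 0x41 = {0,6}.
- `unite` gives 0x7D = {0,2,3,4,5,6}.
- `sym_diff` gives 0x3C = {2,3,4,5}.
- `complement(b)` gives 0xAA = {1,3,5,7}.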
Contrast with the logical operators: && vs &, || vs |, ! (bang) vs ~
Logical operators: &&, ||, !

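view 0 as False
view any other value as True
always return 0 (0x00) or 1 (0x01)
evaluate with early termination (short-circuit)

Test for a null pointer before access: p && *p (if p is 0, i.e. a null pointer, the expression yields 0 and *p is never evaluated)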
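Here is a sketch of the difference between the bitwise and logical operators, plus the short-circuit null check. The helper names are hypothetical. In C++ the logical operators yield `bool`, which converts to 0 or 1.

```cpp
#include <cassert>

// Hypothetical helpers. Bitwise operators act on every bit; logical operators
// treat any nonzero value as true and produce only 0 or 1.
int bitwise_and(int a, int b) { return a & b; }
int logical_and(int a, int b) { return a && b; }
int bitwise_not(int a)        { return ~a; }
int logical_not(int a)        { return !a; }

// Short-circuit: *p is evaluated only when p is non-null.
bool points_to_nonzero(const int* p) { return p && *p; }
```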

Shift Operator

Left shift is always the same, while there are 2 right shift operators:

Left shift : x << n
Fill 0s at the right side
Right shift: x >> n
Logical shift : fill with 0s
Arithmetic shift : fill with copies of the sign bit (my compiler does this, but for negative signed values C leaves the choice of right shift implementation-defined; C++20 requires the arithmetic shift)

x = 01100010
<< 1 : 11000100
Log. >> 1 : 00110001
Arith. >> 1 : 00110001

x = 11100010
<< 2 : 10001000
Log. >> 2 : 00111000
Arith. >> 2 : 11111000
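- With an arithmetic shift, x >> n equals \lfloor x / 2^n \rfloor (the lowest n bits are dropped). For negative x this rounds toward -∞, unlike C's integer division x / 2^n, which rounds toward 0
- Shifting by an amount < 0 or ≥ the bit width (e.g. x >> 32 for a 32-bit int) is undefined behavior in C/C++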
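The shift examples above can be reproduced on 8-bit values: unsigned operands get a logical shift and signed operands an arithmetic one. This sketch assumes the usual two's-complement conversion between `uint8_t` and `int8_t` and an arithmetic `>>` on negative values. Mainstream compilers provide both, and C++20 guarantees them. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: left shift, zeros enter from the right.
std::uint8_t shl(std::uint8_t x, int n) {
    return static_cast<std::uint8_t>(x << n);      // keep only the low 8 bits
}

// Hypothetical helper: logical right shift (unsigned operand, zeros from the left).
std::uint8_t logical_shr(std::uint8_t x, int n) {
    return static_cast<std::uint8_t>(x >> n);
}

// Hypothetical helper: arithmetic right shift (signed operand, sign bit copied).
// The result equals floor(x / 2^n).
std::uint8_t arithmetic_shr(std::uint8_t bits, int n) {
    auto x = static_cast<std::int8_t>(bits);       // reinterpret the byte as signed
    return static_cast<std::uint8_t>(x >> n);
}
```

For x = 0xE2 (11100010):
- `shl(x, 2)` is 0x88 (10001000).
- `logical_shr(x, 2)` is 0x38 (00111000).
- `arithmetic_shr(x, 2)` is 0xF8 (11111000).

Read as signed, 0xE2 is -30. So -30 >> 2 == -8 (the floor of -7.5), while -30 / 4 == -7.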

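Sign-magnitude, ones' complement and two's complement

Sign-magnitude: the plain binary code. For a signed (non-unsigned) type the leftmost bit is the sign bit, 0 for positive and 1 for negative. E.g. (int)5 in sign-magnitude is 00 .... 0101 (32 bits in total), and (int)-3 is 10 .... 0011 (32 bits in total)
Ones' complement: this is not the same as bitwise NOT ~. Ones' complement treats positive and negative numbers differently: a positive number's ones' complement is the number itself, while for a negative number every bit of its sign-magnitude code except the sign bit is flipped. E.g. (int)5 in ones' complement is 00 .... 0101 (32 bits) and (int)-1 is 11 .... 1110 (32 bits). By contrast, ~ flips every bit of the value as actually stored (two's complement): ~5 gives 11 .... 1010 (32 bits, i.e. -6). Since -1 is stored as 11 .... 1111, ~(-1) gives 00 .... 0000 (i.e. 0)
Two's complement: modern computers store signed integers in memory in two's complement. Its biggest advantage is that no separate subtractor is needed, because two's-complement addition handles both addition and subtraction of integers.
- Rule: a positive number's two's complement is its sign-magnitude code; a negative number's is its ones' complement + 1. E.g. (int)5 is 00 .... 0101 (32 bits), and (int)-1 is 11 ... 1111 (32 bits)
- The +1 for negative numbers is what lets an adder compute subtraction directly. E.g. 1-1 becomes 1+(-1), i.e. 00...00001 + 11...11111. The carry ripples through every position; once the carry out of the top bit is discarded, all bits are 0 (the mathematical proof is omitted)
A few tips:
- Any number AND (int)1 (stored as 0000....0001) gives its lowest binary digit, i.e. whether it is odd or even (for non-negative numbers, this is the value mod 2)
- Any number XOR (int)-1 (stored as 1111....1111), then +1, gives the two's complement of its negation (x ^ -1 is ~x, so this is ~x + 1 == -x)
- Any number XOR (int)0 gives itself
- Any number OR (int)-1 gives -1
- Any number AND (int)-1 gives itself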
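Here is a quick self-check of these tips, sketched for a 32-bit two's-complement `int`. The function name is hypothetical. It must not be called with `INT_MIN`, because `-x` overflows there.

```cpp
#include <cassert>

// Hypothetical helper: true if all five tips hold for x (requires x != INT_MIN).
bool tips_hold(int x) {
    bool lowest_bit = (x & 1) == ((x % 2 != 0) ? 1 : 0);  // x & 1: odd or even
    bool negation   = ((x ^ -1) + 1) == -x;                // x ^ -1 == ~x, then +1
    bool xor_zero   = (x ^ 0) == x;
    bool or_minus1  = (x | -1) == -1;
    bool and_minus1 = (x & -1) == x;
    return lowest_bit && negation && xor_zero && or_minus1 && and_minus1;
}
```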

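Exercise: given a short of unknown sign, using only addition/subtraction and bitwise operations, with if and switch statements banned, write a function that outputs its sign-magnitude binary code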

#include <iostream>
using namespace std;

const int BIT_SHORT = sizeof(short) * 8;

void binary(short m, char cptr[])
{
    // Shifts have lower precedence than + and -, but the parentheses make it explicit.
    // Assumes >> on a negative value is an arithmetic shift (mainstream compilers; required since C++20).
    short sign = m >> (BIT_SHORT - 1);
    // sign is 0 if m >= 0, and -1 (all bits 1) if m < 0

    short n = (sign ^ m) - sign;
    // m >= 0: 0 ^ m == m, and subtracting 0 leaves m.
    // m < 0: -1 ^ m == ~m, and subtracting -1 adds 1, so n == ~m + 1 == -m.
    // So n is the magnitude |m| (except m == -32768, whose magnitude does not fit in a short).

    cptr[BIT_SHORT] = '\0';     // terminator; without it, printing reads garbage past the string
    for (int i = 0; i < BIT_SHORT - 1; ++i)
    {
        // n & 1 is 1 only if the lowest bit is 1; adding '0' (ASCII 48) gives the character '0' or '1'
        cptr[BIT_SHORT - 1 - i] = (char)((n & 1) + '0');
        n = n >> 1;             // n >= 0 here, so zeros are shifted in from the left
    }
    cptr[0] = (char)(-sign + '0');  // sign bit: '1' if m < 0, else '0'
}

int main()
{
    cout << "The type \"short\" occupies " << BIT_SHORT << " bits." << endl;

    short i = -10;      // a 2-byte short keeps the demo output short
    short j = 5;
    char cptr_bi[BIT_SHORT + 1];    // uninitialized array contents are indeterminate, hence the manual '\0'
    char cptr_bj[BIT_SHORT + 1];
    binary(i, cptr_bi);
    binary(j, cptr_bj);

    cout << "The binary representation of " << i << " is: " << cptr_bi << endl;
    // prints 1000000000001010
    cout << "The binary representation of " << j << " is: " << cptr_bj << endl;
    // prints 0000000000000101
    return 0;
}

Another way to understand two's complement

Use w bits to represent an int:
Two's complement:
B2T(X) = -x_{w-1}\cdot 2^{w-1} + \sum_{i=0}^{w-2} x_i\cdot 2^i
where X = (x_{w-1}, \ldots, x_0) is the two's-complement bit vector. The first bit x_{w-1} is called the sign bit.
We can verify that the two views are equivalent (a simple mathematical exercise, omitted here).
Maximum number:
0111111....111\implies 2^{w-1}-1
Minimum number:
1000000....000\implies -2^{w-1}

Order the bit patterns starting from 10000...000 (T_{min}) and ending at 01111...111 (T_{max}): 100...0, 100...01, ..., 111...1, 000...0, ..., 011...1. Along this order the value increases by exactly 1 at each step. This is one of the most important reasons why computers store integers in two's complement.

Of course, this mapping is invertible (it follows from the uniqueness of the base-2 expansion; it can also be proved with polynomial-ring arguments)
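The B2T formula can be evaluated directly. The sketch below uses hypothetical names `b2u`, `b2t` and `to_bits`, and takes bit strings written most-significant bit first. For w = 4 it confirms the claims above:
- T_max = 7
- T_min = -8
- walking the patterns from 1000 around to 0111 raises the value by 1 at each step

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers; bits are written MSB first, width w = bits.size(),
// assumed to be between 1 and 62.
long long b2u(const std::string& bits) {
    long long v = 0;
    for (char c : bits) v = 2 * v + (c - '0');          // every bit has weight +2^i
    return v;
}

long long b2t(const std::string& bits) {
    long long v = b2u(bits.substr(1));                  // x_{w-2} ... x_0, weights +2^i
    if (bits[0] == '1') v -= 1LL << (bits.size() - 1);  // sign bit has weight -2^{w-1}
    return v;
}

// Hypothetical helper: the low w bits of v as a bit string.
std::string to_bits(unsigned long long v, int w) {
    std::string s;
    for (int i = w - 1; i >= 0; --i) s += ((v >> i) & 1) ? '1' : '0';
    return s;
}
```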

Signed vs. Unsigned in C

In a single expression that mixes unsigned and signed operands (including comparisons, multiplication and division), the signed int value is implicitly cast to unsigned int!!!
To avoid bugs, I think it's better NOT to use unsigned types unnecessarily
When you put a "u" or "U" after a number literal, it becomes an unsigned constant
Consider the following examples (32-bit int):
-1 < 0, -1 > 0u, 2147483647u < -2147483647-1, 2147483647 > -2147483647-1, 2147483647 < 2147483648u, 2147483647 > (int)2147483648u
* 2147483647 is T_{max}
** (int)2147483648u is cast to signed and its sign bit is 1, so the compiler treats its value as -2147483648 (on two's-complement machines; guaranteed since C++20)
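These comparisons can be checked with a compiler. The sketch assumes a 32-bit `int`, and the function names are hypothetical. Expect sign-compare warnings, which is exactly the hazard being illustrated.

```cpp
#include <cassert>

// Hypothetical helpers, one per comparison above (32-bit int assumed).
bool c1() { return -1 < 0; }                          // signed: true
bool c2() { return -1 > 0u; }                         // unsigned: -1 becomes 4294967295
bool c3() { return 2147483647u < -2147483647 - 1; }   // unsigned: T_min becomes 2147483648
bool c4() { return 2147483647 > -2147483647 - 1; }    // signed: true
bool c5() { return 2147483647 < 2147483648u; }        // unsigned: true
bool c6() { return 2147483647 > (int)2147483648u; }   // signed: the cast gives -2147483648
```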
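A typical bug (an endless loop):

#include <iostream>
using namespace std;

int main()
{
    unsigned int n = 10;
    int i;
    unsigned int j = 0;     // iteration counter used to break out of runaway loops

    // No bug: i = n converts the unsigned value to int, and i >= 0 compares two ints (signed comparison)
    for (i = n; i >= 0; i--)
    {
        j++;
        cout << "i = " << i << endl;
        if (j > n + 2)
        {
            cout << "bug!" << endl;
            break;
        }
    }

    j = 0;
    i = 10;

    // Bug, and the compiler warns: n is unsigned, so n >= 0 is always true
    for (n = i; n >= 0; n--)
    {
        j++;
        cout << "n = " << n << endl;
        // after printing n = 0, n-- wraps around and the next line is n = 4294967295

        // comparing unsigned j with signed i + 5 also warns
        // (harmless while both are non-negative, a bug once the signed one is negative)
        if (j > i + 5)
        {
            cout << "bug!" << endl;
            break;
        }
    }

    j = 0;
    n = 10;

    // Bug, and the compiler warns: sizeof yields an unsigned size_t, so i - sizeof(int)
    // is unsigned and the condition is always true
    for (i = n; i - sizeof(int) >= 0; i--)
    {
        j++;
        cout << "i = " << i << endl;
        if (j > n + 2)
        {
            cout << "bug!" << endl;
            break;
        }
    }

    j = 0;
    // No bug, but a warning: i is converted to unsigned, which is harmless while i stays non-negative
    for (i = n; i >= sizeof(int); i--)
    {
        j++;
        cout << "i = " << i << endl;
        if (j > n + 2)
        {
            cout << "bug!" << endl;
            break;
        }
    }

    return 0;
}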

An important tip: sizeof is an operator, not a function! And it returns an unsigned value (of type size_t)!
Robert Seacord's standard

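Unsigned types are not useless: they provide guaranteed overflow behavior (wrap-around mod 2ⁿ). Signed overflow, by contrast, is undefined behavior in C/C++, so the resulting values follow no guaranteed rule.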

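#include <cmath>
#include <cstdio>
#include <iostream>
using namespace std;

int main()
{
    short l = -(pow(2, 15));
    cout << l << endl;          // prints -32768

    printf("%hd\n", l - 1);     // prints 32767: l - 1 is the int -32769, and %hd converts it to short (mod 2^16)
    cout << l - 1 << endl;      // prints -32769

    cout << (l << 1) << endl;   // prints -65536
    printf("%hd\n", l << 1);    // prints 0
    // cout is not adapting the type: in l - 1 and l << 1 the short is first promoted to int
    // (integer promotion), so the result is an int; only %hd truncates it back to 16 bits

    int m = (int)(-pow(2, 31));
    cout << m << endl;          // prints -2147483648
    cout << m - 1 << endl;      // printed 2147483647 here, but signed overflow is undefined behavior
    cout << (m << 1) << endl;   // printed 0 here (left-shifting a negative int is UB before C++20)

    int k = -(int)(pow(2, 31));
    cout << k << endl;          // printed -2147483647 here
    cout << k - 1 << endl;      // printed -2147483648
    cout << (k << 1) << endl;   // printed 2
    // The difference between m and k: pow returns a double, and 2^31 = 2147483648.0 is out of range
    // for int (INT_MAX = 2147483647), so (int)(pow(2, 31)) is undefined behavior; this compiler
    // apparently clamped it to 2147483647. -2^31 itself is in range, so m is exact.
    // C/C++ give no guarantee about the result of signed overflow, whereas unsigned addition,
    // subtraction and multiplication always wrap mod 2^n: one reason unsigned types exist.
    return 0;
}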
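For contrast, this sketch shows the wrap-around that the standard does guarantee for unsigned types. The helper names are hypothetical. The 16-bit case needs an explicit cast because `a + b` is first computed in `int` (integer promotion).

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical helpers: unsigned arithmetic is defined to wrap around mod 2^n.
unsigned wrap_add(unsigned a, unsigned b) { return a + b; }
unsigned wrap_sub(unsigned a, unsigned b) { return a - b; }
unsigned wrap_mul(unsigned a, unsigned b) { return a * b; }

// a + b is computed in int after promotion; the cast reduces it mod 2^16.
std::uint16_t wrap_add16(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint16_t>(a + b);
}
```

With a 32-bit `unsigned`, `wrap_mul(65536u, 65536u)` is 0, since 2^32 mod 2^32 = 0.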

Sign Extension

Aim:

given a w-bit signed integer x
convert it to a (w+k)-bit integer with the same value

Rule:

make k copies of the sign bit: a rather simple way to widen the type!
X' = (x_{w-1}, \ldots, x_{w-1}, x_{w-1}, x_{w-2}, \ldots, x_0) (the sign bit x_{w-1} now appears k+1 times)
The proof is also a simple exercise (induction on k)
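Here is one step of that argument written out, using the B2T formula from above. Appending a single copy of the sign bit (going from w to w+1 bits) does not change the value; repeating the step k times gives the rule.

```latex
\begin{aligned}
B2T_{w+1}(x_{w-1}, x_{w-1}, x_{w-2}, \ldots, x_0)
  &= -x_{w-1}\cdot 2^{w} + x_{w-1}\cdot 2^{w-1} + \sum_{i=0}^{w-2} x_i\cdot 2^i \\
  &= -x_{w-1}\cdot \left(2^{w} - 2^{w-1}\right) + \sum_{i=0}^{w-2} x_i\cdot 2^i \\
  &= -x_{w-1}\cdot 2^{w-1} + \sum_{i=0}^{w-2} x_i\cdot 2^i
   = B2T_{w}(x_{w-1}, x_{w-2}, \ldots, x_0)
\end{aligned}
```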

Tavior

Properties of Integer's Calculation

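Addition/Subtraction

Only the lowest w bits are kept
The carry out of the top bit is ignored
So the arithmetic is that of the ring of residue classes mod 2^w

Two's-complement addition
TAdd and UAdd have IDENTICAL bit-level behavior

int a, b, c, d;     // assume a and b already hold some values
c = (int)((unsigned)a + (unsigned)b);
d = a + b;
// c == d on two's-complement hardware (strictly, if a + b overflows, d = a + b is undefined behavior in C/C++)

Negative/positive overflow (the sign bit takes part in the addition as an ordinary bit)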
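Here is a sketch of TAdd that never relies on signed overflow, which is undefined behavior in C/C++. It adds as unsigned, then reinterprets the bits as signed. The names `tadd` and `tadd_overflows` are hypothetical. The unsigned-to-signed conversion is modular on mainstream compilers and guaranteed since C++20.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: two's-complement addition done on unsigned bits (wraps mod 2^32).
std::int32_t tadd(std::int32_t a, std::int32_t b) {
    std::uint32_t sum = static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b);
    return static_cast<std::int32_t>(sum);   // reinterpret the 32 bits as signed
}

// Hypothetical helper: overflow happened iff both operands have the same sign
// and the result's sign differs (positive or negative overflow).
bool tadd_overflows(std::int32_t a, std::int32_t b) {
    std::int32_t s = tadd(a, b);
    return (a >= 0 && b >= 0 && s < 0) || (a < 0 && b < 0 && s >= 0);
}
```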

Multiplication
- If two w-bit numbers are multiplied, the exact result may need up to 2w bits
- The result is also truncated to w bits
- The multiplier performs multiplication in binary
  - The process is similar to long multiplication in decimal
  - Unsigned numbers: UMul_w(u, v) = (u\cdot v) \bmod 2^w
  - Two's-complement numbers: TMul_w(u, v) = U2T_w((u\cdot v) \bmod 2^w)
  - The residue ring preserves multiplication: the bit representations of the two results above are identical AFTER TRUNCATION (proof omitted), i.e. the lower w bits are the same
  - So one shared multiplier serves both the unsigned and the signed type
  - k * 2 \implies k << 1 (more generally, u * 2^k \implies u << k)
  - The compiler decides whether a shift (plus adds) is more efficient than a multiply
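The claim that signed and unsigned products share their low w bits can be checked with 64-bit intermediates, as in this sketch. It uses w = 32, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: exact unsigned product in 64 bits, then keep the low 32 bits.
std::uint32_t low_bits_unsigned(std::uint32_t u, std::uint32_t v) {
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(u) * v);
}

// Hypothetical helper: exact signed product in 64 bits, then keep the low 32 bits (mod 2^32).
std::uint32_t low_bits_signed(std::int32_t x, std::int32_t y) {
    std::int64_t full = static_cast<std::int64_t>(x) * y;   // cannot overflow 64 bits
    return static_cast<std::uint32_t>(full);
}

// Hypothetical helper: multiplying by 2^k is a left shift (mod 2^32 for unsigned).
std::uint32_t times_pow2(std::uint32_t u, int k) { return u << k; }
```

For example, -3 * 5 as signed and 0xFFFFFFFD * 5 as unsigned both end in the bits 0xFFFFFFF1.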

Division

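Unsigned power-of-two division
- k / 2^n \implies k >> n (logical shift)
- Integer division in C/C++ rounds toward 0 (truncation, not rounding to nearest)
  1.5 -> 1, 1.9 -> 1, -1.1 -> -1, -1.9 -> -1
Signed power-of-two division
- Uses arithmetic shift
- An arithmetic shift alone rounds toward -∞, so before dividing a negative number by 2^n we add a BIAS of 2^n - 1 to make the result round toward 0. E.g. dividing -6 by 2 twice in 4 bits (bias 1):
  1010 (-6) -> "+1" -> 1011 (-5) -> ">>1" -> 1101 (-3) -> "+1" -> 1110 (-2) -> ">>1" -> 1111 (-1)
  (without the bias, the second step would give 1101 >> 1 = 1110 (-2) instead of -1)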
- Division is much slower than other operations
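The biased shift can be written as a small function and compared against C's `/` operator. This sketch assumes 0 <= k < 31 and an arithmetic `>>` on negative `int`. The name `div_pow2` is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: x / 2^k rounded toward zero, using a shift.
// For negative x, the bias 2^k - 1 turns the shift's floor into a ceiling.
int div_pow2(int x, int k) {
    int bias = (x < 0) ? (1 << k) - 1 : 0;
    return (x + bias) >> k;
}
```

Without the bias, `-3 >> 1` gives -2, while `-3 / 2` gives -1.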

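Negation

The easiest way: take the ones' complement (flip all bits, ~x), then add 1
Some counterexamples (32-bit int):
- x > y does NOT imply -x < -y (let y = T_{min}; then -y = T_{min}, too!)
- x >= 0 implies -x <= 0 (always true)
- x <= 0 does NOT imply -x >= 0 (the only counterexample is T_{min})
- (x | -x) >> 31 == -1 whenever x ≠ 0 (with arithmetic right shift)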
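These counterexamples can be checked with a negation computed on unsigned bits, which keeps the T_min case well defined. In this sketch `neg` is a hypothetical name, and the last property assumes an arithmetic right shift.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: two's-complement negation ~x + 1, computed on unsigned
// bits so that x == INT32_MIN (where -x would overflow) is well defined.
std::int32_t neg(std::int32_t x) {
    return static_cast<std::int32_t>(~static_cast<std::uint32_t>(x) + 1u);
}
```

The result of `neg(INT32_MIN)` is INT32_MIN itself. That one fact breaks both "implications" listed above.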

Byte-Oriented Memory Organization

The operating system only allows a program to use certain regions of memory. A segmentation fault occurs when you try to access other regions.
Word size: how big a pointer is (the nominal size of addresses)
- 64-bit machine: pointers have 64 bits
- With GCC, code can be compiled as 64-bit or 32-bit (gcc -m64 / gcc -m32)
I didn't quite follow what big-endian and little-endian mean, or what an ARM processor is...

Byte Ordering (example)

Variable x = 0x01234567 (0x is the hexadecimal prefix; each hex digit takes 4 bits, so x is a 4-byte variable)
Address given by &x = 0x100
- Big-endian: the most significant byte goes at the lowest address (the "natural" order)
  | 0x100 | 0x101 | 0x102 | 0x103 |
  | --- | --- | --- | --- |
  | 01 | 23 | 45 | 67 |
- Little-endian: the least significant byte goes at the lowest address (x86, and most ARM systems in practice)
  | 0x100 | 0x101 | 0x102 | 0x103 |
  | --- | --- | --- | --- |
  | 67 | 45 | 23 | 01 |
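To see which order your own machine uses, copy the bytes of x = 0x01234567 out with `memcpy` and look at the lowest address. This is a sketch with hypothetical names `byte_at` and `is_little_endian`; on x86 it reports little-endian.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: the byte of x stored at address (&x + i).
// memcpy copies the object representation without aliasing problems.
unsigned char byte_at(std::uint32_t x, int i) {
    unsigned char bytes[sizeof x];
    std::memcpy(bytes, &x, sizeof x);
    return bytes[i];
}

// Hypothetical helper: little-endian machines store the least significant byte first.
bool is_little_endian() { return byte_at(0x01234567u, 0) == 0x67; }
```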

Representation of Strings

ASCII is a little out of date, but plain C strings are built around it (one byte per character)
Strings