Assembly and Interfacing

Reference - Intel instruction set manual

MASM Assembly

Introduction

Different compiler options can produce different results

Example code

int silly(int a){
    return (a+1)>a;
}

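Undefined behavior (signed integer overflow)

-O0 - no optimization; the comparison is actually evaluated
-O1 - the comparison on a is optimized away and the function simply returns 1
When a = INT_MAX, the two builds return 0 and 1 respectively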
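The two builds disagree because `a + 1` overflows when a = INT_MAX, and signed overflow is undefined in C. A minimal sketch of a well-defined variant follows; the name `silly_defined` is hypothetical and not from the course material.

```c
#include <limits.h>

/* Hypothetical well-defined variant of silly(): it never evaluates a + 1,
   so there is no signed overflow and every optimization level agrees. */
int silly_defined(int a) {
    /* a + 1 is representable (and larger than a) exactly when a < INT_MAX. */
    return a < INT_MAX;
}
```

With this form, `silly_defined(INT_MAX)` is 0 at every optimization level.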

Different compilers produce different code

int Sum1ToN(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += i;
    }
    return sum;
}

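gcc -O3 - vectorizes the loop
icc -O3 - unrolls the loop for pipeline-level parallelism
clang -O3 - replaces the summation loop with the closed-form (Gauss) formula (idiom recognition)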
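The closed form that replaces the loop can be written by hand as a sketch. The function names below are hypothetical, and the exact code clang emits may differ; this only shows that the loop and the formula n(n-1)/2 agree while the result fits in an int.

```c
/* Loop form, equivalent to Sum1ToN in the notes. */
int sum_loop(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += i;
    }
    return sum;
}

/* Closed form: 0 + 1 + ... + (n - 1) = n(n - 1)/2 for n > 0, else 0.
   The product is computed in long long so it cannot overflow first. */
int sum_closed(int n) {
    if (n <= 0) {
        return 0;
    }
    return (int)((long long)n * (n - 1) / 2);
}
```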

Uses of assembly language

Analyzing problems in the final generated code
Extreme optimization

Uses of interfaces

Communication
Control

Chap.1-2 The microprocessor and its architecture

1–1 A HISTORICAL BACKGROUND

Several "firsts" in the history of computing

First programmable electronic computer - ENIAC (1946)

First bug - 1947

First system to accept instructions and store them in memory - developed by John von Neumann

0/1 machine code -> assembly language (readable and usable by humans)

First high-level programming language - FLOWMATIC (1957, by Grace Hopper)

First successful, widespread programming language for business applications - COBOL

First microprocessor - Intel 4004 (1971), 4-bit-wide memory locations

Modern Microprocessor

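8086 and 8088 - the 16-bit processors that preceded the IA-32 (Intel Architecture 32) family; 16-bit registers, 20-bit addressing
- segmentation
Intel 286 - protected mode, 16-bit
- Four privilege levels
- Segment limit checking
- Read-only and execute-only segment options
Intel 386 - pipelining, paging, virtual-8086 mode, 32-bit
- the low 16 bits of each register still serve as the previous generation's registers, which keeps older programs compatible
Intel 486 - L1 cache, FPU (floating point unit)
Pentium - two pipelines for superscalar execution, MESI protocol (supports write-back caches), branch prediction
Later processors - MMX technology, SSE instruction set, NetBurst microarchitecture, multi-core, out-of-order execution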

Hyper-Threading Technology

Multiple logical processors share one core; when one stalls, another runs

Multi-core

Intel 64 (64-bit, IA-32e)

linear address space for software to 64 bits
physical address space up to 52 bits

IA-32e mode

Compatibility mode - runs 32-bit programs unmodified
64-bit mode - runs programs that access the 64-bit address space

1–2 COMPUTER DATA FORMATS

ASCII code

7-bit code; the eighth (most significant) bit can hold parity
printer - the most significant bit is 0 for alphanumeric printing and 1 for graphics
PC - the extended ASCII character set is selected by placing 1 in the leftmost bit (it includes some foreign letters, mathematical symbols, and so on)
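As an illustration of the parity use of the eighth bit, here is a minimal sketch that sets bit 7 so the whole byte has an even number of 1s. Even parity is an assumption (the notes do not say which parity is used), and the helper name is hypothetical.

```c
#include <stdint.h>

/* Returns the 7-bit ASCII code with an even-parity bit placed in bit 7. */
uint8_t add_even_parity(uint8_t code7) {
    uint8_t c = code7 & 0x7F;      /* keep only the 7 data bits */
    unsigned ones = 0;
    for (uint8_t v = c; v != 0; v >>= 1) {
        ones += v & 1u;            /* count the 1 bits */
    }
    /* Odd count: set bit 7 so the total number of 1s becomes even. */
    return (ones & 1u) ? (uint8_t)(c | 0x80) : c;
}
```

For example, 'A' (41H) already has two 1 bits and stays 41H, while 'C' (43H) has three and becomes C3H.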

Unicode

16-bit
0000H–007FH - same as ASCII (0000H–00FFH match the Latin-1 extension)
the remaining codes cover characters of the world's writing systems

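BCD code

each digit extends from 0000 to 1001, for 0–9 decimal
used in programs with simple arithmetic that interact directly with users
packed BCD form (two digits per byte) & unpacked BCD form (one digit per byte, as returned from a keypad or keyboard)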
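The packed form can be sketched in a few lines. The helper names are hypothetical, and inputs are assumed to be valid (digits 0-9, values 0-99).

```c
#include <stdint.h>

/* Pack two unpacked BCD digits (0-9 each) into one byte:
   tens digit in the high nibble, ones digit in the low nibble. */
uint8_t bcd_pack(uint8_t tens, uint8_t ones) {
    return (uint8_t)((tens << 4) | ones);
}

/* Convert a packed BCD byte (00H-99H) to its binary value (0-99). */
uint8_t bcd_to_binary(uint8_t bcd) {
    return (uint8_t)((bcd >> 4) * 10 + (bcd & 0x0F));
}

/* Convert a binary value (0-99) to packed BCD. */
uint8_t binary_to_bcd(uint8_t value) {
    return (uint8_t)(((value / 10) << 4) | (value % 10));
}
```

Decimal 42 is therefore stored as 42H in packed BCD, whereas its plain binary encoding is 2AH.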


Byte-Sized Data - unsigned and signed integers (8-bit unsigned binary and two's complement)

Word-Sized Data（16-bit）

little-endian format - least significant byte always stored in the lowest-numbered memory location
- Intel
big-endian format - Numbers are stored with the lowest location containing the most significant data
- Motorola family
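The two byte orders can be made concrete with a small sketch that reports which byte of a 16-bit word lands at each memory offset. The function names are hypothetical, and `offset` is assumed to be 0 or 1.

```c
#include <stdint.h>

/* Byte stored at memory offset 0 or 1 when a 16-bit word is written
   in little-endian order (least significant byte at the lowest address). */
uint8_t le16_byte_at(uint16_t word, unsigned offset) {
    return (uint8_t)(word >> (8 * offset));
}

/* Same question for big-endian order (most significant byte first). */
uint8_t be16_byte_at(uint16_t word, unsigned offset) {
    return (uint8_t)(word >> (8 * (1 - offset)));
}
```

For the word 1234H, little-endian memory holds 34H then 12H, and big-endian memory holds 12H then 34H.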

Doubleword-Sized Data（32-bit）

product after a multiplication
dividend before a division
Define using the assembler directive define doubleword(s), or DD

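Real Numbers - floating point

4-byte - single precision
8-byte - double precision
the exponent biases are 0x7F and 0x3FF respectively
special values
- zero - every bit except the sign bit is 0
- Infinity - exponent all 1s, fraction all 0s
- NaN - exponent all 1s, fraction not all 0s

subnormal number - a nonzero value smaller in magnitude than the smallest normal number (exponent field all 0s)
subnormal = \((-1)^{sign} \times 0.fraction \times 2^{1-bias}\) (keeps the representable values continuous down to zero)
operating on subnormals can slow performance drastically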
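The field layout can be checked with a short sketch that pulls apart a single-precision value. It assumes `float` is the IEEE 754 32-bit format (true on x86); the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float without violating aliasing rules. */
static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

unsigned float_sign(float f)     { return float_bits(f) >> 31; }
unsigned float_exponent(float f) { return (float_bits(f) >> 23) & 0xFF; } /* biased by 0x7F */
uint32_t float_fraction(float f) { return float_bits(f) & 0x7FFFFF; }

/* Subnormal: exponent field all 0s with a nonzero fraction. */
int float_is_subnormal(float f) {
    return float_exponent(f) == 0 && float_fraction(f) != 0;
}
```

The value 1.0f has a biased exponent of 0x7F (true exponent 0) and a zero fraction.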

Floating-point addition:

align exponents (shift the operand with the smaller exponent) -> add -> round
precision can be lost
Herbie rewrites floating point expressions to make floating point arithmetic more accurate
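A small sketch of the precision loss: once the larger operand has used all 24 significand bits, a small addend is rounded away entirely. It assumes IEEE 754 single precision with the default round-to-nearest mode; the function name is hypothetical.

```c
/* Returns 1 when adding `small` to `big` leaves `big` unchanged,
   i.e. the addend was lost entirely during alignment and rounding. */
int addition_is_absorbed(float big, float small) {
    /* volatile forces the sum to be rounded to float precision. */
    volatile float sum = big + small;
    return sum == big;
}
```

With big = 2^24 = 16777216, adding 1.0f is absorbed (neighbouring floats are 2 apart at that magnitude), while adding 2.0f is not.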

AI training has led to additional floating-point formats (such as bfloat16)

2–1 INTERNAL MICROPROCESSOR ARCHITECTURE

On the 8086, only the registers that instructions can name are visible to the program

IA-32 register overview

General-Purpose Registers

RAX

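64-bit RAX, 32-bit EAX, 16-bit AX, 8-bit AH/AL
partial register - a smaller register is part of a larger one (for compatibility across generations)
- for example, EAX is the low 32 bits of the 64-bit RAX
ADD with an immediate operand has a short form for EAX that is 1 byte shorter than for other registers - higher code density and more cache-friendly
shorter encodings let more instructions fit in the cache, which improves performance (for example, 32-bit operands do not need the REX.W prefix that 64-bit operands require)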

RBX - addressable as RBX, EBX, BX, BH, BL

(base index) holds offset address

RCX - (count) holds the count for various instructions

RDX - (data) holds a part of the result from a multiplication or part of dividend before a division

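The registers below have no H/L byte forms in the legacy register set (64-bit mode adds low-byte names such as SIL and DIL, but no high-byte ones)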

RBP - as RBP, EBP, or BP (base pointer)

RDI - (destination index) address string destination data for the string instructions

RSI - (source index) addresses source string data for the string instructions

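A special property

MOV to RAX - writes all 64 bits

MOV to EAX - the upper 32 bits are cleared automatically (zero-extended)

MOV to a smaller partial register (AX, AL, AH) - the remaining bits are not cleared

Some microarchitectures are unable to rename a partial register (renaming is what out-of-order execution relies on)

So avoid writing partial registers where possible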
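The write rules above can be modelled in plain C. This is a software model of the register semantics in 64-bit mode, not real register access, and the function names are hypothetical.

```c
#include <stdint.h>

/* Writing the 32-bit part (e.g. EAX): upper 32 bits are zeroed. */
uint64_t write_r32(uint64_t reg, uint32_t value) {
    (void)reg;                       /* the old contents do not survive */
    return (uint64_t)value;
}

/* Writing the 16-bit part (e.g. AX): bits 63..16 are preserved. */
uint64_t write_r16(uint64_t reg, uint16_t value) {
    return (reg & ~0xFFFFULL) | value;
}

/* Writing the low byte (e.g. AL): bits 63..8 are preserved. */
uint64_t write_r8_low(uint64_t reg, uint8_t value) {
    return (reg & ~0xFFULL) | value;
}

/* Writing the high byte (e.g. AH): only bits 15..8 change. */
uint64_t write_r8_high(uint64_t reg, uint8_t value) {
    return (reg & ~0xFF00ULL) | ((uint64_t)value << 8);
}
```

Because the 16-bit and 8-bit writes must merge with the old value, they carry a dependency on the previous register contents, which is what makes them awkward to rename.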

Special Registers

segment registers include CS, DS, ES, SS, FS, and GS

RIP - (instruction pointer) addresses the next instruction in a section of memory

RSP - (stack pointer) addresses an area of memory called the stack

RFLAGS

Status Flags

condition & control operation
C flag - carry after addition or borrow after subtraction
Z flag - the result of an arithmetic or logic operation is zero
S flag - arithmetic sign of the result after an arithmetic or logic instruction executes
O flag - overflow (occurs when signed numbers are added or subtracted)

P flag - set when the least-significant byte of the result has an even number of 1s
A flag - records the carry from the low BCD digit to the high one (bit 3 to bit 4), which supports direct BCD arithmetic
D flag - selects increment or decrement mode for the DI and/or SI registers, i.e. whether string instructions move toward higher or lower addresses
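To see how the status flags follow from a result, here is a sketch that computes C, Z, S, O, P and A for an 8-bit addition. It is a software model with hypothetical names, not a read of the real flags register.

```c
#include <stdint.h>

typedef struct {
    uint8_t result;
    int cf, zf, sf, of, pf, af;
} AddFlags;

/* Model of an 8-bit ADD and the status flags it would produce. */
AddFlags add8(uint8_t a, uint8_t b) {
    AddFlags f;
    unsigned wide = (unsigned)a + b;

    f.result = (uint8_t)wide;
    f.cf = wide > 0xFF;                          /* carry out of bit 7 */
    f.zf = f.result == 0;
    f.sf = (f.result >> 7) & 1;                  /* copy of the sign bit */
    /* Signed overflow: operands share a sign that the result does not. */
    f.of = ((~(a ^ b) & (a ^ f.result)) >> 7) & 1;
    f.af = (((a & 0x0F) + (b & 0x0F)) >> 4) & 1; /* carry from bit 3 to bit 4 */

    /* Parity: fold the byte down to one bit; PF = 1 for an even count of 1s. */
    uint8_t p = f.result;
    p ^= p >> 4;
    p ^= p >> 2;
    p ^= p >> 1;
    f.pf = !(p & 1);
    return f;
}
```

For example, 7FH + 01H gives 80H with O = 1 and C = 0 (signed overflow without unsigned carry), while FFH + 01H gives 00H with C = 1, Z = 1 and O = 0.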


System Flags

T (trap) - The trap flag enables trapping through an on-chip debugging feature.
I (interrupt) - controls operation of the INTR (interrupt request) input pin.
VM (virtual mode) flag bit selects virtual mode operation in a protected mode system

The remaining flags are probably not important

Segment Registers

CS (code)

DS (data)

ES (extra) - used by some instructions to hold destination data

SS (stack)

FS and GS

two additional segment registers
for OS-related functionality

64-bit mode - ignore DS, ES, and SS ( assumes they have a base address of 0)


System Registers

Control Register - control system operation

Descriptor-Table Registers - hold segmentation data structures used in protected mode

Task-Register - hold state information for a given task

2–2 Modes of Operation

Long mode & 64-bit mode

Overview (from AMD64)


Long mode

Intel calls it IA-32e ("e" for "extensions"); it is an extension of legacy protected mode.
2 submodes
- 64-bit mode - supports the full 64-bit architecture
- compatibility mode - binary compatible with existing 16-bit and 32-bit applications
Long mode does not support legacy real mode or legacy virtual-8086 mode

Legacy mode consists of three submodes

Protected mode
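- supports 16-bit and 32-bit programs
- provides memory segmentation, optional paging, and privilege checking
- a program can access up to 4 GB of memory
Virtual-8086 mode
- runs 16-bit real-mode programs
- up to 1 MB of memory
Real mode
- 16-bit programs
- segmentation
- 1 MB of memory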

System management mode (SMM)

an operating mode designed for system-control activities
for platform firmware & low-level device drivers

Switching between operating modes

This part is a bit hard; I doubt I can memorize it

Memory Management Requirements

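Relocation - code runs correctly wherever it is placed in memory
Protection - each program's memory is isolated from the others
Sharing - several programs can access the same region of memory

Segmentation and paging as memory-management schemes

Segmentation - segments may differ in size
- each segment has its own base address and limit
- external fragmentation
Paging - every page has a fixed size (typically 4 KB)
- internal fragmentation
- translation is more complex and slower than segmentation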

2–3 REAL MODE MEMORY ADDRESSING

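Real-mode operation can address only the first 1 MB of memory
The first 1 MB is also called real memory, conventional memory, or the DOS memory system

Segments and Offsets

segment address - the starting address of a 64 KB segment
offset address (Effective Address) - the offset within that 64 KB segment
Linear address - the segment address (16 bits) shifted left by 4 bits, plus the offset address (16 bits)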
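The shift-and-add rule can be sketched directly. The function names are hypothetical; the second helper assumes a machine with only 20 address lines (as on the 8086), where the sum wraps around past 1 MB.

```c
#include <stdint.h>

/* Real-mode address: segment shifted left 4 bits, plus the offset. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset) {
    return ((uint32_t)segment << 4) + offset;
}

/* Assumption: only 20 address lines exist, so bit 20 is dropped
   and addresses above FFFFFH wrap around to the bottom of memory. */
uint32_t real_mode_linear_20bit(uint16_t segment, uint16_t offset) {
    return real_mode_linear(segment, offset) & 0xFFFFF;
}
```

For example, 1234H:0010H maps to 12350H, and FFFFH:0010H maps to 100000H, which wraps to 00000H when only 20 address lines are present.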

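With paging enabled, the linear address is treated as a virtual address and translated to a physical one

A program with four segments

Address wraparound problem

I don't fully understand this; it is probably not a key point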

2–4 INTRO TO PROTECTED MODE MEMORY ADDRESSING

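Segment (coarse-grained)

Protected mode

selector - stored in a segment register; used to locate a descriptor in a descriptor table
descriptor - contains the segment's base address, limit, and access rights
- global descriptor - the first GDT entry is required to be the null descriptor (used to initialize unused segment registers)
- local descriptor
- gate descriptor - provides a code selector and an entry point
descriptor tables
- global (system) descriptor table (GDT) - holds descriptors available to all programs (required)
- local (application) descriptor table (LDT) - holds descriptors used by a single program (optional)
- interrupt-descriptor table (IDT) - holds gate descriptors (required)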
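How a selector locates a descriptor can be sketched by decoding its three fields: a 13-bit index (bits 15..3), a table indicator TI (bit 2, 0 = GDT, 1 = LDT) and the RPL (bits 1..0). The function names are hypothetical.

```c
#include <stdint.h>

/* Bits 15..3: index of the descriptor within the table. */
unsigned selector_index(uint16_t selector) { return selector >> 3; }

/* Bit 2: table indicator, 0 = GDT, 1 = LDT. */
unsigned selector_ti(uint16_t selector) { return (selector >> 2) & 1; }

/* Bits 1..0: requested privilege level. */
unsigned selector_rpl(uint16_t selector) { return selector & 3; }

/* Byte offset of the descriptor inside its table (8 bytes per entry). */
uint32_t descriptor_offset(uint16_t selector) {
    return (uint32_t)selector_index(selector) * 8;
}
```

The selector 002BH, for example, picks entry 5 of the GDT with RPL 3, which sits 40 bytes into the table.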


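How many threads can run in protected mode?

Solution: the GDT is at most 64 KB and each entry is 8 B, so it holds 64 KB / 8 B = 8192 descriptors; each program needs at least two entries (a data segment and a code segment)

So at most 64K / 16 = 4K = 4096 threads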
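The count can be reproduced as a tiny sketch. The constants follow the figures in the solution (the 64 KB bound comes from the 13-bit selector index, 2^13 entries of 8 bytes); the names are hypothetical, and the null descriptor is ignored as in the notes.

```c
enum {
    GDT_BYTES = 64 * 1024,       /* 2^13 entries * 8 bytes */
    DESCRIPTOR_BYTES = 8,
    DESCRIPTORS_PER_PROGRAM = 2  /* one code segment + one data segment */
};

/* Number of descriptors a full-size GDT can hold. */
unsigned gdt_entries(void) {
    return GDT_BYTES / DESCRIPTOR_BYTES;
}

/* Upper bound on programs when each needs two GDT entries. */
unsigned max_programs(void) {
    return gdt_entries() / DESCRIPTORS_PER_PROGRAM;
}
```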


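When an address access goes out of bounds, a general protection exception (#GP) occurs

G bit (granularity bit)

G = 0 - ending = starting + limit, with the limit counted in bytes
G = 1 - ending = starting + limit * 4 KB, with the limit counted in 4 KB units (the low 12 bits are filled with FFFH)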
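The effect of the G bit on a 20-bit limit can be sketched as below. With G = 1 the limit is shifted left by 12 bits and the low 12 bits are treated as 1s, so the segment ends on the last byte of a 4 KB block. The function names are hypothetical, and expand-up segments are assumed.

```c
#include <stdint.h>

/* Last valid offset in the segment for a 20-bit limit field. */
uint32_t effective_limit(uint32_t limit20, int g) {
    limit20 &= 0xFFFFF;                       /* the field is 20 bits wide */
    return g ? ((limit20 << 12) | 0xFFF)      /* 4 KB granularity */
             : limit20;                       /* byte granularity */
}

/* Ending address = starting address + effective limit. */
uint32_t segment_end(uint32_t base, uint32_t limit20, int g) {
    return base + effective_limit(limit20, g);
}
```

With base 10000000H and limit 001FFH, the segment ends at 100001FFH for G = 0 and at 101FFFFFH for G = 1.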

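Descriptor access rights

Gate descriptors allow controlled access to code beyond the caller's own privilege level

0 is the highest privilege level

DPL

The three privilege-level fields of protected mode

DPL (descriptor privilege level) - set by the operating system (the object)
RPL (requested privilege level) - the program (the subject)
CPL (current privilege level) - the CPU (the subject)

Checks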
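For a data-segment access, the check can be sketched as a comparison of the three levels: the access is allowed only when the descriptor's DPL is numerically greater than or equal to both CPL and RPL. This sketch covers data segments only (code segments and gates follow different rules), and the function name is hypothetical.

```c
/* Data-segment privilege check: allowed when max(CPL, RPL) <= DPL.
   Smaller numbers mean more privilege (0 is the highest). */
int data_access_allowed(unsigned cpl, unsigned rpl, unsigned dpl) {
    unsigned effective = cpl > rpl ? cpl : rpl;   /* the weaker of the two */
    return effective <= dpl;
}
```

A ring-3 program cannot load a DPL 0 data segment even if it forges RPL = 0, because its CPL of 3 still exceeds the DPL.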


Memory Addressing

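selector (index) -> descriptor (global/local, eight bytes) -> privilege check -> base address + offset (linear address)

Invisible Registers

GDTR, IDTR - hold the base addresses of the GDT and IDT; set up during the mode switch
LDTR - loaded with a selector that is looked up through the GDT
- LDTR is switched on every task switch to keep task memory isolated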

Protected Mode Segmented-Memory Models

Multi-segmented memory model
- each segment is separate, with its own permissions and address range
Flat-memory model

2-5 Memory Paging

Fine-grained

The four kinds of memory address and how they are translated

effective addresses, or segment offsets = Base + (Scale * Index) + Displacement (protected mode)
Logical addresses = Segment Selector : Offset
Linear (virtual) addresses = Segment Base Address + Effective Address
Physical addresses = the linear address translated through the page tables
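The first two steps of that chain can be sketched as plain arithmetic on 32-bit values. The function names are hypothetical; `scale` is assumed to be 1, 2, 4 or 8, and results wrap modulo 2^32.

```c
#include <stdint.h>

/* Effective address = Base + (Scale * Index) + Displacement. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           unsigned scale, int32_t displacement) {
    return base + index * scale + (uint32_t)displacement;
}

/* Linear address = Segment Base Address + Effective Address. */
uint32_t linear_address(uint32_t segment_base, uint32_t effective) {
    return segment_base + effective;
}
```

For an operand such as [base + index*8 - 16] with base = 1000H and index = 4, the effective address is 1010H; with a segment base of 00400000H the linear address is 00401010H.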

Setting this aside; four courses seem to have covered it already, so it could not be more familiar

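multi-level paging method

Each level's table must fit in a single 4 KB page

12 bits for the page offset (one page is 4 KB)
52 bits for the virtual page number (64 - 12)
each PTE is 4 B - one page holds 2^10 PTEs, so each level resolves 10 bits
ceil(52 / 10) = 6 levels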
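The level count can be computed for any combination of address width, page size and entry size. This is a sketch with hypothetical names; sizes are assumed to be powers of two.

```c
/* floor(log2(x)) for x >= 1. */
static unsigned log2u(unsigned x) {
    unsigned n = 0;
    while (x >>= 1) {
        n++;
    }
    return n;
}

/* Number of page-table levels when every table fills exactly one page. */
unsigned paging_levels(unsigned va_bits, unsigned page_bytes, unsigned pte_bytes) {
    unsigned offset_bits = log2u(page_bytes);              /* 12 for 4 KB */
    unsigned index_bits  = log2u(page_bytes / pte_bytes);  /* bits resolved per level */
    unsigned vpn_bits    = va_bits - offset_bits;          /* virtual page number */
    return (vpn_bits + index_bits - 1) / index_bits;       /* ceiling division */
}
```

The exercise's parameters (64-bit address, 4 KB pages, 4 B entries) give 6 levels; 32-bit addresses with 4 B entries give 2, and 48-bit addresses with 8 B entries give 4.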

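How paging meets the memory-management requirements

Relocation - an application's addresses can be mapped to any physical address
Protection -
Sharing - different virtual addresses can map to the same physical address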

Page Size Extension

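4 MB large pages are available
When CR4.PSE is 1, a page-directory entry with its PS bit set maps a 4 MB page (the low 22 bits of the linear address are the offset)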
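The difference between the two page sizes shows up in how a 32-bit linear address is split: 10 + 10 + 12 bits for a 4 KB page, and 10 + 22 bits for a 4 MB page. A minimal sketch with hypothetical function names:

```c
#include <stdint.h>

/* Bits 31..22: index into the page directory (both page sizes). */
unsigned pde_index(uint32_t linear) { return linear >> 22; }

/* Bits 21..12: index into the page table (4 KB pages only). */
unsigned pte_index(uint32_t linear) { return (linear >> 12) & 0x3FF; }

/* Bits 11..0: offset inside a 4 KB page. */
uint32_t offset_4k(uint32_t linear) { return linear & 0xFFF; }

/* Bits 21..0: offset inside a 4 MB page (no page-table lookup). */
uint32_t offset_4m(uint32_t linear) { return linear & 0x3FFFFF; }
```

For the linear address 12345678H, the directory index is 48H; a 4 KB mapping then uses table index 345H and offset 678H, while a 4 MB mapping uses offset 345678H directly.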

Physical Address Extensions(PAE)

Self-Referencing - a page-table entry points back to the table itself rather than to a second-level page table (not on the exam)

Total Meltdown