Assembly and Interfacing
Reference - Intel instruction set manual
MASM assembly
Introduction
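Different compiler options can produce different results
Undefined behavior
- -O0 - no optimization
- -O1 - optimizes a away and directly returns 1
- When a = INT_MAX, the two builds return 0 and 1 respectively
Different compilers produce different results
- gcc -O3 - vectorization (SIMD) with unrolling
- icc -O3 - loop unrolling, pipelined parallelism
- clang -O3 - converts the summation loop into Gauss's closed-form formula (idiom recognition)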
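A minimal sketch of what the idiom-recognition rewrite amounts to, assuming the loop in question sums the integers 1..n (the original loop is not shown in these notes, so both function names are hypothetical):

```python
def sum_loop(n: int) -> int:
    """Sum 1..n with an explicit loop, as the source code would spell it."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_closed_form(n: int) -> int:
    """Gauss's closed form n(n+1)/2, which an optimizer may substitute for the loop."""
    return n * (n + 1) // 2
```

Both functions return the same value for every n >= 0, but the second does constant work regardless of n.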
Uses of assembly language
- Analyzing problems in the final generated code
- Extreme optimization
Uses of interfaces
- Communication
- Control
Chap.1-2 The microprocessor and its architecture
1–1 A HISTORICAL BACKGROUND
Several "firsts" in the history of computing
First programmable computer - ENIAC, 1946
First bug - 1947
First to develop a system that accepts instructions and stores them in memory - John von Neumann
0/1 machine code -> assembly language (readable and usable by humans)
First high-level programming language - FLOWMATIC (1957, by Grace Hopper)
First successful, widespread programming language for business applications - COBOL
First microprocessor - Intel 4004 (1971), 4-bit-wide memory locations
Modern Microprocessor
- 8086 and 8088 - 16-bit predecessors of the IA-32 (Intel Architecture 32) family; 16-bit data width, 20-bit addressing
- segmentation
- Intel 286 - protected mode, 16-bit
- Four privilege levels
- Segment limit checking
- Read-only and execute-only segment options
- Intel 386 - pipelining, paging, virtual-8086 mode, 32-bit
- The low 16 bits of each register still act as the previous generation's registers, preserving backward compatibility
- Intel 486 - L1 cache, FPU (floating-point unit)
- Pentium - two pipelines for superscalar execution, MESI protocol (supports write-back caches), branch prediction
- MMX technology, SSE instruction set, NetBurst microarchitecture, multi-core, out-of-order execution
Hyper-Threading Technology
- Several logical processors share one core; when one stalls, another runs
Multi-core
Intel 64 (64-bit = IA-32e)
- extends the linear address space for software to 64 bits
- supports a physical address space of up to 52 bits
IA-32e mode
- Compatibility mode - runs 32-bit programs unmodified
- 64-bit mode - runs programs that access the 64-bit address space
1–2 COMPUTER DATA FORMATS
ASCII code
- 7-bit code; the eighth (most significant) bit holds parity
- printer - the most significant bit is 0 for alphanumeric printing, 1 for graphics
- PC - the extended ASCII character set is selected by placing 1 in the leftmost bit (it includes some foreign letters, mathematical symbols, etc.)
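As a rough illustration of the parity use of the eighth bit, here is a sketch that assumes even parity (the notes do not say which parity convention is meant; `with_even_parity` is a hypothetical helper):

```python
def with_even_parity(ch: str) -> int:
    """Return the 8-bit code for a 7-bit ASCII character, with bit 7 set
    so that the total number of 1 bits is even (assumed convention)."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    parity = bin(code).count("1") % 2  # 1 if the 7-bit code has an odd number of 1s
    return code | (parity << 7)
```

For example, 'A' (41H) already has an even number of 1 bits and stays 41H, while 'C' (43H) becomes C3H.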
Unicode
- 16-bit
- 0000H–00FFH - same as ASCII
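- the remaining codes are used for the characters of various national scripts
BCD code
- each digit ranges from 0000 to 1001, for decimal 0–9
- used in user-facing programs that need only simple arithmetic
- packed BCD form & unpacked BCD form (returned from a keypad or keyboard)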
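The two BCD forms can be sketched as follows. The byte order (most significant digit first) and the helper names are assumptions made for illustration only:

```python
def to_unpacked_bcd(n: int) -> bytes:
    """Unpacked BCD: one decimal digit per byte."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    return bytes(int(d) for d in str(n))


def to_packed_bcd(n: int) -> bytes:
    """Packed BCD: two decimal digits per byte, high nibble first."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    digits = str(n)
    if len(digits) % 2:
        digits = "0" + digits  # pad to an even number of digits
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )
```

With these helpers, 1234 is stored as 01 02 03 04 in unpacked form and as 12H 34H in packed form.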
Byte-Sized Data - unsigned and signed integers (8-bit binary and two's-complement numbers)
Word-Sized Data (16-bit)
- little-endian format - least significant byte always stored in the lowest-numbered memory location
- Intel
- big-endian format - Numbers are stored with the lowest location containing the most significant data
- Motorola family
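A quick way to see the two byte orders is Python's standard `struct` module; the value 1234H is just an example:

```python
import struct

value = 0x1234

# "<H" packs an unsigned 16-bit word little-endian: low byte at the lowest address.
little = struct.pack("<H", value)

# ">H" packs the same word big-endian: high byte at the lowest address.
big = struct.pack(">H", value)

# Reading the bytes back with the matching order recovers the original word.
round_trip = struct.unpack("<H", little)[0]
```

Here `little` holds the bytes 34H 12H and `big` holds 12H 34H.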
Doubleword-Sized Data (32-bit)
- product after a multiplication
- dividend before a division
- Define using the assembler directive define doubleword(s), or DD
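Real Numbers - floating point
- 4-byte - single precision
- 8-byte - double precision
- the exponent biases are 0x7F and 0x3FF respectively
- Special values
- zero - all bits other than the sign bit are 0
- Infinity - exponent all 1s, fraction all 0s
- NaN - exponent all 1s, fraction not all 0s
- subnormal number - exponent all 0s, fraction nonzero; represents values too small in magnitude for the normalized format
- subnormal = \((-1)^{sign} \times 0.fraction \times 2^{1-bias}\) (keeps the representable values continuous near zero)
- performance drops sharply when subnormals are involved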
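The special-value rules above can be checked by looking at the raw bits of a single-precision number. This is a sketch only; `classify_float32` and `subnormal_value` are hypothetical names, and the field widths (8 exponent bits, 23 fraction bits, bias 127) are those of the 4-byte format:

```python
import struct


def classify_float32(x: float) -> str:
    """Classify x after rounding it to single precision."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        return "infinity" if fraction == 0 else "nan"
    if exponent == 0:
        return "zero" if fraction == 0 else "subnormal"
    return "normal"


def subnormal_value(sign: int, fraction_bits: int) -> float:
    """Value of a single-precision subnormal: (-1)^sign * 0.fraction * 2^(1-bias)."""
    bias = 127
    return (-1) ** sign * (fraction_bits / 2**23) * 2.0 ** (1 - bias)
```

The smallest positive subnormal has only the lowest fraction bit set, which the formula evaluates to 2^-149.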
Floating-point addition:
- align exponents -> add -> round (down to the precision the format can hold)
- precision is lost
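The precision loss is easy to observe with ordinary double-precision values. The numbers below are just convenient examples, and `math.fsum` is used only as a more accurate reference summation:

```python
import math

# After alignment, 1.0 falls below the last significand bit of 1e16 and is rounded away.
absorbed = (1e16 + 1.0) == 1e16

# Because each addition rounds, addition is not associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

# Naive left-to-right summation loses the 1.0; fsum tracks the lost low-order bits.
naive = (1e16 + 1.0) - 1e16
accurate = math.fsum([1e16, 1.0, -1e16])
```

Here `absorbed` is true, `left` and `right` differ in the last bit, `naive` is 0.0, and `accurate` is 1.0.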
- Herbie rewrites floating point expressions to make floating point arithmetic more accurate
AI training has given rise to additional floating-point formats
2–1 INTERNAL MICROPROCESSOR ARCHITECTURE
On the 8086, only the registers that instructions specify are program-visible
IA-32 register overview
General-Purpose Registers
RAX
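- 64-bit RAX, 32-bit EAX, 16-bit AX, 8-bit AH/AL
- partial register - a smaller register is part of a larger one (for compatibility across generations)
- e.g. the low 32 bits of RAX (64-bit) are EAX
- ADD with an immediate to EAX has an encoding 1 byte shorter than for other registers - higher code density and more cache-friendly
- Using smaller registers can shorten instructions, so more of them fit in the cache, improving performance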
RBX - addressable as RBX, EBX, BX, BH, BL
- (base index) holds offset address
RCX - (count) holds the count for various instructions
RDX - (data) holds a part of the result from a multiplication or part of dividend before a division
The registers that follow have no H/L addressing forms
RBP - as RBP, EBP, or BP (base pointer)
RDI - (destination index) address string destination data for the string instructions
RSI - (source index) addresses source string data for the string instructions
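A special property
MOV to RAX - writes all 64 bits
MOV to EAX - the upper 32 bits are automatically zeroed (for compatibility)
MOV to a smaller partial register (AX, AH, AL) - the upper bits are not cleared
Some microarchitectures are unable to rename a partial register (renaming is what enables out-of-order execution)
So avoid using partial registers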
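The write rules above can be modelled on a plain 64-bit integer. This is only a model of the architectural result, not of how hardware implements it, and the `write_*` helper names are hypothetical:

```python
MASK64 = (1 << 64) - 1


def write_eax(rax: int, value: int) -> int:
    """A 32-bit write zero-extends: the upper 32 bits of RAX become 0."""
    return value & 0xFFFFFFFF


def write_ax(rax: int, value: int) -> int:
    """A 16-bit write leaves bits 16..63 of RAX unchanged."""
    return (rax & ~0xFFFF & MASK64) | (value & 0xFFFF)


def write_al(rax: int, value: int) -> int:
    """An 8-bit write to AL leaves bits 8..63 unchanged."""
    return (rax & ~0xFF & MASK64) | (value & 0xFF)


def write_ah(rax: int, value: int) -> int:
    """An 8-bit write to AH replaces only bits 8..15."""
    return (rax & ~0xFF00 & MASK64) | ((value & 0xFF) << 8)
```

Because a write to AX, AL, or AH keeps part of the old value, the new register value depends on the previous one, which is the dependency that makes partial registers awkward to rename.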
Special Registers
segment registers include CS, DS, ES, SS, FS, and GS
RIP - (instruction pointer) addresses the next instruction in a section of memory
RSP - (stack pointer) addresses an area of memory called the stack
RFLAGS
Status Flags
- condition & control operation
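- C (carry) - carry after addition or borrow after subtraction
- Z (zero) - the result of an arithmetic or logic operation is zero
- S (sign) - arithmetic sign of the result after an arithmetic or logic instruction executes
- O (overflow) - occurs when signed numbers are added or subtracted
- P (parity) - set when the least-significant byte of the result has an even number of 1s
- A (auxiliary carry) - records the carry from the ones digit into the tens digit in BCD (bit 3 to bit 4), supporting direct BCD arithmetic
- D (direction) - selects increment or decrement mode for the DI and/or SI registers, i.e. whether string operations move toward higher or lower addresses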
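To make the status flags concrete, here is a sketch that computes them for an 8-bit addition. The function name and the dictionary layout are made up for illustration; the flag definitions follow the list above:

```python
def add8_flags(a: int, b: int):
    """Add two 8-bit values and return (result, flags) as an 8-bit ADD would set them."""
    a &= 0xFF
    b &= 0xFF
    full = a + b
    result = full & 0xFF
    flags = {
        "C": full > 0xFF,                                   # carry out of bit 7
        "Z": result == 0,                                   # result is zero
        "S": bool(result & 0x80),                           # sign bit of the result
        "O": bool(~(a ^ b) & (a ^ result) & 0x80),          # same-sign operands, different-sign result
        "P": bin(result).count("1") % 2 == 0,               # even number of 1s in the low byte
        "A": ((a & 0xF) + (b & 0xF)) > 0xF,                 # carry from bit 3 into bit 4
    }
    return result, flags
```

For example, 7FH + 01H gives 80H with O and S set (signed overflow), while FFH + 01H gives 00H with C and Z set.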
System Flags
- T (trap) - The trap flag enables trapping through an on-chip debugging feature.
- I (interrupt) - controls operation of the INTR (interrupt request) input pin.
- VM (virtual mode) flag bit selects virtual mode operation in a protected mode system
The rest are probably not important
Segment Registers
CS (code)
DS (data)
ES (extra) - used by some instructions to hold destination data
SS (stack)
FS and GS
- two additional memory segments
- for OS-related functionality
64-bit mode - ignores DS, ES, and SS (assumes they have a base address of 0)
System Registers
Control Register - control system operation
Descriptor-Table Registers - hold segmentation data structures used in protected mode
Task-Register - hold state information for a given task
2–2 Modes of Operation
Long mode & 64-bit mode
Overview (from AMD64)
Long mode
- Intel calls it IA-32e ("e" for "extensions"); it is an extension of legacy protected mode.
- 2 submodes
- 64-bit mode - supports the 64-bit architecture
- compatibility mode - binary compatibility with 16/32-bit applications
- Long mode does not support legacy real mode or legacy virtual-8086 mode
Legacy mode consists of three submodes
- Protected mode
- supports 16/32-bit programs
- has memory segmentation, optional paging, and privilege checking
- a program can access at most 4 GB of memory
- Virtual-8086 mode
- 16-bit real-mode programs
- can access at most 1 MB
- Real mode
- 16-bit programs
- segmentation
- 1 MB of memory
System management mode (SMM)
- an operating mode designed for system-control activities
- for platform firmware & low-level device drivers
Switching between operating modes
This one is rather hard; I doubt I can memorize it
Memory Management Requirements
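- Relocation - code runs no matter where it is placed
- Protection - each program's memory is isolated from the others
- Sharing - multiple programs access the same region of memory
Memory management by segmentation and paging
- Segmentation - segments may differ in size
- each segment has its own base address and limit
- external fragmentation
- Paging - every page has a fixed size (typically 4 KB)
- internal fragmentation
- translation is more complex and slower than segmentation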
2–3 REAL MODE MEMORY ADDRESSING
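- Real-mode operation can address only the first 1 MB of memory
- The first 1 MB is also called real memory, conventional memory, or the DOS memory system
Segments and Offsets
- segment address - the starting address of a 64 KB segment
- offset address (Effective Address) - the offset within the 64 KB segment
- Linear address - the segment address (16 bits) shifted left by 4 bits, plus the offset address (16 bits)
Linear address -(with paging enabled)-> virtual address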
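The real-mode calculation can be sketched as below. The `address_bits` parameter is a hypothetical knob added here to show one reading of the wrap-around issue: with only 20 address lines, a sum above FFFFFH wraps back to low memory.

```python
def real_mode_address(segment: int, offset: int, address_bits: int = 20) -> int:
    """Real-mode linear address: segment shifted left 4 bits plus offset,
    truncated to the number of address lines (20 on the 8086)."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    return linear & ((1 << address_bits) - 1)
```

For example, 1000H:2000H gives 12000H. FFFFH:0010H sums to 100000H, which wraps to 00000H with 20 address bits but stays 100000H if a 21st bit is available.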
A program with four segments
The address wrap-around problem
I don't really understand this; it is probably not a key point
2–4 INTRO TO PROTECTED MODE MEMORY ADDRESSING
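Segment (coarse-grained)
Protected mode
- selector - stored in a segment register; used to locate a descriptor in a descriptor table
- descriptor - contains the segment's base address, limit, and access rights
- global descriptors - the first one must be the null descriptor (used to initialize unused segment registers)
- local descriptors
- gate descriptors - provide a code selector and an entry point
- descriptor tables
- global (system) descriptor table (GDT) - holds descriptors available to all programs (required)
- local (application) descriptor table (LDT) - holds descriptors used by a single program (optional)
- interrupt-descriptor table (IDT) - stores gate descriptors (required)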
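In protected mode, how many threads can run?
Solution: the GDT is at most 64 KB and each entry is 8 B; a program needs at least two entries (a data segment and a code segment)
So at most 64K / 16 = 4K = 4096 threads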
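The arithmetic behind that estimate, written out. The constant names are made up; the figures are the ones given in the solution above:

```python
GDT_BYTES = 64 * 1024          # maximum GDT size (the table limit is a 16-bit field)
DESCRIPTOR_BYTES = 8           # size of one descriptor
DESCRIPTORS_PER_PROGRAM = 2    # one code segment + one data segment

total_descriptors = GDT_BYTES // DESCRIPTOR_BYTES
max_programs = total_descriptors // DESCRIPTORS_PER_PROGRAM
```

This gives 8192 descriptors and 4096 programs. It is an upper bound: the null descriptor at index 0 also occupies a slot, which the estimate ignores.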
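When an address access goes out of bounds, a general protection exception (#GP) occurs
G bit (granularity bit)
- G = 0 - ending = starting + Limit (in bytes)
- G = 1 - ending = starting + Limit × 4K + FFFH (Limit counts 4 KB units; the low 12 bits are filled with 1s)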
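A small sketch of the ending-address rule, assuming a 20-bit limit field and the usual convention that with G = 1 the low 12 bits of the scaled limit are all 1s (the notes' "Limit × 4K" is the simplified form); `segment_end` is a hypothetical helper:

```python
def segment_end(base: int, limit: int, g: int) -> int:
    """Last valid address of a segment given its base, 20-bit limit, and G bit."""
    limit &= 0xFFFFF
    if g:
        effective_limit = (limit << 12) | 0xFFF  # limit counted in 4 KB units
    else:
        effective_limit = limit                  # limit counted in bytes
    return base + effective_limit
```

With base 10000000H and limit 001FFH, the segment ends at 100001FFH when G = 0 and at 101FFFFFH when G = 1. A limit of FFFFFH with G = 1 covers the full 4 GB.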
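Descriptor access rights
Gate descriptors allow controlled access across privilege levels
0 is the highest privilege
- DPL
Three privilege-level fields in protected mode
- DPL (descriptor privilege level) - operating system (the object)
- RPL (requested privilege level) - program (the subject)
- CPL (current privilege level) - CPU (the subject)
Checks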
Memory Addressing
- selector (index) -> descriptor (global/local, 8 bytes) -> protection checks -> base address + offset address (linear address)
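The first step of that chain, decoding a selector and locating its descriptor, can be sketched as follows. The helper names are hypothetical; the bit layout (13-bit index, 1 table-indicator bit, 2-bit RPL) is the standard selector format:

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit segment selector into its three fields."""
    selector &= 0xFFFF
    return {
        "index": selector >> 3,                      # which descriptor in the table
        "table": "LDT" if selector & 0x4 else "GDT", # table indicator (TI) bit
        "rpl": selector & 0x3,                       # requested privilege level
    }


def descriptor_address(table_base: int, selector: int) -> int:
    """Address of the 8-byte descriptor the selector refers to."""
    return table_base + ((selector & 0xFFFF) >> 3) * 8
```

For example, selector 0008H names entry 1 of the GDT at privilege level 0, and 001FH names entry 3 of the LDT with RPL 3.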
Invisible Register
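- GDTR, IDTR - hold the starting addresses of the GDT and IDT; set during the mode switch
- LDTR - set through the GDT (located via GDTR)
- on a task switch, LDTR is switched to keep memory isolated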
Protected Mode Segmented-Memory Models
- Multi-segmented memory model
- each segment is separate, with its own access rights and address space
- Flat-memory model
2–5 Memory Paging
Fine-grained
Four types of memory address and how they convert
- effective addresses, or segment offsets = Base + (Scale*Index) + Displacement (protected mode)
- Logical addresses = Segment Selector : Offset
- Linear (virtual) addresses = Segment Base Address + Effective Address
- Physical addresses = the linear address translated through the page tables
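The first two conversions are plain arithmetic and can be sketched directly. A 32-bit address width is assumed, and the function names are hypothetical:

```python
def effective_address(base: int, index: int, scale: int, displacement: int) -> int:
    """Effective address = Base + (Scale * Index) + Displacement, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + scale * index + displacement) & 0xFFFFFFFF


def linear_address(segment_base: int, effective: int) -> int:
    """Linear address = segment base + effective address, wrapped to 32 bits."""
    return (segment_base + effective) & 0xFFFFFFFF
```

For example, base 1000H, index 10H, scale 4, displacement 8 gives effective address 1048H; with a segment base of 00400000H the linear address is 00401048H.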
Setting this aside; it feels like four courses have already covered it, so it could not be more familiar
multi-level paging method
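Ensure that each level's index table fits in one 4 KB page
- 12 bits for the page offset (a page is 4 KB)
- 52 bits for the page number (resolved through PTEs)
- each PTE is 4 B - one page holds 2^10 PTEs
- ⌈52 / 10⌉ = 6 levels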
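The level count can be recomputed step by step. The 64-bit virtual address width is an assumption inferred from 12 + 52, and the constant names are made up. Note that the numbers follow the exercise in these notes; real x86-64 page tables use 8-byte entries.

```python
PAGE_BYTES = 4096       # one page, also the size of one table at each level
PTE_BYTES = 4           # entry size used in this exercise
VIRTUAL_BITS = 64       # assumed virtual address width

offset_bits = PAGE_BYTES.bit_length() - 1                     # log2(4096)
entries_per_table = PAGE_BYTES // PTE_BYTES                   # entries that fit in one page
index_bits_per_level = entries_per_table.bit_length() - 1     # log2(entries per table)
page_number_bits = VIRTUAL_BITS - offset_bits

# Ceiling division: a partially used top level still needs its own table.
levels = -(-page_number_bits // index_bits_per_level)
```

This reproduces the figures above: 12 offset bits, 10 index bits per level, 52 page-number bits, and 6 levels.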
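How paging meets the memory management requirements
- Relocation - an application's addresses can be placed at any physical address
- Protection -
- Sharing - different virtual addresses can map to the same physical address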
Page Size Extension
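- 4 MB large pages are available
- when the flag (CR4.PSE) is 1, an address can be interpreted as a large page (for page-directory entries with the PS bit set): the low 22 bits are all offset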
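How the same 32-bit linear address splits under the two page sizes, as a sketch (the helper names are hypothetical; the 10/10/12 split is the classic two-level 32-bit layout):

```python
def split_4kb(linear: int):
    """4 KB pages: 10-bit directory index, 10-bit table index, 12-bit offset."""
    linear &= 0xFFFFFFFF
    return linear >> 22, (linear >> 12) & 0x3FF, linear & 0xFFF


def split_4mb(linear: int):
    """4 MB pages: 10-bit directory index, 22-bit offset (no second-level table)."""
    linear &= 0xFFFFFFFF
    return linear >> 22, linear & 0x3FFFFF
```

For the address 00C12345H, the 4 KB split is directory 3, table 12H, offset 345H, while the 4 MB split is directory 3, offset 12345H.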
Physical Address Extensions(PAE)
Self-Referencing - a PTE points to itself rather than to a second-level page table (not on the exam)
Total Meltdown