Computer Architecture
- Final exam 40%
- Homework 6%
- Presentation 6%
- Labs 48%
- Forwarding+Pipeline - 8%
- Interrupt exception - 8%
- Branch prediction - 8%
- Cache design - 10%
- Out-of-order execution - 14%
Key topics
- Memory hierarchy
- Out-of-order execution algorithms
Chapter 1: Overview
1.1 Introduction
Von Neumann architecture
Big Names in Architecture
- Mateo Valero - out-of-order, parallel, large-scale processors
- John LeRoy Hennessy - MIPS
- David Andrew Patterson - RISC-I, RISC-V
- Brooks
- ……
Classification of computers
- Desktop computers
- Servers
- Embedded computers
- Personal smart devices
- Supercomputers
Flynn's taxonomy
- SISD - single instruction stream, single data stream
- SIMD - single instruction stream, multiple data streams
- MISD - multiple instruction streams, single data stream
- MIMD - multiple instruction streams, multiple data streams
Understanding performance
- Algorithm - determines the number of operations
- Programming language, compiler, architecture - determine how many machine instructions each operation becomes
- Processor and memory system - determine how fast instructions are executed
- I/O system (including the OS) - determines how fast I/O operations are performed
Based on how data is used, computers are developing in three fields:
- Speed up processing (parallelism)
- Speed up transmission (accuracy)
- Increase storage capacity and speed up storage (reliability)
The central issues discussed and studied in this course are
- Processing
- Storage
- Transmission
1.2 Performance
Performance metrics
- Latency (Response Time) - how long it takes to complete one task
- Throughput (bandwidth) - how much work can be done per unit of time
Definition
"X is n times faster than Y" - X's Performance/Y's Performance = n
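Restated as a formula (with performance defined as the reciprocal of execution time):
\[ \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n \]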
Measuring execution time
- Elapsed time - Total response time, including all aspects (Processing, I/O, OS overhead, idle time)
- CPU time - Time spent processing a given job
- User CPU time : CPU time spent in the program itself
- System CPU time: CPU time spent in the OS, performing tasks on behalf of the program
The goal of improving the architecture - better performance
1.3 Technology trend
- Hardware keeps getting faster and cheaper
A major turn in computer architecture
- From instruction-level parallelism to developing thread-level parallelism and data-level parallelism.
- Computer architecture plays an important role in the development of computers.
1.4 Quantitative approaches
CPU performance formula
Average cycles per instruction (CPI)
Hard to measure for multicore processors
Amdahl's Law - make the common case fast
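The standard formulas behind this subsection (a restatement of the usual definitions, not anything course-specific):
\[ \text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \]
Amdahl's Law, for a fraction \(f\) of the execution time that is sped up by a factor \(s\):
\[ \text{Speedup}_{overall} = \frac{1}{(1-f) + f/s} \]
For example, speeding up 80% of the time by 4x gives \(1/(0.2 + 0.8/4) = 2.5\), which is why making the common case fast pays off most.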
1.5 GREAT ARCHITECTURE IDEAS
- Moore's law
- Abstraction to simplify design
- Make the common case fast
- Parallelism
- Pipelining
- Prediction
- Memory hierarchy
- Dependability via redundancy
1.6 ISA
Instruction Set Design Issues
- Where are operands stored?
- registers, memory, stack, accumulator
- How many explicit operands are there? (Classification of ISAs)
- How is the operand location specified? (Addressing Modes)
- register, immediate, indirect, . . .
- What type & size of operands are supported? (Data Representation)
- byte, int, float, double, string, vector. . .
- What operations are supported? (Types of Instructions)
- add, sub, mul, move, compare . . .
Instruction Set design principles:
- Compatibility
- Versatility - supports a variety of functions
- High efficiency
- Security
The Four ISA Design Principles
- Simplicity favors regularity
- Smaller is faster - use a small set of registers rather than main memory
- Make the common case fast
- Good design demands good compromises - e.g. a long jump (beyond a small constant)
GPR Classification
Registers are classified by their purpose
Chapter 2: Pipeline
2.1 What is pipelining
How to speed up
- Shorten the execution time of each instruction
- Reduce the execution time of a sequence of instructions - pipelining
Pipelining
- Definition - the next instruction starts executing before the previous one has finished
- Core idea - overlapped execution (overlap)
- Works best with a large number of independent instructions (e.g., a vector dot product)
- Pass time and empty time are hard to avoid
- Pass time: the time from when the first task enters the pipeline until it finishes
- Empty time: the time from when the last task enters the pipeline until its result is produced
- The longest stage is the bottleneck
- Each stage needs an intermediate register (latch)
Three execution modes
- Sequential execution
- Single overlapping execution - \(T = (2n+1)\Delta t\)
- Twice overlapping execution - \(T = (n+2)\Delta t\)
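These formulas assume every instruction goes through three stages (fetch, decode, execute) of equal length \(\Delta t\); under the same assumption, sequential execution takes \(T = 3n\,\Delta t\). Comparing the three modes for \(n\) instructions:
\[ T_{seq} = 3n\,\Delta t \;>\; T_{single} = (2n+1)\,\Delta t \;>\; T_{twice} = (n+2)\,\Delta t \]
so for large \(n\), twice-overlapping execution approaches one instruction completed per \(\Delta t\).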
2.2 Classification of pipelines
- Single-function - a pipeline with only one fixed function
- Multi-function - the stages can be connected in different ways
- Static - only one function can be performed at a time
- Dynamic - different functions can be performed at the same time
- Classification by granularity
- Component-level pipelining (intra-component, operation pipelining): logic and arithmetic operations are divided into different stages, so several kinds of operations can be carried out
- Processor-level pipelining (inter-component, instruction pipelining): each instruction is divided into sub-processes executed by different function units
- Inter-processor pipelining (macro pipelining): several processors work on the same task
- Linear & Nonlinear - whether there is a feedback loop
- In-order & Out-of-order
- Scalar & Vector
2.3 Performance evaluation of pipelining
- Throughput (TP) = number of instructions completed / total time
\(T = (m+n-1)\times \Delta t_0\)  (m pipeline stages, n instructions, stage time \(\Delta t_0\))
\(TP = n/((m+n-1)\times \Delta t_0)\)
If \(n \gg m\), \(TP \to TP_{max} = 1/\Delta t_0\)
Otherwise \(TP = \frac{n}{m+n-1}TP_{max}\)
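A small worked example, assuming \(m = 5\) stages and \(n = 100\) instructions:
\[ TP = \frac{100}{(5+100-1)\,\Delta t_0} = \frac{100}{104\,\Delta t_0} \approx 0.96\,TP_{max} \]
i.e., once the pipeline is full, the fill and drain overhead costs only a few percent of peak throughput.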
2.4 Hazards of Pipelining
Dependences
- Data dependence - an instruction needs the result of an earlier instruction (see the C sketch after this list)
- Causes RAW hazards - a source register of an instruction is the result computed by an earlier instruction
- Name dependences
- Anti-dependence - the destination register of an instruction is the same as a source register of an earlier instruction; the later instruction's write may happen before the earlier instruction's read (WAR)
- Output dependence - two instructions write their results to the same destination register; the earlier instruction's write may overwrite the later instruction's write (WAW)
- Control dependences
- Cause control hazards
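A minimal C sketch of the three kinds of dependence (the variables and values are an assumed example; the hardware sees the same relations on registers):

```c
#include <stdio.h>

int main(void) {
    int b = 1, c = 2, e = 3, f = 4;
    int a, d;
    a = b + c;   /* I1: writes a                                        */
    d = a + 1;   /* I2: RAW (true data dependence) - reads I1's result  */
    a = e + f;   /* I3: WAR with I2 (I3 writes the a that I2 reads) and
                  *     WAW with I1 (both write a) - name dependences,
                  *     removable by register renaming                  */
    printf("a=%d d=%d\n", a, d);
    return 0;
}
```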
Hazard - a situation that prevents the next instruction from executing in the next cycle
All of them can be resolved by stalling
- Structural hazards - a required resource is busy
- e.g., the memory is needed at the same time for instruction fetch and for data access
- Solution: separate instruction and data memories/caches
- Data hazards - need to wait for a previous instruction to complete its data read/write
- An instruction needs the result of an earlier instruction
- Solution: forwarding (bypassing)
- The load-use stall cannot be avoided even with forwarding
- Code Scheduling to Avoid Stalls:
- Reorder independent instructions so that no instruction uses a loaded value in the cycle right after the load (sketched below)
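A sketch of the classic code-scheduling example (after Patterson & Hennessy; the variable names and the instruction orders in the comments are illustrative, not from the course):

```c
#include <stdio.h>

int main(void) {
    int mem[5] = {1, 2, 3, 0, 0};
    int b = mem[0], e = mem[1], f = mem[2];
    /* Naive order:     lw e ; add a = b + e ; lw f ; add c = b + f
     *   -> each add uses a value loaded by the previous instruction,
     *      so even with forwarding there is one load-use stall per add.
     * Scheduled order:  lw e ; lw f ; add a = b + e ; add c = b + f
     *   -> no add directly follows the load it depends on: zero stalls. */
    int a = b + e;
    int c = b + f;
    mem[3] = a;
    mem[4] = c;
    printf("a=%d c=%d\n", mem[3], mem[4]);
    return 0;
}
```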
- Control hazards - the control decision depends on a previous instruction
- One stall is needed for the branch itself
- If the comparison operand comes from the instruction just before the branch, one extra stall is needed (two if that instruction is a load)
2.5 Dynamic Branch Prediction
- Branch History Table (BHT)
- Branch-Target Buffer (BTB) - Cache of target addresses
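A BHT entry is typically a saturating counter. Below is a minimal C sketch of a 2-bit-counter BHT (the table size, PC indexing, and the test loop are assumptions, not the course's exact scheme): for a loop branch taken 9 times and then not taken, it mispredicts only the loop exit once warmed up, whereas a 1-bit scheme would also mispredict the first iteration of every loop.

```c
#include <stdio.h>
#include <stdbool.h>

#define BHT_BITS 10
#define BHT_SIZE (1u << BHT_BITS)

/* 2-bit saturating counters: 0,1 = predict not taken; 2,3 = predict taken */
static unsigned char bht[BHT_SIZE];

static bool predict(unsigned pc) {
    return bht[(pc >> 2) & (BHT_SIZE - 1)] >= 2;
}

static void update(unsigned pc, bool taken) {
    unsigned char *c = &bht[(pc >> 2) & (BHT_SIZE - 1)];
    if (taken  && *c < 3) (*c)++;   /* saturate at 3 */
    if (!taken && *c > 0) (*c)--;   /* saturate at 0 */
}

int main(void) {
    /* A loop-closing branch: taken 9 times, then not taken, repeated. */
    unsigned pc = 0x400100, hits = 0, total = 0;
    for (int rep = 0; rep < 4; rep++)
        for (int i = 0; i < 10; i++) {
            bool taken = (i < 9);
            hits += (predict(pc) == taken);
            update(pc, taken);
            total++;
        }
    printf("correct predictions: %u / %u\n", hits, total);
    return 0;
}
```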
2.6 Schedule of Nonlinear pipelining
Did not fully understand this - will it be on the exam?
Initial conflict vector
Conflict vector
State transition graph
Circular queue
Shortest average interval
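For reference, the usual procedure is: from the reservation table, find the forbidden latencies, form the initial collision vector, build the state-transition graph with the rule new_state = (state >> latency) | C0, and search its cycles for the shortest average launch interval. The C sketch below (the reservation table and bit conventions are assumptions for illustration, not from the course) computes the initial collision vector and lists the allowed first launch latencies:

```c
#include <stdio.h>

/* Assumed reservation table: 3 stages (rows) x 5 cycles (columns);
 * table[s][t] = 1 means stage s is busy at cycle t for one task. */
#define STAGES 3
#define CYCLES 5

static const int table[STAGES][CYCLES] = {
    {1, 0, 0, 0, 1},   /* stage 1 used at cycles 0 and 4 */
    {0, 1, 0, 1, 0},   /* stage 2 used at cycles 1 and 3 */
    {0, 0, 1, 0, 0},   /* stage 3 used at cycle 2        */
};

int main(void) {
    /* A latency d is forbidden if some stage is used at two cycles d apart:
     * launching a second task d cycles later would collide on that stage. */
    unsigned forbidden = 0;   /* bit d set => latency d forbidden */
    for (int s = 0; s < STAGES; s++)
        for (int t1 = 0; t1 < CYCLES; t1++)
            for (int t2 = t1 + 1; t2 < CYCLES; t2++)
                if (table[s][t1] && table[s][t2])
                    forbidden |= 1u << (t2 - t1);

    /* The initial collision vector C0 is this set of forbidden latencies,
     * written here from latency CYCLES-1 down to 1. */
    printf("Initial collision vector (latency %d..1): ", CYCLES - 1);
    for (int d = CYCLES - 1; d >= 1; d--)
        printf("%d", (int)((forbidden >> d) & 1u));
    printf("\n");

    /* One step of the transition rule: launching at an allowed latency d
     * from state S gives the new state (S >> d) | C0. */
    unsigned C0 = forbidden;
    for (int d = 1; d < CYCLES; d++)
        if (!((C0 >> d) & 1u))
            printf("latency %d allowed, next state = 0x%X\n",
                   d, (C0 >> d) | C0);
    return 0;
}
```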