Skip to content

计算机体系结构

  • 期末 40%
  • 作业 6%
  • 展示 6%
  • 实验 48%
    • Forwarding+Pipeline - 8%
    • Interrupt exception - 8%
    • Branch prediction - 8%
    • Cache design - 10%
    • Out-of-order execution - 14%

重点

  • 存储层级
  • 乱序算法

Chapter 1: Overview

1.1 Introduction

冯诺伊曼结构

alt text

Big Men in Architecture

  • Mater Valero - 乱序、并行、大规模处理器
  • John Leroy Hennessy - Mips、Risc-I
  • David Andrew Patterson - RISC-V
  • Brooks
  • ……

计算机分类

  • 桌面计算机
  • 服务器
  • 嵌入式
  • 个人智能设备
  • 超级计算机

Flynn 分类

  • SISD - 单指令流,单数据流
  • SIMD - 单指令,多数据
  • MISD - 多、单
  • MIMD - 多、多

alt text


理解性能

  • 算法 - 决定操作数量
  • 编程语言、编译器、架构 - 决定每条操作会转换成几条机器指令
  • 处理器和内存系统 - 决定指令执行快慢
  • I/O 系统(包括 OS)- 决定 I/O 操作多快

According to the process of using data, computers are developing in three fields:

  • Speed up processing (parallel)
  • Speed up transmission (accuracy)
  • Increase storage capacity and speed up storage (reliability)

The central issue discussed and studied in this course is

  • Processing
  • Storage
  • transmission

1.2 Performance

性能指标

  • Latency (Response Time) - 执行一个任务需要多久
  • Throughput (bandwidth) - 单位时间内能做多少工作

定义

\[ Performance = \frac{1}{Execution Time} \]

“X is n time faster than Y” - X's Performance/Y's Performance = n

衡量执行时间

  • Elapsed time - Total response time, including all aspects (Processing, I/O, OS overhead, idle time)
  • CPU time - Time spent processing a given job
    • User CPU time : CPU time spent in the program itself
    • System CPU time: CPU time spent in the OS, performing tasks on behalf of the program

改进 architecture 的目的 - 提高 performance


1.3 Technology trend

  • 硬件性能越来越好,越来越便宜

A major turn in computer architecture

  • From instruction level parallelism to development thread level parallelism and data level parallelism.
  • The computer architecture plays an important role in the development of computer.

1.4 Quantitative approaches

CPU performance formula

\[ CPU\ Execution\ Time=CPU\ Clock\ Cycles\times Clock\ Period=\frac{CPU\ Clock\ Cycles}{Clock\ Rate} \]

Average cycles per instruction(CPI)

\[ CPI=\frac{CPU\ Clock\ Cycles}{Instruction\ Count} \]

多核不太好衡量

Amdahl's Law - make the common case fast

\[ T_{improve}=\frac{T_{affected}}{improvement\ factor}+T_{unaffected} \]

1.5 GREAT ARCHITECTURE IDEAS

  • 摩尔定律
  • 抽象设计
  • common case fast
  • 并行
  • 流水线
  • 预测
  • 存储层级
  • 冗余增加可靠性

1.6 ISA

Instruction Set Design Issues

  • Where are operands stored?
    • registers, memory, stack, accumulator
  • How many explicit operands are there? (Classification of ISAs)
  • How is the operand location specified? (Addressing Modes)
    • register, immediate, indirect, . . .
  • What type & size of operands are supported? (Data Representation)
    • byte, int, float, double, string, vector. . .
  • What operations are supported? (Types of Instructions)
    • add, sub, mul, move, compare . . .

Instruction Set 设计原则:

  • Compatibility - 兼容性
  • Versatility - 有多种功能
  • High efficiency
  • Security

The Four ISA Design Principles

  • 简单、规则
  • smaller is faster - 用小的 reg 而不是主存
  • make the common case fast
  • 兼容 - e.g. a long jump(beyond a small constant)

GPR Classification

寄存器按照用途分类

alt text

Chapter 2: Pipeline

2.1 What is pipelining

如何提速

  • 缩短每条指令的执行时间
  • 减少一段程序的执行时间 - pipeline

pipeline

  • 定义 - 在一条指令执行完之前,另一条指令就开始执行
  • 核心 - 重叠执行(overlap)
  • 适用情况 - 大量无关指令(如向量积)
  • pass time 和 empty time 很难避免
    • Pass time: the time for the first task from beginning (entering the pipelining) to ending.
    • Empty time: the time for the last task from entering the pipelining to having the result
  • longest section - the bottleneck
  • 每个阶段需要中间寄存器 (latch)

三种执行模式

  • Sequential execution
  • Single overlapping execution - 𝑇 =(2𝑛+1)∆t alt text
  • Twice overlapping execution - 𝑇 =(𝑛+2)∆t alt text

2.2 流水线的分类

  • 单功能 - only one fixed function pipelining.
  • 多功能 - 可以按不同的方式连接

alt text

  • 静态 - 同一时间只能执行一种功能
  • 动态 - 可以同时执行不同功能

alt text

  • 粒度分类
    • Component level pipelining (in component- operation pipelining): 逻辑运算和算数运算分成了不同的段,因此可以执行多种运算
    • Processor level pipelining (inter component- instruction pipelining): 每条指令被分成不同的子进程,由不同的 function unit 执行
    • Inter processor pipelining (inter processor- macro pipelining): 多个处理器完成同一 task
  • Linear & Nonlinear - 有无 feedback loop
  • 顺序和乱序
  • Scalar & Vector

2.3 Performance evaluation of pipelining

  • Throughput(TP) = 指令条数/总时间

\(T = (m+n-1)\times \Delta t_0\)

\(TP = n/((m+n-1)\times \Delta t_0)\)

\(if\ n>>m,\ TP_{max} = 1/\Delta t_0\)

否则 \(TP = \frac{n}{m+n-1}TP_{max}\)


2.4 Hazards of Pipelining

Dependence

  • Data Dependence - 需要前面指令的执行结果
    • 导致 RAW 冲突 - 一条指令的源寄存器来自于它前面的某条指令计算的结果
  • Name Dependences alt text
    • Anti-dependence - 一条指令的目的寄存器和它前面的某条指令的源寄存器是一样的,后一条指令的写可能先于前一条指令的读(WAR)
    • Output-dependence - 两条指令都将结果写到同一个目的寄存器中,前一条指令的写可能覆盖了后一条的写(WAW)
  • Control Dependences alt text
    • 造成 control hazard

Hazard - 阻碍下一条指令在下一周期执行的情况

都能用 stall 解决

  • structure hazards - A required resource is busy
    • 需要同时使用 Memory 获取指令和 data
    • 分开两个 memory/cache
  • data hazard - Need to wait for previous instruction to complete its data read/write
    • 需要前面指令的执行结果 alt text
    • 解决方法:前递
    • load-stall 无法避免
    • Code Scheduling to Avoid Stalls: alt text
  • control hazard - Deciding on control action depends on previous instruction
    • 需要一次 stall alt text
    • 如果需要前面指令的结果作为比较操作数,跳转指令前需要 stall 一次/两次(load) alt text

2.5 Dynamic Branch Prediction

  • Branch History Table (BHT) alt text
  • Branch-Target Buffer (BTB) - Cache of target addresses

2.6 Schedule of Nonlinear pipelining

没懂,考吗?

Initial conflict vector

Conflict vector

State transition graph

Circular queue

Shortest average interval