Chapter 3 Memory Hierarchy
By taking advantage of the principle of locality:
- Present the user with as much memory as is available in the cheapest technology.
- Provide access at the speed offered by the fastest technology.
3.1 Introduction
Cache
Covered this fairly well in Computer Organization; just a quick review here.
Definition: Cache - a safe place for hiding or storing things. (Usually built from SRAM.)
- The first level of storage an address reaches after leaving the CPU
- Reuses frequently accessed items by keeping them in the cache
- Cache hit/miss - When the processor can/cannot find a requested data item in the cache
- Block/Line - a fixed-size chunk of data brought from memory into the cache
- Cache Locality
- Temporal locality - need the requested word again soon
- Spatial locality - likely need other data in the block soon
- Cache miss time depends on
- Latency: the time to retrieve the first word of the block
- Bandwidth: the time to retrieve the rest of this block
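The hit/miss and locality ideas above can be sketched with a toy model. The geometry (8 lines, 16-byte blocks) is an assumption for illustration, not a real design:

```python
# Minimal direct-mapped cache model (assumed parameters: 8 lines, 16-byte blocks)
# to illustrate cache hits/misses and spatial locality.
class DirectMappedCache:
    def __init__(self, num_lines=8, block_size=16):
        self.num_lines = num_lines
        self.block_size = block_size
        self.tags = [None] * num_lines          # tag stored per line; None = invalid

    def access(self, addr):
        block = addr // self.block_size         # which memory block the address falls in
        index = block % self.num_lines          # line selected by the index bits
        tag = block // self.num_lines           # remaining high-order bits
        if self.tags[index] == tag:
            return "hit"
        self.tags[index] = tag                  # miss: fetch the whole block into the line
        return "miss"

cache = DirectMappedCache()
# Sequential byte accesses: one miss per 16-byte block, hits for the rest
# (spatial locality pays for the block fetch).
results = [cache.access(a) for a in range(32)]
print(results.count("miss"))  # 2: only addresses 0 and 16 start new blocks
```

Re-accessing address 0 afterwards still hits (temporal locality), since its block remains in line 0.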
From GPT, two concepts I keep confusing: SRAM vs. DRAM
SRAM vs. DRAM
1. Speed
- SRAM (Static RAM) is faster than DRAM (Dynamic RAM).
- SRAM stores data in flip-flops; each cell is built from several transistors, holds its value stably, and needs no periodic refresh. Reads and writes are therefore fast, with access times typically in the 1–2 ns range.
- DRAM stores data as charge on a capacitor, which leaks over time, so cells must be refreshed periodically. Because of the capacitor's characteristics, DRAM access times are longer, typically 10–20 ns.
2. Power consumption
- SRAM's power consumption is relatively high: each cell needs more transistors, leading to higher static power.
- DRAM's power consumption is lower: each cell is just one transistor plus one capacitor, and charging and discharging the capacitor consumes less power than SRAM's more complex circuitry.
3. Place in the memory hierarchy
- SRAM is typically used for caches. Thanks to its speed and low latency, SRAM is widely used for a processor's L1, L2, and L3 caches.
- DRAM is typically used for main memory. Its relatively low cost and high density make DRAM the main memory technology in most computer systems.
Secondary storage: hard disks (HDD) or solid-state drives (SSD); slowest, but largest capacity.
3.2 Technology Trend and Memory Hierarchy
Cache Designers
Block Placement
Direct mapped
Fully-associative
Set-associative
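The three placement schemes differ only in how many lines an address may map to, which shows up in how the address splits into tag/index/offset. A sketch with assumed geometry (64-byte blocks, 64 lines in total):

```python
# Sketch: splitting an address into (tag, index, offset) under the three
# placement schemes. Assumed geometry: 64-byte blocks, 64 lines total.
def split_address(addr, block_size=64, num_lines=64, ways=1):
    # ways=1: direct mapped; 1 < ways < num_lines: set-associative;
    # ways=num_lines: fully associative (a single set).
    num_sets = num_lines // ways
    offset = addr % block_size
    block = addr // block_size
    index = block % num_sets          # which set the block may live in
    tag = block // num_sets           # identifies the block within that set
    return tag, index, offset

addr = 0x1234ABCD
print(split_address(addr, ways=1))    # direct mapped: 64 sets
print(split_address(addr, ways=4))    # 4-way set-associative: 16 sets
print(split_address(addr, ways=64))   # fully associative: index is always 0
```

Note how increasing associativity moves bits from the index field into the tag field; in the fully associative case the index disappears entirely.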
Block Identification
Write strategy
- Write back
- Write through
Write miss
- Write allocate (usually paired with write back) - fetch the block from main memory into the cache, then write the cache
- Write around / no-write allocate (usually paired with write through) - write directly to main memory
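The practical difference between the two pairings is memory traffic. A toy sketch (single block, illustrative only):

```python
# Toy sketch contrasting memory traffic for the two common pairings:
# write-through (+ no-write allocate) vs. write-back (+ write allocate),
# for a burst of stores that all land in the same cached block.
def memory_writes(num_stores, policy):
    if policy == "write-through":
        return num_stores   # every store is propagated to main memory immediately
    if policy == "write-back":
        return 1            # dirty block is written back once, on eviction
    raise ValueError(policy)

print(memory_writes(10, "write-through"))  # 10 memory writes
print(memory_writes(10, "write-back"))     # 1 write-back on eviction
```

Write-through keeps memory consistent at all times (simpler, helps with coherence); write-back trades that for far less bandwidth on write-heavy, high-locality workloads.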
Metrics
Average memory access time: AMAT = hit time + miss rate × miss penalty
Strategies for improving efficiency
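For a multilevel cache the recurrence AMAT = hit time + miss rate × miss penalty nests: the lower level's AMAT is the upper level's miss penalty. The latencies and miss rates below are illustrative assumptions, not measured values:

```python
# Sketch: average memory access time for a two-level cache hierarchy.
# All numbers (cycles, miss rates) are made-up illustrative values.
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

l2_penalty = amat(hit_time=10, miss_rate=0.05, miss_penalty=100)   # L2 misses go to DRAM
total = amat(hit_time=1, miss_rate=0.10, miss_penalty=l2_penalty)  # L1 sees L2's AMAT as its penalty
print(total)  # 1 + 0.10 * (10 + 0.05 * 100) = 2.5 cycles
```

This is why the four strategy groups below attack hit time, miss rate, and miss penalty separately: each term contributes independently to AMAT.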
- Reduce the miss penalty: multilevel caches, critical word first, read miss before write miss, merging write buffers, and victim caches
- Reduce the miss rate: larger block size, larger cache size, higher associativity, way prediction and pseudo-associativity, and compiler optimizations
- Reduce the miss penalty and miss rate via parallelism: non-blocking caches, hardware prefetching, and compiler prefetching
- Reduce the time to hit in the cache: small and simple caches, avoiding address translation, pipelined cache access, and trace caches
Virtual Memory
An abstraction layer provided by the OS
- Unified cache
- Split cache - separate instruction (I) cache and data (D) cache
In reality
- a program uses non-contiguous memory
- the backing storage may be main memory plus disk
The program believes it has
- contiguous memory
- more physical storage than actually exists
TLB - Translation lookaside buffer
- a special cache
- TLB entry (fully associative cache)
- tag: virtual page number of the virtual address;
- data: a physical page frame number, protection field, valid bit, use bit, dirty bit;
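The tag/data split above can be sketched as a lookup table. The page size (4 KiB) and the entry contents are illustrative assumptions:

```python
# Sketch of a fully associative TLB lookup (assumed 4 KiB pages).
# Each entry maps a virtual page number (the tag) to a physical frame
# number plus status bits (valid/dirty; protection bits omitted).
PAGE_SIZE = 4096

tlb = {  # virtual page number -> (frame number, valid bit, dirty bit)
    0x12345: (0x00042, True, False),
}

def translate(vaddr):
    vpn, offset = vaddr // PAGE_SIZE, vaddr % PAGE_SIZE
    entry = tlb.get(vpn)
    if entry and entry[1]:                 # TLB hit with valid bit set
        frame = entry[0]
        return frame * PAGE_SIZE + offset  # physical address keeps the page offset
    return None                            # TLB miss: walk the page table (not shown)

print(hex(translate(0x12345ABC)))  # 0x42abc: frame 0x42, offset 0xABC
```

The page offset passes through unchanged; only the page number is translated, which is why the TLB can be small yet cover many bytes.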
Page size
- Large pages - smaller page table, faster translation
- Small pages - less wasted space (internal fragmentation), more flexible
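The page-table side of the trade-off is easy to quantify. A sketch assuming a flat single-level table over a 32-bit address space with 4-byte entries (real systems use multi-level tables):

```python
# Sketch: page-table size as a function of page size, assuming a flat
# single-level table over a 32-bit virtual address space, 4-byte entries.
def flat_page_table_bytes(page_size, va_bits=32, pte_bytes=4):
    num_pages = 2 ** va_bits // page_size
    return num_pages * pte_bytes

print(flat_page_table_bytes(4 * 1024))     # 4 KiB pages -> 4 MiB table
print(flat_page_table_bytes(4 * 1024**2))  # 4 MiB pages -> 4 KiB table
```

Growing the page by 1024× shrinks the table by 1024×, at the cost of wasting up to one page per allocation (internal fragmentation).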
Process Protection
- keys & locks - a program cannot access data unless it holds the matching key
Virtual Machine
- VMM/hypervisor
Excerpt: pseudo-associative cache
A true set-associative cache tests all the possible ways simultaneously, using something like a content-addressable memory. A pseudo-associative cache tests each possible way one at a time. A hash-rehash cache and a column-associative cache are examples of a pseudo-associative cache.
In the common case of finding a hit in the first way tested, a pseudo-associative cache is as fast as a direct-mapped cache, but it has a much lower conflict miss rate than a direct-mapped cache, closer to the miss rate of a fully associative cache.
Source: Kozyrakis, C. "Lecture 3: Advanced Caching Techniques" (PDF). Archived from the original (PDF) on September 7, 2012.
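The sequential-probe idea in the excerpt can be sketched as a hash-rehash lookup. The rehash function (flipping the top index bit) and the fill policy are assumptions for illustration:

```python
# Sketch of a hash-rehash (pseudo-associative) cache: probe the primary
# line first; on a miss there, probe one alternative line before memory.
class PseudoAssociativeCache:
    def __init__(self, num_lines=8):
        self.num_lines = num_lines
        self.tags = [None] * num_lines

    def access(self, block):
        primary = block % self.num_lines
        rehash = primary ^ (self.num_lines // 2)  # assumed rehash: flip top index bit
        if self.tags[primary] == block:
            return "fast hit"                     # as fast as direct mapped
        if self.tags[rehash] == block:
            return "slow hit"                     # second probe costs extra time
        # Miss: new block takes the primary line; the displaced block
        # moves to the rehash line instead of being evicted outright.
        self.tags[primary], self.tags[rehash] = block, self.tags[primary]
        return "miss"

cache = PseudoAssociativeCache()
cache.access(0)
cache.access(8)               # blocks 0 and 8 conflict on the same primary line
print(cache.access(0))        # "slow hit": block 0 survives in the rehash line
```

In a direct-mapped cache the third access would be a conflict miss; here it is merely a slower hit, which is the lower-conflict-miss-rate behavior the excerpt describes.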