Chapter 3 Memory Hierarchy
By taking advantage of the principle of locality:
- Present the user with as much memory as is available in the cheapest technology.
- Provide access at the speed offered by the fastest technology.
3.1 Introduction
Cache
Covered this fairly well in Computer Organization; just a quick review here.
Definition: Cache - a safe place for hiding or storing things. (Usually built from SRAM.)
- The first level of storage an address reaches after leaving the CPU
- Reuses frequently accessed items by keeping them in the cache
- Cache hit/miss - When the processor can/cannot find a requested data item in the cache
- Block/Line - a fixed-size chunk of data brought from memory into the cache
- Cache Locality
- Temporal locality - need the requested word again soon
- Spatial locality - likely need other data in the block soon
- Cache miss time depends on
- Latency: the time to retrieve the first word of the block
- Bandwidth: the time to retrieve the rest of this block
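The hit/miss and locality ideas above can be sketched with a toy model. The geometry (8 lines, 16-byte blocks) is an assumption for illustration, not a real design:

```python
# Minimal direct-mapped cache model (assumed parameters: 8 lines, 16-byte blocks)
# to illustrate cache hits/misses and spatial locality.
class DirectMappedCache:
    def __init__(self, num_lines=8, block_size=16):
        self.num_lines = num_lines
        self.block_size = block_size
        self.tags = [None] * num_lines          # tag stored per line; None = invalid

    def access(self, addr):
        block = addr // self.block_size         # which memory block the address falls in
        index = block % self.num_lines          # line selected by the index bits
        tag = block // self.num_lines           # remaining high-order bits
        if self.tags[index] == tag:
            return "hit"
        self.tags[index] = tag                  # miss: fetch the whole block into the line
        return "miss"

cache = DirectMappedCache()
# Sequential byte accesses: one miss per 16-byte block, hits for the rest
# (spatial locality pays for the block fetch).
results = [cache.access(a) for a in range(32)]
print(results.count("miss"))  # 2: only addresses 0 and 16 start new blocks
```

Re-accessing address 0 afterwards still hits (temporal locality), since its block remains in line 0.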
From GPT, two concepts I keep confusing: SRAM vs. DRAM
SRAM vs. DRAM
1. Speed
- SRAM (Static RAM) is faster than DRAM (Dynamic RAM).
- SRAM stores data in flip-flops; each cell is built from several transistors, holds its value stably, and needs no periodic refresh. Reads and writes are therefore fast, with access times typically in the 1–2 ns range.
- DRAM stores data as charge on a capacitor, which leaks over time, so cells must be refreshed periodically. Because of the capacitor's characteristics, DRAM access times are longer, typically 10–20 ns.
2. Power consumption
- SRAM's power consumption is relatively high: each cell needs more transistors, leading to higher static power.
- DRAM's power consumption is lower: each cell is just one transistor plus one capacitor, and charging and discharging the capacitor consumes less power than SRAM's more complex circuitry.
3. Place in the memory hierarchy
- SRAM is typically used for caches. Thanks to its speed and low latency, SRAM is widely used for a processor's L1, L2, and L3 caches.
- DRAM is typically used for main memory. Its relatively low cost and high density make DRAM the main memory technology in most computer systems.
Secondary storage: hard disks (HDD) or solid-state drives (SSD); slowest, but largest capacity.
3.2 Technology Trend and Memory Hierarchy
Cache Designers
Block Placement
Direct mapped
Fully-associative
Set-associative
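The three placement schemes differ only in how many lines an address may map to, which shows up in how the address splits into tag/index/offset. A sketch with assumed geometry (64-byte blocks, 64 lines in total):

```python
# Sketch: splitting an address into (tag, index, offset) under the three
# placement schemes. Assumed geometry: 64-byte blocks, 64 lines total.
def split_address(addr, block_size=64, num_lines=64, ways=1):
    # ways=1: direct mapped; 1 < ways < num_lines: set-associative;
    # ways=num_lines: fully associative (a single set).
    num_sets = num_lines // ways
    offset = addr % block_size
    block = addr // block_size
    index = block % num_sets          # which set the block may live in
    tag = block // num_sets           # identifies the block within that set
    return tag, index, offset

addr = 0x1234ABCD
print(split_address(addr, ways=1))    # direct mapped: 64 sets
print(split_address(addr, ways=4))    # 4-way set-associative: 16 sets
print(split_address(addr, ways=64))   # fully associative: index is always 0
```

Note how increasing associativity moves bits from the index field into the tag field; in the fully associative case the index disappears entirely.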
Block Identification
Write strategy
- Write back
- Write through
Write miss
- Write allocate (usually paired with write back) - fetch the block from main memory into the cache, then write the cache
- Write around / no-write allocate (usually paired with write through) - write directly to main memory
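The practical difference between the two pairings is memory traffic. A toy sketch (single block, illustrative only):

```python
# Toy sketch contrasting memory traffic for the two common pairings:
# write-through (+ no-write allocate) vs. write-back (+ write allocate),
# for a burst of stores that all land in the same cached block.
def memory_writes(num_stores, policy):
    if policy == "write-through":
        return num_stores   # every store is propagated to main memory immediately
    if policy == "write-back":
        return 1            # dirty block is written back once, on eviction
    raise ValueError(policy)

print(memory_writes(10, "write-through"))  # 10 memory writes
print(memory_writes(10, "write-back"))     # 1 write-back on eviction
```

Write-through keeps memory consistent at all times (simpler, helps with coherence); write-back trades that for far less bandwidth on write-heavy, high-locality workloads.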
Metrics
Average memory access time: AMAT = hit time + miss rate × miss penalty
Strategies for improving efficiency
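For a multilevel cache the recurrence AMAT = hit time + miss rate × miss penalty nests: the lower level's AMAT is the upper level's miss penalty. The latencies and miss rates below are illustrative assumptions, not measured values:

```python
# Sketch: average memory access time for a two-level cache hierarchy.
# All numbers (cycles, miss rates) are made-up illustrative values.
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

l2_penalty = amat(hit_time=10, miss_rate=0.05, miss_penalty=100)   # L2 misses go to DRAM
total = amat(hit_time=1, miss_rate=0.10, miss_penalty=l2_penalty)  # L1 sees L2's AMAT as its penalty
print(total)  # 1 + 0.10 * (10 + 0.05 * 100) = 2.5 cycles
```

This is why the four strategy groups below attack hit time, miss rate, and miss penalty separately: each term contributes independently to AMAT.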
- Reduce the miss penalty: multilevel caches, critical word first, read miss before write miss, merging write buffers, and victim caches
- Reduce the miss rate: larger block size, larger cache size, higher associativity, way prediction and pseudo-associativity, and compiler optimizations
- Reduce the miss penalty and miss rate via parallelism: non-blocking caches, hardware prefetching, and compiler prefetching
- Reduce the time to hit in the cache: small and simple caches, avoiding address translation, pipelined cache access, and trace caches
Virtual Memory
An abstraction layer provided by the OS
- Unified cache
- Split cache - separate instruction (I) cache and data (D) cache
In reality
- a program uses non-contiguous memory
- the backing storage may be main memory plus disk
The program believes it has
- contiguous memory
- more physical storage than actually exists
TLB - Translation lookaside buffer
- a special cache
- TLB entry (fully associative cache)
- tag: virtual page number of the virtual address;
- data: a physical page frame number, protection field, valid bit, use bit, dirty bit;
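The tag/data split above can be sketched as a lookup table. The page size (4 KiB) and the entry contents are illustrative assumptions:

```python
# Sketch of a fully associative TLB lookup (assumed 4 KiB pages).
# Each entry maps a virtual page number (the tag) to a physical frame
# number plus status bits (valid/dirty; protection bits omitted).
PAGE_SIZE = 4096

tlb = {  # virtual page number -> (frame number, valid bit, dirty bit)
    0x12345: (0x00042, True, False),
}

def translate(vaddr):
    vpn, offset = vaddr // PAGE_SIZE, vaddr % PAGE_SIZE
    entry = tlb.get(vpn)
    if entry and entry[1]:                 # TLB hit with valid bit set
        frame = entry[0]
        return frame * PAGE_SIZE + offset  # physical address keeps the page offset
    return None                            # TLB miss: walk the page table (not shown)

print(hex(translate(0x12345ABC)))  # 0x42abc: frame 0x42, offset 0xABC
```

The page offset passes through unchanged; only the page number is translated, which is why the TLB can be small yet cover many bytes.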
Page size
- Large pages - smaller page table, faster translation
- Small pages - less wasted space (internal fragmentation), more flexible
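The page-table side of the trade-off is easy to quantify. A sketch assuming a flat single-level table over a 32-bit address space with 4-byte entries (real systems use multi-level tables):

```python
# Sketch: page-table size as a function of page size, assuming a flat
# single-level table over a 32-bit virtual address space, 4-byte entries.
def flat_page_table_bytes(page_size, va_bits=32, pte_bytes=4):
    num_pages = 2 ** va_bits // page_size
    return num_pages * pte_bytes

print(flat_page_table_bytes(4 * 1024))     # 4 KiB pages -> 4 MiB table
print(flat_page_table_bytes(4 * 1024**2))  # 4 MiB pages -> 4 KiB table
```

Growing the page by 1024× shrinks the table by 1024×, at the cost of wasting up to one page per allocation (internal fragmentation).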
Process Protection
- keys & locks - a program cannot access data unless it holds the matching key
Virtual Machine
- VMM/hypervisor
Excerpt: pseudo-associative cache
A true set-associative cache tests all the possible ways simultaneously, using something like a content-addressable memory. A pseudo-associative cache tests each possible way one at a time. A hash-rehash cache and a column-associative cache are examples of a pseudo-associative cache.
In the common case of finding a hit in the first way tested, a pseudo-associative cache is as fast as a direct-mapped cache, but it has a much lower conflict miss rate than a direct-mapped cache, closer to the miss rate of a fully associative cache.
Source: Kozyrakis, C. "Lecture 3: Advanced Caching Techniques" (PDF). Archived from the original (PDF) on September 7, 2012.
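The sequential-probe idea in the excerpt can be sketched as a hash-rehash lookup. The rehash function (flipping the top index bit) and the fill policy are assumptions for illustration:

```python
# Sketch of a hash-rehash (pseudo-associative) cache: probe the primary
# line first; on a miss there, probe one alternative line before memory.
class PseudoAssociativeCache:
    def __init__(self, num_lines=8):
        self.num_lines = num_lines
        self.tags = [None] * num_lines

    def access(self, block):
        primary = block % self.num_lines
        rehash = primary ^ (self.num_lines // 2)  # assumed rehash: flip top index bit
        if self.tags[primary] == block:
            return "fast hit"                     # as fast as direct mapped
        if self.tags[rehash] == block:
            return "slow hit"                     # second probe costs extra time
        # Miss: new block takes the primary line; the displaced block
        # moves to the rehash line instead of being evicted outright.
        self.tags[primary], self.tags[rehash] = block, self.tags[primary]
        return "miss"

cache = PseudoAssociativeCache()
cache.access(0)
cache.access(8)               # blocks 0 and 8 conflict on the same primary line
print(cache.access(0))        # "slow hit": block 0 survives in the rehash line
```

In a direct-mapped cache the third access would be a conflict miss; here it is merely a slower hit, which is the lower-conflict-miss-rate behavior the excerpt describes.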