Chapter 13: Garbage Collection

Garbage - Allocated but no longer used storage
没有指针链抵达如 a->target，一定是垃圾
垃圾不一定不可达

Mark-and-Sweep

Mark: 深度优先搜索找到所有可达的结点，并标记
Sweep: 线性扫描 heap，把没 mark 的结点放在链表中（freelist），并把标记删除，便于下一次垃圾收集
程序申请空间从 freelist 里获得
当 freelist 清空时重新收集垃圾

复杂度分析

mark time 与可到达的 data 数 R 成正比
sweep time 与堆区大小 H 成正比
total time = c1R + c2H
每个回收结点的均摊代价 total time/(H-R)

DFS 大小可能比堆区还大

递归 -> 迭代，需要用到栈，占空间
Pointer Reversal

function DFS()
    if(x is a pointer and record x is not marked)
        t <- nil
        mark x; done[x] <- 0// x 的几个域已经被处理了
        while(true)
            i <- done[x]
            // 访问 x 的第 i 个域
            if(i < # of fields in record x)// 还有域没被处理
                y <- x.fi
                if(y is a pointer and record y is not marked)
                    // 深度优先搜索 x.fi
                    x. fi ← t; t ← x; x ← y
                    mark x; done[x] ← 0
                else
                    // 访问过的域 + 1
                    done[x] ← i + 1
            else// 所有域都被处理了
                y ← x; x ← t // 回到父节点
                if x = nil then return
                i ← done[x]
                t ← x. fi; x. fi ← y// 翻转指针
                done[x] ← i + 1

Mark-and-Sweep 算法

优点：不用移动 node 的位置，可以处理循环引用
缺点：正在执行的程序需要暂停，Leads to fragmentation in the heap，cache miss 和复杂度的内存分配

External fragmentation：程序需要大小为 n 的空间，有很多更小的空间，没有刚好的大小

Internal fragmentation：程序需要大小为 n 的空间，却分配了比 n 大很多的空间，浪费了

Reference Counts

不在内存耗尽后再收集垃圾，而是 keep track

计数为 0 时收集垃圾

不能解决循环引用
代价昂贵

Copying Collection

basic idea: 把 memory 分成两部分，用复制收集

from-space：正在被使用的一半 memory

to-space：不被使用的另一半

用完 from-spcae 后遍历，把不是垃圾的复制到 to-space，交换
没有 external fragmentation

Pointer Forwarding

p 指向 from-space，怎么让他指向 to-space

Cheney's Algorithm

BFS
局部性不好（cache miss）

Hybird Algorithm

alt text

优点
- 简单
- 复杂度低
- 连续的空闲空间
缺点
- 浪费一半空间
- 局部性不好
- 需要精确的类型

Interface to the Compiler

面向对象

Implementation

Static Method

调用 c.f()，f() 是静态方法

找到 C 类的声明
在其中找 f，如果没有找到，找 C 类的父类 B
在 B 的声明中找 f，如果没有找到，找 B 类的父类 A
……

Dynamic Method

Private Fields and Method

保护方法不被外界调用

优化

Dominator

Dominator: 从 s0 到 n 的所有路径，都经过 d，则 d 是 n 的 dominator（必经结点）
n 一定是他本身的必经结点
如果 d 是 n 的所有前驱结点的必经结点，那么 d 是 n 的前驱结点

\[ D[n] = {n} \cup (\cap_{p \in pred[n]} D[p])\ for\ n \neq s_0 \]

公式单调减少，从全局开始迭代

Immediate Dominate

Theorem
Immediate Dominate: 离 n 最近的必经结点是直接必经结点
- d 不是 n 本身
- d 是 n 的必经结点
- d 到 n 不经过别的必经结点
Dominator Tree: 把 n 和直接必经结点连起来

Loops

BackEdge: 从 n 到他的必经结点的边是回边
Natural Loop: 对于回边 n->h，h 是头节点
Nested Loop
Loop-Nest Tree

Induction Variables

i 在循环中自增/自减（e.g. i++）- basic induction variable
j 可以由 i 算出（e.g. j = i*c+d） - derived induction variable in the family of i
可以不引用 i 直接计算 j（e.g. j = j+a*c）
三元组表示
- j = a+i*b -> (i, a, b)
- k
Linear Induction Variable - 每个循环的增量一直

Loop Unrolling

考察 18.1, 18.2

dominates？
Escaping variables？