Thread
Definition
- A thread is the basic unit of execution within a process
- Each thread has its own
- thread ID
- program counter
- register set
- stack
- Shared with the other threads of the same process
- code section
- data section
- the heap (dynamically allocated memory)
- open files and signals
- A multithreaded process can do many things at the same time
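The split between per-thread and shared state can be sketched with Python's threading module. This is only an illustration: the names shared_log and worker are made up for the example, and the barrier is there just to keep all three threads alive at once so that their IDs are guaranteed to differ.

```python
import threading

shared_log = []                      # heap object, visible to every thread
log_lock = threading.Lock()
all_started = threading.Barrier(3)   # keeps the three threads alive together

def worker(label):
    local_note = f"{label} ran"      # local variable, private to this thread's stack
    all_started.wait()               # wait until all three threads exist
    with log_lock:                   # shared data needs synchronization
        shared_log.append((threading.get_ident(), local_note))

threads = [threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

thread_ids = {ident for ident, _ in shared_log}
notes = sorted(note for _, note in shared_log)
```

Every thread appends to the same list without any IPC, while each one keeps its own local_note and its own thread ID.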
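Advantages of threads
- Creating a thread is cheaper than creating a process
- A context switch between threads is cheaper than one between processes
- Threads share memory, so they do not need IPC to communicate
- Powerful (but dangerous)
- Responsiveness (the process can respond to many activities at once)
- Scalability (better use of multicore machines)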
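Disadvantages
- Weak isolation - if one thread segfaults, the whole program fails
- This motivates process-based concurrency
- Memory-constrained (all threads share a single address space)
- Memory protection does not apply between threads of the same process
Challenges of multithreading
- Data dependencies and synchronization
- Dividing tasks and data into separate activities that can be assigned to threads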
- Balancing load among threads
- Testing and debugging
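As a rough illustration of the first three challenges, the sketch below splits a list into chunks, gives each chunk to a thread, and synchronizes only the final update of the shared total. The function name parallel_sum and the equal-sized chunking policy are assumptions made for the example, not something the notes prescribe.

```python
import threading

def parallel_sum(values, num_threads):
    total = 0
    lock = threading.Lock()
    # Ceiling division so the chunks are as even as possible (simple load balancing).
    chunk = max(1, (len(values) + num_threads - 1) // num_threads)

    def work(part):
        nonlocal total
        partial = sum(part)      # independent work on this thread's own data
        with lock:               # the only step that touches shared state
            total += partial

    threads = [
        threading.Thread(target=work, args=(values[i:i + chunk],))
        for i in range(0, len(values), chunk)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Without the lock, the read-modify-write on total would be a data race; keeping the locked region this small is what lets the threads do most of their work independently.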
User Threads vs. Kernel Threads
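- Threads can be supported purely in user mode, managed by a user-level thread library (e.g., Java Green Threads)
- Threads can also be supported by the kernel, which requires kernel data structures and functionality
- Linux does not distinguish between threads and processes in its data structures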
Many-to-One Model
- Multiple user threads map to one kernel thread
- Drawbacks: cannot take advantage of multicore architectures; if one thread blocks, none of the others can run
- Examples: Java Green Threads, GNU Portable Threads
One-to-One Model
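The most widely used model today
Simple, and hardware is cheap
- Each user thread maps to its own kernel thread
- Removes the drawbacks of the Many-to-One Model
- Resource-intensive: creating a user thread requires creating a kernel thread, which takes more time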
- Context switches involve the corresponding kernel thread, so they go through kernel space
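- Linux, Windows, and Solaris 9 and later all use this model
Other, less common models
- Many-to-Many - when a thread blocks, a new kernel thread can be created so that the other threads are not stalled; creating a user thread does not necessarily require a new kernel thread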
- Two-Level - The user can say: “Bind this thread to its own kernel thread”
Thread Library
- Gives programmers a way to create threads in their own programs
- In C/C++: pthreads and Win32 threads - implemented by the kernel
- pthreads - Specification, not implementation
- In C/C++: OpenMP
- Identifies parallel regions with #pragma omp parallel
- Creates as many threads as there are cores
- Java Thread
- Implemented by the JVM, which tracks thread state and does the scheduling
- Old versions of the JVM used Green Threads (which could not use multiple cores)
- The JVM now provides native threads(mapped to kernel threads)
Threading Issues
Semantics of the fork() and exec() system calls
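- In a multithreaded process, a call to fork() has two possible semantics
- A new process is created with a single thread (a copy of the thread that called fork())
- A new process is created with copies of all the threads of the original process
- Some OSes provide both options (Linux uses the first)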
- If one calls exec() after fork(), all threads are "wiped out" anyway
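The first fork() semantics (only the calling thread is copied) can be observed from Python on Linux or another POSIX system. This is a sketch under assumptions: the function name is made up, the child reports its thread count back over a pipe, and recent Python versions (3.12 and later) may print a DeprecationWarning when fork() is called from a multithreaded process.

```python
import os
import threading

def threads_in_child_after_fork():
    stop = threading.Event()
    helper = threading.Thread(target=stop.wait)   # second thread, parked until stop is set
    helper.start()
    parent_count = threading.active_count()       # main thread + helper (at least 2)

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: only the thread that called fork() exists here.
        try:
            os.close(read_fd)
            os.write(write_fd, str(threading.active_count()).encode())
        finally:
            os._exit(0)                            # leave without running parent cleanup code

    # Parent: read the child's report, then reap the child and stop the helper.
    os.close(write_fd)
    child_count = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    stop.set()
    helper.join()
    return parent_count, child_count
```

The helper thread is still running in the parent at the moment of the fork, yet the child sees a single thread, which is why locks held by other threads at fork time are a classic source of deadlocks in the child.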
Signal handling
- In a multithreaded process that handles signals (e.g., via signal()), there are several delivery options
- Deliver the signal to the thread to which the signal applies
- Deliver the signal to every thread in the process
- Deliver the signal to certain threads in the process
- Assign a specific thread to receive all signals
- Most UNIX versions: a thread can say which signals it accepts and which signals it doesn’t accept
- On Linux - tricky
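The idea that a thread chooses which signals it accepts maps onto the per-thread signal mask. The sketch below uses Python's signal module on a POSIX system; the function name and the choice of SIGUSR1 are assumptions for illustration. The thread blocks the signal, sends it to itself, sees it pending, and then collects it synchronously with sigwait() instead of having a handler run.

```python
import signal
import threading

def block_then_collect(signum=signal.SIGUSR1):
    # Block the signal in the calling thread only; other threads keep their own masks.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signum})
    try:
        # Thread-directed delivery: the signal targets this specific thread.
        signal.pthread_kill(threading.get_ident(), signum)
        was_pending = signum in signal.sigpending()   # blocked, so it stays pending
        received = signal.sigwait({signum})           # accept it synchronously
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return was_pending, received
```

Because the signal is blocked before it is sent and consumed by sigwait() before the mask is restored, its default action never runs.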
Thread cancellation (of a target thread)
- Asynchronous - one thread terminates the target thread immediately
- Deferred - the target thread periodically checks whether it should terminate
- If a thread has cancellation disabled, cancellation remains pending until the thread enables it
- With deferred cancellation, cancellation only occurs when the thread reaches a cancellation point (e.g., pthread_testcancel() in pthreads); the cleanup handler is then invoked
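Python has no equivalent of pthread_cancel(), so the following is only an analogy for deferred cancellation: a threading.Event plays the role of the pending cancellation request, the loop condition plays the role of the cancellation point, and the finally block stands in for the cleanup handler. All names are hypothetical.

```python
import threading

def run_cancellable_worker():
    cancel_requested = threading.Event()   # the pending cancellation request
    started = threading.Event()
    progress = []
    cleaned_up = []

    def worker():
        try:
            step = 0
            # Checking the flag is this sketch's "cancellation point".
            while not cancel_requested.is_set():
                progress.append(step)
                step += 1
                started.set()
                cancel_requested.wait(0.001)   # do a little "work", then check again
        finally:
            cleaned_up.append(True)            # stands in for the cleanup handler

    t = threading.Thread(target=worker)
    t.start()
    started.wait()              # make sure the worker has done at least one step
    cancel_requested.set()      # request cancellation; the worker stops at its next check
    t.join()
    return len(progress), cleaned_up
```

The worker is never interrupted in the middle of a step; it only stops where it chooses to look at the flag, which is the property that makes deferred cancellation safer than asynchronous cancellation.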
Operating System Examples
- Windows Threads
- Linux Threads
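- The clone() syscall creates threads and processes
- The new task shares execution context with its parent
- In a multithreaded process, the PID is the ID of the leading thread; the other threads are kept in a linked list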
- User thread to kernel thread mapping
(I did not understand the last point; it does not seem important.)
Inter-Process Communication (IPC)
Processes may be independent or cooperating
Why cooperate - information sharing / computation speedup / modularity / convenience
Real-world example - Chrome
Models of IPC
- Message passing
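- Suited to small amounts of data
- Easy to implement
- Sometimes cumbersome for the programmer, because send/recv operations are scattered through the code
- High overhead - every operation requires a syscall
- Shared memory
- Low overhead - syscalls are needed only for the initial setup, not afterwards
- Easy to use (just read and write RAM)
- Hard to implement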
- Signal
- Pipe
- Socket
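Most OSes provide both models
Shared Memory
- Processes establish a shared memory region (segment) and then attach it to their address spaces (an exception to the principle of memory isolation)
- They exchange information by reading and writing this region (the processes themselves, not the OS, are responsible for avoiding conflicts)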
Example - Bounded buffer
POSIX
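A bounded buffer over a shared segment might look like the sketch below, which uses Python's multiprocessing.shared_memory rather than the raw POSIX calls. Several things here are assumptions for illustration: the byte layout (two index bytes followed by the slots), the function names, and the fact that the second attach happens in the same process, standing in for a second process that would attach by name. As in the classic circular-buffer scheme, at most BUFFER_SIZE - 1 slots are used, and nothing here protects against two producers or two consumers racing.

```python
from multiprocessing import shared_memory

BUFFER_SIZE = 4            # number of slots; at most BUFFER_SIZE - 1 are in use
IN, OUT, DATA = 0, 1, 2    # byte offsets: write index, read index, first slot

def produce(buf, item):
    """Store one byte-sized item; return False if the buffer is full."""
    if (buf[IN] + 1) % BUFFER_SIZE == buf[OUT]:
        return False
    buf[DATA + buf[IN]] = item
    buf[IN] = (buf[IN] + 1) % BUFFER_SIZE
    return True

def consume(buf):
    """Remove and return the oldest item, or None if the buffer is empty."""
    if buf[IN] == buf[OUT]:
        return None
    item = buf[DATA + buf[OUT]]
    buf[OUT] = (buf[OUT] + 1) % BUFFER_SIZE
    return item

def demo():
    # "Producer" creates the segment; "consumer" attaches to it by name.
    producer_seg = shared_memory.SharedMemory(create=True, size=DATA + BUFFER_SIZE)
    consumer_seg = shared_memory.SharedMemory(name=producer_seg.name)
    try:
        producer_seg.buf[IN] = 0
        producer_seg.buf[OUT] = 0
        accepted = [produce(producer_seg.buf, v) for v in (10, 20, 30, 40)]
        drained = [consume(consumer_seg.buf) for _ in range(4)]
        return accepted, drained
    finally:
        consumer_seg.close()
        producer_seg.close()
        producer_seg.unlink()   # remove the segment once nobody needs it
```

The fourth produce() is rejected because the buffer is full, and the fourth consume() finds it empty; after setup, every access is a plain memory read or write with no syscall involved.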
Message Passing
- Two fundamental operations:
- send (P, message) – send a message to process P
- receive(Q, message) – receive a message from process Q
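- Procedure
- Establish a "link" (can be implemented in several ways)
- Exchange messages via calls to send() and recv()
- Optionally close the "link"
- Message passing is key for distributed computing (different hosts cannot share memory)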
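The establish/send/receive/close sequence can be sketched with multiprocessing.Pipe() as the link and os.fork() to create the peer process (POSIX only). The function name ping_pong and the ("ack", ...) reply format are invented for the example.

```python
import os
from multiprocessing import Pipe

def ping_pong(message):
    parent_end, child_end = Pipe()        # 1. establish the link (two connected endpoints)
    pid = os.fork()
    if pid == 0:
        # Child process: receive one message and acknowledge it.
        try:
            parent_end.close()
            request = child_end.recv()            # receive(parent, message)
            child_end.send(("ack", request))      # send(parent, message)
            child_end.close()
        finally:
            os._exit(0)

    # Parent process.
    child_end.close()
    parent_end.send(message)              # 2. exchange messages over the link
    reply = parent_end.recv()
    parent_end.close()                    # 3. close the link
    os.waitpid(pid, 0)
    return reply
```

Each send() and recv() goes through the kernel, which is the per-message overhead noted for this model, but the two processes never touch each other's memory.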
Implementation of communication link
Physical:
- Shared memory
- Hardware bus
- Network
Logical:
- Direct or indirect
- Direct - the sender and receiver must be named explicitly
- Indirect - messages go through a mailbox (port)
- Synchronous or asynchronous
- Blocking is considered synchronous
- Non-blocking is considered asynchronous
- Automatic or explicit buffering
- Zero capacity/Bounded capacity/Unbounded capacity
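Indirect addressing and buffering can be sketched with a small in-process model: a dictionary of mailboxes keyed by ID, each one a queue.Queue. This is an assumption-laden illustration, not an OS mechanism: the registry and function names are hypothetical, and queue.Queue only covers the bounded case (maxsize > 0) and the unbounded case (maxsize = 0); it has no zero-capacity rendezvous mode.

```python
import queue

mailboxes = {}   # hypothetical registry: mailbox id -> message queue

def create_mailbox(mailbox_id, capacity):
    # capacity > 0 gives a bounded mailbox; capacity == 0 means unbounded in queue.Queue.
    mailboxes[mailbox_id] = queue.Queue(maxsize=capacity)

def send_nonblocking(mailbox_id, message):
    """Asynchronous send: return False instead of waiting when the mailbox is full."""
    try:
        mailboxes[mailbox_id].put_nowait(message)
        return True
    except queue.Full:
        return False

def receive_blocking(mailbox_id, timeout=None):
    """Synchronous receive: wait until a message is available."""
    return mailboxes[mailbox_id].get(timeout=timeout)
```

A short usage example: with create_mailbox("A", 2), two sends succeed and a third returns False until a receive_blocking("A") frees a slot. Sender and receiver never name each other, only the mailbox.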
Examples
- Signals are a UNIX form of IPC
- Pipes
- Ordinary pipes (anonymous pipes in Windows) - the two processes must be parent and child to communicate through the pipe
- Named pipes
- A UNIX pipe is unidirectional. The command ls | grep foo creates two processes that communicate via a pipe
- The ls process writes on the write-end
- The grep process reads on the read-end
- Client-Server
- Socket = IP address + port number
- Remote Procedure Calls - done by a client stub
- RPCs
- RMI (JAVA)
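The ordinary-pipe case from the list above (parent and child, one direction per pipe) can be sketched with os.pipe() and os.fork() on a POSIX system. Because each pipe is unidirectional, the example uses two of them to get a reply back. The function name and the uppercase "service" performed by the child are made up for illustration, and a single read is assumed to be enough because the messages are short.

```python
import os

def shout_through_child(text):
    to_child_r, to_child_w = os.pipe()      # parent writes, child reads
    to_parent_r, to_parent_w = os.pipe()    # child writes, parent reads
    pid = os.fork()
    if pid == 0:
        # Child: keep only the read-end of one pipe and the write-end of the other.
        try:
            os.close(to_child_w)
            os.close(to_parent_r)
            data = os.read(to_child_r, 4096)
            os.write(to_parent_w, data.upper())
        finally:
            os._exit(0)

    # Parent: close the ends it does not use, send the text, read the reply.
    os.close(to_child_r)
    os.close(to_parent_w)
    os.write(to_child_w, text.encode())
    os.close(to_child_w)                    # signals end-of-input to the child
    reply = os.read(to_parent_r, 4096).decode()
    os.close(to_parent_r)
    os.waitpid(pid, 0)
    return reply
```

The pipes work here only because the child inherits the file descriptors across fork(), which is why ordinary pipes require a parent-child relationship while named pipes do not.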
Messages are directed and received from mailboxes (also referred to as ports)
- Each mailbox has a unique id
- Processes can communicate only if they share a mailbox
Pipe - the parent passes data to the child
- A process has at least one thread
- Shared memory