## 展开查看详情

1. CS162 Operating Systems and Systems Programming Lecture 17 Performance Storage Devices, Queueing Theory October 25, 2017 Prof. Ion Stoica http://cs162.eecs.Berkeley.edu

2. Review: Basic Performance Concepts • Response Time or Latency: Time to perform an operation • Bandwidth or Throughput: Rate at which operations are performed (op/s) – Files: NB/s, Networks: Mb/s, Arithmetic: GFLOP/s • Start up or “Overhead”: time to initiate an operation • Most I/O operations are roughly linear in n bytes – Latency(n) = Overhead + n/Bandwidth 10/25/17 CS162 © UCB Fall 2017 Lec 17.2

3. Solid State Disks (SSDs) • 1995 – Replace rotating magnetic media with non-volatile memory (battery backed DRAM) • 2009 – Use NAND Multi-Level Cell (2 or 3-bit/cell) flash memory – Sector (4 KB page) addressable, but stores 4-64 “pages” per memory block – Trapped electrons distinguish between 1 and 0 • No moving parts (no rotate/seek motors) – Eliminates seek and rotational delay (0.1-0.2ms access time) – Very low power and lightweight – Limited “write cycles” • Rapid advances in capacity and cost ever since! 10/25/17 CS162 © UCB Fall 2017 Lec 17.3

4. SSD Architecture – Reads NAND NAND NAND NAND NAND NAND NAND NAND Buffer Flash Manager NAND NAND Host Memory NAND NAND SATA (software NAND NAND Controller NAND NAND Queue) NAND NAND NAND NAND NAND NAND DRAM NAND NAND Read 4 KB Page: ~25 usec NAND NAND NAND NAND NAND NAND – No seek or rotational latency NAND NAND – Transfer time: transfer a 4KB page » SATA: 300-600MB/s => ~4 x103 b / 400 x 106 bps => 10 us – Latency = Queuing Time + Controller time + Xfer Time – Highest Bandwidth: Sequential OR Random reads 10/25/17 CS162 © UCB Fall 2017 Lec 17.4

5. SSD Architecture – Writes • Writing data is complex! (~200μs – 1.7ms ) – Can only write empty pages in a block – Erasing a block takes ~1.5ms – Controller maintains pool of empty blocks by coalescing used pages (read, erase, write), also reserves some % of capacity • Rule of thumb: writes 10x reads, erasure 10x writes https://en.wikipedia.org/wiki/Solid-state_drive 10/25/17 CS162 © UCB Fall 2017 Lec 17.5

6. Amusing calculation: is a full Kindle heavier than an empty one? • Actually, “Yes”, but not by much • Flash works by trapping electrons: – So, erased state lower energy than written state • Assuming that: – Kindle has 4GB flash – ½ of all bits in full Kindle are in high-energy state – High-energy state about 10-15 joules higher – Then: Full Kindle is 1 attogram (10-18gram) heavier (Using E = mc2) • Of course, this is less than most sensitive scale can measure (it can measure 10-9 grams) • Of course, this weight difference overwhelmed by battery discharge, weight from getting warm, …. • According to John Kubiatowicz (New York Times, Oct 24, 2011) 10/25/17 CS162 © UCB Fall 2017 Lec 17.6

7. SSD Summary • Pros (vs. hard disk drives): – Low latency, high throughput (eliminate seek/rotational delay) – No moving parts: » Very light weight, low power, silent, very shock insensitive – Read at memory speeds (limited by controller and I/O bus) No • Cons longer – Small storage (0.1-0.5x disk), expensive (3-20x disk) true! » Hybrid alternative: combine small SSD with large HDD – Asymmetric block write performance: read pg/erase/write pg » Controller garbage collection (GC) algorithms have major effect on performance – Limited drive lifetime » 1-10K writes/page for MLC NAND » Avg failure rate is 6 years, life expectancy is 9–11 years • These are changing rapidly! 10/25/17 CS162 © UCB Fall 2017 Lec 17.7

8. What Goes into Startup Cost for I/O? • Syscall overhead • Operating system processing • Controller Overhead Startup cost (fixed overhead) • Device Startup 18,000"" Performance)of)gbps)link)with)10)ms)startup) 50"" – Mechanical latency for a disk 16,000"" 45"" 40"" 14,000"" Bandwidth)(mB/s)) 35"" – Media Access + Speed of light + Routing 12,000"" Latency)(us)) 30"" 10,000"" for network 25"" 8,000"" 20"" 6,000"" 15"" 4,000"" 10"" 2,000"" 5"" • Queuing (next topic) 0"" 0"" 0"" 50,000""100,000""150,000""200,000""250,000""300,000""350,000""400,000""450,000""500,000"" Length)(b)) 10/25/17 CS162 © UCB Fall 2017 Lec 17.8

9. I/O Performance 300 Response Controller Time (ms) User I/O device 200 Thread Queue [OS Paths] 100 Response Time = Queue + I/O device service time • Performance of I/O subsystem 0 0% 100% – Metrics: Response Time, Throughput Throughput (Utilization) (% total BW) – Effective BW per op = transfer size / response time » EffBW(n) = n / (S + n/B) = B / (1 + SB/n ) # of ops time per op Fixed overhead 10/25/17 CS162 © UCB Fall 2017 Lec 17.9

10. I/O Performance 300 Response Controller Time (ms) User I/O device 200 Thread Queue [OS Paths] 100 Response Time = Queue + I/O device service time • Performance of I/O subsystem 0 0% 100% – Metrics: Response Time, Throughput Throughput (Utilization) (% total BW) – Effective BW per op = transfer size / response time » EffBW(n) = n / (S + n/B) = B / (1 + SB/n ) – Contributing factors to latency: » Software paths (can be loosely modeled by a queue) » Hardware controller » I/O device service time • Queuing behavior: – Can lead to big increases of latency as utilization increases – Solutions? 10/25/17 CS162 © UCB Fall 2017 Lec 17.10

11. A Simple Deterministic World arrivals Queue Server departures TQ TS TA TA TA Tq TS • Assume requests arrive at regular intervals, take a fixed time to process, with plenty of time between … • Service rate (μ = 1/TS) - operations per sec • Arrival rate: (λ = 1/TA) - requests per second • Utilization: U = λ/μ , where λ < μ • Average rate is the complete story 10/25/17 CS162 © UCB Fall 2017 Lec 17.11

12. A Ideal Linear World Saturation Delivered Throughput Delivered Throughput 1 1 Empty Queue Unbounded 0 1 0 1 Offered Load (TS/TA) Offered Load (TS/TA) Queue delay Queue delay time time • What does the queue wait time look like? – Grows unbounded at a rate ~ (Ts/TA) till request rate subsides 10/25/17 CS162 © UCB Fall 2017 Lec 17.12

13. A Bursty World arrivals Queue Server departures TQ TS Arrivals Q depth Server • Requests arrive in a burst, must queue up till served • Same average arrival time, but almost all of the requests experience large queue delays • Even though average utilization is low 10/25/17 CS162 © UCB Fall 2017 Lec 17.13

14. So how do we model the burstiness of arrival? • Elegant mathematical framework if you start with exponential distribution – Probability density function of a continuous random variable with a mean of 1/λ – f(x) = λe-λx – “Memoryless” 1 0.9 Likelihood of an event occurring 0.8 is independent of how long 0.7 mean arrival interval (1/λ) 0.6 we’ve been waiting 0.5 Lots of short arrival 0.4 intervals (i.e., high 0.3 instantaneous rate) 0.2 0.1 Few long gaps (i.e., low 0 0 2 4 6 8 10 instantaneous rate) x (λ) 10/25/17 CS162 © UCB Fall 2017 Lec 17.14

15. Background: General Use of Random Distributions • Server spends variable time (T) with customers Mean (m) – Mean (Average) m = Sp(T)´T s – Variance (stddev2) s2 = Sp(T)´(T-m)2 = Sp(T)´T2-m2 – Squared coefficient of variance: C = s2/m2 Distribution Aggregate description of the distribution of service times • Important values of C: – No variance or deterministic Þ C=0 – “Memoryless” or exponential Þ C=1 » Past tells nothing about future mean » Poisson process – purely or completely random process » Many complex systems (or aggregates) are well described as memoryless Memoryless – Disk response times C » 1.5 (majority seeks < average) 10/25/17 CS162 © UCB Fall 2017 Lec 17.15

16. Introduction to Queuing Theory Controller Disk Arrivals Queue Departures Queuing System • What about queuing time?? – Let’s apply some queuing theory – Queuing Theory applies to long term, steady state behavior Þ Arrival rate = Departure rate • Arrivals characterized by some probabilistic distribution • Departures characterized by some probabilistic distribution 10/25/17 CS162 © UCB Fall 2017 Lec 17.16

17. Little’s Law arrivals N departures λ L • In any stable system – Average arrival rate = Average departure rate • The average number of jobs/tasks in the system (N) is equal to arrival time / throughput (λ) times the response time (L) – N (jobs) = λ (jobs/s) x L (s) • Regardless of structure, bursts of requests, variation in service – Instantaneous variations, but it washes out in the average – Overall, requests match departures 10/25/17 CS162 © UCB Fall 2017 Lec 17.17

18. Example L=5 λ=1 L=5 N = 5 jobs Jobs 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 time A: N = λ x L • E.g., N = λ x L = 5 10/25/17 CS162 © UCB Fall 2017 Lec 17.18

19. Little’s Theorem: Proof Sketch arrivals N departures λ L Job i L(i) = response time of job i N(t) = number of jobs in system at time t N(t) time L(1) T 10/25/17 CS162 © UCB Fall 2017 Lec 17.19

20. Little’s Theorem: Proof Sketch arrivals N departures λ L Job i L(i) = response time of job i N(t) = number of jobs in system at time t N(t) time T What is the system occupancy, i.e., average number of jobs in the system? 10/25/17 CS162 © UCB Fall 2017 Lec 17.20

21. Little’s Theorem: Proof Sketch arrivals N departures λ L Job i L(i) = response time of job i N(t) = number of jobs in system S(k) at time t S(i) = L(i) * 1 = L(i) N(t) S(2) S(1) time T S = S(1) + S(2) + … + S(k) = L(1) + L(2) + … + L(k) 10/25/17 CS162 © UCB Fall 2017 Lec 17.21

22. Little’s Theorem: Proof Sketch arrivals N departures λ L Job i L(i) = response time of job i N(t) = number of jobs in system at time t S(i) = L(i) * 1 = L(i) N(t) S= area time T Average occupancy (Navg) = S/T 10/25/17 CS162 © UCB Fall 2017 Lec 17.22

23. Little’s Theorem: Proof Sketch arrivals N departures λ L Job i L(i) = response time of job i N(t) = number of jobs in system S(k) at time t S(i) = L(i) * 1 = L(i) N(t) S(2) S(1) time T Navg = S/T = (L(1) + … + L(k))/T 10/25/17 CS162 © UCB Fall 2017 Lec 17.23

24. Little’s Theorem: Proof Sketch arrivals N departures λ L Job i L(i) = response time of job i N(t) = number of jobs in system S(k) at time t S(i) = L(i) * 1 = L(i) N(t) S(2) S(1) time T Navg = (L(1) + … + L(k))/T = (Ntotal/T)*(L(1) + … + L(k))/Ntotal 10/25/17 CS162 © UCB Fall 2017 Lec 17.24

25. Little’s Theorem: Proof Sketch arrivals N departures λ L Job i L(i) = response time of job i N(t) = number of jobs in system S(k) at time t S(i) = L(i) * 1 = L(i) N(t) S(2) S(1) time T Navg = (Ntotal/T)*(L(1) + … + L(k))/Ntotal = λavg × Lavg 10/25/17 CS162 © UCB Fall 2017 Lec 17.25

26. Little’s Theorem: Proof Sketch arrivals N departures λ L Job i L(i) = response time of job i N(t) = number of jobs in system S(k) at time t S(i) = L(i) * 1 = L(i) N(t) S(2) S(1) time T Navg = λavg × Lavg 10/25/17 CS162 © UCB Fall 2017 Lec 17.26

27. Administrivia • Regrade request deadline Tue 10/31 at midnight – Please only submit regrade requests for grading errors! • Ion out again on Monday (attending SOSP), 10/29 – Lecture will be given by Anthony Joseph 10/25/17 CS162 © UCB Fall 2017 Lec 17.27

28. BREAK 10/25/17 CS162 © UCB Fall 2017 Lec 17.28

29. A Little Queuing Theory: Some Results (1/2) • Assumptions: – System in equilibrium; No limit to the queue – Time between successive arrivals is random and memoryless Queue Server Arrival Rate Service Rate l μ=1/Tser • Parameters that describe our system: – l: mean number of arriving customers/second – Tser: mean time to service a customer (“m”) – C: squared coefficient of variance = s2/m2 – μ: service rate = 1/Tser – u: server utilization (0£u£1): u = l/μ = l ´ Tser • Parameters we wish to compute: – Tq: Time spent in queue – Lq: Length of queue = l ´ Tq (by Little’s law) 10/25/17 CS162 © UCB Fall 2017 Lec 17.29