Consistent WAN Replication and Metadata Management with a Distributed Database System

This document summarizes CalvinFS, a distributed file system that uses a distributed database system for consistent WAN replication and metadata management. Starting from the background, it motivates the approach and presents the main contributions: scalable metadata management for a distributed file system built on a distributed database, and strongly consistent replication of file system state.

1. CalvinFS: Consistent WAN Replication and Scalable Metadata Management for Distributed File Systems (Slide credits: Thomas Kao)

2. Background
•  Scalable solutions are provided for data storage; why not for file systems?

3. Motivation
•  Often bottlenecked by the metadata management layer
•  Availability susceptible to data center outages
•  Still provides expected file system semantics

4. Key Contributions
•  Distributed database system for scalable metadata management
•  Strongly consistent geo-replication of file system state

5. Calvin: Log
•  Many front-end servers
•  Asynchronously-replicated distributed block store
•  Small number of “meta-data” log servers
•  Transaction requests are replicated and appended, in order, by the “meta log”

6. Calvin: Storage Layer
•  Knowledge of physical data store organization and actual transaction semantics
•  Read/write primitives that execute on one node
•  Placement manager
•  Multiversion key-value store at each node, plus a consistent hashing mechanism
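The consistent hashing mechanism mentioned above can be sketched as follows. This is a minimal illustration, not Calvin's actual placement code; the node names, virtual-node count, and use of MD5 are assumptions for the example.

```python
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    """Maps keys to storage nodes so that adding or removing a node
    relocates only a small fraction of keys (illustrative sketch)."""

    def __init__(self, nodes, vnodes=8):
        # Each physical node gets several virtual positions on the ring.
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                h = int(hashlib.md5(f"{node}:{i}".encode()).hexdigest(), 16)
                self.ring.append((h, node))
        self.ring.sort()

    def node_for(self, key):
        """Return the node whose ring position first follows the key's hash."""
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        idx = bisect_right(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("/home/user/file.txt"))  # one of the three nodes, deterministically
```

Because placement is a pure function of the key, any front-end server can locate a metadata entry's node without consulting a central directory.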

7. Calvin: Scheduler
•  Drives local transaction execution
•  Fully examines transaction before execution
•  Deterministic locking
•  Transaction protocol:
   1. Perform all local reads
   2. Serve remote reads
   3. Collect remote read results
   4. Execute transaction to completion
•  No distributed commit protocol

8. CalvinFS Architecture
•  Design principles:
   –  Main-memory metadata store
   –  Potentially many small files
   –  Scalable read/write throughput
   –  Tolerate slow writes
   –  Linearizable and snapshot reads
   –  Hash-partitioned metadata
   –  Optimize for single-file operations
•  Components:
   –  Block store
   –  Calvin database
   –  Client library

9. CalvinFS Block Store
•  Variable-size immutable blocks
   –  1 byte to 10 megabytes
•  Block storage and placement
   –  Unique ID per block
   –  Block “buckets”
   –  Global Paxos-replicated config file
   –  Compacts small blocks
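The bucket-based placement above can be sketched in a few lines. The bucket count, server names, and replication factor here are assumptions for illustration, and the "config file" is a plain dict standing in for the Paxos-replicated configuration:

```python
import uuid

NUM_BUCKETS = 16  # assumed value; the real system's bucket count may differ

# Global configuration mapping each bucket to its replica servers.
# In CalvinFS this mapping is Paxos-replicated; here it is a plain dict.
bucket_to_servers = {
    b: [f"server-{b % 3}", f"server-{(b + 1) % 3}"] for b in range(NUM_BUCKETS)
}

def store_block(data: bytes):
    """Assign the block a unique ID, hash the ID to a bucket, and
    return the servers responsible for that bucket."""
    block_id = uuid.uuid4().int          # globally unique ID
    bucket = block_id % NUM_BUCKETS      # bucket assignment by ID
    return block_id, bucket_to_servers[bucket]

block_id, replicas = store_block(b"hello")
print(replicas)  # two replica servers for this block's bucket
```

Because clients only need the small bucket-to-servers mapping, block lookups avoid any central metadata server on the data path.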

10. CalvinFS Metadata Management
•  Key-value store
   –  Key: absolute path of file/directory
   –  Value: entry type, permissions, contents
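The path-keyed layout can be illustrated as below. The field names and the block-reference tuple format are assumptions for the example, not CalvinFS's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class MetadataEntry:
    """Value stored under an absolute-path key (illustrative layout)."""
    entry_type: str    # "file" or "dir"
    permissions: int   # e.g. 0o644
    # For a file: block references; for a directory: child names.
    contents: list = field(default_factory=list)

metadata = {}  # key: absolute path -> MetadataEntry
metadata["/home"] = MetadataEntry("dir", 0o755, ["user"])
metadata["/home/user"] = MetadataEntry("dir", 0o755, ["notes.txt"])
metadata["/home/user/notes.txt"] = MetadataEntry(
    "file", 0o644, [("block-42", 0, 4096)])  # (block id, offset, length)

# A Read(path) is a single key lookup; no directory walk is needed.
print(metadata["/home/user/notes.txt"].entry_type)  # file
```

Keying entries by full path makes single-file operations one-key transactions, at the cost of making renames and other subtree operations touch every descendant entry (the con noted in the discussion slide).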

11. Metadata Storage Layer
•  Six transaction types:
   –  Read(path)
   –  Create{File, Dir}(path)
   –  Resize(path, size)
   –  Write(path, file_offset, source, source_offset, num_bytes)
   –  Delete(path)
   –  EditPermissions(path, permissions)

12. Recursive Operations on Directories
•  Use OLLP (Optimistic Lock Location Prediction)
•  Analyze phase
   –  Determines affected entries and the read/write set
•  Run phase
   –  Checks that the read/write set has not grown
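The analyze/run split can be sketched with a recursive delete over the path-keyed store. This is a single-threaded illustration under assumed helpers; in Calvin the run phase would execute under locks on the predicted set, and a grown set forces a retry:

```python
def recursive_delete(store, dir_path):
    """OLLP-style recursive operation (illustrative sketch).

    Analyze phase: compute the affected subtree without holding locks.
    Run phase: recompute the set as the transaction would under locks;
    if it no longer matches the prediction, abort and retry."""
    while True:
        # Analyze: predicted read/write set = dir_path plus all descendants.
        predicted = {k for k in store
                     if k == dir_path or k.startswith(dir_path + "/")}
        # Run: re-derive the actual set (under locks, in the real system).
        actual = {k for k in store
                  if k == dir_path or k.startswith(dir_path + "/")}
        if actual != predicted:
            continue  # the set changed between phases: retry with new prediction
        for k in predicted:
            del store[k]
        return predicted

store = {"/a": 1, "/a/b": 2, "/a/b/c": 3, "/x": 4}
removed = recursive_delete(store, "/a")
print(sorted(store))  # ['/x']
```

In this single-threaded demo the check always passes; its purpose is to show where a concurrent CreateFile under `/a` would invalidate the prediction and trigger a retry.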

13. Performance: File Counts and Memory Usage
•  10 million files of varying size per machine
•  Far less memory used per machine
•  Handles many more files than HDFS

14. Performance: Throughput
(Figure: throughput plots, showing linear scalability for two workloads and sub-linear scalability for one)

15. Performance: Latency
•  Write/append latency dominated by WAN replication

16. Performance: Fault Tolerance
•  Able to tolerate outages with little to no hit to availability

17. Discussion
•  Pros:
   –  Fast metadata management
   –  Deployments scale well on large clusters
   –  Huge storage capacity
   –  High throughput of reads and updates
   –  Resistant to datacenter outages
•  Cons:
   –  File creation is a distributed transaction, which doesn't scale
   –  Metadata operations have to recursively modify all entries in the affected subtree
   –  File fragmentation is addressed using a mechanism that entirely rewrites files

18. Discussion Questions
•  Unlimited number of files?
•  What about larger files?