Consistent WAN replication and metadata management for distributed file systems
1. CalvinFS: Consistent WAN Replication and Scalable Metadata Management for Distributed File Systems
Slide credits: Thomas Kao
2. Background
• Scalable solutions exist for data storage, so why not for file systems?
3. Motivation
• Distributed file systems are often bottlenecked by the metadata management layer
• Availability is susceptible to data center outages
• The goal: scale while still providing the expected file system semantics
4. Key Contributions
• A distributed database system for scalable metadata management
• Strongly consistent geo-replication of file system state
5. Calvin: Log
• Many front-end servers
• An asynchronously replicated distributed block store
• A small number of "metadata" log servers
• Transaction requests are replicated and appended, in order, to the "meta log" (see the sketch below)
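To make the meta log's role concrete, here is a minimal single-process sketch of a totally ordered request log. The class and method names are hypothetical, and the real system batches appends and replicates them via Paxos rather than keeping a local list:

```python
class MetaLog:
    """Single-process stand-in for Calvin's replicated meta log
    (hypothetical names, not the paper's actual code)."""

    def __init__(self):
        self.entries = []          # totally ordered transaction requests

    def append(self, txn_request):
        """In CalvinFS this append is batched and Paxos-replicated;
        here we just record the global order locally."""
        self.entries.append(txn_request)
        return len(self.entries) - 1   # log sequence number

    def read_from(self, lsn):
        """Every replica replays the same prefix in the same order."""
        return self.entries[lsn:]

log = MetaLog()
lsn = log.append({"op": "CreateFile", "path": "/a/b"})
assert log.read_from(lsn)[0]["op"] == "CreateFile"
```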
6. Calvin: Storage Layer
• Knows the physical organization of the data store and the actual transaction semantics
• Read/write primitives that execute on a single node
• Placement manager
• A multiversion key-value store at each node, plus a consistent hashing mechanism (sketched below)
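As a rough illustration of the last bullet, the sketch below pairs a consistent-hash ring (placement) with a per-node multiversion store (what snapshot reads need). All names are illustrative assumptions, not the paper's implementation:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Maps each key to a storage node; adding a node moves few keys."""

    def __init__(self, nodes, vnodes=64):
        self.ring = sorted(
            (_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )

    def node_for(self, key: str) -> str:
        points = [h for h, _ in self.ring]
        idx = bisect.bisect(points, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

class MultiversionStore:
    """Each key maps to (version, value) pairs; a read picks the newest
    version at or below the requested snapshot, enabling snapshot reads."""

    def __init__(self):
        self.data = {}

    def put(self, key, value, version):
        self.data.setdefault(key, []).append((version, value))

    def get(self, key, at_version):
        versions = [v for v in self.data.get(key, []) if v[0] <= at_version]
        return max(versions)[1] if versions else None

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
store = MultiversionStore()
store.put("/tmp/x", b"v1", version=10)
print(ring.node_for("/tmp/x"), store.get("/tmp/x", at_version=15))
```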
7. Calvin: Scheduler
• Drives local transaction execution
• Fully examines each transaction before execution
• Deterministic locking
• Transaction protocol: perform all local reads → serve remote reads → collect remote read results → execute the transaction to completion
• No distributed commit protocol
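A small sketch of the deterministic-locking idea, assuming hypothetical names and structure: because lock requests are enqueued in log order, every replica grants them in the same order, which is why no distributed commit protocol is needed:

```python
from collections import deque

class DeterministicLockManager:
    """Sketch of deterministic locking: request() is called for each
    transaction in log order, so every replica builds identical lock
    queues and grants locks in exactly the same order (hypothetical
    structure, not Calvin's actual code)."""

    def __init__(self):
        self.queues = {}   # key -> deque of txn ids; the head holds the lock

    def request(self, txn_id, keys):
        """Enqueue all of a transaction's lock requests at once."""
        for key in keys:
            self.queues.setdefault(key, deque()).append(txn_id)

    def ready(self, txn_id, keys):
        """A transaction may run once it heads every queue it waits on."""
        return all(self.queues[k][0] == txn_id for k in keys)

    def release(self, txn_id, keys):
        for key in keys:
            assert self.queues[key].popleft() == txn_id

locks = DeterministicLockManager()
locks.request("t1", ["/a", "/b"])
locks.request("t2", ["/b"])
assert locks.ready("t1", ["/a", "/b"]) and not locks.ready("t2", ["/b"])
```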
8. CalvinFS Architecture
• Design principles:
  – Main-memory metadata store
  – Potentially many small files
  – Scalable read/write throughput
  – Tolerate slow writes
  – Linearizable and snapshot reads
  – Hash-partitioned metadata
  – Optimize for single-file operations
• Components:
  – Block store
  – Calvin database
  – Client library
9. CalvinFS Block Store
• Variable-size immutable blocks (1 byte to 10 megabytes)
• Block storage and placement (see the sketch below):
  – Each block gets a unique ID
  – Blocks are grouped into "buckets"
  – A global Paxos-replicated config file maps buckets to storage servers
  – Small blocks are compacted
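A minimal sketch of that interface, assuming a fixed bucket count and a plain dict standing in for the Paxos-replicated config file; all names are illustrative:

```python
import uuid

NUM_BUCKETS = 1024   # assumption: the real bucket count is a deployment choice

class BlockStore:
    """Sketch of the block-store interface: blocks are immutable, named by
    a unique ID, and a bucket-to-servers map (Paxos-replicated in CalvinFS,
    a plain dict here) decides where each block lives."""

    def __init__(self, bucket_to_servers):
        self.bucket_to_servers = bucket_to_servers
        self.blocks = {}                      # block_id -> bytes (this node)

    def put(self, data: bytes) -> str:
        block_id = uuid.uuid4().hex           # unique; the block never changes
        self.blocks[block_id] = data
        return block_id

    def bucket_of(self, block_id: str) -> int:
        return int(block_id, 16) % NUM_BUCKETS

    def servers_for(self, block_id: str):
        """Look up which servers store the block's bucket."""
        return self.bucket_to_servers[self.bucket_of(block_id)]

store = BlockStore({b: [f"server{b % 3}"] for b in range(NUM_BUCKETS)})
bid = store.put(b"hello")
print(store.servers_for(bid))
```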
10. CalvinFS Metadata Management
• Metadata lives in a key-value store:
  – Key: the absolute path of the file or directory
  – Value: entry type, permissions, contents
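One way to picture such an entry (the field names are illustrative, not the paper's exact schema):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MetadataEntry:
    """Value half of one metadata key-value pair; the key is the
    absolute path (illustrative field names, not the paper's schema)."""
    entry_type: str                      # "file" or "dir"
    permissions: int                     # e.g. 0o644
    # For a dir: names of children. For a file: (block_id, offset, length)
    # triples describing which immutable blocks hold the file's bytes.
    contents: List = field(default_factory=list)

metadata = {
    "/home": MetadataEntry("dir", 0o755, ["alice"]),
    "/home/alice": MetadataEntry("dir", 0o755, ["notes.txt"]),
    "/home/alice/notes.txt": MetadataEntry("file", 0o644,
                                           [("block-42", 0, 4096)]),
}
```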
11. Metadata Storage Layer
• Six transaction types:
  – Read(path)
  – Create{File, Dir}(path)
  – Resize(path, size)
  – Write(path, file_offset, source, source_offset, num_bytes)
  – Delete(path)
  – Edit permissions(path, permissions)
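As an illustration of why file creation is a distributed transaction (a point the Discussion slide returns to), the hypothetical helper below writes the new entry and updates its parent directory, so it touches two keys that may hash to different metadata shards:

```python
def create_file(metadata: dict, path: str) -> None:
    """Sketch of CreateFile(path): it writes the new entry *and* updates
    the parent directory's contents. With hash-partitioned metadata the
    two keys can live on different shards, making creation a distributed
    transaction (hypothetical helper, not the paper's code)."""
    parent, name = path.rsplit("/", 1)
    if path in metadata:
        raise FileExistsError(path)
    metadata[path] = {"type": "file", "perms": 0o644, "contents": []}
    metadata[parent or "/"]["contents"].append(name)

fs = {"/": {"type": "dir", "perms": 0o755, "contents": []}}
create_file(fs, "/readme.txt")
assert "readme.txt" in fs["/"]["contents"]
```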
12. Recursive Operations on Directories
• Use OLLP (Optimistic Lock Location Prediction)
• Analyze phase:
  – Determines the affected entries and the read/write set
• Run phase:
  – Checks that the read/write set has not grown; if it has, the transaction restarts with the updated set
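A minimal sketch of that retry loop, with hypothetical function arguments standing in for CalvinFS internals:

```python
def run_recursive_op(compute_rw_set, execute):
    """Sketch of the OLLP pattern for recursive directory operations:
    an analyze phase predicts the read/write set without holding locks;
    the run phase recomputes the set inside the transaction and restarts
    if new entries appeared (hypothetical stand-in functions)."""
    predicted = compute_rw_set()          # analyze phase (no locks held)
    while True:
        actual = compute_rw_set()         # run phase: recompute the set
        if actual <= predicted:           # subset check: set hasn't grown
            return execute(actual)        # safe to execute to completion
        predicted = actual                # retry with the larger set

# Example: renaming /a must touch every entry under /a.
paths = {"/a", "/a/b", "/a/b/c"}
print(run_recursive_op(lambda: set(paths),
                       lambda rw: f"moved {len(rw)} entries"))
```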
13. Performance: File Counts and Memory Usage
• 10 million files of varying size per machine
• Uses far less memory per machine than HDFS
• Handles many more files than HDFS
14. Performance: Throughput
• [Throughput charts: two plots labeled "linear scalability", one labeled "sub-linear scalability"]
15. Performance: Latency
• Write/append latency is dominated by WAN replication
16. Performance: Fault Tolerance
• Tolerates outages with little to no hit to availability
17. Discussion
• Pros:
  – Fast metadata management
  – Deployments scale well on large clusters
  – Huge storage capabilities
  – High throughput of reads and updates
  – Resistant to datacenter outages
• Cons:
  – File creation is a distributed transaction and does not scale
  – Metadata operations have to recursively modify all entries in the affected subtree
  – File fragmentation is addressed using a mechanism that entirely rewrites files
18. Discussion Questions
• Unlimited number of files?
• What about larger files?