WAFL: Write Anywhere File System

This article introduces the Write Anywhere File Layout (WAFL) file system. In general, an appliance is a device designed to perform one specific function, and the trend in distributed systems has been to use appliances, such as routers from Cisco and Avici, in place of general-purpose computers. The development of write-anywhere file systems is closely tied to this trend.

1. CS 202 Advanced OS WAFL: Write Anywhere File System

2. Presentation credit: Mark Claypool from WPI. WAFL: WRITE ANYWHERE FILE SYSTEMS

3. Introduction •  In general, an appliance is a device designed to perform a specific function •  The trend in distributed systems has been to use appliances instead of general-purpose computers. Examples: –  routers from Cisco and Avici –  network terminals –  network printers •  A new type of network appliance is the Network File System (NFS) file server

4. Introduction: NFS Appliance •  NFS file server appliance file systems have different requirements than those of a general-purpose file system –  NFS access patterns are different than local file access patterns –  Large client-side caches result in fewer reads than writes •  Network Appliance Corporation uses a Write Anywhere File Layout (WAFL) file system

5. Introduction: WAFL •  WAFL has 4 requirements –  Fast NFS service –  Support large file systems (10s of GB) that can grow (can add disks later) –  Provide high-performance writes and support Redundant Arrays of Inexpensive Disks (RAID) –  Restart quickly, even after unclean shutdown •  NFS and RAID both strain write performance: –  NFS server must respond after data is written –  RAID must write parity bits also

6. Outline •  Introduction (done) •  Snapshots: User Level (next) •  WAFL Implementation •  Snapshots: System Level •  Performance •  Conclusions

7. Introduction to Snapshots •  A snapshot is a copy of the file system at a given point in time –  WAFL's "claim to fame" •  WAFL creates and deletes snapshots automatically at preset times –  Up to 255 snapshots stored at once •  Uses copy-on-write to avoid duplicating blocks in the active file system •  Snapshot uses: –  Users can recover accidentally deleted files –  Sys admins can create backups from the running system –  System can restart quickly after unclean shutdown •  Roll back to previous snapshot

8. User Access to Snapshots •  Example: suppose we accidentally removed a file named "todo":
spike% ls -lut .snapshot/*/todo
-rw-r--r-- 1 hitz 52880 Oct 15 00:00 .snapshot/nightly.0/todo
-rw-r--r-- 1 hitz 52880 Oct 14 19:00 .snapshot/hourly.0/todo
-rw-r--r-- 1 hitz 52829 Oct 14 15:00 .snapshot/hourly.1/todo
-rw-r--r-- 1 hitz 55059 Oct 10 00:00 .snapshot/nightly.4/todo
-rw-r--r-- 1 hitz 55059 Oct 9 00:00 .snapshot/nightly.5/todo
•  Can then recover the most recent version:
spike% cp .snapshot/hourly.0/todo todo
•  Note: snapshot directories (.snapshot) are hidden in that they don't show up with ls

9. Snapshot Administration •  The WAFL server provides commands for sys admins to create and delete snapshots, but this is typically done automatically •  At WPI, snapshots of /home: –  7:00 AM, 10:00, 1:00 PM, 4:00, 7:00, 10:00, 1:00 AM –  Nightly snapshot at midnight every day –  Weekly snapshot on Sunday at midnight every week •  Thus, always have: 7 hourly, 7 nightly, and 2 weekly snapshots
claypool 32 ccc3=>>pwd
/home/claypool/.snapshot
claypool 33 ccc3=>>ls
hourly.0/ hourly.3/ hourly.6/ nightly.2/ nightly.5/ weekly.1/
hourly.1/ hourly.4/ nightly.0/ nightly.3/ nightly.6/
hourly.2/ hourly.5/ nightly.1/ nightly.4/ weekly.0/

10. Outline •  Introduction (done) •  Snapshots: User Level (done) •  WAFL Implementation (next) •  Snapshots: System Level •  Performance •  Conclusions

11. WAFL File Descriptors •  Inode-based system with 4 KB blocks •  Each inode has 16 pointers •  For files smaller than 64 KB: –  Each pointer points to a data block •  For files larger than 64 KB: –  Each pointer points to an indirect block •  For really large files: –  Each pointer points to a doubly-indirect block •  For very small files (less than 64 bytes), the data is kept in the inode in place of the pointers
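
To make the size thresholds concrete, here is a minimal sketch (illustrative only; the 4-byte pointer width is an assumption, not from the slides) of the capacity each indirection level gives with 16 pointers per inode and 4 KB blocks:

```python
# Capacity per indirection level for a WAFL-style inode (illustrative sketch).
BLOCK_SIZE = 4 * 1024             # 4 KB blocks
INODE_POINTERS = 16               # pointers held in each inode
PTR_SIZE = 4                      # assumed pointer width in bytes
PTRS_PER_BLOCK = BLOCK_SIZE // PTR_SIZE   # 1024 pointers per indirect block

direct = INODE_POINTERS * BLOCK_SIZE                        # pointers -> data
single = INODE_POINTERS * PTRS_PER_BLOCK * BLOCK_SIZE       # -> indirect blocks
double = INODE_POINTERS * PTRS_PER_BLOCK ** 2 * BLOCK_SIZE  # -> doubly-indirect

print(f"direct:          {direct // 2**10} KB")   # 64 KB, matches the threshold
print(f"single indirect: {single // 2**20} MB")   # 64 MB
print(f"double indirect: {double // 2**30} GB")   # 64 GB, "really large" files
```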

12. WAFL Meta-Data •  Meta-data is stored in files –  Inode file – stores inodes –  Block-map file – identifies free blocks –  Inode-map file – identifies free inodes
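
A toy illustration of the point above (layout invented for the example): because the metadata lives in ordinary files, the same write-anywhere machinery applies to metadata and user data alike:

```python
# Toy view of WAFL metadata stored as ordinary files (layout invented).
filesystem = {
    "inode_file":     ["inode0", "inode1", "inode2"],  # stores all inodes
    "block_map_file": [0b11, 0b01, 0b00],              # per-block usage entries
    "inode_map_file": [True, True, False],             # which inodes are in use
    "home/todo":      b"buy milk",                     # an ordinary user file
}
# Only the root inode has a fixed location; the blocks of these metadata
# files, like all other blocks, can be written anywhere on disk.
```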

13. Zoom of WAFL Meta-Data (Tree of Blocks) •  Root inode must be in a fixed location •  Other blocks can be written anywhere

14. Snapshots (1 of 2) •  Copy the root inode only; copy-on-write for changed data blocks •  Over time, an old snapshot references more and more data blocks that are no longer used by the active file system •  Rate of file change determines how many snapshots can be stored on the system

15. Snapshots (2 of 2) •  When a disk block is modified, the indirect pointers leading to it must be modified as well •  Batch these writes to improve I/O performance
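
A toy sketch of this copy-on-write path copying (hypothetical in-memory structures, not WAFL's on-disk format): taking a snapshot copies only the root, and a write copies just the blocks on the path from the modified leaf up to the root, so everything else stays shared:

```python
# Toy copy-on-write tree: a snapshot copies only the root; a write copies the
# path from the modified leaf up to the root, leaving all other blocks shared.
class Block:
    def __init__(self, data=None, children=None):
        self.data = data                  # payload for leaf (data) blocks
        self.children = children or []    # pointers for inode/indirect blocks

def snapshot(root):
    # Copying the root block is enough: snapshot and active file system
    # share every block below it until one of them is modified.
    return Block(root.data, list(root.children))

def write(root, path, new_data):
    # Never overwrite in place: copy each block along the path to the leaf.
    new_root = Block(root.data, list(root.children))
    node, orig = new_root, root
    for i in path[:-1]:
        old_child = orig.children[i]
        new_child = Block(old_child.data, list(old_child.children))
        node.children[i] = new_child
        node, orig = new_child, old_child
    node.children[path[-1]] = Block(new_data)  # fresh leaf with the new data
    return new_root                            # this becomes the new root

active = Block(children=[Block(children=[Block("v1")])])  # root -> indirect -> leaf
snap = snapshot(active)
active = write(active, [0, 0], "v2")
print(active.children[0].children[0].data)  # v2: active file system sees the write
print(snap.children[0].children[0].data)    # v1: snapshot still sees the old block
```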

16. Consistency Points (1 of 2) •  To avoid consistency checks after an unclean shutdown, WAFL creates a special snapshot called a consistency point every few seconds –  Not accessible via NFS •  Batched operations are written to disk at each consistency point •  In between consistency points, data is only written to RAM

17. Consistency Points (2 of 2) •  WAFL use of NVRAM (NV = Non-Volatile): –  (NVRAM has batteries to avoid losing data during an unexpected power-off) –  NFS requests are logged to NVRAM –  Upon unclean shutdown, re-apply the logged NFS requests to the last consistency point –  Upon clean shutdown, create a consistency point and turn off NVRAM until needed (to save batteries) •  Note: a typical FS uses NVRAM as a write cache, which uses more NVRAM space (WAFL's logs are smaller) •  Ex: "rename" needs 32 KB; WAFL needs a 150-byte log entry •  Ex: an 8 KB write needs 3 blocks (data, inode, indirect pointer); WAFL needs 1 block (data) plus 120 bytes of log –  Slower response time for the typical FS than for WAFL
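
A minimal sketch of the recovery idea (names invented; the slides do not give WAFL's actual log format): log each request before acknowledging it, and after an unclean shutdown replay the log on top of the last consistency point:

```python
# Sketch: NFS requests are logged to NVRAM; after an unclean shutdown they are
# replayed on top of the last consistency point instead of running a full fsck.
nvram_log = []               # stands in for the battery-backed NVRAM log
ram_state = {}               # in-memory file system state between CPs
consistency_point = {}       # last consistent on-disk image

def apply_op(state, entry):
    op, name, data = entry
    if op == "write":
        state[name] = data
    elif op == "remove":
        state.pop(name, None)

def nfs_request(op, name, data=None):
    nvram_log.append((op, name, data))     # log before replying to the client
    apply_op(ram_state, (op, name, data))  # between CPs, changes live in RAM

def take_consistency_point():
    global consistency_point
    consistency_point = dict(ram_state)    # batched write of dirty state to disk
    nvram_log.clear()                      # logged requests are now on disk

def recover_after_crash():
    state = dict(consistency_point)        # start from a consistent image
    for entry in nvram_log:                # re-apply the logged NFS requests
        apply_op(state, entry)
    return state

nfs_request("write", "todo", "buy milk")
take_consistency_point()
nfs_request("write", "todo", "buy milk; call Bob")
print(recover_after_crash())  # {'todo': 'buy milk; call Bob'} without any fsck
```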

18. Write Allocation •  Write times dominate NFS performance –  Read caches at the client are large –  5x as many write operations as read operations at the server •  WAFL batches write requests •  WAFL's write-anywhere design lets it place an inode next to its data for better performance –  A typical FS keeps inode information and free blocks at fixed locations •  WAFL can write blocks in any order since it uses consistency points –  A typical FS must write in a fixed order to allow fsck to work
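
A simplified sketch of why write-anywhere helps (hypothetical allocator, not WAFL's actual policy): with no fixed metadata locations, a batched inode and its data blocks can be assigned one contiguous run of free blocks and written sequentially:

```python
# Sketch: with write-anywhere, a batched inode and its data blocks can be
# assigned one contiguous run of free blocks and written in a single pass.
def find_contiguous_run(free_blocks, n):
    """Return n consecutive free block numbers, or None if no run exists."""
    free = sorted(free_blocks)
    for i in range(len(free) - n + 1):
        if free[i + n - 1] - free[i] == n - 1:   # window is consecutive
            return free[i:i + n]
    return None

free = {3, 7, 8, 9, 10, 42}
batch = ["inode:todo", "data:todo[0]", "data:todo[1]"]  # one batched update
run = find_contiguous_run(free, len(batch))
print(dict(zip(batch, run)))  # the inode lands right next to its data blocks
# {'inode:todo': 7, 'data:todo[0]': 8, 'data:todo[1]': 9}
```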

19. Outline •  Introduction (done) •  Snapshots: User Level (done) •  WAFL Implementation (done) •  Snapshots: System Level (next) •  Performance •  Conclusions

20. The Block-Map File •  A typical FS uses one bit per block: 1 means allocated, 0 means free –  Ineffective for WAFL, since other snapshots may still point to a block •  WAFL uses 32 bits for each block: one bit for the active file system plus one bit per snapshot, so a block is free only when all of its bits are 0
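
A sketch of such a 32-bit block-map entry (the exact bit assignment here is assumed for illustration): bit 0 tracks the active file system and each remaining bit tracks one snapshot, so a block can be reused only when its entry is all zeros:

```python
# Sketch: 32-bit block-map entry. Bit 0 = active file system; remaining bits =
# one per snapshot (assumed layout). A block is free only when all bits are 0.
ACTIVE = 0      # bit index for the active file system

block_map = {}  # block number -> 32-bit entry

def set_used(block, bit):
    block_map[block] = block_map.get(block, 0) | (1 << bit)

def clear_used(block, bit):
    block_map[block] = block_map.get(block, 0) & ~(1 << bit)

def is_free(block):
    return block_map.get(block, 0) == 0  # nothing references the block

set_used(100, ACTIVE)     # block 100 written by the active file system
set_used(100, 1)          # snapshot 1 also references it
clear_used(100, ACTIVE)   # file deleted in the active file system...
print(is_free(100))       # False: snapshot 1 still pins the block
clear_used(100, 1)        # snapshot 1 deleted
print(is_free(100))       # True: block can be reused
```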

21. Creating Snapshots •  Could suspend NFS, create the snapshot, resume NFS –  But this can take up to 1 second •  Challenge: avoid locking out NFS requests •  WAFL marks all dirty cache data as IN_SNAPSHOT. Then: –  NFS requests can read all system data, and can write data that is not IN_SNAPSHOT –  Data that is not IN_SNAPSHOT is not flushed to disk •  Must flush IN_SNAPSHOT data as quickly as possible

22. Flushing IN_SNAPSHOT Data •  Flush inode data first –  WAFL keeps two caches for inode data, so it can copy the system cache to the inode data file, unblocking most NFS requests (requires no I/O since the inode file is flushed later) •  Update the block-map file –  Copy the active bit to the snapshot bit •  Write all IN_SNAPSHOT data –  Restart any blocked requests •  Duplicate the root inode and turn off the IN_SNAPSHOT bit •  All done in less than 1 second, with the first step done in 100s of ms
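
The same sequence sketched as code (structure and names invented; the steps follow the slide):

```python
# Sketch of the snapshot-creation sequence from the slide (illustrative only;
# names invented). Dirty cache entries are marked IN_SNAPSHOT, flushed in the
# order the slide gives, and finally the root inode is duplicated.
import copy

def create_snapshot(dirty_cache, block_map, root_inode, disk):
    in_snapshot = set(dirty_cache)              # 1. mark all dirty cache data

    inode_blocks = [b for b in in_snapshot if b.startswith("inode")]
    for b in inode_blocks:                      # 2. flush inode data first,
        disk[b] = dirty_cache.pop(b)            #    unblocking most requests

    for blk, entry in block_map.items():        # 3. copy the active bit (bit 0)
        if entry & 1:                           #    to the snapshot bit (bit 1)
            block_map[blk] = entry | 2

    for b in list(dirty_cache):                 # 4. write the remaining
        if b in in_snapshot:                    #    IN_SNAPSHOT data
            disk[b] = dirty_cache.pop(b)

    return copy.deepcopy(root_inode)            # 5. duplicate the root inode

disk, bmap = {}, {100: 1, 101: 1, 102: 0}
cache = {"inode:7": "...", "data:100": "new bytes"}
snap_root = create_snapshot(cache, bmap, {"ptrs": [100, 101]}, disk)
print(sorted(disk), bmap, snap_root)  # both blocks on disk, snapshot bits set
```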

23. Outline •  Introduction (done) •  Snapshots: User Level (done) •  WAFL Implementation (done) •  Snapshots: System Level (done) •  Performance (next) •  Conclusions

24. Performance (1 of 2) •  Compare against NFS systems •  The best benchmark is SPEC NFS –  LADDIS: Legato, Auspex, Digital, Data General, Interphase, and Sun •  Measure response times versus throughput •  (Me: System specifications?!)

25. Performance (2 of 2) •  (Typically, look for the "knee" in the curve, where response time begins to rise sharply with added load)

26. NFS vs. New File Systems •  [Figure: response time (msec/op) vs. generated load (ops/sec) for 10 MPFS clients, for 5 MPFS clients plus 5 NFS clients, and for 10 NFS clients] •  Remove the NFS server as a bottleneck •  Clients write directly to the device

27. Conclusion •  NetApp (with WAFL) works and is stable –  Consistency points are simple, reducing bugs in the code –  It is easier to develop stable code for a network appliance than for a general-purpose system •  There are few NFS client implementations and a limited set of operations, so the system can be tested thoroughly