03-DFS Desirable Features

Transparency;Concurrency and synchronization;File caching and replication;Heterogeneity;Fault tolerance;Consistency;Security

1.CSS434 Distributed File Systems Textbook Ch8, 13 Professor: Munehiro Fukuda CSS434 DFS 1

2. DFS Desirable Features
- Transparency:
  - Access transparency: a single set of operations
  - Location transparency: uniform file name space
  - Mobility transparency: file mobility
  - Performance transparency: comparable to a centralized file system
- Concurrency and synchronization: should complete concurrent access requests consistently.
  - Forward/backward validation
- File caching and replication:
  - Caching: at client/server for scalability
  - Replication: at multiple servers for availability
- Heterogeneity: should allow a variety of nodes to share files in different storage media and OS
  - Similarity between Unix and NTFS: stream-oriented files, a tree-structured system
  - Difference between Unix and NTFS: CR char included in NTFS, file naming
- Fault tolerance: at-most-once or at-least-once semantics
- Consistency: Unix one-copy update semantics, session semantics, etc.
- Security: should protect files from network intruders.

3. Consistency Maintenance in Various Storage Systems
Columns: Sharing, Persistence, Distributed cache/replicas, Consistency maintenance, Example
- Main memory: consistency maintenance 1; example: RAM
- File system: consistency maintenance 1; example: UNIX file system
- Distributed file system: example: Sun NFS
- Web: example: Web server
- Distributed shared memory: example: Ivy (Ch. 16)
- Remote objects (RMI/ORB): consistency maintenance 1; example: CORBA
- Persistent object store: consistency maintenance 1; example: CORBA Persistent Object Service
- Persistent distributed object store: example: PerDiS, Khazana

4. File Service Architecture
[Figure: a client computer running application programs on top of a client module (file caching), and a server computer running a directory service and a flat file service (file caching/replication), with consistency maintenance between the two sides.]

5. DFS Services
- Flat file service
  - File-accessing mechanism: deciding where remote files are managed and the unit of data transfer (at server or client? file, block or byte?)
  - File-sharing semantics: providing file update semantics similar to, but weaker than, Unix
  - File-caching mechanism: improving performance/scalability
  - File-replication mechanism: improving performance/availability
- Directory service
  - Mapping between text file names and references to files (i.e. file IDs)

6. Flat File Service Operations
- Read(FileId, i, n) -> Data, throws BadPosition
  If 1 ≤ i ≤ Length(File): reads a sequence of up to n items from a file starting at item i and returns it in Data.
- Write(FileId, i, Data), throws BadPosition
  If 1 ≤ i ≤ Length(File)+1: writes a sequence of Data to a file, starting at item i, extending the file if necessary.
- Create() -> FileId
  Creates a new file of length 0 and delivers a UFID for it.
- Delete(FileId)
  Removes the file from the file store.
- GetAttributes(FileId) -> Attr
  Returns the file attributes for the file.
- SetAttributes(FileId, Attr)
  Sets the file attributes (only those attributes that are not shaded in ).
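The operations on this slide can be sketched as an in-memory class. This is a minimal illustration rather than the textbook's implementation: the class name `FlatFileService`, the integer UFIDs, and the use of Python lists as "sequences of items" are assumptions made for the example. Positions are 1-based, as on the slide.

```python
class BadPosition(Exception):
    """Raised when a read or write position is outside the valid range."""


class FlatFileService:
    """Hypothetical in-memory flat file service keyed by integer UFIDs."""

    def __init__(self):
        self._files = {}    # UFID -> list of items
        self._attrs = {}    # UFID -> attribute dictionary
        self._next_id = 1

    def create(self):
        """Create a new file of length 0 and return its UFID."""
        ufid = self._next_id
        self._next_id += 1
        self._files[ufid] = []
        self._attrs[ufid] = {}
        return ufid

    def read(self, ufid, i, n):
        """Return up to n items starting at 1-based position i."""
        data = self._files[ufid]
        if not 1 <= i <= len(data):
            raise BadPosition(i)
        return data[i - 1:i - 1 + n]

    def write(self, ufid, i, items):
        """Write items starting at position i, extending the file if needed."""
        data = self._files[ufid]
        if not 1 <= i <= len(data) + 1:
            raise BadPosition(i)
        data[i - 1:i - 1 + len(items)] = items

    def delete(self, ufid):
        """Remove the file from the file store."""
        del self._files[ufid]
        del self._attrs[ufid]

    def get_attributes(self, ufid):
        return dict(self._attrs[ufid])

    def set_attributes(self, ufid, attrs):
        self._attrs[ufid].update(attrs)
```

Note that none of the operations takes an "open file" handle or a current position: every call names the file and the offset explicitly, which is what lets a server stay stateless.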

7. Directory Service Operations
- Lookup(Dir, Name) -> FileId, throws NotFound
  Locates the text name in the directory and returns the relevant UFID. If Name is not in the directory, throws an exception.
- AddName(Dir, Name, File), throws NameDuplicate
  If Name is not in the directory, adds (Name, File) to the directory and updates the file’s attribute record. If Name is already in the directory, throws an exception.
- UnName(Dir, Name), throws NotFound
  If Name is in the directory, the entry containing Name is removed from the directory. If Name is not in the directory, throws an exception.
- GetNames(Dir, Pattern) -> NameSeq
  Returns all the text names in the directory that match the regular expression Pattern.
[Figure: host1, host2 and host3 each add a name (Name1, Name2, Name3) for the same file with AddName(Dir, Name, File), so the file's reference count is 3. If ref_count = 0, the file is deleted.]
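The directory operations and the reference-count rule from the figure can be sketched as follows. This is an assumption-laden illustration: the class `DirectoryService`, the `deleted` list that stands in for actually removing a file, and the use of Python's `re.fullmatch` for Pattern are all choices made for this example.

```python
import re


class NotFound(Exception):
    pass


class NameDuplicate(Exception):
    pass


class DirectoryService:
    """Hypothetical directory service mapping text names to UFIDs."""

    def __init__(self):
        self._dirs = {}       # directory id -> {name: ufid}
        self.ref_count = {}   # ufid -> number of directory entries naming it
        self.deleted = []     # ufids whose reference count dropped to 0

    def lookup(self, d, name):
        try:
            return self._dirs[d][name]
        except KeyError:
            raise NotFound(name) from None

    def add_name(self, d, name, ufid):
        entries = self._dirs.setdefault(d, {})
        if name in entries:
            raise NameDuplicate(name)
        entries[name] = ufid
        self.ref_count[ufid] = self.ref_count.get(ufid, 0) + 1

    def un_name(self, d, name):
        entries = self._dirs.get(d, {})
        if name not in entries:
            raise NotFound(name)
        ufid = entries.pop(name)
        self.ref_count[ufid] -= 1
        if self.ref_count[ufid] == 0:
            # No name refers to the file any more, so it can be deleted.
            del self.ref_count[ufid]
            self.deleted.append(ufid)

    def get_names(self, d, pattern):
        names = self._dirs.get(d, {})
        return sorted(n for n in names if re.fullmatch(pattern, n))
```

In a full system the `deleted` list would be replaced by a call to the flat file service's Delete operation.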

8. File-Accessing Models
- Accessing remote files
  - Remote service model
    - File access: at a server
    - Merits: a simple implementation
    - Demerits: communication overhead
  - Data caching model
    - File access: at a client that cached a file copy
    - Merits: reducing network traffic
    - Demerits: cache consistency problem
  - Example on the slide: NFS
- Unit of data transfer
  - File
    - Merits: simple, less communication overhead, and immune to server failures
    - Demerits: a client is required to have large storage space
  - Block
    - Merits: a client is not required to have large storage space
    - Demerits: more network traffic/overhead
  - Byte
    - Merits: flexibility maximized
    - Demerits: difficult cache management to handle the variable-length data
  - Record
    - Merits: handling structured and indexed files
    - Demerits: more network traffic, more overhead to reconstruct

9. File-Sharing Semantics
Define when modifications of the file data made by a user are observable by other users.
1. Unix semantics
2. Session semantics
3. Immutable shared-files semantics
4. Transaction-like semantics

10. File-Sharing Semantics: Unix Semantics (One-Copy Update Semantics)
- Absolute ordering: seen by all clients as if only a single copy existed and is updated immediately.
- Network delays make a weaker semantics inevitable.
[Figure: timeline t1 to t6 on a file that starts as "a b". Client B issues Append(c), Append(d) and a read; Client A issues Append(e) and a read. The file grows to "a b c d e", but operations delayed by the network can return an older state such as "a b c".]

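11. File-Sharing Semantics: Session Semantics
The server's file initially holds "a b". Events in the order shown on the slide:
1. Client A: Open(file), local copy "a b"
2. Client A: Append(c), local copy "a b c"
3. Client B: Open(file), local copy "a b"
4. Client A: Append(d), local copy "a b c d"
5. Client B: Append(x), local copy "a b x"
6. Client A: Append(e), local copy "a b c d e"
7. Client B: Append(y), local copy "a b x y"
8. Client A: Close(file), server file becomes "a b c d e"
9. Client B: Append(z), local copy "a b x y z"
10. Client C: Open(file), local copy "a b c d e"
11. Client B: Close(file), server file becomes "a b x y z"
12. Client C: Append(m), local copy "a b c d e m"
13. Client C: Close(file), server file becomes "a b c d e m"
File writes may overwrite previous updates. A file lock is needed to prevent such overwrites.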
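The overwrite problem on this slide can be reproduced with a small simulation. It is a sketch under simple assumptions: `Server` and `Session` are hypothetical names, a session downloads the whole file on open, and it uploads the whole file on close.

```python
class Server:
    def __init__(self, contents):
        self.contents = list(contents)


class Session:
    """One open/close session working on a private copy of the file."""

    def __init__(self, server):
        self.server = server
        self.copy = list(server.contents)        # download on open

    def append(self, item):
        self.copy.append(item)                   # visible only locally

    def close(self):
        self.server.contents = list(self.copy)   # upload whole file on close


server = Server("ab")
a = Session(server)
b = Session(server)
for ch in "cde":
    a.append(ch)
for ch in "xyz":
    b.append(ch)

a.close()
after_a_close = "".join(server.contents)   # A's session becomes visible

c = Session(server)                        # C opens after A closed
b.close()
after_b_close = "".join(server.contents)   # B overwrites A's updates

c.append("m")
c.close()
final = "".join(server.contents)           # C overwrites B's updates
```

Each close replaces the server copy wholesale, so the last session to close wins and B's appends are lost.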

12. File-Sharing Semantics: Session Semantics with File Lock
[Figure: Client A opens the file ("a b"), tests the lock with lockt and appends c. Client B opens the same file, and lockt at Append(x) detects the lock, so the user needs to choose: quit, steal, or proceed. On saving with ^x^s at Close(file), the user needs to choose: quit, save anyway, or type ^x^w to write to another file. The figure shows the resulting copies in file, file2 and file3, holding "a b x" and "a b c".]

13. File-Sharing Semantics: Transaction-Like Semantics (Concurrency Control)
- Backward validation: compare the validating transaction's reads with former writes of overlapping transactions that have already committed. On a conflict, the validating transaction is aborted and restarted.
- Forward validation: compare the validating transaction's writes with later reads of transactions that are still active. On a conflict, abort itself or the conflicting active transactions.
[Figure: for each scheme, Clients A to D run overlapping transactions (Trans_start, reads R1, R2, R4, R6, R8, writes W3, W5, W6, W7, W8, W9, validation, commitment, Trans_end), and a conflicting transaction goes through Trans_abort and Trans_restart.]
Which validation is better?
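Backward validation can be sketched as a check of a transaction's read set against the write sets of transactions that committed after it started. The names `Transaction`, `BackwardValidator` and `start_tn` are assumptions for this illustration, and items are identified by plain integers.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    start_tn: int                       # last committed number seen at start
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)


class BackwardValidator:
    """Hypothetical optimistic concurrency control with backward validation."""

    def __init__(self):
        self.committed = []   # list of (transaction number, write set)
        self.last_tn = 0

    def begin(self):
        return Transaction(start_tn=self.last_tn)

    def validate_and_commit(self, t):
        """Return True and commit if t read nothing that an overlapping,
        already committed transaction wrote; otherwise return False."""
        for tn, writes in self.committed:
            if tn > t.start_tn and writes & t.read_set:
                return False          # caller aborts and restarts t
        self.last_tn += 1
        self.committed.append((self.last_tn, set(t.write_set)))
        return True
```

For example, if two transactions start together and the first commits a write to item 3, the second fails validation when its read set contains item 3. Forward validation would instead compare the committing transaction's write set with the read sets of transactions that are still active.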

14. File-Sharing Semantics: Immutable Shared-Files Semantics
[Figure: the server holds Version 1.0. Clients A and B each create a tentative version based on 1.0. One tentative version is committed as Version 1.1, so the other causes a version conflict.]
Handling a version conflict depends on each file system:
- Abort: abortion is simple (later, client A can decide to overwrite the file with its tentative version based on 1.0 by changing the corresponding directory).
- Ignore conflict: results in Version 1.2.
- Merge: results in Version 1.2.

15. File-Caching Schemes: Cache Location
- No caching
  - Merits: no modifications
  - Demerits: frequent disk access, busy network traffic
- In server’s main memory
  - Merits: one-time disk access, easy implementation, Unix-like file-sharing semantics
  - Demerits: busy network traffic
- In client’s disk
  - Merits: one-time network access, no size restriction
  - Demerits: cache consistency problem, file access semantics, frequent disk access, no diskless workstation
- In client’s main memory
  - Merits: Maximum ...
  - Demerits: size restriction, ...
[Figure: client and server separated by the node boundary, each with main memory and a disk. The file sits on the server's disk and copies may be kept in the server's main memory, the client's main memory, or the client's disk.]

16. File-Caching Schemes: Modification Propagation
- Write-through scheme (immediate write)
  - Pros: Unix-like semantics and high reliability
  - Cons: poor write performance
- Delayed-write scheme
  - Write on cache displacement
  - Periodic write
  - Write on close
  - Pros:
    - Write accesses complete quickly.
    - Some writes may be omitted by the following writes.
    - Gathering all writes mitigates network overhead.
  - Cons:
    - Delaying write propagation results in fuzzier file-sharing semantics.
[Figure: two panels with Client 1 and Client 2 holding main-memory copies of a file on the server's disk. In the first panel a write (W) goes to the disk immediately and the other client obtains a new copy. In the second panel the write reaches the disk only as a delayed write.]
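The trade-off between the two schemes can be made concrete by counting the writes that reach the server. The classes below are hypothetical, and the delayed-write variant shown is the write-on-close policy only.

```python
class FileServer:
    def __init__(self):
        self.blocks = {}
        self.writes_received = 0

    def write(self, block_no, data):
        self.blocks[block_no] = data
        self.writes_received += 1


class WriteThroughCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def write(self, block_no, data):
        self.cache[block_no] = data
        self.server.write(block_no, data)   # propagate immediately

    def close(self):
        pass                                # nothing left to flush


class DelayedWriteCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.dirty = set()

    def write(self, block_no, data):
        self.cache[block_no] = data         # completes without network I/O
        self.dirty.add(block_no)

    def close(self):
        # Write on close: only the latest value of each dirty block is sent.
        for block_no in sorted(self.dirty):
            self.server.write(block_no, self.cache[block_no])
        self.dirty.clear()
```

Three successive writes to the same block cost three server writes with write-through but only one with write-on-close, at the price that other clients cannot see any of them until the close.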

17. File-Caching Schemes: Cache Validation Schemes, Client-Initiated Approach
- Checking before every access (Unix-like semantics but too slow)
- Checking periodically (better performance but fuzzy file-sharing semantics)
- Checking on file open (simple, suitable for session semantics)
- Problem: high network traffic
[Figure: two panels with Client 1 and Client 2 holding main-memory copies of a file on the server's disk. First panel: write-through (or delayed write?) combined with check before every access. Second panel: write-on-close combined with check-on-open (check-on-close?).]
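Check-on-open can be sketched with a per-file version number that the client compares against the server each time it opens a cached file. The version counter and the names `Server`, `CheckOnOpenClient` and `validation_requests` are assumptions for this example; real systems often compare modification timestamps instead.

```python
class Server:
    def __init__(self):
        self.data = {}
        self.version = {}
        self.validation_requests = 0   # counts client-initiated checks

    def store(self, name, data):
        self.data[name] = data
        self.version[name] = self.version.get(name, 0) + 1

    def get_version(self, name):
        self.validation_requests += 1
        return self.version[name]

    def fetch(self, name):
        return self.data[name], self.version[name]


class CheckOnOpenClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}   # name -> (data, version)

    def open(self, name):
        cached = self.cache.get(name)
        # Validate the cached copy only at open time, not on each access.
        if cached is None or cached[1] != self.server.get_version(name):
            self.cache[name] = self.server.fetch(name)
        return self.cache[name][0]
```

Every open of an already cached file costs one validation request even when nothing changed, which is the network-traffic problem noted on the slide.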

18. File-Caching Schemes: Cache Validation Schemes, Server-Initiated Approach
- Keeping track of clients having a copy
- Denying a new request, queuing it, and disabling caching
- Notifying all clients of any update on the original file
- Problem:
  - Violating the client-server model
  - Stateful servers
  - Check-on-open still needed for the second file opening
[Figure: Clients 1 to 4 hold main-memory copies of a file on the server's disk. When one client writes (write-through or delayed write?), the server either denies a new open or notifies (invalidates) the other clients' copies.]
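The bookkeeping behind server-initiated invalidation can be sketched as a server that remembers which clients hold a copy and calls them back on an update. The class names are hypothetical, and the method names `download`, `upload` and `invalidate` are borrowed from the homework slide purely for illustration.

```python
class CachingClient:
    def __init__(self, name):
        self.name = name
        self.cache = {}   # filename -> data

    def invalidate(self, filename):
        """Callback invoked by the server when another client updates a file."""
        self.cache.pop(filename, None)


class StatefulServer:
    def __init__(self):
        self.files = {}
        self.holders = {}   # filename -> set of clients holding a cached copy

    def download(self, client, filename):
        self.holders.setdefault(filename, set()).add(client)
        client.cache[filename] = self.files[filename]
        return client.cache[filename]

    def upload(self, writer, filename, data):
        self.files[filename] = data
        # Notify every other client that still caches the old contents.
        for client in self.holders.get(filename, set()) - {writer}:
            client.invalidate(filename)
        self.holders[filename] = {writer}
        writer.cache[filename] = data
```

The `holders` table is exactly the per-client state that makes the server stateful, and the callback from server to client is what departs from the plain client-server model.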

19. Homework Assignment 4
- Session semantics
- Client-side/server-side caching
- Server-initiated invalidation
[Figure: Client 1, the server and Client 2 exchange download( ), upload( ), invalidate( ) and writeback( ) calls while both clients edit files under /tmp with emacs. Client 1 holds file1 with chmod 600 and Client 2 holds file1 with chmod 400.]
Server table (name, reader, owner, state):
- file1: reader client2, owner client1, state wShare
- file2: reader client3, state rShare
Client 1 table (name, access, owner, state):
- file1: access write, owner true, state wOwn
Client 2 table (name, access, owner, state):
- file1: access read, owner false, state rShare

20. File Access Improvements
- Data sieving for a single client
  - Read a larger contiguous file portion
  - Extract actual file portions from it
- Collective I/O for multiple clients
  - Read contiguous space, thereafter distribute subspaces to each client
  - Disk-directed I/O
  - Server-directed I/O
  - Two-phase I/O (client-directed)

21. Data Sieving
1. The user requests non-contiguous file portions.
2. Read a larger contiguous block into memory.
3. Copy the requested portions into the user’s buffer.
(from R. Thakur’s Data Sieving and Collective I/O in ROMIO, 1998)
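The three steps can be sketched for a read on any seekable binary file object. The function name `sieved_read` and the `(offset, length)` request format are assumptions for this example and are not ROMIO's interface.

```python
import io


def sieved_read(f, requests):
    """Serve non-contiguous (offset, length) requests with one contiguous read."""
    if not requests:
        return []
    start = min(off for off, _ in requests)
    end = max(off + length for off, length in requests)
    f.seek(start)
    block = f.read(end - start)   # one large contiguous read into memory
    # Copy only the requested portions out of the block.
    return [block[off - start:off - start + length] for off, length in requests]


example_file = io.BytesIO(b"0123456789abcdef")
portions = sieved_read(example_file, [(2, 2), (8, 3), (13, 1)])
```

Here three requests are served by a single read of bytes 2 through 13. The approach pays off when the holes between the requested portions are small compared with the cost of issuing separate reads.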

22. Two-Phase I/O
[Figure: in the first phase, each of the processes P0, P1, P2 and P3 reads one contiguous portion of the file. In the second phase, the data is redistributed among P0 to P3 so that each process ends up with the portions it actually needs.]
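The two phases can be simulated in a single Python process. This is only a sketch: a list stands in for the file, `two_phase_read` and the `wanted` function are hypothetical names, and a real implementation would use message passing between processes for the redistribution step.

```python
def two_phase_read(data, nprocs, wanted):
    """Simulate two-phase I/O; wanted(i) gives the rank that needs item i."""
    chunk = -(-len(data) // nprocs)   # ceiling division

    # Phase 1: process p reads one contiguous chunk of the file.
    contiguous = [
        range(p * chunk, min((p + 1) * chunk, len(data)))
        for p in range(nprocs)
    ]

    # Phase 2: each process forwards every item to the rank that needs it.
    received = [[] for _ in range(nprocs)]
    for p in range(nprocs):
        for i in contiguous[p]:
            received[wanted(i)].append((i, data[i]))

    # Each rank orders what it received by file position.
    return [[value for _, value in sorted(items)] for items in received]


# Four processes that each need every fourth item (a cyclic distribution).
distribution = two_phase_read(list(range(8)), 4, lambda i: i % 4)
```

Without the first phase, each process in this example would issue many small strided reads. With it, the file system sees only four large contiguous reads.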

23. Hierarchy (from Fukuda/Miyauchi, Journal of Supercomputing)
[Figure: a tree of processes with a GUI, a commander (Id: 0) and sentinels (Ids 2, 8, 9, 32, 33, 36 to 39, 128 to 132, and 528). Key/value tables of read files map keys such as 128_inputFile1_1, 32_inputFile1_0, 32_inputFile2_0, 528_inputFile1_7 and 528_inputFile2_7 to file contents.]

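24. DFS Example: Sun NFS
[Figure: Client A, the server and Client B each have their own directory tree (directories shown include /, bin, usr, opt, shared, org and export), with the server's exported directory mounted into the clients' trees. On each client, a user process calls the VFS layer, which dispatches to the local FS or to the NFS client. The NFS client reaches the NFS server through RPC stubs, and the server's VFS layer sits above its NFS server and local FS.]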

25. Sun NFS Installation
- Server:
  - Check if NFS is running: rpcinfo -p
  - Start NFS: /etc/rc.d/init.d/nfs start
  - Edit /etc/exports file: /dir/to/export client1(permissions), client2(…
  - Export dirs in /etc/exports: exportfs -a
  - Check exported directories: showmount -e
- Client:
  - Import a server’s directory: mount -o options server_name:/dir /my_dir
    - bg: keep retrying the mount in the background after a failure
    - intr: a process can be interrupted while its I/O request to the server directory is pending
    - soft: allow the client to time out the connection after a number of retries
    - rw/ro: normal read/write or read only
- Underlying connections:
[Figure: the client asks the portmapper for the port of the mount service (mountd), which checks permission. NFS itself (rpc nfs) is reached on port 2049.]

26. Sun NFS Overview
- Communication
  - RPC: a compound procedure
    - Lookup, Open, and Read
- Server status
  - Stateless: simple implementation in ver 3.
  - Stateful: allowing clients to cache files in ver 4.
    - RPC callback from a server to invalidate a client’s cache
- Synchronization
  - Session semantics
  - File locking in ver 4: lock, lockt, locku, and renew
    - Ex. Emacs: tests with lockt when modifying a buffer, locks the file with lock, and unlocks it with locku after writing the buffer contents to the file.
  - Share reservation: specify how to share a file (with ro, wo, or r/w)

27. Sun NFS Overview (Cont’d)
- Caching
  - In client’s memory
  - Session semantics
  - Revalidation of client’s cache upon re-opening the same file
  - Open delegation:
    - A server delegates an open decision to a writing client, which can handle open requests from other clients on the same machine.
    - A server calls back the client when receiving an open request from another machine.
- Fault tolerance
  - RPC failure: use a duplicate-request cache
  - File locking failure: provide a grace period during which clients reclaim locks previously granted and the server rebuilds its previous state.

28. Sun NFS Duplicate Request Cache
[Figure: three client/server exchanges for a request with XID = 1234.
- The retransmitted request arrives while the transaction is still in progress: too soon, ignore.
- The retransmitted request arrives right after the transaction completed and the reply was sent: just replied, ignore.
- The retransmitted request arrives some time after the transaction completed: the server sends the reply again.]
Then, when does the server delete this cached result?
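The three cases in the figure can be sketched as a small state table keyed by XID. This is an illustrative model rather than the NFS server's code: the class `DuplicateRequestCache`, the `just_replied_window` parameter, and the explicit `now` timestamps are assumptions, and the sketch never evicts entries, which leaves the slide's closing question open.

```python
class DuplicateRequestCache:
    """Hypothetical duplicate-request cache keyed by RPC transaction id (XID)."""

    def __init__(self, just_replied_window):
        self.window = just_replied_window
        self.in_progress = set()
        self.done = {}   # xid -> (reply, time the reply was sent)

    def on_request(self, xid, now):
        if xid in self.in_progress:
            return ("ignore", None)       # too soon: still executing
        if xid in self.done:
            reply, sent_at = self.done[xid]
            if now - sent_at < self.window:
                return ("ignore", None)   # just replied: reply is on its way
            return ("resend", reply)      # reply was probably lost
        self.in_progress.add(xid)
        return ("execute", None)          # first time this XID is seen

    def on_complete(self, xid, reply, now):
        self.in_progress.discard(xid)
        self.done[xid] = (reply, now)
```

Resending the cached reply instead of re-executing matters for non-idempotent requests such as removing a file, where a second execution would return an error for an operation that actually succeeded.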

29. DFS Example: Andrew File System
[Figure: workstations and servers connected by a network. Each workstation runs user programs and a Venus process on top of the UNIX kernel. Each server runs a Vice process on top of the UNIX kernel.]