11-Distributed Transactions and Replication

Distributed transaction --Two-phase commitment protocol File replication --Group communication revisited --Primary copy replication --Active replication ----Read-any-write-all protocol ----Available copy protocol ----Quorum-based protocol

1.CSS434 Distributed Transactions and Replication Textbook Ch 14 - 15 Professor: Munehiro Fukuda CSS434 Replication 1

2.Outline  Distributed transaction  Two-phase commitment protocol  File replication  Group communication revisited  Primary copy replication  Active replication  Read-any-write-all protocol  Available copy protocol  Quorum-based protocol CSS434 Replication 2

3. Distributed Transaction Example: Banking Transaction openTransaction join participant closeTransaction A a.withdraw(4); . join BranchX T participant b.withdraw(T, 3); Client B b.withdraw(3); T = openTransaction join BranchY a.withdraw(4); c.deposit(4); participant b.withdraw(3); d.deposit(3); C c.deposit(4); closeTransaction D d.deposit(3); Note: the coordinator is in one of the servers, e.g. BranchX BranchZ CSS434 Replication 3

4.Transaction Commitment  How can all participant servers either com mit a transaction or abort it?  One-phase atomic commit protocol  The coordinator keep requesting all participants to co mmit until they return an acknowledgment.  No chance of a participant to initiate an abort.  Two-phase commit protocol  Phase 1: calls for participants’ vote.  Phase 2: Complete a commit or an abort according to output of vet. CSS434 Replication 4

5. Two-Phase Commit Protocol operations canCommit?(trans)-> Yes / No Call from coordinator to participant to ask whether it can commit a transaction. Participant replies with its vote. doCommit(trans) Call from coordinator to participant to tell participant to commit its part of a transaction. doAbort(trans) Call from coordinator to participant to tell participant to abort its part of a transaction. haveCommitted(trans, participant) Call from participant to coordinator to confirm that it has committed the transaction. getDecision(trans) -> Yes / No Call from participant to coordinator to ask for the decision on a transaction after it has voted Yes but has still had no reply after some delay. Used to recover from server crash or delayed messages. CSS434 Replication 5

6. Two-Phase Commit Protocol Communication Coordinator Participant step status step status canCommit? 1 prepared to commit (waiting for votes) Yes 2 prepared to commit 3 committed doCommit (uncertain) haveCommitted 4 committed done CSS434 Replication 6

7. Two-Phase Commit Protocol State Transition Coordinator Worker 1 Worker 2 INIT INIT INIT Client_wants_to_commit CanCommit? CanCommit? CanCommit? Vote-Yes Vote-Yes WAIT CanCommit? CanCommit? READY READY Vote-No Vote-No Vote-No Vote-Yes doAbort doCommit doAbort doCommit doAbort doCommit Ack Ack Ack Ack ABORT COMMIT ABORT COMMIT ABORT COMMIT Another possible cases: The coordinator didn’t receive all vote-Yes. → Time out and send a doAbort. A worker didn’t receive a CanCommit?. → All workers eventually receive a do Abort. A worker didn’t receive a doCommit. CSS434 → Time out and check the other work’s status. Replication 7

8. File Replication Concepts  Difference between replication and caching  A replica is associated with a server, whereas a cache with client.  A replicate focuses on availability, while a cache on locality  A replicate is more persistent than a cache is  A cache is contingent upon a replica  Advantages  Increased availability/reliability  Performance enhancement (response time and network traffic)  Scalability and autonomous operation  Requirements  Naming: no need to be aware of multiple replicas.  Consistency: data consistency among replicated files.  Replication control: explicit v.s. implicit/lazy replication  ACID: Atomicity, Consistency, Isolation, and Durability CSS434 Replication 8

9. File Replication Basic Architectural Model 1. Request: send a client request to a server. Front Replica 2. Coordination: deliver the Client End Manger request to each replica manger in some order. Replica 3. Execution: process a client Manger request but not permanently commit it. Front 4. Agreement: agree if the Client Replica End Manger execution will be committed (ex. Two-phase commit protocol) Ex: DNS Web server 5. Response: respond to the front end CSS434 Replication 9

10. Review: Group Communication  Group membership service  Create and destroy a group.  Add or withdraw a replica Replica Manger manager to/from a group.  Detect a failure. Replica  Notify members of group Client Manger membership changes.  Provide clients with a group Replica address. Manger  Message delivery  Absolute ordering Replica Manger  Consistent ordering group CSS434 Replication 10

11. Review: Group Communication Example: ISIS  Group view p1 Joins the group multicast multicast p2 crashed rejoins multicast p3 p4 Multicast to Deleted or delivered? available processes In ISIS, if P4 receives this partially multicast message at the same time when it knows p3 has been crashed, it forwards it to all the others and immediately sends a flush message. In other words, P1, P2, and P4 receive this multicast message as if P3 was still alive. CSS434 Replication 11

12. Review: Group Communicatio n Absolute Ordering - Linearizability  Rule: Ti < Tj  Mi must be delivered before mj if Ti < Tj  Implementation: Ti  A clock synchronized among machines mi  A sliding time window used to commit Tj message delivery whose timestamp is in this window. mi  Example:  Distributed simulation mj  Drawback mj  Too strict constraint  No absolute synchronized clock  No guarantee to catch all tardy messages CSS434 Replication 12

13. Review: Group Communication Total Ordering - Sequential Consistency  Rule: Ti < Tj  Messages received in the same Ti order (regardless of their timestamp). Tj  Implementation: mj  A message sent to a sequencer, assigned a sequence number, and mj finally multicast to receivers  A message retrieved in incremental mi order at a receiver mi  Example:  Replicated database update  Drawback:  A centralized algorithm CSS434 Replication 13

14. Multi-copy Update Problem  Keep in mind the basic architecture and group communication models, how can we update multiple copies over replica servers?  Read-only replication  Allow the replication of only immutable files.  Primary backup replication  Designate one copy as the primary copy and all the others as secondary copies.  Active backup replication  Access any or all of replicas  Read-any-write-all protocol  Available-copies protocol  Quorum-based consensus CSS434 Replication 14

15. Primary-Copy Replication 1. Request: The front end sends a request to the primary replica. 2. Coordination:. The primary takes the re quest atomically. Replica 3. Execution: The primary executes and st Front Client Manger ores the results. End Primary 4. Agreement: The primary sends the upd Backup ates to all the backups and receives an Replica ask from them. Manger 5. Response: reply to the front end. Front  Advantage: an easy implementation, lin Client Replica End Manger earizable, coping with n-1 crashes. Backup  Disadvantage: large overhead especiall y if the failing primary must be replaced Ex: Sun NIS (Yellow Page) with a backup. CSS434 Replication 15

16. Active Replication 1. Request: The front end multicasts to all replicas. 2. Coordination:. All replica take the request in the sequential order. Client Front Replica 3. Execution: Every replica executes t End Manger he request. 4. Agreement: No agreement neede Replica Manger d. 5. Response: Each replies to the fron Front t. Client Replica End Manger  Advantage: achieve sequential con sistency, cope with (n/2 – 1) byza ntine failures  Disadvantage: no more linearizabl e CSS434 Replication 16

17. Read-Any-Write-All Protocol Read from any one of them  Read  Lock any one of replicas for a Front Replica Client End Manger read  Write Write to all of them Replica  Lock all of replicas for a write Manger  Sequential consistency Front Client Replica  Intolerable for even 1 failing End Manger replica upon a write. CSS434 Replication 17

18. Available-Copies Protocol  Read  Lock any one of replicas for a read Read from any one of them  Write  Lock all available replicas Front Replica Client Manger for a write End Write to all available replicats  Recovering replica  Bring itself up to date by X Replica Manger coping from other servers Front before accepting any user Client Replica request. End Manger  Better availability  Cannot cope with network partition. (Inconsistency in two sub-divided network groups) CSS434 Replication 18

19. Available Copies Protocol Example 1: Gossip If (Tj > Tk) Categorized in lazy available update RMk copies protocol else RMk discard the gossip Tardy messages are ignored Gossip message RMi RMj (Ti) (Tj) Query, Tf Update, Tf Update id Value, Ti If (Tf < Ti) If (Tf > Tj) return value FE update RMj FE else { (Tf) else { waits for RMi to be Query Value Update update Client updated or or Client Client ignore and update RMj} query RMj/RMk} CSS434 Replication 19

20. Available Copies Protocol Example 2: Bayou Committed Tentative Categorized in lazy available copies protocol C0 C1 C2 CN T0 T1 T2 T3 Tn Tn+1 Primary Tardy messages are reordered or merged.  To make a tentative update RM RM committed: Sent first  Perform a dependency Sent later check  Check conflicts FE FE FE FE  Check priority  Merge Procedure Tn T0 T1 T3  Cancel tentative updates  Change tentative updates Client Client Client Client Secretary and other employees: Executive: book 3pm book 3pm CSS434 Replication 20

21. Network Partitions Well-known Solution: Quorum-Based Protocols #replicas in read quorum + #replicas in write quorum > n  Read  Retrieve the read quorum Read quorum  Select the one with the Replica Replica Replica Manger Manger Manger latest version. Front Client  Perform a read on it End  Write Replica Replica Replica  Retrieve the write quorum. Manger Manger Manger  Find the latest version and Front Client increment it. End Replica Replica  Perform a write on the Manger Manger entire write quorum. Write quorum  If a sufficient number of Read-any-write-all: r = 1, w = n replicas from read/write quorum, the operation must be aborted. CSS434 Replication 21

22. Network Partitions System example: Coda . Normal case: • Read-any, write-all protocol • Whenever a client writes back its file, it increments the file version at each server. Network disconnection: • A client writes back its file to only available servers. • Version conflicts are detected and resolved automatically when network is reconne 3. Client disconnection: • A client caches as many files as possible (in hoard walking). • A client works in local if disconnected (in emulation mode). • A client writes back updated files to servers (in reintegration mode). WW W Version[2,2,3] Version[3,3,2] Version[3,3,2] emulation Version[2,2,2] Version[2,2,2] Version[2,2,2] Version[1,1,1] Version[1,1,1] Version[1,1,1] hoard Server 3 Server 2 Server 1 reintegration CSS434 Replication 22

23.Paper Review by Students  ISIS System  Gossip Architecture  Bayou System  Coda  Discussions  What if a message is lost in ISIS group communication? What if another crash occurs when unstable/flush messages are exchanged?  What performance drawbacks does Gossip have?  What problems remain to users in Bayou?  Why doesn’t Coda use read/write quorum? CSS434 Replication 23

24. Non-Turn-In Exercises 1. The following state transition diagram describes the two-phase commitment pr otocol. Let’s assume that worker1 crashed when a coordinate sent a commit m essage. Trace this diagram. To be specific, make appropriate dashed arrows “th ick and solid arrows” with your pen or pencil. Coordinator Worker 1 Worker 2 INIT INIT INIT Client_wants_to_commit CanCommit? CanCommit? CanCommit? Vote-Yes Vote-Yes WAIT CanCommit? CanCommit? READY READY Vote-No Vote-No Vote-No Vote-Yes doAbort doCommit doAbort doCommit doAbort doCommit Ack Ack Ack Ack ABORT COMMIT ABORT COMMIT ABORT COMMIT CSS434 Replication 24

25.Non-Turn-In Exercises 2. Textbook p762, Q17.1: In a decentralized variant of the two-phase commit protocol, the participants communicate directly with one another instead of indirectly via the coordinator. In phase 1, the coordinator sends its vote to all the participants. In phase 2, if the coordinator’s vote is No, the participants just abort the transaction; if it is Yes, each participant sends its vote to the coordinator and the other participants, each of which decides on the outcome according to the vote and carries it out. Calculate the number of messages and the number of rounds it takes. What are its advantages or disadvantages in comparison with the centralized variant? 3. Textbook p816, Q18.10: Explain why allowing backups to process read operations directly, (i.e., without contacting a primary), leads to sequentially consistent rather than linearizable executions in a primary- copy replication. 4. Textbook p816, Q18.11: Could the gossip architecture be used for a distributed computer game as describe below? 5. The players move figures around a common scene. The state of the game is replicated at the players’ workstations and at a server, which contains services controlling the game overall, such as collision detection. Updates are multicast to all replicas. 6. The quorum-based replication protocol can address network partition problems. Why didn’t Coda use this protocol? Explain the reason. 7. What if a message is lost in ISIS group communication? Describe a solution. CSS434 Replication 25