1 . Implementing Consistency
Some slides from Michael
2 . What do clients see?
n Distributed stores use replication
n Fault tolerance and scalability
n Does replication necessitate inconsistency?
n Harder to program, confusing for clients
n How do we reach consensus / data
consistency in a distributed system that
can tolerate non-malicious failures?
n We saw some consistency models –
how to implement them?
4 .Another perspective
n Locking is the easiest way to manage concurrent access (a sketch follows this slide)
n Mutex and semaphore.
n Read and write locks.
n In a distributed system:
n No master for issuing locks.
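Within one machine this really is easy; a minimal Python sketch (my own illustration, not from the slides) of a reader-writer lock built from plain mutexes shows the baseline we lose once data is replicated and no single node can hold the lock:

    import threading

    class RWLock:
        """Minimal reader-writer lock: many readers may hold it, or one writer."""
        def __init__(self):
            self._mutex = threading.Lock()       # protects the reader count
            self._write_gate = threading.Lock()  # held by the writer, or by the readers as a group
            self._readers = 0

        def acquire_read(self):
            with self._mutex:
                self._readers += 1
                if self._readers == 1:           # first reader locks out writers
                    self._write_gate.acquire()

        def release_read(self):
            with self._mutex:
                self._readers -= 1
                if self._readers == 0:           # last reader lets writers back in
                    self._write_gate.release()

        def acquire_write(self):
            self._write_gate.acquire()

        def release_write(self):
            self._write_gate.release()

The slide's point is that no such single lock object exists once the state is spread across machines; some node would have to play the role of lock manager.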
5 .Recall, consistency models
6 .Implementing Linearizability
7 .Implementing Linearizability
8 . Ok, what to do?
n We want consistency and availability
n Two options
1. Master Replica Model
n All operations and ordering happen on a single master
n Replicates to secondary copies
2. Multi-master model
n Read/write anywhere
n Replicas order and replicate content before returning
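A minimal sketch of option 1 above (class names and the synchronous push are my own assumptions, purely for illustration): the master alone decides the order of writes and pushes them to the secondaries, which is what keeps the clients' view simple and makes the master a single point of serialization.

    class Secondary:
        def __init__(self):
            self.store = {}

        def apply(self, seq, key, value):
            self.store[key] = value          # applies writes in the master's order

    class Master:
        """Option 1: all writes are ordered by a single master, then replicated."""
        def __init__(self, secondaries):
            self.log = []                    # totally ordered write log
            self.store = {}
            self.secondaries = secondaries

        def write(self, key, value):
            seq = len(self.log)              # the master alone picks the position
            self.log.append((seq, key, value))
            self.store[key] = value
            for s in self.secondaries:       # replicate to secondary copies
                s.apply(seq, key, value)
            return seq

        def read(self, key):
            return self.store.get(key)

Option 2 would instead require the replicas to agree on an order among themselves before acknowledging a write, which is exactly what the coordination protocols on the next slides provide.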
9 .Coordination protocols
10 .Two phase commit (2PC)
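The slide gives only the title, so here is a hedged sketch of 2PC as commonly described (the participant objects and coordinator log are hypothetical stand-ins): the coordinator first collects yes/no votes, and only if every participant votes yes does it broadcast commit.

    def two_phase_commit(coordinator_log, participants, txn):
        """Classic 2PC: phase 1 gathers votes, phase 2 broadcasts the decision."""
        # Phase 1 (prepare): every participant must vote yes.
        votes = [p.prepare(txn) for p in participants]
        if all(votes):
            coordinator_log.append(("commit", txn))   # decision must be made durable
            for p in participants:                    # Phase 2: commit everywhere
                p.commit(txn)
            return True
        coordinator_log.append(("abort", txn))
        for p in participants:                        # Phase 2: abort everywhere
            p.abort(txn)
        return False

Participants that voted yes must block until they hear the decision, which is why the failure cases on the next slide matter.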
11 .What about failures?
n If one or more acceptors fail:
n Still ensure linearizability if |R| + |W| > N + F
n Read and write quorums of acceptors then overlap
in at least one non-failed node (see the check after this slide)
n Leader fails?
n Bye bye: system no longer live
n Pick a new leader?
n How do we agree?
n Need to make sure that the group is known
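A tiny worked check of the quorum condition above (the numbers are just an example I picked):

    def quorums_overlap(n, f, r, w):
        """Slide's condition: read and write quorums of the N acceptors
        intersect in at least one non-failed node despite up to F failures."""
        return r + w > n + f

    # Example: N = 5 acceptors, tolerate F = 1 failure.
    assert quorums_overlap(5, 1, 3, 4)        # 3 + 4 = 7 > 6: safe
    assert not quorums_overlap(5, 1, 3, 3)    # 3 + 3 = 6 is not > 6: a stale read is possible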
12 .Consensus protocol:
n One value accepted
n It was proposed by some node
n All N nodes agree on the same value
n Some proposed value is eventually chosen
n Each node eventually learns it
n Fault tolerance
n If <= F faults in a window, consensus reached
n Liveness not guaranteed: if >F no consensus
13 .Given desired F, what is N?
n Crash faults need 2F+1 processes
n Byzantine faults (malicious) need 3F+1
n i.e., some replicas are trying to
intentionally lie to prevent consensus or
change the value
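As a quick sanity check of those formulas (my own illustration):

    def min_group_size(f, byzantine=False):
        """Smallest N that tolerates f faults, per the slide's formulas."""
        return 3 * f + 1 if byzantine else 2 * f + 1

    assert min_group_size(1) == 3                    # crash faults: 2F+1
    assert min_group_size(1, byzantine=True) == 4    # Byzantine faults: 3F+1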
14 .Why is agreement hard?
n What if more than one node is leader?
n What if network is partitioned?
n What if leader crashes in middle?
n What if new leader proposes different
values than those committed?
n Network is unpredictable, delays are unbounded
15 .Strawman solution I
n One node X designated as acceptor
n Each proposer sends its value to X
n X decides one value and announces it
n Failure of acceptor halts decision
n Breaks fault-tolerance requirement!
16 .Strawman II
n Each proposer (leader) proposes to all
n Acceptor accepts first proposal, rejects rest
n Acks proposer
n If leader receives acks from majority, picks
that value and sends it to replicas
n Multiple proposals – may not get a majority
n What if the leader dies before choosing a value?
17 .Paxos
n Widely used family of algorithms for reaching consensus
n Due to Leslie Lamport
n Basic approach
n One or more nodes decide to act as a leader (proposer)
n Proposes a value, tries to get it accepted
n Announces the value if accepted
18 .Paxos has three phases
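The three phases are commonly described as prepare/promise, accept/accepted, and learn. Below is a minimal single-decree sketch of the acceptor side plus one proposer round (my own simplification: messages are plain function calls, ballots are integers, and failures and retries are omitted):

    class Acceptor:
        """Single-decree Paxos acceptor (phases 1 and 2)."""
        def __init__(self):
            self.promised = -1           # highest ballot promised so far
            self.accepted = (-1, None)   # (ballot, value) last accepted

        def prepare(self, ballot):
            # Phase 1: promise to ignore lower ballots; report any accepted value.
            if ballot > self.promised:
                self.promised = ballot
                return ("promise", self.accepted)
            return ("nack", None)

        def accept(self, ballot, value):
            # Phase 2: accept unless a higher ballot was promised in the meantime.
            if ballot >= self.promised:
                self.promised = ballot
                self.accepted = (ballot, value)
                return "accepted"
            return "nack"

    def propose(ballot, my_value, acceptors):
        # Phase 1: gather promises from a majority of acceptors.
        replies = [a.prepare(ballot) for a in acceptors]
        promises = [acc for tag, acc in replies if tag == "promise"]
        if len(promises) <= len(acceptors) // 2:
            return None                               # retry later with a higher ballot
        # Must adopt the highest-ballot value already accepted, if there is one.
        best = max(promises, key=lambda ba: ba[0])
        value = best[1] if best[0] >= 0 else my_value
        # Phase 2: ask acceptors to accept; the value is learned once a majority does.
        acks = sum(a.accept(ballot, value) == "accepted" for a in acceptors)
        return value if acks > len(acceptors) // 2 else None

For example, propose(1, 'x', [Acceptor() for _ in range(3)]) returns 'x' once a majority accepts; a later proposer that finds an already-accepted value must re-propose that same value, which is why the agreed value is stable (slide 20).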
20 .Paxos Properties
n Paxos is guaranteed safe.
n Consensus is a stable property: once
reached it is never violated; the agreed
value is not changed.
21 .Paxos Properties
n Paxos is not guaranteed live.
n Consensus is reached if “a large enough
subnetwork...is non-faulty for a long enough time”
n Otherwise Paxos might never terminate.
22 .Combining Paxos and 2PC
n Use Paxos for view changes
n If anybody notices that the current master or one or more replicas have failed
n Propose a view change via Paxos to establish the new group
n Forms the new group for 2PC
n Use 2PC for the actual data (see the sketch after this slide)
n Writes go to the master, which runs 2PC
n Reads from any replica or the master
n No liveness if a majority of nodes from the previous view have failed
n What if a node comes back/joins?
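A rough sketch of how the two pieces could fit together (paxos.propose and master.two_phase_commit are hypothetical placeholders, not a real API):

    def on_suspected_failure(paxos, members, suspect):
        """View change: use Paxos to agree on the new group for 2PC."""
        proposed_view = [m for m in members if m != suspect]
        new_view = paxos.propose(("view-change", proposed_view))   # hypothetical call
        return new_view            # e.g. new_view[0] acts as the master from now on

    def write(master, view, key, value):
        # Data path: writes go to the master, which runs 2PC over the current view.
        return master.two_phase_commit(view, ("put", key, value))  # hypothetical call

    def read(any_replica, key):
        # Reads can be served by the master or by any replica in the view.
        return any_replica.get(key)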
23 .Example system
n Apache ZooKeeper
n Used by a large number of Internet services
n Leader election (sketched below)
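Leader election is typically built from ephemeral sequential znodes; the sketch below uses a hypothetical client object (zk.create, zk.get_children, zk.wait_for_delete are stand-ins, not ZooKeeper's real client API), but the recipe itself, lowest sequence number wins and everyone watches its predecessor, is the standard one.

    def run_for_leader(zk, my_id):
        """Leader-election recipe sketch: lowest sequence number wins."""
        # Each candidate creates an ephemeral, sequential znode.
        my_node = zk.create("/election/candidate-", data=my_id,
                            ephemeral=True, sequential=True)
        my_name = my_node.split("/")[-1]
        while True:
            children = sorted(zk.get_children("/election"))
            if my_name == children[0]:
                return "leader"            # I hold the lowest sequence number
            # Otherwise watch the candidate just ahead of me; when its ephemeral
            # node disappears (crash or disconnect), re-check the children.
            predecessor = children[children.index(my_name) - 1]
            zk.wait_for_delete("/election/" + predecessor)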
24 .CAP Conjecture
n A system can have at most two of:
n C: Strong consistency
n A: Availability
n P: Tolerance to network partition
n 2PC: CA
n Paxos: CP
n Eventual consistency: AP