一致性与复制

本文介绍了分布式复制和一致性的基本概念,学习分析一致性模型,以数据为中心的一致性模型,以客户为中心的一致性模型,通过学习深入探讨了其对于OS的影响以及在计算机领域的应用。
展开查看详情

1.Consistency and Replication Some slides modified from Martin Klepmann and anonymous author

2.Replication q  Motivation Ø  Performance Enhancement Ø  Enhanced availability Ø  Fault tolerance Ø  Scalability §  tradeoff between benefits of replication and work required to keep replicas consistent q  Requirements Ø  Consistency §  Depends upon application §  In many applications, we want that different clients making (read/ write) requests to different replicas of the same logical data item should not obtain different results Ø  Replica transparency §  desirable for most applications 2

3.Consistency Models q  Consistency Model is a contract between processes and a data store Ø  if processes follow certain rules, then store will work “correctly” q  Needed for understanding how concurrent reads and writes behave with respect to shared data q  Relevant for shared memory multiprocessors Ø  cache coherence algorithms q  Shared databases, files Ø  independent operations §  our main focus in the rest of the lecture Ø  transactions 3

4.Data-Centric Consistency Models The general organization of a logical data store, physically distributed and replicated across multiple processes. Each process interacts with its local copy, which must be kept ‘consistent’ with the other copies. 4

5.Client-centric Consistency Models A mobile user may access different replicas of a distributed database at different times. This type of behavior implies the need for a view of consistency that provides guarantees for single client regarding accesses to the data store. 5

6.What is consistency? q  Consistency model: Ø  A constraint on the system state observable by applications q  Examples: Single object consistency, Ø  Local/disk memory : also called “coherence” write x=5 read x (should be 5) time Ø  Database: x:=x+1; y:=y-1 assert(x+y==const) Consistency across >1 time objects

7. Consistency challenges q  No right or wrong consistency models Ø  Tradeoff between ease of programmability and efficiency Ø  E.g. what’s the consistency model for web pages? q  Consistency is hard in (distributed) systems: Ø  Data replication (caching) Ø  Concurrency Ø  Failures

8.Example application program CPU 0 CPU 1 READ/WRITE READ/WRITE Memory system x=1 y=1 If y==0 If x==0 critical section critical section q  Is this program correct?

9.Example x=1 y=1 If y==0 If x==0 critical section critical section q  CPU0’s instruction stream: W(x) R(y) q  CPU1’s instruction stream: W(y) R(x) q  Enumerate all possible inter-leavings: Ø  W(x)1 R(y)0 W(y)1 R(x)1 Ø  W(x)1 W(y)1 R(y)1 R(x)1 Ø  W(x)1 W(y)1 R(x)1 R(y)1 Ø  …. q  None violates mutual exclusion

10. An example distributed shared memory q  Each node has a local copy of state q  Read from local state q  Send writes to the other node, but do not wait

11.Consistency challenges: example W(x)1 W(y)1 x=1 y=1 If y==0 If x==0 critical section critical section

12.Does this work? W(x)1 W(y)1 R(y)0 R(x)0 x=1 y=1 If y==0 If x==0 critical section critical section

13.What went wrong? W(x)1 W(y)1 R(y)0 R(x)0 CPU0 sees: CPU1 sees: W(x)1 W(y)1 R(y)0 R(x)0

14.Aside: Relationship to transactions? q  Serializability? q  ACID? 14

15.Figure from Martin Klepmann 15

16.Strict consistency q  Each operation is stamped with a global wall- clock time q  Rules: 1.  Each read gets the latest write value 2.  All operations at one CPU have time-stamps in execution order

17.Strict consistency gives “intuitive” results q  No two CPUs in the critical section q  Proof: suppose mutual exclusion is violated CPU0: W(x)1 R(y)0 CPU1: W(y)1 R(x)0 W must have timestamp later than R q  Rule 1: read gets latest write CPU0: W(x)1 R(x)0 CPU1: W(y)1 R(x)0 Contradicts rule 1: R must see W(x)1

18. Strict Consistency Any read on a data item x returns a value corresponding to the result of the most recent write on x. “All writes are instantaneously visible to all processes” time A strictly consistent store A store that is not strictly consistent. Behavior of two processes, operating on the same data item. The problem with strict consistency is that it relies on absolute global time and is impossible to implement in a distributed system. 18

19.Data-centric Consistency Models q  Strict consistency q  Sequential consistency q  Linearizability q  Causal consistency q  FIFO consistency q  Weak consistency q  Release consistency use explicit synchronization q  Entry consistency operations Notation: Ø  Wi(x)a à process i writes value a to location x Ø  Ri(x)a à process i reads value a from location x 19

20.Linearizability q  Definition of sequential consistency says nothing about time Ø  there is no reference to the “most recent” write operation q  Linearizability Ø  weaker than strict consistency, stronger than sequential consistency Ø  operations are assumed to receive a timestamp with a global available clock that is loosely synchronized Ø  “The result of any execution is the same as if the operations by all processes on the data store were executed in some sequential order and the operations of each individual process appear in this sequence in the order specified by its program. In addition, if tsop1(x) < tsop2(y), then OP1(x) should precede OP2(y) in this sequence.“ [Herlihy & Wing, 1991] 20

21.Linearizable Client 1 Client 2 X = X + 1; A = X; B = Y; Y = Y + 1; If (A > B) print(A) else …. 21

22. Sequential Consistency - 1 Sequential consistency: the result of any execution is the same as if the read and write operations by all processes were executed in some sequential order and the operations of each individual process appear in this sequence in the order specified by its program [Lamport, 1979]. Note: Any valid interleaving is legal but all processes must see the same interleaving. P3 and P4 disagree on the order of the writes a)  A sequentially consistent data store. b)  A data store that is not sequentially consistent. 22

23. Sequential Consistency - 2 Process P1 Process P2 Process P3 x = 1; y = 1; z = 1; print ( y, z); print (x, z); print (x, y); x = 1; x = 1; y = 1; y = 1; print (y, z); y = 1; z = 1; x = 1; y = 1; print (x,z); print (x, y); z = 1; print (x, z); print(y, z); print (x, z); print (x, z); z = 1; z = 1; x = 1; print (y, z); print (x, y); print (x, y); print (y, z); print (x, y); Prints: 001011 Prints: 101011 Prints: 010111 Prints: 111111 (a) (b) (c) (d) (a)-(d) are all legal interleavings. 23

24.Not linearizable but sequentially consistent Client 1 Client 2 X = X + 1; A = X; B = Y; Y = Y + 1; If (A > B) print(A) else 24

25.Sequential consistency vs. Linearizability q  Linearizability has proven useful for reasoning about program correctness but has not typically been used otherwise. q  Sequential consistency is implementable and widely used but has poor performance. q  To get around performance problems, weaker models that have better performance have been developed. 25

26.Aside: is there a practical advantage to sequential consistency? q  Yes! q  Paper by Hagit and Welch in 1994 Ø  Showed that the lower bound on linearizable algorithms is higher (often much higher) than the upper bound on latency of sequential consistency Ø  Theoretical paper, but big practical implications 26

27.Linearizability vs. Sequential Consistency 27

28.CAP, ACID and BASE q  Result more useful (and precise!) than the CAP theorem Ø  CAP: 8 years later, proposed by Brewer (2000), and proved/formalized by Gilbert and Lynch (2002) Ø  Brewer in 2012: “Misleading because it tended to oversimplify the tension between the properties” https://www.infoq.com/articles/cap-twelve-years- later-how-the-rules-have-changed q  Still CAP was influential Ø  BASE vs. ACID, NoSQL, … 28

29. Causal Consistency - 1 Necessary condition: Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines. concurrent since no causal relationship This sequence is allowed with a causally-consistent store, but not with sequentially or strictly consistent store. Can be implemented with vector clocks. 29