分布式系统和算法:绪论

来自美国一流公立高等学府爱荷华大学(The University of Iowa)的课件,课程主题为分布式系统和算法。第一章首先介绍了什么是分布式系统和应用实例,概述分布式系统的优点、实现所面临的问题、实施方案,模型介绍,分布式算法的复杂性等内容
展开查看详情

1. CS 5620 Distributed Systems and Algorithms Sukumar Ghosh Department of Computer Science University of Iowa Spring 2015

2.What is a distributed system? 1

3. What is a distributed system? 2 0 1 11 5 4 8 3 7 10 6 9 A channel may be physical (wired, wireless) or logical Abstract view: It is a network of processes. (The nodes are processes, and the edges are communication 1 channels.)

4. Facts  It is now hard to find system that are not distributed. Technology has dramatically reduced the cost of processors, so their population is exploding.  User demands for services have increased the scale of systems (Facebook has more than 600 million users)  We live in a networked society. 3

5. Examples Large networks are very commonplace these days. Think of the world wide web. A few examples of distributed systems are: - eBay for internet-based auction - Sensor networks - BitTorrent (P2P network) for downloading video / audio - Skype for making free audio and video communication - Facebook (the oxygen of many people) - Process control networks in engineering factories - Computational grids (OSG, Teragrid, SETI@home) What are - Network of mobile robots collectively doing a job these? - Distance education, net-meeting etc. - Netbanking - Vehicular networking 4

6. Sensor Network The sensor network is checking the structural integrity of the bridge 5

7. Mobile robots I-Swarm Robot The I-Swarm project, consisting of 10 research institutes, (See a video of the I-Swarm is coordinated by Professor Heinz Wörn and Jörg Seyfried of the University of Karsruhe in Germany. Robots on YouTube) 6

8. Goal of a distributed system The computers coordinate their activities and to share hardware and software and data, so that users perceive it as a single, integrated computing service with a well-defined goal. Downloading music in Bittorrent 7

9. Goal continued Distributed computing relies on inter-process communication, P which involves the various layers of networking. Distributed computing helps create simple abstractions for these layers to facilitate program writing. Examples: (1)TCP implements a reliable end-to-end communication channel, Q (2) Media Access protocol used in Ethernet LAN or Wireless networks helps resolve network access conflict.Create a reliable channel between P and Q that are 10,000 miles away 8

10. Why distributed systems • Geographic distribution of processes • Resource sharing (example: P2P networks, grids) • Computation speed up (as in a grid or cloud) • Fault tolerance and uncertainty management 9

11.Distributed computation Not distributed Distributed Computation 9

12. Important challenges • Knowledge is local • Clocks are not synchronized • No globally shared address space • Topology and routing : everything is dynamic • Scalability: what is this • Processes and links fail: Fault tolerance and system availability 11

13.Some common subproblems • Leader election • Mutual exclusion • Time synchronization • Distributed snapshot • Reliable multicast • Replica management • Consensus 12

14. Implementation Most of the practical distributed systems have a real network as its backbone. However, such systems can also be simulated on a shared-memory multiprocessor, or even on a single processor, or in the cloud. (How will you do it? Think of simulating multiple processes, and mailboxes between pairs of communicating processes) 13

15. Implementation Clouds are attractive platforms for the implementation of distributed systems. Processes are mapped to virtual machines. Communication channels between virtual machines are implemented using different kinds of tools (like virtual serial ports). These solutions easily scale with no investment on the infrastructure. 13

16. Models We will reason about distributed systems using models. There are many dimensions of variability in distributed systems. Examples: - types of processors - inter-process communication mechanisms - timing assumptions - failure classes - security features, etc 14

17. Models Models are simple abstractions that help algorithms overcome the variability -- abstractions that preserve the essential features, but hide the implementation models details and simplify writing Implementation distributed algorithms for of models problem solving Real hardware Optical or radio communication? PC or Mac? Are clocks perfectly synchronized? 15

18. A classification Server Clients Client-server model Peer-to-peer model Server is the coordinator No unique coordinator 16

19. Parallel vs Distributed In both parallel and distributed systems, the events are partially ordered. The distinction between parallel and distributed is not always very clear. In parallel systems, the primarily issues are speed-up and increased data handling capability. In distributed systems the primary issues are fault-tolerance, synchronization, uncertainty management etc. Grid P2P Parallel Distributed 17

20. The Case of Facebook The new Facebook data center in Prineville, Oregon. The new servers have been redesigned are networked, for energy efficiency, speed-up and for fault-tolerance. user The set up mimics client-server kind of operation, with the servers having a high level of parallelism. However, the user network of servers also form a distributed system. user 30,000 servers 17

21. Objective of the course With some knowledge of networking and its associated tools, it is not difficult to put together a distributed system. It is however, much more difficult guarantee that it behaves the way we want it to behave. Here lies the challenge. Remember that a system that “sometimes work” is no good. We will study what are the critical issues, why a system fails, and how we can guarantee our design. 18

22.Understanding models and abstractions algorithms models Implementation of models Real hardware

23.Message passing vs. shared memory Difference between two inter-process communication models

24. Modeling Communication System topology is a graph G = (V, E), where V = set of nodes (sequential processes) E = set of edges (links or channels, bi/unidirectional). Four types of actions by a process: - internal action - input action - communication action - output action

25. Example: A Message Passing Model A Reliable FIFO Channel P Axiom 1. Message m sent ⇔ message m received Axiom 2. Message propagation delay is arbitrary but finite. Q Axiom 3. m1 sent before m2 ⇒ m1 received before m2.

26. Life of a process When a message m arrives A m B 1. Receive it 2. Evaluate a predicate (with message m and the local variables); 3. if predicate = true then C D update zero or more internal variables; send zero or more messages; E end if

27. Example: Shared memory model Address spaces of processes overlap M1 M2 Processes 1 3 2 4 Concurrent operations on a shared variable are serialized

28.Variations of shared memory models 0 1 2 State reading model Each process can read 3 the states of its neighbors Link register model 0 1 2 Each process can read from and write to adjacent 3 registers. The entire local state is not shared.

29. What is the difference between a synchronous distributed system and an asynchronous distributed system?