01-System Models

Parallel versus distributed systems Service layers Platform models Middleware models Reasons for distributed systems

1.CSS434 System Models Textbook Ch2 Professor: Munehiro Fukuda CSS434 System Models 1

2.Outline  Parallel versus distributed systems  Service layers  Platform models  Middleware models  Reasons for distributed systems CSS434 System Models 2

3. Parallel v.s. Distributed Systems Parallel Systems Distributed Systems Memory Tightly coupled shared Distributed memory memory Message passing, RPC, and/or UMA, NUMA used of distributed shared memory Control Global clock control No global clock control SIMD, MIMD Synchronization algorithms needed Processor Order of Tbps Order of Gbps interconnectio Bus, mesh, tree, mesh of tree, Ethernet(bus), token ring and SCI n and hypercube (-related) (ring), myrinet(switching network network) Main focus Performance Performance(cost and Scientific computing scalability) Reliability/availability Information/resource sharing CSS434 System Models 3

4.Service Layers in Distributed Systems Applications, services Middleware Operating system Platform Computer and network hardware CSS434 System Models 4

5. Distributed Computing Environment DCE Applications Threads RPC Distributed Time Service Security Distributed File Service Name Platforms CSS434 System Models 5

6.Platform Milestones in Distributed Systems 1945-1950s Loading monitor 1950s-1960s Batch system 1960s Multiprogramming 1960s-1970s Time sharing systems Multics, IBM360 1969-1973 WAN and LAN ARPAnet, Ethernet 1960s- Minicomputers PDP, VAX early1980s Early 1980s Workstations Alto 1980s – Workstation/Server Sprite, V-system present models 1990s Clusters Beowulf Late 1990s Grid computing Globus, Legion CSS434 System Models 6

7.Platforms  Minicomputer model  Workstation model  Workstation-server model  Processor-pool model  Cluster model  Grid computing CSS434 System Models 7

8. Minicomputer Model Mini- computer ARPA Mini- net Mini- computer computer  Extension of Time sharing system  User must log on his/her home minicomputer.  Thereafter, he/she can log on a remote machine by telnet.  Resource sharing  Database  High-performance devices CSS434 System Models 8

9. Workstation Model Workstation Workstation 100Mbps Workstation LAN Workstation Workstation  Process migration  Users first log on his/her personal workstation.  If there are idle remote workstations, a heavy job may migrate to one of them.  Problems:  How to find am idle workstation  How to migrate a job  What if a user log on the remote machine CSS434 System Models 9

10. Workstation-Server Model  Client workstations Workstation  Diskless  Graphic/interactive applications processed in local Workstation Workstation  All file, print, http and even cycle computation requests are sent to servers.  Server minicomputers 100Gbps LAN  Each minicomputer is dedicated to one or more different types of services.  Client-Server model of communication  RPC (Remote Procedure Call) Mini- Mini- Mini-  RMI (Remote Method Invocation) Computer Computer Computer  A Client process calls a server process’ file server http servercycle server function.  No process migration invoked  Example: NFS CSS434 System Models 10

11. Processor-Pool Model  Clients:  They log in one of terminals (diskless workstations or X terminals)  All services are dispatched to 100Mbps servers. LAN  Servers:  Necessary number of processors are allocated to Server 1 Server N each user from the pool.  Better utilization but less interactivity CSS434 System Models 11

12. Cluster Model Workstation  Client  Takes a client-server Workstation Workstation model  Server 100Mbps  Consists of many LAN http server2 PC/workstations http server1 http server N connected to a high- speed network. Master Slave Slave Slave  Puts more focus on node 1 2 N performance: serves for requests in 1Gbps SAN parallel. CSS434 System Models 12

13. Grid Computing  Goal Workstation  Collect computing power of supercomputers and clusters sparsely located over the nation and make it available as if it were the electric grid Super- Mini-  Distributed Supercomputing computer computer  Very large problems needing lots of CPU, memory, etc. Cluster  High-Throughput Computing High-speed Information high way  Harnessing many idle resources  On-Demand Computing  Remote resources integrated with Super- local computation Cluster computer  Data-intensive Computing  Using distributed data  Collaborative Computing  Support communication among multiple Workstation Workstation parties CSS434 System Models 13

14. Middleware Models Middleware Models Platforms Client-server model Workstation-server model Services provided by multiple Cluster model servers Proxy servers and caches ISP server Cluster model Peer processes Workstation model Mobile code and agents Workstation model Workstation-server model Thin clients Processor-pool model Cluster model CSS434 System Models 14

15. Client-Server Model File server DNS server Client invocation HTTP server Server invocation result result Server Client Key: Process: Computer: Workstation Workstation Workstation 100Gbps LAN Mini- Mini- Mini- Computer Computer Computer file serverhttp server cycle server CSS434 System Models 15

16. Services Provided by Multiple Servers Service Replication • Availability Server • Performance Client Server Workstation Workstation Workstation Client 100Gbps LAN Server MasterSlaveSlave Slave node 1 2 N Ex. altavista.digital.com DB server 1Gbps SAN CSS434 System Models 16

17.Proxy Servers and Caches Client Web server Proxy server Client Web Ex. Internet Service Provider server Workstation Workstation Workstation 100Gbps LAN MasterSlaveSlave Slave node 1 2 N 1Gbps SAN CSS434 System Models 17

18.Peer Processes Application Application Coordination Coordination code code Application Distributed whiteboard application Coordination code Workstation Workstation 100Gbps Workstation LAN Workstation Workstation CSS434 System Models 18

19. Mobile Code and Agents a) client request results in the downloading of applet code Client Web Applet code server b) client interacts with the applet Web Client Applet server Workstation Workstation Workstation 100Gbps LAN Mini- Mini- Mini- Computer Computer Computer file serverhttp server cycle server CSS434 System Models 19

20. Network Computers and Thin Clients X11 Compute server Network computer or PC Diskless workstations Thin network Application Client Process Workstation Workstation Workstation 100Gbps LAN 100Gbps LAN MasterSlaveSlave Slave node 1 2 N Server 1 Server N 1Gbps SAN CSS434 System Models 20

21. Reasons for Distributed Computing Systems  Inherently distributed applications  Distributed DB, worldwide airline reservation, banking system  Information sharing among distributed users  CSCW or groupware  Resource sharing  Sharing DB/expensive hardware and controlling remote lab. devices  Better cost-performance ratio / Performance  Emergence of Gbit network and high-speed/cheap MPUs  Effective for coarse-grained or embarrassingly parallel applications  Reliability  Non-stopping (availability) and voting features.  Scalability  Loosely coupled connection and hot plug-in  Flexibility  Reconfigure the system to meet users’ requirements CSS434 System Models 21

22. Network v.s. Distributed Operating Systems Features Network OS Distributed OS SSI NO YES (Single System Ssh, sftp, no view of remote Process migration, NFS, Image) memory DSM (Distr. Shared memory) Autonomy High Low Local OS at each A single system-wide OS computer Global job coordination No global job coordination Fault Tolerance Unavailability grows as Unavailability remains faulty machines increase. little even if fault machines increase. CSS434 System Models 22

23. Issues in Distributed Computing System Transparency (=SSI)  Access transparency  Memory access: DSM  Function call: RPC and RMI  Location transparency  File naming: NFS  Domain naming: DNS (Still location concerned.)  Migration transparency  Automatic state capturing and migration  Concurrency transparency (See the next page)  Event ordering: Message delivery and memory consistency  Other transparency:  Failure, Replication, Performance, and Scaling CSS434 System Models 23

24. Issues in Distributed Computing System Event Ordering send receive receive X 1 m1 4 m2 send receive Y 2 3 Physical receive time send Z receive receive m3 m1 m2 A receive receive receive t1 t2 t3 CSS434 System Models 24

25. Issues in Distributed Computing System Reliability  Faults  Omission failure (See the next page.)  Byzantine failure  Fault avoidance  The more machines involved, the less avoidance capability  Fault tolerance  Redundancy techniques  K-fault tolerance needs K + 1 replicas  K-Byzantine failures needs 2K + 1 replicas.  Distributed control  Avoiding a complete fail stop  Fault detection and recovery  Atomic transaction  Stateless servers CSS434 System Models 25

26. Omission and Arbitrary Failure Class of failure Affects Description Fail-stop Process Process halts and remains halted. Other processes may detect this state. Crash Process Process halts and remains halted. Other processes may not be able to detect this state. Omission Channel A message inserted in an outgoing message buffer never arrives at the other end’s incoming message buffer. Send-omission Process A process completes a send, but the message is not put in its outgoing message buffer. Receive-omission Process A message is put in a process’s incoming message buffer, but that process does not receive it. Arbitrary Process or Process/channel exhibits arbitrary behaviour: it may (Byzantine) channel send/transmit arbitrary messages at arbitrary times, commit omissions; a process may stop or take an incorrect step. CSS434 System Models 26

27. Flexibility  Ease of modification  Ease of enhancement User User User User User User applications applications applications applications applications applications Daemons Daemons Daemons Monolithic Monolithic Monolithic (file, name, (file, name, (file, name, Kernel Kernel Kernel Paging) Paging) Paging) (Unix) (Unix) (Unix) Microkernel Microkernel Microkernel (Mach) (Mach) (Mach) Network Network CSS434 System Models 27

28. Performance/Scalability Unlike parallel systems, distributed systems involves OS intervention and slow network medium for data transfer  Send messages in a batch:  Avoid OS intervention for every message transfer.  Cache data  Avoid repeating the same data transfer  Minimizing data copy  Avoid OS intervention (= zero-copy messaging).  Avoid centralized entities and algorithms  Avoid network saturation.  Perform post operations on client sides  Avoid heavy traffic between clients and servers CSS434 System Models 28

29. Heterogeneity  Data and instruction formats depend on each machine architecture  If a system consists of K different machine types, we need K–1 translation software.  If we have an architecture-independent standard data/instruction formats, each different machine prepares only such a standard translation software.  Java and Java virtual machine CSS434 System Models 29