24 April 2002
Operating Systems

Distributed Computing
Chapter 17
------------

Collection of computers that appears to users as a single computer
  dist op sys: make computers seem this way
  network OS: no hiding, explicit telnet/ftp, just support from the OS

distributed vs. parallel

why distribute
  - economics: original mainframes
      price/performance ratio
  - speed: parallelize
  - inherent distribution: some apps inherently distributed
      e.g., bank, CSCW, dist. games
  - reliability:
      failure detection
      reconfiguration after failure
      recovery
      critical apps - e.g., nuclear reactor - more is better
  - incremental growth
  - flexibility at corporate level

Not just independent PCs - need network
  share peripherals/data
  communication at person-to-person level
  spread load over resources - CONDOR

Negatives:
  - building software is hard
      how much should be hidden from the user (transparency)
      how much should user/OS do?
  - communication network not perfect
      messages lost, overloaded, rewiring is expensive
  - data sharing - consistency problems
  - resource allocation, deadlock
  - security

Distributed Operating System
----------------------------

migration
  - data
  - code
  - process/computation

why
  - load balancing
  - better n/w utilization
  - natural distribution
  - resource access (h/w, s/w) + data

Data migration:
  move/copy files - NFS (to some degree)
  replication of database

Code migration:
  RMI stubs, Java applets
  flexibility

Process migration:
  a.k.a. mobile agents
  move processing closer to data, e.g., database access
  issues: communication with moving processes

Computation migration
  - subtle difference from process
  RPC - transparent where computation is performed
  msg passing - explicit - synchronous or asynchronous

Transparency
  OS appears to be time sharing, e.g., AMOEBA
  hide distribution - SAME semantics
  location transparency - hide location of resources
  migration transparency - resources can move without changing names
  replication transparency - OS can make copies.
      e.g., files
  concurrency transparency - users do not interfere with one another
  parallelism transparency - users do not see parallelization of progs
    - handled by compiler
    - runtime system
    - OS
  do you always want full transparency? e.g., printer

Flexibility
  how much in kernel - microkernel arguments
    IPC, mem mgmt, low level process mgmt (scheduling), low level I/O
    rest user level services

Reliability
  Lamport definition of a distributed system:
    "One on which I cannot get any work done because some machine
     I have never heard of has crashed"
  Security
  fault tolerance
    - communication faults
    - machine faults
    - disk crashes

Performance
  response time, throughput, system utilization, network capacity consumption
  message passing overhead
  balance overhead with gain in granularity
  scalability
    don't degrade too much as nodes are added
    bottlenecks - centralization points

-------------------

Supporting Distributed Computation -- both dist OS and n/w OS
  creation/destruction
    remotely create a file, move it(?), kill it
  Scheduling
    explicit or transparent
    Condor, transparent
  synchronization/coordination/concurrency control
    general interaction
    semaphores - assume shared memory to store lock
    * event ordering, mutual exclusion, atomic transactions
  deadlock management
    prevention

-------------------
-------------------

Models of distributed computing
  Message passing/remote procedure call/distributed objects/shared memory
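
Sketch: message passing vs. RPC
  The notes contrast explicit message passing (synchronous or asynchronous)
  with RPC, where the location of the computation is hidden. The Python
  sketch below illustrates only that contrast, inside one process: a thread
  stands in for the remote machine and a queue stands in for the network.
  All names (server, AddStub, demo, the "add"/"stop" ops) are hypothetical
  and not taken from the course material or from any real RPC library.

```python
import queue
import threading


def server(inbox):
    """Hypothetical server loop: handle (op, args, reply_queue) messages."""
    while True:
        op, args, reply = inbox.get()
        if op == "stop":
            break
        if op == "add":
            reply.put(sum(args))


class AddStub:
    """Hypothetical RPC-style stub: hides send/receive behind a method call."""

    def __init__(self, inbox):
        self._inbox = inbox

    def add(self, *args):
        reply = queue.Queue()
        self._inbox.put(("add", args, reply))  # package the request and send it
        return reply.get()                     # block until the result arrives


def demo():
    inbox = queue.Queue()  # stands in for the network (assumption)
    worker = threading.Thread(target=server, args=(inbox,))
    worker.start()

    # Explicit message passing: the caller builds the message itself and
    # decides when to wait for the reply (here, immediately).
    reply = queue.Queue()
    inbox.put(("add", (1, 2), reply))
    explicit_result = reply.get()

    # RPC style: reads like a local call; where it runs is hidden.
    rpc_result = AddStub(inbox).add(3, 4)

    inbox.put(("stop", (), None))
    worker.join()
    return explicit_result, rpc_result
```

  In the explicit case the caller could do other work between the send and
  the reply.get(), which is the asynchronous option; the stub always blocks,
  which is what makes the call look local.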