25 March 2002
Operating Systems
--------
Read Nutt, Chapter 12
Virtual Memory, Paging, Segmentation

Exam Stats: (not including extra credit)
456: average 56.2, median 53, high 68, low 44
256: average 44.5, median 45, high 58, low 10
--------
Page replacement policies

When the OS runs out of available frames, it must choose a victim page:
  if the victim has been written, it must be written back first
  use a dirty bit, and only write back changed pages

global vs. local replacement
  global - choose any page in the system     - can manage system performance
  local  - a process can only page against
           its own pages                     - can manage process performance

Goal: lowest page fault rate

Optimal: replace the page that will not be used for the longest period of time

4 frames    1  2  3  4  1  2  5  1  2  3  4  5
           -----------------------------------
           *1  1  1  1  1  1  1  1  1  1 *4  4
              *2  2  2  2  2  2  2  2  2  2  2     6 page faults
                 *3  3  3  3  3  3  3  3  3  3
                    *4  4  4 *5  5  5  5  5  5

Cannot achieve optimal, but can use it to measure how well a replacement
scheme performs (similar to shortest job first in scheduling)

Random:
  + simple
  - does not perform well

FIFO:
  + simple
  - no correlation between time in memory and frequency of use

3 frames    1  2  3  4  1  2  5  1  2  3  4  5
           -----------------------------------
           *1  1  1 *4  4  4 *5  5  5  5  5  5
              *2  2  2 *1  1  1  1  1 *3  3  3     9 page faults
                 *3  3  3 *2  2  2  2  2 *4  4

4 frames    1  2  3  4  1  2  5  1  2  3  4  5
           -----------------------------------
           *1  1  1  1  1  1 *5  5  5  5 *4  4
              *2  2  2  2  2  2 *1  1  1  1 *5    10 page faults
                 *3  3  3  3  3  3 *2  2  2  2
                    *4  4  4  4  4  4 *3  3  3

Belady's anomaly: more frames does not necessarily mean fewer page faults

Least Frequently Used: count the number of references to each page,
replace the page with the lowest count

LRU: replace the page not used for the longest period of time
  an instance of a stack algorithm
  stack algorithms cannot suffer from Belady's anomaly:
    the set of pages in memory with n frames is a subset of the set of
    pages in memory with n+1 frames - the n pages in memory are those
    that would still be there if n+1 frames were available

4 frames    1  2  3  4  1  2  5  1  2  3  4  5
           -----------------------------------
           *1  1  1  1  1  1  1  1  1  1  1 *5
              *2  2  2  2  2  2  2  2  2  2  2     8 page faults
                 *3  3  3  3 *5  5  5  5 *4  4
                    *4  4  4  4  4  4 *3  3  3

how to implement LRU:
  COUNTER: each page has a counter; each time the page is referenced,
    copy the clock into the counter; replace the page whose counter
    holds the lowest (oldest) clock value
  STACK: keep a stack of page numbers in a doubly linked list;
    when a page is referenced, remove it from the stack and put it on top;
    when replacing, take the victim from the tail pointer
  generally difficult to implement efficiently - requires hardware support;
  keeping "time of access" is difficult in hardware

Approximate LRU
  reference bit with each page - "not used recently" (an approximation)
    initially 0, set to 1 when the page is referenced
    cleared periodically by the OS
  replace a page with a 0 reference bit
    - we don't know which one among those to choose

General approach - keep pages in a linked list, scan for a victim
  1. scan when we run out of free frames
  2. keep a free frame pool (e.g., 1/4 of memory)
     mainly useful for dirty page writing:
       can reclaim a frame without blocking processes -
       a process need not wait while dirty pages are written;
       the writes are "effectively" done in the background

how to choose frames for the free frame pool:
  use (ref bit, dirty bit) pairs
    (0,0) (0,1) (1,0) (1,1)   best -> worst victim
  pick up where the last scan left off, find enough victims, then stop

CLOCK: if the ref bit is 1, clear it and move on; take the first page
  with a 0 ref bit (or the first (0,0) pair) - see the sketch below

scheduling the scans:
  too infrequent - all ref bits are set again by the next scan
  too frequent - too many frames to choose from, may not make a good choice

Can tune the victim rate by using a TWO-HANDED CLOCK
  introduced in Berkeley Unix
  two hands separated by some constant number of frames
    lead hand clears reference bits
    trailing hand inspects ref bits
  intuition: with a really big memory and only one hand, most bits will
  always be on; two hands shrink the effective memory to the number of
  frames between the hands - outside that window, bits are mostly on

Useful:
  1. if you keep track of what is in the free frame pool, a page still
     there can be reused without reloading it
  2. write dirty pages in the background
     - with a temporarily reduced free frame pool, the faulting process
       can be restarted as soon as possible
     - or write them so they don't have to be written later; they will
       already be clean when chosen as victims
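A minimal sketch in C of the one-handed clock scan described above.
NFRAMES and the ref_bit array are stand-ins for the real per-frame
hardware reference bits, which a kernel would read out of page table
entries; the two-handed variant adds a second, leading hand that clears
bits a fixed distance ahead of this one.

    #include <stdbool.h>
    #include <stddef.h>

    #define NFRAMES 64

    static bool   ref_bit[NFRAMES];  /* set by "hardware" on each access */
    static size_t hand = 0;          /* clock hand: resumes where it left off */

    /* Advance the hand until a frame with ref bit 0 is found.
     * A frame with ref bit 1 gets a second chance: clear the bit
     * and move on.  Terminates because after one full sweep every
     * bit has been cleared. */
    size_t clock_pick_victim(void)
    {
        for (;;) {
            if (!ref_bit[hand]) {
                size_t victim = hand;
                hand = (hand + 1) % NFRAMES;  /* pick up here next time */
                return victim;
            }
            ref_bit[hand] = false;            /* second chance */
            hand = (hand + 1) % NFRAMES;
        }
    }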
-------------------------
What if the hardware is not helpful?
  cannot restart instructions in hardware - no demand paging
  no dirty bit - assume all pages are dirty, write them all back
    - not common to see this in systems
  no reference bit - simulate it in software - slow!
-------------------
Segmentation

internal fragmentation in paging schemes if we overestimate stack/heap

user view: a collection of variable-size segments
  special registers hold the base of (in Intel 80x86):
    code segment
    data segment
    stack segment
    extra segment - if needed
  identify elements within a segment by offset
    address = segment name/number : offset

segment table, maintained for each program:
  seg num -> base, length (limit) for each accessible segment

problem: memory allocation leads to external fragmentation
  fixed by paged segmentation

segmented paging - start with paging, then logically segment
  pure paging: user thinks memory is contiguous; internal fragmentation
  comes from overestimating dynamic structures
  segmented paging:
  1. logically distinct, dynamically sized portions of the address space
     are given virtual addresses a long way apart - they will never run
     into one another
  2. the page table is implemented so that big unused sections do not
     cost much: do not use a linear array; use trees or linked lists

paged segmentation - start with segmentation, then page it
  segmentation review: dynamic space management to allocate physical
  memory leads to external fragmentation
  idea: page the segments
  segment table entries indicate how to find the page table for the segment
    Multics: separate page table for each segment
    i386: single page table for each process; the segment table lookup
    provides an offset into the page table
-------
Shared libraries

save disk space - no copies in every executable
save memory space - no copies in each running process
upgrade libraries without recompilation - upgrades happen automatically

position-independent compilation:
  1. all internal code references are PC-relative;
     data references are base-register relative
  2. external references are made through indirection tables private
     to each process; the tables are placed with the process-specific
     internal data
  data and code kept separate - no data in the code, so code can be
  shared among processes
  references from a shared library to the main program are not allowed
-------
Swap space: provide the best throughput for the memory management system

use:
  hold the entire image of a swapped-out process
  hold pages not in memory

option (unix): place swap space on multiple disks to spread out
  i/o requests and increase throughput

location:
  separate partition - optimized for speed, not storage efficiency
  same partition as user files - inefficient - long access times

management: allocate enough when the process starts
  kernel tracks allocation in a swap map, two per process (text/data)
  text - fixed size: all but the last chunk the same size
    e.g., 512K chunks, the last one 70K
  data - grows over time
    the map is of fixed size, with blocks of varying size:
    block i is 2^i x 16K (minimum size set by the operator)
    16 32 64 128 256 ... - grow by doubling up to a maximum
    (see the sketch below)
  solaris option: allocate swap space only when a page is actually
  paged out of memory
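A hypothetical C sketch of the growing swap-map block sizing just
described: block i is 2^i x 16K, doubling up to an operator-set
maximum.  The 16K minimum and 256K maximum here are illustrative
values, not taken from any particular kernel.

    #include <stdio.h>

    #define MIN_KB 16    /* operator-set minimum block size */
    #define MAX_KB 256   /* operator-set maximum block size */

    /* size in KB of the i-th swap block: 2^i * MIN_KB, capped at MAX_KB */
    static unsigned block_kb(unsigned i)
    {
        unsigned kb = MIN_KB;
        while (i-- > 0 && kb < MAX_KB)
            kb <<= 1;
        return kb;
    }

    /* KB of swap reserved to cover a data segment of data_kb:
     * keep taking the next (doubling) block until it fits */
    static unsigned swap_reserved_kb(unsigned data_kb)
    {
        unsigned total = 0;
        for (unsigned i = 0; total < data_kb; i++)
            total += block_kb(i);
        return total;
    }

    int main(void)
    {
        /* a 700K data segment takes 16+32+64+128+256+256 = 752K */
        printf("%u KB reserved\n", swap_reserved_kb(700));
        return 0;
    }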
-------
Caches

caches sit between the processor and memory; the TLB is one form of cache
  fully associative - expensive to find index x - fewer than 100 entries
  caches have thousands of entries, so they are set associative:
    index x might be in one of several locations
cache entry: a line (bigger than a word, smaller than a page)
  on a cache miss, bring in the whole line
works because of
  1. temporal locality
  2. spatial locality

typically the data cache is separate from the instruction cache

  primary cache   - want 2-cycle access time, 8-64K
  secondary cache - 10 cycles, 256K-1M
  off-chip cache  - 50 cycles, a few MB
  main memory     - 100 cycles or more

cache concerns: a write to the cache needs to get back to memory
  write-through: write immediately to memory - simple, but more bus traffic
  write-back: keep dirty bits; a line is written back only when the
    cache entry is replaced (see the sketch below)

coherence among multiprocessors: snoop for writes
  a write to line x evicts x from the other caches
  snooping only on the secondary cache works if the primary cache is
  write-through; the secondary cache then evicts items from the
  primary cache
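To make the write-back bookkeeping concrete, here is a toy direct-mapped,
write-back cache in C.  The sizes and the backing memory array are
invented for illustration (a real cache is hardware, not code); a
write-through cache would instead store to memory on every write_byte
and never need the dirty bit.

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define NLINES    8
    #define LINE_SIZE 16     /* bytes per line */

    struct line {
        bool     valid, dirty;
        uint32_t tag;
        uint8_t  data[LINE_SIZE];
    };

    static struct line cache[NLINES];
    static uint8_t memory[1 << 16];      /* 64K of pretend main memory */

    /* Look up addr (must be < 64K); on a miss, write back the old line
     * if dirty, then fill the whole line from memory. */
    static uint8_t read_byte(uint32_t addr)
    {
        uint32_t idx = (addr / LINE_SIZE) % NLINES;
        uint32_t tag = addr / (LINE_SIZE * NLINES);
        struct line *l = &cache[idx];

        if (!l->valid || l->tag != tag) {        /* miss */
            if (l->valid && l->dirty)            /* write-back on eviction */
                memcpy(&memory[(l->tag * NLINES + idx) * LINE_SIZE],
                       l->data, LINE_SIZE);
            memcpy(l->data, &memory[addr - addr % LINE_SIZE], LINE_SIZE);
            l->valid = true;
            l->dirty = false;
            l->tag   = tag;
        }
        return l->data[addr % LINE_SIZE];
    }

    /* Write-allocate: fetch the line, modify it in the cache, and only
     * mark it dirty - memory is updated later, at eviction time. */
    static void write_byte(uint32_t addr, uint8_t v)
    {
        read_byte(addr);                         /* ensure line is resident */
        uint32_t idx = (addr / LINE_SIZE) % NLINES;
        cache[idx].data[addr % LINE_SIZE] = v;
        cache[idx].dirty = true;
    }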