25 March 2002
Operating Systems
--------
Read Nutt, Chapter 12
Virtual Memory, Paging, Segmentation

Exam Stats: (not including extra credit)
456: average 56.2, median 53, high 68, low 44
256: average 44.5, median 45, high 58, low 10
--------
Page replacement policies

When the OS runs out of available frames, it must choose a victim page:
  if the victim has been written, it must be written back first
  use a dirty bit, and only write back changed pages

global vs. local replacement
  global - choose any page in the system     - can manage system performance
  local  - a process can only page against
           its own pages                     - can manage process performance

Goal: lowest page fault rate

Optimal: replace the page that will not be used for the longest period of time

4 frames    1  2  3  4  1  2  5  1  2  3  4  5
           -----------------------------------
           *1  1  1  1  1  1  1  1  1  1 *4  4
              *2  2  2  2  2  2  2  2  2  2  2     6 page faults
                 *3  3  3  3  3  3  3  3  3  3
                    *4  4  4 *5  5  5  5  5  5

Cannot achieve optimal, but can use it to measure how well a replacement
scheme performs (similar to shortest job first in scheduling)

Random:
  + simple
  - does not perform well

FIFO:
  + simple
  - no correlation between time in memory and frequency of use

3 frames    1  2  3  4  1  2  5  1  2  3  4  5
           -----------------------------------
           *1  1  1 *4  4  4 *5  5  5  5  5  5
              *2  2  2 *1  1  1  1  1 *3  3  3     9 page faults
                 *3  3  3 *2  2  2  2  2 *4  4

4 frames    1  2  3  4  1  2  5  1  2  3  4  5
           -----------------------------------
           *1  1  1  1  1  1 *5  5  5  5 *4  4
              *2  2  2  2  2  2 *1  1  1  1 *5    10 page faults
                 *3  3  3  3  3  3 *2  2  2  2
                    *4  4  4  4  4  4 *3  3  3

Belady's anomaly: more frames does not necessarily mean fewer page faults

Least Frequently Used: count the number of references to each page,
replace the page with the lowest count

LRU: replace the page not used for the longest period of time
  an instance of a stack algorithm
  stack algorithms cannot suffer from Belady's anomaly:
    the set of pages in memory with n frames is a subset of the set of
    pages in memory with n+1 frames - the n pages in memory are those
    that would still be there if n+1 frames were available

4 frames    1  2  3  4  1  2  5  1  2  3  4  5
           -----------------------------------
           *1  1  1  1  1  1  1  1  1  1  1 *5
              *2  2  2  2  2  2  2  2  2  2  2     8 page faults
                 *3  3  3  3 *5  5  5  5 *4  4
                    *4  4  4  4  4  4 *3  3  3

how to implement LRU:
  COUNTER: each page has a counter; each time the page is referenced,
    copy the clock into the counter; replace the page whose counter
    holds the lowest (oldest) clock value
  STACK: keep a stack of page numbers in a doubly linked list;
    when a page is referenced, remove it from the stack and put it on top;
    when replacing, take the victim from the tail pointer
  generally difficult to implement efficiently - requires hardware support;
  keeping "time of access" is difficult in hardware

Approximate LRU
  reference bit with each page - "not used recently" (an approximation)
    initially 0, set to 1 when the page is referenced
    cleared periodically by the OS
  replace a page with a 0 reference bit
    - we don't know which one among those to choose

General approach - keep pages in a linked list, scan for a victim
  1. scan when we run out of free frames
  2. keep a free frame pool (e.g., 1/4 of memory)
     mainly useful for dirty page writing:
       can reclaim a frame without blocking processes -
       a process need not wait while dirty pages are written;
       the writes are "effectively" done in the background

how to choose frames for the free frame pool:
  use (ref bit, dirty bit) pairs
    (0,0) (0,1) (1,0) (1,1)   best -> worst victim
  pick up where the last scan left off, find enough victims, then stop

CLOCK: if the ref bit is 1, clear it and move on; take the first page
  with a 0 ref bit (or the first (0,0) pair) - see the sketch below

scheduling the scans:
  too infrequent - all ref bits are set again by the next scan
  too frequent - too many frames to choose from, may not make a good choice

Can tune the victim rate by using a TWO-HANDED CLOCK
  introduced in Berkeley Unix
  two hands separated by some constant number of frames
    lead hand clears reference bits
    trailing hand inspects ref bits
  intuition: with a really big memory and only one hand, most bits will
  always be on; two hands shrink the effective memory to the number of
  frames between the hands - outside that window, bits are mostly on

Useful:
  1. if you keep track of what is in the free frame pool, a page still
     there can be reused without reloading it
  2. write dirty pages in the background
     - with a temporarily reduced free frame pool, the faulting process
       can be restarted as soon as possible
     - or write them so they don't have to be written later; they will
       already be clean when chosen as victims
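A minimal sketch in C of the one-handed clock scan described above.
NFRAMES and the ref_bit array are stand-ins for the real per-frame
hardware reference bits, which a kernel would read out of page table
entries; the two-handed variant adds a second, leading hand that clears
bits a fixed distance ahead of this one.

    #include <stdbool.h>
    #include <stddef.h>

    #define NFRAMES 64

    static bool   ref_bit[NFRAMES];  /* set by "hardware" on each access */
    static size_t hand = 0;          /* clock hand: resumes where it left off */

    /* Advance the hand until a frame with ref bit 0 is found.
     * A frame with ref bit 1 gets a second chance: clear the bit
     * and move on.  Terminates because after one full sweep every
     * bit has been cleared. */
    size_t clock_pick_victim(void)
    {
        for (;;) {
            if (!ref_bit[hand]) {
                size_t victim = hand;
                hand = (hand + 1) % NFRAMES;  /* pick up here next time */
                return victim;
            }
            ref_bit[hand] = false;            /* second chance */
            hand = (hand + 1) % NFRAMES;
        }
    }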
-------------------------
What if the hardware is not helpful?
  cannot restart instructions in hardware - no demand paging
  no dirty bit - assume all pages are dirty, write them all back
    - not common to see this in systems
  no reference bit - simulate it in software - slow!
-------------------
Segmentation

internal fragmentation in paging schemes if we overestimate stack/heap

user view: a collection of variable-size segments
  special registers hold the base of (in Intel 80x86):
    code segment
    data segment
    stack segment
    extra segment - if needed
  identify elements within a segment by offset
    address = segment name/number : offset

segment table, maintained for each program:
  seg num -> base, length (limit) for each accessible segment

problem: memory allocation leads to external fragmentation
  fixed by paged segmentation

segmented paging - start with paging, then logically segment
  pure paging: user thinks memory is contiguous; internal fragmentation
  comes from overestimating dynamic structures
  segmented paging:
  1. logically distinct, dynamically sized portions of the address space
     are given virtual addresses a long way apart - they will never run
     into one another
  2. the page table is implemented so that big unused sections do not
     cost much: do not use a linear array; use trees or linked lists

paged segmentation - start with segmentation, then page it
  segmentation review: dynamic space management to allocate physical
  memory leads to external fragmentation
  idea: page the segments
  segment table entries indicate how to find the page table for the segment
    Multics: separate page table for each segment
    i386: single page table for each process; the segment table lookup
    provides an offset into the page table
-------
Shared libraries

save disk space - no copies in every executable
save memory space - no copies in each running process
upgrade libraries without recompilation - upgrades happen automatically

position-independent compilation:
  1. all internal code references are PC-relative;
     data references are base-register relative
  2. external references are made through indirection tables private
     to each process; the tables are placed with the process-specific
     internal data
  data and code kept separate - no data in the code, so code can be
  shared among processes
  references from a shared library to the main program are not allowed
-------
Swap space: provide the best throughput for the memory management system

use:
  hold the entire image of a swapped-out process
  hold pages not in memory

option (unix): place swap space on multiple disks to spread out
  i/o requests and increase throughput

location:
  separate partition - optimized for speed, not storage efficiency
  same partition as user files - inefficient - long access times

management: allocate enough when the process starts
  kernel tracks allocation in a swap map, two per process (text/data)
  text - fixed size: all but the last chunk the same size
    e.g., 512K chunks, the last one 70K
  data - grows over time
    the map is of fixed size, with blocks of varying size:
    block i is 2^i x 16K (minimum size set by the operator)
    16 32 64 128 256 ... - grow by doubling up to a maximum
    (see the sketch below)
  solaris option: allocate swap space only when a page is actually
  paged out of memory
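A hypothetical C sketch of the growing swap-map block sizing just
described: block i is 2^i x 16K, doubling up to an operator-set
maximum.  The 16K minimum and 256K maximum here are illustrative
values, not taken from any particular kernel.

    #include <stdio.h>

    #define MIN_KB 16    /* operator-set minimum block size */
    #define MAX_KB 256   /* operator-set maximum block size */

    /* size in KB of the i-th swap block: 2^i * MIN_KB, capped at MAX_KB */
    static unsigned block_kb(unsigned i)
    {
        unsigned kb = MIN_KB;
        while (i-- > 0 && kb < MAX_KB)
            kb <<= 1;
        return kb;
    }

    /* KB of swap reserved to cover a data segment of data_kb:
     * keep taking the next (doubling) block until it fits */
    static unsigned swap_reserved_kb(unsigned data_kb)
    {
        unsigned total = 0;
        for (unsigned i = 0; total < data_kb; i++)
            total += block_kb(i);
        return total;
    }

    int main(void)
    {
        /* a 700K data segment takes 16+32+64+128+256+256 = 752K */
        printf("%u KB reserved\n", swap_reserved_kb(700));
        return 0;
    }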
-------
Caches

caches sit between the processor and memory; the TLB is one form of cache
  fully associative - expensive to find index x - fewer than 100 entries
  caches have thousands of entries, so they are set associative:
    index x might be in one of several locations
cache entry: a line (bigger than a word, smaller than a page)
  on a cache miss, bring in the whole line
works because of
  1. temporal locality
  2. spatial locality

typically the data cache is separate from the instruction cache

  primary cache   - want 2-cycle access time, 8-64K
  secondary cache - 10 cycles, 256K-1M
  off-chip cache  - 50 cycles, a few MB
  main memory     - 100 cycles or more

cache concerns: a write to the cache needs to get back to memory
  write-through: write immediately to memory - simple, but more bus traffic
  write-back: keep dirty bits; a line is written back only when the
    cache entry is replaced (see the sketch below)

coherence among multiprocessors: snoop for writes
  a write to line x evicts x from the other caches
  snooping only on the secondary cache works if the primary cache is
  write-through; the secondary cache then evicts items from the
  primary cache
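To make the write-back bookkeeping concrete, here is a toy direct-mapped,
write-back cache in C.  The sizes and the backing memory array are
invented for illustration (a real cache is hardware, not code); a
write-through cache would instead store to memory on every write_byte
and never need the dirty bit.

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define NLINES    8
    #define LINE_SIZE 16     /* bytes per line */

    struct line {
        bool     valid, dirty;
        uint32_t tag;
        uint8_t  data[LINE_SIZE];
    };

    static struct line cache[NLINES];
    static uint8_t memory[1 << 16];      /* 64K of pretend main memory */

    /* Look up addr (must be < 64K); on a miss, write back the old line
     * if dirty, then fill the whole line from memory. */
    static uint8_t read_byte(uint32_t addr)
    {
        uint32_t idx = (addr / LINE_SIZE) % NLINES;
        uint32_t tag = addr / (LINE_SIZE * NLINES);
        struct line *l = &cache[idx];

        if (!l->valid || l->tag != tag) {        /* miss */
            if (l->valid && l->dirty)            /* write-back on eviction */
                memcpy(&memory[(l->tag * NLINES + idx) * LINE_SIZE],
                       l->data, LINE_SIZE);
            memcpy(l->data, &memory[addr - addr % LINE_SIZE], LINE_SIZE);
            l->valid = true;
            l->dirty = false;
            l->tag   = tag;
        }
        return l->data[addr % LINE_SIZE];
    }

    /* Write-allocate: fetch the line, modify it in the cache, and only
     * mark it dirty - memory is updated later, at eviction time. */
    static void write_byte(uint32_t addr, uint8_t v)
    {
        read_byte(addr);                         /* ensure line is resident */
        uint32_t idx = (addr / LINE_SIZE) % NLINES;
        cache[idx].data[addr % LINE_SIZE] = v;
        cache[idx].dirty = true;
    }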