20 March 2002
Operating Systems

--------
Read Nutt, Chapter 12
Virtual Memory, Paging, Segmentation
--------

Virtual Memory
    when to do the address mapping between a virtual address and a physical address

Paging
    Typical page sizes: 4, 8, 16 K
        on the Solaris systems in the department: 8 K
        smaller pages have less internal fragmentation but more overhead
    Keep a page table per process
        maps from page to frame
        each logical address is divided into two parts: page number : offset
        translation turns this into frame number : offset
    Need fast page table access:
        hardware support: some dedicated registers
            + quick
            - expensive for large page tables (typical page tables may have 1 million entries)
        keep the page table in main memory
            a page table base register points to the page table
            changing page tables only changes this pointer
            - increases translation time: index into the page table, then go to memory,
              so each memory access becomes two memory accesses

Translation lookaside buffers (TLB)
    aka. associative registers
    each entry is key : value
        key   - page number
        value - frame number
    the TLB searches all entries in parallel and outputs the value (frame number) on a hit
        fast: when found, only about 10% overhead, not 2x as before
        when not found, do the full page table lookup and update the TLB for next time
    flush the TLB each time we switch to another process (its entries belong to one address space)

What if the page table gets really big?
    suppose logical addresses are 32 bits (2^32 bytes of virtual address space)
    if the page size is 4 K, the page table has 1 million entries (2^32 / 2^12 = 2^20)
    if each page table entry is 4 bytes, the page table is 4 MB!!

Simple solution: Multi-level paging
    page the page table
    a logical address (on a 32-bit machine with 4 K pages) divides into three parts:

        | 10 bits | 10 bits | 12 bits |
             a         b         c

        a and b together form the page number, c is the page offset
        a is the index into the "outer page table"
        b is the index into the "inner page table"
    can extend the number of levels arbitrarily, but with a cost:
        each additional level is another memory access per translation

Another idea: Inverted page tables
    attacks the resource requirements of large page tables
    in reality, most entries of a large page table are empty (most pages are not loaded)
    big idea: keep a table with only the mapped entries
        total space is one entry per real frame of memory
        each entry holds the virtual address of the page stored in that frame
        (with info about which process owns it)
    less space, but more search time per memory reference
        use a hash table to limit the search to "a few" entries
        (the hash table may need to be about 2x the number of real frames)

Paging and memory protection
    can protect/share each page of information
    associate protection bits in the page table with each frame
        specify read/write access
        illegal accesses are trapped by the OS

Shared pages
    shared code
        one copy of read-only code, e.g., text editors, compilers, window systems
    private code and data
        each process keeps a separate copy of its code and data

------------

Now, how do we get pages into memory?
    two places: initialization and on a page miss
    we can treat initialization as a page miss:
        the first access to the program generates a page fault
        we bring the page into memory
        never loads pages that are not needed
        pure demand paging
    Prepaging: preload certain pages
        guess initially
        can guess well when you swap a process back into memory
    Note: paging is NOT the same as swapping

Page replacement policies
    when the OS runs out of available frames, choose a victim page
        if the victim has been written, it must go back to disk; keep a dirty bit so only
        changed pages are written back
    global vs. local replacement
        global: may choose any page as the victim; can manage system-wide performance
        local:  may only replace the process's own pages; can manage per-process performance
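A minimal sketch of this victim-selection step, in C. All names here (struct frame, choose_victim, write_back, read_in) are made up for illustration, not any real kernel's API; a round-robin choose_victim() is used only as a placeholder. The point is where the dirty bit saves a disk write.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NFRAMES 8

    /* Hypothetical per-frame bookkeeping for this sketch. */
    struct frame {
        uint32_t page;     /* virtual page currently held              */
        bool     dirty;    /* set by hardware when the page is written */
    };

    static struct frame frames[NFRAMES];

    /* Stand-in for the replacement policy (simple round robin here);
     * the next section compares real choices for this function. */
    static int choose_victim(void)
    {
        static int next = 0;
        return next++ % NFRAMES;
    }

    /* Stubs for the disk traffic, so the sketch runs as a normal program. */
    static void write_back(const struct frame *f) { printf("write back page %u\n", f->page); }
    static void read_in(uint32_t page)            { printf("read in page %u\n", page); }

    /* Handle a fault on `page` when no frame is free. */
    static void replace_page(uint32_t page)
    {
        int v = choose_victim();      /* global: any frame; local: only own frames */

        if (frames[v].dirty)          /* dirty bit: only changed pages */
            write_back(&frames[v]);   /* are written back to disk      */

        read_in(page);
        frames[v].page  = page;
        frames[v].dirty = false;
    }

    int main(void)
    {
        frames[0].page = 7; frames[0].dirty = true;   /* pretend page 7 was modified */
        replace_page(42);                             /* fault on page 42: 7 is written back */
        return 0;
    }

The replacement policies compared next are, in effect, different ways of implementing choose_victim().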
Goal: lowest page fault rate

Optimal: replace the page that will not be used for the longest period of time

    4 frames:

        ref:  1   2   3   4   1   2   5   1   2   3   4   5
             ------------------------------------------------
        f1:  *1   1   1   1   1   1   1   1   1   1  *4   4
        f2:      *2   2   2   2   2   2   2   2   2   2   2
        f3:          *3   3   3   3   3   3   3   3   3   3
        f4:              *4   4   4  *5   5   5   5   5   5

        6 page faults

    cannot achieve optimal, but can use it to measure how well a replacement scheme performs
    (similar to shortest job first in scheduling)

Random:
    + simple
    - does not perform well

FIFO:
    + simple
    - no correlation between time in memory and frequency of use

    3 frames:

        ref:  1   2   3   4   1   2   5   1   2   3   4   5
             ------------------------------------------------
        f1:  *1   1   1  *4   4   4  *5   5   5   5   5   5
        f2:      *2   2   2  *1   1   1   1   1  *3   3   3
        f3:          *3   3   3  *2   2   2   2   2  *4   4

        9 page faults

    4 frames:

        ref:  1   2   3   4   1   2   5   1   2   3   4   5
             ------------------------------------------------
        f1:  *1   1   1   1   1   1  *5   5   5   5  *4   4
        f2:      *2   2   2   2   2   2  *1   1   1   1  *5
        f3:          *3   3   3   3   3   3  *2   2   2   2
        f4:              *4   4   4   4   4   4  *3   3   3

        10 page faults

    Belady's anomaly: more frames does not necessarily mean fewer page faults

Least Frequently Used (LFU):
    count the number of references to each page; replace the page with the lowest count

LRU: replace the page that has not been used for the longest period of time
    an instance of a stack algorithm
        stack algorithms cannot suffer from Belady's anomaly:
        the set of pages in memory with n frames is always a subset of the set of pages
        that would be in memory with n+1 frames

    4 frames:

        ref:  1   2   3   4   1   2   5   1   2   3   4   5
             ------------------------------------------------
        f1:  *1   1   1   1   1   1   1   1   1   1   1  *5
        f2:      *2   2   2   2   2   2   2   2   2   2   2
        f3:          *3   3   3   3  *5   5   5   5  *4   4
        f4:              *4   4   4   4   4   4  *3   3   3

        8 page faults

    how to implement:
        COUNTER
            each page has a counter; each time the page is referenced, copy the clock into its counter
            replace the page whose counter holds the lowest clock value
        STACK
            keep a stack of page numbers in a doubly linked list
            when a page is referenced, remove it from the stack and put it on top
            when replacing, use the tail pointer (the bottom of the stack is the LRU page)
    generally difficult to implement efficiently; requires hardware support
        keeping "time of access" is difficult to do in hardware
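As a check on the counter scheme (and on the 8-fault figure in the LRU table above), here is a small, self-contained C simulation; the names and layout are illustrative only. It copies a logical clock into a per-frame stamp on every reference and evicts the frame with the lowest stamp.

    #include <stdio.h>

    /* Counter-style LRU simulation over the reference string used above.
     * Purely illustrative: a real kernel cannot afford this per-reference
     * bookkeeping without hardware help, which is the point made above. */

    #define NFRAMES 4

    int main(void)
    {
        int refs[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
        int nrefs  = sizeof refs / sizeof refs[0];

        int  page[NFRAMES];     /* page loaded in each frame (-1 = empty)      */
        long stamp[NFRAMES];    /* "clock" value copied in at last reference   */
        long clock = 0;
        int  faults = 0;

        for (int f = 0; f < NFRAMES; f++) { page[f] = -1; stamp[f] = -1; }

        for (int i = 0; i < nrefs; i++) {
            int p = refs[i], hit = -1, victim = 0;

            for (int f = 0; f < NFRAMES; f++)
                if (page[f] == p) hit = f;

            if (hit < 0) {                             /* page fault            */
                faults++;
                for (int f = 1; f < NFRAMES; f++)      /* pick lowest stamp:    */
                    if (stamp[f] < stamp[victim])      /* least recently used   */
                        victim = f;                    /* (empty frames first)  */
                page[victim] = p;
                hit = victim;
            }
            stamp[hit] = clock++;                      /* copy clock into the counter */
        }

        printf("%d page faults with %d frames\n", faults, NFRAMES);  /* prints 8 */
        return 0;
    }

With the same reference string and 4 frames it reports 8 page faults; doing this stamp update on every memory reference is exactly the cost that makes hardware support, or an approximation of LRU, necessary in practice.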