Project 4/5 - Shared Memory

The goal of these projects is to expose you to both memory mapped files and communication using shared memory both on a single host and across multiple hosts.

Due Dates

This is a five-week, two-part assignment. Each part will count as a separate, full project grade.

Part I is due on Monday, April 14, 2001. 11:59pm.

Part II is due on Friday, May 2 at 11:59pm.

Read the project description CAREFULLY to see the differences between the 256 and 456 versions.

You can work either alone or in a group of 2. I STRONGLY encourage you to work in pairs. It will make your life easier, and working with someone is a good experience for you to have.

Reminder: NO LATE ASSIGNMENTS. Do not turn in an assignment late. Turn in whatever you have working. Partial work does count, as long as you make it clear in your write-up what you have accomplished, and where you got stuck.

Instructions for submitting the assignment are available online. If you encounter problems, contact the TA.

Resources

There are several resources you can access to make your life a little bit easier during this project.

Gautam Altekar has provided (with no guarantees) a library of synchronization primitives to use on the Linux systems. synch32.cpp and synch32.h
The library has several copies of the Stevens book UNIX Network Programming, Vol. 2
Both of the ug labs as well as the grad lab have copies three chapters from the Stevens book. Please use these as reference, and DO NOT remove them from the lab. You may temporarily remove the copy from the lab in order to make your own personal copy, but please do not take a set to make a copy unless a copy remains in the lab. If you find that there is NOT a copy in the lab, notify the TA or instructor.
man mmap
(for Part II) A Socket class in C++ with all the basic functionality (and a small TCP echo server for testing). Thanks to Grigoris Magklis for these routines.

Project Description

Part I (project 4)

In this part of the project, you will have multiple processes on a single host sharing information through a shared memory mapped segment. Your system will support one block of shared memory, which will be logically divided into multiple pages, each of which will contain multiple words. Your implementation should be flexible enough to support any (reasonable) power of two for the number of pages and number of words per page. You will provide two user-callable functions for access to the shared memory, int DSM_read(int i, int *v, int size) and int DSM_write(int i, int *v, size). Both functions will take an integer address i (index into the shared block, from which you can extract both the page and the offset in that page) and an integer pointer parameter v which will either receive an array of the read values or contain the values to be written, and size which will contain the number of values to be read/written. Both functions should return a success integer (1=success, 0=failure). An error should be thrown if the read/write crosses a page boundary. To work incrementally on this project, I suggest that you start with functions that read/write ONLY ONE word. The extension for multiple words amounts to obtaining a lock on each page prior to the performing the operation.

You will also provide two calls to create and destroy a block of shared memory (int DSM_create(int numPages, int pageSize) and int DSM_destroy()). As before, these calls should return an integer noting success/failure. In the create call, if the mapped segment already exists, the process simply binds to it, otherwise it is created fresh. For destroy, you can assume that only one of the sharing processes will call this function.

Your system must support strict consistency. That is, the value returned by each call DSM_read(i,v,s) must be the values v, of the most recent call DSM_write(i,v,s). In the case of a write, the writing process must first acquire ownership of the page being accessed, and ensure that no other process is concurrently reading the page.

For 456, you must allow multiple, concurrent readers. This is a place for extra credit for 256 students.

For 256 and 456, your system must be fair; meaning that a process that requests a read or a write, it must eventually be allowed to complete it's operation.

How you manage the exclusive access is up to you. Here are a couple of options. You can use a separate memory mapped segment to store shared variables which are manipulated using the atomic test-and-set or compare-and-swap operations which were used in the first programming assignment (or the library provided by Gautam above). These variables should be used to implement a user-level semaphore (be careful that you do maintain the fairness requirement). Another option is to figure out how to actually put a semaphore into shared memory so that it is available to multiple processes. There are examples of this in the Stevens book chapters.

All memory should be initialized to 0 (zero).

You must support up to 8 processes. IMPORTANT NOTE: All students must support up to 8 processes (not just 456). If you have significant problems with this, come talk with the TA or instructor. As always, start simple, and add the more complex features later.

You need not support any sort of persistence of the memory segment after the processes complete. For simplicity, you can also assume that the process which creates the memory segment is the last one to complete execution.

You should provide a test case which reads and writes the data in some pattern which shows that your have the correct access guarantees. Test cases provided by the TA are available!!! Instructions are included with the files. Note, these do not test all possible error conditions. Also, they do not test writing of multiple words with a single function call.

For extra credit, consider implementing fine grained locking. This means that if one process is writing to words 1..3 and another process is writing to words 5..10 of the same page, they could both proceed in parallel. You must clearly describe the technique that you employed, and how mutual exclusion is guaranteed.

For a SMALL amount of extra credit, consider implementing two additional operations, namely DSM_lock(i) and DSM_unlock(i) which have the effect of getting an exclusive lock on the segment indicated by the parameter i. To get extra credit for this, you MUST provide BOTH documentation and a test case which clearly shows that these operations are functioning correctly. You need not do any form of deadlock prevention, although this is something interesting to think about.

Part II. (project 5)

The main idea with this project is to extend your single host shared memory mechanism to allow processes on multiple hosts to share the same information. Basically, this is a simplified form of Distributed Shared Memory (DSM).

A DSM_write operation now must consider the fact that there may be multiple copies of the data on multiple hosts. Should the write operation invalidate those caches? Should the cache be updated immediately after the write? Make a choice, implement it, document it, and motivate it in your writeup.

For the distributed version, it will be convenient for each page to have a single owner process. Initially, a single process can own all of the pages. Each page may have a different owner, which can change during the execution. To simplify the task of locating the current owner of a particular page (to request access), you can assume a known location server, which is always informed of the current owner of each page, and is responsible for coordinating transfers of page ownership. The server needs to keep track of multiple concurrent request for page ownership (possibly for the same page), and ensure fairness (i.e., any process that requests to write some page eventually is able to do so).

For 256, you need only support two hosts. For 456, you must support up to four hosts. In both cases, you can assume that the identities of all hosts are known in advance.

You must still allow two processes on the same host to memory map the same piece of memory.

As always, your writeup should contain a description of the design of your project, and any assumptions you had to make in order to get the project working. If you have not completed the project, include a description of where you had problems, and what you would have done to fix them if you had more time.

Additionally, answer the following questions in your writeup. Note: the following questions will be worth 30% of your project grade. Answering them completely and well is in your best interest.

What gave you the biggest headache during this project? Describe what the problem was, why it was difficult, and what you did to overcome it and make your headache go away.
In the Interweave DSM system (developed by some members of the systems group here), processes can decide on many different models for data coherence (in other words, for deciding if the version of the data they have cached is "recent enough"). The simplest model, and the one you implemented in this project, is full coherence where every time a client reads from the memory, that memory is up to date. The opposite end of the spectrum is null coherence where the client uses whatever data is available, without regard for how up to date it is. If they want an up-to-date version, it must be explicitly requested by the client. Three models which fit somewhere "in between" are delta coherence (client data is no more than x versions out of date), temporal coherence (client data is no more than x time units out of date), and diff-based (client data is no more than x% different). All of these models are w.r.t. the version of the data on the server.
Why would a programmer want to use these different models? Briefly describe an application which would use each of these models. To give you a hint, you can think about caching web pages with a DSM system, keeping a calendar for scheduling a classroom, using DSM to share the output of a simulation back end with a visualization front end (e.g., a mobile ad hoc network simulation where you periodically monitor the traffic going through a specific node), etc. Be creative.
As already mentioned, your coherence model is "full coherence". In this model, you had a choice in your implementation of whether to invalidate any remote versions of a page when the write completed (essentially an active invalidation --- a push model) or to validate a page's accuracy when it was to be read. (essentially a lazy approach --- a pull model). Which model did you choose? Under what circumstances will one model out-perform the other? Would it ever be beneficial in a push model to also push the data? Why/why not? Would any of the test cases provided by the TA have benefited from this approach? If so, which ones and why. If not, why not.
Suppose we have a process (P1) which steps through shared pages, always making its next change to the data based on the previous values it has seen. This effectively creates an inherent dependence among the versions of the pages. For example, if P1 writes a value to page A, creating page (A2 -- page A, version 2), then to page B creating page B2, B2's data is dependent on the data in A2.

Suppose we have a second process (P2) which goes around reading pages. It has version A1 in its cache which is older than the one written by the above process. When P2 reads page B2, it gets a new value of the page (because P2 has specified full coherence for page B), and does some computation on the new data. When P2 goes to read page A, A1 is effectively old because the data already read in page B2 was based on data written since the cached copy of page A1. In other words, B2 is based on A2, but we only have A1, and it would be "wrong" to read A1 because it has been made invalid by the write to B2 done by P1.

How can our server can help us with this problem? Think about the fact that when a process writes a page, the most other data that page can depend on is other pages which that client has cached. Clearly if a process has not cached a page, then no data it writes can depend on that cache. Another way to phrase this question is: What information can the server keep so that when a process requests a page, the server will know the maximal set of page versions which should be invalidated? Hint: You will need to keep information with each page about the other pages it "might" depend on. The main question here is how you will keep this data up to date.
Suppose I'm a reckless programmer and I want to put a pointer into the segment that I share. We know this is a bad idea because we can have the memory segment mapped into different places in each process. But suppose instead of putting a raw pointer I put a relative pointer which says how far away from my 'current' location I should jump. What restrictions should be put on this relative pointer? Can it cross page boundaries? Why or why not? This is probably dependent on your implementation.

murphy@cs.rochester.edu