Project 1 - Working with Threads

The goal of this assignment is to expose you to thread creation and management as well as to understand thread synchronization issues. You are may choose to work in teams of 1 or 2. Your code should be written in C or C++ and will use the pthreads library.

Due Date

*SECOND* NEW DUE DATE Wed, Feb 19. NO EXTENSIONS. The multiprocessor machines are expected to be back up on Saturday afternoon. Leave only your testing until then. NEW DUE DATE Fri, Feb 14. The multiprocessor machines are expected to be back up on Wed. Leave only your testing until then.

Wed, February 12, 2003. 11:59pm (one minute before midnight). Instructions for turning in are available online. If you encounter problems, contact the TA.

Reminder: NO LATE ASSIGNMENTS. Do not turn in an assignment late. Turn in whatever you have working. Partial work does count, as long as you make it clear in your write-up what you have accomplished, and where you got stuck.

Don't forget to watch the newsgroup ur.cs456 for general communication about the project. Post your questions there and your peers, the TA and the instructor will post answers there as well.

Resources

There are several resources you can access to make your life a little bit easier during this project.

man -k pthread is a good place to start.

An Introduction to Programming with Threads. by A.D. Birrell. Has some nice, practical information.

Note: this link is notoriously not-always up. If you have trouble, Just do a web search to find an "up" version.

Getting Started With POSIX Threads by Tom Wagner and Don Towsley.

Project Description

You will write a properly synchronized, multi-threaded program to solve the traveling salesman problem (TSP).

256 version

Given an undirected graph with edge costs, the problem is to find the least cost tour (path) in a graph of N nodes that visits every node exactly once and ends at the same node at which it started.

As many of you know, finding an optimal solution is an NP-hard problem, and you will be solving it in a brute force way by trying all possible tours, making sure they are valid, and finding the one with the shortest length. Just because the problem is NP-hard does not mean that a solution cannot be found, just that your program that finds it will run very slowly! Take heart, you will not be asked to solve very big instances.

Your program will take two inputs: an ASCII file describing the graph and an integer T specifying the number of threads to be created. If no command-line options are provided, the user should be prompted for the correct parameters. The T threads that you generate will work together to find the best tour. You should use the version of the thread creation calls that give each new thread its own ``LWP'' (kernel-implemented lightweight process).

The input file will contain the integer N, followed by (N²-N)/2 integers that constitute the upper triangle of the adjacency matrix of the graph. Your program must accept arbitrary white space separation between integers. It is important that you use this input format. Two test cases are available: test1, test2

I suggest you solve the problem using a branch and bound algorithm. Think of the graph as defining a tree-structured search space, where every node of the tree represents a visited node of the graph, and the children of a node of the tree represent the nodes of the graph that might be visited next. Without loss of generality, assume we label the root of the tree node 0, indicating that our tour begins at node 0 of the graph. The root of the tree will have N-1 children, indicating the possible nodes to visit next:

The key to branch and bound is as follows. Suppose we keep track at each node of the tree of the cost of the path down from the root. If we have already (in some other part of the tree) found a complete tour of cost C, there is no point in exploring portions of the tree below a node whose cost from the root already exceeds C. We can therefore prune that part of the tree.

Perhaps the simplest way to parallelize the search is for the initial thread to explore the tree down to level 3 or 4 (at which there will be about N² or N³ nodes), and to place these nodes in a queue. (Note that you won't be able to keep the entire tree in memory: just the portion(s) you're working on.) The initial thread then creates T worker threads, each of which repeatedly removes a node from the queue and fleshes out the portion of the tree beneath it. (Clarification: There should be at most T threads operating at a single time, including the initial thread. If T=1, then either you may create a new thread to do the processing outlined above, or use the initial thread to do the processing. If T=2, you should create one thread, and optionally create a second (only if the initial thread does not do any processing).) A global structure keeps track of the cheapest tour found so far by any thread. Both the queue and the cheapest tour structure require synchronization for concurrent access.

Note: we have not yet covered synchronization issues in class, and it is likely that we will not cover the topic before this assignment is due. However, you only need to know user-level synchronization tools to complete this assignment. You should be able to glean this information from the available online documentation. If you do not have much experience with these issues, be sure to start early and ask questions.

You should use the pthreads library to create and synchronize threads. This interface is a Posix standard; code that uses it is portable across a wide variety of systems, including NT and all variants of Unix.

For 256 you should test your code on one of the cycle servers, as these systems have dual processors. Interestingly, you may discover that your multi-threaded program runs faster for some values of T > 1 than it does for T = 1, even on a uniprocessor. This is because the multi-threaded program explores the search space (tree) in a different order than the sequential algorithm does, and may (by sheer coincidence) discover a very low-cost path earlier than the sequential algorithm does, allowing it to prune the tree more effectively.

256 students, see below for information about the writeup.

456 version

In addition to completing the 256 assignment, you are to run your code on a large multiprocessor (aorta.cs.rochester.edu), experiment with a variety of program parameters, and attempt to maximize parallel speedup. Specifically, you are to vary the level of concurrency T, the level of multiprogramming (threads or processes per processor), and the type of synchronization. Warning: aorta runs solaris. While posix threads are supposed to be uniform across platforms, you may run into problems. You may choose to do all of your work on Solaris machines (e.g., heart.cs). In your writeup be sure to indicate what types of machines you have tested on. For full credit you need only work on one platform (solaris).

For synchronization, you must try at least the following two alternatives: (1) the mutex objects of the threads or pthreads library, (2) test-and-set spin locks, You may want to look at the following documentation from the SPARC v9 Architecture Manual:

NB: This documentation is copyright (c) 1994, SPARC International. It is available from the CS and CSUG networks only. Also note that several of the atomic instructions, including CAS are new in version 9 of the SPARC instruction set. You will need to specify the -xarch=v8plus and -Wa command-line switches to the compiler. (Thanks to M.S. for compiling this information). For an example of using TAS and CAS, see atomicTest.c

The level of multiprogramming on a processor depends both on the number of threads in your program and the number of unrelated processes on the machine at the same time. Ideally, you would want to run your final timing experiments when nobody else is using the machine, so more or less all of the activity on the machine would consist of programs you started yourself. This may not be feasible, given concurrent use by the research groups, so use the uptime, ps, and top commands to get an estimate of how much else is going on.

Multiprogramming is related in an important way to synchronization: if a thread is preempted while holding a lock, no other thread can access the protected data until the preempted thread is rescheduled. Intuitively, you would probably expect that spin locks would cause the most serious problems in this regard, and that spin-then-wait locks and lock-free data structures would ameliorate the problem. Do you find this to be the case in practice? How does scheduler-based (mutex) synchronization compare?

256 and 456 Writeup

You will turn in both your code, and a writeup of your project.

In the directory you turn in, you must have a text-only file called README, in which you will cover AT LEAST the following:

Your name (and your partner's name, if applicable) and the project number. If you interacted significantly with others but are not partners, indicate this as well.
A list of all files in the directory and a short description of each.
HOW TO COMPILE your program.
HOW TO USE (execute) your program.
A description of the structure of your program and the algorithms that you used.
In case you have not completed the project, you should mention in significant detail:
- what you have and have not done,
- why you did not manage to complete your project (e.g., greatest difficulties)
This will allow us to give you partial credit for the things you have completed.
Document any bugs of your program that you know of. Run-time errors will cost you fewer points if you document them and you show that you know their cause. Also describe what you would have done to correct them, if you had more time to work on your project.
A clear description of any extensions or special features of your project. This will be used for assigning extra credit.

Note: The last four items need not be contained in the README itself. Instead, they can be included in a ps, or pdf file contained in the directory. Do not submit a MS Word document. 456 students: Your writeup is expected to be a thourough discussion of the project, and such a discussion is likely to need more than pure text. i.e., expect to turn in a .ps or .pdf file.

Specifically for this project, your discussion should contain the results of your experiments, with clear presentation and explanation. Be sure to explain not only what happened, but why.

Also, be sure to indicate whether you did your development on Linux or Solaris. This is important as we WILL compile and execute your program.

Remember: No extensions.

Minor Hints

First, do not start at the last minute.
Do not start by building a large, monolithic system. Design your project carefully, so that your development is incremental. Test as you go. As questions if you get stuck. Talk with others, but do not copy code!

Extra Credit Suggestions

If you're in 256, try the 456 assignment. If you want to try your code on a system with 8 processors, AFTER you get your project working (and working on a solaris system, e.g., harley.csug), come talk with me, and I will help get you access to aorta.cs.
Develop heuristics to improve the efficiency of your search. For example, you might want to consider a greedy algorithm that sorts the children of a node to encourage faster pruning. Another example: the algorithm described above will explore a given sub-path through the graph a very large number of times. Can you cache information for some of these, to reduce redundant work?

Last Change: 23 January 2003 / murphy@cs.rochester.edu