The goal of this assignment is to expose you to thread creation and
management as well as to understand thread synchronization issues.
You are may choose to work in teams of 1 or 2. Your code should be
written in C or C++ and will use the pthreads
library.
Wed, February 12, 2003. 11:59pm (one minute before midnight). Instructions for turning in are available online. If you encounter problems, contact the TA.
Reminder: NO LATE ASSIGNMENTS. Do not turn in an assignment late. Turn in whatever you have working. Partial work does count, as long as you make it clear in your write-up what you have accomplished, and where you got stuck.
Don't forget to watch the newsgroup ur.cs456
for general
communication about the project. Post your questions there and your peers,
the TA and the instructor will post answers there as well.
man -k pthread
is a good place to start.
Note: this link is notoriously not-always up. If you have trouble, Just do a web search to find an "up" version.
Given an undirected graph with edge costs, the problem is to find the least cost tour (path) in a graph of N nodes that visits every node exactly once and ends at the same node at which it started.
As many of you know, finding an optimal solution is an NP-hard problem, and you will be solving it in a brute force way by trying all possible tours, making sure they are valid, and finding the one with the shortest length. Just because the problem is NP-hard does not mean that a solution cannot be found, just that your program that finds it will run very slowly! Take heart, you will not be asked to solve very big instances.
Your program will take two inputs: an ASCII file describing the
graph and an integer T
specifying the number of threads
to be created. If no command-line options are provided, the user
should be prompted for the correct parameters. The T
threads that you generate will work together to find the best tour.
You should use the version of the thread creation calls that give
each new thread its own ``LWP'' (kernel-implemented lightweight
process).
The input file will contain the integer N, followed by (N2-N)/2 integers that constitute the upper triangle of the adjacency matrix of the graph. Your program must accept arbitrary white space separation between integers. It is important that you use this input format. Two test cases are available: test1, test2
I suggest you solve the problem using a branch and bound algorithm. Think of the graph as defining a tree-structured search space, where every node of the tree represents a visited node of the graph, and the children of a node of the tree represent the nodes of the graph that might be visited next. Without loss of generality, assume we label the root of the tree node 0, indicating that our tour begins at node 0 of the graph. The root of the tree will have N-1 children, indicating the possible nodes to visit next:
The key to branch and bound is as follows. Suppose we keep track at each node of the tree of the cost of the path down from the root. If we have already (in some other part of the tree) found a complete tour of cost C, there is no point in exploring portions of the tree below a node whose cost from the root already exceeds C. We can therefore prune that part of the tree.
Perhaps the simplest way to parallelize the search is for the initial thread to explore the tree down to level 3 or 4 (at which there will be about N2 or N3 nodes), and to place these nodes in a queue. (Note that you won't be able to keep the entire tree in memory: just the portion(s) you're working on.) The initial thread then creates T worker threads, each of which repeatedly removes a node from the queue and fleshes out the portion of the tree beneath it. (Clarification: There should be at most T threads operating at a single time, including the initial thread. If T=1, then either you may create a new thread to do the processing outlined above, or use the initial thread to do the processing. If T=2, you should create one thread, and optionally create a second (only if the initial thread does not do any processing).) A global structure keeps track of the cheapest tour found so far by any thread. Both the queue and the cheapest tour structure require synchronization for concurrent access.
Note: we have not yet covered synchronization issues in class, and it is likely that we will not cover the topic before this assignment is due. However, you only need to know user-level synchronization tools to complete this assignment. You should be able to glean this information from the available online documentation. If you do not have much experience with these issues, be sure to start early and ask questions.
You should use the pthreads
library to create and
synchronize threads. This interface is a Posix standard; code that
uses it is portable across a wide variety of systems, including NT
and all variants of Unix.
For 256 you should test your code on one of the cycle servers, as these systems have dual processors. Interestingly, you may discover that your multi-threaded program runs faster for some values of T > 1 than it does for T = 1, even on a uniprocessor. This is because the multi-threaded program explores the search space (tree) in a different order than the sequential algorithm does, and may (by sheer coincidence) discover a very low-cost path earlier than the sequential algorithm does, allowing it to prune the tree more effectively.
256 students, see below for information about the writeup.
In addition to completing the 256 assignment, you are to run
your code on a large multiprocessor
(aorta.cs.rochester.edu
), experiment with a variety of
program parameters, and attempt to maximize parallel speedup.
Specifically, you are to vary the level of concurrency T, the level
of multiprogramming (threads or processes per processor), and the
type of synchronization. Warning: aorta runs solaris. While
posix threads are supposed to be uniform across platforms, you
may run into problems. You may choose to do all of your work on
Solaris machines (e.g., heart.cs). In your writeup be sure to
indicate what types of machines you have tested on. For full credit
you need only work on one platform (solaris).
For synchronization, you must try at least the following two
alternatives: (1) the mutex
objects of the
threads
or pthreads
library, (2)
test-and-set spin locks, You may want to look at the following
documentation from the SPARC v9 Architecture Manual:
Test-and-set
instruction (actually LDSTUB
:
load-store unsigned byte)
Compare-and-swap
(CAS
) instruction
MEMBAR
instruction
CAS
are new in version 9 of the
SPARC instruction set. You will need to specify the
-xarch=v8plus
and -Wa
command-line switches to the
compiler. (Thanks to M.S. for compiling this information). For an example of
using TAS and CAS, see atomicTest.c
The level of multiprogramming on a processor depends both on the
number of threads in your program and the number of unrelated
processes on the machine at the same time. Ideally, you would want
to run your final timing experiments when nobody else is using the
machine, so more or less all of the activity on the machine would
consist of programs you started yourself. This may not be feasible,
given concurrent use by the research groups, so use the
uptime
, ps
, and top
commands
to get an estimate of how much else is going on.
Multiprogramming is related in an important way to
synchronization: if a thread is preempted while holding a lock, no
other thread can access the protected data until the preempted
thread is rescheduled. Intuitively, you would probably expect that
spin locks would cause the most serious problems in this regard, and
that spin-then-wait locks and lock-free data structures would
ameliorate the problem. Do you find this to be the case in
practice? How does scheduler-based (mutex
)
synchronization compare?
You will turn in both your code, and a writeup of your project.
In the directory you turn in, you must have a text-only file called README, in which you will cover AT LEAST the following:
Specifically for this project, your discussion should contain the results of your experiments, with clear presentation and explanation. Be sure to explain not only what happened, but why.
Also, be sure to indicate whether you did your development on Linux or Solaris. This is important as we WILL compile and execute your program.
Remember: No extensions.