CSC 173   Thurs. Nov 14, 2002
---------------------------------------
Project 4 due on Friday

========================================================================
------------------------------------------------------------------------

Comparison of Implementations

Consider a graph G with N nodes and E edges (0 <= E <= N^2).

* Adjacency matrix
  o Is (u,v) an edge? -- O(1)
  o Successors(u) -- O(N)
  o Predecessors(u) -- O(N)
  o Space -- O(N^2) bits to store the matrix
  o Best for dense graphs (E ~= N^2)

* Adjacency lists
  o Is (u,v) an edge? -- O(E/N) on average
  o Successors(u) -- O(E/N) on average
  o Predecessors(u) -- O(N+E)
  o Space -- O(N+E)
  o Best for sparse graphs (E << N^2)

========================================================================

Searching a Graph

Many problems can be described using graphs, where the solution to the
problem requires that we search the graph, looking for nodes (or paths)
with a certain property.

Two important graph exploration techniques are

* breadth-first search: like breadth-first search in a tree, we search
  as broadly as possible by visiting a node, and then immediately
  visiting all nodes adjacent to that node.

* depth-first search: like depth-first search in a tree, we search as
  deeply as possible by visiting a node, and then recursively
  performing depth-first search on each adjacent node.

In both algorithms we keep track of the nodes we've already seen, and
decline to visit a node twice.  If the graph is not connected, we may
need to employ multiple starting points in order to explore it all.

Depth-first search is naturally recursive.  Breadth-first search
requires a queue.

------------------------------------------------------------------------

Breadth-First Search

The Algorithm

    BFS(vertex u)
        queue Q
        u.marked = true
        // do whatever is appropriate upon first visiting u
        Q.enqueue(u)
        while not Q.empty()
            v = Q.dequeue()
            for all neighbors w of v
                if not w.marked
                    w.marked = true
                    // do whatever is appropriate upon first visiting w
                    Q.enqueue(w)

    main
        for all nodes u
            u.marked = false
        for all nodes u
            if not u.marked
                BFS(u)

------------------------------------------------------------------------

Analysis of the Algorithm

Each vertex is placed in the queue once, so the while loop in BFS is
executed at most N times.  Same for the two loops in main.

Each edge is examined once in the "for all neighbors" loop, whose body
is therefore executed at most E times.

Assuming we maintain head and tail pointers for the queue, enqueue,
dequeue, and empty are all O(1).

The algorithm requires O(N + E) time.

------------------------------------------------------------------------

Depth-First Search

The Algorithm

    DFS(vertex u)
        u.marked = true
        // do whatever is appropriate upon first visiting u
        for all neighbors v of u
            if not v.marked
                DFS(v)

    main
        for all nodes u
            u.marked = false
        for all nodes u
            if not u.marked
                DFS(u)

------------------------------------------------------------------------

Analysis of the Algorithm

The number of calls to DFS is O(N), since we never call DFS on a marked
node, and we mark a node on entering DFS.  The total time spent
traversing adjacency lists in the for loop of DFS is O(E).

The algorithm requires O(N + E) time.

------------------------------------------------------------------------
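
To make the pseudocode concrete, here is a minimal C sketch of both
searches over an adjacency-list representation.  The fixed MAXN
capacity, the linked-list Edge type, and the printf standing in for
"do whatever is appropriate" are illustrative choices, not part of the
lecture.

    #include <stdio.h>
    #include <stdlib.h>

    #define MAXN 100                /* capacity bound; an arbitrary choice */

    /* adjacency lists: adj[u] is a linked list of u's successors */
    typedef struct Edge {
        int to;
        struct Edge *next;
    } Edge;

    Edge *adj[MAXN];
    int  marked[MAXN];
    int  N;                         /* number of nodes actually in use */

    void add_edge(int u, int v)     /* directed edge u -> v */
    {
        Edge *e = malloc(sizeof(Edge));
        e->to = v;
        e->next = adj[u];
        adj[u] = e;
    }

    /* BFS from u; an array suffices as the queue, since each node
       is enqueued at most once */
    void bfs(int u)
    {
        int queue[MAXN], head = 0, tail = 0;
        marked[u] = 1;
        printf("visit %d\n", u);    /* "do whatever is appropriate" */
        queue[tail++] = u;
        while (head < tail) {       /* queue not empty */
            int v = queue[head++];
            for (Edge *e = adj[v]; e != NULL; e = e->next)
                if (!marked[e->to]) {
                    marked[e->to] = 1;
                    printf("visit %d\n", e->to);
                    queue[tail++] = e->to;
                }
        }
    }

    /* DFS from u: recursion replaces the explicit queue */
    void dfs(int u)
    {
        marked[u] = 1;
        printf("visit %d\n", u);
        for (Edge *e = adj[u]; e != NULL; e = e->next)
            if (!marked[e->to])
                dfs(e->to);
    }

    int main(void)
    {
        N = 5;
        add_edge(0, 1); add_edge(0, 2); add_edge(1, 3);  /* node 4 isolated */
        for (int u = 0; u < N; u++)
            marked[u] = 0;
        for (int u = 0; u < N; u++) /* multiple starting points if needed */
            if (!marked[u])
                dfs(u);             /* bfs(u) would work equally well */
        return 0;
    }

The driver in main plays the role of the pseudocode's main: it tries
every node as a starting point, so disconnected graphs are fully
explored, and dfs and bfs are interchangeable there.

------------------------------------------------------------------------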
Depth-First Search Trees

Since we never visit a node twice, our exploration of a graph using DFS
resembles a tree.

* If DFS(v) causes a recursive call DFS(u), then u is a child of v in
  the tree.

* The children of v appear left to right in the tree in the order in
  which they are marked.

DFS(v) produces a depth-first search tree with node v at the root.

In some graphs it isn't possible to reach all nodes from a given start
node; that is, a single call to DFS may not visit all nodes in the
graph.  This is why the main program given above calls DFS for every
unmarked node in the graph.  Each such call produces a different
depth-first search tree.  In an undirected graph, these are the
connected components.  The main program thus produces a depth-first
search *forest* for the graph.

A DFS forest allows us to classify all edges in the graph.  Tree edges
end up in the DFS forest.  Back edges point from a node to an ancestor
in the forest.  Cross edges point from a node to something to the left.
(In a directed graph there may also be forward edges, which point from
a node to a descendant that is not a child; like tree and cross edges,
they cannot create the situation the cycle test below looks for.)

------------------------------------------------------------------------

It is sometimes handy to assign *postorder numbers* to the nodes of a
graph.  These are induced by a DFS forest: nodes are given numbers in
the order in which they are *last* visited by the DFS algorithm:

    postorder_DFS(vertex u, ref int nextnum)
        u.marked = true
        for all neighbors v of u
            if not v.marked
                postorder_DFS(v, nextnum)
        u.ponum = nextnum++

    postorder_main
        for all nodes u
            u.marked = false
        int nextnum = 1
        for all nodes u
            if not u.marked
                postorder_DFS(u, nextnum)

------------------------------------------------------------------------

Testing for Cycles

We can test for cycles during DFS by keeping two marks.  One says
whether a node has been visited.  The other says whether the node is on
the path from the root to the current node.  Alternatively, if we
already have postorder numbers, we can use them to test for cycles.

In either case, we look for an edge (u,v) in the graph such that v is
an ancestor of u in the search tree.  Such an edge represents a cycle:
follow the tree edges from v to u (which must be possible, since v is
an ancestor of u), then the back edge (u,v) to complete the cycle.
(In an undirected graph, we have to pay attention only to back edges
that go more than one level up -- going from u to v and then back to u
over the same edge doesn't constitute a cycle.)

Here's the direct algorithm:

    cycle_test_DFS(vertex u, p)
        // p is u's parent; needed only in the undirected case
        u.marked = true
        u.onpath = true
        for all neighbors v of u
            if v.onpath and (graph is directed or v != p)
                announce cycle
                halt
            if not v.marked
                cycle_test_DFS(v, u)
        u.onpath = false

    cycle_test_main
        for all nodes u
            u.marked = false
            u.onpath = false
        for all nodes u
            if not u.marked
                cycle_test_DFS(u, nil)
        announce no cycle

--------

If there is an edge (u,v) in E, and the postorder number of u is less
than or equal to the postorder number of v, the graph has a cycle:

* If you had visited u first, DFS(u) would have visited v and numbered
  it first (in postorder), making u.ponum > v.ponum.  So you must have
  visited v first.

* If DFS(v) did not visit u, then v would have been numbered before u,
  making v.ponum < u.ponum -- again a contradiction.

* So DFS(v) must have visited u, and the cycle consists of the path of
  tree edges produced by DFS(v) from v to u, followed by the edge (u,v).

Here's the alternative algorithm:

    cycle_test_alternate_main
        postorder_main()
        for all nodes u
            for all neighbors v of u
                if u.ponum <= v.ponum    // "=" catches self-loops
                    announce cycle
                    halt
        announce no cycle

------------------------------------------------------------------------
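
Here is one way the postorder-number version might look in C, combining
postorder_main and cycle_test_alternate_main for a directed graph.  The
adjacency-list representation repeats the earlier sketch, and the
3-cycle example is my own; the direct two-mark test could be written
analogously with an onpath array.

    #include <stdio.h>
    #include <stdlib.h>

    #define MAXN 100

    typedef struct Edge { int to; struct Edge *next; } Edge;

    Edge *adj[MAXN];
    int  marked[MAXN], ponum[MAXN], N, nextnum;

    void add_edge(int u, int v)          /* directed edge u -> v */
    {
        Edge *e = malloc(sizeof(Edge));
        e->to = v; e->next = adj[u]; adj[u] = e;
    }

    /* number nodes in the order they are *last* visited */
    void postorder_dfs(int u)
    {
        marked[u] = 1;
        for (Edge *e = adj[u]; e != NULL; e = e->next)
            if (!marked[e->to])
                postorder_dfs(e->to);
        ponum[u] = nextnum++;            /* all descendants numbered first */
    }

    int main(void)
    {
        N = 3;
        add_edge(0, 1); add_edge(1, 2); add_edge(2, 0);   /* a 3-cycle */

        nextnum = 1;
        for (int u = 0; u < N; u++) marked[u] = 0;
        for (int u = 0; u < N; u++)
            if (!marked[u])
                postorder_dfs(u);

        /* an edge (u,v) with ponum[u] <= ponum[v] is a back edge
           or a self-loop, and therefore closes a cycle */
        for (int u = 0; u < N; u++)
            for (Edge *e = adj[u]; e != NULL; e = e->next)
                if (ponum[u] <= ponum[e->to]) {
                    printf("cycle\n");
                    return 0;
                }
        printf("no cycle\n");
        return 0;
    }

------------------------------------------------------------------------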
Minimum Cost Spanning Tree

Let G = (V,E) be a connected graph in which every edge (u,v) in E has a
cost weight[u,v].

* A graph is connected if every pair of vertices is connected by a
  path.

A spanning tree for G is a free (unrooted) tree that connects all the
vertices in G.  The cost of the spanning tree is the sum of the costs
of all edges in the tree.  We usually want to find a spanning tree of
minimum cost.

Example applications:

* Computer networks -- vertices in the graph might represent computer
  installations, and edges represent connections between computers.  We
  want to allow messages from any computer to get to any other,
  possibly with routing through an intermediate computer, with minimum
  cost in connections.

* Trucking routes -- vertices in the graph are cities, and edges are
  courier delivery routes between cities.  We want to service a
  connected set of cities with minimum cost.

--------------

Kruskal's Algorithm -- Minimum Cost Spanning Tree

Kruskal's algorithm constructs a minimum cost spanning tree (MCST)
incrementally.  Initially, each node is in its own MCST, consisting of
that node and no edges.  At each step in the algorithm, the two MCSTs
that can be connected together with the least cost are combined, adding
the lowest cost edge that links a vertex in one tree with a vertex in
another.  When there is only one MCST that includes all vertices, the
algorithm terminates.

Since we must consider the edges in order of their cost, we must sort
them, which requires O(E log E) time.  The merging can also be done in
O(E log E) total time.

The key is being able to tell which tree a node is in.  We do this
using what are often called UNION-FIND trees.  These are maintained
separately from the trees we're gluing together to make the MCST.  The
UF tree for an initial, one-node set is trivial.  Non-trivial trees
have parent pointers but no child pointers; the root of a UF tree has a
null parent pointer.  When merging two UF trees, we make the root of
the shorter tree a child of the root of the taller tree.  This
guarantees that the height of a tree is worst-case logarithmic in the
number of nodes.

To tell whether two nodes are already connected by the partially
completed MCST, we follow UF parent pointers as far as we can and see
whether we end up at the same root.  If not, we add the edge between
the nodes to our MCST and merge the UF trees.  (A sketch of this
structure in C appears below.)

Kruskal's algorithm is O(E log E), which is better than Prim's
algorithm if the graph is not dense (i.e., E << N^2).
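
Here is a rough C sketch of the UNION-FIND trees just described, with
union by height, driven by the main loop of Kruskal's algorithm on a
small made-up edge list.  The array encoding (parent[u] == u marks a
root) is one common way to realize the parent pointers; the names and
the example graph are mine.

    #include <stdio.h>
    #include <stdlib.h>

    #define MAXN 100

    /* UNION-FIND trees: parent[u] == u marks a root.  Heights are
       tracked so the shorter tree always goes under the taller one,
       keeping every tree's height O(log N). */
    int parent[MAXN], height[MAXN];

    void uf_init(int n)
    {
        for (int u = 0; u < n; u++) {
            parent[u] = u;
            height[u] = 0;
        }
    }

    int uf_find(int u)              /* follow parent pointers to the root */
    {
        while (parent[u] != u)
            u = parent[u];
        return u;
    }

    void uf_union(int u, int v)     /* merge: shorter tree under taller */
    {
        int ru = uf_find(u), rv = uf_find(v);
        if (ru == rv) return;
        if (height[ru] < height[rv])
            parent[ru] = rv;
        else if (height[rv] < height[ru])
            parent[rv] = ru;
        else {
            parent[rv] = ru;
            height[ru]++;           /* equal heights: result grows by one */
        }
    }

    typedef struct { int u, v, w; } WEdge;

    int by_weight(const void *a, const void *b)
    {
        return ((const WEdge *)a)->w - ((const WEdge *)b)->w;
    }

    int main(void)                  /* Kruskal's algorithm */
    {
        WEdge e[] = { {0,1,4}, {1,2,1}, {0,2,2}, {2,3,7}, {1,3,3} };
        int E = 5, N = 4;
        qsort(e, E, sizeof(WEdge), by_weight);      /* O(E log E) sort */
        uf_init(N);
        for (int i = 0; i < E; i++)
            if (uf_find(e[i].u) != uf_find(e[i].v)) {  /* different trees? */
                printf("take edge (%d,%d), cost %d\n",
                       e[i].u, e[i].v, e[i].w);
                uf_union(e[i].u, e[i].v);
            }
        return 0;
    }

Each edge costs at most two finds and one union; with union by height
each find follows O(log N) parent pointers, so the union-find work fits
within the O(E log E) bound quoted above.

-------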
Prim's Algorithm

Initially, Prim's algorithm has one node in the spanning tree and no
edges.  The algorithm then adds nodes to the spanning tree one at a
time, in order of the cost of the edge connecting each to the nodes
already in the tree.  Note the superficial similarity to Dijkstra's
single-source shortest path (SSSP) algorithm.

Once a node joins U it must never be chosen again, and membership in U
cannot be inferred from lowcost alone (an infinite lowcost could also
mean a node simply has no edge to U yet), so each node carries an
explicit inU flag.

    PrimMCST(ref edge_set T)
        // T = set of edges in spanning tree
        int closest[N]     // closest[v] = vertex u in U closest to v
        int lowcost[N]     // lowcost[v] = weight[v, closest[v]]
        bool inU[N]        // U is node 0 and the nodes connected
                           // to it by edges of T
        T = empty
        inU[0] = true
        for i in 1..N-1
            lowcost[i] = weight[0,i]
            closest[i] = 0
            inU[i] = false
        N-1 times do
            // find the node closest to U and add it to U
            min = infinity
            for j in 1..N-1      // consider all nodes other than 0,
                if not inU[j] && lowcost[j] < min    // skipping those
                    min = lowcost[j]                 // already in U
                    k = j
            // k is now the node outside U closest to something in U;
            // add it to U
            T += {(closest[k], k)}
            inU[k] = true        // make sure we never choose k again
            for j in 1..N-1      // j may now be closer to U, via k
                if not inU[j] && weight[k,j] < lowcost[j]
                    lowcost[j] = weight[k,j]
                    closest[j] = k

------------------------------------------------------------------------

Analysis of Prim's Algorithm

Prim's algorithm is O(N^2).

The outer loop is executed N-1 times, and each iteration takes O(N)
time:

* We add one vertex to U each iteration.

* We exit the loop when U = V.

We find the lowest cost edge from U to V-U in O(N) time and, similarly,
update lowcost in O(N) time.

We might be able to reduce the constant overhead by using a slightly
more complicated data structure to keep track of which nodes are in
V-U, so we don't consider them at all in the two 'for' loops.  This
would not change the asymptotic complexity of the algorithm, however,
because 1 + 2 + 3 + ... + N is still O(N^2).
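
------------------------------------------------------------------------

As a concrete companion to the pseudocode, here is a runnable C sketch
of Prim's algorithm over an adjacency matrix.  The INF sentinel, the
printf reporting, and the example graph are illustrative choices, not
from the lecture.

    #include <stdio.h>

    #define N   5
    #define INF 1000000000          /* stands in for "infinity" */

    /* adjacency matrix: weight[u][v] is the edge cost, INF if no edge */
    int weight[N][N];

    void prim(void)
    {
        int lowcost[N], closest[N], inU[N];
        inU[0] = 1;                 /* U starts as {0} */
        for (int i = 1; i < N; i++) {
            lowcost[i] = weight[0][i];
            closest[i] = 0;
            inU[i] = 0;
        }
        for (int step = 0; step < N - 1; step++) {
            int k = -1, min = INF;
            for (int j = 1; j < N; j++)   /* closest node outside U */
                if (!inU[j] && lowcost[j] < min) {
                    min = lowcost[j];
                    k = j;
                }
            printf("take edge (%d,%d), cost %d\n", closest[k], k, min);
            inU[k] = 1;                   /* k joins U */
            for (int j = 1; j < N; j++)   /* j may now be closer via k */
                if (!inU[j] && weight[k][j] < lowcost[j]) {
                    lowcost[j] = weight[k][j];
                    closest[j] = k;
                }
        }
    }

    int main(void)
    {
        for (int u = 0; u < N; u++)
            for (int v = 0; v < N; v++)
                weight[u][v] = INF;
        /* undirected example graph: set both directions */
        int edges[][3] = { {0,1,4}, {0,2,2}, {1,2,1},
                           {1,3,3}, {2,4,6}, {3,4,5} };
        for (int i = 0; i < 6; i++) {
            weight[edges[i][0]][edges[i][1]] = edges[i][2];
            weight[edges[i][1]][edges[i][0]] = edges[i][2];
        }
        prim();
        return 0;
    }

On this example graph the sketch reports the four tree edges (0,2),
(2,1), (1,3), and (3,4), for a total cost of 11.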