CSC 173   Thurs. Nov 14, 2002
---------------------------------------
Project 4 due on Friday

========================================================================
------------------------------------------------------------------------

Comparison of Implementations

Consider a graph G with N nodes and E edges (0 <= E <= N^2).

* Adjacency matrix
  o Is (u,v) an edge? -- O(1)
  o Successors(u) -- O(N)
  o Predecessors(u) -- O(N)
  o Space -- O(N^2) bits to store the matrix
  o Best for dense graphs (E ~= N^2)

* Adjacency lists
  o Is (u,v) an edge? -- O(E/N) on average
  o Successors(u) -- O(E/N) on average
  o Predecessors(u) -- O(N+E)
  o Space -- O(N+E)
  o Best for sparse graphs (E << N^2)

========================================================================

Searching a Graph

Many problems can be described using graphs, where the solution to the
problem requires that we search the graph, looking for nodes (or paths)
with a certain property.

Two important graph exploration techniques are

* breadth-first search: like breadth-first search in a tree, we search
  as broadly as possible by visiting a node, and then immediately
  visiting all nodes adjacent to that node.

* depth-first search: like depth-first search in a tree, we search as
  deeply as possible by visiting a node, and then recursively
  performing depth-first search on each adjacent node.

In both algorithms we keep track of the nodes we've already seen, and
decline to visit a node twice.  If the graph is not connected, we may
need to employ multiple starting points in order to explore it all.

Depth-first search is naturally recursive.  Breadth-first search
requires a queue.

------------------------------------------------------------------------

Breadth-First Search

The Algorithm

    BFS(vertex u)
        queue Q
        u.marked = true
        // do whatever is appropriate upon first visiting u
        Q.enqueue(u)
        while not Q.empty()
            v = Q.dequeue()
            for all neighbors w of v
                if not w.marked
                    w.marked = true
                    // do whatever is appropriate upon first visiting w
                    Q.enqueue(w)

    main
        for all nodes u
            u.marked = false
        for all nodes u
            if not u.marked
                BFS(u)

------------------------------------------------------------------------

Analysis of the Algorithm

Each vertex is placed in the queue once, so the while loop in BFS is
executed at most N times.  Same for the two loops in main.

Each edge is examined once in the "for all neighbors" loop, whose body
is therefore executed at most E times.

Assuming we maintain head and tail pointers for the queue, enqueue,
dequeue, and empty are all O(1).

The algorithm requires O(N + E) time.

------------------------------------------------------------------------

Depth-First Search

The Algorithm

    DFS(vertex u)
        u.marked = true
        // do whatever is appropriate upon first visiting u
        for all neighbors v of u
            if not v.marked
                DFS(v)

    main
        for all nodes u
            u.marked = false
        for all nodes u
            if not u.marked
                DFS(u)

------------------------------------------------------------------------

Analysis of the Algorithm

The number of calls to DFS is O(N), since we never call DFS on a marked
node, and we mark a node on entering DFS.  The total time spent
traversing adjacency lists in the for loop of DFS is O(E).

The algorithm requires O(N + E) time.

------------------------------------------------------------------------
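
To make the pseudocode concrete, here is a minimal C sketch of both
searches over an adjacency-list representation.  The fixed MAXN
capacity, the linked-list Edge type, and the printf standing in for
"do whatever is appropriate" are illustrative choices, not part of the
lecture.

    #include <stdio.h>
    #include <stdlib.h>

    #define MAXN 100                /* capacity bound; an arbitrary choice */

    /* adjacency lists: adj[u] is a linked list of u's successors */
    typedef struct Edge {
        int to;
        struct Edge *next;
    } Edge;

    Edge *adj[MAXN];
    int  marked[MAXN];
    int  N;                         /* number of nodes actually in use */

    void add_edge(int u, int v)     /* directed edge u -> v */
    {
        Edge *e = malloc(sizeof(Edge));
        e->to = v;
        e->next = adj[u];
        adj[u] = e;
    }

    /* BFS from u; an array suffices as the queue, since each node
       is enqueued at most once */
    void bfs(int u)
    {
        int queue[MAXN], head = 0, tail = 0;
        marked[u] = 1;
        printf("visit %d\n", u);    /* "do whatever is appropriate" */
        queue[tail++] = u;
        while (head < tail) {       /* queue not empty */
            int v = queue[head++];
            for (Edge *e = adj[v]; e != NULL; e = e->next)
                if (!marked[e->to]) {
                    marked[e->to] = 1;
                    printf("visit %d\n", e->to);
                    queue[tail++] = e->to;
                }
        }
    }

    /* DFS from u: recursion replaces the explicit queue */
    void dfs(int u)
    {
        marked[u] = 1;
        printf("visit %d\n", u);
        for (Edge *e = adj[u]; e != NULL; e = e->next)
            if (!marked[e->to])
                dfs(e->to);
    }

    int main(void)
    {
        N = 5;
        add_edge(0, 1); add_edge(0, 2); add_edge(1, 3);  /* node 4 isolated */
        for (int u = 0; u < N; u++)
            marked[u] = 0;
        for (int u = 0; u < N; u++) /* multiple starting points if needed */
            if (!marked[u])
                dfs(u);             /* bfs(u) would work equally well */
        return 0;
    }

The driver in main plays the role of the pseudocode's main: it tries
every node as a starting point, so disconnected graphs are fully
explored, and dfs and bfs are interchangeable there.

------------------------------------------------------------------------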
Depth-First Search Trees

Since we never visit a node twice, our exploration of a graph using DFS
resembles a tree.

* If DFS(v) causes a recursive call DFS(u), then u is a child of v in
  the tree.

* The children of v appear left to right in the tree in the order in
  which they are marked.

DFS(v) produces a depth-first search tree with node v at the root.

In some graphs it isn't possible to reach all nodes from a given start
node; that is, a single call to DFS may not visit all nodes in the
graph.  This is why the main program given above calls DFS for every
unmarked node in the graph.  Each such call produces a different
depth-first search tree.  In an undirected graph, these are the
connected components.  The main program thus produces a depth-first
search *forest* for the graph.

A DFS forest allows us to classify all edges in the graph.  Tree edges
end up in the DFS forest.  Back edges point from a node to an ancestor
in the forest.  Cross edges point from a node to something to the left.
(In a directed graph there may also be forward edges, which point from
a node to a descendant that is not a child; like tree and cross edges,
they cannot create the situation the cycle test below looks for.)

------------------------------------------------------------------------

It is sometimes handy to assign *postorder numbers* to the nodes of a
graph.  These are induced by a DFS forest: nodes are given numbers in
the order in which they are *last* visited by the DFS algorithm:

    postorder_DFS(vertex u, ref int nextnum)
        u.marked = true
        for all neighbors v of u
            if not v.marked
                postorder_DFS(v, nextnum)
        u.ponum = nextnum++

    postorder_main
        for all nodes u
            u.marked = false
        int nextnum = 1
        for all nodes u
            if not u.marked
                postorder_DFS(u, nextnum)

------------------------------------------------------------------------

Testing for Cycles

We can test for cycles during DFS by keeping two marks.  One says
whether a node has been visited.  The other says whether the node is on
the path from the root to the current node.  Alternatively, if we
already have postorder numbers, we can use them to test for cycles.

In either case, we look for an edge (u,v) in the graph such that v is
an ancestor of u in the search tree.  Such an edge represents a cycle:
follow the tree edges from v to u (which must be possible, since v is
an ancestor of u), then the back edge (u,v) to complete the cycle.
(In an undirected graph, we have to pay attention only to back edges
that go more than one level up -- going from u to v and then back to u
over the same edge doesn't constitute a cycle.)

Here's the direct algorithm:

    cycle_test_DFS(vertex u, p)
        // p is u's parent; needed only in the undirected case
        u.marked = true
        u.onpath = true
        for all neighbors v of u
            if v.onpath and (graph is directed or v != p)
                announce cycle
                halt
            if not v.marked
                cycle_test_DFS(v, u)
        u.onpath = false

    cycle_test_main
        for all nodes u
            u.marked = false
            u.onpath = false
        for all nodes u
            if not u.marked
                cycle_test_DFS(u, nil)
        announce no cycle

--------

If there is an edge (u,v) in E, and the postorder number of u is less
than or equal to the postorder number of v, the graph has a cycle:

* If you had visited u first, DFS(u) would have visited v and numbered
  it first (in postorder), making u.ponum > v.ponum.  So you must have
  visited v first.

* If DFS(v) did not visit u, then v would have been numbered before u,
  making v.ponum < u.ponum -- again a contradiction.

* So DFS(v) must have visited u, and the cycle consists of the path of
  tree edges produced by DFS(v) from v to u, followed by the edge (u,v).

Here's the alternative algorithm:

    cycle_test_alternate_main
        postorder_main()
        for all nodes u
            for all neighbors v of u
                if u.ponum <= v.ponum    // "=" catches self-loops
                    announce cycle
                    halt
        announce no cycle

------------------------------------------------------------------------
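
Here is one way the postorder-number version might look in C, combining
postorder_main and cycle_test_alternate_main for a directed graph.  The
adjacency-list representation repeats the earlier sketch, and the
3-cycle example is my own; the direct two-mark test could be written
analogously with an onpath array.

    #include <stdio.h>
    #include <stdlib.h>

    #define MAXN 100

    typedef struct Edge { int to; struct Edge *next; } Edge;

    Edge *adj[MAXN];
    int  marked[MAXN], ponum[MAXN], N, nextnum;

    void add_edge(int u, int v)          /* directed edge u -> v */
    {
        Edge *e = malloc(sizeof(Edge));
        e->to = v; e->next = adj[u]; adj[u] = e;
    }

    /* number nodes in the order they are *last* visited */
    void postorder_dfs(int u)
    {
        marked[u] = 1;
        for (Edge *e = adj[u]; e != NULL; e = e->next)
            if (!marked[e->to])
                postorder_dfs(e->to);
        ponum[u] = nextnum++;            /* all descendants numbered first */
    }

    int main(void)
    {
        N = 3;
        add_edge(0, 1); add_edge(1, 2); add_edge(2, 0);   /* a 3-cycle */

        nextnum = 1;
        for (int u = 0; u < N; u++) marked[u] = 0;
        for (int u = 0; u < N; u++)
            if (!marked[u])
                postorder_dfs(u);

        /* an edge (u,v) with ponum[u] <= ponum[v] is a back edge
           or a self-loop, and therefore closes a cycle */
        for (int u = 0; u < N; u++)
            for (Edge *e = adj[u]; e != NULL; e = e->next)
                if (ponum[u] <= ponum[e->to]) {
                    printf("cycle\n");
                    return 0;
                }
        printf("no cycle\n");
        return 0;
    }

------------------------------------------------------------------------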
Minimum Cost Spanning Tree

Let G = (V,E) be a connected graph in which every edge (u,v) in E has a
cost weight[u,v].

* A graph is connected if every pair of vertices is connected by a
  path.

A spanning tree for G is a free (unrooted) tree that connects all the
vertices in G.  The cost of the spanning tree is the sum of the costs
of all edges in the tree.  We usually want to find a spanning tree of
minimum cost.

Example applications:

* Computer networks -- vertices in the graph might represent computer
  installations, and edges represent connections between computers.  We
  want to allow messages from any computer to get to any other,
  possibly with routing through an intermediate computer, with minimum
  cost in connections.

* Trucking routes -- vertices in the graph are cities, and edges are
  courier delivery routes between cities.  We want to service a
  connected set of cities with minimum cost.

--------------

Kruskal's Algorithm -- Minimum Cost Spanning Tree

Kruskal's algorithm constructs a minimum cost spanning tree (MCST)
incrementally.  Initially, each node is in its own MCST, consisting of
that node and no edges.  At each step in the algorithm, the two MCSTs
that can be connected together with the least cost are combined, adding
the lowest cost edge that links a vertex in one tree with a vertex in
another.  When there is only one MCST that includes all vertices, the
algorithm terminates.

Since we must consider the edges in order of their cost, we must sort
them, which requires O(E log E) time.  The merging can also be done in
O(E log E) total time.

The key is being able to tell which tree a node is in.  We do this
using what are often called UNION-FIND trees.  These are maintained
separately from the trees we're gluing together to make the MCST.  The
UF tree for an initial, one-node set is trivial.  Non-trivial trees
have parent pointers but no child pointers; the root of a UF tree has a
null parent pointer.  When merging two UF trees, we make the root of
the shorter tree a child of the root of the taller tree.  This
guarantees that the height of a tree is worst-case logarithmic in the
number of nodes.

To tell whether two nodes are already connected by the partially
completed MCST, we follow UF parent pointers as far as we can and see
whether we end up at the same root.  If not, we add the edge between
the nodes to our MCST and merge the UF trees.  (A sketch of this
structure in C appears below.)

Kruskal's algorithm is O(E log E), which is better than Prim's
algorithm if the graph is not dense (i.e., E << N^2).
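
Here is a rough C sketch of the UNION-FIND trees just described, with
union by height, driven by the main loop of Kruskal's algorithm on a
small made-up edge list.  The array encoding (parent[u] == u marks a
root) is one common way to realize the parent pointers; the names and
the example graph are mine.

    #include <stdio.h>
    #include <stdlib.h>

    #define MAXN 100

    /* UNION-FIND trees: parent[u] == u marks a root.  Heights are
       tracked so the shorter tree always goes under the taller one,
       keeping every tree's height O(log N). */
    int parent[MAXN], height[MAXN];

    void uf_init(int n)
    {
        for (int u = 0; u < n; u++) {
            parent[u] = u;
            height[u] = 0;
        }
    }

    int uf_find(int u)              /* follow parent pointers to the root */
    {
        while (parent[u] != u)
            u = parent[u];
        return u;
    }

    void uf_union(int u, int v)     /* merge: shorter tree under taller */
    {
        int ru = uf_find(u), rv = uf_find(v);
        if (ru == rv) return;
        if (height[ru] < height[rv])
            parent[ru] = rv;
        else if (height[rv] < height[ru])
            parent[rv] = ru;
        else {
            parent[rv] = ru;
            height[ru]++;           /* equal heights: result grows by one */
        }
    }

    typedef struct { int u, v, w; } WEdge;

    int by_weight(const void *a, const void *b)
    {
        return ((const WEdge *)a)->w - ((const WEdge *)b)->w;
    }

    int main(void)                  /* Kruskal's algorithm */
    {
        WEdge e[] = { {0,1,4}, {1,2,1}, {0,2,2}, {2,3,7}, {1,3,3} };
        int E = 5, N = 4;
        qsort(e, E, sizeof(WEdge), by_weight);      /* O(E log E) sort */
        uf_init(N);
        for (int i = 0; i < E; i++)
            if (uf_find(e[i].u) != uf_find(e[i].v)) {  /* different trees? */
                printf("take edge (%d,%d), cost %d\n",
                       e[i].u, e[i].v, e[i].w);
                uf_union(e[i].u, e[i].v);
            }
        return 0;
    }

Each edge costs at most two finds and one union; with union by height
each find follows O(log N) parent pointers, so the union-find work fits
within the O(E log E) bound quoted above.

-------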
Prim's Algorithm

Initially, Prim's algorithm has one node in the spanning tree and no
edges.  The algorithm then adds nodes to the spanning tree one at a
time, in order of the cost of the edge connecting each to the nodes
already in the tree.  Note the superficial similarity to Dijkstra's
single-source shortest path (SSSP) algorithm.

Once a node joins U it must never be chosen again, and membership in U
cannot be inferred from lowcost alone (an infinite lowcost could also
mean a node simply has no edge to U yet), so each node carries an
explicit inU flag.

    PrimMCST(ref edge_set T)
        // T = set of edges in spanning tree
        int closest[N]     // closest[v] = vertex u in U closest to v
        int lowcost[N]     // lowcost[v] = weight[v, closest[v]]
        bool inU[N]        // U is node 0 and the nodes connected
                           // to it by edges of T
        T = empty
        inU[0] = true
        for i in 1..N-1
            lowcost[i] = weight[0,i]
            closest[i] = 0
            inU[i] = false
        N-1 times do
            // find the node closest to U and add it to U
            min = infinity
            for j in 1..N-1      // consider all nodes other than 0,
                if not inU[j] && lowcost[j] < min    // skipping those
                    min = lowcost[j]                 // already in U
                    k = j
            // k is now the node outside U closest to something in U;
            // add it to U
            T += {(closest[k], k)}
            inU[k] = true        // make sure we never choose k again
            for j in 1..N-1      // j may now be closer to U, via k
                if not inU[j] && weight[k,j] < lowcost[j]
                    lowcost[j] = weight[k,j]
                    closest[j] = k

------------------------------------------------------------------------

Analysis of Prim's Algorithm

Prim's algorithm is O(N^2).

The outer loop is executed N-1 times, and each iteration takes O(N)
time:

* We add one vertex to U each iteration.

* We exit the loop when U = V.

We find the lowest cost edge from U to V-U in O(N) time and, similarly,
update lowcost in O(N) time.

We might be able to reduce the constant overhead by using a slightly
more complicated data structure to keep track of which nodes are in
V-U, so we don't consider them at all in the two 'for' loops.  This
would not change the asymptotic complexity of the algorithm, however,
because 1 + 2 + 3 + ... + N is still O(N^2).
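
------------------------------------------------------------------------

As a concrete companion to the pseudocode, here is a runnable C sketch
of Prim's algorithm over an adjacency matrix.  The INF sentinel, the
printf reporting, and the example graph are illustrative choices, not
from the lecture.

    #include <stdio.h>

    #define N   5
    #define INF 1000000000          /* stands in for "infinity" */

    /* adjacency matrix: weight[u][v] is the edge cost, INF if no edge */
    int weight[N][N];

    void prim(void)
    {
        int lowcost[N], closest[N], inU[N];
        inU[0] = 1;                 /* U starts as {0} */
        for (int i = 1; i < N; i++) {
            lowcost[i] = weight[0][i];
            closest[i] = 0;
            inU[i] = 0;
        }
        for (int step = 0; step < N - 1; step++) {
            int k = -1, min = INF;
            for (int j = 1; j < N; j++)   /* closest node outside U */
                if (!inU[j] && lowcost[j] < min) {
                    min = lowcost[j];
                    k = j;
                }
            printf("take edge (%d,%d), cost %d\n", closest[k], k, min);
            inU[k] = 1;                   /* k joins U */
            for (int j = 1; j < N; j++)   /* j may now be closer via k */
                if (!inU[j] && weight[k][j] < lowcost[j]) {
                    lowcost[j] = weight[k][j];
                    closest[j] = k;
                }
        }
    }

    int main(void)
    {
        for (int u = 0; u < N; u++)
            for (int v = 0; v < N; v++)
                weight[u][v] = INF;
        /* undirected example graph: set both directions */
        int edges[][3] = { {0,1,4}, {0,2,2}, {1,2,1},
                           {1,3,3}, {2,4,6}, {3,4,5} };
        for (int i = 0; i < 6; i++) {
            weight[edges[i][0]][edges[i][1]] = edges[i][2];
            weight[edges[i][1]][edges[i][0]] = edges[i][2];
        }
        prim();
        return 0;
    }

On this example graph the sketch reports the four tree edges (0,2),
(2,1), (1,3), and (3,4), for a total cost of 11.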