CSC 173
Tues. Nov 12, 2002
---------------------------------------

Project 4 due on Friday

Unary ops:
  look at prev token:
  a = -4
  a = a - 4 + 4
  a = (a + a) - a
  look at length
  a = ++a
  a = --a
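
One possible reading of the "look at prev token" note, sketched in Python.
This assumes (the notes do not say) that tokens are plain strings and that
a '-' is unary exactly when it does not follow an operand or a closing
parenthesis. The names OPERATORS, is_unary_minus and classify_minuses are
hypothetical, and the sketch does not cover the "look at length" case for
++ and --.

```python
# Tokens after which a '-' has no left operand (an assumed operator set).
OPERATORS = {"+", "-", "*", "/", "=", "("}

def is_unary_minus(prev_token):
    """Return True if a '-' that follows prev_token should be read as unary."""
    # At the start of the input, or right after an operator or '(',
    # there is nothing for the minus to subtract from.
    return prev_token is None or prev_token in OPERATORS

def classify_minuses(tokens):
    """Label each '-' in a token list as 'unary' or 'binary'."""
    labels = []
    prev = None
    for tok in tokens:
        if tok == "-":
            labels.append("unary" if is_unary_minus(prev) else "binary")
        prev = tok
    return labels
```

On the three single-minus examples above, this labels the minus in
a = -4 as unary and the minuses in the other two lines as binary.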


========================================================================

CFGs vs Regular Expressions

Context-free grammars are strictly more powerful than regular
expressions.

   * Any language that can be generated using regular expressions can be
     generated by a context-free grammar.

   * There are languages that can be generated by a context-free grammar
     that cannot be generated by any regular expression.

As a corollary, CFGs are strictly more powerful than DFAs and NDFAs.

The proof is in two parts:

   * Given a regular expression R , we can generate a CFG G such that
     L(R) == L(G).

   * We can define a grammar G for which there is no FA F such
     that L(F) == L(G).

---------------------------------------

Simulating a Regular Expression with a CFG

To show that CFGs are at least as powerful as regular expressions, we
show how to simulate an RE using a CFG. The construction is similar to
the one used to simulate a regular expression with an FA; we build the
CFG G in pieces, where each piece corresponds to an operand or an
operator in the regular expression.

   * Assume the RE is a single operand. Then if RE is epsilon or a
     character in the alphabet, add to G the production

         <RE> --> RE

     If RE is null, don't add a production.

   * Assume the RE is R1R2. Add to G the production

         <RE> --> <R1> <R2>

     and create productions for regular expressions R1 and R2.

   * Assume the RE is R1 | R2. Add to G the production

         <RE> --> <R1> | <R2>

     and create productions for regular expressions R1 and R2.

   * Assume the RE is R1*. Add to G the production

         <RE> --> <R1> <RE> | epsilon

     and create productions for regular expression R1.
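
The four cases above can be sketched as a recursive Python function. This
is an illustration only: the tuple encoding of the regular expression
("sym", "eps", "null", "cat", "alt", "star") and the generated nonterminal
names <R1>, <R2>, ... are assumptions made for the sketch, not part of the
notes.

```python
from itertools import count

def re_to_cfg(regex):
    """Return (start_symbol, productions) for a regular-expression tree.

    Each production is a pair (head, body). The body is a list of
    symbols, and the empty list stands for epsilon.
    """
    productions = []
    fresh = count(1)

    def build(node):
        head = f"<R{next(fresh)}>"
        kind = node[0]
        if kind == "sym":        # a single character of the alphabet
            productions.append((head, [node[1]]))
        elif kind == "eps":      # epsilon
            productions.append((head, []))
        elif kind == "null":     # the empty language: no production at all
            pass
        elif kind == "cat":      # R1 R2
            left, right = build(node[1]), build(node[2])
            productions.append((head, [left, right]))
        elif kind == "alt":      # R1 | R2
            left, right = build(node[1]), build(node[2])
            productions.append((head, [left]))
            productions.append((head, [right]))
        elif kind == "star":     # R1*
            inner = build(node[1])
            productions.append((head, [inner, head]))
            productions.append((head, []))
        else:
            raise ValueError(f"unknown node kind: {kind}")
        return head

    return build(regex), productions
```

For example, re_to_cfg(("star", ("alt", ("sym", "0"), ("sym", "1"))))
builds productions for (0|1)*, with the closure rule as the start symbol.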

---------------------------------------

Example: RE to CFG

We will build a CFG G for the RE (0|1)*111.

First the operands:

    <0> --> 0
    <1> --> 1

Now the innermost operator, union:

    <R1> --> <0> | <1>

Now the closure operator:

    <R2> --> <R1> <R2> | epsilon

Now the concatenation operators:

    <RE> --> <R2> <R3> <R4> <R5>
    <R3> --> <1>
    <R4> --> <1>
    <R5> --> <1>

The final grammar G is:

    <RE> --> <R2> <R3> <R4> <R5>
    <R2> --> <R1> <R2> | epsilon
    <R1> --> <0> | <1>
    <R3> --> <1>
    <R4> --> <1>
    <R5> --> <1>
    <0> --> 0
    <1> --> 1
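
As a sanity check on the final grammar, the Python sketch below encodes it
as a dictionary (every nonterminal written in angle brackets) and compares
it against Python's re module on all short binary strings. The function
names derives and agrees_with_regex are hypothetical. The recognizer is a
naive top-down search; it terminates here only because this grammar has no
left recursion.

```python
import re
from itertools import product

GRAMMAR = {
    "<RE>": [["<R2>", "<R3>", "<R4>", "<R5>"]],
    "<R2>": [["<R1>", "<R2>"], []],      # [] stands for epsilon
    "<R1>": [["<0>"], ["<1>"]],
    "<R3>": [["<1>"]],
    "<R4>": [["<1>"]],
    "<R5>": [["<1>"]],
    "<0>": [["0"]],
    "<1>": [["1"]],
}

def derives(symbols, s):
    """True if the tuple of grammar symbols can derive exactly the string s."""
    if not symbols:
        return s == ""
    head, rest = symbols[0], symbols[1:]
    if head not in GRAMMAR:
        # Terminal: it must match the front of s.
        return s.startswith(head) and derives(rest, s[len(head):])
    # Nonterminal: try each production body in turn.
    return any(derives(tuple(body) + rest, s) for body in GRAMMAR[head])

def agrees_with_regex(max_len):
    """Compare the grammar with (0|1)*111 on every string up to max_len."""
    pattern = re.compile(r"(0|1)*111")
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            s = "".join(chars)
            if derives(("<RE>",), s) != bool(pattern.fullmatch(s)):
                return False
    return True
```

Calling agrees_with_regex(6) checks all 127 binary strings of length at
most 6. Agreement on short strings is evidence, not a proof.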

---------------------------------------

A CFG with no Corresponding RE

Recall that an FA cannot count without bound. Thus, no FA can recognize
the language {0^n 1^n | n >= 1} (i.e., the set of strings containing one
or more zeros followed by an equal number of ones).

Assume such an FA exists, and it has N states. What happens when the
input string has N+1 zeros in it, followed by N+1 ones?

   * Since the FA only has N states, we must visit some state sT twice
     on seeing N+1 zeros.
   * The FA cannot know whether we are entering sT for the first time,
     when we've seen i < N zeros, or the second time, when we've seen
     j > i zeros.
   * There must be a path from sT to an accepting state, since the input
     string is in the language.
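   * The FA will therefore also accept the string obtained by dropping
     the j - i zeros read between the two visits to sT. That string does
     not have an equal number of zeros and ones, since i != j, yet there
     is a path to an accepting state from sT on the remaining input.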
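
The pigeonhole step can be made concrete with a small Python sketch. It
assumes a complete DFA given as a dictionary of transitions; the 3-state
machine EXAMPLE_DFA and the function names are hypothetical and chosen
only for illustration.

```python
def run(delta, start, text):
    """Return the state reached from start after reading text."""
    state = start
    for ch in text:
        state = delta[state][ch]
    return state

def find_repeated_state(delta, start):
    """Return (i, j, state) with i < j such that 0^i and 0^j both end in state."""
    seen = {start: 0}        # state -> number of zeros read on first visit
    state = start
    for zeros in range(1, len(delta) + 1):
        state = delta[state]["0"]
        if state in seen:
            return seen[state], zeros, state
        seen[state] = zeros
    raise AssertionError("unreachable: N states must repeat within N zeros")

# A hypothetical 3-state DFA over the alphabet {0, 1}.
EXAMPLE_DFA = {
    "A": {"0": "B", "1": "A"},
    "B": {"0": "C", "1": "B"},
    "C": {"0": "B", "1": "C"},
}
```

Once i and j are found, the DFA ends in the same state on 0^i 1^i and on
0^j 1^i, so it must accept both or reject both, although only the first
is in the language.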

This language is generated by the following CFG:

  1. S --> 0 S 1
  2. S --> 01

We can prove that this grammar generates the language by induction on n,
the number of zeros (and, equally, of ones) in the string.

  1. For the basis step, n = 1, and the string is 01. This string is
     generated by applying the second production once.
  2. For the inductive step, assume we can generate 0^n 1^n. The last
     production applied must have been production 2, so the sentential
     form just before that step must have been 0^(n-1) S 1^(n-1). If we
     instead apply production 1 to that form and then production 2, we
     get 0^n S 1^n, and then 0^(n+1) 1^(n+1). Thus, we can generate all
     strings of the form 0^n 1^n, n >= 1.
  3. Since every derivation applies production 1 some number of times
     and then production 2 exactly once, these are the only strings
     generated by the grammar.
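
The shape of every derivation in this grammar (production 1 applied n-1
times, then production 2 once) can be traced with a short Python sketch.
The function name derive is hypothetical.

```python
def derive(n):
    """Return the sentential forms in the derivation of 0^n 1^n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    form = "S"
    steps = [form]
    for _ in range(n - 1):
        form = form.replace("S", "0S1")   # production 1: S --> 0 S 1
        steps.append(form)
    form = form.replace("S", "01")        # production 2: S --> 01
    steps.append(form)
    return steps
```

For n = 3 the steps are S, 0S1, 00S11, 000111.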


  -------------------------------------------------

    READING: Aho & Ullman chapter 9

  -------------------------------------------------

Graphs

A graph is a set of nodes (or points) connected by edges (or arcs).

A simple example of a graph is a map of cities connected by roads. The
cities are nodes; the roads are edges. Most questions you might pose about
such a map can be posed in terms of operations on a graph.  For example,
finding the shortest route between two cities, or the shortest route that
visits all cities, are common graph problems.

In fact, a wide variety of problems can be posed as operations on graphs,
including network routing, city planning, VLSI layout, deadlock detection,
and register allocation in a compiler.

As with most data structures, there are several different implementations of
graphs, with different tradeoffs in time and space.

Several classical graph problems (and their solutions) have been studied
extensively because they arise frequently in practical settings, including

   * breadth-first and depth-first search
   * single-source-shortest-path problem
   * all-pairs-shortest-path problem
   * transitive closure
   * minimum spanning tree problem

  ========================================================================

Definitions

A graph is a set of N nodes (or vertices) together with a set of edges.

Each edge is a pair of nodes (u,v), which means there is an edge (or
arc) between u and v.

   * In a directed graph (or digraph), each edge (u,v) is an ordered pair,
     and there is an arc from u to v.

        o u is a predecessor of v

        o v is a successor of u

   * In an undirected graph, each edge is an unordered pair, and there is an
     undirected arc between u and v.

        o u and v are said to be adjacent

   * In a weighted graph each edge has an associated value (weight).

   * A path in a directed graph is a list of nodes (v1, v2,...,vn) such that

        o there is an arc from v(i) to v(i+1), for all 1 <= i < n.

        o the length of the path is the number of arcs in the path (n-1)

   * A simple path visits no node more than once

   * A cycle in a directed graph is a path of length >= 1 that begins and
     ends with the same node.

        o a path of length 0 is not a cycle

        o if there is an arc from a node v to itself, there is a cycle v ->
          v

        o A cyclic graph is a graph that has at least one cycle; an acyclic
          graph has no cycles.

        o Notice that directions matter.  Edges a->b, b->c, and a->c do
          NOT make a cycle.
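
The path and cycle definitions translate almost directly into Python. In
this sketch a directed graph is assumed to be a set of (u, v) pairs and a
path a list of nodes; the function names are hypothetical.

```python
def is_path(edges, nodes):
    """True if there is an arc from each node in the list to the next one."""
    return len(nodes) >= 1 and all(
        (u, v) in edges for u, v in zip(nodes, nodes[1:])
    )

def path_length(nodes):
    """The number of arcs on the path: one fewer than the number of nodes."""
    return len(nodes) - 1

def is_simple(nodes):
    """True if no node is visited more than once."""
    return len(set(nodes)) == len(nodes)

def is_cycle(edges, nodes):
    """True for a path of length >= 1 that begins and ends at the same node."""
    return (
        is_path(edges, nodes)
        and path_length(nodes) >= 1
        and nodes[0] == nodes[-1]
    )
```

With edges {("a","b"), ("b","c"), ("a","c")}, the list ["a","b","c"] is a
simple path of length 2, but ["a","b","c","a"] is not a cycle because
there is no arc from c back to a.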

  ========================================================================

Operations on Graphs

   * Breadth-first and depth-first search: Visit every node in a graph.

   * Finding cycles: Does the graph have a cycle?

        EX: Given a list of processes currently executing on the system,
        the resources each holds, and the resources each needs,
        determine whether or not progress is possible.  (This example
        works on a so-called bipartite graph, in which nodes belong to
        classes, and every edge connects a node in one class to a node
        in the other class.)

   * Connected components of undirected graph: Separate nodes into
     equivalence classes, so that there is a path between any two nodes in
     any class.

        EX: I just lost a link in the company network.  Is it still
        possible for everybody to reach everybody else?

   * Minimum spanning tree: Find a tree that connects all the nodes in a
     weighted graph with minimum total cost.

        EX: Design the layout of the resnet backbone, minimizing the
        amount of fibre that must be laid.

   * Topological sorting: Assign a linear ordering to nodes in a directed
     acyclic graph (DAG) in such a way that if there is a path from u to
     v in the graph then v comes after u in the linear order.

        EX: Given a list of courses required for the major, and the
        prerequisite list for each course, find a schedule for taking
        the courses that obeys the prerequisite list.

   * Single-source-shortest-path: Find the shortest (lowest weight) path
     from a given node to all other reachable nodes.

   * All-pairs-shortest-path: Find the shortest paths between all pairs of
     nodes.

        EX: Find the shortest distance between all pairs of cities in
        the country, for publication in a traveler's guidebook.

     ------

   * Minimal graph coloring: Assign a color to each node so that no two
     nodes sharing an edge have the same color, and the total number of
     distinct colors is as small as possible.

        EX: Assign the fewest registers needed to store variables and
        temporary results in a procedure.

   * Maximum clique: find the largest subset of the nodes in which every
     pair of nodes shares an edge.

   * Maximum independent set: find the largest subset of the nodes in
     which no pair of nodes shares an edge.

   * Hamiltonian circuit: find a cycle, if there is one, on which every
     node appears exactly once.

   * Euler circuit: find a cycle, if there is one, on which every edge
     appears exactly once.

   * Traveling salesman circuit: find a minimum-cost cycle (not
     necessarily a proper one) that visits every node at least once.

        EX: minimum-cost circuit board drilling.

   * Planar subgraph: find the largest subgraph of a given graph that
     can be rendered in the plane without any edge crossings.

        EX: circuit board layout.

These are just a sampling of some of the most important problems.  There
are many others.  The ones above the short horizontal line have
polynomial time solutions.  The ones below the line are NP-complete,
with the exception of Euler circuit, which can be solved in polynomial
time.  (Technically, the decision-problem versions of them are
NP-complete; the general versions are NP-hard.  Don't worry about the
difference for now.)
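
As one illustration of the polynomial-time operations, here is a Python
sketch of topological sorting for the course-prerequisite example. It uses
Kahn's algorithm, which is one standard technique and not necessarily the
one presented in class. The function name topological_order and the
course names A through D in the note below are hypothetical.

```python
from collections import deque

def topological_order(prereqs):
    """Order courses so that every course comes after all its prerequisites.

    prereqs maps each course to the courses that must be taken before it.
    Raises ValueError if the prerequisites contain a cycle.
    """
    courses = set(prereqs)
    for required in prereqs.values():
        courses.update(required)

    # remaining[c] counts the prerequisites of c not yet scheduled.
    remaining = {c: len(set(prereqs.get(c, ()))) for c in courses}
    dependents = {c: [] for c in courses}
    for course, required in prereqs.items():
        for r in set(required):
            dependents[r].append(course)

    # Sorting only makes the output deterministic; any order would do.
    ready = deque(sorted(c for c in courses if remaining[c] == 0))
    order = []
    while ready:
        c = ready.popleft()
        order.append(c)
        for d in sorted(dependents[c]):
            remaining[d] -= 1
            if remaining[d] == 0:
                ready.append(d)

    if len(order) != len(courses):
        raise ValueError("prerequisites contain a cycle")
    return order
```

With prereqs = {"C": ["A", "B"], "B": ["A"], "D": ["C"]} the result is
A, B, C, D. The same routine doubles as a cycle test for directed
graphs, since it fails exactly when some cycle blocks progress.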
  ========================================================================

Graph Implementations

There are two common implementations for graphs:

   * An adjacency matrix represents a graph G of N nodes and E edges using
     an NxN boolean matrix A, where A[i,j] is true if and only if (i,j) is
     an edge in G.

   * An adjacency list represents a graph G of N nodes and E edges as an
     array A of size N, where A[i] is a pointer to a list of vertices that
     are successors to vertex i.

If we are implementing an undirected graph:

   * the adjacency matrix is symmetric, which means A[i,j] = A[j,i]

   * each edge (u,v) appears on the adjacency list for both node u and v
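
Both representations can be sketched in a few lines of Python, assuming
nodes are numbered 0 through N-1 and the graph is given as a list of
(u, v) pairs. The function names and the directed flag are hypothetical.

```python
def adjacency_matrix(n, edges, directed=True):
    """Build an n x n boolean matrix with A[u][v] true iff (u, v) is an edge."""
    matrix = [[False] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = True
        if not directed:
            matrix[v][u] = True      # keep the matrix symmetric
    return matrix

def adjacency_list(n, edges, directed=True):
    """Build a list A where A[u] holds the successors of node u."""
    lists = [[] for _ in range(n)]
    for u, v in edges:
        lists[u].append(v)
        if not directed and u != v:
            lists[v].append(u)       # the edge appears on both lists
    return lists
```

The matrix always takes N*N entries and answers "is (u,v) an edge?" in
constant time. The lists take space proportional to N plus the number of
edges, which is much smaller for sparse graphs, but finding a particular
edge means scanning a list.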