CSC 173
Tues. Nov 12, 2002
---------------------------------------
Project 4 due on Friday

Unary ops:
    look at prev token:
        a = -4
        a = a - 4 + 4
        a = (a + a) - a
    look at length:
        a = ++a
        a = --a
========================================================================
CFGs vs Regular Expressions

Context-free grammars are strictly more powerful than regular expressions.

  * Any language that can be generated using regular expressions can be generated by a context-free grammar.
  * There are languages that can be generated by a context-free grammar that cannot be generated by any regular expression.

As a corollary, CFGs are strictly more powerful than DFAs and NDFAs.

The proof is in two parts:

  * Given a regular expression R, we can generate a CFG G such that L(R) == L(G).
  * We can define a grammar G for which there is no FA F such that L(F) == L(G).
---------------------------------------
Simulating a Regular Expression with a CFG

To show that CFGs are at least as powerful as regular expressions, we show how to simulate a RE using a CFG. The construction is similar to the one used to simulate a regular expression with a FA; we build the CFG G in pieces, where each piece corresponds to an operand or operator in the regular expression.

  * Assume the RE is a single operand. Then if RE is epsilon or a character in the alphabet, add to G the production

        <RE> --> RE

    If RE is null (the empty language), don't add a production.

  * Assume the RE is R1R2. Add to G the production

        <RE> --> <R1> <R2>

    and create productions for regular expressions R1 and R2.

  * Assume the RE is R1 | R2. Add to G the production

        <RE> --> <R1> | <R2>

    and create productions for regular expressions R1 and R2.

  * Assume the RE is R1*. Add to G the production

        <RE> --> <R1> <RE> | epsilon

    and create productions for regular expression R1.
---------------------------------------
Example: RE to CFG

We will build a CFG G for the RE (0|1)*111.
First the operands:

    <0> --> 0
    <1> --> 1

Now the innermost operator, union:

    <R1> --> <0> | <1>

Now the closure operator:

    <R2> --> <R1> <R2> | epsilon

Now the concatenation operators:

    <S>  --> <R2> <R3> <R4> <R5>
    <R3> --> <1>
    <R4> --> <1>
    <R5> --> <1>

The final grammar G is:

    <S>  --> <R2> <R3> <R4> <R5>
    <R2> --> <R1> <R2> | epsilon
    <R1> --> <0> | <1>
    <R3> --> <1>
    <R4> --> <1>
    <R5> --> <1>
    <0>  --> 0
    <1>  --> 1
---------------------------------------
A CFG with no Corresponding RE

Recall that FAs cannot count. Thus, no FA can recognize the language {0^n 1^n | n >= 1} (i.e., the set of strings containing one or more zeros followed by an equal number of ones).

Assume such an FA exists, and it has N states. What happens when the input string has N+1 zeros in it, followed by N+1 ones?

  * Since the FA only has N states, we must visit some state sT twice on seeing N+1 zeros.
  * The FA cannot know whether we are entering sT for the first time, when we've seen i < N zeros, or the second time, when we've seen j > i zeros.
  * There must be a path from sT to an accepting state, since the input string is in the language.
  * The FA will accept an input string without an equal number of zeros and ones, since i != j, and there is a path to an accepting state from sT on the remaining input. For example, skipping the loop from sT back to sT gives an input with N+1-(j-i) zeros followed by N+1 ones, which the FA also accepts.

This language is generated by the following CFG:

    1. S --> 0 S 1
    2. S --> 01

We can prove that this grammar generates the language by induction on n, the number of zeros (equivalently, the number of ones) in the string.

  1. For the basis step, n = 1, and the string is 01. This string is generated by applying the second production once.
  2. For the inductive step, assume we can generate 0^n1^n. The last production applied must have been production 2, so the sentential form just before that step must have been 0^(n-1)S1^(n-1). If we instead apply production 1 and then production 2, we get 0^nS1^n, and then 0^(n+1)1^(n+1). Thus, we can generate all strings in {0^n 1^n | n >= 1}.
  3. Since we can only apply production 1 some number of times followed by production 2, these are the only strings generated by the grammar.
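The induction above can be traced mechanically. The following is a minimal Python sketch, not part of the original notes: the function names derive and in_language are made up for illustration, and sentential forms are represented as plain strings in which the letter S stands for the nonterminal.

```python
def derive(n):
    """Return the sentential forms in a derivation of 0^n 1^n.

    Applies production 1 (S --> 0 S 1) n-1 times, then production 2
    (S --> 01) once, exactly as in the induction argument.
    """
    if n < 1:
        raise ValueError("the grammar generates no string for n < 1")
    form = "S"
    steps = [form]
    for _ in range(n - 1):
        form = form.replace("S", "0S1")  # production 1
        steps.append(form)
    form = form.replace("S", "01")       # production 2
    steps.append(form)
    return steps


def in_language(s):
    """Recursive recognizer that mirrors the two productions."""
    if s == "01":                        # matches production 2
        return True
    if len(s) >= 4 and s[0] == "0" and s[-1] == "1":
        return in_language(s[1:-1])      # peel off one use of production 1
    return False


if __name__ == "__main__":
    print(" => ".join(derive(3)))
    print(in_language("000111"), in_language("0101"))
```

The recognizer needs the call stack to match each leading 0 with a trailing 1. That stack is the unbounded memory a finite automaton lacks.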
-------------------------------------------------
READING: Aho & Ullman chapter 9
-------------------------------------------------
Graphs

A graph is a set of nodes (or points) connected by edges (or arcs).

A simple example of a graph is a map of cities connected by roads. The cities are nodes; the roads are edges. Most questions you might pose about such a map can be posed in terms of operations on a graph. For example, finding the shortest route between two cities, or the shortest route that visits all cities, are common graph problems.

In fact, a wide variety of problems can be posed as operations on graphs, including network routing, city planning, VLSI layout, deadlock detection, and register allocation in a compiler.

As with most data structures, there are several different implementations of graphs, with different tradeoffs in time and space.

Several classical graph problems (and their solutions) have been studied extensively because they arise frequently in practical settings, including

  * breadth-first and depth-first search
  * single-source-shortest-path problem
  * all-pairs-shortest-path problem
  * transitive closure
  * minimum spanning tree problem
========================================================================
Definitions

A graph is a set of N nodes (or vertices) and E edges. Each element of E is a pair of nodes (u,v), which means there is an edge (or arc) between u and v.

  * In a directed graph (or digraph), each edge (u,v) is an ordered pair, and there is an arc from u to v.
      o u is a predecessor of v
      o v is a successor of u
  * In an undirected graph, each edge is an unordered pair, and there is an undirected arc between u and v.
      o u and v are said to be adjacent
  * In a weighted graph each edge has an associated value (weight).
  * A path in a directed graph is a list of nodes (v1, v2,...,vn) such that
      o there is an arc from v(i) to v(i+1), for all 1 <= i < n.
      o the length of the path is the number of arcs in the path (n-1)
  * A simple path visits no node more than once.
  * A cycle in a directed graph is a path of length >= 1 that begins and ends with the same node.
      o a path of length 0 is not a cycle
      o if there is an arc from a node v to itself, there is a cycle v -> v
      o A cyclic graph is a graph that has at least one cycle; an acyclic graph has no cycles.
      o Notice that directions matter. Edges a->b, b->c, and a->c do NOT make a cycle.
========================================================================
Operations on Graphs

  * Breadth-first and depth-first search: Visit every node in a graph.

  * Finding cycles: Does the graph have a cycle?
    EX: Given a list of processes currently executing on the system, the resources each holds, and the resources each needs, determine whether or not progress is possible. (This example works on a so-called bipartite graph, in which nodes belong to classes, and every edge connects a node in one class to a node in the other class.)

  * Connected components of undirected graph: Separate nodes into equivalence classes, so that there is a path between any two nodes in the same class and no path between nodes in different classes.
    EX: I just lost a link in the company network. Is it still possible for everybody to reach everybody else?

  * Minimal spanning tree: Find a tree that connects all the nodes in a weighted graph with minimal cost.
    EX: Design the layout of the resnet backbone, minimizing the amount of fibre that must be laid.

  * Topological sorting: Assign a linear ordering to nodes in a directed acyclic graph (DAG) in such a way that if there is a path from u to v in the graph then v comes after u in the linear order.
    EX: Given a list of courses required for the major, and the prerequisite list for each course, find a schedule for taking the courses that obeys the prerequisite list.

  * Single-source-shortest-path: Find the shortest (lowest weight) path from a given node to all other reachable nodes.
  * All-pairs-shortest-path: Find the shortest paths between all pairs of nodes.
    EX: Find the shortest distance between all pairs of cities in the country, for publication in a traveler's guidebook.

------

  * Minimal graph coloring: Assign a color to each node so that no two nodes sharing an edge have the same color, and the total number of distinct colors is as small as possible.
    EX: Assign the fewest number of registers needed to store variables and temporary results in a procedure.

  * Maximal clique: find the largest subset of the graph in which every pair of nodes shares an edge.

  * Maximal independent set: find the largest subset of the graph in which no pair of nodes shares an edge.

  * Hamiltonian circuit: find a cycle, if there is one, on which every node appears exactly once.

  * Euler circuit: find a cycle, if there is one, on which every edge appears exactly once.

  * Traveling sales circuit: find a minimum-cost cycle (not necessarily a proper one) that visits every node at least once.
    EX: minimum-cost circuit board drilling.

  * Planar subgraph: find the largest subgraph of a given graph that can be rendered in the plane without any edge crossings.
    EX: circuit board layout.

These are just a sampling of some of the most important problems. There are many, many others. The ones above the short horizontal line have polynomial time solutions. The ones below the line are NP-complete, except for Euler circuit, which can be solved in polynomial time. (Technically, the decision-problem versions of them are NP-complete; the general versions are NP-hard. Don't worry about the difference for now.)
========================================================================
Graph Implementations

There are two common implementations for graphs:

  * An adjacency matrix represents a graph G of N nodes and E edges using an NxN boolean matrix A, where A[i,j] is true if (i,j) is an edge in G.
  * An adjacency list represents a graph G of N nodes and E edges as an array A of size N, where A[i] is a pointer to a list of vertices that are successors to vertex i.

If we are implementing an undirected graph:

  * the adjacency matrix is symmetric, which means A[i,j] = A[j,i]
  * each edge (u,v) appears on the adjacency list for both node u and node v
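
The two representations can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the course's implementation: nodes are assumed to be numbered 0..N-1, Python lists stand in for the "pointer to a list", and the helper names adjacency_matrix and adjacency_list are hypothetical.

```python
def adjacency_matrix(n, edges, directed=True):
    """Return an n x n boolean matrix A with A[u][v] True iff (u,v) is an edge."""
    a = [[False] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = True
        if not directed:
            a[v][u] = True       # undirected: keep the matrix symmetric
    return a


def adjacency_list(n, edges, directed=True):
    """Return a list A of length n where A[u] holds the successors of u."""
    a = [[] for _ in range(n)]
    for u, v in edges:
        a[u].append(v)
        if not directed and u != v:
            a[v].append(u)       # undirected: edge appears on both lists
    return a


# Assumed example graph: the arcs a->b, b->c, a->c with a=0, b=1, c=2.
edges = [(0, 1), (1, 2), (0, 2)]
directed_matrix = adjacency_matrix(3, edges)
undirected_matrix = adjacency_matrix(3, edges, directed=False)
undirected_list = adjacency_list(3, edges, directed=False)

if __name__ == "__main__":
    for row in directed_matrix:
        print(row)
    print(undirected_list)
```

The matrix always takes N*N entries and answers "is (u,v) an edge?" in constant time. The list takes space proportional to N plus the number of edges, which is smaller for sparse graphs, but an edge test has to scan the list for u.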