CSC 173, Fall 2002

Assignment 1: Relational Database


DUE DATE:

Monday September 23. 11:59 pm

NO EXTENSIONS.


Project Overview

You are to design and build a simple database system in Java. Your system must support databases with a variety of schemes. We will make several simplifying assumptions to keep the project managable:

  1. All data will fit in main memory.
  2. The columns of a relation will be numbered (and thus ordered), but not named. To make life simple in Java column numbers will start at zero.
  3. All data can be stored as character strings.

Relation abstract data type

You should begin by creating a Java class (ADT) to represent a relation. Your relation class should have two constructors. The first will create an empty relation with a given number of columns. The second will take a second argument that specifies a file name from which initial data for the relation should be read. You may assume that tuples in the file are separated by newline characters and that fields (attributes) within a tuple are separated by tabs. You should deal gracefully with invalid file contents (e.g. a line with too few or too many attributes), printing an appropriate error message and producing a well-defined result.

Internally, you may represent a relation as an array of Java strings.

In addition to the constructors, your ADT should provide the following methods:

print
print the relation as lines of tab-separated fields on the screen.
insert
insert a given tuple into the relation.
delete
remove from the relation all tuples matching a given predicate.
lookup
(a.k.a. select) return a new relation containing all tuples from the original relation that match a given predicate.
project
return a new relation that is the projection of the given relation onto a given list of fields. If the original relation has six fields, for example, and [1, 4, 2] is the list onto which to project, tuple (a, b, c, d, e, f) in the original relation would become tuple (b, e, c) in the new.
join
return the join of the original relation and a relation passed as a argument, equating the last field of the original relation with the first field of the argument, in such a way that the fields of the original relation precede those of the argument in the result. Note that this very simplistic notion of join (without attribute names) will generally need to be used in conjunction with projection operations that reorder the fields of the arguments and result.
union
return the union of the original relation and a relation passed as an argument.
intersect
return the intersection of the original relation and a relation passed as an argument.

Several of these methods will be easier to implement if your relation class supports the standard Enumeration or Iterator interface.

For the purposes of delete and lookup methods, a predicate is a triple consisting of

  1. attribute identifier (field number)
  2. comparison test (<=, <, >=, >, ==, !=) — to be performed lexicographically on strings
  3. attribute identifier or string constant
Note that lexicographic comparisons of the strings representing integers will do the "right" thing numerically if (and only if) you ensure that all tuples use the same number of digits to represent a given attribute. For example, "00123" < "01200", but "123" > "1200".

Query language

To allow a user to manipulate the database, you will need a command language and a way to refer to relations. You should maintain, internally, a mapping from single-character relation names (lower case letters will suffice) to the relation (if any) named by that character. This mapping can be a simple array. Your system should then read a sequence of commands from standard input, one command per line. Valid commands are as follows:

Your database need not necessarily be super efficient (but see the extra credit suggestions below). You may, if you wish, store the tuples of a relation in a list, and peruse the entire list when necessary to implement relational operations.

In the interests of compact maintainable code, you are encouraged to make maximum use of the Java standard libraries, including the collections framework.

You should of course detect and handle invalid commands. You should print a helpful message, ignore the command, and continue.

If you are not familiar with the Java StringTokenizer class, take a look. It might be useful for parsing your command line or other strings.


Tools

You are encouraged to work on the CSUG systems (especially exploiting remote access) to become familiar with the environment.

That said, you may work on any platform you like, but (1) your final code must compile and run correctly using the javac compiler and java virtual machine on the csug Linux machines, and (2) you must hand in your work using the appropriate turnin script on those machines (watch the newsgroup for details).

javac is Sun's version 1.4.0 compiler for Linux. The jikes compiler from IBM is also available for the adventurous, but again, your code must compile and run with javac. (Note to the adventurous: to use jikes you must set your JIKESPATH environment variable to ".:/usr/staff/lib/java/latest/jre/lib/rt.jar".) Extensive documentation for javac is available on-line in HTML. On the csug machines, the root file is /usr/staff/lib/java/latest/docs/index.html.


What/how to turn in

The TAs will shortly be creating test data that you can use to exercise your code. Watch the newsgroup.

They will also be posting instructions on

Watch the newsgroup for details.

Extra Credit Suggestions

For extra credit, you might consider the following options:

If you do any extra credit, remember to document it. If you do not, you will likely not get credit for the work. You should also provide test data, and a description of how to use the test data to show that your extensions do what you claim.