Lecture notes for CS 173, Tues. Sept. 17, 2002 =============================== Expressing Queries in Relational Algebra relational algebra: unambiguous notation (or formalism) for queries queries: expressions in relational algebra. use rel. alg. to simplify queries, easier to compute - early selection reduces the number of tuples - early projecion reduces the number of domains Notation: Infix operators are easier to manipulate than prefix (function-call) style operators. Traditional to use lc sigma for selection, lc pi for projection, and bowtie for join. Since I'm typing in ascii, I'll use + for union * for intersection - for difference $P for selection on predicate P #A,B,C for projection onto attributes A, B, and C >< for join, with stacked equate for non-natural join [Note: please DO NOT feed dollar signs and pound signs back at me on an exam. The real symbols are lower-case sigma and pi!] remind yourself the the algebraic laws for set union/intersection/difference (e.g., associativitiy, commutativity, etc) See section 7.3 of the text ------------------------------------------------------------------------ Algebraic Laws for Join (merging multiple relations) Commutativity yes, if order of columns does not matter NOTE: join in assignment is more simple! R1 >< R2 == R2 >< R1 A=B B=A Associativity Not always associative ({AB} >< {BC}) >< {AD} {AB} >< ({BC} >< {AD}) B A B A {ABC} >< {AD} {AB} >< {???} --------- (R1 >< R2) >< R3 == R1 >< (R2 >< R3) A=B C=D A=B C=D associative if B and C are distinct attributes of R2 in above example, A (the second join att.) is not an element of R2 must be careful with naming if keep attr. name of left (e.g., A), then second expr. may be undefined ---------------- Algebraic Laws for Selection Commutativity $P1 ($P2 (R)) == $P2 ($P1 (R)) Selection pushing want to have early selection push selection below the join, fewer rows $P (R >< S) == $P (R) >< S iff all attributes mentioned in P are members of R. Similarly $P (R >< S) == R >< $P (S) iff all attributes mentioned in P are members of S. And $P (R >< S) == $P (R) >< $P (S) iff P mentions only the attribute on which we are performing the (natural) join. Similarly, $P (R + S) == $P (R) + $P (S) $P (R * S) == $P (R) * $P (S) $P (R - S) == $P (R) - $P (S) Selection splitting, where P == A and B $P (R) == $A ($B (R)) == $B ($A (R)) the predicate must contain AND Can split into different arguments of an underlying join If A mentions only attr of R and B mentions only attr of S $P (R >< S) == $A (R) >< $B (S) ---------------- Example from the book: Consider the following 4 relation database * CSG: Course-StudentID-Grade * SNAP: StudentID-Name-Address-Phone * CDH: Course-Day-Hour * CR: Course-Room Implement the query "Where is Chris at Noon on Monday?" Let P be (Name="Chris" and Day="Monday" and Hour="Noon") Brute force: join all data into one BIG relation select tuples that meet query project to get the answer #Room ($P (((CSG >< SNAP) >< CDH) >< CR)) [fig 8.18] 1: Note: selection using only Name/Day/Hour right-most relation does not contain any of these Opt: push selection inside outermost join #Room (($P ((CSG >< SNAP) >< CDH)) >< CR) [fig 8.19a] cannot push selection further 2: Note: Predicate uses AND some attr in left/right joins Opt: Spilt predicate into Name and Day/Hour P1=Name="Chris" P2=Day="Monday" AND Hour="Noon" #Room (($P1 ($P2 ((CSG >< SNAP) >< CDH))) >< CR) [fig 8.19b] 3: Note: P2 involves only attr. from CDH Opt: push selection of P2 inside the join #Room (($P1 ((CSG >< SNAP) >< $P2 (CDH))) >< CR) 4: Note: P1 involves no attr. from CDH Opt: Push selection of P1 inside join #Room (($P1 (CSG >< SNAP) >< $P2 (CDH)) >< CR) [fig 8.19c] 5. Note: P1 involves only attr of SNAP Opt: Push selection of P1 inside join #Room (((CSG >< $P1 (SNAP)) >< $P2 (CDH)) >< CR) [fig 8.18d] ---------------- Algebraic Laws for Projection Idempotence #SR (R) = R, where SR is the entire scheme of R Projection pushing Always push down for UNION UNION: #A (R + S) == #A (R) + #A (S) Sometimes push into Intersection and Difference Problem: projection removes columns, so some tuples that used to be different look the same after the projection Solution: need to keep the attributes used in the join AND, get rid of them in the final step Ex: given #W (R >< S) A=B Let WR be the subset of W found in R, plus A. Let WS be the subset of W found in S, plus B. Then #W (R >< S) == #W (#WR (R) >< #WS (S)) + gets rid of unnecessary columns early - adds many operations (two additional projections!) ? when to do projection pushing? - Do not do it if you eliminate an index which may be useful - Do it if you don't need the column later ---------------- Continuing the example from the book: Originally we had #Room ($P (((CSG >< SNAP) >< CDH) >< CR)) [fig 8.18] After selection pushing, we had #Room (((CSG >< $P1 (SNAP)) >< $P2 (CDH)) >< CR) [fig 8.19d] Prob: still carries many extra attributes 1. Note: Need course and room from left and right of outermost join Opt: right: CR is course/room: push not necessary left: room does not appear, push only with course #Room (#Course ((CSG >< $P1 (SNAP)) >< $P2 (CDH)) >< CR) [fig 8.20a] 2. Opt: push below next join. No "higher level" clean-up join necessary: course is only attribute, and it is pushed in both directions #Room ((#Course (CSG >< $P1 (SNAP)) >< #Course ($P2 (CDH))) >< CR) [fig 8.20b] 3. Opt: Push on left, course and student id do not need course on the right, not an attr #Room ((#Course (#Course,StudentID (CSG) >< #StudentID ($P1 (SNAP))) >< #Course ($P2 (CDH))) >< CR) [fig 8.20c] 4. Note: projection on CDH on Course and StudID requieres looking at whole relation Immediately after this project, do a join on StudentID StudentID is probably an index, so we don't need to look at all the tuples!!! Opt: Do not get rid of the index, so do not do the project on left #Room ((#Course (CSG >< #StudentID ($P1 (SNAP))) >< #Course ($P2 (CDH))) >< CR) [fig 8.20d] The book goes through both the selection and the projection "optimizations" with tree-based diagrams. There are pretty helpful. Take a look at figures 8.19 and 8.20, pp. 445 and 447.