Related Entries

Useful SQL*Plus headers
Dino eats frog
Oracle with Active Record
When is he free?
Gentle dive into analytic functions

« Bitflux XML Editor
» Seven Tenets for Business Prosperity

Spreadsheet Design!

Why not to design your databases like a spreadsheet.

I wanted to make some notes about a common mistake people make when making database enabled websites. Mind you, this has nothing to with web, it is really ignorance with principles of RDBMS design.

Let us take an example. Say, there needs to be a database to track employees and their performance levels on a yearly basis. Every year, an employee is given a rating from 1 to 5. This scenario can be repeated for example - may be every year employees get a hike. You need to track their salaries too.

Typical design

I’ve seen this very often from web developers who think database design is a simple matter of creating tables and queries in MSAccess. It is indeed that simple, but you need to know underlying principles.

TABLE EMPLOYEES
employeeid NUMBER(10) NOT NULL PRIMARY KEY,
employee_name VARCHAR(100) NOT NULL,
...
grade_1998 NUMBER(1),
grade_1999 NUMBER(1),
grade_2000 NUMBER(1),
grade_2001 NUMBER(1),
grade_2002 NUMBER(1)

This is typically what I call a spreadsheet design. To call this a relational database design is a really big stretch.

Note: The code below is for illustrative purposes only and does not particularly follow syntax for any RDBMS. Think of it like pseudo-code.

Good Design

A good design in this case, could be:

TABLE EMPLOYEES
employeeid NUMBER(10) NOT NULL PRIMARY KEY,
employee_name VARCHAR(100) NOT NULL,

TABLE GRADES
employeeid NUMBER(10) NOT NULL,
grade_year NUMBER(4) NOT NULL,
grade NUMBER(1),
PRIMARY KEY (employeeid, grade_year)
CONSTRAINT fk_grades_employeeid 
(employeeid FOREIGN KEY employees.employeeid)

To get the information, you just do a join.

SELECT 
e.employee_id, e.employee_name,
g.grade_year, g.grade
FROM grades G, employees E
WHERE G.employee_id = E.employee_id 

Oracle tip: For better performance in a join, ALWAYS put the table that returns the least number of records after the WHERE clause is considered, to the right of the join. For example, if you write FROM employees E, grades G this query is going to be much slower.

Better Design

Very simple. Instead of keeping only the year, we keep a date column. Let us call this effective_date. What this means is that, the grade or salary you should consider for any given date, is determined by the record with an effective_date nearest (in the past) to your given date.

TABLE EMPLOYEES
employeeid NUMBER(10) NOT NULL PRIMARY KEY,
employee_name VARCHAR(100) NOT NULL,

TABLE GRADES
employeeid NUMBER(10) NOT NULL,
effective_date DATE NOT NULL,
grade NUMBER(1),
PRIMARY KEY (employeeid, effective_date)
CONSTRAINT fk_grades_employeeid
(employeeid FOREIGN KEY employees.employeeid)

TABLE SALARIES
employeeid NUMBER(10) NOT NULL,
effective_date DATE NOT NULL,
salary NUMBER(10,2) NOT NULL,
PRIMARY KEY (employeeid, effective_date)
CONSTRAINT fk_salaries_employeeid
(employeeid FOREIGN KEY employees.employeeid)

So, how do you get an employee’s current grade and salary? Assuming we are using Oracle, this can be done with the following query. SYSDATE is a pseudo-column in Oracle that returns current date and time.

SELECT 
e.employee_id,
e.employee_name,
g.grade,
s.salary, 
MAX(g.effective_date) grade_effective_since,
MAX(s.effective_date) salary_effective_since
FROM salaries S, grades G, employees E
WHERE G.employee_id = E.employee_id
AND S.employee_id = E.employee_id
AND S.effective_date <= SYSDATE
AND G.effective_date <= SYSDATE
GROUP BY 
e.employee_id,
e.employee_name,
g.grade,
s.salary

Motto: Put the RELATIONAL back into Relational Database Design.

So, next time you get completely wow'ed by full-color, animated presentations from "web development firms" about their ability to make database driven sites, ask these questions:

  1. Why does this need a database?
  2. If yes, how are our staff going to maintain the database? Do you give us some forms to maintain it?
  3. If yes, are these forms secure enough so that crackers won’t mess up the site?
  4. Can the database system you chose (Access, Oracle etc.) handle my site load? Is it reasonable? Do we need such an expensive database system for this task? Is it portable to other database systems?
  5. What are the benefits of using a database? If all I get is dynamically sorted lists, wouldn’t a solution that generated multiple static pages for each sort a better and more cost-effective choice?
  6. How much will it cost me if I’ve to add new fields? To change the layout?
//-->