From Chaos to Order: Designing Databases
By John
Paul Ashenfelter
From www.webreview.com
In my last
article, I laid out rules for good database design. Now we'll
design two databases using the structured design process presented
in that article.
The first is a personal information manager
(PIM), a simple example that lets us focus on the process of
designing a flat-file database. The second is a more complex media
management system that requires some serious thought about the data
and relationships between data. Throughout the discussion, I try to
present design alternatives and the reasons I made particular
choices.
Designing a Personal Information Manager
Most people are familiar with personal
information managers in one form or another, either electronic or
paper-based, that manage their business or personal contacts. These
are basic databases, and a great starting point for design.
The first step in designing a database is to clearly
state the problem or objective the database is being created to
address. In this case, my problem is keeping track of the names,
mailing addresses, and e-mail addresses of people I deal with on a
regular basis. This could be succinctly summarized in the following
statement:
"The main objective of this database is
to maintain and organize contact data about people involved in my
personal and professional lives for my day-to-day use."
This definition doesn't include what specific
kinds of data will be stored in the database, nor does it mention
what kinds of queries will be performed on the data. It is simply a
statement of why the database is being created.
The next step of the design process entails
both researching the way data is collected in existing databases and
finding out how that data is turned into useful information.
Preliminary fields and rough ideas about typical kinds of queries
should also be determined during this step.
My current database of contact information
takes a variety of forms. I have all of the e-mail addresses I
commonly use stored in my e-mail software. I have a text file
sitting on my virtual desktop at the office that contains phone
numbers I use on a regular basis. I use a dry-erase board for phone
numbers near the phone in my home. I have a paper address book with
all sorts of contact information scribbled in it. There are also a
number of scraps of paper and Post-It notes scattered around my
office and home with phone numbers and e-mail addresses. These
sources contain a number of possible fields for the database, as
shown in Table 1.
Field
|
Example
|
Name
|
John
Smith
|
Home
address
|
2201
East Park St. #303, Alexandria, VA, 22303
|
Work
address
|
1717
J Street NW, Washington, D.C., 20023
|
Directions
|
Exit
221 off Interstate 66. Take a left at the third light on
Route 221. Take the first right. House is third one on left
with green shutters and a garage
|
Home
phone number
|
703-555-1234
|
Work
phone number
|
202-555-9876
|
Fax
number
|
202-555-2222
|
E-mail
address
|
jsmith@misc.org
|
Web
page
|
http://www.provider.net~jsmith/home.html
|
Birthday
|
June
5, 1966
|
Table 1. Preliminary field list for PIM
database.
At this point, I can also see that a typical
way I use this data is to search for phone numbers and e-mail
addresses by the contact's last name. I also occasionally send mail
through the postal system, so I will also need to look up addresses
for individuals. Neither of these specific queries should constrain
the design of the database, but it should be possible to perform
these functions or the database will be useless to me.
Now that we've defined the focus of the
database, and done some research on existing databases, we can
design the data structures. The tables in the database should
reflect objects or events that are inherent in the data, so both the
focus of the database and the preliminary list of fields should be
used to help define the data structures.
We're designing this database to store data on
people that I contact on a regular basis. I'll start by assuming
that the database consists of a single table named Contacts.
The next step is to examine the preliminary
field list (Table 1) to determine if the data suggest that there are
other tables that could be constructed to store the data. There
actually is a hint that there are two types of data
structures—business contacts and personal contacts. The design
question now is whether to use one table for all contacts, or two
separate tables—one for business contacts and one for personal
contacts.
My main goal is to organize and maintain the
contact data, so this provides little guidance. The typical queries,
however, make it clear that I'm normally using the name of the
contact to find information. An actual examination of my data shows
that while I do have work phone numbers for most of my friends, I
don't have the work addresses for any of them. It also shows that I
don't have the home address or home phone number for many of my
business contacts. This implies that I essentially have one mailing
address for each contact instead of separate home and business
addresses. Once that assumption is made, a single table design seems
sensible. (Plus, for the first example, I want to use a single data
table.) Just for instructional purposes, however, its worth
discussing the two other design possibilities I considered.
- Two
tables.
This design alternative would have individual tables for
business contacts and personal contacts. One advantage of this
approach is that fields related specifically to one type of
contact only need to be put in the relevant table. The birthday
field, for example, would be relevant for personal contacts but
probably less so for business contacts. Querying could be
slightly more efficient as well, since only one set of contacts
would need to be searched. This design could generate problems,
however, if a contact fits in both tables. At the very least, I
would need to enter data in both tables. I would also have to
take care to enter contacts into the appropriate table.
- Primary
table with two subtables. A more relational design would consist
of a primary (master) table with all of the constant information
about an individual (name, birthday, etc.) and two additional
tables to hold the personal and business information. These
additional tables are essentially subtables of the main table
and would be linked solely with the primary key from the master
table. This is probably the more correct design, but it is more
complicated to design and execute than is warranted by the
objective for this database.
Next, we'll cover the field list for the PIM
database.
Fields
for the PIM Database
In part two, we show how to determine the data types and constraints
to create the field data for our personal information manager.
Designing
a Media Management System
In part three, we show how to design a more complex database with
relationships between data and multiple data tables.