UML软件工程组织

From Chaos to Order: Designing Databases

From www.webreview.com

In my last article, I laid out rules for good database design. Now we'll design two databases using the structured design process presented in that article.

The first is a personal information manager (PIM), a simple example that lets us focus on the process of designing a flat-file database. The second is a more complex media management system that requires some serious thought about the data and relationships between data. Throughout the discussion, I try to present design alternatives and the reasons I made particular choices.

Designing a Personal Information Manager

Most people are familiar with personal information managers in one form or another, either electronic or paper-based, that manage their business or personal contacts. These are basic databases, and a great starting point for design.

The first step in designing a database is to clearly state the problem or objective the database is being created to address. In this case, my problem is keeping track of the names, mailing addresses, and e-mail addresses of people I deal with on a regular basis. This could be succinctly summarized in the following statement:

"The main objective of this database is to maintain and organize contact data about people involved in my personal and professional lives for my day-to-day use."

This definition doesn't include what specific kinds of data will be stored in the database, nor does it mention what kinds of queries will be performed on the data. It is simply a statement of why the database is being created.

The next step of the design process entails both researching the way data is collected in existing databases and finding out how that data is turned into useful information. Preliminary fields and rough ideas about typical kinds of queries should also be determined during this step.

My current database of contact information takes a variety of forms. I have all of the e-mail addresses I commonly use stored in my e-mail software. I have a text file sitting on my virtual desktop at the office that contains phone numbers I use on a regular basis. I use a dry-erase board for phone numbers near the phone in my home. I have a paper address book with all sorts of contact information scribbled in it. There are also a number of scraps of paper and Post-It notes scattered around my office and home with phone numbers and e-mail addresses. These sources contain a number of possible fields for the database, as shown in Table 1.

Field	Example
Name	John Smith
Home address	2201 East Park St. #303, Alexandria, VA, 22303
Work address	1717 J Street NW, Washington, D.C., 20023
Directions	Exit 221 off Interstate 66. Take a left at the third light on Route 221. Take the first right. House is third one on left with green shutters and a garage
Home phone number	703-555-1234
Work phone number	202-555-9876
Fax number	202-555-2222
E-mail address	jsmith@misc.org
Web page	http://www.provider.net~jsmith/home.html
Birthday	June 5, 1966

Table 1. Preliminary field list for PIM database.

At this point, I can also see that a typical way I use this data is to search for phone numbers and e-mail addresses by the contact's last name. I also occasionally send mail through the postal system, so I will also need to look up addresses for individuals. Neither of these specific queries should constrain the design of the database, but it should be possible to perform these functions or the database will be useless to me.

Now that we've defined the focus of the database, and done some research on existing databases, we can design the data structures. The tables in the database should reflect objects or events that are inherent in the data, so both the focus of the database and the preliminary list of fields should be used to help define the data structures.

We're designing this database to store data on people that I contact on a regular basis. I'll start by assuming that the database consists of a single table named Contacts.

The next step is to examine the preliminary field list (Table 1) to determine if the data suggest that there are other tables that could be constructed to store the data. There actually is a hint that there are two types of data structures—business contacts and personal contacts. The design question now is whether to use one table for all contacts, or two separate tables—one for business contacts and one for personal contacts.

My main goal is to organize and maintain the contact data, so this provides little guidance. The typical queries, however, make it clear that I'm normally using the name of the contact to find information. An actual examination of my data shows that while I do have work phone numbers for most of my friends, I don't have the work addresses for any of them. It also shows that I don't have the home address or home phone number for many of my business contacts. This implies that I essentially have one mailing address for each contact instead of separate home and business addresses. Once that assumption is made, a single table design seems sensible. (Plus, for the first example, I want to use a single data table.) Just for instructional purposes, however, its worth discussing the two other design possibilities I considered.

Two tables. This design alternative would have individual tables for business contacts and personal contacts. One advantage of this approach is that fields related specifically to one type of contact only need to be put in the relevant table. The birthday field, for example, would be relevant for personal contacts but probably less so for business contacts. Querying could be slightly more efficient as well, since only one set of contacts would need to be searched. This design could generate problems, however, if a contact fits in both tables. At the very least, I would need to enter data in both tables. I would also have to take care to enter contacts into the appropriate table.
Primary table with two subtables. A more relational design would consist of a primary (master) table with all of the constant information about an individual (name, birthday, etc.) and two additional tables to hold the personal and business information. These additional tables are essentially subtables of the main table and would be linked solely with the primary key from the master table. This is probably the more correct design, but it is more complicated to design and execute than is warranted by the objective for this database.

Next, we'll cover the field list for the PIM database.

Fields for the PIM Database
In part two, we show how to determine the data types and constraints to create the field data for our personal information manager.

Designing a Media Management System
In part three, we show how to design a more complex database with relationships between data and multiple data tables.