UML软件工程组织

Database Design For The Web

From www.webreview.com

The most crucial step in building any database application is the design of the database. Experience has shown there is a clear set of rules for this process.

The technical background of most web designers includes little, if any, work with databases. Now that web database applications are one of the hottest (and most lucrative) web development areas, these same designers are desperately trying to pick up the technical savvy to do the work with few available reference materials.

This article, and the column series that it derives from, is an attempt to fill that need. If you're a novice to web databases, this is the article for you. It includes three parts:

database design metaphors and the parts of a database,
seven tips to designing a database, and
a checklist of common database design mistakes.

Even experienced developers may find this a helpful review and learn a thing or two from the section on common database design mistakes.

Database Design Metaphors

Databases can be built around a number of different metaphors. Some metaphors provide better ways of modeling certain kinds of data; others are more efficient from a storage or access point of view. The three most common types, especially for web database work, are

flat files,
relational databases, and
object-oriented databases.

Flat files are the most basic database. The basic characteristic of a flat-file database is that all of the information is stored together in a single file. Two examples are:

a text file filled with values delimited by commas
a spreadsheet-based data collection

They are useful for simple tasks and are easy to implement, but aren't especially efficient. In a future column we'll use Perl to manipulate flat-file databases, which is the most typical implementation of simple databases on the Web.

One problem with flat-file databases is that they don't efficiently store and manage complex information. For example, take a database that manages a set of media files and the different projects that use those files. You could create a flat file that has a column for each associated project: project1, project2, etc. But how many project columns do you include? Too many columns and you waste valuable storage space. Too few columns and the database doesn't work.

And just imagine what happens when you decide that you need to store other information about each project, such as client name and due date. Now you need to duplicate each bit of project information (for example, clientname1, clientname2). This almost guarantees that data will get entered improperly when you input the same data in multiple columns across different records.

Relational database schemes were designed to prevent the unnecessary duplication of data in the database. In a relational database, each distinct entity is stored in a different database table and relationships are used to connect the entities. (I discuss them in detail in the next section.)

One of the newest database design metaphors is the object-oriented database. The motivation behind this new database model was initially the efficiency arguments that made object-oriented programming languages such as Java and C++ so popular. In object-oriented programming systems, individual components of an application (the objects) should be created once and then reused, extended, or modified. These objects are instances of a class, an abstract data structure that contains the properties and methods that are available to that class. A property is essentially a value that can be read, set, or modified. A method is a function, procedure, or routine that defines and produces an action that an object of the class can perform or undergo.

This provides a rich framework for modeling and manipulating complex data and complex relationships between data. Unfortunately, there are very few truly object-oriented database systems on the market and most are quite expensive. There are a number of object-relational systems from major vendors that add some of the aspects of objects to a relational database, but even these remain essentially relational databases.

Anatomy of a Relational Database

There are several terms that describe the parts of a relational database. These structural terms are standard regardless of whether the database is implemented in any particular piece of software.

Figure 1:A schematic overview of a database.
Figure 1: A schematic overview of a database.

A field is the most basic structural unit of a database. It is a container for a piece of data. In most cases, only a single logical piece of data fits in each field.
A key is a field that contains a unique identifier for each row in a data table. Even though each individual record represents a separate piece of data, some of those records may look identical. A key provides a completely unambiguous way to distinguish between distinct records, and more importantly, serves as a pointer to a particular record in the table. In many cases, data table keys are constructed by simply adding an additional field to function as the key.
A set of fields describing a larger unit is normally called a record or a row. The fields in a record provide a complete description of each item in a collection. A record is a unique instance of data about an object or event.
A table is the formal name given to the group of records that contain the elements of the collection. A table normally represents a distinct object (business clients or library books), or an event (product orders or stock prices).
A database is basically a collection of tables. It also often includes forms for entering data, rules for checking and validating data that has been entered, and the format for creating informative reports from the data in the database.

The connections between records in different data tables are provided by relationships. The most common relationship between two tables is called a one-to-many relationship. In this situation, precisely one record in data table A is related to a number of records in data table B. The primary key of table A is inserted as a field into table B where it serves as a foreign key. Relationships between tables are always made through keys.

There are two other possible ways to relate tables in a database, one-to-one and many-to-many relationships. In a one-to-one relationship, each record in one table is linked to one and only one record in another table. In many cases, one of the tables is set of data about a subset of the entities in the main table.

The many-to-many table design rounds out the possible ways of designing data relationships. In some situations, multiple entries in a data table are related to multiple entries in another data table. The classic example is a database of classes and students. Each class consists of many students and each student can take many classes, so in designing a database to track student and class information, a many-to-many design is necessary.

The distinguishing characteristic of many-to-many relationships between two tables is that they require a third table to make the relationship. Simply putting the key from one table into the other table would result in a lot of duplicated information, so a linking table is used to connect the tables. The linking table simply stores the primary key from one table with the primary key of its related entry from the second table, along with any other information unique to the relationship.

Next, we'll show the Seven Steps to Designing Good Databases.

Seven Steps to Designing Good Databases
A look at how good database design can save you from hard to maintain and difficult to work databases later.
Common Database Mistakes
A checklist to help you avoid some of the most common database design errors.