Database Design For The Web
John
Paul Ashenfelter
From www.webreview.com
The most crucial step in
building any database application is the design of the database.
Experience has shown there is a clear set of rules for this
process.
The technical background of
most web designers includes little, if any, work with databases.
Now that web database applications are one of the hottest (and
most lucrative) web development areas, these same designers are
desperately trying to pick up the technical savvy to do the work
with few available reference materials.
This article, and the column
series that it derives from, is an attempt to fill that need. If
you're a novice to web databases, this is the article for you. It
includes three parts:
- database
design metaphors and the parts of a database,
- seven
tips to designing a database, and
- a
checklist of common database design mistakes.
Even experienced developers
may find this a helpful review and learn a thing or two from the
section on common database design mistakes.
Database Design Metaphors
Databases can be built around
a number of different metaphors. Some metaphors provide better
ways of modeling certain kinds of data; others are more efficient
from a storage or access point of view. The three most common
types, especially for web database work, are
- flat
files,
- relational
databases, and
- object-oriented
databases.
Flat files
are the most basic database. The basic characteristic of a
flat-file database is that all of the information is stored
together in a single file. Two examples are:
- a
text file filled with values delimited by commas
- a
spreadsheet-based data collection
They are useful for simple
tasks and are easy to implement, but aren't especially efficient.
In a future column we'll use Perl to manipulate flat-file
databases, which is the most typical implementation of simple
databases on the Web.
One problem with flat-file
databases is that they don't efficiently store and manage complex
information. For example, take a database that manages a set of
media files and the different projects that use those files. You
could create a flat file that has a column for each associated
project: project1, project2, etc. But how many project columns do
you include? Too many columns and you waste valuable storage
space. Too few columns and the database doesn't work.
And just imagine what happens
when you decide that you need to store other information about
each project, such as client name and due date. Now you need to
duplicate each bit of project information (for example,
clientname1, clientname2). This almost guarantees that data will
get entered improperly when you input the same data in multiple
columns across different records.
Relational database schemes
were designed to prevent the unnecessary duplication of data in
the database. In a relational database, each distinct entity is
stored in a different database table and relationships are used to
connect the entities. (I discuss them in detail in the next
section.)
One of the newest database
design metaphors is the object-oriented database. The motivation
behind this new database model was initially the efficiency
arguments that made object-oriented programming languages such as
Java and C++ so popular. In object-oriented programming systems,
individual components of an application (the objects) should be
created once and then reused, extended, or modified. These objects
are instances of a class, an abstract data structure that
contains the properties and methods that are available to that
class. A property is essentially a value that can be read,
set, or modified. A method is a function, procedure, or
routine that defines and produces an action that an object of the
class can perform or undergo.
This provides a rich
framework for modeling and manipulating complex data and complex
relationships between data. Unfortunately, there are very few
truly object-oriented database systems on the market and most are
quite expensive. There are a number of object-relational systems
from major vendors that add some of the aspects of objects to a
relational database, but even these remain essentially relational
databases.
Anatomy of a Relational
Database
There are several terms that
describe the parts of a relational database. These structural
terms are standard regardless of whether the database is
implemented in any particular piece of software.
Figure 1: A schematic overview of a
database.
- A
field is the most basic structural unit of a database.
It is a container for a piece of data. In most cases, only a
single logical piece of data fits in each field.
- A
key is a field that contains a unique identifier for
each row in a data table. Even though each individual record
represents a separate piece of data, some of those records may
look identical. A key provides a completely unambiguous way to
distinguish between distinct records, and more importantly,
serves as a pointer to a particular record in the table. In
many cases, data table keys are constructed by simply adding
an additional field to function as the key.
- A
set of fields describing a larger unit is normally called a record
or a row. The fields in a record provide a complete
description of each item in a collection. A record is a unique
instance of data about an object or event.
- A
table is the formal name given to the group of records
that contain the elements of the collection. A table normally
represents a distinct object (business clients or library
books), or an event (product orders or stock prices).
- A
database is basically a collection of tables. It also
often includes forms for entering data, rules for checking and
validating data that has been entered, and the format for
creating informative reports from the data in the database.
The connections between
records in different data tables are provided by relationships.
The most common relationship between two tables is called a one-to-many
relationship. In this situation, precisely one record in data
table A is related to a number of records in data table B. The
primary key of table A is inserted as a field into table B where
it serves as a foreign key. Relationships between tables are
always made through keys.
There are two other possible
ways to relate tables in a database, one-to-one and many-to-many
relationships. In a one-to-one relationship, each record in
one table is linked to one and only one record in another table.
In many cases, one of the tables is set of data about a subset of
the entities in the main table.
The many-to-many table
design rounds out the possible ways of designing data
relationships. In some situations, multiple entries in a data
table are related to multiple entries in another data table. The
classic example is a database of classes and students. Each class
consists of many students and each student can take many classes,
so in designing a database to track student and class information,
a many-to-many design is necessary.
The distinguishing
characteristic of many-to-many relationships between two tables is
that they require a third table to make the relationship. Simply
putting the key from one table into the other table would result
in a lot of duplicated information, so a linking table is used to
connect the tables. The linking table simply stores the primary
key from one table with the primary key of its related entry from
the second table, along with any other information unique to the
relationship.
Next, we'll show the Seven
Steps to Designing Good Databases.
Seven
Steps to Designing Good Databases
A look at how good database design can save you from hard to
maintain and difficult to work databases later.
Common
Database Mistakes
A checklist to help you avoid some of the most common database
design errors.