Modeling Methodologies
From www.aisintl.com
Many
forms of symbolic notation have been developed to enable data
models to represent various levels of abstraction. Some are
lexical, others graphic; the better approaches are both.
One
of the earliest, Chen's
Entity
Relationship Model, offers a set of shapes and lines which, much
like musical notation, deliver a wealth of information with sparse
economy of drawing. Entity Relationship modeling was readily
adopted outside academia, introducing the concepts of data
modeling to a generation of information professionals.
Chen's
ER spawned a number of variations and improvements, some of which
have been embodied in computer assisted software engineering
(CASE) products
employing ER methodology, e.g., CSA's Silverrun.
Barker90
(p. 5-1) defines an entity as "... a thing or object of
significance, whether real or imagined, about which information
needs to be known or held." Martin90
(vol. II, p. 219) agrees that an entity "is something about
which we store data." Chen's original E-R technique made a
firm (if not clear) distinction between entities, as defined
above, and relationships between them. To cope with inevitable
complexities, Chen allowed relationships to have attributes of
their own, making them look a lot like entities and giving rise to
heated debate over just what is an entity versus a relationship.
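To see why the line blurs, consider a hypothetical example of our own (not one of Chen's): an "enrollment" relationship between Student and Course that carries its own "grade" attribute. Rendered as SQL, it is structurally indistinguishable from an entity.

    -- A relationship with attributes of its own; table and column
    -- names here are illustrative assumptions.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student (student_id),
        course_id  INTEGER NOT NULL REFERENCES course (course_id),
        grade      CHAR(2),          -- an attribute of the relationship itself
        PRIMARY KEY (student_id, course_id)
    );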
Given
the lack of clarity in definitions, it is not surprising that Codd90
(p. 477) says "The major problem with the entity-relationship
approach is that one person's entity is another person's
relationship."Date95
(p.363) agrees , saying "[the ER approach] is seriously
flawed because the very same object can quite legitimately
be regarded as an entity by some users and a relationship by
others." Thus Codd90
(p. 9) says emphatically that "... precisely the same
structure is adopted for entities as for relationships between
entities." Date95
(p. 362) puts this in perspective with "[the ER approach] is
essentially just a thin layer on top of the basic relational
model."
James
Martin's Information Engineering, laid out in Martin90,
is a streamlined refinement on the ER theme which discards
the arbitrary notion of the complex
"relationship" with an arity (i.e., the number
of entities related) of two, three, four, or even more.
Martin models such relationships simply as associated entities. Thus
every relationship in IE is binary, involving two entities
(or possibly only one if reflexive). Martin also
simplified the graphic notation in his diagram style. IE
has become the basis for a number of CASE products,
including Powersoft's PowerDesigner
and several others.
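As a sketch of this simplification (the names are ours, not Martin's): a three-way "supplies" relationship among Supplier, Part, and Project becomes an associated entity, joined to each of the three by an ordinary binary relationship.

    -- A ternary relationship recast as an associated entity; each
    -- foreign key below is one binary relationship in IE terms.
    -- All names are hypothetical.
    CREATE TABLE supply (
        supplier_id INTEGER NOT NULL REFERENCES supplier (supplier_id),
        part_id     INTEGER NOT NULL REFERENCES part (part_id),
        project_id  INTEGER NOT NULL REFERENCES project (project_id),
        quantity    INTEGER,
        PRIMARY KEY (supplier_id, part_id, project_id)
    );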
Another
common modeling technique is IDEF,
developed in the late 1970s and early 1980s by Bob Brown at the
Bank of America and well described in Bruce92.
IDEF was later extended by various parties into a set of tools and
standards which were adopted by the U.S. Air Force as the required
methodology for government projects. IDEF
is semantically weaker than ER and IE and forces its practitioners
into some rather arbitrary methods which lack a sound foundation
in theory. Nonetheless it is a workable, easily learned
methodology which has been taken up, either by choice or for
government contracts, by many modelers. LogicWorks' ERwin,
Popkin's System
Architect,
and InfoModeler
from InfoModelers, Inc. offer IDEF1X
data modeling products.
Entity-Relationship,
IDEF1X, and Information Engineering all translate business
requirements into formal symbols and statements which can
eventually be transformed into database structural code. Thus the
modeling process reduces undisciplined, non-mathematical narrative
to algebraic regularity. Early practice (see DeMarco78),
when data modeling techniques were not widely known, was built on
a bottom-up approach. Analysts harvested an inventory of raw data
elements or statements ("A customer order has a date of
entry.") from the broad problem space. This examination was
frequently conducted via data
flow diagram
(DFD) techniques, which were invented for the express purpose of
discovering the pool of data items so that their structure could
be considered. Expert analysis of this pool, including various
forms of normalization, rendered aggregations of data elements
into entities.
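To sketch the bottom-up path with hypothetical names: raw data elements harvested from a customer order form ("customer name", "date of entry", "item quantity", ...) might, after normalization, settle into three entities.

    -- Data elements from one order form normalized into entities.
    -- Names and data types are illustrative assumptions.
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        customer_name VARCHAR(60) NOT NULL
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
        entry_date  DATE NOT NULL   -- "A customer order has a date of entry."
    );
    CREATE TABLE order_line (
        order_id INTEGER NOT NULL REFERENCES customer_order (order_id),
        line_no  INTEGER NOT NULL,
        quantity INTEGER,
        PRIMARY KEY (order_id, line_no)
    );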
Unfortunately,
according to Teorey94,
"The number of entities in a database is typically an order
of magnitude less than the number of data elements ..."
Conversely, the number of data items or attributes is one or two
orders of magnitude greater than the number of entities. In
approaching the problem from this multitude of details, one has
the discouraging experience of watching the work funnel into a
black hole of diagrams and documents, seldom allowing the escape
of an illuminating ray of understanding.
Top-down,
entity-based approaches (ER, IE, etc.) are more concise, more
understandable, and far easier to visualize than those which build
up from a multitude of details. Top-down techniques rapidly fan
out through the power of abstraction to generate the multitude of
implementation details. Current practice therefore leans toward
modeling entities (e.g., "customer", "order")
first, since most information systems professionals now understand
the concept of entities or tables in a relational database.
Entities are later related to one another and fleshed out with
attributes; during these processes the modeler may choose to
rearrange data items into different entity structures.
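A minimal sketch of that progression, again with hypothetical names: declare the entities first, relate them, then flesh them out.

    -- Step 1: entities first.
    CREATE TABLE customer       (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE customer_order (order_id    INTEGER PRIMARY KEY);

    -- Step 2: relate them (each order is placed by one customer).
    ALTER TABLE customer_order
        ADD COLUMN customer_id INTEGER REFERENCES customer (customer_id);

    -- Step 3: flesh out with attributes.
    ALTER TABLE customer       ADD COLUMN customer_name VARCHAR(60);
    ALTER TABLE customer_order ADD COLUMN entry_date DATE;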
While
this delays the analysts' inevitable agony of populating the
model's details, it has the corollary shortcoming of placing
responsibility for critical structural decisions on the designers.
We do not mean to suggest that professional data analysts are
incapable of making such decisions but rather that their time
could be better spent if the CASE tool can make those decisions -
swiftly, reliably, consistently - for them.
Proponents
(e.g., Halpin95)
of the Object Role Modeling (ORM)
or NIAM schools represent that their methodologies accomplish
precisely that, in addition to enabling the capture of a much
larger range of structural features and constraints than ER-based
methods. In ORM it is the calculus of relational mapping,
rather than the whim or experience of a designer, which determines
how data items ("objects") are assembled into entities.
This does not snatch all judgment and creativity from the
designer. Rather it elevates them to a more symbolic plane of
discussion concerning business issues and implementation options.
Dr. Terry Halpin explains this more thoroughly and articulately in
his several articles on Object
Role Modeling.
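As a simplified, hypothetical illustration (a sketch of the idea, not Halpin's procedure verbatim): the elementary facts "Employee has EmployeeName" and "Employee works for Department" are both functional on Employee, so the mapping groups them into one Employee table, while the many-to-many fact "Employee speaks Language" maps to a table of its own.

    -- Tables as the relational mapping might group those facts;
    -- all names are hypothetical, and the grouping follows from the
    -- fact types rather than from a designer's preference.
    CREATE TABLE employee (
        employee_nr   INTEGER PRIMARY KEY,
        employee_name VARCHAR(60) NOT NULL,   -- Employee has EmployeeName
        department_cd CHAR(4)     NOT NULL    -- Employee works for Department
    );
    CREATE TABLE employee_language (          -- Employee speaks Language
        employee_nr INTEGER NOT NULL REFERENCES employee (employee_nr),
        language_cd CHAR(3) NOT NULL,
        PRIMARY KEY (employee_nr, language_cd)
    );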
Accidents
of history rather than relative deficiencies seem to have kept ORM
in the shadows of ER for many years. Contrary to a frequent
misconception, the academic foundations of ORM date back twenty
years, to the same period which gave birth to ER. Over the years
several CASE tools have employed this methodology, yet there has
seldom been even one commercial product available. For a
comprehensive display of the current art of ORM, see Asymetrix's InfoModeler.
The
modeling methodologies discussed above deal with conceptual and
logical understanding of data but not necessarily the physical
details of its storage. Additional techniques from the area of
relational schema design are generally employed to represent
tables, columns, indexes, constraints and other storage structures
which implement a data design. For example, some design choices
must be carried out as declarative or procedural integrity
constraints in order to implement a model.
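As one hedged, hypothetical illustration of such a choice: a rule like "an order may not ship before it is entered" can be enforced declaratively with a check constraint or procedurally with a trigger (trigger syntax varies by DBMS; the sketch below roughly follows the ISO SQL form).

    -- Declarative form: a check constraint (hypothetical names).
    ALTER TABLE customer_order
        ADD CONSTRAINT ck_ship_after_entry CHECK (ship_date >= entry_date);

    -- Procedural form: the same rule as a trigger.
    CREATE TRIGGER trg_ship_after_entry
    BEFORE INSERT ON customer_order
    REFERENCING NEW ROW AS n
    FOR EACH ROW
    WHEN (n.ship_date < n.entry_date)
    BEGIN ATOMIC
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'order ships before entry date';
    END;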
The
conceptual, logical, and physical models together comprise a
complete data model which can represent a given database design
from its highest abstraction through its most detailed level of
column data type and index expression.
In
our limited experience no single methodology, method, or tool
covers the full scope of data modeling from raw discovery to
instantiated database, as sketched above. Notice that in the upper
half of this process the techniques funnel downward toward coalescence and
conceptual clarity (or into the black hole of bloated, aborted
projects); in the lower half the process fans rapidly out as
automated algorithms replicate abstract patterns to implement
details (e.g., a simple foreign key reference propagates a lengthy
SQL trigger).
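To make the fan-out concrete (a hedged sketch with hypothetical names): the one-line declarative foreign key below implies, on a target DBMS lacking declarative referential integrity, a family of generated triggers, one of which is shown.

    -- Declarative form: a single foreign key clause.
    ALTER TABLE customer_order
        ADD CONSTRAINT fk_order_customer
        FOREIGN KEY (customer_id) REFERENCES customer (customer_id);

    -- One of the triggers a CASE tool might generate in its place
    -- (sketch only; companion delete and update rules on both tables
    -- would also be needed).
    CREATE TRIGGER trg_order_customer_exists
    BEFORE INSERT ON customer_order
    REFERENCING NEW ROW AS n
    FOR EACH ROW
    WHEN (NOT EXISTS (SELECT 1 FROM customer c
                      WHERE c.customer_id = n.customer_id))
    BEGIN ATOMIC
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'no matching customer';
    END;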
If
you are in search of the appropriate methods, skills, and tools
for a large-scale data design effort, keep your eyes and options
open. Contact
AIS for consulting assistance
in evaluating, selecting, and implementing CASE techniques.