The way software is developed today is
rather primitive and not as effective as it could be. Attempts
to automate the process often become too complicated.
Programmers frequently write trivial
code just to build up a skeleton. It is tedious and
error-prone. After a while it also becomes a maintenance burden, since
a change in one place (e.g. addition of a new property/type)
ripples into several other classes, interfaces or configuration
files.
For example, a design with
Abstract Factory [15]
and separation of interface and implementation of the factory
and the products is sometimes a good design decision. However,
it has drawbacks. Assume that we need 3 factories with 10 products
each, and 2 different implementations of the products (e.g.
real and fake). This ends up in 99 classes and interfaces!
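The figure of 99 follows from counting interfaces and implementations separately. The small sketch below spells out the arithmetic; the class and method names are purely illustrative and not part of the example system.

```java
// Hypothetical helper, not part of the example system: counts the types
// needed for Abstract Factory with separated interface and implementation.
final class AbstractFactoryTypeCount {

    static int count(int factories, int productsPerFactory, int implementations) {
        int factoryInterfaces = factories;                                 // 3
        int factoryImplementations = factories * implementations;          // 6
        int productInterfaces = factories * productsPerFactory;            // 30
        int productImplementations = productInterfaces * implementations;  // 60
        return factoryInterfaces + factoryImplementations
                + productInterfaces + productImplementations;
    }

    public static void main(String[] args) {
        System.out.println(count(3, 10, 2)); // prints 99
    }
}
```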
It might be considered trivial copy-paste-replace
programming. Yes, but what if you need to change the
way the products are defined or implemented, e.g. introduce the
Command pattern in the interface of the products or Template
Method in the implementation? More copy/paste... To me it is
the #1 bad smell: duplication.
The proposed solution is to use a domain
specific language (DSL) that fits the problem domain and to generate
the realization with specialized code generators. This raises the
level of abstraction and automates a lot of otherwise hand-written
code. It is not a new idea, but it can be difficult to make
it work without introducing too much complexity.
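To make the idea concrete, here is a deliberately tiny sketch of a code generator: a "model" consisting of a type name and its properties is turned into Java source text. It is not the toolset used in this article; the class name, the method name and the Movie properties (title, year) are assumptions chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a toy model (type name plus properties) and a toy generator.
final class ToyGenerator {

    /** Generates a Java interface with one getter per property (name -> type). */
    static String generateInterface(String typeName, Map<String, String> properties) {
        StringBuilder out = new StringBuilder();
        out.append("public interface ").append(typeName).append(" {\n");
        for (Map.Entry<String, String> property : properties.entrySet()) {
            String name = property.getKey();
            String getter = "get" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            out.append("    ").append(property.getValue())
               .append(' ').append(getter).append("();\n");
        }
        out.append("}\n");
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical properties; LinkedHashMap keeps them in declaration order.
        Map<String, String> properties = new LinkedHashMap<String, String>();
        properties.put("title", "String");
        properties.put("year", "int");
        System.out.print(generateInterface("Movie", properties));
    }
}
```

Adding a property to the model changes every generated artifact in one step, which is the point of the approach: the ripple effect is handled by the generator instead of by hand.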
An efficient approach is to be in control
of the DSL and the code generation tools. The tools should be
adapted to fit the needs of your system. As a developer you
define the DSL and implement the code generators yourself. This
also makes you comfortable with the result, since
it is under your control. This is a different approach from
many of the general purpose
Model Driven Architecture (MDA) [9]
tools that are popping up in this area. Vendor-specific code
generation tools often look good at the first demo, but later
turn out to be inadequate and hard to evolve as the requirements
change. In contrast, when you develop the code generators
yourself you can always change them when needed.
The toolset used in this article and
shown in
Figure 1, “Overview” should only be seen as an example
of how to implement the approach. You have to decide which tools
are appropriate for your organization.
Figure 1, “Overview” illustrates the different parts
of the approach:
- ArgoUML can be used to define the
DSL model. As a developer you design the DSL model to fit
the problem domain.
- You convert the ArgoUML model to
Ecore, which is the model format of the Eclipse Modeling Framework (EMF).
- You also develop the code generators
yourself, using JET templates. JET is a code generation
plug-in for Eclipse.
- The Ecore model is the input to
the templates. You define the mapping between model elements
in the Ecore model and the JET templates with Merlin, which
is another Eclipse plugin.
- The final result is the generated
code.
The approach presented in this article
can be seen as a lightweight variant of
Domain Specific Modeling (DSM) [10].
DSM is targeted at large scale projects and especially product
families. This approach is for small to medium size projects
(< 10 developers, < 1 year). It tries to keep the investment
and the learning curve to a minimum by using a pragmatic solution.
It must be easy to change and add more
code generation patterns as the need arises. The investment
in development of the code generator must be low and the payback
time short. It should be possible to create a new code generator
in a few hours.
The tools should not set constraints
on the design or demand a specific design. You know the requirements
for your system and should also be in charge of the generated
code. You can fully utilize your design skills and system specific
design decisions.
A subset of UML has been selected as
the basis for the DSL in this article. Ordinary elements of class
diagrams are used. You can use other notations, but UML
has the advantage that there are a lot of tools that you can
use as the foundation of your toolset.
The model is not a 1-1 representation
of the implementation. The model is a high level abstraction
that is the input for the code generation. It should be as simple
as possible. The code generation templates are custom made for
your problem and your model. Therefore the model doesn't have
to contain everything. Some information is easier to place in
the templates, e.g. naming conventions. Simplicity is important.
It doesn't have to be a general purpose solution that must be
able to handle all strange situations. You are in control of
both the DSL model and the templates and should choose the simplest
solution.
The DSL model is a concrete model and
there is no need for defining a meta model. Also, there is no
need for formal model transformations based on meta models.
This is a major difference from the ideas in MDA, whose tools
are intended to be general purpose tools made by tool vendors
and therefore might need the complex theory of model
transformation.
The purpose of the DSL model is to drive
the development, not to serve as documentation of the system.
Use ordinary reverse-engineering UML tools for documentation
purposes; such a reverse-engineered model is not a DSL model.
It is convenient to be able to mix generated
code with manually crafted code, i.e. it should be possible
to merge generated code with changes made to it. There are drawbacks
with mixing as described in the pattern
Separate generated and non-generated code [11].
You have to decide what approach is appropriate for your design.
The Lightweight DSM approach works very well
together with application frameworks, as described in the pattern
Rich Domain Specific Platform [11].
Usage of design patterns and frameworks is a critical success
factor for this approach. A well designed framework often requires
that you plug in specific implementations and add the glue (configuration
files) to make things work together. Code generation can be
an efficient way to reduce the size and complexity of the 'framework
completion code'. This approach has an advantage over complex
general purpose tools when it comes to adaptation to specific
frameworks. You can easily define the DSL and templates in a way
that fits your framework.
To illustrate how the Lightweight DSM
approach can be used in practice we will use an example. More
information is available in the referenced appendices, and the
complete source code for the example can be
downloaded.
The example is a simple system for a
library of movies and books. It is "over designed"
compared with the simple functionality it provides. The reason
for this is to be able to illustrate many code generation ideas.
It is not intended to be a best practice of design, but it captures
some patterns that often are used in real applications. The
design patterns are not described in detail; please see the
references for more information.
In the example some simple base classes
and utilities are used together with Hibernate to illustrate
how the approach can be integrated with application frameworks.
This section describes the design so
that you get a feeling of where we are heading.
The core of the system is a
Domain Model [12],
see
Figure 2, “Domain model”. A Library consists
of PhysicalMedia. Books and Movies
are different types of Media, which are stored on
a PhysicalMedia, e.g. DVD, VHS, paper books, eBooks
on CD. A Media has Characters, e.g. James
Bond, which can be played by a Person, e.g. Pierce
Brosnan. A Person can be involved (Engagement) in
different Media; in fact, a Person can have
several Engagements in the same Media.
E.g. Quentin Tarantino is both actor and director in the movie
'Reservoir Dogs'.
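The relations described above could be sketched in plain Java roughly as follows. This is not the generated code of the example; the class names follow the text, while every field and method name (including the `role` string and the `engage` method) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: class names follow the text, field and method names are assumptions.
class DomainModelSketch {
    public static void main(String[] args) {
        Person tarantino = new Person("Quentin Tarantino");
        Movie movie = new Movie("Reservoir Dogs");
        // One Person with two Engagements in the same Media.
        movie.engage(tarantino, "actor");
        movie.engage(tarantino, "director");
        System.out.println(tarantino.engagements.size()); // prints 2
    }
}

class Person {
    final String name;
    final List<Engagement> engagements = new ArrayList<Engagement>();
    Person(String name) { this.name = name; }
}

// Named MediaCharacter here to avoid a clash with java.lang.Character.
class MediaCharacter {
    final String name;
    Person playedBy; // may be null if nobody plays the character
    MediaCharacter(String name) { this.name = name; }
}

class Engagement {
    final Person person;
    final Media media;
    final String role;
    Engagement(Person person, Media media, String role) {
        this.person = person;
        this.media = media;
        this.role = role;
    }
}

abstract class Media {
    final String title;
    final List<MediaCharacter> characters = new ArrayList<MediaCharacter>();
    final List<Engagement> engagements = new ArrayList<Engagement>();
    Media(String title) { this.title = title; }

    /** Links a Person to this Media and registers the link on both sides. */
    Engagement engage(Person person, String role) {
        Engagement engagement = new Engagement(person, this, role);
        engagements.add(engagement);
        person.engagements.add(engagement);
        return engagement;
    }
}

class Book extends Media { Book(String title) { super(title); } }

class Movie extends Media { Movie(String title) { super(title); } }

class PhysicalMedia {
    final String kind; // e.g. "DVD", "VHS"
    final List<Media> media = new ArrayList<Media>();
    PhysicalMedia(String kind) { this.kind = kind; }
}

class Library {
    final String name;
    final List<PhysicalMedia> physicalMedia = new ArrayList<PhysicalMedia>();
    Library(String name) { this.name = name; }
}
```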
Repositories [13] are
used to find/update domain model
Aggregates [13]. The
repository uses
Access Objects [14],
which implement the specialization needed for persistence.
Access Objects have separated interface and implementation.
Abstract Factory [15]
is used to create instances of the Access Objects. Access
Objects are implemented as
Commands [15].
Hibernate is used as the O/R mapping framework, and a generic
set of Access Objects for common CRUD operations is part
of the application framework.
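As a rough illustration of how these patterns fit together, the sketch below shows a repository that obtains an Access Object from an Abstract Factory and runs it as a Command. All type and method names are assumptions modelled on the conventions visible later in Repository.javajet, and the in-memory implementation stands in for the "fake" variant; a Hibernate-based implementation would plug in behind the same interfaces.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: names are assumptions, not the generated code of the example.
class RepositoryDesignSketch {
    public static void main(String[] args) throws Exception {
        Map<Long, Library> store = new HashMap<Long, Library>();
        store.put(1L, new Library("City Library"));
        LibraryRepository repository =
                new LibraryRepository(new InMemoryLibraryAccessObjectFactory(store));
        System.out.println(repository.findById(1L).name); // prints City Library
    }
}

class Library {
    final String name;
    Library(String name) { this.name = name; }
}

class LibraryNotFoundException extends Exception {
    LibraryNotFoundException(String message) { super(message); }
}

// Access Object interface: a Command with a parameter, execute() and a result.
interface FindByIdAccessObject<T> {
    void setId(Long id);
    void execute();
    T getResult();
}

// Abstract Factory interface for the Access Objects of the Library aggregate.
interface LibraryAccessObjectFactory {
    FindByIdAccessObject<Library> createFindByIdAccessObject();
}

// "Fake" implementation backed by a map instead of a database.
class InMemoryFindByIdAccessObject<T> implements FindByIdAccessObject<T> {
    private final Map<Long, T> store;
    private Long id;
    private T result;
    InMemoryFindByIdAccessObject(Map<Long, T> store) { this.store = store; }
    public void setId(Long id) { this.id = id; }
    public void execute() { result = store.get(id); }
    public T getResult() { return result; }
}

class InMemoryLibraryAccessObjectFactory implements LibraryAccessObjectFactory {
    private final Map<Long, Library> store;
    InMemoryLibraryAccessObjectFactory(Map<Long, Library> store) { this.store = store; }
    public FindByIdAccessObject<Library> createFindByIdAccessObject() {
        return new InMemoryFindByIdAccessObject<Library>(store);
    }
}

class LibraryRepository {
    private final LibraryAccessObjectFactory libraryAccessObjectFactory;
    LibraryRepository(LibraryAccessObjectFactory factory) {
        this.libraryAccessObjectFactory = factory;
    }

    Library findById(Long id) throws LibraryNotFoundException {
        FindByIdAccessObject<Library> ao =
                libraryAccessObjectFactory.createFindByIdAccessObject();
        ao.setId(id);
        ao.execute();
        if (ao.getResult() == null) {
            throw new LibraryNotFoundException("No Library found with id: " + id);
        }
        return ao.getResult();
    }
}
```

The repository contains no persistence logic of its own, which is what makes it a good candidate for generation: only the factory and Access Object implementations differ between the real and the fake variant.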
In front of the domain model we have
a
Service Layer [12],
which typically uses Repositories to find/update domain model
Aggregates. See
Figure 3, “Overall design”.
An illustration of the normal execution
flow can be found in
Appendix: Sequence Diagram.
The following tools have been used
in the example in this article: Eclipse 3.1, Java 5.0, EMF,
Merlin, Argo2Ecore, ArgoUML, Hibernate, and MySQL. You need
to install them to be able to run the example. See Resources
[1] - [8]
for more information.
These tools have been chosen for this
article mainly because they work well together and are fairly
easy to use. There are many alternatives and you have to decide
for yourself which tools to use when implementing the approach.
The DSL model for this example consists
of three parts: domain model, repositories, and services.
In this example the DSL model is tailored for a rather technical
system design domain and is largely independent of the business
domain. To leverage the full potential of DSM it is even better
to use concepts from the business domain in the language.
The reason for this is described in
Why DSM? [10].
The class diagram in
Figure 4, “Domain model” is the domain model in the
DSL model. It defines classes, with properties and relations
that represent the domain model. This information is used
for generation of domain model implementation classes, Hibernate
mapping and database schema.
There are two Repositories, for the
Library and Person Aggregates, see
Figure 5, “Repositories” . The repositories are modeled
as ordinary classes with operations. This information is used
for generation of Repositories, Access Objects, and Abstract
Factories.
There is one service class that exposes
the published interface to client applications. In the DSL
model we define the service operations and any delegation
to repositories, see
Figure 6, “Services”.
ArgoUML is used to define the DSL model,
which we convert to an Ecore model with the argo2ecore plug-in.
It is done with a simple right click on the exported XMI file.
See
Appendix: Ecore for some more information.
JET templates are used for code generation.
JET is similar to JSP, so any knowledge of JSP comes
in handy, see
Appendix: JET Basics.
JET templates are defined in a separate
Eclipse JET project, but we don't need to package them as
a plug-in. We use a helper class to simplify the usage of
the Ecore model from the templates. The helper can provide
any functionality you need, such as String manipulation. The
helper in this example is named
EcoreGenerationHelper. You can
use it as a foundation, but you should modify/extend it to
fit your needs. Some templates may need specific functionality,
and then you should implement template-specific helpers. An
example of this is the
DatabaseGenerationHelper, which
is used for the
Hibernate and
DDL generation.
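As an example of the kind of String manipulation a helper can offer, the sketch below shows possible implementations of two methods that the templates call as h.capName and h.uncapName. The method names are taken from the template; the class name and the bodies are assumptions, and the real EcoreGenerationHelper may well differ.

```java
// Hypothetical stand-in for two String helpers used by the templates;
// the implementation in the real EcoreGenerationHelper may differ.
final class NamingHelperSketch {

    /** Upper-cases the first character, e.g. for building setter names. */
    static String capName(String name) {
        if (name == null || name.length() == 0) {
            return name;
        }
        return Character.toUpperCase(name.charAt(0)) + name.substring(1);
    }

    /** Lower-cases the first character, e.g. for building field names. */
    static String uncapName(String name) {
        if (name == null || name.length() == 0) {
            return name;
        }
        return Character.toLowerCase(name.charAt(0)) + name.substring(1);
    }

    public static void main(String[] args) {
        System.out.println("set" + capName("title")); // prints setTitle
        System.out.println(uncapName("Library"));     // prints library
    }
}
```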
Tip: When developing a template
it is convenient to use the preview pane, see the figure in
Appendix: JET Basics.
The Library example makes use of 11
JET templates. One of them is
Repository.javajet, and some interesting parts of it are described
below.
The two repository classes in the DSL
model are mapped to Repository.javajet, which generates
LibraryRepository.java and
PersonRepository.java. Naming conventions are used to realize
specific features. The name of the input class must end with "Repository",
and the part before that is used as part of the name in several
of the generated classes. Package naming conventions are also defined
in the templates.
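The "Repository" suffix convention can be sketched as follows. This is an assumption about how the base name might be derived, not the code of the actual template; the class and method names are made up for illustration.

```java
// Hypothetical sketch of the naming convention: strip the "Repository"
// suffix and reuse the remaining base name for related classes.
final class RepositoryNamingSketch {

    private static final String SUFFIX = "Repository";

    static String baseName(String className) {
        if (!className.endsWith(SUFFIX) || className.length() == SUFFIX.length()) {
            throw new IllegalArgumentException(
                    "Class name must be of the form <Name>Repository: " + className);
        }
        return className.substring(0, className.length() - SUFFIX.length());
    }

    public static void main(String[] args) {
        String baseName = baseName("LibraryRepository");
        System.out.println(baseName);                        // prints Library
        System.out.println(baseName + "NotFoundException");  // prints LibraryNotFoundException
    }
}
```

Failing fast on a class that breaks the convention keeps modelling mistakes from turning into oddly named generated classes.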
The interesting part of Repository.javajet
is the section that generates the methods. It generates code
that delegates to the Abstract Factory and Access Object.
If the 'noaccessobject' annotation has been defined on
the operation, a method stub is generated instead, to be filled
in with hand-written code.
<%for (EOperation op : h.getOperations(eClass)) {
// a few naming mapping conventions
String mappedOpName = h.getMappedOperationName(op);
boolean findById = (mappedOpName.equals("findById"));
%>
/**
* <!-- begin-user-doc -->
* <!-- end-user-doc -->
* @generated
*/
<%=h.getVisibility(op) %><%=h.getTypeName(op)%>
<%=h.getName(op)%>(<%=
h.getParameterList(op)%>) <% if (findById) {
%>throws <%=baseName%>NotFoundException<%}%> {
<% if (h.getAnnotation(op, "noaccessobject") != null) {%>
// TODO Auto-generated method stub
throw new UnsupportedOperationException("<%=mappedOpName
%> not implemented");
<%} else {%>
<%=h.capName(mappedOpName)%><%=productType%>
<%=h.getGenericType(op)
%> ao = <%=h.uncapName(baseName)%><%=productType%>
Factory.create<%=
h.capName(mappedOpName)%><%=productType%>();
<%for (EParameter parameter : h.getParameters(op)) {%>
ao.set<%=h.capName(h.getName(parameter))%>(
<%=h.getName(parameter)%>);
<%}%>
ao.execute();
<%if (!h.getTypeName(op).equals("void")) {%>
<%if (findById) {
EParameter idParam = h.getParameters(op).get(0);
%>
if (ao.getResult() == null) {
throw new <%=baseName%>NotFoundException("No <%=
baseName%> found with <%=h.getName(idParam)%>: " + <%=
h.getName(idParam)%>);
}
<%}%>
return ao.getResult();
<%}%>
<%}%>
}
<%}%>
Note: The templates follow the convention to
use typed Lists (generics). In the model we use array types
for the operations, e.g. Media[],
which is generated as List<Media>.
This mapping is implemented in the method getTypeName
in the helper. This is a typical design decision that is implemented
in the templates as a convention.
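A minimal sketch of such a mapping is shown below. It works on plain type name strings to stay self-contained, whereas the real getTypeName operates on Ecore model elements; the class and method names here are assumptions.

```java
// Hypothetical, string-based version of the array-to-List convention;
// the real helper method works on Ecore model elements instead.
final class TypeNameMappingSketch {

    static String mapTypeName(String modelTypeName) {
        if (modelTypeName.endsWith("[]")) {
            String elementType =
                    modelTypeName.substring(0, modelTypeName.length() - 2);
            // Note: primitive element types (e.g. int[]) would need boxing,
            // which this sketch does not handle.
            return "List<" + elementType + ">";
        }
        return modelTypeName;
    }

    public static void main(String[] args) {
        System.out.println(mapTypeName("Media[]")); // prints List<Media>
        System.out.println(mapTypeName("Person"));  // prints Person
    }
}
```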
Detailed descriptions of the code generation
of the other parts of the example system can be found in appendix:
Model to Template Mapping
The next step is to connect the model
elements with appropriate templates. Merlin is used to define
this mapping. It provides a drag-and-drop mapping user interface
between model elements and JET templates. A model element
is typically a class, operation or package, but it can be
anything defined in the model. The model element will be the
input argument to the template. The code generation is also
started from the Merlin JET Mapping Model. See
Appendix: Merlin Tools for more details.
Figure 7,
“Generated code” illustrates how the code is generated
by the JET templates with the mapped model elements as input
arguments.
Some parts of the system are generated
and some parts are hand-written. Unlike DSM, the Lightweight Domain
Specific Modeling approach does not insist on full code generation.
Instead we try to keep it as
simple as possible. If it is easier to write the code manually,
it should be written manually. It is also important to be able
to use best-of-breed IDE tools together with code generation;
e.g. when debugging you must feel comfortable with the generated
code. See also
Appendix: Mixing Generated and Hand Written Code.
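The @generated tag and the user-doc markers emitted by Repository.javajet hint at how mixing can work. The sketch below assumes EMF-style merge rules, where a member tagged @generated is overwritten on regeneration and a member tagged "@generated NOT" (or without the tag) is left alone; whether your setup behaves exactly like this depends on the merge configuration you use. The class, its methods and their bodies are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of the tagging convention only; not generated by the example templates.
class PersonRepositorySketch {

    private final List<String> names = new ArrayList<String>();

    /**
     * <!-- begin-user-doc -->
     * Documentation placed between the markers is meant to survive regeneration.
     * <!-- end-user-doc -->
     * @generated
     */
    public void save(String name) {
        // Assumed to be overwritten the next time the generator runs.
        names.add(name);
    }

    /**
     * Hand-written replacement for a generated stub.
     * @generated NOT
     */
    public List<String> findByNamePrefix(String prefix) {
        // Assumed to be preserved by the merge because of "@generated NOT".
        List<String> result = new ArrayList<String>();
        for (String name : names) {
            if (name.startsWith(prefix)) {
                result.add(name);
            }
        }
        return result;
    }
}
```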