Abstract
In this article we shall produce a simple
genetic algorithm in C#. It will not be multi-threaded,
nor will it contain exotic operators or convergence criteria
(i.e. a condition where many of the solutions found
are very similar). It will simply demonstrate a genetic
algorithm in managed code, taking advantage of some of the
features of the .NET runtime.
Introduction
A genetic algorithm is an optimization
technique that relies on parallels with nature. It can tackle
a variety of optimization problems, provided that they
can be parameterized in such a way that a candidate solution
provides a measure of how good it is. This measure we
define as fitness.
Genetic algorithms were first conceived
in the early 1970s (Holland, 1975). The initial idea came from
observing how the evolution of biological creatures derives
from their constituent DNA and chromosomes. In this sense
a simple analogy can be made with a mathematical problem
made up of many parameters: each parameter takes the
place of a chromosome in the mathematical analogy of a real
chemical sequence.
In nature, evolution is carried out by
a process of selection typified by the expression survival
of the fittest. In order to select an individual, we need
a population of such individuals to choose from to produce
a new generation of individuals.
For any problem that we wish to solve,
we need some measure of the goodness of the solution, i.e.
fitness, often a χ² (chi-squared) measure: the better the
solution, the higher the fitness returned from our function.
The less fit the solutions are, the less likely they are to
survive into a successive population. By employing such a
technique, the algorithm can reduce the number of possible
solutions that it examines.
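As an illustration of the relationship between a χ² error and fitness, here is a minimal sketch; this helper is not part of the article's code, just one common way to map a lower error to a higher fitness:

```csharp
// Hypothetical helper: convert a chi-squared error into a fitness
// value in (0, 1]. A perfect fit (chiSquared == 0) maps to 1, and
// larger errors map to smaller fitness values.
public static double FitnessFromChiSquared(double chiSquared)
{
    // chiSquared is assumed non-negative.
    return 1.0 / (1.0 + chiSquared);
}
```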
Many problems are internally represented
in binary by various genetic algorithms. Here we will only
consider a decimal representation. The internal representation
of a genetic algorithm does not actually matter provided
the implementation is thorough (Field, 1995).
The Problem
In our example code, we supply a test function
that uses sine terms to produce the plot below:
The optimal solution for this problem is
(0.5, 0.5), i.e. the highest peak. We chose this example
to demonstrate how a genetic algorithm is not fooled by
the surrounding local maxima (i.e. the smaller peaks).
Test Harness
We start by declaring a new genetic algorithm:
GA ga = new GA(0.8,0.05,100,2000,2);
ga.FitnessFunction = new GAFunction(theActualFunction);
where the arguments are the crossover
rate, mutation rate, population size, number of generations,
and number of parameters that we are solving for. We declare
the FitnessFunction property as:
public delegate double GAFunction(double[] values);

public class GA
{
    static private GAFunction getFitness;

    public GAFunction FitnessFunction
    {
        get { return getFitness; }
        set { getFitness = value; }
    }
    // etc.
}
This then enables us to declare our fitness
function the same as the delegate function:
public static double theActualFunction(double[] values)
{
    if (values.GetLength(0) != 2)
        throw new ArgumentOutOfRangeException("should only have 2 args");

    double x = values[0];
    double y = values[1];
    double n = 9;

    double f1 = Math.Pow(15*x*y*(1-x)*(1-y)*Math.Sin(n*Math.PI*x)
                         *Math.Sin(n*Math.PI*y), 2);
    return f1;
}
which is therefore accepted by the algorithm.
The genetic algorithm is then set to run using:
ga.Go();
The genetic algorithm will now run for
the supplied number of generations.
The Algorithm
The algorithm code contains two simple classes,
GA and Genome, plus a helper class, GenomeComparer.
The Genome class can be thought of as a
simple container. The underlying structure is an array of
doubles within the range of 0 to 1. The user is expected
to take these values and scale them to whatever range they
require. Since mutation occurs on the genome, the Mutate
method is found in this class. The Crossover operator requires
access to the private data of the Genome, so it is also
a member function; it takes a second Genome and outputs
two child Genome objects. The fitness of a particular genome
is also stored within the Genome object. There are some
additional helper functions that may be found in the code
itself.
The GA class does all the work. The genetic
algorithm consists of the following basic steps:
- Create a new population
- Select two individuals from the population
weighting towards the individual that represents the best
solution so far.
- 'Breed' them to produce children.
- If we don't have enough children for
a new population return to step 2.
- Replace old population with new.
- If we have not produced enough generations
return to step 2.
- We have finished.
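The steps above can be sketched as follows. This is a hedged outline, not the article's exact code: the helper names (CreateInitialPopulation, SelectParent), the use of List&lt;Genome&gt; rather than the article's ArrayList, and the field names are all assumptions.

```csharp
// Illustrative sketch of the main loop described in the steps above.
public void Go()
{
    List<Genome> population = CreateInitialPopulation();        // step 1

    for (int gen = 0; gen < m_generations; gen++)               // step 6
    {
        List<Genome> next = new List<Genome>();
        while (next.Count < m_populationSize)                   // step 4
        {
            Genome parent1 = SelectParent(population);          // step 2
            Genome parent2 = SelectParent(population);

            Genome child1, child2;
            parent1.Crossover(parent2, out child1, out child2); // step 3
            child1.Mutate();
            child2.Mutate();

            next.Add(child1);
            next.Add(child2);
        }
        population = next;                                      // step 5
    }
}
```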
When selecting individuals to breed, we
use what is called the roulette-wheel method. This is where
fitter individuals have a larger proportion of the 'wheel'
and are more likely to be chosen. We chose to store the
fitnesses cumulatively in a System.Collections.ArrayList
as it has some nice features, like sorting. Unfortunately,
its binary search method only works for exact values, so we
had to implement the following workaround:
// ArrayList's BinarySearch is for exact values only,
// so bracket randomFitness by hand.
mid = (first + last)/2;
while (idx == -1 && first <= last)
{
    if (randomFitness < (double)m_fitnessTable[mid])
    {
        last = mid;
    }
    else if (randomFitness > (double)m_fitnessTable[mid])
    {
        first = mid;
    }
    mid = (first + last)/2;
    // randomFitness lies between entries mid and mid+1
    if ((last - first) == 1)
        idx = last;
}
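For context, the cumulative fitness table consulted in that search can be built as shown below. The field names m_thisGeneration and m_totalFitness are assumptions; only m_fitnessTable comes from the snippet above.

```csharp
// Build the cumulative fitness table used for roulette-wheel
// selection: entry i holds the sum of fitnesses 0..i, so a random
// number in [0, totalFitness) lands in a slot whose width is
// proportional to that genome's fitness.
private void CreateFitnessTable()
{
    m_fitnessTable.Clear();
    double cumulative = 0.0;
    foreach (Genome g in m_thisGeneration)   // assumed field name
    {
        cumulative += g.Fitness;
        m_fitnessTable.Add(cumulative);
    }
    m_totalFitness = cumulative;             // assumed field name
}
```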
The GenomeComparer class implements
the IComparer interface. Each generation
is stored in a System.Collections.ArrayList,
and we wish to sort each generation in order of fitness.
We therefore implement the interface as follows:
public sealed class GenomeComparer : IComparer
{
    public GenomeComparer()
    {
    }

    public int Compare(object x, object y)
    {
        if (!(x is Genome) || !(y is Genome))
            throw new ArgumentException("Not of type Genome");

        if (((Genome)x).Fitness > ((Genome)y).Fitness)
            return 1;
        else if (((Genome)x).Fitness == ((Genome)y).Fitness)
            return 0;
        else
            return -1;
    }
}
Note that we need to explicitly cast the
ArrayList elements back to a Genome type. We
also make the class sealed as there is no point inheriting
from it.
A Quick Note On Operators
We briefly mentioned two operators, crossover
and mutation, and we shall explain these in a little more
detail here.
Crossover simply takes two genomes, splits
them at some point, and produces two new genomes by swapping
the end parts, e.g.
Parent 1: 10 20 30 40 | 50 60 70          Child 1: 10 20 30 40 | 60 50 40
                                    ===>
Parent 2: 00 90 80 70 | 60 50 40          Child 2: 00 90 80 70 | 50 60 70
The split occurs at a randomly chosen point
along the length of the genome, and the split only occurs
if a probability test is passed. This probability is typically
set quite high, which reflects what happens in nature.
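A sketch of the Crossover member described earlier might look like the following; the member names m_genes and m_random are assumptions, not the article's exact layout:

```csharp
// Single-point crossover: pick a random split point, then swap the
// tails of the two parents to produce two children.
public void Crossover(Genome partner, out Genome child1, out Genome child2)
{
    int position = m_random.Next(m_genes.Length);  // random split point

    child1 = new Genome(m_genes.Length);
    child2 = new Genome(m_genes.Length);

    for (int i = 0; i < m_genes.Length; i++)
    {
        if (i < position)
        {
            // head: each child keeps its own parent's genes
            child1.m_genes[i] = this.m_genes[i];
            child2.m_genes[i] = partner.m_genes[i];
        }
        else
        {
            // tail: the genes are swapped between the children
            child1.m_genes[i] = partner.m_genes[i];
            child2.m_genes[i] = this.m_genes[i];
        }
    }
}
```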
Mutation, in comparison, happens rarely,
so the probability that it occurs is set quite low, typically
less than 5%. Each gene within the genome is tested in turn
to see if it is allowed to mutate, and if so it is replaced
with a random number, e.g.
10 20 30 40 50 60 70 80 90  ===>  10 20 30 40 50 60 ** 80 90

(where ** is the new random value that replaces the mutated gene)
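The Mutate member can be sketched as below; m_genes, m_random and m_mutationRate are assumed member names used for illustration:

```csharp
// Per-gene mutation: each gene is tested against the mutation rate
// and, if selected, replaced with a new random value in [0, 1).
public void Mutate()
{
    for (int i = 0; i < m_genes.Length; i++)
    {
        if (m_random.NextDouble() < m_mutationRate)
            m_genes[i] = m_random.NextDouble();
    }
}
```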
Results
With our simple example we know that the
optimal solution is at (0.5, 0.5), and after 250
generations we find a solution extremely close to it (within
4 significant figures). The progress of the GA can be seen
below:
Conclusion and Notes
A genetic algorithm does not provide a
magic bullet solution to all minimization/maximization problems.
In many cases other algorithms are faster and more practical.
However for problems with a large parameter space and where
the problem itself can be easily specified, it can be an
appropriate solution.
Whilst writing this I learnt several features
of C# that I'd like to summarize here (coming from a C++
background):
- The System.Collections.ArrayList container only does shallow copies.
- The System.Collections.ArrayList container's binary search only works for exact values.
- An obvious thing, but simply defining a variable doesn't necessarily assign it a value.
- Implementing the IComparer interface is fairly trivial (see the GenomeComparer class).
Further improvements to the algorithm can
be made by implementing all sorts of weird and wonderful
operators. I'm sure someone will be able to tell me an easier
way to supply the number of parameters automatically,
rather than via the GA constructor and the delegate function.