A data-driven application has to deal with static reference data and dynamic data. Reference data is never changed by the system or its users. Generally reference data is only changed by the DBA. While dynamic data is every other bit of data the system deals with. Simple reference data, such as lists of object types, is often mirrored in code through the use of an enumerated type. In most languages we have enumerations which are effectively named integers. Thus they are a perfect candidate to represent reference data, that is mapping a value (the primary key) to a textual description (the user-readable part). By the way - for the purposes of this discussion I define "data-driven" as software which exists mainly to present and manipulate data and the interconnections found within. This covers software used for purposes such as reporting, scheduling, and asset management; and not things like games, message processing and the like.
A common issue developers encounter when programming against a database is that of how to keep enumerations in sync with their table counterpart. In other words, the enumeration (enum) value must map precisely to a corresponding primary key value. The reason that this is important is that other tables which reference the enum's table will have foreign key constraints which may be violated if code attempts to insert an enum value which is not present in the enum's table.
If we have all the values in the database and they can be referenced from other tables, why do we even need to duplicate this in enum form? The reason is that in business layer code we need to check the value of properties with enumerated types. Without representations in code of the values from the enum table, we could not implement any useful business logic - it would all have to reside in the database because this would be the only place we would have concrete values.
This is a simple concept and easy to implement. However, as mentioned, it is important to be cautious when retrieving and updating enum values from and to the database. I'd like to discuss a solution to this issue. It is a "three-pronged attack" making use of:
- Unit tests,
- Data dump scripts, and
- Build automation
Say we have a reference data table AppointmentType, with four records:
1 Board Meeting
2 Team Meeting
3 Lunch Meeting
4 Department Meeting
We'll need a corresponding enum AppointmentTypes:
using System;
using System.Collections.Generic;
using System.Text;
using System.ComponentModel;
namespace DataDrivenEnum.Domain.Appointment
{
public enum AppointmentTypes
{
[Description("Board Meeting")]
BoardMeeting = 1,
[Description("Team Meeting")]
TeamMeeting = 2,
[Description("Lunch Meeting")]
LunchMeeting = 3,
[Description("Departmental Meeting")]
DepartmentMeeting = 4,
}
}
The key things to note here are:
- We are able to manually set the enum values so that they correspond with the primary key values.
- A Description attribute can be applied, which you can get at with Reflection. This is useful when you need a more readable version of the enumeration value label.
This is all fine. But what happens if the DBA goes and reinserts the data AppointmentType without resetting the key sequence, or a new guy on the team adds a new meeting type ManagementMeeting to the enum and doesn't add the corresponding record to the table? Exceptions and unexpected behaviour! Yes, a responsible software team should have change management processes in place but we've all seen these things happen under the crunch. Admit it! :)
So to avoid this we need to add some process around the reference data. Namely we want to make building, data import, and testing very easy to do. This can be done manually however the alternative is much more appealing. This is using NAnt and CruiseControl.NET to automate your builds. Setting this up is out of the scope of this discussion, but I've put some links about these at the end. Apart from automating the build, we also want to automate the data import, and the running of unit tests. By the way, this all falls under the topic of Continuous Integration which everybody should know about and practise.
First of all, we need to generate a data dump script for the reference data table. This is easily done in SQL Server 2005 Management Studio and probably even more easily using command line database clients like with MySQL and PostgreSQL. Ideally you want a bunch of insert statements, as these are easy to edit and run manually when you need to.
Next a reference data import script needs to be created. As I said, you could skip this step and manually import each table but that is not very good use of time now, is it? This script needs to use some kind of command-line tool to grab each reference data dump script and run it against the correct database. There are various ways to do this. A tool such as this can be used, which allows you to work with the data using C# but drive the entire process from the command-line via batch files. You could also write a small C# console app which just takes the insert statements and runs them using ADO.NET. It depends on the amount flexibility you need. This will allow you to easily and quickly deploy a fresh database with all your reference data intact.
Now we have our data dealt with, we need to ensure that the enumerations and data are both up-to-date. This is effectively done using unit tests. This can be done with the unit testing framework of your choice as long as the test runner can be driven by the build automation tool. The example shown here was done using the Microsoft testing framework. The code is pretty close to NUnit anyway. The unit test to ensure that Appointment Types are in sync between the database and the code looks something like this:
[TestMethod]
public void AppointmentTypesIntegrityTest() {
List<AppointmentType> appointmentTypeList = AppointmentBL.AppointmentTypesGet();
Assert.IsNotNull(appointmentTypeList, "Appointment types could not be retrieved from the database.");
Assert.IsTrue(appointmentTypeList.Count > 0, "No appointment types in the database.");
Type typeOfAppointmentTypes = typeof(AppointmentTypes);
Assert.IsTrue(appointmentTypeList.Count == Enum.GetValues(typeOfAppointmentTypes).Length,
"Number of appointment types differs between DB and Enum.");
foreach (AppointmentType appointmentType in appointmentTypeList) {
Assert.IsTrue(
Enum.IsDefined(typeOfAppointmentTypes, appointmentType.Id),
string.Format("Appointment type '{0}' is defined in the database but not in the AppointmentTypes enum.",
appointmentType.Description)
);
}
foreach (int appointmentTypeId in Enum.GetValues(typeOfAppointmentTypes)) {
Assert.IsNotNull(
appointmentTypeList.Find(
delegate( AppointmentType x ) {
return x.Id == appointmentTypeId;
}
),
string.Format("Appointment type '{0}' is defined in enum but not in DB",
Enum.GetName(typeOfAppointmentTypes, appointmentTypeId)
)
);
}
You may be wondering why we need the AppointmentTypes enum and the AppointmentType class. The AppointmentType class is the actual domain object, whereas AppointmentTypes is essentially a "look-up". Most of the time we only need to know the type of an Appointment so that class will have an AppointmentTypes property like so:
namespace Joe.MeetingSystem.Domain.Appointment
{
public class Appointment : DescribedEntityBase
{
private AppointmentTypes appointmentType;
public AppointmentTypes AppointmentType {
get { return appointmentType; }
set { appointmentType = value; }
}
// ...more appointment stuff...
}
}
Once we know the type of an Appointment, we might want to retrieve some data about the AppointmentType and use this to create or modify the Appointment. Using the enum's value we can instantiate an AppointmentType object and get what we need. Since enum values are easily cast to ints this is no big deal. Why doesn't the AppointmentType class have an AppointmentTypes property? That would make sense because that will mean the AppointmentType gets its id and description from the enum value, right? Yes, that would be true, but the reason I haven't implemented things this way is because I have found it very useful to have every reference data class inherit from a common base class which provides an Id and a Description field. This makes it easy to manipulate lists of the base class instead of having to always use the concrete type. We can't use an enum instead because enums cannot make use of inheritance. Thus we essentially end up with a "light-weight" version of the AppointmentType class, which we can use to identify the type of Appointment classes. It does it's job, and does it well.
The AppointmentType class looks like this:
namespace Joe.MeetingSystem.Domain.Appointment
{
public class AppointmentType : DescribedEntityBase
{
private int usualLengthInHours;
public int UsualLengthInHours {
get { return UsualLengthInHours; }
set { UsualLengthInHours = value; }
}
private string defaultLocation;
public string DefaultLocation {
get { return defaultLocation; }
set { defaultLocation = value; }
}
}
}
DescribedEntityBase holds an id and description which almost all classes require to relate them to a database table:
namespace Joe.MeetingSystem.Domain
{
public class DescribedEntityBase : IDescribedEntity
{
private int id;
public int Id {
get { return id; }
set { id = value; }
}
private string description;
public string Description {
get { return description; }
set { description = value; }
}
}
}
You can see that if you make sure your reference data types all implement a common interface, the unit test code could be easily made into a template which you could use to check all types which implement that interface.
Putting it all together the process will work like this:
- Developers update the reference data dump scripts whenever they change the enumeration version.
- When it comes to deploy, the build process will:
- Drop everything in the database and import the reference data.
- Run the unit tests
- Present the success/failure of the unit tests
The power in this process is that it is a sentinel which makes sure that changes to the code or database will not break the system.None of this is difficult, however I believe it is worth discussing since it is a pragmatic approach to reducing just one source of potential chaos in the software development process.
Please let me know if you'd like the solution I whipped up while generating the code.
Further Reading on Build Automation:
http://del.icio.us/joe.niland/buildautomation
Posted
10-02-2006 1:38 PM
by
Joe Niland