Images in this post missing? We recently lost them in a site migration. We're working to restore these as you read this. Should you need an image in an emergency, please contact us at imagehelp@codebetter.com
A pragmatic approach to maintaining the integrity of reference data

A data-driven application has to deal with static reference data and dynamic data. Reference data is never changed by the system or its users. Generally reference data is only changed by the DBA. While dynamic data is every other bit of data the system deals with. Simple reference data, such as lists of object types, is often mirrored in code through the use of an enumerated type. In most languages we have enumerations which are effectively named integers. Thus they are a perfect candidate to represent reference data, that is mapping a value (the primary key) to a textual description (the user-readable part). By the way - for the purposes of this discussion I define "data-driven" as software which exists mainly to present and manipulate data and the interconnections found within. This covers software used for purposes such as reporting, scheduling, and asset management; and not things like games, message processing and the like.

A common issue developers encounter when programming against a database is that of how to keep enumerations in sync with their table counterpart. In other words, the enumeration (enum) value must map precisely to a corresponding primary key value. The reason that this is important is that other tables which reference the enum's table will have foreign key constraints which may be violated if code attempts to insert an enum value which is not present in the enum's table.

If we have all the values in the database and they can be referenced from other tables, why do we even need to duplicate this in enum form? The reason is that in business layer code we need to check the value of properties with enumerated types. Without representations in code of the values from the enum table, we could not implement any useful business logic - it would all have to reside in the database because this would be the only place we would have concrete values.

This is a simple concept and easy to implement. However, as mentioned, it is important to be cautious when retrieving and updating enum values from and to the database. I'd like to discuss a solution to this issue. It is a "three-pronged attack" making use of:

  1. Unit tests,
  2. Data dump scripts, and
  3. Build automation

Say we have a reference data table AppointmentType, with four records:

1    Board Meeting

2    Team Meeting

3    Lunch Meeting

4    Department Meeting

 

We'll need a corresponding enum AppointmentTypes:

using System; using System.Collections.Generic; using System.Text; using System.ComponentModel; namespace DataDrivenEnum.Domain.Appointment { public enum AppointmentTypes { [Description("Board Meeting")] BoardMeeting = 1, [Description("Team Meeting")] TeamMeeting = 2, [Description("Lunch Meeting")] LunchMeeting = 3, [Description("Departmental Meeting")] DepartmentMeeting = 4, } }

 
The key things to note here are:

  • We are able to manually set the enum values so that they correspond with the primary key values.
  • A Description attribute can be applied, which you can get at with Reflection. This is useful when you need a more readable version of the enumeration value label.

This is all fine. But what happens if the DBA goes and reinserts the data AppointmentType without resetting the key sequence, or a new guy on the team adds a new meeting type ManagementMeeting to the enum and doesn't add the corresponding record to the table? Exceptions and unexpected behaviour! Yes, a responsible software team should have change management processes in place but we've all seen these things happen under the crunch. Admit it! :)

So to avoid this we need to add some process around the reference data. Namely we want to make building, data import, and testing very easy to do. This can be done manually however the alternative is much more appealing. This is using NAnt and CruiseControl.NET to automate your builds. Setting this up is out of the scope of this discussion, but I've put some links about these at the end. Apart from automating the build, we also want to automate the data import, and the running of unit tests. By the way, this all falls under the topic of Continuous Integration which everybody should know about and practise. 

First of all, we need to generate a data dump script for the reference data table. This is easily done in SQL Server 2005 Management Studio and probably even more easily using command line database clients like with MySQL and PostgreSQL. Ideally you want a bunch of insert statements, as these are easy to edit and run manually when you need to.

Next a reference data import script needs to be created. As I said, you could skip this step and manually import each table but that is not very good use of time now, is it? This script needs to use some kind of command-line tool to grab each reference data dump script and run it against the correct database. There are various ways to do this. A tool such as this can be used, which allows you to work with the data using C# but drive the entire process from the command-line via batch files. You could also write a small C# console app which just takes the insert statements and runs them using ADO.NET. It depends on the amount flexibility you need. This will allow you to easily and quickly deploy a fresh database with all your reference data intact.

Now we have our data dealt with, we need to ensure that the enumerations and data are both up-to-date. This is effectively done using unit tests. This can be done with the unit testing framework of your choice as long as the test runner can be driven by the build automation tool. The example shown here was done using the Microsoft testing framework. The code is pretty close to NUnit anyway. The unit test to ensure that Appointment Types are in sync between the database and the code looks something like this:

       

[TestMethod] public void AppointmentTypesIntegrityTest() { List<AppointmentType> appointmentTypeList = AppointmentBL.AppointmentTypesGet(); Assert.IsNotNull(appointmentTypeList, "Appointment types could not be retrieved from the database."); Assert.IsTrue(appointmentTypeList.Count > 0, "No appointment types in the database."); Type typeOfAppointmentTypes = typeof(AppointmentTypes); Assert.IsTrue(appointmentTypeList.Count == Enum.GetValues(typeOfAppointmentTypes).Length, "Number of appointment types differs between DB and Enum."); foreach (AppointmentType appointmentType in appointmentTypeList) { Assert.IsTrue( Enum.IsDefined(typeOfAppointmentTypes, appointmentType.Id), string.Format("Appointment type '{0}' is defined in the database but not in the AppointmentTypes enum.", appointmentType.Description) ); } foreach (int appointmentTypeId in Enum.GetValues(typeOfAppointmentTypes)) { Assert.IsNotNull( appointmentTypeList.Find( delegate( AppointmentType x ) { return x.Id == appointmentTypeId; } ), string.Format("Appointment type '{0}' is defined in enum but not in DB", Enum.GetName(typeOfAppointmentTypes, appointmentTypeId) ) ); }

You may be wondering why we need the AppointmentTypes enum and the AppointmentType class. The AppointmentType class is the actual domain object, whereas AppointmentTypes is essentially a "look-up". Most of the time we only need to know the type of an Appointment so that class will have an AppointmentTypes property like so:

namespace Joe.MeetingSystem.Domain.Appointment { public class Appointment : DescribedEntityBase { private AppointmentTypes appointmentType; public AppointmentTypes AppointmentType { get { return appointmentType; } set { appointmentType = value; } } // ...more appointment stuff... } }


Once we know the type of an Appointment, we might want to retrieve some data about the AppointmentType and use this to create or modify the Appointment. Using the enum's value we can instantiate an AppointmentType object and get what we need. Since enum values are easily cast to ints this is no big deal. Why doesn't the AppointmentType class have an AppointmentTypes property? That would make sense because that will mean the AppointmentType gets its id and description from the enum value, right? Yes, that would be true, but the reason I haven't implemented things this way is because I have found it very useful to have every reference data class inherit from a common base class which provides an Id and a Description field. This makes it easy to manipulate lists of the base class instead of having to always use the concrete type. We can't use an enum instead because enums cannot make use of inheritance. Thus we essentially end up with a "light-weight" version of the AppointmentType class, which we can use to identify the type of Appointment classes. It does it's job, and does it well.

The AppointmentType class looks like this:

namespace Joe.MeetingSystem.Domain.Appointment { public class AppointmentType : DescribedEntityBase { private int usualLengthInHours; public int UsualLengthInHours { get { return UsualLengthInHours; } set { UsualLengthInHours = value; } } private string defaultLocation; public string DefaultLocation { get { return defaultLocation; } set { defaultLocation = value; } } } }


DescribedEntityBase holds an id and description which almost all classes require to relate them to a database table:

namespace Joe.MeetingSystem.Domain { public class DescribedEntityBase : IDescribedEntity { private int id; public int Id { get { return id; } set { id = value; } } private string description; public string Description { get { return description; } set { description = value; } } } }



You can see that if you make sure your reference data types all implement a common interface, the unit test code could be easily made into a template which you could use to check all types which implement that interface.

Putting it all together the process will work like this:

  1. Developers update the reference data dump scripts whenever they change the enumeration version.
  2. When it comes to deploy, the build process will:
    1. Drop everything in the database and import the reference data.
    2. Run the unit tests
    3. Present the success/failure of the unit tests
The power in this process is that it is a sentinel which makes sure that changes to the code or database will not break the system.None of this is difficult, however I believe it is worth discussing since it is a pragmatic approach to reducing just one source of potential chaos in the software development process.

Please let me know if you'd like the solution I whipped up while generating the code. 

Further Reading on Build Automation:

http://del.icio.us/joe.niland/buildautomation


Posted 10-02-2006 1:38 PM by Joe Niland

[Advertisement]

Comments

mgrzeg wrote re: A pragmatic approach to maintaining the integrity of reference data
on 10-04-2006 8:15 PM

In some o/r mappers (like sooda - www.sooda.org) there's other solution than enums - object constants. For me it's more natural, than enums, because i work with objects having access to all fields I need (not only some attribs accessed by reflection). For instance you may need to distinguish some special accounts in your users table - administrator, test account and others and reference them in your code as User.Administrator, User.Test, or User.SpecialAccounts.Administrator, etc (very good candidates to become the integer constants in an enums table). Or regarding your example: Appointment.BoardMeeting, Appointment.TeamMeeting, and so on. Such constants are so OO, unknown in the set-world (driven by SQL queries), that for me there's no place to store them in the db. Metadata itself, provided by database's own services - of course, but nothing more (ie our own extensions to metadata stored in db as raw data).

Much better place for such mappings is a mapping file - the must have file in any o/r mapping solution.

Having this you don't need any additional attributes, reflections, etc - you get the right object directly from the table, as any other.

I know, that enums stand for realy lightweight solutions and such are often welcome while optimizing the code. But they require more care in different places - at the db side and object entities side. I think I'm too lazy for this and rather choose placing constant data in mapping file(s) - in just one place to take care of.

Joe Niland wrote re: A pragmatic approach to maintaining the integrity of reference data
on 10-05-2006 7:23 AM

That is an interesting point. So are you saying that using the properties such as Appointment.BoardMeeting you get compile-time checking, and then you let the ORM framework fill this property with the correct value from the DB?

When you say "...there's no place to store them in the db..." - I don't understand how you can achieve referential integrity if you don't store them in the DB? Or are you saying that the object model is not entirely represented in the DB?

mgrzeg wrote re: A pragmatic approach to maintaining the integrity of reference data
on 10-05-2006 8:11 AM

Regarding to the sooda.org documentation: you may put some consts in your mapping file, for example:

<class name="User">

 <table name="User">

   <field name="ID" type="Integer" primaryKey="true"/>

   <field name="Name" type="String" size="40"/>

 </table>

 <const name="Administrator" value="1"/>

 <const name="Guest" value="2"/>

 <const name="SystemUser" value="3"/>

</class>

where values refer to the primaryKey (see sooda documentation).

After each such a change in your schema file you have to recompile your code (the pre-build event rebuilds dal) and then you have full access at the design time to all properties of these 'object constants'.

Of course - it requires recompilation and as such is unusable in dynamic environments, where the user of the application needs to add another constant. But - solution based on enums also requires recompilation (or I don't understand something)...

Maybe I've something misunderstood, but solution based on schema file (avoiding enums in db) addresses also your needs... am I right?

Joe Niland wrote re: A pragmatic approach to maintaining the integrity of reference data
on 10-05-2006 8:27 AM

Yes, you are right - the enum solution still requires recompilation everytime something is added/removed. That is definitely the tedious part :)

I have not yet looked at the sooda documentation (it looks like a good framework btw) but does it check  that the constants actually map to currently present primary key values? If not, then I think that it doesn't fully address the need to ensure integrity between compile-time data and db data. Or maybe I am missing something.

I remember in one project, we were using an XML file to map between database objects and the DAL code and if the XML file had a typo or discrepancy it was often hard to find what was wrong. The point of that method, btw, was so that the mapping file could be re-written to deal with another database type, and the code could remain the same. We had SQL Server and MySQL at that place so it was kind of worth it.

So in that respect, the enum + unit tests way of doing things is good in that it is easy to check that code matches db and vice versa. I suppose you could just as easily parse the XML mapping file and check that the constants exist as pkeys in the db.

mgrzeg wrote re: A pragmatic approach to maintaining the integrity of reference data
on 10-05-2006 8:27 AM

btw. before switching to sooda I was a fan of enums in db and used them wherever I needed some consts from db. (added tables in TableEnums pattern, for example UsersEnums and used MyGeneration as DAL generator). Of course, I've lost some functionality after switchinig to sooda, but mostly I don't need anything more, than sooda offers.

I realy enjoy your posts, because of their originality and touched subjects. I'm truly involved, so keep bloggin'! :)

Joe Niland wrote re: A pragmatic approach to maintaining the integrity of reference data
on 10-05-2006 5:12 PM

Yes I can see where you're coming from.

Thankyou.. will do. I also enjoy your discussions.

Cheers!

Michal Grzegorzewski wrote re: A pragmatic approach to maintaining the integrity of reference data
on 10-05-2006 6:56 PM

Sorry for not introducing myself earlier, but I could leave a comment anonymously and thought that it doesn't matter at all who am I or where am I from...

Regarding sooda - open-source o/rm I have to say that I'm not connected to this project in any way (the only contribution I did is a MyGeneration template for mapping schema file, which I posted few days ago to the MyGen template lib). Few months ago I had an opportunity to listen to the presentation of sooda driven by a guy who did 99% of work at this project and after the presentation I could talk to him about this and other of his projects, which firstly didn't impress me at all. Now he works for Microsoft at EDM and ADO.NET vNext projects, but left sooda in very stable revision.

I think it's quite good project and for me it's a very good source of well-written code in different areas of software development.

Joe Niland wrote re: A pragmatic approach to maintaining the integrity of reference data
on 10-06-2006 7:31 PM

I didn't think you were connected - just that you liked the framework :) I checked it out and it does look very interesting and worth a try.. thanks for pointing me to it..

Billy McCafferty wrote Show User-Friendly enum Translations
on 08-13-2007 3:59 PM

It's common to populate a drop-down box with enum values or, similarly, to display an enum setting to

Add a Comment

(required)  
(optional)
(required)  
Remember Me?

About The CodeBetter.Com Blog Network
CodeBetter.Com FAQ

Our Mission

Advertisers should contact Brendan

Subscribe
Google Reader or Homepage

del.icio.us CodeBetter.com Latest Items
Add to My Yahoo!
Subscribe with Bloglines
Subscribe in NewsGator Online
Subscribe with myFeedster
Add to My AOL
Furl CodeBetter.com Latest Items
Subscribe in Rojo

Member Projects
DimeCasts.Net - Derik Whittaker

Friends of Devlicio.us
Red-Gate Tools For SQL and .NET

NDepend

SlickEdit
 
SmartInspect .NET Logging
NGEDIT: ViEmu and Codekana
LiteAccounting.Com
DevExpress
Fixx
NHibernate Profiler
Unfuddle
Balsamiq Mockups
Scrumy
JetBrains - ReSharper
Umbraco
NServiceBus
RavenDb
Web Sequence Diagrams
Ducksboard<-- NEW Friend!

 



Site Copyright © 2007 CodeBetter.Com
Content Copyright Individual Bloggers

 

Community Server (Commercial Edition)