Multitenancy
isn't easy, and one of the particularly challenging aspects is what to
do with the database. In Ayende's excellent posts on multi tenancy,
he's pretty clear that he feels a database per tenant is the way to go . There are two major, reasonable approaches that I know of: First, to have a database per tenant. (This is the approach discussed
by Ayende in his multi tenancy series) A second common alternative is to have
a tenant id column in all tables that require differentiation by
client. My team was in a situation requiring multitenancy, and needed
to select an approach. Wary of the idea of multiple databases, the
team immediately decided to go with the tenant column approach.
Looking
back, I think Ayende was right. The one advantage of the tenantid
column system is not having to deal with multiple databases. There are
many disadvantages. I'll discuss the advantages, and then give some thoughts on how to deal wwith the complexity of working with multiple databases.
Advantages of multiple databases:
In
my mind the most important developer-centric advantage of having
multiple databases is the ability to refactor the database. Many
times, developers will design a system a system with certain
assumptions, then realize those assumptions are wrong. Other times, a
major new feature would be much easier and cleaner with certain system
changes. If the database is only talking to one system, then when the
developer comes upon these situations, s/he can also refactor the
database to reflect that understanding, and keep the mapping between
database simple, clean and easy to understand. When the code is pushed
live, db update scripts also get pushed live. However, if multiple
systems (in this case tenants) talk to the same database, this kind of
db structure change just can't happen-pushing a db change would alter
everyone's db, meaning everyone's code would also have to be updated at
once. In many situations, that kind of simultaneous mass upgrade would
be totally unacceptable, meaning that once this kind of multitenancy
system is put in place, the single db can never again be altered.
Another
advantage of multiple databases is the ability to easily load-balance
by moving individual tenants to different db servers as necessary. If
the databases are architected to be isolated, this isn't a
problem-change a few connection strings, and you're done. In a shared
system, all tenants suffer if one tenant is overloading the server.
On
a related note, though nobody likes to think about it, clients
sometimes leave, and when they do, they will want to be able to take
their own data with them. If they have their own database, it's as
easy as zipping it up and getting it to them. If all tenants are on
the same db, it could be a much more difficult exercise involving
picking through every table and selecting out the tenant-specific
data. Hopefully you dont give them anything that isn't theirs. Which
cleads to the next point.
From a business perspective, data
separation is probably the most important reason to go with separate
dbs. With a multitenant system, it's impossible for one tenant to end
up with anyone else's data. Other tenants' data is in a completely
different system, so there's no worry about messing up the tenantid and
passing a client someone else's data. Depending on the situation, this
could just be embarrassing; in other situations, you might be looking
at a lawsuit.
Lastly, having separated databases makes many
things in the application more complicated, for the life of the
application. Every join, every query, every new table, every mapping
has to deal with the tenant id. If the strategy is to use the tenant
id to make a composite key on the tables, then you've got that much
additional complexity-and if you're using a tool like NHibernate,
you'll soon find out that working with composite keys is a LOT less
seamless than a single autogenerated key. Eliminate the tenantid, and
a whole lot of things become forever simpler, clearer and have a lot
less potential for error.
How to deal with multiple databases?
The
trouble with multiple databases is that now you have 20 or even 100
databases to manage instead of one. Managing this manually would be a
difficult, error-prone task that could become overwhelming. In
addition, any shared data can become out of sync. The solution? Don't
do *anything* manually with these databases.
For versioning:
each time you create a new version of your application, create a db
script to update the db from the prior version from the last version to
the newest one, and create a script to roll the database back from the
newest version to the previous version. There are tools to help
produce this kind of script... If you're programming in .Net, Visual
Studio Database edition has a tool to help do this for SQL Server, and
Red Gate has one as well called SQL Compare. Bundle this script in
with the deploy of the code, and push it all out together, to each
tenant as they get upgraded.
For shared data: 99% of the
time, shared data is of the sort where it needs to be updated soon, but
not necessarily real-time. An example of this is cross-client data for
statistical computation. To allow this kind of computation in a
multiclient situation, create a process that goes to each system and
collects the appropriate data, possibly in varying ways based on the
tenant's installed version, then pulls that data into a single system.
The combined reporting system can be optimized and tweaked for
reporting needs without affecting the designs of the individual
tenants. Conversely, if there's any reference data they all need to
share (dropdown data, etc) this information can be coordinated with a
similar process that does a data push to the different systems, based
on application version.
This kind of automation is some up-front
work, but once it's done, it's done, and it allows tremendous
application flexibility... and once it's set up, it's easy-there's no
reason it couldn't become fairly invisible from a management
perspective.. In addition, it's really a very separate task from the
business application itself-it's actually so separate that the syncing
application can be written by a completely different team that knows
little to nothing about the main application. For that matter, I'm not
sure if there's a tool out there that does all of these things
already(I bet there is, come to think of it), but there's no reason
such a tool couldn't exist.
Posted
04-24-2009 12:49 AM
by
Anne Epstein