Tuesday, January 03, 2012

Continuous Integration for the Database

Managing changes to the database is often an overlooked problem but it is one of the biggest issues a team can face. The traditional model of development whereby a DBA is the key holder of the schema is a thing of the past for many teams. Now each developer is responsible for making schema changes. Many of them will be working on local copies of the database, which is a good thing. However, it can fall apart when the change leaves the developers and moves towards production.

In this post I will explore why this causes a problem and suggest a better solution.

The change script

Most changes to a database schema are handled using a simple script. The typical arrangement is a script which includes all the changes for a release. The developer will then apply this to each database involved in the deployment process. First to the build server, then to test and finally to production. When the release is finished the script is stored on a common share or maybe in source control.

This method does not work well. When Continuous Integration is used correctly, every commit to the build server represents a version of the system which could be released. The check-in should include all changes that are required to deploy that version to any environment. This means including changes to the code base and to the database. If the change relies on a script being run independently, then it it is not atomic. If there is one thing that the DBMS has taught us, it is that all changes must be atomic.

Add to source control

The simple solution to this problem is to add the database schema to source control. This brings schema changes in-line with changes to the code base. For simple changes this may just add a column to the table. A more complex piece of work may involve a script which also migrates existing data.

Deploy with build

When using Continuous Integration, every check-in triggers a new build. The build server checks out the latest version, compiles it, runs the unit tests and reports back if the build passed or failed. In the shared scripts model the developer has to apply the changes to the build database before committing the changes. If not, then the build will fail. When the schema changes are part of the commit, applying them is managed by the build server.

So now the change is atomic. If the build succeeds it has verified that the code, schema changes, and the migration are error free. This build can now be deployed to the next step in the deployment pipeline. If the build fails the developer has some bugs to fix.

Deploy locally

One of the biggest problems with the shared script method for managing schema changes is keeping all developers local databases up to date. Often this is achieved by a change script being emailed around the team. The onus is then on each developer to run all the scripts to keep their database up to date. If they don’t, then the next version of the build may not run on their machine.

When all schema migrations are part of the build process there is no onus. When a developer pulls the latest version from source control they just run an update process to apply all the schema changes. In fact it could be the same process which is used to deploy the code base to the different environments. This way every developer is also exposed to the process which runs the deployment.

Making the process work

Like any new process it has to help the people using it. In this case the people who are responsible for maintaining the database schema. In some organisations this may be two separate groups; developers and database administrators (DBAs).

Developers may not currently have to worry about applying their changes to production or UAT. They could be sending a script off to a DBA who then does the work for them. Asking them to be responsible for managing their changes to the schema is asking them to do more work.

DBAs are used to controlling the database schema and being the guardians of change. By automating the changes they loose some control and have to adapt to a new way of working. I would argue that the role of the DBA has changed though. The view that the application code base and the database schema are two separate entities is flawed and outdated. A change to the code is a change to the code, regardless of whether it is written in C# or SQL. Therefore the DBA must make their changes using the same mechanism as the developers. They have to check-in changes in to the source repository and monitor the state of the build.

For both groups it enforces more collaboration. Every check-in communicates to the whole team what someone is working on and how they have approached it. For people new to Continuous Integration this can be daunting as the notion of a failed build can be unsettling. So care must be taken to communicate the benefits. When I work with teams I point out that a failed build is not a bad thing. The team has discovered that this version will not work in production and now they can fix it.

Once the team has engaged with the process one way to loose this positive effort is to have a build process which causes more problems than it solves. If the new way of working introduces more friction for the developers on a daily basis then it will be abandoned. If everyone has to spend an hour every day updating their local database then they will skip it. As it has to work well and be simple to use, the choice of tool to manage the schema changes is important. The tools fall in to two categories, schema diff tools and migration tools. Schema diff tools compare a source with a target schema. They find the differences and create a script which applies the changes from the source to the target. Migration tools run a series of scripts held on the file system which encapsulate a single change. Once all the scripts have been applied to the target it is in-line with the current version in source control.

In my next blog post I will explore the merits of both types of tool and describe my experiences with them.

Conclusion

Traditionally source control is the home for application code with the database being maintained separately. Historically this could even have been by a different team. The growth in popularity of Continuous Integration and testing has shown how flawed this approach is. It has become clear that change to the database must be synchronised with changes to the code base. To achieve this the schema must be managed in source control.

This is the goal but it is not easy to achieve. Special tools must be used and the team must embrace a different way of working. The pay off though is less regression errors, quicker development and more confidence that changes will work when deployed to production.