Tuesday, February 07, 2012

Tools for migrating the database

In my last post controlling the database I showed that putting the database schema in source control brings an easier work-flow for developers and more reliable deployments. In this post I will look at the tools available to automate the process. They fall in to two types, schema diff tools and migration tools.

Schema diff tools

Schema diffs tools work by comparing each object in source database to those in a target database. From this comparison, they produce a script. This script will update the target to match the source and includes adding and dropping columns in tables. Two examples of schema diff tools are Red Gate SQL Compare and Microsoft’s Visual Studio Database project.

Visual Studio Database project

The Visual Studio Database project adds the database to Visual Studio. The database schema then lives in the same solution as the application code. Visual Studio has a schema diff tool which can import an existing database. When it is finished there will be a file in the project for every object in the schema. When a change is made to one of the objects the project can be deployed to a target database to bring it up to date. Visual Studio creates an in memory representation of the database so it can support refactoring of database objects. This is clever and powerful. It provides developers with a lot of support and confidence when making changes to the schema.

I have worked on a project which used the Visual Studio Database. The database was new and it worked well overall. The use of the Visual Studio Database project made sure that all developers had a current copy of the schema and deployments were simpler. There are some downsides though.

Merge conflicts

As every table, key and index is represented by an individual file, it also needs an entry in the project file. This caused a merge conflict for the project file which was either very hard or impossible to resolve. The team’s solution was to put and exclusive lock on the project file so only one person could update.

Failed migrations

The second problem is due to the nature of a schema diff tool. They compare two schemas and produce a script to bring one in line with the other. As this is machine generated it does not examine how to move those schemas in-line with one another. For example, if a field changed from NULL to NOT NULL, what happens to the existing empty fields? When the Visual Studio Database Project encounters this, it stops the migration. The onus is then on each developer to find a work around. The Visual Studio Database project has a mechanism to avoid this in a series of scripts that are always run before or after a deployment. As these scripts are run all the time they can become very large and difficult to manage.

Cost

My final comment about the Visual Studio Database project is the cost. It is only available in the Premium and Ultimate editions. At the time of writing Amazon is selling Premium for £2,112 and Ultimate for £10,792.

Migrations tools

Schema migration tools use a series of scripts to manage the changes to a database. When the migration is run against the target database it applies all the changes which have been created since the last run. These tools do not extract changes from the database. Instead the developer must write a script to migrate the schema. Two examples are RoundhousE, Ruby on Rails migrations and Tarrantino. RoundhousE RoundhousE is an open source migration tool. It does not integrate with Visual Studio or provide the refactoring abilities of the Visual Studio Database project. But then it takes a different approach which removes the need for the heavy weight tooling. How does it do this? RoundhousE is based on SQL scripts kept in a set of directories. The directories are split in to two main groups; any time scripts and one time only scripts.

Any time scripts are procedures, views, functions and indexes. If the entity is missing from the database Roundhouse will run the script to create it. If the entity has changed it will run the script to update it.

One time scripts are what make RoundhousE a migration tool. They contain the SQL to change the schema for a particular iteration. Each file is prefixed with a number and they are run sequentially.

Migrations

To demonstrate this I have a simple example application to track students. The first task is to make a screen which captures the students name. As part of this task I create a table, but instead of creating the table on my local instance of SQL Server I add this script to the RoundHousE “up” folder, which is the default location for the one time scripts

File called 0001_Make_student_table.sql

CREATE TABLE Student Id int NOT NULL, Name varchar(50)


The file name is important as RoundhousE will run the scripts in numerical order. Running RoundhousE against the local database executes the script and creates the table.

The task is compete and so the changes are committed to source control triggering a new build. The build server also starts of RoundhousE against it’s own instance of SQL Server, before running the unit tests. RoundhousE runs through the scripts in the “up” folder and applies all those since the last time it was run.

The next feature is to also capture the student’s email address. To meet the specification the student table needs a new column, so I add the following migration script:

File called 0002_Add_email_to_student_table.sql 


ALTER TABLE Student ADD email varchar(100) NULL

I check in and the same process runs so the build server has also updated the Student table on it’s local database.

When another developer pulls the latest changes form source control, all they have to do is run Roundhouse and it will run the two new scripts and their schema is also up to date.

Deploying the changes follows the same process as the build server. The call to RoundhousE is added to the deploy script which updates the target schema. RoundhousE will also read a version number from a file and write this to the database. This is very useful, as it is possible to see that UAT is running version 1.0.44.1 while production is running 1.0.33.2.

Conclusion

Adding a tool to manage database changes makes development easier and deployments safer. With out it the developer has to remember to run any schema changes on the build server before committing the change, otherwise the tests will fail. Without a tool the person responsible for the deployment has to compile a large SQL script. As the various environments are managed by hand it maybe that UAT is out of step with Production. This means that the deployment script will have to be tested on a copy of the target environment before it is run. These complications lead to day long deployments. With a tool the changes to the database are all contained in the release package ready to be run. The tool will decide what needs to be updated since the last release. When a team works with this method for a while releases become more frequent because they are less work and more reliable.

5 comments:

David Atkinson said...
This comment has been removed by the author.
David Atkinson said...

Are you aware that Red Gate now supports a migrations approach? We'd be interested to get your thoughts on this.

Keith Bloom said...

Hi David,

I've heard about the new Red Gate tools for managing the database schema but I don't have any first hand experience. Do you have any links or examples of it?

I have some posts coming up soon about the work flow I use with Roundhouse and it would be interesting to compare them.

Keith.

David Atkinson said...

I came across your post while putting this together: http://geekswithblogs.net/SQLDev/archive/2012/05/09/list-of-resources-for-database-continuous-integration.aspx

This should give you some background. I'd be happy to demo this to you via remote desktop and discuss how this compares to other approaches.

Macrosoft said...

Great tools for migration.

ASP.Net Migration