It’s easy to decide that you’re going to store changes to your database in source control, but it can be difficult to figure out what and where it should be stored. In this article I’ll explain what I like to see put into source control, and how I layout my source control folders. There are 3 types of files that I like to keep in source control, and these correspond to the 3 higher-level directories in my file structure. These are scripts, source, and DDL.
The scripts folder contains scripts that run outside of the database, but whose presence is critically important to the database. These are Perl or Shell, or (insert scripting language du jour here) scripts. Good examples might be custom backup scripts, configuration files for programs like mylvmbackup, a log scraper written in Perl, or a script that kicks off a mysqldump with special parameters that you use often but don’t want to type over and over again.
The Source folder contains scripts for what I like to call “Destructible” objects. These are scripts for routines, triggers, views, events, etc. Basically these are items that can be replaced, and in fact I do replace them each time I push a new release (more on that in the future). These objects can be replaced because although they manipulate data – they are not directly responsible for it’s storage. In order for something to get into the source folder, I give it a basic test – can I safely destroy this object and recreate it in the same script without losing any of my data. My rule is that each object gets it’s own script, and the script is named schema.object_name.sql. I like to have subfolders in the source directory for each type – so I have a routines folder, a views folder, an events folder, and a trigger folder.
The DDL folder contains your SQL for creating and manipulating tables. I like to call these “Non-Destructible” objects because you cannot safely destroy them. I realize the correct term should be “Indestructible”, but for some reason whenever I tell an Engineer that a table is indestructible, they want to test the theory by attempting to drop it. How you address your DDL folder depends on how you plan to release your database. There are two ways I can think of:
- You release your database as part of an application install. In this case the database goes empty or has a minimal amount of data that gets inserted when it’s created. The important factor here is that, no matter how many times you release the database, or what version it’s on, the database is always created anew.
- Your database is part of a living application. The application is online and it’s constantly changing. Your database must change too, but you have to be very careful because you can’t afford to accidentally lose data, and you can’t afford to take the database down for long periods of time to make changes.
If you have the first type of database, you DDL folder can be pretty straightforward. You can simply put your table create statements directly in the folder and be done. Although I know the tendency is to have a single script to create the application’s database, I would advice against storing it in source control that way. It makes it much harder to track changes, look at diffs, and can cause a lot of merges if you have mutiple engineers working on the database. If there are many files, then there is a good chance the engineers will be touching different files because they are working on different features. It is still feasible to ship a single SQL statement – just put a separate script in your Scripts folder that concatenates all the files together. If you are shipping this type of database, you might want to consider also having a DATA folder in your tree.
So your source folder structure might look like this:
DATABASE +-- DDL | +-- DATA | +-- SCRIPTS | +-- SOURCE +-- VIEWS +-- ROUTINES +-- TRIGGERS +-- EVENTS
If you have the second type of database then your DDL folder becomes a lot more complicated. You can’t simply drop your database and create it from scratch – you need to preserve your existing data. In fact, you’ll probably be looking for creative ways to modify your database without incurring downtime. In this case, I advocate creating an individual folder under the DDL folder for each new release. So underneath your DDL folder you might have folders named “1.0″,”1.1″,”1.2″ and “1.3″. The 1.0 folder might include your initial database creation script (if you were smart enough to start source control before creating your prototype). Subsequent folders will contain only push and rollback scripts for specifc chanages to your database – scripts that alter tables, add new tables, drop old tables, etc. I’ll talk more about these push and rollback scripts in a future post.
So for the second type of database, you’re folder structure might look something like this:
DATABASE +-- DDL | +-- 1.0 | +-- 1.1 | +-- 1.2 | +-- 1.3 | +-- SCRIPTS | +-- SOURCE +-- VIEWS +-- ROUTINES +-- TRIGGERS +-- EVENTS
There could actually be a hybrid system where you have the first type of database, but you also want to store the DDL statements for upgrades. You could devise a script so that when a user types in their current version, it grabs the latest snapshot from the source code repository and then creates the right patch based on all the new versions.
I’m sure there are other ways of setting up your database in source control – if you have a different or better way, please share it with us