photo by Atlantios
Data Warehouse solutions often include several copies of the same structure, I mean Development, Production and often Test and Pre-Production environments. When the Data Warehousing team works on the new piece of requirements, they don’t want to put fully functioning production server at risk; so, development happens on a dedicated server with the copy of production environment, where developers are free to experiment, create new structures, load data or modify existing data to test something. That’s how a good and well-tested code is born. After this devastating development took place, it is often required to align the development database with the one on production server to make sure that there will be no surprises when the next portion of development starts.
Quite often DW teams use a backup restoration method to align the environments; they take the most recent production database backup and restore it on the development or test environment. However, this is not the only and not even the most effective solution for the job in Microsoft SQL Server. It is often forgotten that the Data Warehouse is a special database; the data is loaded into it in scheduled batches, unlike the OLTP systems where updates happen all the time. Let’s explore what are the other options you have in copying DW databases between SQL Server environments.
Copying Structure Only
A Data Warehouse is populated by ETL process, most likely it would be a scheduled job to pull the data from source databases, transform it and write into the Data Warehouse. The latter is designed to store all the history of the record states. But if it is fine to lose the history on the development server and start from the current state of the data, copying the structure only could be a lightweight solution.
Generate Scripts – SQL Server Management Studio can generate creation script for any object. Right click on the database in the Object Explorer. Choose Tasks, then Generate Scripts. There are plenty of options here. You can choose the whole database to be scripted or just particular tables, also what to do with triggers, indexes, foreign keys and other database bits and pieces. Script could be saved on disk or opened in new query window. Creation statements for each of the selected objects are put into one lengthy script. With this option you could only drop the previous version of the database and create it from scratch with the generated script. To be fair, data could be scripted as a part of this as well. However, it means it would be stored in a plain text in insert statements.
Data-tier application – it has very similar functionality as the one above. SQL Server prepares a nice package of database objects definition; it has a DACPAC extension and it’s actually a bunch of compressed XML files. Right click on the database, choose Tasks and then Extract Data-tier Application. Don’t confuse with the Export Data-tier one, we’ll talk about it below. To apply the structure on another environment, right click on Databases folder in SSMS Object Explorer, choose Deploy data-tier Application (don’t confuse with Import Data-tier Application option), navigate to the file on disk and choose the name for a database.
Visual Studio Schema compare – you can find this in Tools menu of the Visual Studio, choose SQL Server and then New Schema Comparison. Choose the source and the target databases and click Compare button. This option is smarter than the ones above because it won’t rebuild those objects which are the same. Also it has a nice preview of the changes before they are applied. There are plenty of options here to choose a comparison parameters and a target database behaviour on update. At the end Visual Studio also generates a script. You can let it to be executed automatically or show it in query window.
After the structural changes are applied, you need to run ETL to populate all the tables in the first two cases and update the data in the last one. This could be a lengthy process in case if your current dataset is quite big. You can transfer all the data from Prod database separately or altogether with the structure.
Visual Studio Data compare – this tool compares data between two databases with the same tables structure. Well, it will ignore differences and update only those fields which are the same, so it makes sense first to use the schema compare tool described above. In Visual Studio, from the Tools menu choose SQL Server and New Data Comparison. Choose Source and Target databases. You can choose which tables to compare and it shows the comparison key for the records.
Data-tier application – I know I have mentioned it above as the way to upgrade structure only. I don’t know why Microsoft has named two different things the same, but here we are. Right click on the database, choose Tasks and then Export Data-tier application (this time not Extract!). After you have located a folder on disk and checked tables to be exported (or selected all by default), SQL Server creates a BACPAC file. This file includes both structure and a snapshot of the data. Data is stored in BCP insert commands. BACPAC file is smaller than a backup file and the restoration from the data-tier application process is simpler (for both you and SQL Server) than a backup restore. To restore the database on another server, right-click on Databases in SSMS Object Explorer and select Import Data-Tier Application. This will create a new database, i.e. there is no option for the old database replacement. The only issue with this method is reference validation, if your database has views or procedures that refer another databases on the server, the BACPAC file would not be created, external refrences are not supported by this method.
I will just mention a backup restore here. To copy the database you need a Full database backup file. One backup file could contain multiple images or subsequent differential backups, which is more than you have asked for in our case. The mechanism of database backups was developed to keep all the data, it is especially powerful when it comes to capturing running transactions, which is not required for our case.
Database detach and attach – works magically if you can afford your production database going offline for some time. You need to locate the database files on disk, .mdf and .ldf. Right click on the database in SSMS Object Explorer, choose Detach. After that files on disk are available for copying. To attach the database from file, right click on Databases and Attach, then navigate to the files. Don’t forget to attach production database back too.
There is also a Wizard to make it easier. Right click on the database you need to copy in Object Explorer, choose Tasks and Copy Database. Here you need to choose a source and destination server, SQL Server will detach, copy and attach databases, also this could be delayed to run later (it generates an SSIS package on the last step). There is another option Wizard provides for copying a database without detaching it, a SQL Management Object option. I don’t have any details on what exactly happens here and I couldn’t make it run. Most likely, this is related to the security permissions given to SQL Agent account on my server, but it could also be something else; I can see many different suggestions on fixing the issue in the internet. However, I tend to avoid tools that don’t provide good troubleshooting options: I couldn’t even find informative error message to give me a clue what might be wrong with it.
The last option I would mention is buying some tool to do the job. If you copy databases often, or your databases are too big for a manual operation, a professional tool could help you to save a lot of effort. Some of them use methods described above under the hood, but other have implemented something smart. I won’t name any here, their advertising should be good enough to find them when you need one.
If you take a snapshot of the OLTP database, most likely you’ll lose some data as some transactions could be active and not recorded in the database file yet. However, with the Data Warehouse database after the processing is finished, data stays in unchanged state until the next ETL load. This means that a data snapshot, which is the way most of the described methods use, is good enough for copying the data warehouse database.