illustration by mohamed_hassanImagin
Imagine for a minute, a conversation between a database administrator and a data warehouse developer.
“I could never understand what you folks actually do..?”
– “Well, we create a single database called a data warehouse where the data from multiple other databases is copied to. Once data is in one place and linked, it works as a whole thing. Then we create colorful graphs and dashboards to monitor and analyse the whole business.”
“Cool! Can you build this dashboardy thing to monitor and analyse server stats for me?”
– “Haha! Do you think it’s easy as a couple of clicks? Usually it costs a fortune to build one and takes ages.”
Sorry, if you didn’t expect a joke here.
Meanwhile, Microsoft thought that it’s actually not a bad idea and built a SQL Server Management Data Warehouse, a BI-ish thing for admins. It makes sense, for such a repetitive task as server monitoring you need to build a solution just once. I have checked this feature, mostly attracted with the “data warehouse” part of the name. You can find one in SQL Server Management studio Object explorer in the Management folder. It is called Data Collection there.
It takes literally a couple of clicks to start collecting server data into the MDW. It could collect data from multiple servers, so you need just one database to store it all. Once you’ve clicked on the Data Collection task “Configure Management Data Warehouse”, SQL Server will create one with pre-defined tables structure. It also has pre-defined security roles, you’ll have to decide who can access and administer this database.
The second part is data collection itself, or Extract-Transform-Load (ETL) as we say in Business Intelligence. There are four pre-defined types of data that could be collected from server, disk usage, query statistics, server activity and utility information. The latter is disabled by default and, as I understand, requires SQL Server Utility to be up and running, the mysterious multi-server orchestration thing which is beyond the scope of my research. You have some power to manage how often data will be collected and sent to MDW for each type. Data retention period is set here as well. Data collection gets data from system tables and views, some details on this could be found under “Input parameters”. Documentation says you can also create your own collection if you find something else that you want to trace is missing from existing ones.
The last part of this data warehouse is reporting, of course. Microsoft has supplied us with a set of built-in reports. You can find them if you click on your MDW and choose a report called “Management Data Warehouse Overview”. Yes, they all are stuffed into one thing as subreports, so you have to dig into it to get the data you’re interested in.
What about BI?
MDW structure is neither a star schema nor data vault, which are standard data modelling techniques for data warehouses; but it makes sense due to the data specifics. Table structure is more or less understandable, but don’t expect too much, it’s only one line description per table in the documentation. It is reasonable in a way, as one should be familiar with the subject area to be able to navigate through it. That’s not me, so I have struggled with making sense out of it.
I wanted to copy one of the built-in reports, which are only available via SQL Server Management Studio, in my reporting tool. Disk usage report looked easy enough to play with it.
Built-in MDW Disk usage report
It took me a while to understand the units of measure of the database size values in MDW table. Most of the fields in different MDW tables have KB or MB suffixes, but it appears that this one is measured in pages which is 8 Kilobytes. I found it in the original data source specification, sys.database_files view. This view also gives an information about log file size, also in pages. However, more details on log usage is captured from another source, and it is measured in percents there. While it is still reported in Megabytes. I find it inconsistent and misleading. The purpose of data warehouse is to make data usable. If I was given this task, I would either store the data in consistent units of measure across the data warehouse or have a view with all possible units of measure for each value.
The information I couldn’t find and add to my visualisation is index size. There is no such field! Moreover, looking for the information about indexes, I found a query to get the current index size on the database. I have run this; it turns out that indexes in ODE_Stage database hardly use a few kilobytes, most of the tables have no indexes. So, the number in the original report, over 1 MB, is definitely not what I thought it is.
My Disk space usage report
Anyway, I have managed to create a report after a while. On the screenshot you can see that MDW is the biggest database I have on the server. To be fair, my server is not used for the real data load tasks, databases are small compared to the real life ones. However, it is something to be aware of: MDW grows quickly. The start size was 100 Megabytes. I only used it for a couple of weeks to collect the stats about my toy server with the default frequency, and it’s over 2 Gigabytes now. I was told that it should stop growing drastically at some stage. Server activity and query performance stats are only retained for 14 days and disk space data shouldn’t take much space as it’s a few numbers only.
As a data warehouse specialist I would give this tool 6 out of 10 for the Data Warehouse implementation. It looks like no actual BI experts were involved in the design process, it definitely could be done better and more attractive to SQL Server users. That’s how this tool has ended up on the “SQL Server features nobody uses” shelf. Which is a pity as overall it’s a cool tool any SQL Server administrator would like to have.