Automation of Tedious Data Vault tasks

By
Anastasia Gorbatenko
August 3, 2017

One of the things I admire the Data Vault modelling approach for is the modular structure being represented by sets of hubs, links and satellites. From a developer’s perspective it provides enough flexibility and power to automate various tasks easily, such as controlling load counts, reporting result of a daily sanity check or data profiling.
Optimal Data Engine (ODE), as a data engine, apart from facilitating configuration and management of Data Vault, can help you achieve this goal gracefully. Simply, ODE’s config database stores metadata about DV objects in a highly reusable manner.
In this blog I will outline possible ways of utilizing some of ODE’s features to automate repetitive work.
Imagine, you reached the point to test a Data Vault and do not have at your disposal any quality assurance tools, just is the RDBMS itself to check each Data Vault object. Most likely, you’ll have more than 10 objects in the Data Vault, then scripting and executing queries will be a painful and time consuming exercise, as you need to deal with a lot of tables and columns.
Well, this is where ODE can help.
Developing a detailed test plan, test cases descriptions and perhaps a model query each is all you will need to do to achieve your goal.
For instance, in the code snippet below, a cursor is used to generate and execute dynamic SQL for all non-retired (actual) hubs in the Data Vault and record the desired test outcomes into the table created beforehand.
[sql]USE ODE_Config
SET NOCOUNT ON;
DECLARE @sql varchar(3000);
DECLARE @d_hub_prefix varchar(128) , @d_hub_date varchar(128);
SELECT @d_hub_prefix    = CAST([dbo].[fn_get_default_value] (‘Prefix’,’Hub’) AS varchar(128));
SELECT @d_hub_date    = [column_name]        FROM [dbo].[dv_default_column]    WHERE object_type = ‘Hub’    AND object_column_type = ‘Load_Date_Time’;
— generate queries
DECLARE cur_emp CURSOR STATIC
FOR
SELECT ‘select ”’+hub_name+”’ as hub_name
, min(‘ + @d_hub_date + ‘), max(‘ + @d_hub_date + ‘), count(*)  from ‘ + hub_database + ‘.’ + hub_schema + ‘.’ + @d_hub_prefix+hub_name
FROM dv_hub h JOIN dv_hub_key_column hk ON hk.hub_key = h.hub_key
WHERE h.is_retired = 0 AND hub_database = ‘ODE_Vault’
OPEN cur_emp;
IF @@CURSOR_ROWS > 0
BEGIN
FETCH NEXT FROM cur_emp INTO @sql;
WHILE @@Fetch_status = 0
BEGIN
— record test results
INSERT INTO Test.dbo.hub_tests
EXEC (@sql);
FETCH NEXT FROM cur_emp INTO @sql;
END;
END;
CLOSE cur_emp;
DEALLOCATE cur_emp;
SET NOCOUNT OFF;
[/sql]Here is another example of how you can profile your data, in this case, identify which of the tracking satellite columns have NULL values.
[sql]USE ODE_Config
DECLARE @d_sat_prefix   varchar(128)
SELECT @d_sat_prefix    = CAST([dbo].[fn_get_default_value] (‘Prefix’,’Sat’) AS varchar(128))
SELECT ‘select ”’+s.satellite_name+”’ as sat_name, ”’+sc.column_name+”’ as column_name,count(*) as row_count
from ‘ + s.satellite_database + ‘.’ + s.satellite_schema + ‘.’ +@d_sat_prefix+ satellite_name+’
where ‘+sc.column_name+’ is NULL and dv_is_tombstone=0
group by ‘+ sc.column_name
FROM dv_satellite s
JOIN dv_satellite_column sc ON sc.satellite_key = s.satellite_key
JOIN dv_column c ON c.satellite_col_key = sc.satellite_col_key
WHERE s.link_hub_satellite_flag = ‘H’
AND s.satellite_database = ‘ODE_Vault’
AND s.is_retired = 0
AND c.is_retired = 0
— moreover, you can add any filter on source table/source system parameters/satellite here
;
[/sql]If you want to refactor the code for a particular source system and want to be sure that you are working with the latest version of code, or no changes were made meantime – just combine RDBMS metadata and ODE!
[sql]USE ODE_Stage;
SELECT name AS object_name,SCHEMA_NAME(schema_id) AS schema_name,type_desc,create_date,modify_date
FROM sys.objects s join
— fetch the list of stored procedures in ode_config
(
SELECT distinct sv.source_procedure_name
FROM ode_config.dbo.dv_hub h join ode_config.dbo.dv_hub_key_column hkc ON hkc.hub_key=h.hub_key
JOIN ode_config.dbo.dv_hub_column hc ON hc.hub_key_column_key=hkc.hub_key_column_key
JOIN ode_config.dbo.dv_column c ON c.column_key=hc.column_key
JOIN ode_config.dbo.dv_source_table st ON st.source_table_key=c.table_key
JOIN ode_config.dbo.dv_source_version sv ON st.source_table_key=sv.source_table_key
JOIN ode_config.dbo.dv_source_system s ON s.source_system_key=st.system_key
WHERE
— filter on some DV parameters
h.hub_database=’ODE_Vault’
AND h.is_retired=0
AND c.is_retired=0
AND st.is_retired=0
AND s.source_system_name=’Raw_Vault’
AND source_procedure_name is not null) t ON t.source_procedure_name=s.name
WHERE modify_date > GETDATE() – 30 — stored procedures changed recently
and type_desc=’SQL_STORED_PROCEDURE’
order by modify_date desc;
[/sql]To understand how else ODE can help automate in other scenarios visit ode.ninja.
If you want to go deeper and learn about the Metrics Vault . Being a Data Vault it collects various statistics about your Data Vault data warehouse performance.
Metadata is powerful!
Anastasia

Anastasia blogs about Data Vault and Business Intelligence for both technical and non-technical people.

Connect with Anastasia on LinkedIn or read her other blogs here.

Copyright © 2019 OptimalBI LTD.