OptimalBI | We do cool sh!t with data

In the previous blog Why you should look into Attunity Compose I discussed the possible potential in the Attunity Compose product. Since then I have had an opportunity to install the Compose product and run series of tests to see how well it performs in a series of data warehousing scenarios.
In the testing exercise the following components have been involved:

Compose 3.1 version running in a AWS EC2 instance using Windows Server 2012 64-bit;
MS SQL Server 2016 RDBMS to manage a data source and a data warehouse;
AdventureWorks2013 as a source database.

Getting ready: Installation
As expected with AWS standing up a Quad core, 8GB Memory, 100GB SSD Windows Server 2012 EC2 was quick. For a temporary test bed I did not need to worry about ports or SSL certificates.
I also had to install:

Microsoft Visual Studio C++ 2010 X64 Redistributable
Chrome Browser to access Compose (for some reason the default version of IE on the server did not work correctly).
And to ensure that .NET Framework version 4.5.,2 or higher was available on the machine.

The setup process was easy. In my case it was a quick run through wizard screens and accepting the default options.
However if you do intend to use SSL some preparation is required to be done before you are able to start working with the product. For a complete install you may need to set up a few additional components: define host name, change port, determine number of database connections.
Model and data source
As I mentioned above, the AdventureWorks database has been chosen as a test dataset for this exercise. Due to the data model complexity and the large number of objects in it I decided to focus on Sales-Order-Product-Customer entities.
For testing a multiple source integration I have created an additional database with a single table Product_01 identical to Product table in the source AdventureWorks database.
I have also created a target data warehouse database and separate schemas for staging and datamart tables.
Data warehousing made simple
From the web console just one click and voila! A data warehouse project has been created.
Getting access to MS SQL database through Compose was the most time consuming part of the exercise. Unfortunately documentation is very brief and relies on the person who is setting up Compose data sources to clearly understand database administration tasks. By default windows authentication is switched off and I used MS SQL Server authentication for simplicity.
Luckily, I did not have to configure Attunity Replicate tasks, my data source assumed the role of the landing area.

Pic. 1. Managing source database access

The rest of the process using the Compose interface was more intuitive.

Pic. 2. Main panel

The next step is an automatic source discovery and model build. It requires clicking through multiple landing areas tabs and selecting the desired tables. The model appeared on the screen with a nice visual layout and relationship links between entities.
Note, that in case of using Attunity Replicate to copy your data to landing area you will need to find a work around to preserve foreign keys.

Pic. 3. Sales model derived from AdventureWorks database

Compose auto-generates hubs and satellites with attributes based on the model created in the previous step. By default, almost all columns are identified as type 1 attributes which makes them part of the hub business key. I had to manually change types of the majority of attributes to type 2 in order for them to be placed into satellites. After this I tweaked Product entity by adding a source field to the hub, then linked it with SalesOrderDetail entity and proceeded to physical creation of the objects in database.

Pic. 4. Mapping example

My test cases in regards to ETL tasks were a From Date field in the Product satellite being updated with current timestamp and a SOURCE column updated with the appropriate values depending on which source table data come from.

Pic. 5. Updating source column mapping using Expressions pane

The next logical steps are a datamart build and running loading tasks. None of these tasks caused much difficulty and I could follow the general advice given in the user guide.

Pic. 6. Transactional star schema Orders

To prove the product’s ability to sustain the changes, I ran a few test cases: excluded attributes from entities and included the new ones, done model validation, generated database scripts to adjust a physical model to logical, compiled project documentation. Compose handles these exercises well.

Pic. 7. Project documentation includes all necessary descriptions, model representation, source settings, etc

This completes the initial setup. The GUI work is finished. The most interesting part is about to start.
A more in-depth look under the hood
I was the most curious about how Compose would deal with:

handling complex source structures and business rules
business rules versioning
data vault compliance and relationship management
troubleshooting.

As it was mentioned in the previous blog the data vault modelling pattern has not been fully adopted in the product. It does not support links and excludes source traceability fields from hubs and satellites.
If there is a need to track a record source it can only be done by a manual tweaking entity’s attributes.
In addition record datetime traceability fields may not follow the data vault modelling standard and relate to load timestamps. However the user’s guide states these fields should be automatically updated with the current load date. In my case the fields were populated with the default historic date and attempts to correct it failed.
I have found that a source column can only be mapped to one target column.This means the option of including the same attribute in the different satellites related to one hub won’t be easy manageable. An idea to add all duplicate attributes to one entity does not look very attractive.
One of the most realistic scenarios is an intention to keep data from different sources related to one hub separately by creating several satellites with an identical set of attributes.
Other cases somewhat similar to the described above are keeping a retired source data separately from a new data source, separation between “raw” and “business” satellites etc.
Relationships between entities are managed with foreign keys instead of link tables. The core idea of data vault approach is to keep business keys, descriptive information and relationships separately to make the model easily manageable.

Pic. 9. Relationships management

Troubleshooting options include:

cleaning up stage tables;
switching off dropping of temporary tables;
defining task’s breakpoints.

I found this was not always enough to identify the issue. Unfortunately errors I’ve encountered were not obvious to understand and the codes not listed explicitly in the documentation.
Compose can be integrated with GIT and this is the only way to track business rule versions. ETL and database adjustment scripts are stored in a dedicated directory and can be accessed, reviewed or moved as required.

Pic. 8. ETL instructions

Conclusion
There are pluses and minuses in the design and implementation of the tool.
Attunity made an attempt to develop a business model centric tool that makes your life as a business intelligence developer easier. By giving you a relatively flexible set of wizard-driven screens you are able to create data marts without deep knowledge of database technologies.
In my opinion, Attunity Compose might be a choice of data warehouse automation for small to mid-size businesses with a not very complex business models, with clean and simple data sources. However it does not suite a proper Inmon methodology data warehouse project implementations and will most likely over-complicate development and support.
Anastasia

Anastasia blogs about Data Vault and Business Intelligence for both technical and non-technical people.

Connect with Anastasia on LinkedIn or read her other blogs here.

Attunity Compose: DIY session