
At the end of last year, I attended a virtual conference organised by Snowflake for the Asia-Pacific region. The beauty of an online event is that you can watch sessions on demand, which is handy because many interesting sessions were streamed at the same time.

image source WikiImages

There’s no magic…

It looks like it was a busy year for Snowflake, as a lot of new features were presented in the technical sessions. They have added features for performance improvement, data accessibility and data governance, plus a range of external function capabilities.

I was very pleased to hear how many times data governance was mentioned across multiple sessions. Usually I only hear the sales talk, which claims that buying a tool such as Snowflake magically fixes all problems. The successful customers who presented at this conference told a different story: buying a modern, robust tool enabled the transformation of their whole data strategy. Data is a valuable asset when it is used for decision making, and that only happens when data in the right format is quickly accessible to the right person. But it’s also important for data to be owned and maintained.

Architectural solutions with Snowflake

The most interesting thing for me was seeing the architectural solutions with Snowflake. Cloud technologies required foundations to be re-imagined and terms to be re-purposed. Snowflake is more than just data storage, which is what on-premises databases were, but it doesn’t replace all of the toolsets required for data warehousing.

One of the major differences between cloud and on-premises data warehouses is having separate tools for data acquisition and data transformation. In the old days it was enough to have one ETL tool to do it all; I used to write tedious data extraction scripts in SSIS just to grab data from the source and land it in the raw data warehouse layer. Now we all know that copying data is a no-brainer, so many automated tools can do this job quickly, avoid copy/paste mistakes and even recognise and handle source schema changes. The replication tools market is very saturated: the tools mentioned at the conference were Qlik Replicate, Fivetran and Stitch, but it’s not difficult to name 5-10 more. Also, Snowflake’s own tool, Snowpipe, loads data from files located in the same cloud environment.
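To make the schema change handling a bit more concrete, here is a minimal sketch in Python of the kind of check a replication tool might do. It is not how Qlik Replicate, Fivetran or Stitch actually work internally; the function name, the table name and the column lists are all made up for illustration, and it only covers the simplest case of a new column appearing in the source.

```python
def plan_schema_changes(source_columns, target_columns, table):
    """Return ALTER TABLE statements for columns that exist in the
    source but are missing from the landing table (hypothetical helper)."""
    statements = []
    for name, data_type in source_columns.items():
        if name not in target_columns:
            statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {data_type}")
    return statements


# Assumed example: the source system gained a loyalty_tier column.
source = {"id": "INTEGER", "name": "VARCHAR", "loyalty_tier": "VARCHAR"}
target = {"id": "INTEGER", "name": "VARCHAR"}

plan = plan_schema_changes(source, target, "raw.customer")
for statement in plan:
    print(statement)
```

A real tool would also need to deal with dropped columns, renamed columns and changed data types, which is exactly the tedious part you don't want to hand-code.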

Once data is extracted from the source and loaded into Snowflake, it can be massaged for analytical and reporting needs: re-structured, cleansed or transformed in any other way. An expensive ETL tool is no longer required here because the data is not moved anywhere, so tooling options are abundant.
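As a rough illustration of transforming data where it already sits, the sketch below uses Python's built-in sqlite3 module as a stand-in for Snowflake, purely so that it runs anywhere. The table names and the sample rows are invented; the point is only that the cleansing step is a SQL statement executed inside the database, not a separate tool pulling data out and pushing it back.

```python
import sqlite3

# In-memory SQLite database standing in for the warehouse (an assumption
# for the sake of a runnable example).
conn = sqlite3.connect(":memory:")

# Raw layer: data landed as-is, including a duplicate and messy values.
conn.execute("CREATE TABLE raw_customer (id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO raw_customer VALUES (?, ?, ?)",
    [
        (1, "  Alice ", "ALICE@EXAMPLE.COM"),
        (2, "Bob", None),
        (2, "Bob", None),
    ],
)

# The transformation runs inside the database: trim names, lower-case
# emails and drop exact duplicates.
conn.execute(
    """
    CREATE TABLE clean_customer AS
    SELECT DISTINCT id, TRIM(name) AS name, LOWER(email) AS email
    FROM raw_customer
    """
)

rows = conn.execute(
    "SELECT id, name, email FROM clean_customer ORDER BY id"
).fetchall()
print(rows)
```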

Snowflake extends its own functionality with a range of external function options: AWS Lambda or Azure Functions can already be executed, and Java functions are coming soon. This is a great option for teams that already have programming skills under their belts.
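For a feel of what the Lambda side of an external function could look like, here is a small Python sketch. It assumes the batch format as I understand it from the Snowflake documentation: the request body is a JSON string with a "data" array of rows, each row starting with its row number, and the response returns the same row numbers paired with results. The mask_email helper is made up for this example, so check the current documentation before relying on any of it.

```python
import json


def mask_email(value):
    """Hypothetical business logic: hide most of the local part of an email."""
    if value is None or "@" not in value:
        return value
    local, domain = value.split("@", 1)
    return local[:1] + "***@" + domain


def lambda_handler(event, context):
    """Handle one batch of rows sent by the warehouse (assumed format)."""
    try:
        payload = json.loads(event["body"])
        results = []
        for row in payload["data"]:
            # row[0] is the row number, row[1] is the first function argument.
            row_number, value = row[0], row[1]
            results.append([row_number, mask_email(value)])
        return {"statusCode": 200, "body": json.dumps({"data": results})}
    except Exception as err:
        # Report a client error if the payload is not in the expected shape.
        return {"statusCode": 400, "body": json.dumps({"error": str(err)})}


# Local smoke test with a hand-built event, no AWS required.
event = {"body": json.dumps({"data": [[0, "kate@example.com"], [1, None]]})}
response = lambda_handler(event, None)
print(response["body"])
```

The nice part of this pattern is that the function can be tested locally like any other Python code before it goes anywhere near the cloud.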

Classic ETL tools, like Informatica and Matillion, are still an option.

But another strong player in this area turns out to be dbt and similar tools. They use SQL-like syntax for transformations, which is what makes them so popular: they democratise transformations. Anyone in the team with basic SQL skills can write one, which helps with including analysts’ scripts in the main data load quickly. Another incredibly important feature of dbt is that it helps build a dependency graph, which both shows the data lineage and schedules the data loads in the correct order.
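The dependency graph idea can be sketched in a few lines of Python. This is not dbt's own implementation (dbt works out the dependencies from the references between models); it is just a hand-written graph with invented model names, sorted with the standard library's graphlib, to show how one structure gives both a safe run order and the lineage of a model.

```python
from graphlib import TopologicalSorter

# Each model maps to the set of models it depends on (hypothetical names).
dependencies = {
    "stg_customers": {"raw_customers"},
    "stg_orders": {"raw_orders"},
    "customer_orders": {"stg_customers", "stg_orders"},
    "revenue_report": {"customer_orders"},
}

# Scheduling: every model appears after all of the models it depends on.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)


def upstream(model, graph):
    """Lineage: collect every model that feeds into the given one."""
    seen = set()
    stack = [model]
    while stack:
        for parent in graph.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(upstream("revenue_report", dependencies)))
```

Several run orders can be valid for the same graph, so the exact order of independent models (for example the two raw tables) should not be relied upon.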

After all of this effort put into data preparation in the data warehouse, data should start making sense. Data Cloud Summit put a spotlight on analytics and data science. All the major analytics platforms have a Snowflake connector, and scalable compute resources make it widely accessible.

Conclusion

With Cloud being a buzzword for the last decade, I think it is important to keep an eye on all the emerging solutions and technologies, even if they are not immediately applicable to your situation. Seeing other companies’ success stories in adopting both new technologies and better data governance strategies can help spark fresh ideas in any business.

Kate Loguteva
Data masseuse

Kate writes technical blogs about data warehouses, and is a Data Vault convert who works mostly with MS SQL Server.

You can connect with Kate on LinkedIn, or read her other blogs here.


Data Cloud Summit 2020
