First Steps with PostgreSQL

Kate Loguteva
October 12, 2020

photo by sasint

Most of my career I have spent in big organisations developing and maintaining enterprise-wide solutions, so until now I had mostly worked with MS SQL Server and only touched OracleDB a bit. But for a new, smaller project, I've used PostgreSQL for the first time in my life. It wasn't hard to learn, as the query language is standardised, but I needed to use it for development as well. I started learning the platform and its features, and I was surprised by how advanced it is.

The documentation for PostgreSQL is an example of how documentation should be written. It is very informative, has examples, and you can even feel some emotion between the lines.

I like how PostgreSQL works with JSON and XML. Our project deals a lot with JSON data, and PostgreSQL gives me the freedom to query and process data in this format.
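As a minimal sketch of what this looks like (the table and column names here are hypothetical, not from our project), PostgreSQL's jsonb type comes with operators for extracting fields and testing containment:

```sql
-- Hypothetical table with a jsonb column
CREATE TABLE events (
    id      serial PRIMARY KEY,
    payload jsonb
);

INSERT INTO events (payload)
VALUES ('{"user": {"name": "Kate", "city": "Wellington"}, "type": "login"}');

-- -> navigates into the document, ->> extracts a field as text,
-- @> tests whether the document contains the given fragment
SELECT payload -> 'user' ->> 'name' AS user_name
FROM events
WHERE payload @> '{"type": "login"}';
```

Functions such as jsonb_array_elements and jsonb_each go further and let you unnest JSON documents into relational rows.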

What Common Table Expressions (CTEs) allow you to do in PostgreSQL is amazing. While in SQL Server a CTE is a powerful method for querying data, in PostgreSQL a CTE can also include INSERT, UPDATE and DELETE statements, so it can be used for chained operations within one statement. This can be achieved in SQL Server as well, but using a CTE for it feels a bit more elegant. The recursive CTE is totally mind-blowing! It's not the easiest concept to wrap your head around, but it's very powerful and achieves a lot with minimal code.
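Both ideas can be sketched in a few lines (the table names below are made up for illustration). The first statement uses a data-modifying CTE to archive old rows in one go; the second walks a hierarchy with a recursive CTE:

```sql
-- Data-modifying CTE: delete expired rows and insert them
-- into an archive table within a single statement
WITH moved AS (
    DELETE FROM orders
    WHERE created_at < now() - interval '1 year'
    RETURNING *
)
INSERT INTO orders_archive
SELECT * FROM moved;

-- Recursive CTE: walk an employee hierarchy from the top down
WITH RECURSIVE org AS (
    SELECT id, name, manager_id, 1 AS depth
    FROM employees
    WHERE manager_id IS NULL          -- anchor: the top of the tree
    UNION ALL
    SELECT e.id, e.name, e.manager_id, org.depth + 1
    FROM employees e
    JOIN org ON e.manager_id = org.id -- recursive step: direct reports
)
SELECT * FROM org;
```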

Table inheritance is a concept borrowed from object-oriented programming. As I understand it, this was the solution used for table partitioning in earlier versions of PostgreSQL, and it has been superseded by a more suitable mechanism in later versions. Although I can't imagine a scenario where I would use it, I find this feature very cool. The new table partitioning still looks a bit raw compared to the similar feature in SQL Server.
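For comparison, here is a sketch of both mechanisms side by side (table names are invented for the example); the first uses the old INHERITS syntax, the second the declarative partitioning added in PostgreSQL 10:

```sql
-- Table inheritance: the child gets all of the parent's columns
CREATE TABLE measurements (
    logdate date NOT NULL,
    value   numeric
);
CREATE TABLE measurements_2020 () INHERITS (measurements);

-- Declarative partitioning (PostgreSQL 10+): the planner routes
-- rows and prunes partitions automatically
CREATE TABLE readings (
    logdate date NOT NULL,
    value   numeric
) PARTITION BY RANGE (logdate);

CREATE TABLE readings_2020 PARTITION OF readings
    FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
```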

Another amazing feature that vaguely reminds me of OOP is row and record variables. For example, you can use them in functions and procedures to slice a table without referring to its exact structure. This makes the code a little more resilient when the table structure changes, and it can be used for generic functions.
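A minimal PL/pgSQL sketch of the idea, assuming a hypothetical employees table; the record variable takes on the shape of whatever row is assigned to it, so no column list is hard-coded:

```sql
CREATE FUNCTION log_rows() RETURNS void AS $$
DECLARE
    r record;  -- untyped; adopts the structure of each row it receives
BEGIN
    FOR r IN SELECT * FROM employees LOOP
        RAISE NOTICE 'Employee: %', r;  -- prints the whole row
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```

A related option is declaring a variable as employees%ROWTYPE, which pins it to that table's current structure without spelling the columns out.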

In SQL Server data warehousing we use the MERGE statement a lot for data loads. It doesn't exist in PostgreSQL, but the INSERT statement accepts an ON CONFLICT clause to handle conflicts, which can be used instead.
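The ON CONFLICT clause covers the common "upsert" half of what MERGE does. A sketch with an invented dimension table (it assumes a unique constraint on customer_id, since ON CONFLICT needs one to arbitrate on):

```sql
-- Insert a row, or update it when the key already exists;
-- EXCLUDED refers to the row that was proposed for insertion
INSERT INTO dim_customer (customer_id, name, updated_at)
VALUES (42, 'Kate', now())
ON CONFLICT (customer_id)
DO UPDATE SET name       = EXCLUDED.name,
              updated_at = EXCLUDED.updated_at;
```

ON CONFLICT DO NOTHING is also available when duplicates should simply be skipped.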

Some features are similar to what you can find in SQL Server, but they work slightly differently, which sometimes caught me by surprise. For example, a very recent feature, computed columns, is only calculated on insert or update, while SQL Server calculates them every time the data is queried. Thus, I couldn't have a person's age as a column, but had to use a view instead. Another surprise came with the NOW() and CURRENT_DATE built-in functions. I used them for auditing a procedure's steps, but the result showed that all steps happened exactly simultaneously, which couldn't be true. It turns out these functions return the transaction start timestamp, which is a documented PostgreSQL feature; for the actual step timestamp, CLOCK_TIMESTAMP() should be used instead.
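Both surprises are easy to demonstrate (the person table below is hypothetical). PostgreSQL's generated columns must be STORED, so the expression has to be immutable, which rules out anything age-like; and inside a transaction now() stays frozen while clock_timestamp() keeps moving:

```sql
-- Generated column (PostgreSQL 12+): computed once, on write
CREATE TABLE person (
    id         serial PRIMARY KEY,
    birth_date date,
    birth_year int GENERATED ALWAYS AS
               (extract(year FROM birth_date)::int) STORED
);

-- now() is pinned to the transaction start; clock_timestamp() is not
BEGIN;
SELECT now(), clock_timestamp();
SELECT pg_sleep(1);
SELECT now(), clock_timestamp();  -- now() is unchanged here
COMMIT;
```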

In general, PostgreSQL is flexible and fast at acquiring cool new features for data manipulation; it is not as fast to scale to enterprise data administration needs, but it's catching up. I have already mentioned that table partitioning and computed columns have been added in recent major releases. The security model works very similarly to SQL Server's; even row-level security is available. Unfortunately, there's no official scheduler, although pgAgent looks promising.
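Row-level security, for instance, takes only a couple of statements to switch on (the table and policy names here are illustrative):

```sql
-- Once enabled, rows are invisible unless a policy allows them
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

-- Each user sees only the rows they own
CREATE POLICY own_orders ON orders
    USING (owner = current_user);
```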

Many of the mentioned blemishes even out when PostgreSQL is used in a cloud environment, as we did on our project. AWS Aurora makes replication a matter of a few clicks, instances can easily scale up, and there are plenty of choices for external job scheduling. In this new arena, PostgreSQL takes advantage of being a powerful relational data platform with modern, advanced querying features, while not having a licence cost attached to the bill.

Data masseuse
Kate writes technical blogs about data warehouses, and is a Data Vault convert who works mostly with MS SQL Server.

You can connect with Kate on LinkedIn, or read her other blogs here.

Copyright © 2019 OptimalBI LTD.