OptimalBI | We do cool sh!t with data

Capturing and storing data of one thing changing over time is called time series data. When we capture time series data we are more interested in trends and change than the current state of a ‘thing’. Storing and querying this type of data can present interesting issues. Enter InfluxDB.

“InfluxDB is a time series database built from the ground up to handle high write and query loads… InfluxDB is meant to be used as a backing store for any use case involving large amounts of timestamped data, including DevOps monitoring, application metrics, IoT sensor data, and real-time analytics.” – InfluxDB Version 1.3 Documentation

InfluxDB is purpose built to handle storing and querying time series data, and having a purpose built tool can make all the difference in writing clear and precise apps that deal with time series based data. InfluxDB makes operations like grouping results in blocks of 15 minutes, or getting all the rows with a timestamp in the last 15 minutes and aggregating them at the 1 minute level very easy.

SQL-Like Syntax

InfluxDB uses a SQL based query language to make querying data simple and familiar. For example if we wanted to select a data point(s) from a particular app that occurred in the last 15 minutes the query would look like:

SELECT max("trackedValue") as "max_trackedValue" FROM "database"
WHERE time > now() - 15m AND "app"="ourApp" GROUP BY "APP"

Nice, simple, easy to follow SQL!

HTTP(S) Query API

The general issue with SQL syntax based databases that are not one of your standard SQL databases is there is a lack of drivers to connect from various languages. This is not an issue with InfluxDB as they have chosen to expose the database query interface via a REST-like HTTP api! This means that any language that supports HTTP queries can run queries easily against InfluxDB.

curl -G 'http://localhost:8086/query?pretty=true' --data-urlencode "db=mydb"
--data-urlencode "q=SELECT "value" FROM "cpu_load_short" WHERE "region"='us-west'"

Data Retention and Downsampling

Timeseries data can get large quickly. When you are recording something every second, a year’s worth of data can be quite hard to store effectively. InfluxDB can delete and manage old data for you so that data that is no longer relevant is not using up large amounts of disk space! The simplest steps it can do is just to delete data that is older than a certain time (30 days for example), which can easily be done with a command like the following:

CREATE RETENTION POLICY "two_hours" ON "food_data"
DURATION 30d REPLICATION 1

With InfluxDB’s continuous query features we can also aggregate down the data to a better grain to keep the data for future querying but not store so much as to cost us all our sweet earned cash. An example of using continuous queries to do this is below.

CREATE CONTINUOUS QUERY "cq_30m" ON "food_data" BEGIN
SELECT mean("website") AS "mean_website",mean("phone") AS "mean_phone"
INTO "a_year"."downsampled_orders"
FROM "orders"
GROUP BY time(30m)

Support for Other Technologies

While I have yet to have a chance to experiment with it, one of the things that made choosing InfluxDB as one of the technologies we use is that it can communicate in many different languages. While you have the HTTP api for standard use InfluxDB also supports Graphite, collectd, and OpenTSDB which means that other tools that need to connect to and query data in InfluxDB can do easily and in their own prefered communication method.
Influxdb is Open Source with a reasonably active community, you can find their GitHub here.
Hopefully that give you a good idea on what InfluxDB can do! Until next time,
Tim Gray
Coffee to Code

Tim works on a lot of things here at optimal and writes blogs on a wide range of topics.

Connect with Tim on LinkedIn or read his other blogs here.

Storing Time Series with InfluxDB

SQL-Like Syntax

HTTP(S) Query API

Data Retention and Downsampling

Support for Other Technologies