OptimalBI | We do cool sh!t with data

When talking about deploying an application to the cloud (especially SaaS or similar hardware-independent applications), Docker always seems to come up in conversation. With that, and the new(ish) EC2 Container Service (ECS) offering from Amazon, we decided that now was a good time to do a little exploration into the Docker space.

What is Docker/ECS?

“Docker is an open-source project that automates the deployment of applications inside software containers, by providing an additional layer of abstraction and automation of operating-system-level virtualization on Linux.”
– https://en.wikipedia.org/wiki/Docker_(software)

Docker helps us draw a line in the sand between being a developer and being an infrastructure manager. With Docker we can define a task and create an environment for that task to run in. This means that the developer can define and install dependencies and specify what resources the task needs, and the server administrators or an automated management tool can drop and run it on any machine running the Docker tool-set. It also means that the development and production environments for the task are identical, as the environment becomes part of the deployment.
ECS (EC2 Container Service) is the AWS (Amazon Web Services) way of helping us deploy and manage our Docker tasks. Simply put, it lets us group EC2 (Elastic Compute Cloud) instances into a cluster and automatically assigns each Docker task to an instance that it can fit on. It also lets us group tasks into services and helps us with scheduling tasks. As with most AWS services, ECS gives us the added benefit of being able to collect metrics on the clusters that we are using to run Docker tasks, and to extend the built-in functionality using the AWS CLI (Command Line Interface) or AWS SDK (Software Development Kit).
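To make "assigns tasks to an EC2 that they can fit on" a little more concrete, here is a minimal first-fit sketch in Python. It is only an illustration of the idea, not how the ECS scheduler is actually implemented; the Instance class, the place_task function and all the resource numbers are made up for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Instance:
    """A hypothetical EC2 instance in the cluster and its unreserved resources."""
    name: str
    free_cpu: int      # CPU units not yet reserved by a task
    free_memory: int   # MiB not yet reserved by a task
    tasks: List[str] = field(default_factory=list)


def place_task(instances: List[Instance], task_name: str,
               cpu: int, memory: int) -> Optional[str]:
    """Reserve room for the task on the first instance that can fit it."""
    for instance in instances:
        if instance.free_cpu >= cpu and instance.free_memory >= memory:
            instance.free_cpu -= cpu
            instance.free_memory -= memory
            instance.tasks.append(task_name)
            return instance.name
    return None  # nothing fits; the cluster would need more instances


cluster = [
    Instance("instance-a", free_cpu=512, free_memory=1024),
    Instance("instance-b", free_cpu=2048, free_memory=4096),
]
placements = [
    place_task(cluster, "web", cpu=256, memory=512),
    place_task(cluster, "worker", cpu=1024, memory=2048),
    place_task(cluster, "batch", cpu=4096, memory=8192),
]
print(placements)
```

The last task does not fit anywhere, so it comes back as None. In a real cluster that is the point where auto-scaling would need to add capacity.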

Pros/Cons of Docker on AWS?

Pros

Separated management of dependencies and server hardware
Development environment is identical (internally) to production environment
Dependency management means that not everyone needs intimate knowledge about every part of your technology stack
Easy custom task and service scheduling with the AWS SDK or a third-party tool
Make good use of available resources, with ECS assigning tasks to EC2 instances that have enough free resources
Use auto-scaling when tasks need more resources

Cons

Build produces a large file that needs to be uploaded
Docker NAT can increase network latency (use docker run --net=host; for more Docker performance info see here)
Some developers have a bias when the word docker is mentioned
Some applications need to be fixed to work on Docker
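The NAT point in the list above comes down to one choice when starting a container: publish ports through Docker's NAT, or share the host's network stack. The sketch below builds the two command lines side by side. The docker_run_command helper and the image name example/web:latest are hypothetical; --detach, --publish and --net=host are standard docker run flags.

```python
import shlex
from typing import Dict, List, Optional


def docker_run_command(image: str, host_network: bool = False,
                       ports: Optional[Dict[int, int]] = None) -> List[str]:
    """Build a `docker run` argument list for either networking mode."""
    args = ["docker", "run", "--detach"]
    if host_network:
        # Share the host's network stack: no NAT hop, so no port mappings.
        args.append("--net=host")
    else:
        # Default bridge networking: each published port goes through NAT.
        for host_port, container_port in (ports or {}).items():
            args += ["--publish", f"{host_port}:{container_port}"]
    args.append(image)
    return args


nat_command = docker_run_command("example/web:latest", ports={80: 8080})
host_command = docker_run_command("example/web:latest", host_network=True)
print(shlex.join(nat_command))
print(shlex.join(host_command))
```

With host networking the container binds directly to the host's ports, so two containers on the same machine cannot both listen on the same port. That is the trade-off for skipping NAT.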

An Aside: Docker Myths and Misconceptions

There are a bunch of myths floating around about Docker and I just wanted to take a second to address a few of them. If you already know a little about Docker, skip ahead.

Myth: Docker adds another OS’s worth of overhead.
Simple Answer: No!
Long Answer: This is a common one from people who have had Docker explained to them, but have not done any of their own research. Docker runs as a Linux container (in particular runC), which means that most of the parent OS can be reused, and that the Docker task is registered as a process so that the OS can manage some of its resources.

Myth: Docker is much slower than native.
Simple Answer: Only for NAT.
Long Answer: IBM has done a study on the relative performance of native vs KVM and Docker, which showed that the only place where Docker is slower in a real-world example is when you use the Docker NAT to route your incoming ports to a container. If you are using the host's networking then you will not see an increase in latency. There is a slight decrease in storage performance if you are using a storage device local to the container, but this can be avoided by using storage volumes. To read more on that IBM study see here.

Myth: If I Dockerise everything, my cloud/application stack will run much easier/faster.
Simple Answer: No.
Long Answer: Docker moves your server administration around, but does not remove it. For small, simple applications (static webpages, small databases) Dockerising might be a waste of time, and should be assessed on a case-by-case basis. While it makes the final install step faster or easier, it does not decrease the configuration complexity (in some cases configuration is 90% of the work). And while running in Docker is faster than running in KVM, it is not faster than just running on a clean server.

What are the other choices in AWS?

AWS does have a number of ways to deploy and run code. First up is the most traditional way of running an application on the web: we just boot up EC2 instances and run code on them. Easy to do, but it does not scale or maintain itself in any way. Most applications running in the AWS Cloud do a bit better than this by using AMIs and auto-scaling, where we set up a machine image and EC2 Auto Scaling groups make sure that there are enough of those machines and that they are healthy.
Another choice (and a favorite of mine) is to forgo the need for self-managed servers entirely and use AWS Lambda, which allows us to upload and run code in response to events. This works particularly well with S3 buckets or API Gateway, as these both provide well-defined events for us to do some work on. The difference here is that Lambda is not designed to run long-running code or code that needs a large amount of local resources.
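As a rough idea of what "run code in response to events" looks like, here is a small sketch of a Python Lambda handler for S3 events. The Records / s3 / bucket / object layout follows the S3 event notification format; the bucket name, object key and the choice to simply return the object locations are made up for illustration.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Return the s3:// location of every object named in an S3 event."""
    locations = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        locations.append(f"s3://{bucket}/{key}")
    return {"processed": locations}


# A trimmed-down, made-up event for trying the handler locally.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "reports/2016+q1.csv"}}}
    ]
}
print(handler(sample_event, None))
```

A real handler would do its work (resize an image, load a file into a database) inside the loop. The point is that there is no server to look after, only a function and the event it receives.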

What is ECS deployment like?

As mentioned before, there are two different types of ‘things’ to be set up by (probably) different people:

EC2 Cluster

To run Docker tasks we need a bunch of computing power to run them on. The best way of doing this is to have an Auto Scaling group using the default ECS AMI. This makes sure that if a container instance falls over it is brought back up, and gives us the capability to scale up when we have high CPU or memory usage. For the easiest way to actually do that see here.
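The "scale up when usage is high" rule can be pictured as a simple threshold check. The sketch below is only a toy model of that decision; in practice the Auto Scaling group and its alarms do this for us. The function name, the 75% and 25% thresholds and the instance limits are all assumptions chosen for illustration.

```python
def desired_instance_count(current: int, cpu_percent: float,
                           memory_percent: float, *,
                           high: float = 75.0, low: float = 25.0,
                           minimum: int = 1, maximum: int = 10) -> int:
    """Add an instance when the cluster is busy, remove one when it is idle."""
    # Scale on whichever resource is under the most pressure.
    busiest = max(cpu_percent, memory_percent)
    if busiest > high:
        target = current + 1
    elif busiest < low:
        target = current - 1
    else:
        target = current
    # Never leave the bounds configured for the group.
    return max(minimum, min(maximum, target))


print(desired_instance_count(2, cpu_percent=90, memory_percent=40))
print(desired_instance_count(2, cpu_percent=10, memory_percent=20))
```

Using the busier of the two metrics means a memory-hungry task can trigger a scale-out even when CPU usage looks fine.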

Docker Task

The journey from code to Docker task starts on the developer’s local machine (or something similar). The first step is to Dockerise the code and upload the image to an available Docker repository. I chose AWS ECR, but there are many good choices at varying price points. For a good starting point for Docker read this.
Once we have code uploaded to a repository we need to define a task. Once defined, this task will run when prompted, or we can set up a service to keep a number of copies of it running all the time.
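A task definition is essentially a JSON document that names the image and the resources it needs. The sketch below builds a minimal one in Python. The field names (family, containerDefinitions, image, cpu, memory, essential, portMappings) are ECS task definition fields, but the helper function, the account ID, region, repository name and resource numbers are all placeholders.

```python
import json


def build_task_definition(family: str, image: str,
                          cpu: int, memory: int) -> dict:
    """Return a minimal single-container ECS task definition."""
    return {
        "family": family,
        "containerDefinitions": [
            {
                "name": family,
                "image": image,        # where the pushed image lives
                "cpu": cpu,            # CPU units reserved for the container
                "memory": memory,      # hard memory limit in MiB
                "essential": True,     # the task stops if this container stops
                "portMappings": [{"containerPort": 8080, "hostPort": 80}],
            }
        ],
    }


task_definition = build_task_definition(
    family="example-web",
    # Placeholder ECR image URI; substitute your own account, region and repo.
    image="123456789012.dkr.ecr.us-east-1.amazonaws.com/example-web:latest",
    cpu=256,
    memory=512,
)
print(json.dumps(task_definition, indent=2))
```

Saved to a file, the printed JSON could be registered with something like aws ecs register-task-definition --cli-input-json file://task.json, after which the task can be run on demand or wrapped in a service.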
One thing to note is that when we Dockerise something the resulting image can be quite large, as it generally includes some sort of Linux distribution and a bunch of applications and dependencies. This does not particularly increase build time, as Docker is smart about caching and reusing other images, but if the developer is stuck behind a slow connection (cough cough New Zealand) uploading the resulting image can mean a rather long coffee break.
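For a feel of how long that coffee break might be, here is a back-of-the-envelope calculation. The 600 MB image size and the link speeds are made-up figures, and the estimate ignores protocol overhead and the fact that a push only uploads layers the repository does not already have.

```python
def upload_minutes(image_size_mb: float, upload_mbps: float) -> float:
    """Minutes needed to push an image of the given size (megabytes)
    over a link with the given upload speed (megabits per second)."""
    if upload_mbps <= 0:
        raise ValueError("upload speed must be positive")
    megabits = image_size_mb * 8  # 8 bits per byte
    return megabits / upload_mbps / 60


for speed in (1, 10, 100):
    print(f"{speed:>3} Mbit/s: {upload_minutes(600, speed):.1f} minutes")
```

On these assumptions a first push takes well over an hour on a 1 Mbit/s uplink, but under a minute from a machine with a fast connection.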

What tools can help with deployment?

All three of the options we are looking at (plain EC2, Lambda and ECS) have third-party tools to hasten, simplify or harden the deployment process.
The most easily recognized one is Jenkins. With plugins, Jenkins can deploy post-build to Lambda and EC2 instances, and can build and push a Docker image to most Docker repositories. Once a new Docker image is pushed, tasks started afterwards will pick it up. This also means that in AWS land Jenkins can sit close to the ECR target for the Docker image, and we would lose the annoying issue of slow upload speeds. There are many, many tools out there, so it’s just a matter of finding which one works for you.

So to Docker or not to Docker?

Yes, you should and should not Docker. The answer is not as simple as yes or no. The best thing is to decide on a per-component basis. For things like Ruby, Python or Node.js, where there are a bunch of dependencies that need to be installed locally, it makes sense to Dockerise, but if we are building C++ number crunchers with no external dependencies then Docker might just be adding to the mess for no gain. For us there are cases for both, and going forward Docker will be in our thinking if not in our apps. With .NET Core coming to Linux, C# and Docker could be fun! More on that if/when it happens.
Any questions about Docker or Docker on AWS? Chuck them in the comments and I will do my best to answer them.
Coffee to Code – Tim Gray

Tim works on a lot of things here at Optimal and writes blogs on a wide range of topics.

Connect with Tim on LinkedIn or read his other blogs here.

To Docker or Not To Docker on AWS