When deploying metoo.io with Kubernetes, I wanted to keep the time between launching a new deployment and that deployment going live as short as possible. In the beginning, metoo was containerized the typical, uncomplicated way: on top of an Ubuntu Docker image, like most other GeoDjango builds. The image worked fine, but it was ~600MB. Why is this size an issue?
Smaller containers simply download and deploy faster than larger ones. Since we already use Google Compute Engine for our remote servers, it was an easy decision to store our Docker images in GCR instead of Docker Hub. This was because GCR allows for free storage and transfer to nodes within Google’s data centers. Additionally, pulling images from GCR onto GKE servers hosted inside those same data centers inherently makes container downloads much faster.
While Ubuntu is not necessarily the most bloated operating system you could run inside a Docker container, there is definitely room for improvement. On further investigation, Alpine Linux turned out to be the most minimal operating system that was still (almost) fully compatible with GeoDjango. Compared with the Ubuntu image’s ~600MB, the Alpine image came in at just ~200MB. Deployment times dropped from just over 1 minute with the Ubuntu image to about 10 seconds with Alpine.
There are a few considerations to weigh in deciding whether or not to switch to Alpine.
- Be prepared to have a test suite to run after deploying.
- Understand that since you are likely not developing on the same Alpine Linux environment that you’re deploying on, there may be dependency issues that do not arise until you run the container and check literally everything that can be checked in your deployment. This is both a good thing and a bad thing: it will keep you honest about testing all of your code, but it will also really slap you in the face if you forget to test something that breaks. For example, we hit a very outlandish error with a specific GeoDjango admin module: viewing an admin page that rendered a point on a map would crash the server on our Alpine build, which never happened on my Arch development environment.
- If you are hosting an open source application that you want to be easily accessible to others, recognize that Alpine is much harder for the lay programmer to work with.
- Dependency issues arise frequently and out of nowhere.
- Anticipate, as the worst case, running into a build issue at least once every 6 months as the cost of the smaller image size.
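
One way to act on the "test suite after deploying" point is a small smoke check that requests a handful of pages from the freshly deployed site and reports any that are unreachable or answer with a server error. The sketch below is only an illustration using the Python standard library, not the checks we actually ran; the function names, the base URL, and the paths are hypothetical. Pages behind a login (such as the admin) would need an authenticated session, which this sketch does not handle.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_status(url, timeout=10):
    """Return the HTTP status code for `url`, or None if no response arrived."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # The server answered, but with a 4xx/5xx status.
        return error.code
    except OSError:
        # Connection refused, reset, timed out, DNS failure, and so on.
        return None


def failing_paths(base_url, paths, fetch=fetch_status):
    """Return (path, status) pairs that were unreachable or answered with 5xx."""
    failures = []
    for path in paths:
        status = fetch(base_url.rstrip("/") + path)
        if status is None or status >= 500:
            failures.append((path, status))
    return failures


if __name__ == "__main__":
    # Hypothetical URL and paths; replace them with your own deployment's.
    problems = failing_paths("http://localhost:8000", ["/", "/admin/"])
    if problems:
        raise SystemExit(f"Smoke check failed: {problems}")
    print("Smoke check passed")
```

Treating a dropped connection the same as a 5xx response matters here, because a crash inside a native library can take the worker process down before any HTTP response is written.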
Alpine with GeoDjango
No released version of Alpine came with either the GEOS or GDAL libraries available for installation, and to this day only the experimental edge version of Alpine has the packages available. To solve this, I built the python:alpine Docker image from scratch within my Dockerfile, then added the required build dependencies manually in a separate
Feel free to use the code for your GeoDjango Deployments and spread the love! :D