Installing Apache Airflow on Ubuntu >= 18.04 using Docker

Apache Airflow is more “pythonic” than Jenkins, and for data-science tasks it feels a little easier and more effective to me, especially because I run a lot of ZSH or Bash commands and Python tasks to pre-evaluate data.

Jenkins is based on Java / JEE, which is great for enterprise middleware and large-scale web apps. It’s not great for system orchestration, tool orchestration, or simple tasks in general. And it’s the little things I want to automate: in Data Science / Machine Learning and log analysis as well as in InfoSec (reporting, stats, dashboards, situational awareness tools …). Therefore: Airflow.

The easiest way to run Airflow is using Docker. But it may not be that easy…

Upgrading docker-compose, and just that

As with Debian, Ubuntu’s apt packages are sometimes “a little” outdated. docker-compose isn’t essential to the system, so we can install it manually without messing up the Docker installation on Linux.

We have to remove the package if it’s installed, but that’s easy.
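
Removing the packaged version might look like the sketch below. It assumes the package is called docker-compose, as in Ubuntu's repositories, and the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: remove the apt-installed docker-compose, but only if it is there.
# remove_apt_compose is an illustrative name; the package name is assumed.

remove_apt_compose() {
    # dpkg -s exits with 0 when the package is known to be installed
    if dpkg -s docker-compose >/dev/null 2>&1; then
        sudo apt-get remove -y docker-compose
    else
        echo "docker-compose apt package not installed, nothing to remove"
    fi
}

# Usage:
#   remove_apt_compose
```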

# Look up the tag of the latest docker-compose release on GitHub
VERSION=$(curl --silent https://api.github.com/repos/docker/compose/releases/latest | grep -Po '"tag_name": "\K.*\d')
DESTINATION=/usr/local/bin/docker-compose
# Download the binary matching this OS and CPU architecture, then make it executable
sudo curl -L "https://github.com/docker/compose/releases/download/${VERSION}/docker-compose-$(uname -s)-$(uname -m)" -o "$DESTINATION"
sudo chmod 755 "$DESTINATION"
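
To see what the grep pattern and the URL assembly do, here is a small sketch that runs the same logic against a canned line instead of the live GitHub API. The function names extract_tag and compose_url and the sample input are made up for illustration; only the pattern and the URL layout come from the commands above. It needs GNU grep for the -P option.

```shell
#!/usr/bin/env bash
# Sketch: tag extraction and URL assembly from the install commands,
# split into two functions so they can be tried without network access.
# extract_tag and compose_url are illustrative names, not part of any tool.

# Read GitHub release JSON on stdin and print the value of "tag_name".
# \K drops the matched prefix; .*\d keeps everything up to the last digit
# on the line, which cuts off the closing quote and comma.
extract_tag() {
    grep -Po '"tag_name": "\K.*\d'
}

# Print the download URL for the given version tag on this machine.
compose_url() {
    printf 'https://github.com/docker/compose/releases/download/%s/docker-compose-%s-%s\n' \
        "$1" "$(uname -s)" "$(uname -m)"
}

# Canned sample line (made up to mirror the "tag_name" field of the API).
sample='  "tag_name": "1.29.2",'
printf '%s\n' "$sample" | extract_tag
```

If the download ends in a 404, compare the file name that compose_url prints with the assets listed on the release page, since asset naming can differ between releases.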

Check…

Before:

~/Source/airflow$ docker-compose -v
docker-compose version 1.22.0, build f46880fe

After:

~/Source/airflow$ docker-compose -v
docker-compose version 1.29.2, build 5becea4c

Now install Airflow via docker-compose
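
The init run assumes that a docker-compose.yaml for Airflow already sits in the working directory. A minimal sketch of that preparation, assuming the layout from the Docker quick start in the Airflow docs for 2.1.2 (the version printed in the log): the download URL, the three folder names and the AIRFLOW_UID / AIRFLOW_GID entries are taken from that guide, not from this post, and the function names are made up.

```shell
#!/usr/bin/env bash
# Sketch: prepare a working directory for the Airflow docker-compose setup.
# Function names are illustrative; the URL, folder names and .env keys are
# assumed from the Airflow 2.1.2 Docker quick start.

# Download the compose file into the given directory (needs network access).
fetch_compose_file() {
    curl -Lf -o "$1/docker-compose.yaml" \
        'https://airflow.apache.org/docs/apache-airflow/2.1.2/docker-compose.yaml'
}

# Create the folders the compose file mounts and record the host user id,
# so files written by the containers are owned by the current user.
prepare_airflow_dir() {
    mkdir -p "$1/dags" "$1/logs" "$1/plugins"
    printf 'AIRFLOW_UID=%s\nAIRFLOW_GID=0\n' "$(id -u)" > "$1/.env"
}

# Usage, from the directory where docker-compose will run:
#   fetch_compose_file . && prepare_airflow_dir .
```

With the file and folders in place, the airflow-init service runs the database migrations and creates the first user.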

Voila:

~/Source/airflow$ docker-compose up airflow-init
Creating network "airflow_default" with the default driver
...
airflow-init_1       | Upgrades done
airflow-init_1       | [2021-08-18 13:42:10,305] {manager.py:784} WARNING - No user yet created, use flask fab command to do it.
airflow-init_1       | Admin user airflow created
airflow-init_1       | 2.1.2
airflow_airflow-init_1 exited with code 0

Problem solved. :slight_smile:

Getting started

Start the rest of the stack with docker-compose up. After a couple of minutes, the services are up and you can open the web UI on port 8080.
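
How long "a couple of minutes" takes depends on the machine, so a small polling loop can tell you when the web server answers. This is only a sketch: wait_for_url is a made-up helper, and the /health path and localhost address in the usage comment are assumptions about the stock Airflow compose setup.

```shell
#!/usr/bin/env bash
# Sketch: poll a URL until it answers or the attempts are used up.
# wait_for_url is an illustrative helper, not part of Airflow or Docker.

# wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 as soon as curl gets a successful HTTP response, 1 otherwise.
wait_for_url() {
    local url="$1"
    local attempts="${2:-30}"
    local delay="${3:-5}"
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage (assumes the stock compose file and port 8080):
#   docker-compose up -d
#   wait_for_url http://localhost:8080/health && echo "web UI is up"
```

The init log shows an admin user named airflow. As far as I know, the stock compose file sets the password to airflow as well, so change it before exposing the UI to anyone else.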

From there onwards you can open a couple of examples and intuitively adapt them. You don’t need to consider typical Jenkins CI / CD integration work steps. No Groovy, no JVM. It’s more direct.

Just some Bash, Python, and an inline code editor. And always remember: test within test environments, not in live envs. CI / CD is wonderful for making things right, at scale. But in the wrong hands you can mess up a lot of stuff very quickly, and Airflow is quicker than most other automation tools I have used so far :boom:

Have fun!
