Workflow Orchestration Guide

    Deploy Apache Airflow

    Deploy production-ready workflow orchestration with Apache Airflow, CeleryExecutor, PostgreSQL, Redis, and Nginx on RamNode's VPS hosting.

    Airflow 2.10
    CeleryExecutor
    PostgreSQL + Redis
    Let's Encrypt SSL

    Step 1: Prerequisites

    Airflow workloads vary based on the number of DAGs, task concurrency, and worker load. Choose your RamNode plan accordingly.

    Use Case                 vCPUs   RAM       Storage
    Development / Testing    2       4 GB      40 GB NVMe
    Small Production         4       8 GB      80 GB NVMe
    Medium Production        6       16 GB     160 GB NVMe
    Heavy Workloads          8+      32 GB+    320 GB+ NVMe

    Software Requirements

    • Ubuntu 24.04 LTS (fresh installation recommended)
    • Python 3.10 or later (ships with Ubuntu 24.04)
    • A registered domain name with DNS A record pointing to your VPS IP

    Step 2: Initial Server Setup

    Update system and install essentials
    ssh your-user@your-vps-ip
    sudo apt update && sudo apt upgrade -y
    sudo apt install -y build-essential python3-dev python3-pip \
      python3-venv libpq-dev libffi-dev git curl wget unzip

    Configure Firewall

    Enable UFW
    sudo ufw allow OpenSSH
    sudo ufw allow 80/tcp
    sudo ufw allow 443/tcp
    sudo ufw enable
    sudo ufw status

    Create Airflow System User

    Dedicated user for security isolation
    sudo useradd -m -s /bin/bash airflow
    sudo mkdir -p /opt/airflow
    sudo chown airflow:airflow /opt/airflow

    Step 3: Install and Configure PostgreSQL

    Airflow requires a metadata database to track DAG runs, task instances, and execution state. PostgreSQL is the recommended production backend.

    Install PostgreSQL
    sudo apt install -y postgresql postgresql-contrib
    sudo systemctl enable postgresql
    sudo systemctl start postgresql
    Create database and user
    sudo -u postgres psql <<EOF
    CREATE USER airflow_user WITH PASSWORD 'your_secure_password';
    CREATE DATABASE airflow_db OWNER airflow_user;
    GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow_user;
    EOF

    🔐 Security Note

    Replace 'your_secure_password' with a strong, randomly generated password. Use openssl rand -base64 32 to generate one. Never commit credentials to version control.
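    If openssl isn't handy, Python's standard-library secrets module generates an equivalent password; a minimal sketch:

```python
import secrets

# 32 random bytes, URL-safe base64 encoded -- comparable to `openssl rand -base64 32`
password = secrets.token_urlsafe(32)
print(password)
```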


    Step 4: Install and Configure Redis

    Redis serves as the message broker for CeleryExecutor, queuing tasks for distributed execution across workers.

    Install and verify Redis
    sudo apt install -y redis-server
    sudo systemctl enable redis-server
    sudo systemctl start redis-server
    
    # Verify Redis is running
    redis-cli ping
    # Expected output: PONG

    Secure Redis

    Edit /etc/redis/redis.conf
    bind 127.0.0.1 ::1
    requirepass your_redis_password
    maxmemory 256mb
    maxmemory-policy allkeys-lru
    Restart Redis
    sudo systemctl restart redis-server
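    The Redis password set above will later be embedded in Airflow's Celery broker URL. A quick stdlib sketch of how credentials sit inside that URL (the password value is a placeholder, and database 0 is the Redis default):

```python
from urllib.parse import urlparse

# Shape of the broker URL Airflow's CeleryExecutor will use
broker_url = "redis://:your_redis_password@localhost:6379/0"

parts = urlparse(broker_url)
# No username before the colon; the password follows it, then host:port/db
print(parts.hostname, parts.port, parts.password)
```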

    Step 5: Install Apache Airflow

    Set Up Python Virtual Environment

    Create venv and install Airflow
    sudo -u airflow bash
    cd /opt/airflow
    python3 -m venv venv
    source venv/bin/activate
    
    export AIRFLOW_VERSION=2.10.4
    export PYTHON_VERSION=$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
    export CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
    
    pip install --upgrade pip setuptools wheel
    pip install "apache-airflow[celery,postgres,redis]==${AIRFLOW_VERSION}" \
      --constraint "${CONSTRAINT_URL}"
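    The constraint URL above is assembled from two variables; the same construction in Python, for clarity on what gets pinned:

```python
import sys

# Mirror the shell variables: the constraint file is pinned to both the
# Airflow version and the interpreter's major.minor version.
AIRFLOW_VERSION = "2.10.4"
PYTHON_VERSION = f"{sys.version_info.major}.{sys.version_info.minor}"
CONSTRAINT_URL = (
    "https://raw.githubusercontent.com/apache/airflow/"
    f"constraints-{AIRFLOW_VERSION}/constraints-{PYTHON_VERSION}.txt"
)
print(CONSTRAINT_URL)
```

    Installing with this constraint file keeps every transitive dependency at a combination the Airflow project has tested.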

    Step 6: Configure Airflow

    Set AIRFLOW_HOME and initialize
    export AIRFLOW_HOME=/opt/airflow
    echo 'export AIRFLOW_HOME=/opt/airflow' >> ~/.bashrc
    
    # Initialize the database schema and generate airflow.cfg
    airflow db migrate

    Edit airflow.cfg

    Key configuration settings
    # [core]
    executor = CeleryExecutor
    load_examples = False
    default_timezone = UTC
    parallelism = 32
    max_active_runs_per_dag = 4
    
    # [database]
    sql_alchemy_conn = postgresql+psycopg2://airflow_user:your_secure_password@localhost/airflow_db
    
    # [scheduler]
    dag_dir_list_interval = 60
    
    # [celery]
    broker_url = redis://:your_redis_password@localhost:6379/0
    result_backend = db+postgresql://airflow_user:your_secure_password@localhost/airflow_db
    
    # [webserver]
    expose_config = False
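    If the database password contains URL-reserved characters (@, :, /, #), it must be percent-encoded before going into the connection string; a stdlib sketch with a hypothetical password:

```python
from urllib.parse import quote_plus

user = "airflow_user"
password = "p@ss:word/123"      # hypothetical password with URL-reserved characters
encoded = quote_plus(password)  # percent-encodes @ : / etc.
conn = f"postgresql+psycopg2://{user}:{encoded}@localhost/airflow_db"
print(conn)
```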
    Re-initialize database and create admin user
    # Re-run migration with PostgreSQL backend
    airflow db migrate
    
    # Create admin user
    airflow users create \
      --username admin \
      --firstname Admin \
      --lastname User \
      --role Admin \
      --email admin@yourdomain.com \
      --password your_admin_password

    Step 7: Configure Systemd Services

    Create systemd unit files to manage each Airflow component as a background service with automatic restart.

    /etc/systemd/system/airflow-webserver.service
    [Unit]
    Description=Apache Airflow Webserver
    After=network.target postgresql.service redis-server.service
    Wants=postgresql.service redis-server.service
    
    [Service]
    User=airflow
    Group=airflow
    Type=simple
    Environment=AIRFLOW_HOME=/opt/airflow
    ExecStart=/opt/airflow/venv/bin/airflow webserver --port 8080
    Restart=on-failure
    RestartSec=10
    LimitNOFILE=65536
    
    [Install]
    WantedBy=multi-user.target
    /etc/systemd/system/airflow-scheduler.service
    [Unit]
    Description=Apache Airflow Scheduler
    After=network.target postgresql.service redis-server.service
    Wants=postgresql.service redis-server.service
    
    [Service]
    User=airflow
    Group=airflow
    Type=simple
    Environment=AIRFLOW_HOME=/opt/airflow
    ExecStart=/opt/airflow/venv/bin/airflow scheduler
    Restart=on-failure
    RestartSec=10
    LimitNOFILE=65536
    
    [Install]
    WantedBy=multi-user.target
    /etc/systemd/system/airflow-worker.service
    [Unit]
    Description=Apache Airflow Celery Worker
    After=network.target postgresql.service redis-server.service
    Wants=postgresql.service redis-server.service
    
    [Service]
    User=airflow
    Group=airflow
    Type=simple
    Environment=AIRFLOW_HOME=/opt/airflow
    ExecStart=/opt/airflow/venv/bin/airflow celery worker
    Restart=on-failure
    RestartSec=10
    LimitNOFILE=65536
    
    [Install]
    WantedBy=multi-user.target
    /etc/systemd/system/airflow-flower.service (Optional)
    [Unit]
    Description=Apache Airflow Celery Flower
    After=network.target redis-server.service
    
    [Service]
    User=airflow
    Group=airflow
    Type=simple
    Environment=AIRFLOW_HOME=/opt/airflow
    ExecStart=/opt/airflow/venv/bin/airflow celery flower --port 5555
    Restart=on-failure
    RestartSec=10
    
    [Install]
    WantedBy=multi-user.target
    Enable and start all services
    sudo systemctl daemon-reload
    sudo systemctl enable airflow-webserver airflow-scheduler airflow-worker
    sudo systemctl start airflow-webserver airflow-scheduler airflow-worker
    
    # Optional: enable Flower monitoring
    sudo systemctl enable airflow-flower
    sudo systemctl start airflow-flower
    
    # Verify all services are running
    sudo systemctl status airflow-webserver airflow-scheduler airflow-worker

    Step 8: Nginx Reverse Proxy with SSL

    Install Nginx and Certbot
    sudo apt install -y nginx certbot python3-certbot-nginx
    sudo systemctl enable nginx
    /etc/nginx/sites-available/airflow
    server {
        listen 80;
        server_name airflow.yourdomain.com;
    
        location / {
            proxy_pass http://127.0.0.1:8080;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_read_timeout 90;
        }
    }
    Enable site and obtain SSL certificate
    sudo ln -s /etc/nginx/sites-available/airflow /etc/nginx/sites-enabled/
    sudo nginx -t
    sudo systemctl reload nginx
    
    # Obtain SSL certificate
    sudo certbot --nginx -d airflow.yourdomain.com

    Update Airflow for HTTPS

    Add to airflow.cfg [webserver] section
    [webserver]
    base_url = https://airflow.yourdomain.com
    # Leave these empty -- TLS terminates at Nginx, not the webserver
    web_server_ssl_cert =
    web_server_ssl_key =
    enable_proxy_fix = True
    Restart webserver
    sudo systemctl restart airflow-webserver

    Step 9: Deploy Your First DAG

    DAGs (Directed Acyclic Graphs) are Python files that define your workflow logic. Place them in the dags directory (here, /opt/airflow/dags).

    Create /opt/airflow/dags/hello_ramnode.py
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from datetime import datetime, timedelta
    
    default_args = {
        'owner': 'airflow',
        'depends_on_past': False,
        'email_on_failure': False,
        'retries': 2,
        'retry_delay': timedelta(minutes=5),
    }
    
    with DAG(
        'hello_ramnode',
        default_args=default_args,
        description='A simple introductory DAG',
        schedule='@daily',
        start_date=datetime(2026, 2, 1),
        catchup=False,
        tags=['example', 'ramnode'],
    ) as dag:
    
        check_disk = BashOperator(
            task_id='check_disk_space',
            bash_command='df -h / | tail -1',
        )
    
        check_memory = BashOperator(
            task_id='check_memory',
            bash_command='free -m | grep Mem',
        )
    
        log_status = BashOperator(
            task_id='log_status',
            bash_command='echo "Health check completed at $(date)"',
        )
    
        [check_disk, check_memory] >> log_status

    The DAG will appear in the Airflow web UI within 60 seconds (based on dag_dir_list_interval). Enable it from the UI toggle and trigger a manual run to verify.
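    For intuition, the `[check_disk, check_memory] >> log_status` line wires each upstream task to the downstream one. A toy model of that dependency wiring (not Airflow's actual implementation, which overloads `__rshift__`/`__rrshift__` on operators):

```python
class Task:
    """Minimal stand-in for an Airflow operator's dependency wiring."""
    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = set()

    def __rshift__(self, other):
        # a >> b records b as downstream of a and returns b so chains compose
        self.downstream.add(other.task_id)
        return other

# Mimic: [check_disk, check_memory] >> log_status
check_disk = Task("check_disk")
check_memory = Task("check_memory")
log_status = Task("log_status")
for upstream in (check_disk, check_memory):
    upstream >> log_status

print(check_disk.downstream, check_memory.downstream)
```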


    Step 10: Production Hardening

    Security Configuration

    Generate Fernet key and secret key
    # Generate Fernet key for encryption (run inside the venv; cryptography is installed with Airflow)
    python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
    # Add to airflow.cfg [core]: fernet_key = your_generated_key_here
    
    # Generate webserver session secret
    python3 -c "import secrets; print(secrets.token_hex(32))"
    # Add to airflow.cfg [webserver]: secret_key = your_secret_key_here
    API and exposure settings in airflow.cfg
    [api]
    auth_backends = airflow.api.auth.backend.session
    
    [webserver]
    expose_config = False
    warn_deployment_exposure = True

    Log Rotation

    Create /etc/logrotate.d/airflow
    /opt/airflow/logs/*.log {
        daily
        missingok
        rotate 14
        compress
        delaycompress
        notifempty
        copytruncate
    }

    Resource Tuning

    Setting                      4 GB VPS   16 GB VPS
    parallelism                  8          32
    max_active_tasks_per_dag     4          16
    max_active_runs_per_dag      2          8
    worker_concurrency           4          16
    dag_dir_list_interval        120        30
    min_file_process_interval    60         30
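    The concurrency settings in the table scale roughly linearly with RAM. A hypothetical rule-of-thumb helper (the 4 GB scaling unit is an assumption matching the table, not an official Airflow formula):

```python
def suggested_settings(ram_gb: int) -> dict:
    # Scale concurrency with RAM in 4 GB units (heuristic fitted to the table above)
    scale = max(1, ram_gb // 4)
    return {
        "parallelism": 8 * scale,
        "max_active_tasks_per_dag": 4 * scale,
        "worker_concurrency": 4 * scale,
    }

print(suggested_settings(4))
print(suggested_settings(16))
```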

    Step 11: Monitoring & Health Checks

    Check Airflow health endpoint
    curl -s https://airflow.yourdomain.com/health | python3 -m json.tool

    Automated Health Check Script

    Create /opt/airflow/scripts/health_check.sh
    #!/bin/bash
    HEALTH=$(curl -sf http://localhost:8080/health)
    if [ -z "$HEALTH" ]; then
        SCHEDULER="unreachable"
    else
        SCHEDULER=$(echo "$HEALTH" | python3 -c "import sys, json; print(json.load(sys.stdin)['scheduler']['status'])")
    fi
    
    if [ "$SCHEDULER" != "healthy" ]; then
        echo "[ALERT] Airflow scheduler $SCHEDULER at $(date)" | \
            mail -s "Airflow Health Alert" admin@yourdomain.com
        sudo systemctl restart airflow-scheduler
    fi
    Make executable and schedule
    chmod +x /opt/airflow/scripts/health_check.sh
    
    # Add to crontab (runs every 5 minutes)
    (crontab -l 2>/dev/null; echo "*/5 * * * * /opt/airflow/scripts/health_check.sh") | crontab -
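    The /health endpoint returns JSON with a per-component status; here is the parsing the script performs, run against a sample payload (the payload below is illustrative, with the shape Airflow returns):

```python
import json

# Illustrative /health response: one object per component, each with a "status"
sample = """{
  "metadatabase": {"status": "healthy"},
  "scheduler": {"status": "healthy", "latest_scheduler_heartbeat": "2026-02-01T00:00:00+00:00"}
}"""

health = json.loads(sample)
scheduler_status = health["scheduler"]["status"]
print(scheduler_status)
```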

    Step 12: Backup Strategy

    Create /opt/airflow/scripts/backup.sh
    #!/bin/bash
    BACKUP_DIR=/opt/airflow/backups
    TIMESTAMP=$(date +%Y%m%d_%H%M%S)
    
    mkdir -p $BACKUP_DIR
    
    # Backup PostgreSQL database (store the password in ~/.pgpass so pg_dump runs unattended)
    pg_dump -U airflow_user -h localhost airflow_db | gzip > \
      $BACKUP_DIR/airflow_db_$TIMESTAMP.sql.gz
    
    # Backup DAGs and configuration
    tar czf $BACKUP_DIR/airflow_config_$TIMESTAMP.tar.gz \
      /opt/airflow/dags \
      /opt/airflow/airflow.cfg \
      /opt/airflow/plugins
    
    # Retain only last 7 days of backups
    find $BACKUP_DIR -name '*.gz' -mtime +7 -delete
    
    echo "Backup completed: $TIMESTAMP"
    Schedule daily backups
    chmod +x /opt/airflow/scripts/backup.sh
    
    # Schedule daily backups at 2 AM
    (crontab -l 2>/dev/null; echo "0 2 * * * /opt/airflow/scripts/backup.sh") | crontab -
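    The `find ... -mtime +7 -delete` retention step can also be expressed in Python if you prefer one language for ops scripts; a sketch that only lists the stale archives (deletion left to the caller):

```python
import time
from pathlib import Path

def stale_backups(backup_dir, days=7):
    """Return backup archives older than `days` (mirrors `find -name '*.gz' -mtime +7`)."""
    cutoff = time.time() - days * 86400
    return [p for p in Path(backup_dir).glob("*.gz") if p.stat().st_mtime < cutoff]
```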

    Step 13: Upgrading Airflow

    Upgrade steps
    # 1. Stop all Airflow services
    sudo systemctl stop airflow-webserver airflow-scheduler airflow-worker airflow-flower
    
    # 2. Create a full backup using the backup script
    /opt/airflow/scripts/backup.sh
    
    # 3. Upgrade the Airflow package
    source /opt/airflow/venv/bin/activate
    export AIRFLOW_VERSION=2.x.x  # Target version
    export PYTHON_VERSION=$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
    pip install "apache-airflow[celery,postgres,redis]==${AIRFLOW_VERSION}" \
      --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
    
    # 4. Run the database migration
    airflow db migrate
    
    # 5. Restart all services
    sudo systemctl start airflow-webserver airflow-scheduler airflow-worker airflow-flower
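    Before running the upgrade, it is worth sanity-checking that the target version is actually newer than the installed one. A minimal sketch using plain tuple comparison (sufficient for simple numeric x.y.z strings; the versions shown are examples):

```python
def version_tuple(v: str):
    # "2.10.4" -> (2, 10, 4); handles plain numeric x.y.z strings only
    return tuple(int(part) for part in v.split("."))

installed, target = "2.10.4", "2.10.5"
assert version_tuple(target) > version_tuple(installed), "target must be newer than installed"
print("upgrade path OK:", installed, "->", target)
```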

    Step 14: Troubleshooting

    Webserver won't start

    Check logs with journalctl -u airflow-webserver -n 50. Verify PostgreSQL is running and the connection string in airflow.cfg is correct.

    DAGs not appearing

    Verify DAG files have no syntax errors: python3 /opt/airflow/dags/your_dag.py. Check the dag_dir_list_interval setting.
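    To check every DAG file at once without importing Airflow, a stdlib-only sketch (note this catches syntax errors only, not the import-time errors that running the file directly would surface):

```python
import pathlib
import py_compile

def find_syntax_errors(dag_dir):
    """Byte-compile each .py file under dag_dir and collect failures."""
    errors = []
    for path in pathlib.Path(dag_dir).rglob("*.py"):
        try:
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            errors.append((str(path), str(exc)))
    return errors
```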

    Tasks stuck in queued

    Ensure the Celery worker is running: systemctl status airflow-worker. Check Redis connectivity with redis-cli ping.

    High memory usage

    Reduce parallelism and worker_concurrency. Increase dag_dir_list_interval to reduce parsing frequency.

    502 Bad Gateway in Nginx

    Verify the webserver is running on port 8080. Check with curl http://localhost:8080/health.

    Permission denied errors

    Ensure all files in /opt/airflow are owned by the airflow user: sudo chown -R airflow:airflow /opt/airflow.

    Apache Airflow Deployed Successfully!

    Your Apache Airflow instance is now running in production on a RamNode VPS with CeleryExecutor, PostgreSQL metadata backend, Redis message broker, Nginx reverse proxy, and SSL encryption. RamNode's NVMe-backed VPS infrastructure provides the I/O performance and dedicated resources that workflow orchestration demands — starting at just $4/month.