VPS Monitoring & Observability Series
    Part 3 of 5

    Gatus — GitOps Health Checks

    Define health checks in YAML, commit to Git, auto-deploy on merge. Version-controlled monitoring for teams that manage infrastructure as code.

    Prerequisites: RamNode VPS (any plan), Docker and Docker Compose, a Git repository

    Time to complete: 30–40 minutes

    Best for: Teams using GitOps or IaC, or anyone who wants version-controlled monitoring

    Uptime Kuma is excellent, but its configuration lives in a database you interact with through a UI. That is fine for a personal setup, but it creates a problem for teams: your monitoring configuration is not version-controlled, not reviewable via pull request, and not recoverable from a fresh deploy without a database backup.

    Gatus takes a different approach. All of your health checks are defined in a single YAML file. You commit that file to Git. If something changes in your monitoring config at 2 AM, you have a Git commit to blame.

    Gatus vs. Uptime Kuma

    Aspect               | Uptime Kuma                  | Gatus
    Configuration        | UI-driven, stored in SQLite  | YAML file, stored in Git
    Onboarding speed     | Fast (UI is intuitive)       | Slower (requires YAML familiarity)
    Auditability         | Limited (no change history)  | Full (Git history)
    GitOps compatibility | Poor                         | Excellent
    Alert conditions     | Basic thresholds             | Conditional expressions per response field
    Status page          | Built-in, polished           | Built-in, clean
    Multi-environment    | Awkward                      | Easy via config file variants

    What Gatus Can Check

    • HTTP/HTTPS — Status code, response body content, response time, certificate validity
    • TCP — Port connectivity
    • ICMP — Ping reachability
    • DNS — Record resolution
    • WebSocket — Connectivity and message exchange
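
    The non-HTTP check types use the same endpoint structure. The protocol is selected by the URL scheme or, for DNS, by a dns block. A minimal sketch follows; the hostnames, ports, and endpoint names are placeholders rather than values from this guide, and ICMP checks may need extra privileges depending on how the container runs.

```yaml
endpoints:
  # TCP: passes when a connection to the port can be opened
  - name: postgres-port            # hypothetical name
    url: "tcp://db.example.internal:5432"
    interval: 1m
    conditions:
      - "[CONNECTED] == true"

  # ICMP: passes when the host answers a ping
  - name: gateway-ping             # hypothetical name
    url: "icmp://example.org"
    interval: 1m
    conditions:
      - "[CONNECTED] == true"

  # DNS: url is the resolver to query; the dns block names the record
  - name: dns-a-record             # hypothetical name
    url: "8.8.8.8"
    interval: 5m
    dns:
      query-name: "example.com"
      query-type: "A"
    conditions:
      - "[DNS_RCODE] == NOERROR"
```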

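    The real power is the condition syntax. Each condition is an expression evaluated against the response, and the endpoint counts as healthy only when every condition passes: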

    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 300"
      - "[BODY].healthy == true"
      - "[CERTIFICATE_EXPIRATION] > 720h"

    Deployment

    Step 1 — Create the project directory

    mkdir -p /opt/gatus && cd /opt/gatus

    Step 2 — Write an initial config file

    /opt/gatus/config.yaml
    web:
      port: 8080
    
    storage:
      type: sqlite
      path: /data/gatus.db
    
    alerting:
      discord:
        webhook-url: "https://discord.com/api/webhooks/YOUR_WEBHOOK_URL"
        default-alert:
          description: "Health check failed"
          send-on-resolved: true
          failure-threshold: 3
          success-threshold: 2
    
    endpoints:
      - name: ramnode-homepage
        url: "https://ramnode.com"
        interval: 1m
        conditions:
          - "[STATUS] == 200"
          - "[RESPONSE_TIME] < 1000"
        alerts:
          - type: discord
    
      - name: your-app-api
        url: "https://api.yourdomain.com/health"
        interval: 30s
        conditions:
          - "[STATUS] == 200"
          - "[BODY].status == healthy"
          - "[RESPONSE_TIME] < 500"
        alerts:
          - type: discord

    Step 3 — Write the Compose file

    /opt/gatus/docker-compose.yml
    services:
      gatus:
        image: twinproduction/gatus:latest
        container_name: gatus
        restart: unless-stopped
        ports:
          - "8080:8080"
        volumes:
          - ./config.yaml:/config/config.yaml:ro
          - ./gatus-data:/data
        environment:
          GATUS_CONFIG_PATH: /config/config.yaml

    Step 4 — Start Gatus

    docker compose up -d

    Open http://YOUR_SERVER_IP:8080. You should see your defined endpoints listed with their current status.

    Step 5 — Reverse proxy

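    To serve the status page over HTTPS on its own hostname, put Gatus behind a reverse proxy. Caddy obtains and renews the TLS certificate automatically:

    Caddyfile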
    gatus.yourdomain.com {
        reverse_proxy localhost:8080
    }
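
    Once the proxy is in place, Gatus no longer needs to be reachable on port 8080 from outside the server. One way to arrange that, assuming Caddy runs directly on the same host (as the localhost:8080 upstream suggests), is to publish the port on the loopback interface only:

```yaml
services:
  gatus:
    ports:
      # Bind to 127.0.0.1 so only local processes (such as Caddy) can connect
      - "127.0.0.1:8080:8080"
```

    After editing docker-compose.yml, running docker compose up -d again recreates the container with the new port binding.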

    Config File Deep Dive

    Condition syntax reference

    Placeholder               | What it represents
    [STATUS]                  | HTTP response status code
    [RESPONSE_TIME]           | Response time in milliseconds
    [BODY]                    | Full response body (string comparison)
    [BODY].path.to.field      | JSON body field by dot notation
    [CONNECTED]               | TCP connection success (true/false)
    [DNS_RCODE]               | DNS response code
    [CERTIFICATE_EXPIRATION]  | Time until cert expires

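    Operators: ==, !=, >, >=, <, <=

    Pattern matching is not a separate operator. It uses the pat() function on the right-hand side of == or !=, for example "[BODY].name == pat(john*)".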
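
    Gatus conditions can also wrap either side of a comparison in a small set of functions. The sketch below shows pat, any, len, and has. The JSON field names (version, items, errors) are invented for illustration.

```yaml
conditions:
  - "[STATUS] == any(200, 204)"          # status is one of the listed values
  - "[BODY].version == pat(2.*)"         # wildcard pattern match on a JSON field
  - "len([BODY].items) > 0"              # length of a JSON array or string
  - "has([BODY].errors) == false"        # field must be absent from the body
```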

    Interval recommendations

    Service criticality         | Suggested interval
    Critical production API     | 30s
    Standard web service        | 60s
    Background worker           | 2m
    External dependency         | 5m
    Non-critical internal tool  | 5m
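
    Because the config is plain YAML, interval tiers like these can be declared once and reused through anchors and merge keys instead of being repeated on every endpoint. The sketch below assumes Gatus ignores top-level keys it does not recognize (the Gatus README shows a similar pattern). The tier-* keys and endpoint names are hypothetical.

```yaml
# Holder keys for anchors; these are not Gatus settings themselves
tier-critical: &critical
  interval: 30s
tier-external: &external
  interval: 5m

endpoints:
  - name: payments-api             # hypothetical
    <<: *critical                  # merges interval: 30s into this endpoint
    url: "https://api.yourdomain.com/payments/health"
    conditions:
      - "[STATUS] == 200"

  - name: third-party-status       # hypothetical
    <<: *external                  # merges interval: 5m into this endpoint
    url: "https://status.example.com"
    conditions:
      - "[STATUS] == 200"
```

    Changing a tier then becomes a one-line diff, which keeps pull requests easy to review.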

    Alert Configuration

    Define defaults under alerting and override per endpoint:

    alerting:
      discord:
        webhook-url: "https://discord.com/api/webhooks/..."
        default-alert:
          failure-threshold: 3
          success-threshold: 2
          send-on-resolved: true
    
      slack:
        webhook-url: "https://hooks.slack.com/services/..."
        default-alert:
          failure-threshold: 5
          send-on-resolved: true
    
    endpoints:
      - name: critical-payment-api
        url: "https://api.yourdomain.com/payments/health"
        conditions:
          - "[STATUS] == 200"
        alerts:
          - type: discord
            failure-threshold: 1   # Override: alert immediately
            success-threshold: 1
          - type: slack             # Send to both for critical services

    Supported alert providers

    Gatus natively supports Discord, Slack, PagerDuty, Telegram, email (SMTP), ntfy, and Opsgenie, among other providers.
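
    Each provider gets its own block under alerting. As one more illustration, an email (SMTP) provider might look like the following. The host, addresses, and the SMTP_PASSWORD variable are placeholders, and the field names should be checked against the Gatus README for the version you run.

```yaml
alerting:
  email:
    from: "alerts@yourdomain.com"        # placeholder sender
    username: "alerts@yourdomain.com"
    password: "${SMTP_PASSWORD}"         # assumed environment variable
    host: "mail.yourdomain.com"          # placeholder SMTP host
    port: 587
    to: "oncall@yourdomain.com"          # placeholder recipient
    default-alert:
      failure-threshold: 3
      send-on-resolved: true

endpoints:
  - name: your-app-api
    url: "https://api.yourdomain.com/health"
    conditions:
      - "[STATUS] == 200"
    alerts:
      - type: email
```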

    GitOps Workflow

    Repository structure

    infrastructure/
      monitoring/
        gatus/
          config.yaml          # All endpoint definitions
          docker-compose.yml
      apps/
        your-app/
          ...

    Making changes via pull request

    1. Create a branch: git checkout -b add-worker-monitoring
    2. Edit config.yaml to add the new endpoint
    3. Open a pull request with a description
    4. Team reviews conditions, interval, and alert thresholds
    5. Merge and deploy
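
    As an illustration of step 2, the change under review could be as small as one new entry in the endpoints list of config.yaml. The worker health route and its response field are assumptions made up for this example.

```yaml
endpoints:
  # ...existing endpoints stay as they are...

  - name: background-worker
    group: "Production"
    url: "https://api.yourdomain.com/health/worker"   # hypothetical route
    interval: 2m
    conditions:
      - "[STATUS] == 200"
      - "[BODY].worker == running"                     # hypothetical field
    alerts:
      - type: discord
```

    A diff this small gives reviewers the interval, the conditions, and the alert routing in one place.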

    CI/CD auto-deploy on merge

    .github/workflows/deploy-monitoring.yml
    name: Deploy monitoring config
    
    on:
      push:
        paths:
          - 'infrastructure/monitoring/gatus/config.yaml'
        branches:
          - main
    
    jobs:
      deploy:
        runs-on: ubuntu-latest
        steps:
          - name: Deploy to monitoring server
            uses: appleboy/ssh-action@master
            with:
              host: ${{ secrets.MONITORING_SERVER_IP }}
              username: deploy
              key: ${{ secrets.DEPLOY_KEY }}
              script: |
                cd /opt/gatus
                git pull
                # Restart so the container picks up the updated bind-mounted config.yaml
                docker compose restart gatus

    Environment Variable Substitution

    Avoid committing secrets to Git:

    alerting:
      discord:
        webhook-url: "${DISCORD_WEBHOOK_URL}"
    
      pagerduty:
        integration-key: "${PAGERDUTY_KEY}"

    Pass the variables via your Compose file:

    services:
      gatus:
        image: twinproduction/gatus:latest
        env_file:
          - .env  # Contains DISCORD_WEBHOOK_URL=https://...

    The .env file should be in .gitignore. Commit the config with variable placeholders; keep secrets managed separately.

    Status Page

    Gatus includes a built-in status page showing current health, uptime percentage, response time history, and incident log.

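    ui:
      title: "Your Company Status"
      description: "Real-time health status for our services"
      logo: "https://yourdomain.com/logo.png"
      link: "https://yourdomain.com"
    
    endpoints:
      - name: app-homepage
        url: "https://yourdomain.com"
        conditions:
          - "[STATUS] == 200"
        ui:
          badge:
            response-time:
              # Set per endpoint; Gatus expects five ascending values in ms (1500 added as an example)
              thresholds: [200, 500, 750, 1000, 1500]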

    To temporarily disable an endpoint during maintenance, set enabled: false in the config, commit, and deploy. The change is tracked in Git.
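
    A sketch of that maintenance change, reusing the staging endpoint from the fleet template in this part:

```yaml
endpoints:
  - name: staging-app
    enabled: false          # skip this check until the flag is removed
    group: "Staging"
    url: "https://staging.yourdomain.com"
    interval: 5m
    conditions:
      - "[STATUS] == 200"
```

    Reverting the commit re-enables the check, and the Git history records when the endpoint was paused and by whom.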

    Config Template for a Typical RamNode Fleet

    config.yaml
    web:
      port: 8080
    
    storage:
      type: sqlite
      path: /data/gatus.db
    
    alerting:
      discord:
        webhook-url: "${DISCORD_WEBHOOK_URL}"
        default-alert:
          failure-threshold: 3
          success-threshold: 2
          send-on-resolved: true
    
    endpoints:
      # Production web app
      - name: app-homepage
        group: "Production"
        url: "https://yourdomain.com"
        interval: 1m
        conditions:
          - "[STATUS] == 200"
          - "[RESPONSE_TIME] < 1000"
          - "[CERTIFICATE_EXPIRATION] > 720h"
        alerts:
          - type: discord
    
      # API health endpoint
      - name: app-api
        group: "Production"
        url: "https://api.yourdomain.com/health"
        interval: 30s
        conditions:
          - "[STATUS] == 200"
          - "[BODY].status == ok"
          - "[RESPONSE_TIME] < 300"
        alerts:
          - type: discord
            failure-threshold: 2
    
      # Database connectivity
      - name: database-connectivity
        group: "Infrastructure"
        url: "https://api.yourdomain.com/health/db"
        interval: 1m
        conditions:
          - "[STATUS] == 200"
          - "[BODY].database == connected"
        alerts:
          - type: discord
            failure-threshold: 1
    
      # Outbound mail relay
      - name: smtp-relay
        group: "Infrastructure"
        url: "tcp://mail.yourdomain.com:587"
        interval: 5m
        conditions:
          - "[CONNECTED] == true"
        alerts:
          - type: discord
    
      # Staging environment
      - name: staging-app
        group: "Staging"
        url: "https://staging.yourdomain.com"
        interval: 5m
        conditions:
          - "[STATUS] == 200"
        alerts:
          - type: discord
            failure-threshold: 5

    What's Next

    Gatus, Beszel, and Uptime Kuma all give you visibility into whether your services are up and how they are performing. But they share a limitation: the metrics are relatively high-level. When you need to correlate arbitrary metrics, trace slow requests, or build custom dashboards, you need the full Prometheus + Grafana stack.

    Part 4 covers:

    • Prometheus + Grafana full stack deployment
    • Node Exporter + cAdvisor on production servers
    • IP-locked exporters for security
    • Community dashboards and custom PromQL queries