Part 3 of 5

Gatus — GitOps Health Checks

Define health checks in YAML, commit to Git, auto-deploy on merge. Version-controlled monitoring for teams that manage infrastructure as code.

Prerequisites: RamNode VPS (any plan), Docker & Docker Compose, a Git repository

Time to complete: 30–40 minutes

Best for: Teams using GitOps or IaC, or anyone who wants version-controlled monitoring

Uptime Kuma is excellent, but its configuration lives in a database you interact with through a UI. That is fine for a personal setup, but it creates a problem for teams: your monitoring configuration is not version-controlled, not reviewable via pull request, and not recoverable from a fresh deploy without a database backup.

Gatus takes a different approach. All of your health checks are defined in a single YAML file. You commit that file to Git. If something changes in your monitoring config at 2 AM, you have a Git commit to blame.

Gatus vs. Uptime Kuma

Aspect	Uptime Kuma	Gatus
Configuration	UI-driven, stored in SQLite	YAML file, stored in Git
Onboarding speed	Fast — UI is intuitive	Slower — requires YAML familiarity
Auditability	Limited (no change history)	Full (Git history)
GitOps compatibility	Poor	Excellent
Alert conditions	Basic thresholds	Conditional expressions per response field
Status page	Built-in, polished	Built-in, clean
Multi-environment	Awkward	Easy via config file variants

What Gatus Can Check

• HTTP/HTTPS — Status code, response body content, response time, certificate validity
• TCP — Port connectivity
• ICMP — Ping reachability
• DNS — Record resolution
• WebSocket — Connectivity and message exchange

The real power is the condition syntax:

conditions:
  - "[STATUS] == 200"
  - "[RESPONSE_TIME] < 300"
  - "[BODY].healthy == true"
  - "[CERTIFICATE_EXPIRATION] > 720h"
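Every condition must hold for a check to pass. The following Python model evaluates these four example conditions against one recorded response, which makes the semantics concrete. It is only a sketch. Gatus is written in Go and has its own parser, and the `Response` class, the small set of supported placeholders, and the single-unit duration parsing below are assumptions made for illustration.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical snapshot of one probe result (Gatus collects these values itself)."""
    status: int
    response_time_ms: float
    body: str
    cert_hours_left: float


# Hours per unit for simple single-unit durations such as "720h" (an assumption of this sketch).
_HOURS_PER_UNIT = {"ms": 1 / 3_600_000, "s": 1 / 3600, "m": 1 / 60, "h": 1}

_OPS = {
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def _hours(duration: str) -> float:
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s|m|h)", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration}")
    return float(match.group(1)) * _HOURS_PER_UNIT[match.group(2)]


def _resolve(placeholder: str, r: Response):
    if placeholder == "[STATUS]":
        return r.status
    if placeholder == "[RESPONSE_TIME]":
        return r.response_time_ms
    if placeholder == "[CERTIFICATE_EXPIRATION]":
        return r.cert_hours_left
    if placeholder == "[BODY]":
        return r.body
    if placeholder.startswith("[BODY]."):
        value = json.loads(r.body)
        for key in placeholder[len("[BODY]."):].split("."):
            value = value[key]  # raises KeyError if the field is missing
        return value
    raise ValueError(f"unsupported placeholder: {placeholder}")


def evaluate(condition: str, r: Response) -> bool:
    # Split "LEFT OP RIGHT"; two-character operators come first so ">=" wins over ">".
    match = re.fullmatch(r"\s*(\S+)\s*(==|!=|>=|<=|>|<)\s*(.+?)\s*", condition)
    if match is None:
        raise ValueError(f"cannot parse condition: {condition}")
    left, op, right = match.groups()
    actual = _resolve(left, r)
    if left == "[CERTIFICATE_EXPIRATION]":
        expected = _hours(right)
    elif isinstance(actual, bool):  # test bool before int: bool is a subclass of int
        expected = right == "true"
    elif isinstance(actual, (int, float)):
        expected = float(right)
    else:
        expected = right  # bare strings, e.g. [BODY].status == healthy
    return _OPS[op](actual, expected)


def check(conditions: list[str], r: Response) -> bool:
    """An endpoint is healthy only if every condition holds."""
    return all(evaluate(c, r) for c in conditions)


if __name__ == "__main__":
    conditions = [
        "[STATUS] == 200",
        "[RESPONSE_TIME] < 300",
        "[BODY].healthy == true",
        "[CERTIFICATE_EXPIRATION] > 720h",
    ]
    probe = Response(status=200, response_time_ms=180, body='{"healthy": true}', cert_hours_left=2000)
    print(check(conditions, probe))  # True
```

A single failing condition (a 450 ms response, say) is enough to mark the whole check as failed.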

Deployment

Step 1 — Create the project directory

mkdir -p /opt/gatus && cd /opt/gatus

Step 2 — Write an initial config file

/opt/gatus/config.yaml

web:
  port: 8080

storage:
  type: sqlite
  path: /data/gatus.db

alerting:
  discord:
    webhook-url: "https://discord.com/api/webhooks/YOUR_WEBHOOK_URL"
    default-alert:
      description: "Health check failed"
      send-on-resolved: true
      failure-threshold: 3
      success-threshold: 2

endpoints:
  - name: ramnode-homepage
    url: "https://ramnode.com"
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 1000"
    alerts:
      - type: discord

  - name: your-app-api
    url: "https://api.yourdomain.com/health"
    interval: 30s
    conditions:
      - "[STATUS] == 200"
      - "[BODY].status == healthy"
      - "[RESPONSE_TIME] < 500"
    alerts:
      - type: discord
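The `your-app-api` endpoint assumes your application serves `/health` with a JSON body such as `{"status": "healthy"}`. If yours does not yet, the following standard-library Python sketch shows the response shape those conditions expect. The port, bind address, and always-healthy answer are placeholders. A real handler should check its own dependencies and return a non-200 status when they fail.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        # Placeholder: replace with real checks (database ping, queue depth, ...).
        payload = json.dumps({"status": "healthy"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real service should log requests


def make_server(port: int = 8000) -> ThreadingHTTPServer:
    # Hypothetical bind address and port; in production this sits behind your reverse proxy.
    return ThreadingHTTPServer(("127.0.0.1", port), HealthHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```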

Step 3 — Write the Compose file

/opt/gatus/docker-compose.yml

services:
  gatus:
    image: twinproduction/gatus:latest
    container_name: gatus
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - ./config.yaml:/config/config.yaml:ro
      - ./gatus-data:/data
    environment:
      GATUS_CONFIG_PATH: /config/config.yaml

Step 4 — Start Gatus

docker compose up -d

Open http://YOUR_SERVER_IP:8080. You should see your defined endpoints listed with their current status.

Step 5 — Reverse proxy

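Put the dashboard behind HTTPS with a site block in your Caddyfile. This assumes Caddy runs directly on the same host. Once DNS for gatus.yourdomain.com points at the server, Caddy obtains the TLS certificate automatically. After the proxy works, you can change the port mapping in docker-compose.yml to "127.0.0.1:8080:8080" so the dashboard is no longer reachable on the raw port.

Caddyfile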

gatus.yourdomain.com {
    reverse_proxy localhost:8080
}

Config File Deep Dive

Condition syntax reference

Placeholder	What it represents
`[STATUS]`	HTTP response status code
`[RESPONSE_TIME]`	Response time in milliseconds
`[BODY]`	Full response body (string comparison)
`[BODY].path.to.field`	JSON body field by dot notation
`[CONNECTED]`	TCP connection success (true/false)
`[DNS_RCODE]`	DNS response code
`[CERTIFICATE_EXPIRATION]`	Time until cert expires

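Operators: ==, !=, >, >=, <, <=

Functions: len() and has() work on [BODY] paths (for example "has([BODY].errors) == false"). pat() is a wildcard pattern match (for example "[BODY] == pat(*healthy*)"). any() accepts any one of several values (for example "[STATUS] == any(200, 204)"). pat() and any() work only with == and !=.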
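Dot-notation placeholders walk the parsed JSON body one key at a time. The following Python sketch models that lookup, including list indexes such as `[BODY].data[0].id`. Treat the details as assumptions rather than Gatus's exact rules. In particular, returning `None` for a missing field is this sketch's own choice.

```python
import json
import re

# One path segment: an optional key followed by zero or more [index] suffixes.
_SEGMENT = re.compile(r"([^.\[\]]*)((?:\[\d+\])*)")


def resolve_body_path(body: str, placeholder: str):
    """Return the value addressed by e.g. '[BODY].data[0].id', or None if it is missing."""
    prefix = "[BODY]"
    if not placeholder.startswith(prefix):
        raise ValueError("placeholder must start with [BODY]")
    path = placeholder[len(prefix):].lstrip(".")
    value = json.loads(body)
    if not path:
        return value
    for segment in path.split("."):
        match = _SEGMENT.fullmatch(segment)
        if match is None:
            return None
        key, indexes = match.groups()
        if key:
            if not isinstance(value, dict) or key not in value:
                return None
            value = value[key]
        for index in re.findall(r"\[(\d+)\]", indexes):
            i = int(index)
            if not isinstance(value, list) or i >= len(value):
                return None
            value = value[i]
    return value


if __name__ == "__main__":
    body = '{"status": "ok", "data": [{"id": 7}, {"id": 9}]}'
    print(resolve_body_path(body, "[BODY].data[1].id"))  # 9
```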

Interval recommendations

Service criticality	Suggested interval
Critical production API	30s
Standard web service	60s
Background worker	2m
External dependency	5m
Non-critical internal tool	5m
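Two numbers are worth working out when you pick an interval. The first is how many requests the check generates per day. The second is roughly how long an outage can last before an alert fires. The alert waits for failure-threshold consecutive failures, so in the worst case (an outage that starts just after a passing check) the delay is about interval × failure-threshold. The following small Python helper assumes the single-unit interval strings used in this guide.

```python
import re

_SECONDS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(interval: str) -> int:
    """Convert '30s', '2m' or '1h' style intervals to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", interval)
    if match is None:
        raise ValueError(f"unsupported interval: {interval}")
    return int(match.group(1)) * _SECONDS[match.group(2)]


def checks_per_day(interval: str) -> int:
    return 86_400 // to_seconds(interval)


def worst_case_alert_delay(interval: str, failure_threshold: int) -> int:
    """Seconds from outage start to alert, ignoring request timeouts and delivery time."""
    return to_seconds(interval) * failure_threshold


if __name__ == "__main__":
    for interval in ("30s", "60s", "2m", "5m"):
        print(interval, checks_per_day(interval), worst_case_alert_delay(interval, 3))
```

With a failure-threshold of 3, a 30s check alerts within about 90 seconds, while a 5m check can take up to 15 minutes. That trade-off is why the critical services get the short intervals.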

Alert Configuration

Define defaults under alerting and override per endpoint:

alerting:
  discord:
    webhook-url: "https://discord.com/api/webhooks/..."
    default-alert:
      failure-threshold: 3
      success-threshold: 2
      send-on-resolved: true

  slack:
    webhook-url: "https://hooks.slack.com/services/..."
    default-alert:
      failure-threshold: 5
      send-on-resolved: true

endpoints:
  - name: critical-payment-api
    url: "https://api.yourdomain.com/payments/health"
    conditions:
      - "[STATUS] == 200"
    alerts:
      - type: discord
        failure-threshold: 1   # Override: alert immediately
        success-threshold: 1
      - type: slack             # Send to both for critical services

Supported alert providers

Natively supported providers include Discord, Slack, PagerDuty, Telegram, Email (SMTP), ntfy, and Opsgenie. A generic custom webhook provider covers anything else.

GitOps Workflow

Repository structure

infrastructure/
  monitoring/
    gatus/
      config.yaml          # All endpoint definitions
      docker-compose.yml
  apps/
    your-app/
      ...

Making changes via pull request

1. Create a branch: git checkout -b add-worker-monitoring
2. Edit config.yaml to add the new endpoint
3. Open a pull request with a description
4. Team reviews conditions, interval, and alert thresholds
5. Merge and deploy
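Reviewers catch the subtle problems, but a few mechanical checks can run in CI before anyone looks at the diff. The following Python sketch works on the config after it has been parsed. For example, you could parse it with PyYAML's yaml.safe_load, which is not in the standard library and is not shown here. The rules are suggestions for review, not the validation Gatus itself performs at startup, and `lint_config` is a hypothetical helper name.

```python
def lint_config(config: dict) -> list[str]:
    """Return human-readable problems found in a parsed Gatus-style config dict."""
    problems = []
    providers = set((config.get("alerting") or {}).keys())
    seen = set()
    for i, endpoint in enumerate(config.get("endpoints") or []):
        name = endpoint.get("name")
        label = name or f"endpoints[{i}]"
        if not name:
            problems.append(f"{label}: missing name")
        key = (endpoint.get("group"), name)
        if key in seen:
            problems.append(f"{label}: duplicate name in group {endpoint.get('group')!r}")
        seen.add(key)
        if not endpoint.get("url"):
            problems.append(f"{label}: missing url")
        if not endpoint.get("conditions"):
            problems.append(f"{label}: no conditions, so nothing is actually checked")
        for alert in endpoint.get("alerts") or []:
            if alert.get("type") not in providers:
                problems.append(f"{label}: alert type {alert.get('type')!r} has no provider under alerting")
    return problems
```

A CI step would fail the pull request whenever the returned list is non-empty.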

CI/CD auto-deploy on merge

.github/workflows/deploy-monitoring.yml

name: Deploy monitoring config

on:
  push:
    paths:
      - 'infrastructure/monitoring/gatus/config.yaml'
    branches:
      - main

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to monitoring server
        uses: appleboy/ssh-action@master
        with:
          host: ${{ secrets.MONITORING_SERVER_IP }}
          username: deploy
          key: ${{ secrets.DEPLOY_KEY }}
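          script: |
            # Assumes /opt/gatus is inside a clone of this repository
            # (for example, a symlink to infrastructure/monitoring/gatus).
            cd /opt/gatus
            git pull --ff-only
            # Restart so the container re-reads the single-file bind mount:
            # git writes config.yaml as a new file, which a running mount does not see.
            docker compose restart gatus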

Environment Variable Substitution

Gatus expands environment variables written as ${VAR} inside the config file, so secrets never need to be committed to Git:

alerting:
  discord:
    webhook-url: "${DISCORD_WEBHOOK_URL}"

  pagerduty:
    integration-key: "${PAGERDUTY_KEY}"

Pass the variables via your Compose file:

services:
  gatus:
    image: twinproduction/gatus:latest
    # container_name, restart, ports and volumes as in Step 3
    env_file:
      - .env  # Contains DISCORD_WEBHOOK_URL=https://...

The .env file should be in .gitignore. Commit the config with variable placeholders; keep secrets managed separately.
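Because the secrets live outside the repository, a merged config can reference a variable that was never added to the server's .env file. The following hypothetical pre-deploy check in Python lists ${VAR} placeholders that have no non-empty entry. It understands only simple KEY=VALUE lines and the braced ${VAR} form, not bare $VAR.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def missing_variables(config_text: str, env_text: str) -> list[str]:
    """Names referenced as ${NAME} in the config but absent or empty in the .env text."""
    env = parse_env(env_text)
    needed = sorted(set(PLACEHOLDER.findall(config_text)))
    return [name for name in needed if not env.get(name)]


if __name__ == "__main__":
    config_text = 'webhook-url: "${DISCORD_WEBHOOK_URL}"\nintegration-key: "${PAGERDUTY_KEY}"\n'
    env_text = "DISCORD_WEBHOOK_URL=https://discord.com/api/webhooks/...\n"
    print(missing_variables(config_text, env_text))  # ['PAGERDUTY_KEY']
```

Running it on the server before the restart turns a silent empty webhook URL into an explicit deployment error.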

Status Page

Gatus includes a built-in status page showing current health, uptime percentage, response time history, and incident log.

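ui:
  title: "Your Company Status"
  description: "Real-time health status for our services"
  logo: "https://yourdomain.com/logo.png"
  link: "https://yourdomain.com"

endpoints:
  - name: app-api
    # url, interval, conditions and alerts as usual
    ui:
      badge:
        response-time:
          thresholds: [100, 200, 500, 750, 1000]   # example values: five ascending numbers, in ms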

To temporarily disable an endpoint during maintenance, set enabled: false on that endpoint, commit, and deploy. The change is tracked in Git like any other.

Config Template for a Typical RamNode Fleet

config.yaml

web:
  port: 8080

storage:
  type: sqlite
  path: /data/gatus.db

alerting:
  discord:
    webhook-url: "${DISCORD_WEBHOOK_URL}"
    default-alert:
      failure-threshold: 3
      success-threshold: 2
      send-on-resolved: true

endpoints:
  # Production web app
  - name: app-homepage
    group: "Production"
    url: "https://yourdomain.com"
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 1000"
      - "[CERTIFICATE_EXPIRATION] > 720h"
    alerts:
      - type: discord

  # API health endpoint
  - name: app-api
    group: "Production"
    url: "https://api.yourdomain.com/health"
    interval: 30s
    conditions:
      - "[STATUS] == 200"
      - "[BODY].status == ok"
      - "[RESPONSE_TIME] < 300"
    alerts:
      - type: discord
        failure-threshold: 2

  # Database connectivity
  - name: database-connectivity
    group: "Infrastructure"
    url: "https://api.yourdomain.com/health/db"
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[BODY].database == connected"
    alerts:
      - type: discord
        failure-threshold: 1

  # Outbound mail relay
  - name: smtp-relay
    group: "Infrastructure"
    url: "tcp://mail.yourdomain.com:587"
    interval: 5m
    conditions:
      - "[CONNECTED] == true"
    alerts:
      - type: discord

  # Staging environment
  - name: staging-app
    group: "Staging"
    url: "https://staging.yourdomain.com"
    interval: 5m
    conditions:
      - "[STATUS] == 200"
    alerts:
      - type: discord
        failure-threshold: 5
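For the smtp-relay entry, [CONNECTED] records whether a TCP connection to the host and port could be opened. The following Python approximation uses the standard socket module. The 5-second timeout is an assumption, since Gatus applies its own client timeout, and the hostname is the guide's placeholder.

```python
import socket


def tcp_connected(host: str, port: int, timeout: float = 5.0) -> bool:
    """Approximate a tcp:// check: True if a TCP handshake completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, DNS failures and timeouts
        return False


if __name__ == "__main__":
    # Placeholder host; replace with your relay.
    print(tcp_connected("mail.yourdomain.com", 587))
```

A successful handshake says nothing about whether the relay accepts mail. It only confirms that something is listening on port 587.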

What's Next

Gatus, Beszel, and Uptime Kuma all give you visibility into whether your services are up and how they are performing. But they share a limitation: the metrics are relatively high-level. When you need to correlate arbitrary metrics, trace slow requests, or build custom dashboards, you need the full Prometheus + Grafana stack.

Part 4 covers:

• Prometheus + Grafana full stack deployment
• Node Exporter + cAdvisor on production servers
• IP-locked exporters for security
• Community dashboards and custom PromQL queries
