Gatus — GitOps Health Checks
Define health checks in YAML, commit to Git, auto-deploy on merge. Version-controlled monitoring for teams that manage infrastructure as code.
Prerequisites: RamNode VPS, Docker & Docker Compose, Git repository
Time: 30–40 minutes
Audience: Teams using GitOps or IaC, or anyone wanting version-controlled monitoring
Uptime Kuma is excellent, but its configuration lives in a database you interact with through a UI. That is fine for a personal setup, but it creates a problem for teams: your monitoring configuration is not version-controlled, not reviewable via pull request, and not recoverable from a fresh deploy without a database backup.
Gatus takes a different approach. All of your health checks are defined in a single YAML file. You commit that file to Git. If something changes in your monitoring config at 2 AM, you have a Git commit to blame.
Gatus vs. Uptime Kuma
| Aspect | Uptime Kuma | Gatus |
|---|---|---|
| Configuration | UI-driven, stored in SQLite | YAML file, stored in Git |
| Onboarding speed | Fast — UI is intuitive | Slower — requires YAML familiarity |
| Auditability | Limited (no change history) | Full (Git history) |
| GitOps compatibility | Poor | Excellent |
| Alert conditions | Basic thresholds | Conditional expressions per response field |
| Status page | Built-in, polished | Built-in, clean |
| Multi-environment | Awkward | Easy via config file variants |
What Gatus Can Check
- HTTP/HTTPS: status code, response body content, response time, certificate validity
- TCP: port connectivity
- ICMP: ping reachability
- DNS: record resolution
- WebSocket: connectivity and message exchange
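As a rough sketch of how the non-HTTP check types are declared, the fragment below defines one TCP, one ICMP, and one DNS endpoint. The endpoint names, IP addresses, and the resolver 8.8.8.8 are placeholders rather than values from this guide, and the url schemes and dns block follow the examples in the Gatus README, so verify them against the version you deploy.

```yaml
endpoints:
  # TCP: passes if the port accepts a connection
  - name: redis-port              # hypothetical name
    url: "tcp://10.0.0.5:6379"    # placeholder host and port
    interval: 1m
    conditions:
      - "[CONNECTED] == true"

  # ICMP: passes if the host answers a ping
  - name: gateway-ping            # hypothetical name
    url: "icmp://10.0.0.1"        # placeholder host
    interval: 1m
    conditions:
      - "[CONNECTED] == true"

  # DNS: url is the resolver to query, dns describes the lookup
  - name: domain-resolves         # hypothetical name
    url: "8.8.8.8"                # placeholder resolver
    interval: 5m
    dns:
      query-name: "yourdomain.com"
      query-type: "A"
    conditions:
      - "[DNS_RCODE] == NOERROR"
```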
The real power is the condition syntax:
conditions:
  - "[STATUS] == 200"
  - "[RESPONSE_TIME] < 300"
  - "[BODY].healthy == true"
  - "[CERTIFICATE_EXPIRATION] > 720h"

Deployment
Step 1: Create the project directory
mkdir -p /opt/gatus && cd /opt/gatus

Step 2: Write an initial config file
Save the following as config.yaml in the project directory:
web:
  port: 8080
storage:
  type: sqlite
  path: /data/gatus.db
alerting:
  discord:
    webhook-url: "https://discord.com/api/webhooks/YOUR_WEBHOOK_URL"
    default-alert:
      description: "Health check failed"
      send-on-resolved: true
      failure-threshold: 3
      success-threshold: 2
endpoints:
  - name: ramnode-homepage
    url: "https://ramnode.com"
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 1000"
    alerts:
      - type: discord
  - name: your-app-api
    url: "https://api.yourdomain.com/health"
    interval: 30s
    conditions:
      - "[STATUS] == 200"
      - "[BODY].status == healthy"
      - "[RESPONSE_TIME] < 500"
    alerts:
      - type: discord

Step 3: Write the Compose file
services:
  gatus:
    image: twinproduction/gatus:latest
    container_name: gatus
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - ./config.yaml:/config/config.yaml:ro
      - ./gatus-data:/data
    environment:
      GATUS_CONFIG_PATH: /config/config.yaml

Step 4: Start Gatus
docker compose up -d

Open http://YOUR_SERVER_IP:8080. You should see your defined endpoints listed with their current status.
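Step 5: Reverse proxy
To serve the status page over HTTPS on your own domain, put Gatus behind a reverse proxy. With Caddy, the site block is:
gatus.yourdomain.com {
    reverse_proxy localhost:8080
}

Config File Deep Dive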
Condition syntax reference
| Placeholder | What it represents |
|---|---|
| [STATUS] | HTTP response status code |
| [RESPONSE_TIME] | Response time in milliseconds |
| [BODY] | Full response body (string comparison) |
| [BODY].path.to.field | JSON body field by dot notation |
| [CONNECTED] | TCP connection success (true/false) |
| [DNS_RCODE] | DNS response code |
| [CERTIFICATE_EXPIRATION] | Time until cert expires |
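Operators: ==, !=, >, >=, <, <=
Functions usable inside a condition: pat(...) for pattern matching, any(...) to accept one of several values, len(...) for length, has(...) to test whether a field exists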
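The following is a sketch of how the comparison operators combine with the helper functions documented in the Gatus README (pat, any, len, has). The endpoint name and the JSON fields version, workers, and error are hypothetical; substitute whatever your health endpoint actually returns.

```yaml
endpoints:
  - name: condition-examples                 # hypothetical endpoint
    url: "https://api.yourdomain.com/health"
    interval: 1m
    conditions:
      - "[STATUS] == any(200, 204)"          # either status code passes
      - "[BODY].version == pat(2.*)"         # wildcard match on a string field
      - "len([BODY].workers) >= 2"           # length of a JSON array
      - "has([BODY].error) == false"         # the field must be absent
      - "[RESPONSE_TIME] < 500"              # milliseconds
```

Every condition in the list has to pass for the check to count as healthy, so keep each one narrow enough that a failure tells you what went wrong.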
Interval recommendations
| Service criticality | Suggested interval |
|---|---|
| Critical production API | 30s |
| Standard web service | 60s |
| Background worker | 2m |
| External dependency | 5m |
| Non-critical internal tool | 5m |
Alert Configuration
Define defaults under alerting and override per endpoint:
alerting:
  discord:
    webhook-url: "https://discord.com/api/webhooks/..."
    default-alert:
      failure-threshold: 3
      success-threshold: 2
      send-on-resolved: true
  slack:
    webhook-url: "https://hooks.slack.com/services/..."
    default-alert:
      failure-threshold: 5
      send-on-resolved: true
endpoints:
  - name: critical-payment-api
    url: "https://api.yourdomain.com/payments/health"
    conditions:
      - "[STATUS] == 200"
    alerts:
      - type: discord
        failure-threshold: 1 # Override: alert immediately
        success-threshold: 1
      - type: slack # Send to both for critical services

Supported alert providers
Discord, Slack, PagerDuty, Telegram, Email (SMTP), ntfy, and Opsgenie are all supported natively.
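As an illustration of a provider other than Discord or Slack, here is a minimal sketch of an SMTP email alert. The addresses, mail host, and the SMTP_PASSWORD variable are placeholders introduced for this example, and the key names follow the Gatus README, so confirm them for the release you run.

```yaml
alerting:
  email:
    from: "alerts@yourdomain.com"        # placeholder sender
    username: "alerts@yourdomain.com"
    password: "${SMTP_PASSWORD}"         # hypothetical variable, kept out of Git
    host: "mail.yourdomain.com"          # placeholder SMTP host
    port: 587
    to: "oncall@yourdomain.com"          # placeholder recipient
    default-alert:
      failure-threshold: 3
      send-on-resolved: true

endpoints:
  - name: app-homepage
    url: "https://yourdomain.com"
    conditions:
      - "[STATUS] == 200"
    alerts:
      - type: email
```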
GitOps Workflow
Repository structure
infrastructure/
  monitoring/
    gatus/
      config.yaml          # All endpoint definitions
      docker-compose.yml
  apps/
    your-app/
      ...

Making changes via pull request
- Create a branch: git checkout -b add-worker-monitoring
- Edit config.yaml to add the new endpoint
- Open a pull request with a description
- Team reviews conditions, interval, and alert thresholds
- Merge and deploy
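The change committed on the add-worker-monitoring branch might look like the sketch below: a single new entry under endpoints in config.yaml. The endpoint name, URL, and the queue field are hypothetical, and the 2m interval is simply the value this guide suggests for background workers.

```yaml
endpoints:
  # ...existing endpoints stay unchanged above this entry...
  - name: background-worker                            # hypothetical name
    group: "Production"
    url: "https://api.yourdomain.com/health/worker"    # placeholder URL
    interval: 2m
    conditions:
      - "[STATUS] == 200"
      - "[BODY].queue == running"                      # hypothetical JSON field
    alerts:
      - type: discord
        failure-threshold: 3
```

Because the diff is a handful of lines, reviewers can focus on whether the conditions and thresholds match how the worker actually fails.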
CI/CD auto-deploy on merge
name: Deploy monitoring config
on:
  push:
    paths:
      - 'infrastructure/monitoring/gatus/config.yaml'
    branches:
      - main
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to monitoring server
        uses: appleboy/ssh-action@master # Pin to a release tag or commit SHA for production use
        with:
          host: ${{ secrets.MONITORING_SERVER_IP }}
          username: deploy
          key: ${{ secrets.DEPLOY_KEY }}
          script: |
            # Assumes /opt/gatus is a Git checkout that provides the mounted config.yaml
            cd /opt/gatus
            git pull
            # Restart the container so Gatus loads the updated config file
            docker restart gatus

Environment Variable Substitution
Avoid committing secrets to Git:
alerting:
  discord:
    webhook-url: "${DISCORD_WEBHOOK_URL}"
  pagerduty:
    integration-key: "${PAGERDUTY_KEY}"

Pass the variables via your Compose file:
services:
  gatus:
    image: twinproduction/gatus:latest
    env_file:
      - .env # Contains DISCORD_WEBHOOK_URL=https://...

The .env file should be in .gitignore. Commit the config with variable placeholders; keep secrets managed separately.
Status Page
Gatus includes a built-in status page showing current health, uptime percentage, response time history, and incident log.
ui:
  title: "Your Company Status"
  description: "Real-time health status for our services"
  logo: "https://yourdomain.com/logo.png"
  link: "https://yourdomain.com"
  # The Gatus README documents badge thresholds as a per-endpoint setting
  # (endpoints[].ui.badge.response-time.thresholds); check placement for your version.
  badge:
    response-time:
      thresholds: [200, 500, 750, 1000]

To temporarily disable an endpoint during maintenance, set enabled: false in the config, commit, and deploy. The change is tracked in Git.
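A sketch of both maintenance options follows: the enabled flag described above, and the top-level maintenance block that the Gatus README documents for suppressing alerts during a recurring window. The endpoint is borrowed from the fleet template in this guide; the window values (23:00, 1h, Sunday) are arbitrary placeholders.

```yaml
endpoints:
  - name: staging-app
    enabled: false            # endpoint is skipped until this is removed or set to true
    url: "https://staging.yourdomain.com"
    interval: 5m
    conditions:
      - "[STATUS] == 200"

# Optional: recurring window during which alerts are not sent
maintenance:
  start: "23:00"              # placeholder start time (hh:mm)
  duration: 1h                # placeholder length
  every: [Sunday]             # placeholder days
```

The flag suits one-off work on a single service, while the recurring window fits scheduled jobs that would otherwise page someone every week.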
Config Template for a Typical RamNode Fleet
web:
  port: 8080
storage:
  type: sqlite
  path: /data/gatus.db
alerting:
  discord:
    webhook-url: "${DISCORD_WEBHOOK_URL}"
    default-alert:
      failure-threshold: 3
      success-threshold: 2
      send-on-resolved: true
endpoints:
  # Production web app
  - name: app-homepage
    group: "Production"
    url: "https://yourdomain.com"
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 1000"
      - "[CERTIFICATE_EXPIRATION] > 720h"
    alerts:
      - type: discord
  # API health endpoint
  - name: app-api
    group: "Production"
    url: "https://api.yourdomain.com/health"
    interval: 30s
    conditions:
      - "[STATUS] == 200"
      - "[BODY].status == ok"
      - "[RESPONSE_TIME] < 300"
    alerts:
      - type: discord
        failure-threshold: 2
  # Database connectivity
  - name: database-connectivity
    group: "Infrastructure"
    url: "https://api.yourdomain.com/health/db"
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[BODY].database == connected"
    alerts:
      - type: discord
        failure-threshold: 1
  # Outbound mail relay
  - name: smtp-relay
    group: "Infrastructure"
    url: "tcp://mail.yourdomain.com:587"
    interval: 5m
    conditions:
      - "[CONNECTED] == true"
    alerts:
      - type: discord
  # Staging environment
  - name: staging-app
    group: "Staging"
    url: "https://staging.yourdomain.com"
    interval: 5m
    conditions:
      - "[STATUS] == 200"
    alerts:
      - type: discord
        failure-threshold: 5

What's Next
Gatus, Beszel, and Uptime Kuma all give you visibility into whether your services are up and how they are performing. But they share a limitation: the metrics are relatively high-level. When you need to correlate arbitrary metrics, trace slow requests, or build custom dashboards, you need the full Prometheus + Grafana stack.
Part 4 covers:
- Prometheus + Grafana full stack deployment
- Node Exporter + cAdvisor on production servers
- IP-locked exporters for security
- Community dashboards and custom PromQL queries
