Part 5 of 6

    Monitoring & Observability

    Prometheus for metrics, Grafana for dashboards, Loki for logs, and Uptime Kuma for uptime. Know about problems before your users do.

    Your apps are deployed and your CI/CD pipeline works. But how do you know when something breaks? This guide covers the observability stack: metrics with Prometheus, visualization with Grafana, logs with Loki, uptime monitoring, and alerting.

    The Observability Stack

    We're building a complete monitoring solution:

    Component       Purpose                           Port
    Prometheus      Metrics collection and storage    9090
    Grafana         Dashboards and visualization      3001
    Loki            Log aggregation                   3100
    Promtail        Log shipping to Loki              -
    Uptime Kuma     Uptime monitoring and alerting    3002
    cAdvisor        Container metrics                 8080
    Node Exporter   Host system metrics               9100

    All of these run as containers alongside your applications in Dokploy.

    Deploy Prometheus

    Prometheus scrapes metrics from your applications and stores them as time series data.

    Create the Service

    1. In Dokploy, click Create Service → Docker Compose
    2. Name it monitoring
    3. Paste this compose file:
    docker-compose.yml
    version: '3.8'
    
    services:
      prometheus:
        image: prom/prometheus:latest
        container_name: prometheus
        volumes:
          - prometheus_data:/prometheus
          - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
        command:
          - '--config.file=/etc/prometheus/prometheus.yml'
          - '--storage.tsdb.path=/prometheus'
          - '--storage.tsdb.retention.time=30d'
          - '--web.enable-lifecycle'
        ports:
          - "9090:9090"
        restart: unless-stopped
    
    volumes:
      prometheus_data:

    Prometheus Configuration

    Create prometheus.yml in the same directory as docker-compose.yml:

    prometheus.yml
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    
    alerting:
      alertmanagers:
        - static_configs:
            - targets: []
    
    rule_files: []
    
    scrape_configs:
      # Prometheus itself
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
    
      # Node Exporter (host metrics)
      - job_name: 'node'
        static_configs:
          - targets: ['node-exporter:9100']
    
      # cAdvisor (container metrics)
      - job_name: 'cadvisor'
        static_configs:
          - targets: ['cadvisor:8080']
    
      # Your applications (add as needed)
      - job_name: 'myapp'
        static_configs:
          - targets: ['myapp:3000']
        metrics_path: '/metrics'
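
For the myapp job to show as UP, the application itself has to answer on /metrics with Prometheus' text exposition format. The sketch below shows roughly what that payload looks like, using a hand-rolled in-memory counter. The helper names countRequest and renderMetrics are hypothetical, and a real service would more likely rely on a client library such as prom-client.

```javascript
// Minimal in-memory counter registry (hypothetical helper, not a real library).
// Keys are serialized label sets; values are monotonically increasing counts.
const requestCounts = new Map();

// Record one finished request, for example from an Express middleware.
function countRequest(method, status) {
  const key = JSON.stringify({ method, status: String(status) });
  requestCounts.set(key, (requestCounts.get(key) || 0) + 1);
}

// Render the counters in the Prometheus text exposition format.
// Label values are assumed to be simple tokens, so no escaping is done here.
function renderMetrics() {
  const lines = [
    '# HELP http_requests_total Total HTTP requests handled.',
    '# TYPE http_requests_total counter',
  ];
  for (const [key, value] of requestCounts) {
    const labels = Object.entries(JSON.parse(key))
      .map(([name, labelValue]) => `${name}="${labelValue}"`)
      .join(',');
    lines.push(`http_requests_total{${labels}} ${value}`);
  }
  return lines.join('\n') + '\n';
}
```

In an Express app this could be wired up with something like app.get('/metrics', (req, res) => res.type('text/plain').send(renderMetrics())), with countRequest called once per completed response.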

    Add Node Exporter and cAdvisor

    Expand your compose file for host and container metrics

    Expanded docker-compose.yml
    version: '3.8'
    
    services:
      prometheus:
        image: prom/prometheus:latest
        container_name: prometheus
        volumes:
          - prometheus_data:/prometheus
          - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
        command:
          - '--config.file=/etc/prometheus/prometheus.yml'
          - '--storage.tsdb.path=/prometheus'
          - '--storage.tsdb.retention.time=30d'
          - '--web.enable-lifecycle'
        ports:
          - "9090:9090"
        restart: unless-stopped
    
      node-exporter:
        image: prom/node-exporter:latest
        container_name: node-exporter
        volumes:
          - /proc:/host/proc:ro
          - /sys:/host/sys:ro
          - /:/rootfs:ro
        command:
          - '--path.procfs=/host/proc'
          - '--path.sysfs=/host/sys'
          - '--path.rootfs=/rootfs'
          # "$$" is a literal "$" in Compose; a single "$" is treated as variable interpolation
          - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
        ports:
          - "9100:9100"
        restart: unless-stopped
    
      cadvisor:
        image: gcr.io/cadvisor/cadvisor:latest
        container_name: cadvisor
        volumes:
          - /:/rootfs:ro
          - /var/run:/var/run:ro
          - /sys:/sys:ro
          - /var/lib/docker/:/var/lib/docker:ro
        ports:
          - "8080:8080"
        restart: unless-stopped
    
    volumes:
      prometheus_data:

    Verify Prometheus

    Visit http://your-server-ip:9090 and check:

    • Status → Targets: every target should show "UP"
    • Graph: run the query up; each monitored target returns 1 when its last scrape succeeded

    Deploy Grafana

    Grafana turns your metrics into dashboards.

    Add to Compose

    Add Grafana service
      grafana:
        image: grafana/grafana:latest
        container_name: grafana
        volumes:
          - grafana_data:/var/lib/grafana
        environment:
          - GF_SECURITY_ADMIN_USER=admin
          - GF_SECURITY_ADMIN_PASSWORD=your-secure-password
          - GF_USERS_ALLOW_SIGN_UP=false
        ports:
          - "3001:3000"
        restart: unless-stopped
    
    volumes:
      prometheus_data:
      grafana_data:

    Initial Setup

    1. Visit http://your-server-ip:3001
    2. Login with admin / your-secure-password
    3. Go to Connections → Data sources → Add data source
    4. Select Prometheus
    5. Set URL: http://prometheus:9090
    6. Click Save & test

    Import Dashboards

    Don't build from scratch. Import community dashboards.

    1. Go to Dashboards → Import
    2. Enter dashboard ID and click Load
    3. Select your Prometheus data source
    4. Click Import

    Dashboard            ID      Purpose
    Node Exporter Full   1860    Host system metrics
    Docker Container     893     Container overview
    cAdvisor             14282   Detailed container metrics
    Traefik              4475    Reverse proxy metrics

    Custom Application Dashboard

    PromQL queries for your own apps

    Request Rate
    rate(http_requests_total{job="myapp"}[5m])
    Error Rate (5xx responses per second)
    rate(http_requests_total{job="myapp",status=~"5.."}[5m])
    Response Time (95th percentile, aggregated across all series)
    histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket{job="myapp"}[5m])))
    Memory Usage
    container_memory_usage_bytes{name="myapp"}
    CPU Usage
    rate(container_cpu_usage_seconds_total{name="myapp"}[5m])
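
To make the 95th-percentile query less of a black box, here is a simplified sketch of the kind of estimate histogram_quantile derives from cumulative bucket counts. It finds the bucket that contains the target rank and interpolates linearly inside it. The function name histogramQuantile and the bucket values are illustrative only. The sketch assumes 0 < q <= 1, buckets sorted by upper bound, at least one finite bucket, and a lowest bucket that starts at 0.

```javascript
// Estimate a quantile from cumulative histogram buckets.
// `buckets` is [{ le: upperBound, count: cumulativeCount }, ...],
// sorted by `le` and ending with le = Infinity.
function histogramQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;

  // The observation rank we are looking for, e.g. the 95th of 100.
  const rank = q * total;
  const i = buckets.findIndex((bucket) => bucket.count >= rank);

  // Rank falls into the +Inf bucket: report the highest finite bound.
  if (buckets[i].le === Infinity) return buckets[buckets.length - 2].le;

  const lowerBound = i === 0 ? 0 : buckets[i - 1].le;
  const countBelow = i === 0 ? 0 : buckets[i - 1].count;
  const countInBucket = buckets[i].count - countBelow;

  // Linear interpolation between the bucket's lower and upper bound.
  return lowerBound + (buckets[i].le - lowerBound) * ((rank - countBelow) / countInBucket);
}

// Made-up request durations in seconds: 100 observations in total.
const durationBuckets = [
  { le: 0.1, count: 50 },
  { le: 0.5, count: 90 },
  { le: 1, count: 98 },
  { le: Infinity, count: 100 },
];
console.log(histogramQuantile(0.95, durationBuckets));
```

Because the result is interpolated, its precision depends on where the bucket boundaries sit, so it helps to place boundaries close to the latencies you care about.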

    Deploy Loki for Logs

    Metrics tell you something's wrong. Logs tell you why.

    Add Loki and Promtail

    Add to docker-compose.yml
      loki:
        image: grafana/loki:latest
        container_name: loki
        volumes:
          - loki_data:/loki
        ports:
          - "3100:3100"
        command: -config.file=/etc/loki/local-config.yaml
        restart: unless-stopped
    
      promtail:
        image: grafana/promtail:latest
        container_name: promtail
        volumes:
          - /var/log:/var/log:ro
          - /var/lib/docker/containers:/var/lib/docker/containers:ro
          - ./promtail.yml:/etc/promtail/config.yml:ro
        command: -config.file=/etc/promtail/config.yml
        restart: unless-stopped
    
    volumes:
      prometheus_data:
      grafana_data:
      loki_data:

    Promtail Configuration

    promtail.yml
    server:
      http_listen_port: 9080
      grpc_listen_port: 0
    
    positions:
      filename: /tmp/positions.yaml
    
    clients:
      - url: http://loki:3100/loki/api/v1/push
    
    scrape_configs:
      # Docker container logs
      - job_name: containers
        static_configs:
          - targets:
              - localhost
            labels:
              job: containerlogs
              __path__: /var/lib/docker/containers/*/*log
    
        pipeline_stages:
          - json:
              expressions:
                output: log
                stream: stream
                attrs:
          - json:
              expressions:
                tag:
              source: attrs
          - regex:
              expression: (?P<container_name>(?:[a-zA-Z0-9][a-zA-Z0-9_.-]+))
              source: tag
          - labels:
              stream:
              container_name:
          - output:
              source: output
    
      # System logs
      - job_name: system
        static_configs:
          - targets:
              - localhost
            labels:
              job: syslog
              __path__: /var/log/syslog

    Note: the container_name label is extracted from the Docker log tag, which only appears in the log file's attrs field when a container runs with a tag log option (for example, tag: "{{.Name}}" under logging → options in its compose definition). Without that option, only the stream label is set on container logs.

    Connect Loki to Grafana

    1. In Grafana, go to Connections → Data sources → Add data source
    2. Select Loki
    3. Set URL: http://loki:3100
    4. Click Save & test

    Query Logs

    In Grafana, go to Explore and select Loki as the data source

    All logs from a container
    {container_name="myapp"}
    Error logs only
    {container_name="myapp"} |= "error"
    JSON log parsing
    {container_name="myapp"} | json | level="error"
    Count errors over time
    count_over_time({container_name="myapp"} |= "error" [5m])
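
The | json | level="error" query only works if the application writes one JSON object per log line. A minimal sketch of such a logger follows. The names formatLog and log and the msg and ts fields are assumptions for illustration, while level matches the field used in the query above.

```javascript
// Format one log entry as a single JSON line so LogQL's `| json` stage
// can extract `level`, `msg`, and any extra fields as labels.
function formatLog(level, msg, fields = {}) {
  // Spread `fields` first so callers cannot overwrite level, msg, or ts.
  return JSON.stringify({ ...fields, level, msg, ts: new Date().toISOString() });
}

// Write to stdout; Docker captures stdout into the container's log file,
// which Promtail then ships to Loki.
function log(level, msg, fields) {
  process.stdout.write(formatLog(level, msg, fields) + '\n');
}

log('error', 'payment failed', { orderId: 'A-1001' });
```

Keeping every entry on one line matters because Promtail treats each line as a separate log entry by default.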

    Deploy Uptime Kuma

    External uptime monitoring catches issues that internal monitoring misses.

    Add to Compose

    Add Uptime Kuma service
      uptime-kuma:
        image: louislam/uptime-kuma:latest
        container_name: uptime-kuma
        volumes:
          - uptime_kuma_data:/app/data
        ports:
          - "3002:3001"
        restart: unless-stopped
    
    volumes:
      uptime_kuma_data:

    Initial Setup

    1. Visit http://your-server-ip:3002
    2. Create your admin account
    3. Click Add New Monitor

    Monitor Types

    HTTP(s) — Web endpoints

    • URL: https://app.yourdomain.com/health
    • Interval: 60 seconds
    • Retries: 3

    TCP — Database connectivity

    • Hostname: main-db
    • Port: 5432

    Docker Container

    • Container name: myapp
    • Checks if container is running

    DNS

    • Hostname: app.yourdomain.com
    • Verifies DNS resolution

    Status Page

    Uptime Kuma can generate a public status page:

    1. Go to Status Pages
    2. Click New Status Page
    3. Add your monitors
    4. Set the slug (e.g., status)
    5. Access at http://your-server-ip:3002/status/status

    Point a subdomain like status.yourdomain.com at it for a professional look.

    Health Checks

    Health checks tell Dokploy (and your monitoring) whether your app is actually working.

    Node.js / Express Health Endpoint

    Basic health check
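    const express = require('express');
    const app = express();

    // Liveness check: responds as long as the process can serve HTTP
    app.get('/health', (req, res) => {
      res.json({
        status: 'ok',
        timestamp: new Date().toISOString()
      });
    });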
    With dependency checks
    // Assumes db and redis are client instances initialized elsewhere in the app
    app.get('/health', async (req, res) => {
      try {
        // Check database
        await db.query('SELECT 1');
        
        // Check Redis
        await redis.ping();
        
        res.json({ 
          status: 'ok',
          database: 'connected',
          cache: 'connected'
        });
      } catch (error) {
        res.status(503).json({ 
          status: 'error',
          message: error.message 
        });
      }
    });
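
A dependency that hangs instead of failing can keep a health request open longer than the caller is willing to wait. One possible refinement, sketched below, is to run the checks in parallel and give each one its own timeout. The helper names withTimeout and runHealthChecks and the 2000 ms default are assumptions, not part of Express or Dokploy.

```javascript
// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms, name) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${name} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run every named check in parallel and summarize the outcome.
// `checks` maps a dependency name to a function that returns a promise.
async function runHealthChecks(checks, timeoutMs = 2000) {
  const names = Object.keys(checks);
  const results = await Promise.allSettled(
    names.map((name) => withTimeout(Promise.resolve().then(checks[name]), timeoutMs, name))
  );

  const report = {};
  results.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      report[names[i]] = 'connected';
    } else {
      const reason = result.reason;
      report[names[i]] = reason instanceof Error ? reason.message : String(reason);
    }
  });

  const healthy = results.every((result) => result.status === 'fulfilled');
  return {
    httpStatus: healthy ? 200 : 503,
    body: { status: healthy ? 'ok' : 'error', ...report },
  };
}
```

Inside the route handler this might be used as: const { httpStatus, body } = await runHealthChecks({ database: () => db.query('SELECT 1'), cache: () => redis.ping() }); followed by res.status(httpStatus).json(body).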

    Dockerfile Health Check

    Add to your Dockerfile
    HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
      CMD curl -f http://localhost:3000/health || exit 1

    This check assumes curl is installed in the image; minimal base images, such as Alpine-based ones, often do not include it.

    Parameters:

    • --interval=30s — Check every 30 seconds
    • --timeout=3s — Fail if no response in 3 seconds
    • --start-period=10s — Grace period for app startup
    • --retries=3 — Mark unhealthy after 3 failures

    Dokploy Health Check

    In your application settings:

    1. Go to Advanced tab
    2. Set Health Check Path: /health
    3. Set Health Check Interval: 30

    Dokploy uses this to:

    • Determine when new deployments are ready
    • Restart containers that become unhealthy
    • Route traffic only to healthy instances

    Alerting

    Dashboards are useless at 3 AM. Set up alerts.

    Grafana Alerts

    Go to Alerting → Alert rules → New alert rule

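    High Error Rate Alert (when 5xx responses exceed 5% of all requests; both sides are summed so their label sets match)
    sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
    High Memory Usage (when usage exceeds 90% of the container's memory limit; requires a limit to be set)
    container_memory_usage_bytes{name="myapp"} / container_spec_memory_limit_bytes{name="myapp"} > 0.9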
    Container Down
    up{job="myapp"} == 0
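
An error-rate alert compares the summed rate of 5xx responses with the summed rate of all responses. The arithmetic can be sketched outside Prometheus as below. The simpleRate helper is a deliberate simplification of rate() (no extrapolation, basic counter-reset handling), and the sample values are made up.

```javascript
// Per-second increase of a counter between two samples.
// A drop in value is treated as a counter reset to zero.
function simpleRate(first, last, seconds) {
  const increase = last >= first ? last - first : last;
  return increase / seconds;
}

// Error ratio across all series: summed 5xx rate divided by summed total rate.
// `series` is [{ status, first, last }], all covering the same time window.
function errorRatio(series, seconds) {
  let errors = 0;
  let total = 0;
  for (const s of series) {
    const perSecond = simpleRate(s.first, s.last, seconds);
    total += perSecond;
    if (/^5\d\d$/.test(s.status)) errors += perSecond;
  }
  return total === 0 ? 0 : errors / total;
}

// Counter values at the start and end of a 5-minute (300 s) window.
const samples = [
  { status: '200', first: 1000, last: 1880 },
  { status: '404', first: 50, last: 70 },
  { status: '500', first: 10, last: 110 },
];
const ratio = errorRatio(samples, 300);
console.log(ratio > 0.05 ? 'alert would fire' : 'within threshold');
```

Here 100 of 1,000 requests in the window failed, so the ratio is about 0.1 and the 5% threshold is exceeded.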

    Notification Channels

    Configure where alerts go in Alerting → Contact points

    Slack

    Add new contact point → Slack → Enter your Slack webhook URL

    Discord

    Add new contact point → Discord → Enter your Discord webhook URL

    Email

    Configure SMTP in Grafana settings first, then add contact point → Email

    PagerDuty / Opsgenie

    For on-call rotations, add a PagerDuty or Opsgenie contact point using the integration or API key that service issues

    Uptime Kuma Alerts

    Uptime Kuma has built-in notifications:

    1. Go to Settings → Notifications
    2. Click Setup Notification
    3. Choose provider (Slack, Discord, Telegram, Email, etc.)
    4. Test the notification
    5. Assign to your monitors

    Resource Limits

    Prevent runaway containers from taking down your server.

    Docker Resource Limits

    In your compose file
    services:
      myapp:
        image: myapp:latest
        deploy:
          resources:
            limits:
              cpus: '1.0'
              memory: 512M
            reservations:
              cpus: '0.25'
              memory: 128M

    Recommended Limits

    App Type              Memory      CPU
    Static site (Nginx)   64-128M     0.25
    Node.js API           256-512M    0.5-1.0
    Laravel/Django        256-512M    0.5-1.0
    Next.js               512M-1G     0.5-1.0
    Background workers    256-512M    0.5

    Start conservative and increase based on actual usage in Grafana.
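
When comparing a limit such as 512M against container_memory_usage_bytes readings from Grafana, it helps to convert both to the same unit. The sketch below does that conversion. parseMemory and memoryUtilization are hypothetical helpers, and the parser covers only the b, k, m, and g suffixes.

```javascript
// Convert a Compose-style memory string ("512M", "1G") to bytes.
// Docker treats these suffixes as binary multiples (1M = 1024 * 1024 bytes).
function parseMemory(value) {
  const match = /^(\d+(?:\.\d+)?)\s*([bkmg])b?$/i.exec(value.trim());
  if (!match) throw new Error(`Unrecognized memory value: ${value}`);
  const exponent = { b: 0, k: 1, m: 2, g: 3 }[match[2].toLowerCase()];
  return Math.round(Number(match[1]) * 1024 ** exponent);
}

// Fraction of the limit currently in use (0.9 corresponds to 90%).
function memoryUtilization(usageBytes, limit) {
  return usageBytes / parseMemory(limit);
}

console.log(parseMemory('512M'));
console.log(memoryUtilization(268435456, '512M'));
```

A container that regularly sits near its limit is a candidate for a higher one, while a container that never gets close may be over-provisioned.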

    Auto-Restart on OOM

    Docker automatically restarts containers killed for exceeding memory limits if you have:

    restart: unless-stopped

    Or in Dokploy, enable Auto Restart in application settings.

    Complete Monitoring Stack

    Here's the full docker-compose.yml:

    Full monitoring docker-compose.yml
    version: '3.8'
    
    services:
      prometheus:
        image: prom/prometheus:latest
        container_name: prometheus
        volumes:
          - prometheus_data:/prometheus
          - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
        command:
          - '--config.file=/etc/prometheus/prometheus.yml'
          - '--storage.tsdb.path=/prometheus'
          - '--storage.tsdb.retention.time=30d'
          - '--web.enable-lifecycle'
        ports:
          - "9090:9090"
        restart: unless-stopped
    
      grafana:
        image: grafana/grafana:latest
        container_name: grafana
        volumes:
          - grafana_data:/var/lib/grafana
        environment:
          - GF_SECURITY_ADMIN_USER=admin
          - GF_SECURITY_ADMIN_PASSWORD=your-secure-password
          - GF_USERS_ALLOW_SIGN_UP=false
        ports:
          - "3001:3000"
        restart: unless-stopped
    
      loki:
        image: grafana/loki:latest
        container_name: loki
        volumes:
          - loki_data:/loki
        ports:
          - "3100:3100"
        command: -config.file=/etc/loki/local-config.yaml
        restart: unless-stopped
    
      promtail:
        image: grafana/promtail:latest
        container_name: promtail
        volumes:
          - /var/log:/var/log:ro
          - /var/lib/docker/containers:/var/lib/docker/containers:ro
          - ./promtail.yml:/etc/promtail/config.yml:ro
        command: -config.file=/etc/promtail/config.yml
        restart: unless-stopped
    
      node-exporter:
        image: prom/node-exporter:latest
        container_name: node-exporter
        volumes:
          - /proc:/host/proc:ro
          - /sys:/host/sys:ro
          - /:/rootfs:ro
        command:
          - '--path.procfs=/host/proc'
          - '--path.sysfs=/host/sys'
          - '--path.rootfs=/rootfs'
          # "$$" is a literal "$" in Compose; a single "$" is treated as variable interpolation
          - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
        ports:
          - "9100:9100"
        restart: unless-stopped
    
      cadvisor:
        image: gcr.io/cadvisor/cadvisor:latest
        container_name: cadvisor
        volumes:
          - /:/rootfs:ro
          - /var/run:/var/run:ro
          - /sys:/sys:ro
          - /var/lib/docker/:/var/lib/docker:ro
        ports:
          - "8080:8080"
        restart: unless-stopped
    
      uptime-kuma:
        image: louislam/uptime-kuma:latest
        container_name: uptime-kuma
        volumes:
          - uptime_kuma_data:/app/data
        ports:
          - "3002:3001"
        restart: unless-stopped
    
    volumes:
      prometheus_data:
      grafana_data:
      loki_data:
      uptime_kuma_data:

    Quick Reference

    Service URLs

    Service         URL             Default Credentials
    Prometheus      :9090           None
    Grafana         :3001           admin / (your password)
    Uptime Kuma     :3002           (set on first visit)
    cAdvisor        :8080           None
    Node Exporter   :9100/metrics   None
    Loki            :3100           None (API only)

    Useful PromQL Queries

    Common monitoring queries
    # CPU usage by container
    rate(container_cpu_usage_seconds_total[5m])
    
    # Memory usage by container
    container_memory_usage_bytes
    
    # Disk usage (fraction of each filesystem in use)
    1 - node_filesystem_avail_bytes / node_filesystem_size_bytes
    
    # Network traffic
    rate(node_network_receive_bytes_total[5m])
    
    # HTTP request rate
    rate(http_requests_total[5m])
    
    # Container restarts (changes in start time over the last hour)
    changes(container_start_time_seconds[1h])

    What's Next

    You can see what's happening. Part 6 locks it down:

    Part 6 — Final

    Production Hardening

    SSO with Authentik, Cloudflare Tunnel for zero-trust access, secrets management with Infisical, and a disaster recovery playbook.
