Monitoring Remote Servers with Grafana

Monitoring multiple remote servers from a central location is essential for maintaining system health and preventing downtime. This guide will show you how to set up Grafana, Prometheus, and Node Exporter to monitor all your servers from one dashboard.

What You’ll Build

A monitoring system where:

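A central monitoring server runs Prometheus (collects and stores metrics, evaluates alert rules) and Grafana (dashboards)
Remote servers run Node Exporter, which exposes system metrics on port 9100
A single dashboard shows metrics from all servers
Alert rules flag problems such as a server going down or a disk filling up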

Architecture

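Central Server                    Remote Servers
┌─────────────────┐              ┌─────────────────┐
│ Grafana :3000   │    scrape    │ Server 1 :9100  │
│ Prometheus :9090├──────┬──────►│ node_exporter   │
└─────────────────┘      │       └─────────────────┘
                         │       ┌─────────────────┐
                         └──────►│ Server 2 :9100  │
                                 │ node_exporter   │
                                 └─────────────────┘

Prometheus pulls (scrapes) metrics from Node Exporter on each remote server over port 9100 and stores them. Grafana then queries Prometheus to draw the dashboards.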

Prerequisites

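Central monitoring server (2GB RAM minimum)
Ubuntu or Debian on all servers (the commands below use apt, ufw, and systemd)
SSH access with sudo privileges on all remote servers
Basic Linux command line knowledge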

Set Up Remote Servers

SSH into each remote server and run these commands:

Install Node Exporter

# Create user
sudo useradd --no-create-home --shell /bin/false node_exporter

# Download and install
cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
tar xvf node_exporter-1.6.1.linux-amd64.tar.gz
sudo cp node_exporter-1.6.1.linux-amd64/node_exporter /usr/local/bin/
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
rm -rf node_exporter-1.6.1.linux-amd64*
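The commands above pin version 1.6.1 and skip checksum verification. If you want to change the version in one place and check the download, here is a minimal sketch. It assumes the release page publishes a sha256sums.txt file next to the tarballs (Prometheus project releases normally do), and that GNU sha256sum supports --ignore-missing. The function names are hypothetical helpers, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch: download a Prometheus-project release tarball and verify its checksum.
# Assumption: each release ships a sha256sums.txt next to the tarballs.

# release_url PROJECT VERSION [ARCH]: print the GitHub download URL (ARCH defaults to amd64).
release_url() {
  local project="$1" version="$2" arch="${3:-amd64}"
  echo "https://github.com/prometheus/${project}/releases/download/v${version}/${project}-${version}.linux-${arch}.tar.gz"
}

# download_verified PROJECT VERSION [ARCH]: fetch the tarball and sha256sums.txt
# into the current directory, then check the tarball. Non-zero exit on any failure.
download_verified() {
  local project="$1" version="$2" arch="${3:-amd64}"
  local base="https://github.com/prometheus/${project}/releases/download/v${version}"
  wget -q "$(release_url "$project" "$version" "$arch")" || return 1
  wget -q -O sha256sums.txt "${base}/sha256sums.txt" || return 1
  # --ignore-missing skips checksum lines for files we did not download.
  sha256sum --ignore-missing -c sha256sums.txt
}
```

For example, `cd /tmp && download_verified node_exporter 1.6.1 && tar xvf node_exporter-1.6.1.linux-amd64.tar.gz` would replace the wget and tar lines above.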

Create Service File

sudo nano /etc/systemd/system/node_exporter.service

Add this content:

[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
Restart=on-failure
ExecStart=/usr/local/bin/node_exporter --web.listen-address=0.0.0.0:9100

[Install]
WantedBy=multi-user.target

Start the Service

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter

# Verify it's working
curl -s http://localhost:9100/metrics | head -10
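To script that check (for example, across several servers), you can use a small helper that tests whether a given metric appears in the exporter output. This is a sketch, and the name has_metric is hypothetical. It matches sample lines in the Prometheus text format, so # HELP and # TYPE comment lines do not count.

```shell
#!/usr/bin/env bash
# has_metric NAME: succeed if stdin (Prometheus text-format metrics) contains at
# least one sample line for NAME. Comment lines start with '#', so the anchored
# pattern never matches them.
has_metric() {
  # A sample line is the metric name followed by '{' (labels) or a space (value).
  grep -Eq "^$1[{ ]"
}
```

Usage: `curl -s http://localhost:9100/metrics | has_metric node_exporter_build_info && echo "Node Exporter OK"`.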

Configure Firewall

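Allow access from your monitoring server only. If you are connected over SSH, allow SSH first so that enabling the firewall does not lock you out:

# Keep SSH reachable (uses ufw's OpenSSH application profile)
sudo ufw allow OpenSSH

# Replace MONITORING_SERVER_IP with your actual monitoring server IP
sudo ufw allow from MONITORING_SERVER_IP to any port 9100 proto tcp
sudo ufw enable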

Repeat these steps on all remote servers you want to monitor.
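With more than a few servers, you can script the repetition. The sketch below makes several assumptions: key-based SSH, passwordless sudo on every host, and that you have saved the Node Exporter steps above as a non-interactive script. The script name setup_node_exporter.sh is hypothetical, and the unit file would have to be written with tee or a heredoc instead of nano. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# deploy_to_hosts SCRIPT HOST...: run a local setup script on each host over SSH.
# Assumes key-based SSH and passwordless sudo on every host.
# With DRY_RUN=1, only prints the command for each host.
deploy_to_hosts() {
  local script="$1" host
  shift
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "ssh $host 'sudo bash -s' < $script"
    elif ssh "$host" 'sudo bash -s' < "$script"; then
      echo "done: $host"
    else
      echo "FAILED: $host" >&2
    fi
  done
}
```

For example: `DRY_RUN=1 deploy_to_hosts setup_node_exporter.sh admin@10.0.1.10 admin@10.0.1.11`. Drop DRY_RUN once the output looks right.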

Set Up Central Monitoring Server

Install Prometheus

# Create user
sudo useradd --no-create-home --shell /bin/false prometheus

# Create directories
sudo mkdir /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus /var/lib/prometheus

# Download Prometheus
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.47.0/prometheus-2.47.0.linux-amd64.tar.gz
tar xvf prometheus-2.47.0.linux-amd64.tar.gz

# Install
sudo cp prometheus-2.47.0.linux-amd64/prometheus /usr/local/bin/
sudo cp prometheus-2.47.0.linux-amd64/promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool

# Copy console files
sudo cp -r prometheus-2.47.0.linux-amd64/consoles /etc/prometheus
sudo cp -r prometheus-2.47.0.linux-amd64/console_libraries /etc/prometheus
sudo chown -R prometheus:prometheus /etc/prometheus/

# Clean up
rm -rf prometheus-2.47.0.linux-amd64*
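Every download in this guide uses the linux-amd64 build. On ARM servers you need a different tarball. This sketch maps `uname -m` to the architecture suffix used in the release file names. The function name is hypothetical, and the mapping covers only the common cases.

```shell
#!/usr/bin/env bash
# release_arch [MACHINE]: map `uname -m` output to the release file suffix.
# Defaults to the current machine when no argument is given.
release_arch() {
  local machine="${1:-$(uname -m)}"
  case "$machine" in
    x86_64)        echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    armv7l)        echo armv7 ;;
    *)             echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
}
```

For example, set `ARCH="$(release_arch)"` and use `prometheus-2.47.0.linux-${ARCH}` in the wget, tar, and cp commands.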

Configure Prometheus

sudo nano /etc/prometheus/prometheus.yml

Add this configuration (replace IP addresses with your actual server IPs):

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'servers'
    static_configs:
      - targets: 
          - '10.0.1.10:9100'  # Server 1
          - '10.0.1.11:9100'  # Server 2
          - '10.0.1.12:9100'  # Server 3
        labels:
          group: 'production'
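Typing long target lists by hand is error-prone. This sketch prints a job in the same layout as the configuration above from a list of host:port targets. The helper name gen_job is hypothetical. Paste its output under scrape_configs:, then validate the file with promtool, which was installed alongside Prometheus.

```shell
#!/usr/bin/env bash
# gen_job JOB GROUP TARGET...: print a scrape_configs entry with one line per
# host:port target and a 'group' label, matching the prometheus.yml layout above.
gen_job() {
  local job="$1" group="$2" target
  shift 2
  printf "  - job_name: '%s'\n" "$job"
  printf '    static_configs:\n'
  printf '      - targets:\n'
  for target in "$@"; do
    printf "          - '%s'\n" "$target"
  done
  printf '        labels:\n'
  printf "          group: '%s'\n" "$group"
}
```

Usage: `gen_job servers production 10.0.1.10:9100 10.0.1.11:9100 10.0.1.12:9100`. After editing the file, run `promtool check config /etc/prometheus/prometheus.yml`.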

Create Prometheus Service

sudo nano /etc/systemd/system/prometheus.service

Add this content:

[Unit]
Description=Prometheus
After=network.target

[Service]
User=prometheus
Group=prometheus
Type=simple
Restart=on-failure
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.listen-address=0.0.0.0:9090

[Install]
WantedBy=multi-user.target

Start Prometheus

sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus
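`systemctl status` can report "active" before Prometheus is actually serving requests. Prometheus exposes a /-/ready endpoint that returns HTTP 200 once it is ready. This sketch polls any such URL. The function name and the 30-try default are illustrative choices.

```shell
#!/usr/bin/env bash
# wait_for_ready URL [TRIES]: poll URL once per second until it answers with a
# 2xx status, giving up after TRIES attempts (default 30). Requires curl.
wait_for_ready() {
  local url="$1" tries="${2:-30}" i=1
  while [ "$i" -le "$tries" ]; do
    # -f makes curl treat HTTP error statuses as failures.
    if curl -fs --max-time 2 -o /dev/null "$url"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$tries" ]; then sleep 1; fi
    i=$((i + 1))
  done
  return 1
}
```

Usage: `wait_for_ready http://localhost:9090/-/ready && echo "Prometheus is ready"`.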

Install Grafana

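# Add the Grafana APT repository (apt-key is deprecated, so the key goes into a keyring file)
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list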

# Install Grafana
sudo apt update
sudo apt install grafana -y

# Start Grafana
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
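To confirm Grafana came up, you can query its /api/health endpoint. It returns a small JSON document with a "database" field that reads "ok" when healthy. This sketch checks that field with plain grep so it does not depend on jq. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# grafana_db_ok: succeed if the Grafana /api/health JSON on stdin reports
# "database": "ok" (whitespace around the colon is allowed).
grafana_db_ok() {
  grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}
```

Usage: `curl -s http://localhost:3000/api/health | grafana_db_ok && echo "Grafana is up"`.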

Configure Firewall

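# Keep SSH reachable before turning the firewall on
sudo ufw allow OpenSSH

# Allow Grafana access
sudo ufw allow 3000/tcp

# Optional: allow Prometheus access from your own IP only (Prometheus has no authentication enabled by default)
sudo ufw allow from YOUR_ADMIN_IP to any port 9090 proto tcp

sudo ufw enable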

Set Up Dashboards

Access Grafana

Open http://your-monitoring-server-ip:3000
Log in with username: admin, password: admin
Change the password when prompted

Add Prometheus Data Source

Click the gear icon (⚙️) → Data Sources (in Grafana 10 and later: menu → Connections → Data sources)
Click “Add data source”
Select “Prometheus”
Set URL to http://localhost:9090
Click “Save & Test”
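As an alternative to clicking through the UI, Grafana can load data sources from YAML files in /etc/grafana/provisioning/datasources/ at startup. This sketch prints such a file for the local Prometheus. The helper name and the target file name prometheus.yml are hypothetical. Use either this or the UI steps, not both, so you do not end up with two Prometheus data sources.

```shell
#!/usr/bin/env bash
# datasource_yaml URL: print a Grafana provisioning file that registers a
# Prometheus data source at URL and makes it the default.
datasource_yaml() {
  cat <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $1
    isDefault: true
EOF
}
```

Usage: `datasource_yaml http://localhost:9090 | sudo tee /etc/grafana/provisioning/datasources/prometheus.yml > /dev/null && sudo systemctl restart grafana-server`.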

Import Dashboard

Click the “+” icon → Import (in Grafana 10 and later: Dashboards → New → Import)
Enter dashboard ID: 1860
Click “Load”
Select your Prometheus data source
Click “Import”

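The imported dashboard (Node Exporter Full) now shows metrics from your servers. Use the host selector at the top of the dashboard to switch between them.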

Basic Alerting

Create an alert rules file:

sudo nano /etc/prometheus/alert_rules.yml

Add these basic alerts:

groups:
  - name: basic_alerts
    rules:
      - alert: ServerDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Server {{ $labels.instance }} is down"

      - alert: HighCPUUsage
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory on {{ $labels.instance }}"

      - alert: LowDiskSpace
        expr: 100 - ((node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs"} / node_filesystem_size_bytes{mountpoint="/",fstype!="rootfs"}) * 100) > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"

Update Prometheus config to include alerts:

sudo nano /etc/prometheus/prometheus.yml

Add this block at the top level of the file (at the same indentation as global:, not nested inside it):

rule_files:
  - "alert_rules.yml"

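Restart Prometheus:

sudo systemctl restart prometheus

Firing alerts are listed at http://your-monitoring-server-ip:9090/alerts. Prometheus only evaluates the rules. To receive email, Slack, or other notifications, you also need to run Alertmanager and point Prometheus at it through the alerting section of prometheus.yml.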
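A typo in alert_rules.yml or prometheus.yml stops Prometheus from starting. `promtool check config` validates the main file and the rule files it references, so it is worth running before every restart. A minimal sketch that restarts only when the check passes (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# validate_and_restart [CONFIG]: run promtool on the Prometheus config (which also
# checks the referenced rule files) and restart Prometheus only if it passes.
validate_and_restart() {
  local config="${1:-/etc/prometheus/prometheus.yml}"
  if ! promtool check config "$config"; then
    echo "promtool reported errors; not restarting Prometheus" >&2
    return 1
  fi
  sudo systemctl restart prometheus
}
```

Usage: `validate_and_restart` with no argument checks /etc/prometheus/prometheus.yml.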

Key Dashboards to Create

Server Overview Panel

Create a table showing all servers:

Panel Type: Table
Query: up
Shows which servers are online (1) or offline (0)

CPU Usage Panel

Panel Type: Time series
Query: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Shows CPU usage for each server
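The CPU query converts idle time into a busy percentage. `rate(node_cpu_seconds_total{mode="idle"}[5m])` gives, for each core, the fraction of each second spent idle over the last five minutes. `avg by (instance)` then averages those fractions over the N cores of one server. Here is the arithmetic with made-up idle rates for a four-core server:

```latex
\text{CPU busy \%} = 100 - 100 \cdot \frac{1}{N} \sum_{c=1}^{N} r_c ,
\qquad r_c = \text{idle seconds per second on core } c

N = 4,\quad (r_1, r_2, r_3, r_4) = (0.9,\ 0.8,\ 0.7,\ 0.6):\qquad
\frac{0.9 + 0.8 + 0.7 + 0.6}{4} = \frac{3.0}{4} = 0.75
\;\Rightarrow\; 100 - 100 \cdot 0.75 = 25
```

The HighCPUUsage alert uses the same expression and fires when the value stays above 80 for five minutes.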

Memory Usage Panel

Panel Type: Time series
Query: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
Shows memory usage percentage

Disk Usage Panel

Panel Type: Gauge
Query: 100 - ((node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs"} / node_filesystem_size_bytes{mountpoint="/",fstype!="rootfs"}) * 100)
Shows disk usage percentage

Troubleshooting

Servers Not Showing in Prometheus

Check if Node Exporter is running:
```
sudo systemctl status node_exporter
```

Test network connectivity from the monitoring server:
```
curl http://REMOTE_SERVER_IP:9100/metrics
```

Check firewall rules:
```
sudo ufw status
```
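To check many servers at once, you can run a short loop from the monitoring server. This sketch reports OK or FAIL per host. A FAIL for a server whose Node Exporter is running locally usually points at the firewall or network. The function name and the 3-second timeout are illustrative choices.

```shell
#!/usr/bin/env bash
# check_targets PORT HOST...: report whether each host's /metrics endpoint on
# PORT answers with a 2xx status within 3 seconds.
check_targets() {
  local port="$1" host
  shift
  for host in "$@"; do
    if curl -fs --max-time 3 -o /dev/null "http://${host}:${port}/metrics"; then
      echo "OK   ${host}:${port}"
    else
      echo "FAIL ${host}:${port}"
    fi
  done
}
```

Usage: `check_targets 9100 10.0.1.10 10.0.1.11 10.0.1.12`.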

Prometheus Not Starting

Check logs:
```
sudo journalctl -u prometheus -f
```

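Common issues:

Config file syntax errors: check with promtool check config /etc/prometheus/prometheus.yml
Permission problems: /etc/prometheus and /var/lib/prometheus should be owned by the prometheus user (see the chown commands above)
Port already in use: see what is listening on 9090 with sudo ss -ltnp | grep 9090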

Grafana Dashboard Empty

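Verify the Prometheus data source with “Save & Test” in Grafana
Check that every target shows as UP on the Prometheus targets page: http://your-ip:9090/targets
Run a panel's query directly in the Prometheus web UI (http://your-ip:9090/graph) to confirm it returns data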

Maintenance

Regular Tasks

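Update software monthly. This upgrades Grafana, which came from the APT repository. Prometheus and Node Exporter were installed from release tarballs, so upgrade them by repeating the download and install steps with a newer version:
```
sudo apt update && sudo apt upgrade
```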
Check disk space on monitoring server:
```
df -h /var/lib/prometheus
```
Review alerts and adjust thresholds as needed

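Back up configurations and the Grafana database (dashboards, users, and data sources are stored in /var/lib/grafana/grafana.db by default):
```
sudo tar czf monitoring-backup-$(date +%Y%m%d).tar.gz \
    /etc/prometheus/ /etc/grafana/ /var/lib/grafana/grafana.db
```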
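Dated backups pile up over time. This sketch deletes archives older than a given number of days and prints each file it removes. The function name and the 30-day example are illustrative. find rounds ages down to whole days, so +30 matches files at least 31 days old.

```shell
#!/usr/bin/env bash
# prune_backups DIR DAYS: delete monitoring-backup-*.tar.gz files directly in DIR
# whose modification time is more than DAYS whole days ago, printing each one.
prune_backups() {
  local dir="$1" days="$2"
  find "$dir" -maxdepth 1 -type f -name 'monitoring-backup-*.tar.gz' \
    -mtime +"$days" -print -delete
}
```

Usage: `prune_backups /path/to/backups 30`, run from the directory where you keep the tar files or from a weekly cron job.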

Performance Tips

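Increase scrape intervals for non-critical servers (scrape_interval can be set per job in prometheus.yml)
Use a shorter retention period to save disk space (Prometheus keeps 15 days by default; set --storage.tsdb.retention.time in the service file, for example 7d)
Monitor the monitoring server itself (install Node Exporter on it and add localhost:9100 to the servers job)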

Security Best Practices

Use HTTPS for Grafana (set up nginx reverse proxy with SSL)
Restrict access to monitoring ports using firewall rules
Change default passwords for Grafana
Regularly update all components
Use strong authentication for Grafana users

Remember to regularly review your dashboards and fine-tune alert thresholds based on your specific infrastructure needs.