Monitoring multiple remote servers from a central location is essential for maintaining system health and preventing downtime. This guide will show you how to set up Grafana, Prometheus, and Node Exporter to monitor all your servers from one dashboard.
What You’ll Build
A monitoring system where:
- Central monitoring server runs Grafana and Prometheus
- Remote servers run Node Exporter to collect metrics
- Single dashboard shows metrics from all servers
- Alert rules flag problems such as a server going down or high CPU usage
Architecture
Central Server                      Remote Servers
┌──────────────────┐                ┌──────────────────┐
│ Grafana    :3000 │◄───────────────┤ Server 1   :9100 │
│ Prometheus :9090 │                └──────────────────┘
└──────────────────┘                ┌──────────────────┐
                                    │ Server 2   :9100 │
                                    └──────────────────┘
Prerequisites
- Central monitoring server (2GB RAM minimum)
- SSH access to all remote servers
- Basic Linux command-line knowledge
- Debian or Ubuntu on amd64 hardware (the commands below use apt, ufw, and linux-amd64 builds)
Set Up Remote Servers
SSH into each remote server and run these commands:
Install Node Exporter
# Create user
sudo useradd --no-create-home --shell /bin/false node_exporter
# Download and install
cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
tar xvf node_exporter-1.6.1.linux-amd64.tar.gz
sudo cp node_exporter-1.6.1.linux-amd64/node_exporter /usr/local/bin/
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
rm -rf node_exporter-1.6.1.linux-amd64*
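The download step does not check the tarball's integrity. A minimal sketch of a checksum check follows; it would run between the wget and tar steps, before the tarball is deleted. It assumes the release page also publishes a sha256sums.txt file (the URL in the comment follows the same pattern as the tarball's and is an assumption), and verify_checksum is a hypothetical helper name.

```shell
# Sketch: compare a downloaded tarball against a published checksum list.
# Assumed checksum URL (same release directory as the tarball):
#   https://github.com/prometheus/node_exporter/releases/download/v1.6.1/sha256sums.txt
# verify_checksum TARBALL SUMS_FILE: succeeds only if the hashes match.
verify_checksum() {
  local tarball="$1" sums_file="$2"
  local name expected actual
  name="$(basename "$tarball")"
  # Lines in the list are assumed to look like "<sha256>  <file name>".
  expected="$(awk -v f="$name" '$2 == f { print $1 }' "$sums_file")"
  actual="$(sha256sum "$tarball" | awk '{ print $1 }')"
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

For example, `verify_checksum node_exporter-1.6.1.linux-amd64.tar.gz sha256sums.txt && echo "checksum OK"` prints the message only when the file matches its listed hash.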
Create Service File
sudo nano /etc/systemd/system/node_exporter.service
Add this content:
[Unit]
Description=Node Exporter
After=network.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter --web.listen-address=0.0.0.0:9100
[Install]
WantedBy=multi-user.target
Start the Service
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
# Verify it's working
curl http://localhost:9100/metrics | head -10
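Printing the first ten lines only shows that something answered. A slightly stricter check, sketched below, looks for a specific metric in the response. has_metric is a hypothetical helper; it assumes the standard Prometheus text format, in which a sample line starts with the metric name followed by either a label block or a space.

```shell
# has_metric NAME: reads metrics text on stdin and succeeds if at least one
# sample line for metric NAME is present (comment lines start with "#").
has_metric() {
  grep -Eq "^$1(\{| )"
}
```

Usage: `curl -s http://localhost:9100/metrics | has_metric node_cpu_seconds_total && echo "CPU metrics present"`.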
Configure Firewall
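Allow access from your monitoring server only. If you are connected over SSH, allow SSH before enabling the firewall, or you may lock yourself out:
# Keep SSH reachable
sudo ufw allow ssh
# Replace MONITORING_SERVER_IP with your actual monitoring server IP
sudo ufw allow from MONITORING_SERVER_IP to any port 9100 proto tcp
sudo ufw enable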
Repeat these steps on all remote servers you want to monitor.
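With more than a few servers, repeating the steps by hand gets tedious. The sketch below assumes you have saved the commands from this section into a script (install_node_exporter.sh is a hypothetical name) and only prints the scp and ssh commands for each host, so you can review them before running anything.

```shell
# print_rollout_commands SCRIPT HOST...
# Prints (does not run) the commands that copy SCRIPT to each host and
# execute it there with sudo.
print_rollout_commands() {
  local script="$1"; shift
  local host
  for host in "$@"; do
    printf 'scp %s %s:/tmp/\n' "$script" "$host"
    # -t allocates a terminal so sudo can prompt for a password.
    printf 'ssh -t %s sudo bash /tmp/%s\n' "$host" "$(basename "$script")"
  done
}
```

For example, `print_rollout_commands install_node_exporter.sh 10.0.1.10 10.0.1.11 10.0.1.12` prints six commands, two per server. The IP addresses are the example values used later in the Prometheus configuration.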
Set Up Central Monitoring Server
Install Prometheus
# Create user
sudo useradd --no-create-home --shell /bin/false prometheus
# Create directories
sudo mkdir /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus /var/lib/prometheus
# Download Prometheus
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.47.0/prometheus-2.47.0.linux-amd64.tar.gz
tar xvf prometheus-2.47.0.linux-amd64.tar.gz
# Install
sudo cp prometheus-2.47.0.linux-amd64/prometheus /usr/local/bin/
sudo cp prometheus-2.47.0.linux-amd64/promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool
# Copy console files
sudo cp -r prometheus-2.47.0.linux-amd64/consoles /etc/prometheus
sudo cp -r prometheus-2.47.0.linux-amd64/console_libraries /etc/prometheus
sudo chown -R prometheus:prometheus /etc/prometheus/
# Clean up
rm -rf prometheus-2.47.0.linux-amd64*
Configure Prometheus
sudo nano /etc/prometheus/prometheus.yml
Add this configuration (replace IP addresses with your actual server IPs):
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'servers'
    static_configs:
      - targets:
          - '10.0.1.10:9100' # Server 1
          - '10.0.1.11:9100' # Server 2
          - '10.0.1.12:9100' # Server 3
        labels:
          group: 'production'
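YAML is sensitive to indentation, so typing the target list by hand is error-prone. As a rough sketch, the same configuration can be generated from a list of server addresses. render_prometheus_config is a hypothetical helper; it assumes every server exposes Node Exporter on port 9100 and belongs to the production group, as in the configuration above.

```shell
# render_prometheus_config HOST...
# Prints a prometheus.yml equivalent to the one above, with one
# Node Exporter target (port 9100) per HOST.
render_prometheus_config() {
  local host
  cat <<'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'servers'
    static_configs:
      - targets:
EOF
  for host in "$@"; do
    printf "          - '%s:9100'\n" "$host"
  done
  cat <<'EOF'
        labels:
          group: 'production'
EOF
}
```

Usage: `render_prometheus_config 10.0.1.10 10.0.1.11 10.0.1.12 | sudo tee /etc/prometheus/prometheus.yml`.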
Create Prometheus Service
sudo nano /etc/systemd/system/prometheus.service
Add this content:
[Unit]
Description=Prometheus
After=network.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.listen-address=0.0.0.0:9090
[Install]
WantedBy=multi-user.target
Start Prometheus
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus
Install Grafana
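# Add Grafana repository (apt-key is deprecated, so the signing key goes into its own keyring)
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list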
# Install Grafana
sudo apt update
sudo apt install grafana -y
# Start Grafana
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
Configure Firewall
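# Keep SSH reachable before enabling the firewall
sudo ufw allow ssh
# Allow Grafana access
sudo ufw allow 3000/tcp
# Optional: allow direct access to the Prometheus UI (it has no authentication by default, so restrict the source IP where possible)
sudo ufw allow 9090/tcp
sudo ufw enable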
Set Up Dashboards
Access Grafana
- Open http://your-monitoring-server-ip:3000
- Log in with username admin and password admin
- Change the password when prompted
Add Prometheus Data Source
- Click the gear icon (⚙️) → Data Sources (in Grafana 10 and later: Connections → Data sources)
- Click “Add data source”
- Select “Prometheus”
- Set URL to http://localhost:9090
- Click “Save & Test”
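The same data source can also be defined in a file instead of through the UI, which helps when you rebuild the monitoring server. The sketch below writes a Grafana provisioning file. write_datasource_file is a hypothetical helper, and the destination shown in the usage note assumes the default provisioning directory of the Debian package.

```shell
# write_datasource_file DEST
# Writes a Grafana provisioning file that registers the local Prometheus
# instance as the default data source.
write_datasource_file() {
  cat > "$1" <<'EOF'
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF
}
```

Usage: run `write_datasource_file /tmp/prometheus.yaml`, copy the result with `sudo cp /tmp/prometheus.yaml /etc/grafana/provisioning/datasources/`, then `sudo systemctl restart grafana-server`. Grafana reads provisioning files at startup.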
Import Dashboard
- Click the “+” icon → Import
- Enter dashboard ID: 1860 (Node Exporter Full)
- Click “Load”
- Select your Prometheus data source
- Click “Import”
You’ll now see metrics from all your servers!
Basic Alerting
Create an alert rules file:
sudo nano /etc/prometheus/alert_rules.yml
Add these basic alerts:
groups:
  - name: basic_alerts
    rules:
      - alert: ServerDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Server {{ $labels.instance }} is down"
      - alert: HighCPUUsage
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory on {{ $labels.instance }}"
      - alert: LowDiskSpace
        expr: 100 - ((node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs"} / node_filesystem_size_bytes{mountpoint="/",fstype!="rootfs"}) * 100) > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
Update Prometheus config to include alerts:
sudo nano /etc/prometheus/prometheus.yml
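Add these lines at the top level of the file, after the global section (rule_files is a sibling of global, not nested inside it). The path is resolved relative to the directory of prometheus.yml:
rule_files:
  - "alert_rules.yml"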
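Restart Prometheus to load the rules:
sudo systemctl restart prometheus
Firing alerts then show up on the Alerts page of the Prometheus web UI (http://your-monitoring-server-ip:9090/alerts). Delivering notifications by email or chat additionally requires Alertmanager, which this guide does not set up.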
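A syntax error in prometheus.yml or alert_rules.yml keeps Prometheus from starting after a restart. promtool, installed earlier alongside the prometheus binary, can validate the configuration (including the rule files it references) first. The wrapper below is a sketch: safe_restart is a hypothetical helper, and the PROMTOOL and RESTART_CMD variables are assumptions added only so the commands can be overridden.

```shell
# safe_restart [CONFIG]
# Validates CONFIG with "promtool check config" and restarts Prometheus
# only if validation succeeds.
safe_restart() {
  local config="${1:-/etc/prometheus/prometheus.yml}"
  local promtool="${PROMTOOL:-promtool}"
  local restart_cmd="${RESTART_CMD:-sudo systemctl restart prometheus}"
  if "$promtool" check config "$config"; then
    # Intentionally unquoted so the command string splits into words.
    $restart_cmd
  else
    echo "Validation failed; Prometheus was not restarted." >&2
    return 1
  fi
}
```

Calling `safe_restart` with no arguments checks /etc/prometheus/prometheus.yml and restarts the service only when promtool reports no errors.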
Key Dashboards to Create
Server Overview Panel
Create a table showing all servers:
- Panel Type: Table
- Query: up
- Shows which servers are online/offline
CPU Usage Panel
- Panel Type: Time series
- Query: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
- Shows CPU usage for each server
Memory Usage Panel
- Panel Type: Time series
- Query: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
- Shows memory usage percentage
Disk Usage Panel
- Panel Type: Gauge
- Query: 100 - ((node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs"} / node_filesystem_size_bytes{mountpoint="/",fstype!="rootfs"}) * 100)
- Shows disk usage percentage
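The memory and disk queries share one piece of arithmetic: take the fraction that is still available and subtract it from 100 percent. The sketch below reproduces that calculation outside Prometheus so you can sanity-check a panel value by hand. used_percent is a hypothetical helper and the input numbers are made up for illustration.

```shell
# used_percent AVAILABLE TOTAL
# Prints 100 - (AVAILABLE / TOTAL * 100) with one decimal place, the same
# formula the memory and disk panels use.
used_percent() {
  awk -v avail="$1" -v total="$2" \
    'BEGIN { printf "%.1f\n", 100 - (avail / total) * 100 }'
}
```

For example, a server with 2 GiB available out of 8 GiB gives `used_percent 2147483648 8589934592`, which prints 75.0. That is below the 85 percent threshold of the HighMemoryUsage alert.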
Troubleshooting
Servers Not Showing in Prometheus
- Check if Node Exporter is running:
sudo systemctl status node_exporter
- Test network connectivity from the monitoring server:
curl http://REMOTE_SERVER_IP:9100/metrics
- Check firewall rules:
sudo ufw status
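The connectivity check can be run against every configured target in one pass. The sketch below is meant for the monitoring server. list_targets and probe_targets are hypothetical helpers, and the parsing assumes targets are written as single-quoted host:port strings, as in this guide's prometheus.yml.

```shell
# list_targets CONFIG: prints each quoted host:port entry found in CONFIG.
list_targets() {
  grep -Eo "'[^']+:[0-9]+'" "$1" | tr -d "'"
}

# probe_targets CONFIG: requests /metrics from every target and reports
# whether the request succeeded within five seconds.
probe_targets() {
  local target
  for target in $(list_targets "$1"); do
    if curl -fsS --max-time 5 -o /dev/null "http://$target/metrics"; then
      echo "OK   $target"
    else
      echo "FAIL $target"
    fi
  done
}
```

Usage: `probe_targets /etc/prometheus/prometheus.yml`. A FAIL line usually points to a stopped service or a firewall rule on that server.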
Prometheus Not Starting
Check logs:
sudo journalctl -u prometheus -f
Common issues:
- Config file syntax errors
- Permission problems
- Port already in use
Grafana Dashboard Empty
- Verify Prometheus data source is working
- Check that every target is listed as UP in Prometheus:
http://your-monitoring-server-ip:9090/targets
- Verify queries in dashboard panels
Maintenance
Regular Tasks
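- Update software monthly (apt covers Grafana and system packages; Prometheus and Node Exporter were installed from release tarballs, so update them by repeating the download steps with a newer version):
sudo apt update && sudo apt upgrade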
- Check disk space on monitoring server:
df -h /var/lib/prometheus
- Review alerts and adjust thresholds as needed
- Back up configurations:
sudo tar czf monitoring-backup-$(date +%Y%m%d).tar.gz /etc/prometheus/ /etc/grafana/
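For a scheduled job, the backup command can be wrapped so that it reports the archive it created. This is a sketch: backup_configs is a hypothetical helper, and it needs to run as root to read everything under /etc/grafana.

```shell
# backup_configs DEST_DIR PATH...
# Archives the given paths into DEST_DIR/monitoring-backup-YYYYMMDD.tar.gz
# and prints the archive path on success.
backup_configs() {
  local dest_dir="$1"; shift
  local archive
  archive="$dest_dir/monitoring-backup-$(date +%Y%m%d).tar.gz"
  tar czf "$archive" "$@" && printf '%s\n' "$archive"
}
```

Usage: `backup_configs /root /etc/prometheus /etc/grafana`, then list the contents with `tar tzf` on the printed path to confirm the files are there.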
Performance Tips
- Increase scrape intervals for non-critical servers
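- Use shorter retention periods to save disk space (set with the --storage.tsdb.retention.time flag on the Prometheus command line; the default is 15d)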
- Monitor the monitoring server itself
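To see how scrape interval and retention affect disk usage, the Prometheus documentation gives a rough formula: retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample. The sketch below applies it. estimate_tsdb_bytes is a hypothetical helper, and the figure of about 1000 time series per server is an assumption for illustration, not a measurement.

```shell
# estimate_tsdb_bytes RETENTION_DAYS SAMPLES_PER_SECOND BYTES_PER_SAMPLE
# Prints the approximate on-disk size of the Prometheus TSDB in bytes.
estimate_tsdb_bytes() {
  echo $(( $1 * 86400 * $2 * $3 ))
}
```

With three servers at an assumed 1000 series each and a 15s scrape interval, Prometheus ingests about 3000 / 15 = 200 samples per second. `estimate_tsdb_bytes 15 200 2` prints 518400000 (about 0.5 GB) for the default 15-day retention, and `estimate_tsdb_bytes 7 200 2` prints 241920000 for 7 days.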
Security Best Practices
- Use HTTPS for Grafana (set up nginx reverse proxy with SSL)
- Restrict access to monitoring ports using firewall rules
- Change default passwords for Grafana
- Regularly update all components
- Use strong authentication for Grafana users
Remember to regularly review your dashboards and fine-tune alert thresholds based on your specific infrastructure needs.