What is OpenTelemetry?
OpenTelemetry (OTel) is the industry-standard open-source observability framework for collecting, processing, and exporting telemetry data. It provides a unified approach to traces, metrics, and logs, eliminating the need for multiple vendor-specific agents.
- Distributed Tracing: Track requests across services with Jaeger
- Metrics Collection: Time-series data via Prometheus
- Host Metrics: CPU, memory, disk, and network out of the box
- Vendor Neutral: Single SDK for any backend
- Grafana Dashboards: Unified visualization across all signals
- Batch Processing: Efficient data pipeline with configurable limits
This guide builds on other RamNode guides, including our Docker Basics Guide and our Grafana + Prometheus Guide.
Prerequisites
Recommended VPS Specifications
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 2 vCPUs | 4 vCPUs |
| RAM | 2 GB | 4 GB |
| Storage | 20 GB SSD | 40 GB+ SSD |
| OS | Ubuntu 22.04 LTS | Ubuntu 24.04 LTS |
Software Requirements
- Docker Engine 24.0+ and Docker Compose v2 — see our Docker guide
- `curl` and `wget` for downloading binaries
Initial Server Setup
Update your system and install Docker:
```bash
sudo apt update && sudo apt upgrade -y
sudo apt install -y ca-certificates curl gnupg lsb-release
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker
docker --version
docker compose version
```
For a detailed Docker walkthrough, see our Docker Basics Guide.
Architecture Overview
Component Breakdown
| Component | Role | Default Port |
|---|---|---|
| OTel Collector | Receives, processes, and exports telemetry | 4317 (gRPC), 4318 (HTTP) |
| Prometheus | Time-series metrics storage and querying | 9090 |
| Jaeger | Distributed trace storage and UI | 16686 (UI), 14250 |
| Grafana | Visualization dashboards and alerting | 3000 |
Data Flow
Your application sends traces and metrics to the OpenTelemetry Collector via gRPC or HTTP. The Collector processes the data (batching, filtering, enriching) and exports it to the appropriate backends — metrics go to Prometheus, traces go to Jaeger. Grafana connects to each backend as a data source, providing unified dashboards across all three telemetry signals.
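The OTLP wire format itself is easy to inspect. As a rough sketch, this is approximately the JSON body an SDK POSTs to the Collector's HTTP endpoint on port 4318 (field names follow the OTLP/HTTP JSON encoding; the ids, span name, and attribute values here are illustrative):

```python
import json
import os
import time

# Approximate OTLP/HTTP JSON body for a single server span.
# Field names follow the OTLP JSON encoding; values are illustrative.
now = time.time_ns()
payload = {
    "resourceSpans": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": "my-flask-app"}}
        ]},
        "scopeSpans": [{
            "spans": [{
                "traceId": os.urandom(16).hex(),  # 32 hex chars
                "spanId": os.urandom(8).hex(),    # 16 hex chars
                "name": "GET /",
                "kind": 2,                        # SPAN_KIND_SERVER
                "startTimeUnixNano": str(now - 5_000_000),
                "endTimeUnixNano": str(now),
            }]
        }]
    }]
}
body = json.dumps(payload)
# To actually send it: POST `body` to http://localhost:4318/v1/traces
# with the header Content-Type: application/json.
print(len(payload["resourceSpans"][0]["scopeSpans"][0]["spans"]))  # 1
```

In practice the SDK builds and batches these payloads for you; seeing the shape helps when debugging with `curl -v` against port 4318.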
Deploy with Docker Compose
```bash
mkdir -p ~/otel-stack/{config,data/prometheus,data/grafana}
cd ~/otel-stack
```
Create `docker-compose.yml` (the obsolete `version` key is omitted; Compose v2 ignores it):
```yaml
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    container_name: otel-collector
    command: ['--config=/etc/otel/config.yaml']
    volumes:
      - ./config/otel-collector.yaml:/etc/otel/config.yaml:ro
    ports:
      - '4317:4317'   # OTLP gRPC
      - '4318:4318'   # OTLP HTTP
      - '8888:8888'   # Collector metrics
    restart: unless-stopped

  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./data/prometheus:/prometheus
    ports:
      - '9090:9090'
    restart: unless-stopped

  jaeger:
    image: jaegertracing/all-in-one:latest
    container_name: jaeger
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    ports:
      - '16686:16686'   # Jaeger UI
      - '14250:14250'   # gRPC
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    volumes:
      - ./data/grafana:/var/lib/grafana
    ports:
      - '3000:3000'
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=changeme
    restart: unless-stopped
```
Replace `GF_SECURITY_ADMIN_PASSWORD` with a strong, unique password before deploying to production.
Configure the OpenTelemetry Collector
The Collector is the central hub. Its configuration defines receivers, processors, and exporters.
Create `config/otel-collector.yaml` (note: recent Collector releases replaced the deprecated `logging` exporter with `debug`):
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      memory: {}
      disk: {}
      network: {}
      load: {}

processors:
  batch:
    send_batch_size: 1024
    timeout: 5s
  memory_limiter:
    check_interval: 5s
    limit_mib: 512
    spike_limit_mib: 128
  resourcedetection:
    detectors: [system]
    system:
      hostname_sources: [os]

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889
    namespace: otel
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  debug:
    verbosity: normal

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger, debug]
    metrics:
      receivers: [otlp, hostmetrics]
      processors: [memory_limiter, batch, resourcedetection]
      exporters: [prometheus]
```
Key Configuration Decisions
- Memory limiter: Prevents the Collector from consuming excessive RAM. On a 4 GB VPS, capping at 512 MiB with a 128 MiB spike limit provides a good balance.
- Batch processor: Batching reduces network overhead and improves backend write performance. A batch size of 1024 with a 5-second timeout works well for most workloads.
- Host metrics: The `hostmetrics` receiver provides system-level CPU, memory, disk, and network metrics without any application instrumentation.
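To see how the batch processor's two triggers interact, here is a small back-of-the-envelope model (illustrative arithmetic, not Collector source code): a batch is flushed either when it reaches `send_batch_size` spans or when `timeout` elapses, whichever fires first.

```python
# Rough model of batch-processor flush behavior: under high traffic,
# batches fill before the timeout; under low traffic, the timeout
# flushes partial batches. (Illustrative only.)
def flushes_per_second(spans_per_second: float,
                       send_batch_size: int = 1024,
                       timeout_s: float = 5.0) -> float:
    size_driven = spans_per_second / send_batch_size
    timeout_driven = 1.0 / timeout_s
    return max(size_driven, timeout_driven)

print(flushes_per_second(10))      # 0.2  — low traffic, timeout-driven
print(flushes_per_second(10240))   # 10.0 — high traffic, size-driven
```

This is why raising the timeout (as suggested later for low-traffic apps) directly cuts export frequency when spans trickle in slowly.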
Configure Prometheus
Prometheus scrapes the metrics endpoint exposed by the Collector.
Create `config/prometheus.yml`:
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:8889']
  - job_name: 'collector-internal'
    static_configs:
      - targets: ['otel-collector:8888']
```
The first job collects application and host metrics (port 8889). The second scrapes the Collector's own internal metrics (port 8888) for monitoring pipeline health.
Launch the Stack
```bash
cd ~/otel-stack
docker compose up -d
docker compose ps
docker compose logs otel-collector --tail 50
```
Verify Endpoints
| Service | URL | Expected |
|---|---|---|
| Grafana | http://YOUR_IP:3000 | Login page |
| Jaeger UI | http://YOUR_IP:16686 | Search page |
| Prometheus | http://YOUR_IP:9090 | Query interface |
| Collector Health | http://YOUR_IP:8888/metrics | Metrics output |
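The table above can be checked from a script as well. Here is a minimal standard-library sketch; since your stack's real IP isn't known here, it is demonstrated against a throwaway local HTTP server standing in for one of the endpoints.

```python
import http.server
import threading
import urllib.error
import urllib.request

def check(url: str, timeout: float = 3.0) -> bool:
    """Return True if the endpoint responds at all (any HTTP status)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # server answered, just with an error status
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, DNS failure

# Demo against a throwaway local server — swap in your stack's URLs
# (e.g. http://YOUR_IP:3000) when running this for real.
srv = http.server.HTTPServer(('127.0.0.1', 0),
                             http.server.SimpleHTTPRequestHandler)
threading.Thread(target=srv.serve_forever, daemon=True).start()
port = srv.server_address[1]
print(check(f'http://127.0.0.1:{port}/'))  # True
srv.shutdown()
```

Treating any HTTP response as "up" is deliberate: a 401 from Grafana still proves the service is listening.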
Instrument a Sample Application
To verify the full pipeline, instrument a simple Python Flask application. The same principles apply to any language supported by OpenTelemetry SDKs.
```bash
pip install flask \
  opentelemetry-api \
  opentelemetry-sdk \
  opentelemetry-instrumentation-flask \
  opentelemetry-exporter-otlp
```
Create `app.py`:
```python
from flask import Flask
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.sdk.resources import Resource

# Configure the tracer
resource = Resource.create({'service.name': 'my-flask-app'})
provider = TracerProvider(resource=resource)
exporter = OTLPSpanExporter(endpoint='http://localhost:4317', insecure=True)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)

@app.route('/')
def hello():
    return 'Hello from OpenTelemetry!'

@app.route('/health')
def health():
    return {'status': 'ok'}

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```
Run the app and generate a few requests:
```bash
python app.py &
curl http://localhost:5000/
curl http://localhost:5000/health
```
After sending requests, open the Jaeger UI at http://YOUR_IP:16686 and select "my-flask-app" from the service dropdown. You should see traces for each HTTP request.
Set Up Grafana Dashboards
Grafana provides the visualization layer. For a comprehensive walkthrough of Grafana setup, see our Grafana + Prometheus Guide.
Adding Data Sources
- Prometheus: Navigate to Configuration → Data Sources → Add data source. Select Prometheus and set the URL to `http://prometheus:9090`. Click Save & Test.
- Jaeger: Add another data source, select Jaeger, and set the URL to `http://jaeger:16686`. This enables trace visualization within Grafana panels.
Recommended Dashboards
- Node Exporter Full (Dashboard ID 1860) for system-level metrics
- OpenTelemetry Collector (Dashboard ID 15983) for pipeline health
- Custom application dashboards using PromQL queries against your OTel metrics
To import a community dashboard, go to Dashboards → Import, enter the dashboard ID, select your Prometheus data source, and click Import.
Firewall & Security Hardening
```bash
# Allow SSH
sudo ufw allow 22/tcp

# Allow OTel Collector (restrict to app servers)
sudo ufw allow from YOUR_APP_IP to any port 4317
sudo ufw allow from YOUR_APP_IP to any port 4318

# Allow Grafana (restrict to your IP)
sudo ufw allow from YOUR_ADMIN_IP to any port 3000

# Block public access to Prometheus and Jaeger
# Access via Grafana data sources instead
sudo ufw enable
```
Additional Security Measures
- Place Grafana behind a reverse proxy (Nginx or Caddy) with TLS
- Enable Grafana authentication with OAuth or LDAP for team access
- Set resource limits on Docker containers to prevent runaway memory usage
- Rotate the Grafana admin password and store credentials in environment files excluded from version control
- Use Docker network isolation so Prometheus and Jaeger are only accessible within the Compose network
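The last point can be expressed directly in Compose. This fragment is a sketch to merge into the full compose file: Prometheus and Jaeger get no published `ports:` entries, so they are reachable only by container name over the shared network, while Grafana stays exposed.

```yaml
# Sketch (merge into docker-compose.yml): drop the public port mappings
# for prometheus and jaeger; Grafana reaches them by container name.
services:
  prometheus:
    networks: [monitoring]
    # no "ports:" — reachable only inside the Compose network
  jaeger:
    networks: [monitoring]
  grafana:
    networks: [monitoring]
    ports:
      - '3000:3000'

networks:
  monitoring: {}
```

Grafana's data source URLs (`http://prometheus:9090`, `http://jaeger:16686`) keep working unchanged, since they resolve over the internal network.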
Resource Tuning for VPS Environments
Memory Allocation Guidelines
| Component | 2 GB VPS | 4 GB VPS | 8 GB VPS |
|---|---|---|---|
| OTel Collector | 256 MiB | 512 MiB | 1 GiB |
| Prometheus | 512 MiB | 1 GiB | 2 GiB |
| Jaeger | 256 MiB | 512 MiB | 1 GiB |
| Grafana | 128 MiB | 256 MiB | 512 MiB |
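The table above can be enforced per container. Compose v2 honors `deploy.resources.limits` for plain `docker compose up` (not just Swarm); this fragment sketches the 4 GB VPS column, to be merged into the compose file:

```yaml
# Sketch: hard memory caps matching the 4 GB VPS column above.
services:
  otel-collector:
    deploy:
      resources:
        limits:
          memory: 512M
  prometheus:
    deploy:
      resources:
        limits:
          memory: 1G
  jaeger:
    deploy:
      resources:
        limits:
          memory: 512M
  grafana:
    deploy:
      resources:
        limits:
          memory: 256M
```

Pairing these caps with the Collector's `memory_limiter` gives two layers of defense: the processor sheds load gracefully before the kernel OOM-killer ever gets involved.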
Prometheus Retention Settings
Adjust storage retention to match your available disk space. Add these flags to the Prometheus service:
```yaml
command:
  - '--config.file=/etc/prometheus/prometheus.yml'
  - '--storage.tsdb.retention.time=15d'
  - '--storage.tsdb.retention.size=5GB'
```
Reducing Collector Overhead
For low-traffic applications, increase the batch timeout and reduce scrape frequency:
```yaml
processors:
  batch:
    send_batch_size: 512
    timeout: 10s
```
Troubleshooting
Common Issues
- Collector won't start: check `docker compose logs otel-collector`. The most common issue is incorrect indentation in pipeline definitions.
- No metrics in Prometheus: open http://YOUR_IP:9090/targets. If the otel-collector target shows as down, confirm port 8889 is exposed.
- High memory usage: lower the `memory_limiter` thresholds and add Docker memory limits using `deploy.resources.limits.memory` in your Compose file.
- Grafana can't reach backends: use container names (`prometheus`, `jaeger`) rather than localhost in Grafana data source URLs.
```bash
docker stats --no-stream
curl -v http://localhost:4318/v1/traces
curl http://localhost:8888/metrics | grep otel
docker compose restart otel-collector
```
Next Steps
- Add Loki for centralized log aggregation, completing the three pillars of observability
- Set up Grafana alerting rules for latency spikes, error rates, or resource exhaustion
- Instrument additional services and use trace context propagation across microservices
- Implement the OTel Collector's `tail_sampling` processor to reduce storage costs
- Explore Grafana Tempo as an alternative to Jaeger for Grafana-native tracing
- Automate the deployment with Ansible or Terraform for reproducible infrastructure
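As a starting point for the tail sampling idea, this hedged sketch of a `tail_sampling` processor block (from the contrib Collector) keeps all error traces and anything slower than 500 ms; the policy names and thresholds are illustrative:

```yaml
# Sketch: sample traces after completion, keeping errors and slow requests.
# Add to the processors section and reference it in the traces pipeline.
processors:
  tail_sampling:
    decision_wait: 10s          # buffer spans this long before deciding
    policies:
      - name: keep-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: keep-slow
        type: latency
        latency: {threshold_ms: 500}
```

Because decisions are made per whole trace after `decision_wait`, memory use grows with trace volume; size the buffer with your `memory_limiter` settings in mind.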
