Benchmarking a Cloud VPS with sysbench, fio, and stress-ng

Applies to: All RamNode Cloud VPS plans | Debian, Ubuntu, AlmaLinux, Rocky, RHEL | Rev. 2026

1. Introduction

Before deploying production workloads on a VPS, you should understand what you are actually getting. Provider specifications give you a starting point, but real-world performance depends on factors like CPU steal time, storage backend architecture, neighbor density on the host, and the underlying hardware generation. Benchmarking lets you verify that the instance actually delivers the resources you are paying for.

This guide covers three industry-standard tools for benchmarking a cloud VPS:

sysbench for CPU and memory performance
fio for disk I/O characterization
stress-ng for stability and stress testing

By the end, you will have a reproducible benchmarking workflow you can run on any new instance to establish baselines and detect performance drift over time.

2. When to Benchmark

Immediately after provisioning, to establish a baseline
Before migrating production workloads to a new instance
When investigating performance complaints from users or applications
When comparing providers, regions, or plan tiers
Periodically (monthly or quarterly) to detect noisy neighbor degradation — see Noisy Neighbor Symptoms vs. Real Performance Issues

3. Prerequisites

Root or sudo access to the VPS
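At least 10 GB of free disk space on the filesystem you plan to benchmark; see Diagnosing and Fixing Disk Space Issues. The fio examples write to /tmp/fio-test, so if /tmp is a RAM-backed tmpfs mount on your distribution, point --filename at a disk-backed path instead.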
Network connectivity for package installation
A quiescent system: stop unnecessary services before testing to avoid skewed results
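The free-space and idle-system prerequisites above can be checked mechanically before each run. The following is a minimal sketch; the helper names free_gib and load_is_low and the 0.5 load limit are illustrative assumptions, not part of any tool.

```shell
# Hypothetical pre-flight helpers; the names and thresholds are assumptions.

# Print free space, in whole GiB, on the filesystem holding the given path.
free_gib() {
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Succeed when the 1-minute load average ($1) is at or below the limit ($2).
load_is_low() {
  awk -v load="$1" -v limit="$2" 'BEGIN { exit !(load + 0 <= limit + 0) }'
}

# Example use on a Linux guest:
#   [ "$(free_gib /)" -ge 10 ] || echo "less than 10 GiB free"
#   load_is_low "$(cut -d' ' -f1 /proc/loadavg)" 0.5 || echo "system is busy"
```

The load limit is a judgment call; a larger instance with many vCPUs may tolerate a higher value before results are skewed.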

4. Installing the Tools

On Ubuntu or Debian:

Install on Debian/Ubuntu

apt update
apt install -y sysbench fio stress-ng

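On AlmaLinux, Rocky Linux, or RHEL (on RHEL itself, epel-release is not in the default repositories, so enable EPEL per the EPEL project's instructions first):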

Install on AlmaLinux/Rocky/RHEL

dnf install -y epel-release
dnf install -y sysbench fio stress-ng

Verify the installed versions:

Version check

sysbench --version
fio --version
stress-ng --version
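If you script the benchmark, it can help to fail early when a tool is absent rather than partway through a run. This sketch assumes a hypothetical helper named missing_tools; it relies only on the shell's command -v builtin.

```shell
# Print the name of every listed command that is not found on PATH.
# missing_tools is an illustrative helper name, not a standard utility.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example use:
#   missing=$(missing_tools sysbench fio stress-ng)
#   [ -z "$missing" ] || echo "install first: $missing"
```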

5. CPU Benchmarking with sysbench

The sysbench CPU test calculates prime numbers up to a configurable ceiling. While synthetic, the results correlate well with general compute throughput and make it straightforward to compare clock speed and core efficiency between instances.

Single-thread CPU test

Single-thread test

sysbench cpu --cpu-max-prime=20000 --threads=1 --time=60 run

Multi-threaded CPU test

Match the thread count to the available vCPU count. Check vCPUs with nproc:

Multi-threaded test

THREADS=$(nproc)
sysbench cpu --cpu-max-prime=20000 --threads=$THREADS --time=60 run

Key Metrics

events per second: primary throughput metric, higher is better
total time: should match the --time value
latency (avg, 95th percentile): lower is better
threads fairness: low standard deviation across threads indicates consistent per-core performance

Reference numbers: A modern dedicated CPU core on recent Ryzen, EPYC, or Xeon Sapphire Rapids hardware should produce roughly 1,500 to 3,000 events per second at --cpu-max-prime=20000 on a single thread. Shared or burstable vCPUs often land 30 to 60 percent lower. If your numbers are unexpectedly low, check Diagnosing High CPU Usage.
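To compare runs, it is convenient to pull the events-per-second figure out of the report and to express multi-threaded results as a scaling percentage. The sketch below assumes the sysbench 1.0.x report layout, where the figure appears on a line reading "events per second: N"; cpu_eps and scaling_pct are hypothetical helper names.

```shell
# Extract the "events per second" value from sysbench cpu output on stdin.
# Assumes the sysbench 1.0.x report format.
cpu_eps() {
  awk -F: '/events per second/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Scaling efficiency in percent:
#   multi-thread rate / (single-thread rate * thread count) * 100
scaling_pct() {
  awk -v single="$1" -v multi="$2" -v n="$3" \
    'BEGIN { printf "%.1f\n", 100 * multi / (single * n) }'
}

# Example use:
#   single=$(sysbench cpu --cpu-max-prime=20000 --threads=1 --time=60 run | cpu_eps)
#   multi=$(sysbench cpu --cpu-max-prime=20000 --threads="$(nproc)" --time=60 run | cpu_eps)
#   scaling_pct "$single" "$multi" "$(nproc)"
```

A scaling figure well below 100 percent on a multi-vCPU plan can indicate that the vCPUs are hyperthreads or are contended, which is worth cross-checking against steal time.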

6. Memory Benchmarking with sysbench

The memory test measures both sequential and random memory access performance.

Sequential write throughput

Sequential write

sysbench memory \
  --memory-block-size=1M \
  --memory-total-size=10G \
  --memory-oper=write \
  --memory-access-mode=seq \
  run

Random read latency

Random read

sysbench memory \
  --memory-block-size=1K \
  --memory-total-size=10G \
  --memory-oper=read \
  --memory-access-mode=rnd \
  run

Memory bandwidth varies significantly between hardware generations. DDR5-equipped hosts produce noticeably higher throughput than DDR4 systems, particularly at larger block sizes. Small-block random access is more sensitive to memory latency than to raw bandwidth.
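One way to see the block-size effect described above is to repeat the sequential write test across several block sizes. This is a sketch under the assumption that sysbench 1.0.x prints throughput as "(N MiB/sec)" in its report; mem_mibs and mem_sweep are hypothetical helper names.

```shell
# Extract throughput in MiB/sec from sysbench memory output on stdin.
# Assumes a report line such as "10240.00 MiB transferred (8532.12 MiB/sec)".
mem_mibs() {
  sed -n 's/.*(\([0-9.]*\) MiB\/sec).*/\1/p'
}

# Run the sequential write test once per block size given as an argument
# and print "block-size MiB/sec" for each.
mem_sweep() {
  for bs in "$@"; do
    printf '%s %s\n' "$bs" "$(sysbench memory --memory-block-size="$bs" \
      --memory-total-size=10G --memory-oper=write --memory-access-mode=seq run | mem_mibs)"
  done
}

# Example use:
#   mem_sweep 1K 64K 1M
```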

7. Disk I/O Benchmarking with fio

fio is a flexible, widely used disk benchmarking tool. Disk performance is not a single number: it varies by block size, queue depth, read and write mix, and access pattern. You need multiple tests to characterize a storage device meaningfully.

Critical fio Parameters

--direct=1: bypasses the OS page cache so you measure storage, not RAM
--ioengine=io_uring: preferred on kernels 5.1+; fall back to libaio on older kernels or where io_uring is disabled
--bs: block size; 4K for random I/O, 1M for sequential
--iodepth: queue depth; 1 for latency, 32–64 for peak throughput
--numjobs: parallel workers; equal to vCPUs to maximize load
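--size: ideally larger than RAM so that caches cannot hold the whole data set; the 4G used in the examples below is a placeholder to scale up on larger instances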
--runtime with --time_based: caps each test at a fixed duration
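Two of the parameters above, --ioengine and --size, depend on the host. The sketch below derives suggested values for them; pick_ioengine and fio_size_for_ram are hypothetical helper names, and the factor of two times RAM is an assumption chosen to satisfy "exceed RAM" with some margin. A kernel version alone does not prove io_uring is usable, so confirm with fio --enghelp if a test fails to start.

```shell
# Suggest an ioengine from a kernel release string such as "5.15.0-91-generic".
# io_uring first appeared in Linux 5.1; anything older gets libaio.
pick_ioengine() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%[!0-9]*}
  if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "${minor:-0}" -ge 1 ]; }; then
    echo io_uring
  else
    echo libaio
  fi
}

# Suggest a --size value: twice the RAM (given in KiB), rounded up to whole GiB.
fio_size_for_ram() {
  awk -v kb="$1" 'BEGIN {
    g = int(2 * kb / 1048576 + 0.999)
    if (g < 1) g = 1
    printf "%dG\n", g
  }'
}

# Example use on a Linux guest:
#   pick_ioengine "$(uname -r)"
#   fio_size_for_ram "$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)"
```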

Test 1: 4K Random Read IOPS

This is usually the most important metric for database, web, and general application workloads.

4K random read

fio --name=randread4k \
    --filename=/tmp/fio-test \
    --rw=randread \
    --bs=4k \
    --size=4G \
    --numjobs=1 \
    --iodepth=32 \
    --direct=1 \
    --ioengine=io_uring \
    --runtime=60 \
    --time_based \
    --group_reporting

Test 2: 4K Random Write IOPS

4K random write

fio --name=randwrite4k \
    --filename=/tmp/fio-test \
    --rw=randwrite \
    --bs=4k \
    --size=4G \
    --numjobs=1 \
    --iodepth=32 \
    --direct=1 \
    --ioengine=io_uring \
    --runtime=60 \
    --time_based \
    --group_reporting

Test 3: Mixed 70/30 Random Read/Write

Approximates a typical database workload:

Mixed 70/30

fio --name=mixedrw \
    --filename=/tmp/fio-test \
    --rw=randrw \
    --rwmixread=70 \
    --bs=4k \
    --size=4G \
    --numjobs=4 \
    --iodepth=32 \
    --direct=1 \
    --ioengine=io_uring \
    --runtime=60 \
    --time_based \
    --group_reporting

Test 4: Sequential Throughput

Useful for large file workloads, backups, and streaming:

Sequential 1M read

fio --name=seqread1m \
    --filename=/tmp/fio-test \
    --rw=read \
    --bs=1M \
    --size=4G \
    --numjobs=1 \
    --iodepth=8 \
    --direct=1 \
    --ioengine=io_uring \
    --runtime=60 \
    --time_based \
    --group_reporting

Test 5: Single-Queue Latency

Critical for understanding interactive responsiveness:

Latency probe

fio --name=latency \
    --filename=/tmp/fio-test \
    --rw=randread \
    --bs=4k \
    --size=4G \
    --numjobs=1 \
    --iodepth=1 \
    --direct=1 \
    --ioengine=io_uring \
    --runtime=30 \
    --time_based

Clean up

Remove test file

rm /tmp/fio-test

Reading fio Output

IOPS: operations per second, higher is better
BW: throughput in MiB/s
clat: per-operation completion latency; review average and 99th percentile
clat 99.99: tail latency, important for user-facing applications

Reference numbers: Modern NVMe-backed VPS storage should produce 20,000 to 100,000+ IOPS for 4K random reads at queue depth 32, with sub-millisecond average latency. SATA SSD-backed instances typically land in the 5,000 to 20,000 IOPS range. If random write IOPS are dramatically lower than reads, the underlying storage may use a write-through cache or be experiencing host-level write amplification.
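fio abbreviates large IOPS figures (for example 45.2k), which makes run-to-run comparison awkward. The sketch below normalizes them to plain integers. It assumes the fio 3.x human-readable summary line format ("read: IOPS=..., BW=..."), and fio_iops is a hypothetical helper name; for scripted pipelines, fio's --output-format=json is likely the more robust choice.

```shell
# Read fio's human-readable output on stdin and print "<direction> <iops>"
# for each summary line, expanding the k and M suffixes.
fio_iops() {
  awk '
    /IOPS=/ {
      dir = $1; sub(/:$/, "", dir)            # "read:" becomes "read"
      match($0, /IOPS=[0-9.]+[kM]?/)
      v = substr($0, RSTART + 5, RLENGTH - 5) # the text after "IOPS="
      mult = 1
      if (v ~ /k$/) { mult = 1000;    sub(/k$/, "", v) }
      if (v ~ /M$/) { mult = 1000000; sub(/M$/, "", v) }
      printf "%s %.0f\n", dir, v * mult
    }'
}

# Example use:
#   fio --name=randread4k ... --group_reporting | fio_iops
```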

8. Stress Testing with stress-ng

While sysbench and fio measure peak performance over short runs, stress-ng applies sustained load to expose stability issues, thermal throttling on the host, and noisy neighbor effects.

CPU stress test

CPU stress

stress-ng --cpu $(nproc) --cpu-method matrixprod --metrics-brief --timeout 300s

Memory stress test

Caution: Be careful with allocation percentages; over-allocating will trigger the OOM killer.

Memory stress

stress-ng --vm 2 --vm-bytes 75% --vm-method all --verify --metrics-brief --timeout 300s

Disk I/O stress test

Disk stress

stress-ng --hdd 2 --hdd-bytes 1G --metrics-brief --timeout 300s

Combined system stress

The canonical "everything at once" test:

Combined stress

stress-ng \
  --cpu $(nproc) \
  --io 2 \
  --vm 1 \
  --vm-bytes 25% \
  --hdd 1 \
  --hdd-bytes 1G \
  --timeout 600s \
  --metrics-brief

While stress-ng is running, open a second SSH session and monitor the system (iostat is part of the sysstat package, which may need to be installed separately):

Live monitoring

top
vmstat 1
iostat -xz 1

Watch for CPU steal time (the st value in top's CPU summary line and the st column in vmstat), which indicates the hypervisor is preempting your vCPUs to schedule other tenants. See Basic Resource Monitoring for more on these tools.

9. Interpreting Steal Time

CPU steal time is one of the most useful indicators of host contention on a shared VPS. Brief spikes during system events are normal. Consistent steal time above 5 percent during a sustained sysbench CPU run typically points to neighbor contention. If you see sustained high steal time, especially during off-peak hours when your own workload is quiet, that warrants a support ticket so the provider can investigate the host or migrate your instance. The companion guide Noisy Neighbor Symptoms vs. Real Performance Issues walks through the full diagnostic workflow.
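To put a number on the 5 percent guideline, steal can be computed from two samples of the aggregate cpu line in /proc/stat. This sketch assumes the field order documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal, then guest counters); steal_pct is a hypothetical helper name.

```shell
# Steal percentage between two aggregate "cpu" lines sampled from /proc/stat.
# Fields 2-9 are user, nice, system, idle, iowait, irq, softirq, steal.
# Guest time is already counted inside user, so it is not added again.
steal_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    { total[NR] = 0
      for (i = 2; i <= 9; i++) total[NR] += $i
      steal[NR] = $9 }
    END { dt = total[2] - total[1]
          if (dt <= 0) { print "0.0"; exit }
          printf "%.1f\n", 100 * (steal[2] - steal[1]) / dt }'
}

# Example use while a sysbench CPU run is active:
#   a=$(head -n 1 /proc/stat); sleep 10; b=$(head -n 1 /proc/stat)
#   steal_pct "$a" "$b"
```

Sampling over a window of several seconds smooths out the brief spikes that the text above describes as normal.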

10. Putting It All Together: A Quick Benchmark Script

Save this as quickbench.sh for repeatable baseline captures:

quickbench.sh

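#!/bin/bash
# Abort on the first failing command; remove the fio test file on any exit.
set -e
trap 'rm -f /tmp/fio-test' EXIT
LOG="bench-$(date +%Y%m%d-%H%M%S).log"
# Mirror stdout and stderr to the log file.
exec > >(tee -a "$LOG") 2>&1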

echo "=== System Info ==="
lscpu | grep -E "Model name|^CPU\(s\)|Thread|MHz"
free -h
df -h /

echo "=== sysbench CPU ==="
sysbench cpu --cpu-max-prime=20000 --threads=$(nproc) --time=60 run

echo "=== sysbench memory ==="
sysbench memory --memory-block-size=1M --memory-total-size=10G run

echo "=== fio 4K random read ==="
fio --name=rr4k --filename=/tmp/fio-test --rw=randread --bs=4k \
    --size=4G --numjobs=1 --iodepth=32 --direct=1 \
    --ioengine=io_uring --runtime=30 --time_based --group_reporting

echo "=== fio 4K random write ==="
fio --name=rw4k --filename=/tmp/fio-test --rw=randwrite --bs=4k \
    --size=4G --numjobs=1 --iodepth=32 --direct=1 \
    --ioengine=io_uring --runtime=30 --time_based --group_reporting

rm -f /tmp/fio-test
echo "=== Done. Log saved to: $LOG ==="

Make it executable and run:

Run the script

chmod +x quickbench.sh
./quickbench.sh

The script logs results to a timestamped file so you can archive baselines for later comparison.

11. Common Pitfalls

Testing with caching enabled: Always pass --direct=1 to fio, or you are benchmarking RAM speed instead of the storage device.
Insufficient runtime: Anything under 30 seconds is too short to capture steady-state behavior. Queue dynamics, thermal effects, and write throttling all need time to stabilize.
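Test data smaller than RAM: fio test sizes should exceed system memory. With --direct=1 the guest page cache is bypassed, but a small test file can still be served from host-side or device caches, which inflates read numbers.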
Single test runs: Always run benchmarks three or more times and evaluate consistency, not just peak values.
Benchmarking during deployment: Run on a clean, idle system, never while configuration management, package updates, or backups are running.
Ignoring filesystem overhead: For the most accurate disk numbers, benchmark against a raw block device when possible, but only an unused one, because fio write tests destroy any data on the device. See Choosing a Filesystem.
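The advice to run three or more times and evaluate consistency can be made concrete with a mean and a coefficient of variation (standard deviation divided by mean). The following is a minimal sketch; run_stats is a hypothetical helper name, and it expects one numeric result per line.

```shell
# Read one result per line on stdin; print the mean, the sample standard
# deviation, and the coefficient of variation as a percentage.
run_stats() {
  awk '
    { n++; sum += $1; sumsq += $1 * $1 }
    END {
      if (n < 2) { print "need at least two runs"; exit 1 }
      mean = sum / n
      var = (sumsq - n * mean * mean) / (n - 1)
      if (var < 0) var = 0      # guard against floating-point rounding
      sd = sqrt(var)
      printf "mean=%.1f sd=%.1f cv=%.1f%%\n", mean, sd, 100 * sd / mean
    }'
}

# Example use with three events-per-second results:
#   printf '%s\n' 2410.5 2388.1 2402.9 | run_stats
```

A high coefficient of variation across otherwise identical runs is itself a finding, since it suggests contention or throttling rather than a stable baseline.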

12. Comparing Across Providers or Plans

When comparing benchmarks across providers, use identical command parameters and similar instance sizes. Document everything:

vCPU count and model (lscpu)
Total RAM (free -h)
Advertised storage type (NVMe, SATA SSD, network block storage)
Geographic region or datacenter
Test timestamps, since load varies by time of day
Kernel version (uname -r) and OS release (cat /etc/os-release)

A spreadsheet with rows per provider and columns for each metric makes the comparison auditable when you revisit it months later.
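Most of the items in the list above can be captured automatically alongside each result. This sketch prints them as key=value lines; host_facts is a hypothetical helper name, and fields fall back to "unknown" when a tool or file is missing. Storage type and region are not discoverable from inside the guest, so record those by hand.

```shell
# Print comparison metadata as key=value lines.
host_facts() {
  cpu_model=$(lscpu 2>/dev/null | sed -n 's/^Model name:[[:space:]]*//p' | head -n 1)
  ram=$(free -h 2>/dev/null | awk '/^Mem:/ { print $2 }')
  os=$(sed -n 's/^PRETTY_NAME=//p' /etc/os-release 2>/dev/null | tr -d '"')
  echo "timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  echo "vcpus=$(nproc 2>/dev/null || echo unknown)"
  echo "cpu_model=${cpu_model:-unknown}"
  echo "ram_total=${ram:-unknown}"
  echo "kernel=$(uname -r)"
  echo "os=${os:-unknown}"
}

# Example use:
#   host_facts > "facts-$(date +%Y%m%d-%H%M%S).txt"
```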

13. Next Steps

After establishing baseline numbers, save them alongside your instance documentation. Re-run the same suite periodically and compare. Significant drift, particularly in random I/O latency or sustained CPU throughput, can signal host hardware issues or increased neighbor density and may justify opening a ticket or migrating to a different node.
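Drift is easiest to judge as a percentage change against the stored baseline. The following is a minimal sketch; drift_pct is a hypothetical helper name, and what counts as significant drift is for you to decide per metric.

```shell
# Percent change of a current result ($2) relative to its baseline ($1).
# Negative values mean the current result is lower than the baseline.
drift_pct() {
  awk -v base="$1" -v now="$2" \
    'BEGIN { printf "%+.1f\n", 100 * (now - base) / base }'
}

# Example use with a baseline of 50000 IOPS and a current result of 42000:
#   drift_pct 50000 42000
```

For throughput metrics a negative figure is a regression, while for latency metrics the sign is reversed, so label each value when you record it.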

For workload-specific benchmarking, consider following up with:

pgbench for PostgreSQL workloads
sysbench's oltp_* scripts (for example oltp_read_write) for MySQL or MariaDB
wrk or k6 for HTTP application performance
iperf3 for network throughput between instances
redis-benchmark for Redis workloads

Synthetic benchmarks give you a baseline. Application-level benchmarks tell you whether the instance will actually handle what you are about to put on it. Run both.