    Performance & Benchmarking

    Benchmarking a Cloud VPS with sysbench, fio, and stress-ng

    A reproducible workflow to measure CPU, memory, disk, and stability on any new instance — establish baselines, compare providers, and catch performance drift early.

    Applies to: All RamNode Cloud VPS plans | Debian, Ubuntu, AlmaLinux, Rocky, RHEL | Rev. 2026

    1. Introduction

    Before deploying production workloads on a VPS, you should understand what you are actually getting. Provider specifications give you a starting point, but real-world performance depends on factors like CPU steal time, storage backend architecture, neighbor density on the host, and the underlying hardware generation. Benchmarking lets you verify that the instance actually delivers the resources you are paying for.

    This guide covers three industry-standard tools for benchmarking a cloud VPS:

    • sysbench for CPU and memory performance
    • fio for disk I/O characterization
    • stress-ng for stability and stress testing

    By the end, you will have a reproducible benchmarking workflow you can run on any new instance to establish baselines and detect performance drift over time.

    2. When to Benchmark

    • Immediately after provisioning, to establish a baseline
    • Before migrating production workloads to a new instance
    • When investigating performance complaints from users or applications
    • When comparing providers, regions, or plan tiers
    • Periodically (monthly or quarterly) to detect noisy neighbor degradation — see Noisy Neighbor Symptoms vs. Real Performance Issues

    3. Prerequisites

    • Root or sudo access to the VPS
    • At least 10 GB of free disk space on the device you plan to benchmark — see Diagnosing and Fixing Disk Space Issues
    • Network connectivity for package installation
    • A quiescent system: stop unnecessary services before testing to avoid skewed results
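
    The free-space prerequisite can be checked before any test file is written. The following is a minimal sketch rather than part of the original workflow: have_free_space is a hypothetical helper name, the /tmp path mirrors the fio examples in this guide, and it assumes a df that accepts the POSIX -P and -k options.

```shell
# Succeeds when the filesystem holding DIR has at least MIN_GB GiB available.
# have_free_space is a hypothetical helper name, not a standard command.
have_free_space() {
    dir=$1
    min_gb=$2
    # df -Pk prints one POSIX-format line per filesystem; column 4 is available KiB.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

if have_free_space /tmp 10; then
    echo "enough free space for the fio tests"
else
    echo "less than 10 GiB free on /tmp" >&2
fi
```

    Point the check at whichever directory will hold the fio test file if you benchmark a device other than the one backing /tmp.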

    4. Installing the Tools

    On Ubuntu or Debian:

    Install on Debian/Ubuntu
    apt update
    apt install -y sysbench fio stress-ng

    On AlmaLinux, Rocky Linux, or RHEL:

    Install on AlmaLinux/Rocky/RHEL
    dnf install -y epel-release
    dnf install -y sysbench fio stress-ng

    Verify the installed versions:

    Version check
    sysbench --version
    fio --version
    stress-ng --version
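
    If you script the setup, it can help to fail early when a tool is missing instead of partway through a run. A small sketch, assuming a POSIX shell; missing_tools is a hypothetical helper name:

```shell
# Prints the name of every argument that is not an executable on PATH.
# missing_tools is a hypothetical helper name.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools sysbench fio stress-ng)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "not installed:" $missing >&2
fi
```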

    5. CPU Benchmarking with sysbench

    The sysbench CPU test calculates prime numbers up to a configurable ceiling. While synthetic, the results correlate well with general compute throughput and make it straightforward to compare clock speed and core efficiency between instances.

    Single-thread CPU test

    Single-thread test
    sysbench cpu --cpu-max-prime=20000 --threads=1 --time=60 run

    Multi-threaded CPU test

    Match the thread count to the available vCPU count. Check vCPUs with nproc:

    Multi-threaded test
    THREADS=$(nproc)
    sysbench cpu --cpu-max-prime=20000 --threads=$THREADS --time=60 run

    Key Metrics

    • events per second: primary throughput metric, higher is better
    • total time: should match the --time value
    • latency (avg, 95th percentile): lower is better
    • threads fairness: low standard deviation across threads indicates consistent per-core performance

    Reference numbers: A modern dedicated CPU core on recent Ryzen, EPYC, or Xeon Sapphire Rapids hardware should produce roughly 1,500 to 3,000 events per second at --cpu-max-prime=20000 on a single thread. Shared or burstable vCPUs often land 30 to 60 percent lower. If your numbers are unexpectedly low, check Diagnosing High CPU Usage.
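
    To compare the single-thread and multi-thread runs, you can pull the events per second figure out of each report and express the multi-thread result as a share of perfect linear scaling. This is a sketch under assumptions: it expects the "events per second:" line printed by sysbench 1.0, the helper names are hypothetical, and the numbers in the example call are illustrative, not measured.

```shell
# Reads sysbench cpu output on stdin and prints the "events per second" value.
events_per_sec() {
    awk -F: '/events per second/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Prints multi-thread throughput as a percentage of perfect linear scaling.
# Arguments: single-thread events/s, multi-thread events/s, thread count.
scaling_pct() {
    awk -v s="$1" -v m="$2" -v n="$3" 'BEGIN { printf "%.1f\n", 100 * m / (s * n) }'
}

# Example with illustrative numbers (not measured results):
scaling_pct 1000 3600 4
```

    In practice you would capture the inputs with something like single=$(sysbench cpu --cpu-max-prime=20000 --threads=1 --time=60 run | events_per_sec). A scaling figure well below 100 percent on an otherwise idle instance is worth comparing against steal time.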

    6. Memory Benchmarking with sysbench

    The memory test measures both sequential and random memory access performance.

    Sequential write throughput

    Sequential write
    sysbench memory \
      --memory-block-size=1M \
      --memory-total-size=10G \
      --memory-oper=write \
      --memory-access-mode=seq \
      run

    Random read latency

    Random read
    sysbench memory \
      --memory-block-size=1K \
      --memory-total-size=10G \
      --memory-oper=read \
      --memory-access-mode=rnd \
      run

    Memory bandwidth varies significantly between hardware generations. DDR5-equipped hosts produce noticeably higher throughput than DDR4 systems, particularly at larger block sizes. Small-block random access is more sensitive to memory latency than to raw bandwidth.
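
    Because throughput depends on block size, a short sweep makes the effect visible on your own instance. The sketch below assumes the "(N MiB/sec)" summary line that sysbench 1.0 prints for the memory test; the helper names and the three block sizes are illustrative choices, not part of the original procedure.

```shell
# Reads sysbench memory output on stdin and prints the MiB/sec figure.
mem_mib_per_sec() {
    sed -n 's/.*(\([0-9.]*\) MiB\/sec).*/\1/p'
}

# Runs the sequential write test at several block sizes, one result per line.
# The block sizes chosen here are illustrative.
sweep_block_sizes() {
    for bs in 1K 64K 1M; do
        rate=$(sysbench memory --memory-block-size="$bs" --memory-total-size=10G \
            --memory-oper=write --memory-access-mode=seq run | mem_mib_per_sec)
        printf '%s\t%s MiB/sec\n' "$bs" "$rate"
    done
}
```

    Call sweep_block_sizes on the instance under test; the function is only defined here so that nothing runs until you ask for it.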

    7. Disk I/O Benchmarking with fio

    fio is a flexible and widely used disk benchmark. Disk performance is not a single number: it varies by block size, queue depth, read and write mix, and access pattern. You need multiple tests to characterize a storage device meaningfully.

    Critical fio Parameters

    • --direct=1: bypasses the OS page cache so you measure storage, not RAM
    • --ioengine=io_uring: preferred on kernels 5.1+; fall back to libaio
    • --bs: block size; 4K for random I/O, 1M for sequential
    • --iodepth: queue depth; 1 for latency, 32–64 for peak throughput
    • --numjobs: parallel workers; equal to vCPUs to maximize load
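    • --size: ideally larger than RAM so caching cannot serve the whole working set; the examples below use 4G, so raise it on instances with more memory when disk space allows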
    • --runtime with --time_based: caps each test at a fixed duration

    Test 1: 4K Random Read IOPS

    The most important metric for database, web, and general application workloads.

    4K random read
    fio --name=randread4k \
        --filename=/tmp/fio-test \
        --rw=randread \
        --bs=4k \
        --size=4G \
        --numjobs=1 \
        --iodepth=32 \
        --direct=1 \
        --ioengine=io_uring \
        --runtime=60 \
        --time_based \
        --group_reporting
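
    The fio commands in this section write to /tmp/fio-test. On systems where /tmp is mounted as tmpfs, that file lives in RAM, so the results would describe memory rather than the disk. The check below is a sketch that assumes GNU stat from coreutils; fs_type and is_ram_backed are hypothetical helper names.

```shell
# Prints the filesystem type backing a path, using GNU stat (assumed available).
fs_type() {
    stat -f -c %T "$1"
}

# Succeeds when the path lives on a RAM-backed filesystem.
is_ram_backed() {
    case $(fs_type "$1") in
        tmpfs|ramfs) return 0 ;;
        *) return 1 ;;
    esac
}

if is_ram_backed /tmp; then
    echo "/tmp is RAM-backed; point --filename at a directory on the disk under test" >&2
fi
```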

    Test 2: 4K Random Write IOPS

    4K random write
    fio --name=randwrite4k \
        --filename=/tmp/fio-test \
        --rw=randwrite \
        --bs=4k \
        --size=4G \
        --numjobs=1 \
        --iodepth=32 \
        --direct=1 \
        --ioengine=io_uring \
        --runtime=60 \
        --time_based \
        --group_reporting

    Test 3: Mixed 70/30 Random Read/Write

    Approximates a typical database workload:

    Mixed 70/30
    fio --name=mixedrw \
        --filename=/tmp/fio-test \
        --rw=randrw \
        --rwmixread=70 \
        --bs=4k \
        --size=4G \
        --numjobs=4 \
        --iodepth=32 \
        --direct=1 \
        --ioengine=io_uring \
        --runtime=60 \
        --time_based \
        --group_reporting

    Test 4: Sequential Throughput

    Useful for large file workloads, backups, and streaming:

    Sequential 1M read
    fio --name=seqread1m \
        --filename=/tmp/fio-test \
        --rw=read \
        --bs=1M \
        --size=4G \
        --numjobs=1 \
        --iodepth=8 \
        --direct=1 \
        --ioengine=io_uring \
        --runtime=60 \
        --time_based \
        --group_reporting

    Test 5: Single-Queue Latency

    Critical for understanding interactive responsiveness:

    Latency probe
    fio --name=latency \
        --filename=/tmp/fio-test \
        --rw=randread \
        --bs=4k \
        --size=4G \
        --numjobs=1 \
        --iodepth=1 \
        --direct=1 \
        --ioengine=io_uring \
        --runtime=30 \
        --time_based

    Clean up

    Remove test file
    rm -f /tmp/fio-test

    Reading fio Output

    • IOPS: operations per second, higher is better
    • BW: throughput in MiB/s
    • clat: per-operation completion latency; review average and 99th percentile
    • clat 99.99: tail latency, important for user-facing applications

    Reference numbers: Modern NVMe-backed VPS storage should produce 20,000 to 100,000+ IOPS for 4K random reads at queue depth 32, with sub-millisecond average latency. SATA SSD-backed instances typically land in the 5,000 to 20,000 IOPS range. If random write IOPS are dramatically lower than reads, the underlying storage may use a write-through cache or be experiencing host-level write amplification.
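
    IOPS, bandwidth, and latency are linked, which gives you a quick sanity check on any fio report. Bandwidth is IOPS multiplied by block size, and by Little's law the average latency is roughly the number of outstanding I/Os divided by IOPS, provided the queue stays full. The helpers below are a hypothetical sketch of that arithmetic, and the inputs are illustrative rather than measured.

```shell
# Bandwidth in MiB/s implied by an IOPS figure at a block size given in KiB.
iops_to_mibs() {
    awk -v iops="$1" -v kib="$2" 'BEGIN { printf "%.1f\n", iops * kib / 1024 }'
}

# Average latency in ms implied by Little's law: outstanding I/Os / IOPS.
implied_latency_ms() {
    awk -v iops="$1" -v qd="$2" 'BEGIN { printf "%.2f\n", qd / iops * 1000 }'
}

# Illustrative inputs, not measurements:
iops_to_mibs 25600 4           # 25,600 IOPS at 4 KiB blocks
implied_latency_ms 100000 32   # 100,000 IOPS at queue depth 32
```

    With --numjobs greater than 1, the outstanding I/O count is iodepth multiplied by numjobs. If the BW or clat figures fio prints are far from these implied values, check that the test ran with the parameters you intended.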

    8. Stress Testing with stress-ng

    While sysbench and fio measure peak performance, stress-ng pushes the system to its limits to expose stability issues, thermal throttling on the host, and noisy neighbor effects under sustained load.

    CPU stress test

    CPU stress
    stress-ng --cpu $(nproc) --cpu-method matrixprod --metrics-brief --timeout 300s

    Memory stress test

    Caution: Be careful with allocation percentages; over-allocating will trigger the OOM killer.

    Memory stress
    stress-ng --vm 2 --vm-bytes 75% --vm-method all --verify --metrics-brief --timeout 300s
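
    Before choosing an allocation percentage, it can help to see how much memory the kernel currently reports as available. A small sketch, assuming a Linux /proc/meminfo that includes the MemAvailable field; mem_available_mib is a hypothetical helper name.

```shell
# Prints MemAvailable in MiB from a meminfo-format file (default /proc/meminfo).
mem_available_mib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    echo "available memory: $(mem_available_mib) MiB"
fi
```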

    Disk I/O stress test

    Disk stress
    stress-ng --hdd 2 --hdd-bytes 1G --metrics-brief --timeout 300s

    Combined system stress

    The canonical "everything at once" test:

    Combined stress
    stress-ng \
      --cpu $(nproc) \
      --io 2 \
      --vm 1 \
      --vm-bytes 25% \
      --hdd 1 \
      --hdd-bytes 1G \
      --timeout 600s \
      --metrics-brief

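    While stress-ng is running, open a second SSH session and monitor the system. iostat is provided by the sysstat package, which the install step earlier in this guide does not include, so install it first if the command is missing: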

    Live monitoring
    top
    vmstat 1
    iostat -xz 1

    Watch for CPU steal time (%st in top), which indicates the hypervisor is preempting your vCPUs to schedule other tenants. See Basic Resource Monitoring for more on these tools.

    9. Interpreting Steal Time

    CPU steal time is one of the most useful indicators of host contention on a shared VPS. Brief spikes during system events are normal. Consistent steal time above 5 percent during a sustained sysbench CPU run typically points to neighbor contention. If you see sustained high steal time, especially during off-peak hours when your own workload is quiet, that warrants a support ticket so the provider can investigate the host or migrate your instance. The companion guide Noisy Neighbor Symptoms vs. Real Performance Issues walks through the full diagnostic workflow.
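
    To put a number on steal time over a benchmark window without watching top, you can sample the aggregate cpu line of /proc/stat before and after the run. The sketch below assumes the Linux field order (user, nice, system, idle, iowait, irq, softirq, steal); steal_pct is a hypothetical helper name, and the two sample lines in the example are made up for illustration.

```shell
# Prints steal time as a percentage of all CPU time elapsed between two
# samples of the aggregate "cpu" line from /proc/stat.
steal_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Fields 2-9: user nice system idle iowait irq softirq steal.
            total = 0
            for (i = 2; i <= 9; i++) total += $i
            if (NR == 1) { t0 = total; s0 = $9 } else { t1 = total; s1 = $9 }
        }
        END {
            if (t1 > t0) { printf "%.1f\n", 100 * (s1 - s0) / (t1 - t0) } else { print "0.0" }
        }'
}

# Illustrative samples, not real measurements:
steal_pct "cpu 100 0 50 800 10 0 0 40 0 0" "cpu 200 0 100 1500 20 0 0 180 0 0"

# On a live system:
#   before=$(head -n 1 /proc/stat); sleep 60; after=$(head -n 1 /proc/stat)
#   steal_pct "$before" "$after"
```

    Taking the first sample just before a sysbench CPU run and the second just after gives one figure to compare against the 5 percent guideline.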

    10. Putting It All Together: A Quick Benchmark Script

    Save this as quickbench.sh for repeatable baseline captures:

    quickbench.sh
    #!/bin/bash
    # Capture a quick CPU, memory, and disk baseline in a timestamped log.
    set -e
    LOG="bench-$(date +%Y%m%d-%H%M%S).log"
    TESTFILE=/tmp/fio-test
    
    # Remove the fio test file on exit, including when set -e aborts a failed step.
    trap 'rm -f "$TESTFILE"' EXIT
    
    # Send stdout and stderr to both the terminal and the log file.
    exec > >(tee -a "$LOG") 2>&1
    
    echo "=== System Info ==="
    lscpu | grep -E "Model name|^CPU\(s\)|Thread|MHz"
    free -h
    df -h /
    
    echo "=== sysbench CPU ==="
    sysbench cpu --cpu-max-prime=20000 --threads="$(nproc)" --time=60 run
    
    echo "=== sysbench memory ==="
    sysbench memory --memory-block-size=1M --memory-total-size=10G run
    
    echo "=== fio 4K random read ==="
    fio --name=rr4k --filename="$TESTFILE" --rw=randread --bs=4k \
        --size=4G --numjobs=1 --iodepth=32 --direct=1 \
        --ioengine=io_uring --runtime=30 --time_based --group_reporting
    
    echo "=== fio 4K random write ==="
    fio --name=rw4k --filename="$TESTFILE" --rw=randwrite --bs=4k \
        --size=4G --numjobs=1 --iodepth=32 --direct=1 \
        --ioengine=io_uring --runtime=30 --time_based --group_reporting
    
    echo "=== Done. Log saved to: $LOG ==="

    Make it executable and run:

    Run the script
    chmod +x quickbench.sh
    ./quickbench.sh

    The script logs results to a timestamped file so you can archive baselines for later comparison.

    11. Common Pitfalls

    1. Testing with caching enabled: Always pass --direct=1 to fio, or you are benchmarking RAM speed instead of the storage device.
    2. Insufficient runtime: Anything under 30 seconds is too short to capture steady-state behavior. Queue dynamics, thermal effects, and write throttling all need time to stabilize.
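    3. Test data smaller than RAM: Without --direct=1, a test file smaller than system memory is served largely from the page cache, which inflates read numbers. Even with direct I/O, a larger file is less likely to fit in host-side caches, so raise --size above the 4G used in the examples when disk space allows.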
    4. Single test runs: Always run benchmarks three or more times and evaluate consistency, not just peak values.
    5. Benchmarking during deployment: Run on a clean, idle system, never while configuration management, package updates, or backups are running.
    6. Ignoring filesystem overhead: For the most accurate disk numbers, benchmark against a raw block device when possible. Only do this on an unused, unmounted device, because write tests overwrite its contents. See Choosing a Filesystem.
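
    For the repeated runs recommended in pitfall 4, a quick way to judge consistency is the coefficient of variation: the sample standard deviation as a percentage of the mean. The helper below is a hypothetical sketch that reads one result per line, and the three input values are illustrative.

```shell
# Reads one number per line on stdin; prints the mean and the coefficient of
# variation (sample standard deviation as a percentage of the mean).
run_consistency() {
    awk '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR < 2) { print "need at least two runs"; exit 1 }
            mean = sum / NR
            for (i = 1; i <= NR; i++) ss += (v[i] - mean) ^ 2
            printf "mean=%.1f cv=%.1f%%\n", mean, 100 * sqrt(ss / (NR - 1)) / mean
        }'
}

# Illustrative values standing in for three events-per-second results:
printf '%s\n' 90 100 110 | run_consistency
```

    What counts as an acceptable spread depends on the workload; the point is to record the spread alongside the mean rather than quoting the best run.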

    12. Comparing Across Providers or Plans

    When comparing benchmarks across providers, use identical command parameters and similar instance sizes. Document everything:

    • vCPU count and model (lscpu)
    • Total RAM (free -h)
    • Advertised storage type (NVMe, SATA SSD, network block storage)
    • Geographic region or datacenter
    • Test timestamps, since load varies by time of day
    • Kernel version (uname -r) and OS release (cat /etc/os-release)

    A spreadsheet with rows per provider and columns for each metric makes the comparison auditable when you revisit it months later.
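
    One way to feed such a spreadsheet is to append a CSV row per run. The sketch below only handles the quoting; csv_row is a hypothetical helper name, "example-provider" is a placeholder label, and the fields you record are up to you.

```shell
# Prints its arguments as one CSV row, quoting every field and doubling any
# embedded double quotes.
csv_row() {
    row=""
    for field in "$@"; do
        escaped=$(printf '%s' "$field" | sed 's/"/""/g')
        row="${row:+$row,}\"$escaped\""
    done
    printf '%s\n' "$row"
}

# Placeholder provider label plus a timestamp and the kernel release:
csv_row "example-provider" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(uname -r)"
```

    Redirect the output with >> to a file such as baselines.csv (an illustrative name) and add the benchmark figures as further arguments.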

    13. Next Steps

    After establishing baseline numbers, save them alongside your instance documentation. Re-run the same suite periodically and compare. Significant drift, particularly in random I/O latency or sustained CPU throughput, can signal host hardware issues or increased neighbor density and may justify opening a ticket or migrating to a different node.
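
    Drift is easiest to judge as a percentage change from the saved baseline. A minimal sketch with a hypothetical helper name and illustrative numbers:

```shell
# Prints the percentage change from a baseline value to a current value.
# Negative output means the current figure is lower than the baseline.
drift_pct() {
    awk -v base="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", 100 * (cur - base) / base }'
}

# Illustrative: baseline of 1000 events/s, current run of 850 events/s.
drift_pct 1000 850
```

    For throughput metrics a negative figure is a regression; for latency metrics the sign reads the other way, so a positive figure is the one to investigate.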

    For workload-specific benchmarking, consider following up with:

    • pgbench for PostgreSQL workloads
    • sysbench oltp for MySQL or MariaDB
    • wrk or k6 for HTTP application performance
    • iperf3 for network throughput between instances
    • redis-benchmark for Redis workloads

    Synthetic benchmarks give you a baseline. Application-level benchmarks tell you whether the instance will actually handle what you are about to put on it. Run both.
