Applies to: All RamNode VPS Plans | Ubuntu / Debian / CentOS / AlmaLinux | Rev. 2025
High CPU usage is one of the most common performance complaints on a VPS. The root cause can range from a legitimate spike in traffic to a runaway process, a poorly optimized script, or a compromised server silently mining cryptocurrency. This guide walks through a structured diagnostic process: how to spot the problem, understand what the numbers mean, and decide on the appropriate response.
1. Using top and htop to Identify Offending Processes
top — The Universal Starting Point
The top command is available on every Linux system without installation. Launch it with:
top
Key columns to focus on:
| Column | What It Tells You |
|---|---|
| %CPU | CPU percentage consumed by this process. Can exceed 100% on multi-core systems (200% = 2 full cores) |
| %MEM | Percentage of physical RAM in use |
| PID | Process ID — required for kill commands and deeper investigation |
| USER | The account running the process. www-data or nobody often indicates a web process; root may indicate a system task |
| COMMAND | The executable name. Press c to toggle the full command line, including path and arguments |
| TIME+ | Cumulative CPU time consumed since the process started — useful for spotting runaway processes |
Useful top shortcuts: Press P to sort by CPU. Press M to sort by memory. Press k to kill a process by PID. Press 1 to expand per-core CPU view.
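For scripted triage or logging, the same "sort by CPU" view can be approximated without the interactive screen. The following is a minimal sketch, not part of the original procedure: `cpu_hogs` is a hypothetical helper name, and it assumes its input has the PID in the first column and %CPU in the second.

```shell
# cpu_hogs [THRESHOLD]: read process lines on stdin ("PID %CPU ...") and
# print only those at or above THRESHOLD percent CPU (default 50).
# Hypothetical helper; the column order is an assumption set by the caller.
cpu_hogs() {
    threshold="${1:-50}"
    awk -v t="$threshold" '$2 + 0 >= t + 0 { print }'
}

# Example on a live Linux host (procps ps):
#   ps -eo pid=,pcpu=,user=,args= --sort=-pcpu | cpu_hogs 80
```

Note that the %CPU reported by ps is averaged over the lifetime of the process, whereas top shows usage over its refresh interval, so a process that only recently started spinning may look milder in ps than in top.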
htop — An Interactive Alternative
htop provides a more readable, color-coded interface with mouse support. Install it if not already present:
# Debian / Ubuntu
apt install htop -y
# CentOS / AlmaLinux
dnf install htop -y
htop
htop advantages over top:
- Horizontal bars at the top show per-core utilization at a glance — immediately reveals whether one core is pegged vs. all cores under load
- Scroll through the process list and kill processes without memorizing keyboard shortcuts
- F4 (Filter) narrows the list to a specific process name, e.g., php-fpm or python3
- Tree view (F5) shows parent/child relationships, revealing which master process spawned multiple workers
TIP: On a fresh VPS you may not have htop installed. top is always available and is sufficient for initial triage.
2. Understanding Load Average vs. CPU Count
What Load Average Actually Means
The three load average numbers shown in top and /proc/loadavg represent the average number of runnable (or uninterruptible) tasks over the past 1 minute, 5 minutes, and 15 minutes respectively. A common misconception is that these numbers directly map to CPU percentage — they do not.
uptime
# Output: 14:32:11 up 22 days, load average: 2.41, 1.87, 1.55
cat /proc/loadavg
# Output: 2.41 1.87 1.55 3/312 18842
Interpreting Load Relative to CPU Count
The key formula: a load average equal to the number of logical CPUs represents 100% utilization. A load higher than your CPU count means processes are waiting for CPU time.
nproc
# Or get more detail
lscpu | grep '^CPU(s):'
| Scenario | Interpretation |
|---|---|
| Load = CPU count | System is fully utilized — acceptable if short-lived |
| Load < CPU count | Headroom exists — CPU is not the bottleneck |
| Load 2× CPU count | Significant pressure — processes queuing for CPU time |
| Load 4×+ CPU count | Severe overload — expect sluggish SSH, slow response times |
WARNING: A 1-VPS plan with 1 vCPU and a load average of 1.0 is sitting at 100% utilization. A load of 2.0 means half the tasks are waiting. This is why high load on small VPS plans causes noticeable degradation faster than on dedicated servers.
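The load-to-CPU comparison in the table can be sketched as a small calculation. The helper names `load_per_cpu` and `classify_load` are hypothetical, and the exact cut-off points between the bands are an assumption, since the table only names the anchor points (1×, 2×, 4×).

```shell
# load_per_cpu LOAD CPUS: print the load average divided by the CPU count.
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# classify_load LOAD CPUS: map the ratio onto the four bands in the table.
# Band edges (1, 2, 4) are assumed cut-offs, not values from the guide.
classify_load() {
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        r = load / cpus
        if      (r < 1) print "headroom"
        else if (r < 2) print "fully utilized"
        else if (r < 4) print "significant pressure"
        else            print "severe overload"
    }'
}

# Example on a live Linux host:
#   classify_load "$(cut -d" " -f1 /proc/loadavg)" "$(nproc)"
```

With a load of 3 on 4 CPUs, `load_per_cpu 3 4` prints 0.75, which falls in the "headroom" band.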
Reading the Trend
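Compare all three numbers together, remembering that the 1-minute average comes first and the 15-minute average last. A load of 8.0 / 4.0 / 2.0 on a 4-core system is increasing: the most recent minute is far busier than the 15-minute average, so something is accumulating and requires immediate attention. A load of 1.5 / 3.0 / 4.5 is decreasing: the spike may be passing.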
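Because the first figure is the 1-minute average and the third is the 15-minute average, comparing those two gives the direction of the trend. A minimal sketch, with `load_trend` as a hypothetical helper name:

```shell
# load_trend ONE FIVE FIFTEEN: compare the 1-minute and 15-minute averages.
# "rising" means the most recent minute is busier than the long-run average.
load_trend() {
    awk -v one="$1" -v fifteen="$3" 'BEGIN {
        if      (one > fifteen) print "rising"
        else if (one < fifteen) print "falling"
        else                    print "steady"
    }'
}

# Example on a live Linux host (word splitting of the three fields is intended):
#   load_trend $(cut -d" " -f1-3 /proc/loadavg)
```

This ignores the 5-minute figure for simplicity; it is still worth reading all three numbers by eye when the result is close.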
3. Distinguishing User, System, and I/O Wait
The CPU Breakdown in top
The summary line beginning with %Cpu(s) in top breaks CPU usage into several categories. Press 1 to expand to per-core view:
%Cpu(s): 45.2 us, 8.1 sy, 0.0 ni, 38.5 id, 7.9 wa, 0.0 hi, 0.2 si, 0.0 st
| Field | What It Means |
|---|---|
| us (user) | CPU time spent running user-space code. High values point to application-level workloads: PHP, Python, Node.js, etc. |
| sy (system) | CPU time in kernel-space. High values suggest frequent system calls — disk I/O, network operations, or context switching |
| ni (nice) | CPU time for lower-priority user-space processes. Usually low. |
| id (idle) | Remaining free CPU. Subtract from 100 to get rough total utilization. |
| wa (iowait) | Time the CPU waited for I/O. High wa (above 20–30%) suggests a disk bottleneck rather than a true CPU problem. |
| hi (hw irq) | Hardware interrupt requests — usually near zero unless heavy network traffic. |
| si (sw irq) | Software interrupts. High values can indicate heavy network processing. |
| st (steal) | CPU time stolen by the hypervisor. Persistent steal above 5–10% suggests overselling. |
Diagnostic Decision Tree
- High us (user): Application code is the problem. Find which process via top (press P to sort by CPU), then profile the application.
- High sy (system): Kernel is doing heavy work. Check for excessive forks, context switching, or filesystem churn with vmstat 1 5.
- High wa (iowait): I/O is blocking. Use iostat -x 1 5 to confirm which device is saturated and iotop to find the disk-hungry process. Do not confuse with a CPU problem.
- High st (steal): The hypervisor is throttling your VPS. Consider upgrading your plan or contacting support if steal is consistently elevated.
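The decision tree can be roughly automated by parsing the %Cpu(s) summary line. This is a sketch under assumptions: `cpu_hint` is a hypothetical helper, it expects the procps-ng line format shown in this section with "." as the decimal mark, and it simply reports whichever of us, sy, wa, and st is largest, even when all of them are low.

```shell
# cpu_hint: read `top -b` output on stdin and name the dominant non-idle field.
# Assumes the "%Cpu(s): 45.2 us, 8.1 sy, ..." format and a "." decimal mark.
cpu_hint() {
    awk '
    /^%Cpu/ {
        sub(/^%Cpu[^:]*:/, "")            # drop the "%Cpu(s):" label
        n = split($0, parts, ",")
        for (i = 1; i <= n; i++) {
            split(parts[i], kv, " ")      # kv[1] = value, kv[2] = field name
            val[kv[2]] = kv[1] + 0
        }
        hint["us"] = "application code: sort top by P, then profile"
        hint["sy"] = "kernel work: check vmstat 1 5"
        hint["wa"] = "disk I/O: check iostat -x 1 5 or iotop"
        hint["st"] = "hypervisor steal: consider a larger plan or contact support"
        best = "us"
        if (val["sy"] > val[best]) best = "sy"
        if (val["wa"] > val[best]) best = "wa"
        if (val["st"] > val[best]) best = "st"
        printf "%s %.1f -> %s\n", best, val[best], hint[best]
        exit                              # only the first summary line is used
    }'
}

# Example on a live host:
#   top -b -n 1 | cpu_hint
```

Treat the output as a pointer to the right branch of the tree rather than a verdict; a single snapshot can miss short spikes.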
4. Common Culprits
Runaway PHP Processes
PHP-FPM worker processes are one of the most frequent CPU offenders on VPS stacks running WordPress, WooCommerce, or Drupal. A slow database query, an infinite loop in a plugin, or a traffic surge can cause workers to pile up.
# Count active PHP-FPM workers
ps aux | grep php-fpm | grep -v grep | wc -l
# See which PHP processes are consuming CPU
ps aux --sort=-%cpu | grep php | head -20
# Check PHP-FPM pool status (if status endpoint is enabled)
curl 'http://127.0.0.1/status?full'
Warning signs specific to PHP:
- Dozens of php-fpm workers all stuck at the same CPU% with identical memory footprint
- Workers accumulating over time without dying (check TIME+ in top — values above several minutes are suspicious)
- Access logs showing a flood of POST requests to /xmlrpc.php or wp-login.php, which indicates bot traffic triggering PHP execution
WARNING: A common PHP-FPM trap: setting pm.max_children too high consumes all available RAM, which causes the kernel to swap, which causes iowait to spike, making the system appear CPU-bound when it is actually memory-bound.
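One way to avoid this trap is to derive pm.max_children from a memory budget instead of guessing. The sketch below uses a common rule of thumb that is an assumption here, not something this guide prescribes: subtract the RAM reserved for the OS, database, and other services from total RAM, then divide by the average resident size of one worker. `suggest_max_children` is a hypothetical helper and all inputs are whole megabytes.

```shell
# suggest_max_children TOTAL_MB RESERVED_MB PER_WORKER_MB
# Prints an upper bound for pm.max_children under the assumed rule of thumb:
#   (total RAM - RAM reserved for everything else) / average worker size
suggest_max_children() {
    total_mb="$1"
    reserved_mb="$2"
    per_worker_mb="$3"
    if [ "$per_worker_mb" -le 0 ] || [ "$reserved_mb" -ge "$total_mb" ]; then
        echo 0          # no memory budget left for PHP workers
        return 1
    fi
    echo $(( (total_mb - reserved_mb) / per_worker_mb ))
}

# Example: 2 GB plan, 512 MB kept for OS and database, 64 MB per worker
#   suggest_max_children 2048 512 64
```

The per-worker figure is best measured on the running system, for example from the RSS column of ps aux for the php-fpm workers, since it varies widely between applications.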
Stuck or Looping Cron Jobs
Cron jobs that fail to exit, are scheduled too frequently, or run longer than their interval can stack up and consume significant CPU.
# Check running cron-related processes
ps aux | grep -E '(cron|curl|wget|php|python|bash)' | grep -v grep
# View all user crontabs
for user in $(cut -f1 -d: /etc/passwd); do crontab -u $user -l 2>/dev/null; done
# Check system-level cron jobs
cat /etc/crontab
ls /etc/cron.d/ /etc/cron.hourly/ /etc/cron.daily/
Symptoms of a stuck cron job:
- Multiple instances of the same script in ps output with increasing PID numbers
- Process has a high TIME+ value relative to how long it should legitimately run
- CPU spikes occur on a predictable schedule correlating with a cron entry
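The first symptom, several copies of the same script running at once, can be checked by counting identical command lines. A minimal sketch, with `repeated_commands` as a hypothetical helper that reads one command line per input line:

```shell
# repeated_commands [MIN]: read command lines on stdin and print
# "COUNT COMMAND" for every command seen at least MIN times (default 2),
# most frequent first.
repeated_commands() {
    min="${1:-2}"
    awk -v min="$min" '
        { seen[$0]++ }
        END {
            for (cmd in seen)
                if (seen[cmd] >= min) printf "%d %s\n", seen[cmd], cmd
        }' | sort -rn
}

# Example on a live host:
#   ps -eo args= | repeated_commands 3
```

Daemons that legitimately run many identical workers (php-fpm pools, web server workers, kernel threads) will also appear in the output, so the list needs to be read against what is expected on the server.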
Cryptocurrency Miners from Compromised Servers
Cryptocurrency miners are frequently deployed on compromised servers via vulnerabilities in web applications, exposed Docker APIs, or weak SSH credentials. They typically manifest as sustained 80–100% CPU usage.
# Look for known miner process names
ps aux | grep -iE '(xmrig|minerd|cpuminer|kworker|kthreadd)' | grep -v grep
# Check for processes with no associated file on disk (deleted binaries)
ls -la /proc/*/exe 2>/dev/null | grep deleted
# Listening sockets (ss) and established connections (netstat): look for unfamiliar remote hosts and ports
ss -tulpn | grep -v '127.0.0.1'
netstat -antp | grep ESTABLISHED
# Check for recently modified or new binaries in common drop locations
find /tmp /var/tmp /dev/shm -type f -executable 2>/dev/null
# Inspect process binary path (replace PID)
ls -la /proc/PID/exe
cat /proc/PID/cmdline | tr '\0' ' '
DANGER: Miners often disguise themselves with names that resemble legitimate system processes such as kworker, sshd, or java. Always verify suspicious high-CPU processes by checking their actual binary path via /proc/PID/exe rather than trusting the COMMAND column alone.
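The /proc checks above can be gathered into one helper so the real binary path, command line, and owner are seen side by side. This is a sketch for Linux only; `describe_pid` is a hypothetical name, and `stat -c` assumes GNU coreutils. Reading another user's /proc/PID/exe generally requires root.

```shell
# describe_pid PID: print the resolved binary path, full command line,
# and owning user for a process, using /proc instead of the COMMAND column.
describe_pid() {
    pid="$1"
    if [ ! -d "/proc/$pid" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # readlink shows "... (deleted)" when the binary was removed after launch
    printf 'exe:     %s\n' "$(readlink "/proc/$pid/exe" 2>/dev/null || echo '?')"
    printf 'cmdline: %s\n' "$(tr '\0' ' ' < "/proc/$pid/cmdline")"
    printf 'owner:   %s\n' "$(stat -c %U "/proc/$pid")"
}

# Example: inspect the current shell
#   describe_pid $$
```

An exe of "?" means the link could not be read, which happens for kernel threads and for processes owned by another user when not running as root.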
Red flags that suggest a miner rather than a legitimate process:
- Process binary resolves to /tmp, /dev/shm, or a hidden directory
- CPU usage is consistently high (85–99%) across all cores, sustained over hours
- The process was started recently but the server has been running for weeks
- Active outbound connections on non-standard ports (TCP 3333, 4444, 5555, 7777, 14444 — common mining pool ports)
- No log entries or shell history explaining when or how the process started
- crontab, /etc/rc.local, or systemd unit files contain entries pointing to the executable
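Two of the red flags above, the binary location and the destination port, are simple enough to test mechanically. The helpers below are a heuristic sketch with hypothetical names; the port list is the one given in this section, and the path rules will also flag some legitimate software installed under hidden directories.

```shell
# is_suspicious_exe PATH: succeed if a resolved /proc/PID/exe path matches
# one of the location-based red flags.
is_suspicious_exe() {
    case "$1" in
        *" (deleted)")                return 0 ;;  # binary removed after launch
        /tmp/*|/var/tmp/*|/dev/shm/*) return 0 ;;  # world-writable drop locations
        */.*)                         return 0 ;;  # hidden file or directory in the path
        *)                            return 1 ;;
    esac
}

# is_pool_port PORT: succeed if PORT is one of the common mining pool ports
# listed in this guide.
is_pool_port() {
    case "$1" in
        3333|4444|5555|7777|14444) return 0 ;;
        *)                         return 1 ;;
    esac
}

# Example:
#   is_suspicious_exe "$(readlink /proc/PID/exe)" && echo "check this process"
```

A match is a reason to look closer, not proof of compromise, and a miner running from an ordinary path on port 443 would pass both checks.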
5. When to Kill vs. When to Investigate
Decision Framework
The appropriate response depends on whether you understand what the process is and whether it is expected. Killing first and investigating later is reasonable in a production emergency — but investigation must still follow.
| Scenario | Indicators |
|---|---|
| Kill Immediately | |
| Investigate First | |
How to Kill a Process Safely
# Graceful termination (allows process to clean up) — try first
kill -15 PID
# Force kill (use if -15 has no effect after a few seconds)
kill -9 PID
# Kill all processes matching a name
pkill -9 processname
# Kill all PHP-FPM processes (master included), then restart the service; adjust the unit name to your PHP version
pkill -9 php-fpm && systemctl restart php8.1-fpm
DANGER: Sending kill -9 to a database process (MySQL, PostgreSQL, Redis) without a graceful shutdown can corrupt data files or require crash recovery on the next start. Use systemctl stop servicename instead, which sends the correct signal sequence.
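The "try -15 first, then -9" sequence from this section can be wrapped in a helper so the grace period is applied consistently. A minimal sketch: `stop_gracefully` is a hypothetical name and the default 5-second grace period is an assumption. It should not be pointed at database processes, for the reason given in the note above.

```shell
# stop_gracefully PID [GRACE_SECONDS]: send SIGTERM, wait up to GRACE_SECONDS
# (default 5) for the process to exit, then fall back to SIGKILL.
# Returns 0 once the process is gone.
stop_gracefully() {
    pid="$1"
    grace="${2:-5}"
    # kill -0 sends no signal; it only tests whether we can signal the PID
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "cannot signal $pid (no such process, or no permission)" >&2
        return 1
    fi
    kill -15 "$pid" 2>/dev/null
    waited=0
    while [ "$waited" -lt "$grace" ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # exited after SIGTERM
        sleep 1
        waited=$((waited + 1))
    done
    kill -0 "$pid" 2>/dev/null || return 0
    kill -9 "$pid" 2>/dev/null                   # grace period expired
    sleep 1
    ! kill -0 "$pid" 2>/dev/null
}

# Example:
#   stop_gracefully 18842 10
```

The PID in the example is illustrative only. For services managed by systemd, systemctl stop remains the better choice because it applies the unit's own stop logic.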
Post-Kill Investigation Steps
Whether the process was legitimate or malicious, document what happened and prevent recurrence:
- For PHP/application spikes: Review slow query logs (/var/log/mysql/slow.log), PHP-FPM access logs, and application error logs in the timeframe surrounding the spike.
- For stuck cron jobs: Add a lock mechanism using flock to prevent concurrent execution, and set a maximum runtime with the timeout prefix in the crontab entry.
- For suspected miners/compromise: Do not just kill the process. Check for persistence mechanisms in crontab, /etc/rc.local, /etc/cron.d/, and systemd unit files. Consider the server compromised until proven otherwise and review Fail2ban and auth.log for unauthorized access.
- In all cases: Record the PID, binary path, user, and network connections before terminating if possible. Use cp /proc/PID/exe /tmp/evidence.bin to preserve the binary for analysis.
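The flock and timeout combination recommended for cron jobs can be sketched as a small wrapper. `run_exclusive` is a hypothetical name; it relies on flock from util-linux and timeout from GNU coreutils, both standard on the distributions this guide covers.

```shell
# run_exclusive LOCKFILE MAX_SECONDS COMMAND [ARGS...]
# Runs COMMAND only if no other holder has LOCKFILE, and stops it after
# MAX_SECONDS. Exit status: 1 if the lock is already held (flock -n),
# 124 if the command timed out, otherwise the command's own status.
run_exclusive() {
    lockfile="$1"
    max_seconds="$2"
    shift 2
    flock -n "$lockfile" timeout "$max_seconds" "$@"
}

# Example (paths are placeholders):
#   run_exclusive /var/lock/report.lock 300 /usr/local/bin/report.sh
```

In a crontab entry the same idea can be written inline, for example `*/5 * * * * flock -n /var/lock/report.lock timeout 300 /usr/local/bin/report.sh`, where the lock file and script paths are again placeholders. With `-n`, an overlapping run exits immediately instead of queuing behind the previous one.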
6. Quick Reference: Diagnostic Commands
| Command | Purpose |
|---|---|
| top -b -n 1 | Single-snapshot output to stdout — useful for logging |
| ps aux --sort=-%cpu | head | Top CPU-consuming processes sorted at point-in-time |
| htop | Interactive process viewer with per-core bars |
| uptime | Load averages and system uptime |
| nproc | Number of logical CPUs (denominator for load averages) |
| vmstat 1 5 | CPU breakdown, context switches, and memory per second for 5 samples |
| iostat -x 1 5 | Per-disk I/O stats to confirm or rule out iowait as root cause |
| iotop | Real-time per-process disk I/O — requires root |
| ss -tulpn | Open ports and associated processes |
| netstat -antp | All TCP connections with PIDs — identify miner pool connections |
| find /tmp /var/tmp -executable | Scan for executable files in world-writable directories |
| ls -la /proc/PID/exe | Resolve true binary path for a process |
| strace -p PID | Trace system calls of a live process |
| lsof -p PID | List all files and sockets open by a process |
Need more help? Open a support ticket at my.ramnode.com or consult the RamNode knowledge base for additional guides on server hardening, PHP-FPM tuning, and incident response procedures.
