
    Diagnosing and Fixing Disk Space Issues

    A systematic process for finding what is eating your disk, identifying the usual suspects, and safely reclaiming space — without breaking your running services.

    Part 1: Getting Your Bearings with df

    Before you start hunting, you need a high-level view of what is mounted and how full each filesystem is.

    Basic disk usage overview

    Check disk usage
    df -h

    The -h flag gives human-readable output. The column to watch is Use%. Anything above 85% warrants attention; above 95% is urgent.

    Example output
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/vda1        40G   38G  1.6G  96% /
    tmpfs           2.0G  512M  1.5G  26% /dev/shm
    /dev/vda2       100G   42G   58G  43% /var/lib/docker
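
    To turn that Use% check into something scriptable, you can filter df output directly. A minimal sketch, assuming GNU df's --output support, applying the same 85% rule of thumb:

```shell
# Print only mount points above an 85% usage threshold
df -h --output=pcent,target | awk 'NR > 1 { pct = $1; sub(/%/, "", pct); if (pct + 0 > 85) print $2, "at", $1 }'
```

    On the example output above, this prints only "/ at 96%".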

    Check inodes too

    You can run out of inodes before you run out of bytes. This happens when a directory contains millions of tiny files — common with mail spools, session storage, or misconfigured PHP applications.

    Check inode usage
    df -i

    If IUse% is near 100% on a filesystem that shows plenty of free space, inode exhaustion is your problem.
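
    du will not point you at the culprit here, since the offending files are tiny. Counting files per parent directory will. A sketch scanning /var (substitute whichever filesystem df -i flagged):

```shell
# Top 10 directories by file count: %h prints each file's parent directory
find /var -xdev -type f -printf '%h\n' 2>/dev/null | sort | uniq -c | sort -rn | head -10
```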

    Part 2: Drilling Down with du

    du (disk usage) reports actual space consumed by files and directories. It's your primary tool for narrowing down bloat.

    Find top-level hogs

    Largest directories under / (-x stays on this filesystem)
    du -xh --max-depth=1 / 2>/dev/null | sort -rh | head -20

    Drill down recursively

    Once you identify a large directory, keep going one level at a time:

    Drill into /var
    du -h --max-depth=1 /var | sort -rh | head -20

    Find individual large files

    Files larger than 500MB
    find / -xdev -type f -size +500M -exec ls -lh {} + 2>/dev/null | sort -k5 -rh

    The -xdev flag prevents crossing filesystem boundaries, avoiding Docker overlays and network mounts.

    Size-sorted with human-readable output

    Top large files in /var
    find /var -type f -printf '%s %p\n' 2>/dev/null | sort -rn | head -20 | awk '{printf "%.1fM %s\n", $1/1048576, $2}'

    Count files in a directory (inode diagnosis)

    Count files
    find /var/spool/clientmqueue -type f | wc -l

    If this returns millions of files, you've found your inode problem.
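
    Cleaning up such a directory needs care: a plain rm * fails with "argument list too long" once the file count is large, while find -delete streams the removals one at a time. The sketch below demonstrates on a throwaway directory; substitute the directory you diagnosed:

```shell
# Simulate a pile-up in a scratch directory, then clear it with find -delete
DIR=$(mktemp -d)
for i in $(seq 1 1000); do : > "$DIR/file$i"; done
echo "before: $(find "$DIR" -type f | wc -l) files"
find "$DIR" -type f -delete          # scales to millions of files
echo "after: $(find "$DIR" -type f | wc -l) files"
rmdir "$DIR"
```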

    Part 3: Interactive Exploration with ncdu

    ncdu is a terminal-based, interactive disk usage browser that makes drill-down much faster than repeated du invocations.

    Install ncdu
    # Debian/Ubuntu
    apt install ncdu
    
    # RHEL/CentOS/AlmaLinux
    dnf install ncdu
    
    # Or download a static binary (no dependencies)
    curl -Lo ncdu.tar.gz https://dev.yorhel.nl/download/ncdu-2.7-linux-x86_64.tar.gz
    tar -xf ncdu.tar.gz
    mv ncdu /usr/local/bin/
    Scan and browse
    ncdu /

    Navigation

    • Arrow keys or j/k to move
    • Enter to enter a directory
    • Left arrow or h to go back up a directory
    • q to quit
    • d to delete (asks confirmation)
    • e to show hidden files

    Scan to file for later analysis

    Scan once, browse later
    ncdu -o /tmp/disk-scan.json /
    # Later:
    ncdu -f /tmp/disk-scan.json
    Scan remote server over SSH
    ssh user@server 'ncdu -o- /' | ncdu -f-

    Part 4: The Usual Suspects

    Usual Suspects: Log Files

    Unrotated or misconfigured logs are the single most common cause of surprise disk consumption on VPS instances.

    Find the largest log files
    find /var/log -type f -name "*.log" -printf '%s %p\n' | sort -rn | head -20 | awk '{printf "%.1fM %s\n", $1/1048576, $2}'
    Check for uncompressed rotated logs
    find /var/log -type f -name "*.log.*" ! -name "*.gz" | head -20

    Common culprits

    • /var/log/auth.log — brute force SSH attempts can bloat to gigabytes
    • /var/log/nginx/access.log — high-traffic sites with no rotation
    • /var/log/mysql/error.log
    • Application logs in /var/www or custom paths

    Safely truncate a live log file

    Important: Do not rm a log file while a process has it open — the process will keep writing to the deleted inode and space will not be reclaimed.

    Truncate without closing file descriptor
    # Truncate to zero
    truncate -s 0 /var/log/nginx/access.log
    
    # Or truncate via shell redirection (same effect)
    > /var/log/nginx/access.log

    Fix the root cause with logrotate

    Check and test logrotate
    # Check logrotate config
    cat /etc/logrotate.conf
    ls /etc/logrotate.d/
    
    # Test a specific config without rotating
    logrotate --debug /etc/logrotate.d/nginx
    
    # Force immediate rotation
    logrotate --force /etc/logrotate.d/nginx
    Example logrotate config for Nginx
    /var/log/nginx/*.log {
        daily
        missingok
        rotate 7
        compress
        delaycompress
        notifempty
        create 0640 www-data adm
        sharedscripts
        postrotate
            nginx -s reopen
        endscript
    }

    Usual Suspects: Old Kernel Packages

    On Debian/Ubuntu, kernel updates leave old kernels installed. Each takes 150–300 MB. A system running for a year or two can accumulate a gigabyte or more.

    Debian/Ubuntu kernel cleanup
    # List all installed kernels
    dpkg --list | grep -E 'linux-image|linux-headers' | awk '{print $2, $3}'
    
    # See which kernel is currently running
    uname -r
    
    # Automated removal of kernels not needed for the current boot
    apt autoremove --purge
    RHEL-based kernel cleanup
    # List installed kernels
    rpm -q kernel
    
    # Keep only the 2 most recent kernels
    dnf remove $(dnf repoquery --installonly --latest-limit=-2 -q)

    Before removing kernels: Make sure your system boots successfully on the current kernel, and verify you can reach your provider's rescue console in case the next reboot fails.
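
    apt also lets you preview the removal before committing. A sketch using the simulate flag (-s), which prints the transaction without touching anything:

```shell
# -s simulates: nothing is removed. "Remv" lines show what would go.
apt-get -s autoremove --purge | grep -E '^Remv linux-(image|headers)' || echo "no old kernels to remove"
```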

    Usual Suspects: Package Manager Caches

    Package managers cache downloaded packages locally. On an active system, these caches can grow to several gigabytes.

    Debian/Ubuntu (APT)

    Clean APT cache
    # Check cache size
    du -sh /var/cache/apt/archives/
    
    # Remove only obsolete packages
    apt autoclean
    
    # Remove all cached packages (safe — they can be re-downloaded)
    apt clean

    RHEL/CentOS/AlmaLinux (DNF/YUM)

    Clean DNF cache
    # Check cache size
    du -sh /var/cache/dnf/
    
    # Clean all cached data
    dnf clean all

    Language-specific caches

    Python, npm, Composer
    # Python pip
    pip cache info
    pip cache purge
    
    # npm
    npm cache verify
    npm cache clean --force
    
    # Composer (PHP)
    composer clear-cache
    du -sh ~/.cache/composer/

    Usual Suspects: Orphaned Docker Data

    Docker is one of the most aggressive accumulators of silent disk usage. Stopped containers, dangling images, unused volumes, and stale build cache can quietly consume tens of gigabytes.

    Docker disk usage overview
    # Get a complete overview
    docker system df
    
    # Verbose breakdown per object
    docker system df -v

    Targeted cleanup

    Prune Docker resources
    # Prune dangling images only (safe default)
    docker image prune
    
    # Prune ALL unused images (including tagged but not running)
    docker image prune -a
    
    # Remove stopped containers
    docker container prune
    
    # Remove unused volumes (review first!)
    docker volume ls
    docker volume prune
    
    # Remove unused networks
    docker network prune
    
    # Clean build cache
    docker builder prune
    docker builder prune --keep-storage 1GB

    Nuclear option: docker system prune -a --volumes removes everything Docker is not actively using. Do not run this on a system with intentionally stopped containers or databases stored in Docker volumes without external backups.

    Usual Suspects: Temp Files & Core Dumps

    Temporary files

    Clean temp files
    # Check temp directory sizes
    du -sh /tmp /var/tmp
    
    # Find old files in /tmp (not modified in over 7 days)
    find /tmp -type f -mtime +7 -delete
    
    # systemd-tmpfiles manages /tmp cleanup on systemd systems
    systemd-tmpfiles --clean

    Core dumps

    Application crashes leave core dump files that can be enormous.

    Find and clean core dumps
    # Find core dumps
    find / -xdev -type f \( -name "core" -o -name "core.[0-9]*" \) 2>/dev/null | head -20
    
    # Check systemd's coredump storage
    ls -lh /var/lib/systemd/coredump/
    coredumpctl list
    
    # Remove all stored coredumps
    rm -rf /var/lib/systemd/coredump/*
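
    To stop the directory from regrowing, systemd-coredump can cap its own storage. A config sketch for /etc/systemd/coredump.conf (the limits shown are illustrative; both options are documented in coredump.conf(5)):

```ini
[Coredump]
# Cap the total space used by stored dumps
MaxUse=1G
# Never let dumps eat into the last 2G of free space
KeepFree=2G
```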

    Usual Suspects: Mail Spool

    Undelivered or bounced mail can accumulate in the mail spool, consuming both space and inodes.

    Inspect and clean mail spool
    # Check postfix queue sizes
    postqueue -p | tail -1
    
    # Check the mail spool directory
    du -sh /var/spool/postfix/deferred /var/spool/postfix/active /var/spool/postfix/bounce
    
    # Flush the deferred queue (retry all)
    postqueue -f
    
    # Delete all messages in the deferred queue (be careful)
    postsuper -d ALL deferred

    Part 5: Reclaiming Space Safely

    Verify space was actually freed

    Deleted files may not immediately show as free space if a running process still has them open. This is common with log files that were deleted instead of truncated.

    Find deleted files still held open
    # Find deleted files still held open by running processes
    lsof +L1 | grep deleted

    If restarting the process isn't an option:

    Truncate via /proc
    # Get the file descriptor path from lsof output (PID and FD number)
    # Then truncate via /proc
    truncate -s 0 /proc/<PID>/fd/<FD>
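
    The mechanism is easy to demonstrate safely. This self-contained sketch holds a deleted file open on a spare descriptor, then reclaims it through /proc exactly as above (it touches only a scratch file in /tmp):

```shell
# Hold a deleted file open, then truncate it via /proc
tmp=$(mktemp)
exec 3>"$tmp"                   # keep the file open on descriptor 3
head -c 1048576 /dev/zero >&3   # write 1 MB through the descriptor
rm "$tmp"                       # unlinked, but the space is still in use
truncate -s 0 "/proc/$$/fd/3"   # reclaimed, no restart needed
exec 3>&-                       # close the descriptor
```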

    Check filesystem free blocks

    Verify filesystem state
    # ext4 filesystems
    tune2fs -l /dev/vda1 | grep -E "Block count|Free blocks|Block size"
    
    # XFS filesystems
    xfs_info /dev/vda1

    Set Up Monitoring to Prevent Recurrence

    /usr/local/bin/disk-check.sh
    #!/bin/bash
    THRESHOLD=85
    ALERT_EMAIL="you@example.com"
    
    df -x tmpfs -x devtmpfs --output=pcent,target | tail -n +2 | while read -r USAGE MOUNT; do
        USAGE=${USAGE%\%}
        if [ "$USAGE" -gt "$THRESHOLD" ]; then
            echo "DISK ALERT: $MOUNT is at ${USAGE}% on $(hostname)" | \
                mail -s "Disk Space Warning" "$ALERT_EMAIL"
        fi
    done
    Add to cron
    # Check every 30 minutes
    */30 * * * * /usr/local/bin/disk-check.sh

    For a more complete solution, consider integrating with your existing monitoring stack. Prometheus with node_exporter exposes disk metrics out of the box. If you're running Grafana, set threshold alerts on node_filesystem_avail_bytes.
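
    As a concrete starting point, an alerting rule on that metric might look like the following sketch (the label matchers and the 15% threshold are assumptions to adapt to your setup):

```yaml
groups:
  - name: disk-space
    rules:
      - alert: DiskSpaceLow
        # Fire when under 15% of a filesystem remains, ignoring tmpfs/overlay
        expr: node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes < 0.15
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space on {{ $labels.instance }} ({{ $labels.mountpoint }})"
```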

    Quick Reference: Space Recovery Cheatsheet

    Action                          Command                                    Risk
    Truncate a live log             truncate -s 0 /path/to/app.log             Safe
    Force log rotation              logrotate --force /etc/logrotate.d/app     Safe
    Clean APT cache                 apt clean                                  Safe
    Clean DNF cache                 dnf clean all                              Safe
    Remove old kernels              apt autoremove --purge                     Low
    Prune dangling Docker images    docker image prune                         Safe
    Prune all unused Docker images  docker image prune -a                      Medium
    Prune Docker volumes            docker volume prune                        High
    Full Docker system prune        docker system prune -a --volumes           High
    Delete deferred mail queue      postsuper -d ALL deferred                  Medium
    Clear pip cache                 pip cache purge                            Safe
    Clear npm cache                 npm cache clean --force                    Safe

    Rule of thumb: Always run df -h before and after each cleanup step to confirm space was reclaimed. Start with safe operations and work toward impactful ones only if necessary.
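
    That before-and-after habit is easy to script. A sketch measuring the delta on / (insert whichever cleanup step you are running where indicated):

```shell
# Report how many MB a cleanup step reclaimed on /
before=$(df -k --output=avail / | tail -1)
# ... run your cleanup step here, e.g. apt clean ...
after=$(df -k --output=avail / | tail -1)
echo "Reclaimed $(( (after - before) / 1024 )) MB on /"
```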