
    Deploy Ollama on Your RamNode VPS

    Run large language models locally with CPU-optimized configuration. Perfect for development, testing, and moderate production workloads.

    • Setup Time: 30-45 min
    • Recommended RAM: 8GB+
    • Difficulty: Intermediate
    • Default Port: 11434

    Overview

    Ollama is a powerful framework for running large language models locally. While traditionally GPU-accelerated, it can run effectively on CPU-only systems with proper configuration. This guide covers deploying Ollama on RamNode VPS infrastructure, optimized for CPU performance.

    Why No GPU Plans?

    RamNode currently focuses on CPU-based VPS hosting without GPU acceleration. However, modern CPUs can effectively run smaller language models with proper optimization:

    Advantages

    • Cost-effective
    • Reliable
    • Easier deployment

    Trade-offs

    • Slower inference
    • Limited to smaller models

    Sweet Spot

    • 7B parameter models
    • 16GB+ RAM systems

    System Requirements Analysis

    Memory Requirements by Model Size

    Model Size                           Memory Requirement
    3B models (Phi-2, Orca Mini 3B)      ~4GB RAM
    7B models (Llama 2 7B, Mistral 7B)   ~8GB RAM
    13B models (Llama 2 13B)             ~16GB RAM
    20B+ models                          ~32GB+ RAM
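
    Before pulling anything, it can help to compare the table above against what the VPS actually has free. A minimal sketch (the 8GB threshold is just an example for a 7B model; adjust it to the row you are targeting):

    # Show memory in GiB, then warn if less than ~8GiB is available for a 7B model
    free -g
    available=$(free -g | awk '/^Mem:/ {print $7}')
    if [ "$available" -lt 8 ]; then
      echo "Under 8GiB available - pick a 3B model or add swap first"
    fi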

    CPU Optimization Factors

    • Thread Count: Ollama scales with CPU cores (4-8 cores optimal)
    • Cache Size: Larger L3 cache improves model loading
    • Memory Bandwidth: Critical for model inference speed
    • Architecture: x86_64 with AVX2 support (standard on RamNode); a quick way to verify these factors is shown below
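
    Each of these factors can be checked up front on the VPS with standard tools; a quick sketch:

    # Count CPU cores visible to Ollama
    nproc
    # Confirm AVX2 support (prints "avx2" if the CPU supports it)
    grep -o 'avx2' /proc/cpuinfo | sort -u
    # Show CPU model and L3 cache size
    lscpu | grep -E 'Model name|L3'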

    Installation Process

    Step 1: Server Preparation

    # Update system packages
    sudo apt update && sudo apt upgrade -y
    # Install essential packages
    sudo apt install -y curl wget git htop iotop unzip
    # Install performance monitoring tools
    sudo apt install -y sysstat nethogs
    # Optimize system for memory-intensive workloads
    echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
    echo 'vm.vfs_cache_pressure=50' | sudo tee -a /etc/sysctl.conf
    sudo sysctl -p

    Step 2: Install Ollama

    # Download and install Ollama
    curl -fsSL https://ollama.ai/install.sh | sh
    # Verify installation
    ollama --version
    # Check service status
    sudo systemctl status ollama
    # Enable auto-start
    sudo systemctl enable ollama

    Step 3: Configure Ollama Service

    # Create service override directory
    sudo mkdir -p /etc/systemd/system/ollama.service.d
    # Create optimization config
    sudo tee /etc/systemd/system/ollama.service.d/override.conf << EOF
    [Service]
    Environment="OLLAMA_HOST=0.0.0.0"
    Environment="OLLAMA_NUM_PARALLEL=1"
    Environment="OLLAMA_MAX_LOADED_MODELS=1"
    Environment="OLLAMA_FLASH_ATTENTION=1"
    Environment="OLLAMA_LLM_LIBRARY=cpu"
    ExecStart=
    ExecStart=/usr/local/bin/ollama serve
    EOF
    # Reload systemd and restart
    sudo systemctl daemon-reload
    sudo systemctl restart ollama
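
    To confirm systemd actually picked up the drop-in, you can print the merged unit and the environment it applies; the variables listed should mirror override.conf:

    # Show the unit file together with all drop-in overrides
    systemctl cat ollama
    # Show the environment variables systemd passes to the service
    sudo systemctl show ollama -p Environment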

    Step 4: Firewall and Security Setup

    # Install and configure UFW
    sudo apt install -y ufw
    # Allow SSH (adjust port if needed)
    sudo ufw allow 22/tcp
    # Allow Ollama API (default port 11434)
    sudo ufw allow 11434/tcp
    # Enable firewall
    sudo ufw --force enable
    # Check status
    sudo ufw status
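
    Opening 11434 to the whole internet exposes an unauthenticated API. If only known clients need access, a tighter variant is to allow the port per source address (203.0.113.10 below is a placeholder; substitute your client's IP):

    # Replace the open rule with a per-source rule (example client IP)
    sudo ufw delete allow 11434/tcp
    sudo ufw allow from 203.0.113.10 to any port 11434 proto tcp
    sudo ufw status numbered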

    Model Deployment

    Small Models

    4-8GB RAM systems

    # Phi-2 (2.7B) - General tasks
    ollama pull phi
    # TinyLlama (1.1B) - Ultra-light
    ollama pull tinyllama
    # Orca Mini (3B) - lightweight chat
    ollama pull orca-mini

    Medium Models

    8-16GB RAM systems

    # Llama 2 7B - Standard
    ollama pull llama2
    # Mistral 7B - High perf
    ollama pull mistral
    # CodeLlama 7B - Code gen
    ollama pull codellama

    Large Models

    16GB+ RAM systems

    # Llama 2 13B - Enhanced
    ollama pull llama2:13b
    # CodeLlama 13B
    ollama pull codellama:13b
    # Mistral OpenOrca
    ollama pull mistral-openorca

    Model Management Commands

    # List available models
    ollama list
    # Show model information
    ollama show llama2
    # Remove model to free space
    ollama rm tinyllama
    # Update model to latest version
    ollama pull llama2:latest
    # Check model sizes (when running as the systemd service, models typically live under /usr/share/ollama/.ollama)
    du -sh ~/.ollama/models/blobs/*

    Performance Optimization

    CPU Optimization

    # Set CPU governor to performance
    echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    # Install and configure cpufrequtils
    sudo apt install -y cpufrequtils
    echo 'GOVERNOR="performance"' | sudo tee /etc/default/cpufrequtils
    sudo systemctl restart cpufrequtils

    Memory Optimization - Create Swap File

    # Create 8GB swap file (adjust based on RAM)
    sudo fallocate -l 8G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile
    # Make permanent
    echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
    # Verify
    free -h

    Disk I/O Optimization

    # Optimize for SSD (if applicable)
    sudo systemctl enable fstrim.timer
    # Mount options for performance
    # Add to /etc/fstab: noatime,discard for SSD partitions
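
    For illustration, a root-partition entry using those options might look like the line below; the UUID and filesystem type are placeholders, so adapt your existing /etc/fstab entry rather than copying this verbatim:

    # Example /etc/fstab entry for an SSD-backed ext4 root (UUID is a placeholder)
    UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx / ext4 defaults,noatime,discard 0 1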

    Network Configuration

    Remote Access Setup

    # Edit systemd service for external access
    sudo tee /etc/systemd/system/ollama.service.d/network.conf << EOF
    [Service]
    Environment="OLLAMA_HOST=0.0.0.0:11434"
    Environment="OLLAMA_ORIGINS=*"
    EOF
    sudo systemctl daemon-reload
    sudo systemctl restart ollama
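
    Once the service binds to 0.0.0.0 you can check reachability from another machine; /api/tags simply returns the installed models, so it makes a harmless connectivity probe (203.0.113.20 is a placeholder for your VPS address):

    # From a remote client: confirm the API is reachable over the network
    curl http://203.0.113.20:11434/api/tags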

    Reverse Proxy with Nginx (Optional)

    # Install Nginx
    sudo apt install -y nginx
    # Create Ollama proxy config
    sudo tee /etc/nginx/sites-available/ollama << EOF
    server {
        listen 80;
        server_name your-domain.com;

        location / {
            proxy_pass http://127.0.0.1:11434;
            proxy_set_header Host \$host;
            proxy_set_header X-Real-IP \$remote_addr;
            proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto \$scheme;
            # Handle streaming responses
            proxy_buffering off;
            proxy_cache off;
        }
    }
    EOF
    # Enable site
    sudo ln -s /etc/nginx/sites-available/ollama /etc/nginx/sites-enabled/
    sudo rm /etc/nginx/sites-enabled/default
    sudo nginx -t
    sudo systemctl restart nginx
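
    If the proxy is publicly reachable it is worth terminating TLS in front of it. Assuming your domain already resolves to the VPS, one common route is Let's Encrypt via certbot:

    # Obtain and install a certificate for the Nginx site (assumes DNS already points here)
    sudo apt install -y certbot python3-certbot-nginx
    sudo certbot --nginx -d your-domain.com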

    Testing and Validation

    Basic Functionality Test

    # Test local API
    curl -X POST http://localhost:11434/api/generate \
      -H "Content-Type: application/json" \
      -d '{
        "model": "llama2",
        "prompt": "Why is the sky blue?",
        "stream": false
      }'
    # Test chat interface
    ollama run llama2 "Explain containers in simple terms"
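
    Ollama also exposes a chat-style endpoint that takes a messages array instead of a single prompt, which is closer to how most assistant front-ends call it; a minimal local test:

    # Chat endpoint test against the local API
    curl -X POST http://localhost:11434/api/chat \
      -H "Content-Type: application/json" \
      -d '{
        "model": "llama2",
        "messages": [
          {"role": "user", "content": "Summarize what a VPS is in one sentence."}
        ],
        "stream": false
      }'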

    Performance Benchmarking Script

    cat > benchmark.sh << 'EOF'
    #!/bin/bash
    echo "Starting Ollama performance benchmark..."

    prompts=(
      "Hello world"
      "Write a short story about a robot learning to paint"
      "Explain quantum computing in detail with examples"
    )

    for i in "${!prompts[@]}"; do
      echo "Test $((i+1)): ${prompts[i]:0:30}..."
      start_time=$(date +%s)
      response=$(curl -s -X POST http://localhost:11434/api/generate \
        -H "Content-Type: application/json" \
        -d "{\"model\": \"llama2\", \"prompt\": \"${prompts[i]}\", \"stream\": false}")
      end_time=$(date +%s)
      duration=$((end_time - start_time))
      echo "Duration: ${duration} seconds"
      echo "Response length: ${#response} characters"
      echo "---"
    done
    EOF
    chmod +x benchmark.sh
    ./benchmark.sh

    Monitoring and Maintenance

    System Monitoring Script

    cat > monitor_ollama.sh << 'EOF'
    #!/bin/bash
    echo "=== Ollama System Status ==="
    echo "Service Status:"
    systemctl status ollama --no-pager -l

    echo -e "\n=== Resource Usage ==="
    echo "Memory:"
    free -h | grep -E "Mem|Swap"
    echo -e "\nCPU Usage:"
    top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1
    echo -e "\nDisk Usage:"
    df -h / | tail -1

    echo -e "\n=== Ollama Models ==="
    /usr/local/bin/ollama list

    echo -e "\n=== Active Connections ==="
    ss -tuln | grep :11434
    EOF
    chmod +x monitor_ollama.sh
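
    To collect these snapshots over time instead of running the script by hand, one option is a cron entry that appends output to a log (interval and paths are only examples):

    # Run the monitor every 15 minutes and append to a log file (example schedule and path)
    (crontab -l 2>/dev/null; echo "*/15 * * * * $HOME/monitor_ollama.sh >> $HOME/ollama-monitor.log 2>&1") | crontab -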

    Log Management

    # Set up log rotation for Ollama
    sudo tee /etc/logrotate.d/ollama << EOF
    /var/log/ollama.log {
        daily
        missingok
        rotate 7
        compress
        delaycompress
        copytruncate
        notifempty
    }
    EOF
    # View Ollama logs
    journalctl -u ollama -n 100 --no-pager

    Troubleshooting

    Out of Memory Errors

    # Check memory usage
    cat /proc/meminfo | grep -E "MemTotal|MemFree|MemAvailable"
    # Monitor memory during model loading
    watch -n 1 'free -h && ps aux | grep ollama'
    # Solution: Add swap or use smaller model
    sudo swapon --show

    Slow Response Times

    # Check CPU usage
    htop
    # Monitor I/O wait
    iostat -x 1
    # Check for CPU throttling
    cat /proc/cpuinfo | grep MHz
    # Solution: CPU governor optimization
    echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

    Service Won't Start

    # Check service logs
    journalctl -u ollama -f
    # Check port conflicts (ss is preinstalled on modern Ubuntu; netstat requires net-tools)
    sudo ss -tulpn | grep :11434
    # Reset service
    sudo systemctl stop ollama
    sudo systemctl reset-failed ollama
    sudo systemctl start ollama

    Model Loading Failures

    # Check available space
    df -h ~/.ollama/
    # Verify model integrity
    ollama list
    ollama show model_name
    # Re-download if corrupted
    ollama rm model_name
    ollama pull model_name

    Security Hardening

    Basic Security Measures

    # Create dedicated ollama user (the official install script may already have created one)
    sudo adduser --system --no-create-home ollama
    # Secure the service
    sudo tee /etc/systemd/system/ollama.service.d/security.conf << EOF
    [Service]
    User=ollama
    Group=ollama
    NoNewPrivileges=yes
    PrivateTmp=yes
    ProtectSystem=strict
    ProtectHome=yes
    ReadWritePaths=/home/ollama
    EOF
    # Create ollama home directory with proper permissions
    sudo mkdir -p /home/ollama/.ollama
    sudo chown -R ollama:ollama /home/ollama

    API Rate Limiting with Nginx

    # Add rate limiting to the Nginx config
    # The limit_req_zone directive must go in the http block (e.g. /etc/nginx/nginx.conf):
    limit_req_zone $binary_remote_addr zone=ollama:10m rate=10r/m;

    # Then apply it inside the server's location block:
    location / {
        limit_req zone=ollama burst=5 nodelay;
        # ... existing proxy configuration
    }
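
    Rate limiting does not add authentication. If the proxy must stay publicly reachable, a simple additional layer is HTTP basic auth handled by Nginx itself (the username below is an example):

    # Create a credentials file and require it on the proxied location
    sudo apt install -y apache2-utils
    sudo htpasswd -c /etc/nginx/.htpasswd apiuser
    # Then add inside the location block:
    #   auth_basic "Ollama API";
    #   auth_basic_user_file /etc/nginx/.htpasswd;
    sudo nginx -t && sudo systemctl reload nginx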

    Backup and Recovery

    Backup Script

    cat > backup_ollama.sh << 'EOF'
    #!/bin/bash
    BACKUP_DIR="/backup/ollama-$(date +%Y%m%d)"
    OLLAMA_DIR="/home/ollama/.ollama"

    echo "Creating Ollama backup..."
    mkdir -p "$BACKUP_DIR"

    # Backup models and configurations
    tar -czf "$BACKUP_DIR/ollama-models.tar.gz" "$OLLAMA_DIR/models"
    cp /etc/systemd/system/ollama.service.d/*.conf "$BACKUP_DIR/" 2>/dev/null || true

    # Backup model list
    ollama list > "$BACKUP_DIR/model-list.txt"

    echo "Backup completed: $BACKUP_DIR"
    du -sh "$BACKUP_DIR"
    EOF
    chmod +x backup_ollama.sh
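
    Because each run writes a dated directory, old backups accumulate; a retention sweep like the one below keeps disk usage in check (the 14-day window is only an example):

    # Prune backup directories older than 14 days (adjust retention to taste)
    find /backup -maxdepth 1 -type d -name 'ollama-*' -mtime +14 -exec rm -rf {} +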

    Recovery Script

    cat > restore_ollama.sh << 'EOF'
    #!/bin/bash
    BACKUP_DIR="$1"

    if [ -z "$BACKUP_DIR" ]; then
      echo "Usage: $0 <backup_directory>"
      exit 1
    fi

    echo "Restoring Ollama from $BACKUP_DIR..."

    # Stop service
    sudo systemctl stop ollama

    # Restore models (root needed to write into /home/ollama)
    sudo tar -xzf "$BACKUP_DIR/ollama-models.tar.gz" -C /

    # Restore configurations
    sudo cp "$BACKUP_DIR"/*.conf /etc/systemd/system/ollama.service.d/

    # Reload and restart
    sudo systemctl daemon-reload
    sudo systemctl start ollama

    echo "Restore completed"
    EOF
    chmod +x restore_ollama.sh

    Advanced Configuration

    Code Generation Optimization

    # Optimize for code generation
    sudo tee /etc/systemd/system/ollama.service.d/code-optimization.conf << EOF
    [Service]
    Environment="OLLAMA_NUM_PARALLEL=1"
    Environment="OLLAMA_MAX_LOADED_MODELS=1"
    Environment="OLLAMA_CONTEXT_SIZE=4096"
    EOF
    # Recommended models for coding
    ollama pull codellama:7b
    ollama pull deepseek-coder:6.7b

    Chat/Assistant Optimization

    # Optimize for conversational AI
    sudo tee /etc/systemd/system/ollama.service.d/chat-optimization.conf << EOF
    [Service]
    Environment="OLLAMA_CONTEXT_SIZE=2048"
    Environment="OLLAMA_BATCH_SIZE=512"
    EOF
    # Recommended chat models
    ollama pull mistral:7b
    ollama pull neural-chat:7b

    Multi-Model Setup

    # Configure for multiple model support
    sudo tee /etc/systemd/system/ollama.service.d/multi-model.conf << EOF
    [Service]
    Environment="OLLAMA_MAX_LOADED_MODELS=2"
    Environment="OLLAMA_PARALLEL_MODELS=2"
    EOF
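
    After reloading the service you can verify that two models really stay resident; recent Ollama releases include ollama ps, which lists loaded models and their memory footprint:

    sudo systemctl daemon-reload && sudo systemctl restart ollama
    # Touch two models, then list what is currently loaded
    ollama run mistral "hello" > /dev/null
    ollama run codellama "hello" > /dev/null
    ollama ps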

    Key Takeaways

    • 16GB RAM minimum recommended for production 7B models
    • CPU optimization critical for acceptable performance
    • Model selection important - match model size to available resources
    • Monitoring essential for maintaining optimal performance
    • Security hardening required for production deployments