
    Deploy Ollama on Your RamNode VPS

    Run large language models locally with CPU-optimized configuration. Perfect for development, testing, and moderate production workloads.

    • Setup Time: 30-45 min
    • Recommended RAM: 8GB+
    • Difficulty: Intermediate
    • Default Port: 11434

    Overview

    Ollama is a powerful framework for running large language models locally. While traditionally GPU-accelerated, it can run effectively on CPU-only systems with proper configuration. This guide covers deploying Ollama on RamNode VPS infrastructure, optimized for CPU performance.

    Why No GPU Plans?

    RamNode currently focuses on CPU-based VPS hosting without GPU acceleration. However, modern CPUs can effectively run smaller language models with proper optimization:

    Advantages

    • Cost-effective
    • Reliable
    • Easier deployment

    Trade-offs

    • Slower inference
    • Limited to smaller models

    Sweet Spot

    • 7B parameter models
    • 16GB+ RAM systems

    System Requirements Analysis

    Memory Requirements by Model Size

    Model Size                           Memory Requirement
    3B models (Phi-2, Orca Mini 3B)      ~4GB RAM
    7B models (Llama 2 7B, Mistral 7B)   ~8GB RAM
    13B models (Llama 2 13B)             ~16GB RAM
    20B+ models                          ~32GB+ RAM
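
    Before pulling anything, it can help to compare the table above against what the VPS actually has free. A minimal sketch (the 8GB threshold is just an example for a 7B model; adjust it to the row you are targeting):

    # Show memory in GiB, then warn if less than ~8GiB is available for a 7B model
    free -g
    available=$(free -g | awk '/^Mem:/ {print $7}')
    if [ "$available" -lt 8 ]; then
      echo "Under 8GiB available - pick a 3B model or add swap first"
    fi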

    CPU Optimization Factors

    • Thread Count: Ollama scales with CPU cores (4-8 cores optimal)
    • Cache Size: Larger L3 cache improves model loading
    • Memory Bandwidth: Critical for model inference speed
    • Architecture: x86_64 with AVX2 support (standard on RamNode); a quick way to verify these factors is shown below
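
    Each of these factors can be checked up front on the VPS with standard tools; a quick sketch:

    # Count CPU cores visible to Ollama
    nproc
    # Confirm AVX2 support (prints "avx2" if the CPU supports it)
    grep -o 'avx2' /proc/cpuinfo | sort -u
    # Show CPU model and L3 cache size
    lscpu | grep -E 'Model name|L3'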

    Installation Process

    Step 1: Server Preparation

    # Update system packages
    sudo apt update && sudo apt upgrade -y
    # Install essential packages
    sudo apt install -y curl wget git htop iotop unzip
    # Install performance monitoring tools
    sudo apt install -y sysstat nethogs
    # Optimize system for memory-intensive workloads
    echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
    echo 'vm.vfs_cache_pressure=50' | sudo tee -a /etc/sysctl.conf
    sudo sysctl -p

    Step 2: Install Ollama

    # Download and install Ollama
    curl -fsSL https://ollama.ai/install.sh | sh
    # Verify installation
    ollama --version
    # Check service status
    sudo systemctl status ollama
    # Enable auto-start
    sudo systemctl enable ollama

    Step 3: Configure Ollama Service

    # Create service override directory
    sudo mkdir -p /etc/systemd/system/ollama.service.d
    # Create optimization config
    sudo tee /etc/systemd/system/ollama.service.d/override.conf << EOF
    [Service]
    Environment="OLLAMA_HOST=0.0.0.0"
    Environment="OLLAMA_NUM_PARALLEL=1"
    Environment="OLLAMA_MAX_LOADED_MODELS=1"
    Environment="OLLAMA_FLASH_ATTENTION=1"
    Environment="OLLAMA_LLM_LIBRARY=cpu"
    ExecStart=
    ExecStart=/usr/local/bin/ollama serve
    EOF
    # Reload systemd and restart
    sudo systemctl daemon-reload
    sudo systemctl restart ollama
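
    To confirm systemd actually picked up the drop-in, you can print the merged unit and the environment it applies; the variables listed should mirror override.conf:

    # Show the unit file together with all drop-in overrides
    systemctl cat ollama
    # Show the environment variables systemd passes to the service
    sudo systemctl show ollama -p Environment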

    Step 4: Firewall and Security Setup

    # Install and configure UFW
    sudo apt install -y ufw
    # Allow SSH (adjust port if needed)
    sudo ufw allow 22/tcp
    # Allow Ollama API (default port 11434)
    sudo ufw allow 11434/tcp
    # Enable firewall
    sudo ufw --force enable
    # Check status
    sudo ufw status
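
    Opening 11434 to the whole internet exposes an unauthenticated API. If only known clients need access, a tighter variant is to allow the port per source address (203.0.113.10 below is a placeholder; substitute your client's IP):

    # Replace the open rule with a per-source rule (example client IP)
    sudo ufw delete allow 11434/tcp
    sudo ufw allow from 203.0.113.10 to any port 11434 proto tcp
    sudo ufw status numbered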

    Model Deployment

    Small Models

    4-8GB RAM systems

    # Phi-2 (2.7B) - General tasks
    ollama pull phi
    # TinyLlama (1.1B) - Ultra-light
    ollama pull tinyllama
    # Orca Mini (3B) - lightweight chat
    ollama pull orca-mini

    Medium Models

    8-16GB RAM systems

    # Llama 2 7B - Standard
    ollama pull llama2
    # Mistral 7B - High perf
    ollama pull mistral
    # CodeLlama 7B - Code gen
    ollama pull codellama

    Large Models

    16GB+ RAM systems

    # Llama 2 13B - Enhanced
    ollama pull llama2:13b
    # CodeLlama 13B
    ollama pull codellama:13b
    # Mistral OpenOrca
    ollama pull mistral-openorca

    Model Management Commands

    # List available models
    ollama list
    # Show model information
    ollama show llama2
    # Remove model to free space
    ollama rm tinyllama
    # Update model to latest version
    ollama pull llama2:latest
    # Check model sizes (when running as the systemd service, models typically live under /usr/share/ollama/.ollama)
    du -sh ~/.ollama/models/blobs/*

    Performance Optimization

    CPU Optimization

    # Set CPU governor to performance
    echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    # Install and configure cpufrequtils
    sudo apt install -y cpufrequtils
    echo 'GOVERNOR="performance"' | sudo tee /etc/default/cpufrequtils
    sudo systemctl restart cpufrequtils

    Memory Optimization - Create Swap File

    # Create 8GB swap file (adjust based on RAM)
    sudo fallocate -l 8G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile
    # Make permanent
    echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
    # Verify
    free -h

    Disk I/O Optimization

    # Optimize for SSD (if applicable)
    sudo systemctl enable fstrim.timer
    # Mount options for performance
    # Add to /etc/fstab: noatime,discard for SSD partitions
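
    For illustration, a root-partition entry using those options might look like the line below; the UUID and filesystem type are placeholders, so adapt your existing /etc/fstab entry rather than copying this verbatim:

    # Example /etc/fstab entry for an SSD-backed ext4 root (UUID is a placeholder)
    UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx / ext4 defaults,noatime,discard 0 1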

    Network Configuration

    Remote Access Setup

    # Edit systemd service for external access
    sudo tee /etc/systemd/system/ollama.service.d/network.conf << EOF
    [Service]
    Environment="OLLAMA_HOST=0.0.0.0:11434"
    Environment="OLLAMA_ORIGINS=*"
    EOF
    sudo systemctl daemon-reload
    sudo systemctl restart ollama
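
    Once the service binds to 0.0.0.0 you can check reachability from another machine; /api/tags simply returns the installed models, so it makes a harmless connectivity probe (203.0.113.20 is a placeholder for your VPS address):

    # From a remote client: confirm the API is reachable over the network
    curl http://203.0.113.20:11434/api/tags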

    Reverse Proxy with Nginx (Optional)

    # Install Nginx
    sudo apt install -y nginx
    # Create Ollama proxy config
    sudo tee /etc/nginx/sites-available/ollama << EOF
    server {
        listen 80;
        server_name your-domain.com;

        location / {
            proxy_pass http://127.0.0.1:11434;
            proxy_set_header Host \$host;
            proxy_set_header X-Real-IP \$remote_addr;
            proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto \$scheme;
            # Handle streaming responses
            proxy_buffering off;
            proxy_cache off;
        }
    }
    EOF
    # Enable site
    sudo ln -s /etc/nginx/sites-available/ollama /etc/nginx/sites-enabled/
    sudo rm /etc/nginx/sites-enabled/default
    sudo nginx -t
    sudo systemctl restart nginx
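
    If the proxy is publicly reachable it is worth terminating TLS in front of it. Assuming your domain already resolves to the VPS, one common route is Let's Encrypt via certbot:

    # Obtain and install a certificate for the Nginx site (assumes DNS already points here)
    sudo apt install -y certbot python3-certbot-nginx
    sudo certbot --nginx -d your-domain.com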

    Testing and Validation

    Basic Functionality Test

    # Test local API
    curl -X POST http://localhost:11434/api/generate \
      -H "Content-Type: application/json" \
      -d '{
        "model": "llama2",
        "prompt": "Why is the sky blue?",
        "stream": false
      }'
    # Test chat interface
    ollama run llama2 "Explain containers in simple terms"
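
    Ollama also exposes a chat-style endpoint that takes a messages array instead of a single prompt, which is closer to how most assistant front-ends call it; a minimal local test:

    # Chat endpoint test against the local API
    curl -X POST http://localhost:11434/api/chat \
      -H "Content-Type: application/json" \
      -d '{
        "model": "llama2",
        "messages": [
          {"role": "user", "content": "Summarize what a VPS is in one sentence."}
        ],
        "stream": false
      }'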

    Performance Benchmarking Script

    cat > benchmark.sh << 'EOF'
    #!/bin/bash
    echo "Starting Ollama performance benchmark..."

    prompts=(
      "Hello world"
      "Write a short story about a robot learning to paint"
      "Explain quantum computing in detail with examples"
    )

    for i in "${!prompts[@]}"; do
      echo "Test $((i+1)): ${prompts[i]:0:30}..."
      start_time=$(date +%s)
      response=$(curl -s -X POST http://localhost:11434/api/generate \
        -H "Content-Type: application/json" \
        -d "{\"model\": \"llama2\", \"prompt\": \"${prompts[i]}\", \"stream\": false}")
      end_time=$(date +%s)
      duration=$((end_time - start_time))
      echo "Duration: ${duration} seconds"
      echo "Response length: ${#response} characters"
      echo "---"
    done
    EOF
    chmod +x benchmark.sh
    ./benchmark.sh

    Monitoring and Maintenance

    System Monitoring Script

    cat > monitor_ollama.sh << 'EOF'
    #!/bin/bash
    echo "=== Ollama System Status ==="
    echo "Service Status:"
    systemctl status ollama --no-pager -l

    echo -e "\n=== Resource Usage ==="
    echo "Memory:"
    free -h | grep -E "Mem|Swap"
    echo -e "\nCPU Usage:"
    top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1
    echo -e "\nDisk Usage:"
    df -h / | tail -1

    echo -e "\n=== Ollama Models ==="
    /usr/local/bin/ollama list

    echo -e "\n=== Active Connections ==="
    ss -tuln | grep :11434
    EOF
    chmod +x monitor_ollama.sh
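
    To collect these snapshots over time instead of running the script by hand, one option is a cron entry that appends output to a log (interval and paths are only examples):

    # Run the monitor every 15 minutes and append to a log file (example schedule and path)
    (crontab -l 2>/dev/null; echo "*/15 * * * * $HOME/monitor_ollama.sh >> $HOME/ollama-monitor.log 2>&1") | crontab -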

    Log Management

    # Set up log rotation for Ollama
    sudo tee /etc/logrotate.d/ollama << EOF
    /var/log/ollama.log {
        daily
        missingok
        rotate 7
        compress
        delaycompress
        copytruncate
        notifempty
    }
    EOF
    # View Ollama logs
    journalctl -u ollama -n 100 --no-pager

    Troubleshooting

    Out of Memory Errors

    # Check memory usage
    cat /proc/meminfo | grep -E "MemTotal|MemFree|MemAvailable"
    # Monitor memory during model loading
    watch -n 1 'free -h && ps aux | grep ollama'
    # Solution: Add swap or use smaller model
    sudo swapon --show

    Slow Response Times

    # Check CPU usage
    htop
    # Monitor I/O wait
    iostat -x 1
    # Check for CPU throttling
    cat /proc/cpuinfo | grep MHz
    # Solution: CPU governor optimization
    echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

    Service Won't Start

    # Check service logs
    journalctl -u ollama -f
    # Check port conflicts (ss is preinstalled on modern Ubuntu; netstat requires net-tools)
    sudo ss -tulpn | grep :11434
    # Reset service
    sudo systemctl stop ollama
    sudo systemctl reset-failed ollama
    sudo systemctl start ollama

    Model Loading Failures

    # Check available space
    df -h ~/.ollama/
    # Verify model integrity
    ollama list
    ollama show model_name
    # Re-download if corrupted
    ollama rm model_name
    ollama pull model_name

    Security Hardening

    Basic Security Measures

    # Create dedicated ollama user (the official install script may already have created one)
    sudo adduser --system --no-create-home ollama
    # Secure the service
    sudo tee /etc/systemd/system/ollama.service.d/security.conf << EOF
    [Service]
    User=ollama
    Group=ollama
    NoNewPrivileges=yes
    PrivateTmp=yes
    ProtectSystem=strict
    ProtectHome=yes
    ReadWritePaths=/home/ollama
    EOF
    # Create ollama home directory with proper permissions
    sudo mkdir -p /home/ollama/.ollama
    sudo chown -R ollama:ollama /home/ollama

    API Rate Limiting with Nginx

    # Add rate limiting to the Nginx config
    # The limit_req_zone directive must go in the http block (e.g. /etc/nginx/nginx.conf):
    limit_req_zone $binary_remote_addr zone=ollama:10m rate=10r/m;

    # Then apply it inside the server's location block:
    location / {
        limit_req zone=ollama burst=5 nodelay;
        # ... existing proxy configuration
    }
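
    Rate limiting does not add authentication. If the proxy must stay publicly reachable, a simple additional layer is HTTP basic auth handled by Nginx itself (the username below is an example):

    # Create a credentials file and require it on the proxied location
    sudo apt install -y apache2-utils
    sudo htpasswd -c /etc/nginx/.htpasswd apiuser
    # Then add inside the location block:
    #   auth_basic "Ollama API";
    #   auth_basic_user_file /etc/nginx/.htpasswd;
    sudo nginx -t && sudo systemctl reload nginx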

    Backup and Recovery

    Backup Script

    cat > backup_ollama.sh << 'EOF'
    #!/bin/bash
    BACKUP_DIR="/backup/ollama-$(date +%Y%m%d)"
    OLLAMA_DIR="/home/ollama/.ollama"

    echo "Creating Ollama backup..."
    mkdir -p "$BACKUP_DIR"

    # Backup models and configurations
    tar -czf "$BACKUP_DIR/ollama-models.tar.gz" "$OLLAMA_DIR/models"
    cp /etc/systemd/system/ollama.service.d/*.conf "$BACKUP_DIR/" 2>/dev/null || true

    # Backup model list
    ollama list > "$BACKUP_DIR/model-list.txt"

    echo "Backup completed: $BACKUP_DIR"
    du -sh "$BACKUP_DIR"
    EOF
    chmod +x backup_ollama.sh
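
    Because each run writes a dated directory, old backups accumulate; a retention sweep like the one below keeps disk usage in check (the 14-day window is only an example):

    # Prune backup directories older than 14 days (adjust retention to taste)
    find /backup -maxdepth 1 -type d -name 'ollama-*' -mtime +14 -exec rm -rf {} +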

    Recovery Script

    cat > restore_ollama.sh << 'EOF'
    #!/bin/bash
    BACKUP_DIR="$1"

    if [ -z "$BACKUP_DIR" ]; then
      echo "Usage: $0 <backup_directory>"
      exit 1
    fi

    echo "Restoring Ollama from $BACKUP_DIR..."

    # Stop service
    sudo systemctl stop ollama

    # Restore models (root needed to write into /home/ollama)
    sudo tar -xzf "$BACKUP_DIR/ollama-models.tar.gz" -C /

    # Restore configurations
    sudo cp "$BACKUP_DIR"/*.conf /etc/systemd/system/ollama.service.d/

    # Reload and restart
    sudo systemctl daemon-reload
    sudo systemctl start ollama

    echo "Restore completed"
    EOF
    chmod +x restore_ollama.sh

    Advanced Configuration

    Code Generation Optimization

    # Optimize for code generation
    sudo tee /etc/systemd/system/ollama.service.d/code-optimization.conf << EOF
    [Service]
    Environment="OLLAMA_NUM_PARALLEL=1"
    Environment="OLLAMA_MAX_LOADED_MODELS=1"
    Environment="OLLAMA_CONTEXT_SIZE=4096"
    EOF
    # Recommended models for coding
    ollama pull codellama:7b
    ollama pull deepseek-coder:6.7b

    Chat/Assistant Optimization

    # Optimize for conversational AI
    sudo tee /etc/systemd/system/ollama.service.d/chat-optimization.conf << EOF
    [Service]
    Environment="OLLAMA_CONTEXT_SIZE=2048"
    Environment="OLLAMA_BATCH_SIZE=512"
    EOF
    # Recommended chat models
    ollama pull mistral:7b
    ollama pull neural-chat:7b

    Multi-Model Setup

    # Configure for multiple model support
    sudo tee /etc/systemd/system/ollama.service.d/multi-model.conf << EOF
    [Service]
    Environment="OLLAMA_MAX_LOADED_MODELS=2"
    Environment="OLLAMA_PARALLEL_MODELS=2"
    EOF
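
    After reloading the service you can verify that two models really stay resident; recent Ollama releases include ollama ps, which lists loaded models and their memory footprint:

    sudo systemctl daemon-reload && sudo systemctl restart ollama
    # Touch two models, then list what is currently loaded
    ollama run mistral "hello" > /dev/null
    ollama run codellama "hello" > /dev/null
    ollama ps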

    Key Takeaways

    • 16GB RAM minimum recommended for production 7B models
    • CPU optimization critical for acceptable performance
    • Model selection important - match model size to available resources
    • Monitoring essential for maintaining optimal performance
    • Security hardening required for production deployments