Database Guide

    ClickHouse OLAP Database

    Deploy a lightning-fast columnar database for real-time analytics on RamNode VPS. Process billions of rows in seconds with exceptional query performance.

    Ubuntu 22.04/24.04
    OLAP Analytics
    ⏱️ 30-45 minutes

    Prerequisites & VPS Selection

    ClickHouse is a high-performance, open-source columnar database management system designed for online analytical processing (OLAP). Originally developed at Yandex, it executes analytical queries over massive datasets at exceptional speed.

    Before starting, ensure you have:

    • Ubuntu 22.04 LTS (recommended) or 24.04 LTS
    • Root or sudo access to your server
    • At least 50GB of available disk space (SSD recommended)
    • Basic familiarity with Linux command line

    Dev/Testing

    • 4GB RAM
    • 2 CPU cores
    • 50GB SSD

    Small Prod

    • 8GB RAM
    • 4 CPU cores
    • 100GB SSD

    Medium Prod

    • 16GB RAM
    • 6 CPU cores
    • 200GB NVMe

    Large Prod

    • 32GB+ RAM
    • 8+ CPU cores
    • 500GB+ NVMe

    Initial Server Setup

    Connect to your VPS via SSH and update the system:

    Update System
    apt update && apt upgrade -y
    Install Essential Utilities
    apt install -y curl wget gnupg2 apt-transport-https ca-certificates software-properties-common
    Set Timezone (Optional)
    timedatectl set-timezone America/New_York

    Install ClickHouse

    Add the official ClickHouse repository and install:

    Add ClickHouse GPG Key
    curl -fsSL 'https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key' | gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpg
    Add Repository
    echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main" | tee /etc/apt/sources.list.d/clickhouse.list
    Install ClickHouse
    apt update
    apt install -y clickhouse-server clickhouse-client

    💡 Tip: During installation, you'll be prompted to set a password for the default user. Choose a strong password and save it securely.

    Enable and Start Service
    systemctl enable clickhouse-server
    systemctl start clickhouse-server
    Verify Service Status
    systemctl status clickhouse-server

    Basic Configuration

    Configure ClickHouse for network access and security. Configuration files are in /etc/clickhouse-server/.

    Configure Network Access

    By default, ClickHouse only listens on localhost. To allow remote connections:

    Create Network Configuration
    nano /etc/clickhouse-server/config.d/network.xml
    network.xml
    <clickhouse>
        <!-- "::" listens on all IPv4 and IPv6 interfaces; use 0.0.0.0 instead on IPv4-only hosts.
             Listing both can cause a bind conflict on dual-stack systems. -->
        <listen_host>::</listen_host>
    </clickhouse>

    Set Default User Password

    Create Password Configuration
    nano /etc/clickhouse-server/users.d/default-password.xml

    For better security, use SHA256 hashed passwords:

    Generate Password Hash
    echo -n 'YOUR_STRONG_PASSWORD' | sha256sum
    # copy only the 64-character hex digest; ignore the trailing " -"
    default-password.xml
    <clickhouse>
        <users>
            <default>
                <password_sha256_hex>your_generated_hash</password_sha256_hex>
            </default>
        </users>
    </clickhouse>
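If you are scripting user provisioning, the same digest produced by the sha256sum pipeline above can be generated in Python. A minimal sketch; the password string is a placeholder:

```python
import hashlib

def clickhouse_password_hash(password: str) -> str:
    """Return the SHA-256 hex digest ClickHouse expects in password_sha256_hex."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

print(clickhouse_password_hash("YOUR_STRONG_PASSWORD"))
```

The digest goes into the password_sha256_hex element exactly as printed, with no trailing whitespace.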

    Configure Data Directory (Optional)

    Create Custom Data Directory
    mkdir -p /data/clickhouse
    chown clickhouse:clickhouse /data/clickhouse
    Create Data Path Configuration
    nano /etc/clickhouse-server/config.d/data-paths.xml
    data-paths.xml
    <clickhouse>
        <path>/data/clickhouse/</path>
    </clickhouse>
    Apply Changes
    systemctl restart clickhouse-server

    Testing the Installation

    Connect to ClickHouse and verify the installation:

    Connect to ClickHouse
    clickhouse-client --password
    Test Queries
    SELECT version();
    SELECT now();
    SHOW DATABASES;
    Expected Output
    ClickHouse client version X.X.X.X (official build).
    Connecting to localhost:9000 as user default.
    Connected to ClickHouse server version X.X.X.
    
    your-server :)
    Exit Client
    EXIT;

    Firewall Configuration

    Secure your ClickHouse installation with UFW:

    Configure UFW
    apt install -y ufw
    
    # Set default policies
    ufw default deny incoming
    ufw default allow outgoing
    
    # Allow SSH first!
    ufw allow 22/tcp
    
    # Allow ClickHouse HTTP interface from specific IPs
    ufw allow from YOUR_IP_ADDRESS to any port 8123 proto tcp
    
    # Allow ClickHouse native protocol from specific IPs
    ufw allow from YOUR_IP_ADDRESS to any port 9000 proto tcp
    
    # Enable firewall
    ufw enable

    ⚠️ Important: Replace YOUR_IP_ADDRESS with your actual IP. Allowing public access to ClickHouse ports is a security risk!


    Creating Your First Database

    Create a database and table using the powerful MergeTree engine:

    Create Database
    CREATE DATABASE analytics;
    USE analytics;
    Create Table with MergeTree Engine
    CREATE TABLE events (
        event_date Date,
        event_time DateTime,
        user_id UInt32,
        event_type String,
        url String,
        response_time UInt32
    ) ENGINE = MergeTree()
    PARTITION BY toYYYYMM(event_date)
    ORDER BY (event_date, event_time, user_id);
    Insert Sample Data
    INSERT INTO events VALUES
        ('2024-01-15', '2024-01-15 10:30:00', 12345, 'page_view', '/home', 150),
        ('2024-01-15', '2024-01-15 10:31:00', 12345, 'click', '/products', 80),
        ('2024-01-15', '2024-01-15 10:32:00', 67890, 'page_view', '/about', 120);
    Query Data
    SELECT * FROM events;
    
    -- Analytical query
    SELECT
        event_type,
        count() AS event_count,
        avg(response_time) AS avg_response
    FROM events
    GROUP BY event_type
    ORDER BY event_count DESC;
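As a sanity check, the same rollup over the three sample rows can be reproduced in plain Python. This is purely illustrative; it mirrors the GROUP BY query above:

```python
from collections import defaultdict

# (event_type, response_time) pairs from the sample INSERT above
events = [("page_view", 150), ("click", 80), ("page_view", 120)]

# event_type -> [event_count, response_time sum]
totals = defaultdict(lambda: [0, 0])
for event_type, response_time in events:
    totals[event_type][0] += 1
    totals[event_type][1] += response_time

# Print count and average per type, most frequent first
for event_type, (count, total) in sorted(totals.items(), key=lambda kv: -kv[1][0]):
    print(event_type, count, total / count)
```

You should see page_view with 2 events averaging 135 ms, matching what ClickHouse returns for the sample data.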

    Performance Optimization

    Tune ClickHouse for optimal performance based on your VPS resources.

    Memory Configuration (8GB RAM example)

    Create memory.xml
    nano /etc/clickhouse-server/config.d/memory.xml
    memory.xml
    <clickhouse>
        <max_server_memory_usage>6442450944</max_server_memory_usage>
        <max_memory_usage>4294967296</max_memory_usage>
    </clickhouse>
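The two byte values above correspond to 6 GiB (server-wide cap) and 4 GiB (per-query cap), leaving roughly 2 GiB of headroom for the OS on an 8 GB VPS. To derive values for a different VPS size:

```python
GIB = 1024 ** 3  # one gibibyte in bytes

max_server_memory_usage = 6 * GIB  # server-wide cap for an 8 GB VPS
max_memory_usage = 4 * GIB         # per-query cap

print(max_server_memory_usage)  # 6442450944
print(max_memory_usage)         # 4294967296
```

A common rule of thumb is to cap the server at 70-80% of physical RAM; adjust both numbers proportionally for larger instances.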

    Query Complexity Limits

    Create query-limits.xml
    nano /etc/clickhouse-server/users.d/query-limits.xml
    query-limits.xml
    <clickhouse>
        <profiles>
            <default>
                <max_execution_time>60</max_execution_time>
                <max_rows_to_read>1000000000</max_rows_to_read>
                <max_bytes_to_read>10000000000</max_bytes_to_read>
            </default>
        </profiles>
    </clickhouse>

    Enable Query Logging

    query-log.xml
    <clickhouse>
        <query_log>
            <database>system</database>
            <table>query_log</table>
            <partition_by>toYYYYMM(event_date)</partition_by>
            <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        </query_log>
    </clickhouse>

    SSD/NVMe Optimization

    storage.xml (MergeTree settings)
    <clickhouse>
        <merge_tree>
            <parts_to_delay_insert>150</parts_to_delay_insert>
            <parts_to_throw_insert>300</parts_to_throw_insert>
            <max_parts_in_total>100000</max_parts_in_total>
        </merge_tree>
    </clickhouse>
    Apply Changes
    systemctl restart clickhouse-server

    Security Hardening

    Create application-specific users instead of using the default user:

    Create Application User

    app-user.xml
    <clickhouse>
        <users>
            <app_user>
                <password_sha256_hex>your_hashed_password</password_sha256_hex>
                <networks>
                    <ip>APP_SERVER_IP/32</ip>
                </networks>
                <profile>default</profile>
                <quota>default</quota>
                <!-- restrict this user to the analytics database -->
                <allow_databases>
                    <database>analytics</database>
                </allow_databases>
            </app_user>
        </users>
    </clickhouse>

    Create Read-Only User (for BI Tools)

    readonly-user.xml
    <clickhouse>
        <users>
            <readonly_user>
                <password_sha256_hex>your_hashed_password</password_sha256_hex>
                <networks>
                    <ip>BI_TOOL_IP/32</ip>
                </networks>
                <profile>readonly</profile>
                <quota>default</quota>
            </readonly_user>
        </users>
        <profiles>
            <readonly>
                <readonly>1</readonly>
            </readonly>
        </profiles>
    </clickhouse>

    Enable HTTPS

    Generate SSL Certificate
    mkdir -p /etc/clickhouse-server/certs
    openssl req -newkey rsa:2048 -nodes -keyout /etc/clickhouse-server/certs/server.key -x509 -days 365 -out /etc/clickhouse-server/certs/server.crt
    chown -R clickhouse:clickhouse /etc/clickhouse-server/certs
    https.xml
    <clickhouse>
        <https_port>8443</https_port>
        <openSSL>
            <server>
                <certificateFile>/etc/clickhouse-server/certs/server.crt</certificateFile>
                <privateKeyFile>/etc/clickhouse-server/certs/server.key</privateKeyFile>
                <verificationMode>none</verificationMode>
                <loadDefaultCAFile>true</loadDefaultCAFile>
                <cacheSessions>true</cacheSessions>
                <disableProtocols>sslv2,sslv3</disableProtocols>
                <preferServerCiphers>true</preferServerCiphers>
            </server>
        </openSSL>
    </clickhouse>

    Backup and Recovery

    Set up automated backups to protect your data:

    Create Backup Directory
    mkdir -p /backup/clickhouse
    chown clickhouse:clickhouse /backup/clickhouse
    Create Backup Script
    nano /usr/local/bin/clickhouse-backup.sh
    clickhouse-backup.sh
    #!/bin/bash
    
    BACKUP_DIR="/backup/clickhouse"
    DATE=$(date +%Y%m%d_%H%M%S)
    RETENTION_DAYS=7
    
    # Create backup
    clickhouse-client --query="BACKUP DATABASE analytics TO File('/backup/clickhouse/analytics_${DATE}')" --password=YOUR_PASSWORD
    
    # Clean old backups
    find ${BACKUP_DIR} -name "analytics_*" -type d -mtime +${RETENTION_DAYS} -exec rm -rf {} \;
    
    echo "Backup completed: analytics_${DATE}"
    Make Executable and Schedule
    chmod +x /usr/local/bin/clickhouse-backup.sh
    
    # Add to crontab
    crontab -e
    
    # Add this line for daily 2 AM backups:
    0 2 * * * /usr/local/bin/clickhouse-backup.sh >> /var/log/clickhouse-backup.log 2>&1
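    Note: the BACKUP ... TO File(...) statement only writes to directories the server has been explicitly allowed to use. If the script fails with a path-related error, whitelist the backup directory in a server config fragment (the file name below is illustrative) and restart clickhouse-server:

```xml
<clickhouse>
    <backups>
        <allowed_path>/backup/clickhouse/</allowed_path>
    </backups>
</clickhouse>
```

    Save it as, for example, /etc/clickhouse-server/config.d/backup-paths.xml.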

    Using clickhouse-backup Tool

    Install clickhouse-backup
    wget https://github.com/Altinity/clickhouse-backup/releases/latest/download/clickhouse-backup-linux-amd64.tar.gz
    tar -xzvf clickhouse-backup-linux-amd64.tar.gz
    mv clickhouse-backup /usr/local/bin/
    chmod +x /usr/local/bin/clickhouse-backup
    Backup Operations
    # Create backup
    clickhouse-backup create
    
    # List backups
    clickhouse-backup list
    
    # Restore a backup
    clickhouse-backup restore BACKUP_NAME

    Monitoring and Maintenance

    Monitor ClickHouse performance and maintain optimal operation:

    Check Resource Usage
    -- Memory currently tracked by the server
    SELECT formatReadableSize(value) AS memory
    FROM system.metrics
    WHERE metric = 'MemoryTracking';
    
    -- Disk space used by active data parts
    SELECT formatReadableSize(sum(bytes_on_disk)) AS disk
    FROM system.parts
    WHERE active;
    View Recent Queries
    SELECT
        query_start_time,
        query_duration_ms,
        read_rows,
        formatReadableSize(read_bytes) AS read_size,
        query
    FROM system.query_log
    WHERE type = 'QueryFinish'
    ORDER BY query_start_time DESC
    LIMIT 10;
    Identify Slow Queries
    SELECT
        query_duration_ms,
        read_rows,
        formatReadableSize(read_bytes) AS data_read,
        query
    FROM system.query_log
    WHERE type = 'QueryFinish'
    AND query_duration_ms > 1000
    ORDER BY query_duration_ms DESC
    LIMIT 10;

    Maintenance Tasks

    Optimize Tables
    OPTIMIZE TABLE analytics.events FINAL;
    Check Table Sizes
    SELECT
        database,
        table,
        formatReadableSize(sum(bytes)) AS size,
        sum(rows) AS rows
    FROM system.parts
    WHERE active
    GROUP BY database, table
    ORDER BY sum(bytes) DESC;

    Log Rotation

    Configure Log Rotation
    nano /etc/logrotate.d/clickhouse-server
    logrotate config
    /var/log/clickhouse-server/*.log {
        daily
        rotate 7
        compress
        delaycompress
        missingok
        notifempty
        create 0640 clickhouse clickhouse
        sharedscripts
        postrotate
            systemctl reload clickhouse-server > /dev/null 2>&1 || true
        endscript
    }

    Connecting Applications

    Connect your applications to ClickHouse using various clients:
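One lightweight option is ClickHouse's HTTP interface on port 8123, which any HTTP client can use. A minimal sketch that builds a query URL (the host name is a placeholder; send the password via the X-ClickHouse-Key header rather than embedding it in the URL):

```python
from urllib.parse import urlencode

def clickhouse_query_url(host: str, query: str, user: str = "default",
                         port: int = 8123) -> str:
    """Build a URL for ClickHouse's HTTP interface (GET works for SELECTs)."""
    params = urlencode({"query": query, "user": user})
    return f"http://{host}:{port}/?{params}"

url = clickhouse_query_url("your-server-ip", "SELECT version()")
print(url)
```

Fetch the URL with urllib, curl, or any HTTP library; for production use, dedicated drivers (native protocol on port 9000) offer better performance than the HTTP interface.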

    Troubleshooting

    Best Practices

    1. Partitioning: Always partition large tables by date or logical dimension

    2. Ordering Key: Choose ORDER BY columns based on query patterns

    3. Data Types: Use the smallest sufficient types (e.g., UInt32 instead of UInt64 where values fit)

    4. Materialized Views: Create for frequently aggregated data

    5. Regular Optimization: Schedule OPTIMIZE TABLE during low-traffic periods

    6. Monitoring: Regularly review query logs and system metrics

    7. Backups: Maintain automated, tested backup procedures

    8. Security: Strong passwords, limited network access, role-specific users

    9. Resources: Monitor memory and disk usage, adjust limits

    10. Updates: Keep ClickHouse updated for performance and security