Self-Hosted AI Stack Series
    Part 5 of 8

    Tabby — Self-Hosted Code Completion

    Replace GitHub Copilot with your own private AI code assistant.

    Prerequisites

    Completed Part 1 (Ollama), VS Code or JetBrains IDE, Docker

    Time to Complete

    30–40 minutes

    Recommended Plan

    8GB ($40/mo) recommended for responsive code completion

    Looking for a quick-start guide? Check out our standalone Tabby Deployment Guide for a streamlined setup walkthrough.

    Introduction

    GitHub Copilot costs $19/user/month and sends your proprietary code to Microsoft's servers. For a team of 10, that's $190/month — and every line of your codebase is processed externally.

    Tabby runs on your VPS, uses open models, and keeps every line of code private. The ROI case practically makes itself.

    💰 Cost comparison: GitHub Copilot Business for 5 users: $95/month. Tabby on your RamNode VPS: $0/user — unlimited users, zero code exposure.
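    The per-seat arithmetic scales with head-count while the VPS cost stays flat. A quick sketch using the prices quoted above:

    ```shell
    # Copilot Business is billed per seat; a self-hosted Tabby VPS is a flat fee
    users=10
    copilot_per_seat=19   # $/user/month, as quoted above
    vps=40                # $/month for the 8GB plan
    copilot=$((users * copilot_per_seat))
    savings=$((copilot - vps))
    echo "Copilot: \$${copilot}/mo  Tabby VPS: \$${vps}/mo  Savings: \$${savings}/mo"
    ```

    At 10 seats the VPS pays for itself several times over, and the gap widens with every additional developer.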

    Tabby vs Continue.dev

    Feature | Tabby | Continue.dev
    Architecture | Dedicated server + IDE extension | VS Code extension only
    Model serving | Built-in, optimized | Connects to Ollama
    Repo context | Indexes your codebase | File-level context
    Team features | Multi-user, analytics | Single user
    IDE support | VS Code, JetBrains, Vim | VS Code only

    Deploying Tabby

    Create a working directory and a compose file:

    mkdir -p ~/ai-stack/tabby && cd ~/ai-stack/tabby

    docker-compose.yml
    version: "3.8"
    
    services:
      tabby:
        image: tabbyml/tabby:latest
        container_name: tabby
        restart: unless-stopped
        command: serve --model StarCoder-1B --device cpu
        ports:
          - "8080:8080"
        volumes:
          - tabby-data:/data
        environment:
          - TABBY_DISABLE_USAGE_COLLECTION=1
    
    volumes:
      tabby-data:
    Start the service:

    docker compose up -d

    The Tabby dashboard is now available at http://your-server-ip:8080.
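    To verify the server from your workstation before wiring up an IDE, you can hit the health endpoint. The /v1/health path is assumed from Tabby's HTTP API; adjust if your version differs:

    ```shell
    # Returns JSON with the loaded model and device when Tabby is ready
    # (/v1/health path assumed; your-server-ip is a placeholder, as above)
    curl -s http://your-server-ip:8080/v1/health
    ```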

    Model Selection for Code Completion

    Model | Size | RAM | Latency | Quality
    StarCoder-1B | 1B | ~2 GB | Fast | Good for basic completion
    StarCoder-3B | 3B | ~4 GB | Moderate | Better suggestions
    CodeLlama-7B | 7B | ~6 GB | Slower | Best quality
    DeepSeek-Coder-1.3B | 1.3B | ~2 GB | Fast | Excellent for size

    Remember to account for RAM used by Ollama (Part 1). On an 8GB VPS, use StarCoder-1B or DeepSeek-Coder-1.3B alongside Ollama.
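    Switching models is a one-line change to the compose file. A hedged sketch, assuming the compose file above (check Tabby's model registry for the exact model identifier; the first start with a new model downloads its weights, so expect a delay):

    ```shell
    # Swap the served model, then recreate the container to load it
    sed -i 's/--model StarCoder-1B/--model DeepSeek-Coder-1.3B/' docker-compose.yml
    docker compose up -d --force-recreate
    ```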

    IDE Integration

    VS Code

    1. Install the Tabby extension from the VS Code marketplace
    2. Open Settings → search "Tabby"
    3. Set Server URL to http://your-server-ip:8080
    4. Set your authentication token (from Tabby dashboard)
    5. Start typing — completions appear inline automatically
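    If you prefer configuring the client by file rather than the Settings UI, the Tabby client can read a config file in your home directory. A sketch assuming the ~/.tabby-client/agent/config.toml location; verify the path and keys against your extension version:

    ```toml
    # ~/.tabby-client/agent/config.toml (location and keys assumed; check your Tabby client docs)
    [server]
    endpoint = "http://your-server-ip:8080"
    token = "your-auth-token"
    ```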

    JetBrains IDEs

    1. Go to Settings → Plugins → Marketplace
    2. Search and install "Tabby"
    3. Configure the server endpoint under Settings → Tools → Tabby
    4. The plugin works with IntelliJ IDEA, PyCharm, WebStorm, and other JetBrains IDEs

    Repository Context

    Tabby can index your codebase for context-aware completions that understand your project's patterns, naming conventions, and architecture:

    1. Navigate to the Tabby dashboard → Repositories
    2. Add your Git repositories (supports GitHub, GitLab, or local paths)
    3. Configure indexing schedules (hourly/daily)
    4. Select which repositories and branches to index

    This dramatically improves suggestion quality — completions reference your actual code patterns rather than generic suggestions.
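    You can also exercise the completion endpoint directly to sanity-check indexing and model behavior. A sketch assuming Tabby's /v1/completions HTTP API; the request shape may differ across versions:

    ```shell
    # Ask for a completion for a Python prefix; expects a JSON response
    # containing candidate completions
    curl -s -X POST http://your-server-ip:8080/v1/completions \
      -H 'Content-Type: application/json' \
      -d '{"language": "python", "segments": {"prefix": "def fib(n):\n    "}}'
    ```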

    Team Configuration

    Set up multi-user access:

    • Create user accounts from the Tabby dashboard
    • Generate per-user authentication tokens
    • View usage analytics per developer
    • Configure different models for different teams (Python team might prefer CodeLlama, TypeScript team might prefer StarCoder)
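    Per-user tokens are sent with each request, so a token can be verified from the command line. A sketch assuming a bearer-token scheme and the /v1/health path:

    ```shell
    # Replace YOUR_TOKEN with a token generated in the Tabby dashboard
    curl -s http://your-server-ip:8080/v1/health \
      -H 'Authorization: Bearer YOUR_TOKEN'
    ```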

    Performance Optimization

    Completions should feel instant — aim for under 500ms latency:

    Model | 4GB VPS | 8GB VPS | Feel
    StarCoder-1B | ~200ms | ~150ms | Instant
    DeepSeek-1.3B | ~300ms | ~200ms | Snappy
    StarCoder-3B | ~800ms | ~400ms | Noticeable
    CodeLlama-7B | Too slow | ~700ms | Acceptable
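    To see where your setup lands in that table, curl's built-in timing gives a quick end-to-end number from a developer machine (the /v1/completions path is assumed from Tabby's HTTP API):

    ```shell
    # Prints total request time in seconds; run a few times and take the median
    curl -s -o /dev/null -w 'total: %{time_total}s\n' \
      -X POST http://your-server-ip:8080/v1/completions \
      -H 'Content-Type: application/json' \
      -d '{"language": "python", "segments": {"prefix": "import "}}'
    ```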

    Continue.dev Alternative

    If you prefer a simpler setup that connects directly to Ollama (Part 1):

    1. Install the Continue extension in VS Code
    2. Open ~/.continue/config.json
    3. Configure your Ollama endpoint:
    ~/.continue/config.json
    {
      "models": [{
        "title": "Ollama - Mistral",
        "provider": "ollama",
        "model": "mistral",
        "apiBase": "http://your-server-ip:11434"
      }],
      "tabAutocompleteModel": {
        "title": "DeepSeek Coder",
        "provider": "ollama",
        "model": "deepseek-coder:6.7b",
        "apiBase": "http://your-server-ip:11434"
      }
    }

    What's Next?

    Your developers now have private AI code assistance that rivals Copilot — at a fraction of the cost and with zero code exposure. In Part 6: CrewAI, we go beyond single-model interactions with multi-agent AI workflows.