Jez | Jeremy Dawes

Context MCP Server

Self-hosted documentation RAG with 18 MCP tools. Semantic search across 40+ curated technical doc sources with hybrid vector + keyword search.

MCP, Cloudflare Workers, Vectorize, Workers AI, D1

Overview

A self-hosted documentation RAG system that provides intelligent search across curated technical documentation. Built to replace cluttered tools like Context7 with a focused, manageable set of docs you actually use.

  • 18 MCP Tools
  • 40+ Doc Sources
  • 23K+ Indexed Chunks
  • 3 Search Types

The Problem

Documentation search tools accumulate clutter:

  • Context7 has thousands of docs you never use
  • Finding relevant information takes too long
  • No control over what’s indexed
  • Stale documentation causes incorrect answers

AI assistants need access to current, curated documentation.

Solution

Complete control over your documentation index:

  • Search (3) - Hybrid, vector, keyword
  • Sources (5) - CRUD + scraping
  • Content (4) - Chunks, pages, context
  • Discovery (3) - Related, suggest, compare
  • Stats (2) - Freshness, metrics
  • Export (1) - Markdown to R2

Key Capabilities

Three search modes for different needs:

Mode     Best For
Hybrid   General queries (vector + BM25 keyword)
Vector   Conceptual/semantic questions
Keyword  Exact function/API name lookups
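
How the hybrid mode merges its vector and keyword results is not described here. One common approach is reciprocal rank fusion (RRF), sketched below as an assumption rather than the project's actual method. The function name `fuseRankings`, the chunk ids, and the constant `k = 60` are all illustrative.

```typescript
// Each ranking is a list of chunk ids, best match first (hypothetical ids).
type Ranking = string[];

// Reciprocal rank fusion: every list contributes 1 / (k + rank) to an id's score.
// k = 60 is a widely used default, not a value taken from this project.
function fuseRankings(rankings: Ranking[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const vectorHits: Ranking = ["c7", "c2", "c9"];
const keywordHits: Ranking = ["c2", "c4", "c7"];
const fused = fuseRankings([vectorHits, keywordHits]);
console.log(fused); // chunks found by both searches rise to the top
```

RRF only needs rank positions, so it avoids having to normalise cosine similarities against BM25 scores, which live on different scales.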

Intelligent Scraping Pipeline

Multi-stage content processing:

  1. URL Discovery - Firecrawl /map or browser crawling
  2. Content Extraction - Firecrawl (primary), Browser Rendering, or fetch
  3. Regex Cleanup - Remove nav, footers, UI chrome
  4. AI Cleanup - Workers AI (Qwen 32B) removes marketing fluff
  5. Semantic Chunking - Split at H2/H3 boundaries
  6. Embedding - BGE-base-en-v1.5 via Workers AI
  7. Indexing - Store in Vectorize for similarity search
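
Step 5 of the pipeline above can be sketched as follows. This is a minimal illustration that assumes the cleaned content is markdown with `##` and `###` headings. The `Chunk` shape and the `chunkByHeadings` name are hypothetical, and the real pipeline may also enforce size limits or overlap.

```typescript
interface Chunk {
  heading: string; // heading text, or "" for content before the first H2/H3
  body: string;
}

const FENCE = "`".repeat(3); // code fence marker, so headings inside code are ignored

function chunkByHeadings(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let current: Chunk = { heading: "", body: "" };
  let inFence = false;

  const flush = () => {
    // Skip empty leading chunks (no heading and no text).
    if (current.heading || current.body.trim()) {
      chunks.push({ heading: current.heading, body: current.body.trim() });
    }
  };

  for (const line of markdown.split("\n")) {
    if (line.trim().indexOf(FENCE) === 0) inFence = !inFence;
    // Match exactly two or three '#' followed by whitespace (H2 or H3 only).
    const match = inFence ? null : /^(#{2,3})\s+(.*)$/.exec(line);
    if (match) {
      flush();
      current = { heading: match[2].trim(), body: "" };
    } else {
      current.body += line + "\n";
    }
  }
  flush();
  return chunks;
}

const sample = [
  "# Guide",
  "Intro text.",
  "## Install",
  "Run the installer.",
  "### Options",
  "Use --force.",
].join("\n");
const chunks = chunkByHeadings(sample);
console.log(chunks.map((c) => c.heading));
```

Splitting at heading boundaries keeps each chunk on a single topic, which tends to produce cleaner embeddings than fixed-size windows.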

Auto-Updates

Configurable refresh schedules:

  • Hourly - Fast-changing API docs
  • Daily - Most documentation
  • Weekly - Stable reference material
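
A cron-triggered handler could decide which sources to refresh with a check like the one below. This is a sketch under assumptions: the `SourceSchedule` shape, the `isDue` name, and the millisecond timestamps are illustrative and do not come from the project.

```typescript
type Schedule = "hourly" | "daily" | "weekly";

const INTERVAL_MS: Record<Schedule, number> = {
  hourly: 60 * 60 * 1000,
  daily: 24 * 60 * 60 * 1000,
  weekly: 7 * 24 * 60 * 60 * 1000,
};

// Hypothetical per-source record; lastScrapedAt is a Unix timestamp in ms.
interface SourceSchedule {
  schedule: Schedule;
  lastScrapedAt: number | null; // null means the source has never been scraped
}

function isDue(source: SourceSchedule, now: number): boolean {
  if (source.lastScrapedAt === null) return true;
  return now - source.lastScrapedAt >= INTERVAL_MS[source.schedule];
}

const now = Date.UTC(2025, 0, 8, 12, 0, 0);
const twoHoursAgo = now - 2 * 60 * 60 * 1000;

console.log(isDue({ schedule: "hourly", lastScrapedAt: twoHoursAgo }, now));
console.log(isDue({ schedule: "daily", lastScrapedAt: twoHoursAgo }, now));
```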

Change detection via ETag/Last-Modified headers and content hashing.
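
The change detection described above might look roughly like this. `If-None-Match` and `If-Modified-Since` are the standard HTTP conditional request headers; everything else (the `PageState` shape, the function names, and the FNV-1a hash) is an assumption for illustration. A Worker would more likely hash content with a cryptographic digest such as SHA-256; FNV-1a is used here only to keep the sketch dependency-free.

```typescript
// Hypothetical record of what was seen on the previous scrape of a page.
interface PageState {
  etag?: string;
  lastModified?: string;
  contentHash?: string;
}

// Build conditional request headers from the stored validators.
function conditionalHeaders(prev: PageState): Record<string, string> {
  const headers: Record<string, string> = {};
  if (prev.etag) headers["If-None-Match"] = prev.etag;
  if (prev.lastModified) headers["If-Modified-Since"] = prev.lastModified;
  return headers;
}

// 32-bit FNV-1a over UTF-16 code units (illustrative, non-cryptographic).
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// status: HTTP status of the conditional request; body: response text (absent on 304).
function detectChange(prev: PageState, status: number, body?: string): "unchanged" | "changed" {
  if (status === 304) return "unchanged"; // server confirmed the validators still match
  if (body !== undefined && prev.contentHash === fnv1a(body)) return "unchanged";
  return "changed";
}

const prev: PageState = { etag: '"abc"', contentHash: fnv1a("hello") };
console.log(conditionalHeaders(prev));
console.log(detectChange(prev, 200, "hello!"));
```

The hash comparison acts as a fallback for servers that send no validators or that return 200 with identical content.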

MCP Tools (18 total)

Search:

  • search_docs - Hybrid semantic + keyword search
  • lookup_api - Exact function/class name lookup
  • search_code - Find code examples

Source Management:

  • list_sources - View indexed documentation
  • add_source - Add new documentation URL
  • update_source - Modify source config
  • delete_source - Remove documentation
  • scrape_source - Trigger re-indexing

Content Retrieval:

  • get_chunk - Retrieve specific chunk by ID
  • get_full_page - Get complete page content
  • get_page_context - Get chunk with surrounding context

Discovery:

  • get_related_chunks - Find similar content
  • suggest_related - AI-powered suggestions
  • compare_chunks - Compare two chunks
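
MCP clients invoke tools like these with a JSON-RPC `tools/call` request. The sketch below builds such a request for `search_docs`. The tool names come from the list above, but the argument names (`query`, `limit`) are hypothetical, since the tools' input schemas are not given here.

```typescript
// Tool names taken from the list above.
const LISTED_TOOLS = [
  "search_docs", "lookup_api", "search_code",
  "list_sources", "add_source", "update_source", "delete_source", "scrape_source",
  "get_chunk", "get_full_page", "get_page_context",
  "get_related_chunks", "suggest_related", "compare_chunks",
] as const;
type ToolName = (typeof LISTED_TOOLS)[number];

// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: ToolName; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  tool: ToolName,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: tool, arguments: args } };
}

// "query" and "limit" are assumed argument names, not the server's documented schema.
const request = buildToolCall(1, "search_docs", { query: "hono middleware", limit: 5 });
const payload = JSON.stringify(request);
console.log(payload);
```

In practice an MCP client such as Claude Code builds these requests itself after discovering each tool's schema through `tools/list`.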

Architecture

Cloudflare Workers
├── Admin UI (React SPA)
├── REST API (Hono)
├── MCP Server (Durable Objects)
└── Queue Handler (async processing)

Storage:
├── D1 (metadata, sources, pages, chunks)
├── Vectorize (768-dim embeddings)
├── R2 (markdown exports)
└── KV (OAuth sessions)

Features

  • 18 MCP Tools - Complete documentation access
  • Hybrid Search - Vector + keyword for accuracy
  • AI Cleanup - < 1% cruft rate in indexed content
  • Auto-Updates - Cron-triggered re-indexing
  • Admin UI - Visual source management
  • OAuth - Google authentication with allowlist
  • Markdown Export - Download docs as combined markdown

Quality Metrics

Production results from indexed sources:

Source         Chunks  Cruft Rate
Vercel AI SDK  1,199   0.25%
Hono           525     0%
shadcn/ui      490     0.6%

Two-stage cleanup (regex + AI) achieves < 1% cruft across all sources.

Use Cases

Claude Code Integration - Add @context MCP server for instant doc search

API Lookups - Find exact function signatures and parameters

Learning New Frameworks - Search concepts across multiple doc sources

Code Examples - Find implementation patterns and snippets

Version Tracking - Keep docs current with auto-updates
