Jez | Jeremy Dawes
workflow active

Document Semantic Search

Process documents and build semantic search with OpenAI embeddings, Gemini, and the Qdrant vector database. 780+ views on n8n.

n8n · OpenAI · Gemini · Qdrant

Overview

An n8n workflow for building semantic search over your documents. It processes PDFs, DOCX files, and text files, creates embeddings with OpenAI, stores them in Qdrant, and answers search queries with Gemini.

780+ Views
2 AI Models
Qdrant Vector DB

How It Works

Ingestion Pipeline

  1. Upload Documents - PDFs, DOCX, TXT, or URLs
  2. Text Extraction - Convert to plain text
  3. Chunking - Split into semantic chunks
  4. Embeddings - Generate vectors with OpenAI
  5. Storage - Store in Qdrant vector database
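
The chunking step above can be sketched as a sliding window with overlap. This is a minimal illustration, not the workflow's actual node logic: it uses fixed-size windows rather than semantic boundaries, treats whitespace-separated words as a stand-in for model tokens, and the function names `chunk_tokens` and `chunk_text` are hypothetical.

```python
def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


def chunk_text(text, chunk_size=500, overlap=50):
    """Chunk plain text, approximating tokens with whitespace-separated words."""
    words = text.split()
    return [" ".join(chunk) for chunk in chunk_tokens(words, chunk_size, overlap)]


# Tiny sizes so the overlap is easy to see: each chunk repeats the
# last word of the previous one.
demo_chunks = chunk_text("a b c d e f g", chunk_size=3, overlap=1)
# demo_chunks == ["a b c", "c d e", "e f g"]
```

Each chunk would then be sent to the embedding model and stored in Qdrant together with its source document and position, so answers can link back to the original.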

Search Pipeline

  1. Query - Receive search query
  2. Embed Query - Convert to vector
  3. Vector Search - Find similar chunks in Qdrant
  4. Context Assembly - Gather relevant chunks
  5. AI Answer - Gemini synthesises response
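
To make the search steps concrete, here is a small in-memory sketch of vector search and context assembly. It assumes cosine similarity as the distance metric and uses toy three-dimensional vectors in place of real OpenAI embeddings. The stored points, the payload fields (`text`, `source`, `page`), and the helper names are all hypothetical. In the workflow itself, Qdrant performs the similarity search and Gemini receives the assembled prompt.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, points, k=3):
    """Return the k (score, point) pairs most similar to the query vector."""
    scored = [(cosine(query_vec, p["vector"]), p) for p in points]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question, hits):
    """Assemble retrieved chunks, labelled with their source, into an LLM prompt."""
    context = "\n\n".join(
        f"[{p['source']}, page {p['page']}]\n{p['text']}" for _, p in hits
    )
    return (
        "Answer the question using only the context below, and cite the sources.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Toy stand-ins for chunks stored in the vector database.
points = [
    {"vector": [1.0, 0.0, 0.0], "text": "Full refund available within 14 days of purchase.",
     "source": "Company-Policies-2024.pdf", "page": 12},
    {"vector": [0.0, 1.0, 0.0], "text": "Unrelated chunk.",
     "source": "Other-Document.pdf", "page": 3},
    {"vector": [0.7, 0.7, 0.0], "text": "Partially related chunk.",
     "source": "Other-Document.pdf", "page": 4},
]

hits = top_k([1.0, 0.0, 0.0], points, k=2)
prompt = build_prompt("What is the refund policy?", hits)
```

Keeping the source and page in each chunk's payload is what makes the source attribution in the final answer possible.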

Workflow Components

  • Ingestion (3) - Doc processing pipeline
  • Embeddings (1) - OpenAI ada-002
  • Vector DB (1) - Qdrant storage
  • Search (1) - Semantic matching
  • AI (1) - Gemini answers
  • Output (1) - Formatted response

Features

  • Multi-Format Support - PDF, DOCX, TXT, Markdown
  • Smart Chunking - Preserves semantic boundaries
  • Hybrid Search - Combines vector + keyword search
  • Source Attribution - Links back to original documents
  • Incremental Updates - Add new docs without full reindex
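
The page does not say how the vector and keyword results are combined. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the workflow's actual method. The function name, the chunk IDs, and the constant `k=60` (a conventional default for RRF) are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into one list, best first.

    Each ID scores 1 / (k + rank) in every list it appears in; scores are summed.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical chunk IDs returned by the two searches.
vector_hits = ["c1", "c2", "c3"]
keyword_hits = ["c3", "c1", "c4"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
# fused == ["c1", "c3", "c2", "c4"]
```

Chunks that rank well in both lists rise to the top, and RRF needs no score normalisation because it only looks at rank positions.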

Architecture

Ingestion: Documents → Text Extraction → Chunking → OpenAI Embeddings → Qdrant DB

Search:    Query → Embed Query → Vector Search (Qdrant DB) → Gemini Answer

Example Query

User: “What’s our refund policy for software subscriptions?”

System:

Based on the company policies document:

**Software Subscription Refunds**

- Full refund available within 14 days of purchase
- Pro-rata refund for annual plans cancelled after 14 days
- No refund for monthly plans (cancel before renewal)

Special cases:
- Technical issues preventing use: Full refund at any time
- Billing errors: Immediate correction + refund

📄 Source: Company-Policies-2024.pdf (Page 12)

Use Cases

  • Knowledge Base - Search internal documentation
  • Legal/Compliance - Find relevant policy sections
  • Research - Search academic papers and reports
  • Customer Support - Find answers in product docs

Configuration

Component        Options
Embedding Model  text-embedding-ada-002, text-embedding-3-small
Chunk Size       500-2000 tokens
Overlap          50-200 tokens
Vector DB        Qdrant (self-hosted or cloud)
LLM              Gemini Pro, GPT-4
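
The options above could be captured in a small validated settings object. This is a sketch only: the class name `SearchConfig`, the field names, and the default values are assumptions. Only the allowed models and the numeric ranges come from the table.

```python
from dataclasses import dataclass

EMBEDDING_MODELS = ("text-embedding-ada-002", "text-embedding-3-small")
LLMS = ("Gemini Pro", "GPT-4")


@dataclass(frozen=True)
class SearchConfig:
    # Defaults are illustrative picks from within the documented ranges.
    embedding_model: str = "text-embedding-3-small"
    chunk_size: int = 1000  # tokens, documented range 500-2000
    overlap: int = 100      # tokens, documented range 50-200
    llm: str = "Gemini Pro"

    def __post_init__(self):
        if self.embedding_model not in EMBEDDING_MODELS:
            raise ValueError(f"unsupported embedding model: {self.embedding_model}")
        if not 500 <= self.chunk_size <= 2000:
            raise ValueError("chunk_size must be between 500 and 2000 tokens")
        if not 50 <= self.overlap <= 200:
            raise ValueError("overlap must be between 50 and 200 tokens")
        if self.llm not in LLMS:
            raise ValueError(f"unsupported LLM: {self.llm}")


config = SearchConfig(chunk_size=800, overlap=80)
```

Validating at construction time means a bad chunk size or an unknown model fails before any documents are embedded.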
