LLM Integration · RAG
RAG (Retrieval Augmented Generation) connects Notion, Confluence, Google Drive, SharePoint, your PDF archive and your Slack history with ChatGPT, Claude or a custom bot. Answers come with source citation and confidence score. No guessed answers, no invented facts.
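The mechanism is simple to sketch: retrieve the most relevant chunks, then hand them to the LLM as context with an instruction to answer only from that context and cite sources. In this illustrative sketch, the word-overlap ranker and the document names are toy stand-ins for a real embedding-based retriever:

```python
# Minimal RAG flow: retrieve, then build a context-grounded prompt.
# The overlap scorer and DOCS below are illustrative only; production
# retrieval uses embeddings and a vector database.
from collections import Counter

DOCS = [
    {"source": "faq.md", "text": "Refunds are processed within 14 days of return."},
    {"source": "handbook.md", "text": "New employees get their hardware on day one."},
]

def retrieve(query, docs, k=1):
    """Rank documents by naive word overlap with the query (toy scorer)."""
    q = Counter(query.lower().split())
    return sorted(docs,
                  key=lambda d: sum(q[w] for w in d["text"].lower().split()),
                  reverse=True)[:k]

def build_prompt(query, chunks):
    """Pass retrieved chunks as context; demand cited, context-only answers."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    return ("Answer only from the context below and cite the source in brackets. "
            "If the context does not contain the answer, say so.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")
```

Because the model is told to refuse when the context lacks the answer, the "no invented facts" property comes from the prompt contract plus source citation, not from the model alone.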
What you get
We don't just connect sources to an LLM — we build the full retrieval pipeline with update automation, reranking and monitoring. So the system works over months, not just in the demo.
What it's used for
Case worker queries client history in natural language instead of clicking through 15 folders. Important: GDPR-compliant read-only access with audit log, on-premise option for highly sensitive data.
Customer bot finds the right product from thousands of items based on a natural-language description (material, size, price range, intended use). Answers with product links and stock status.
New employees ask the knowledge base instead of interrupting experienced colleagues. We index Slack threads, Notion, Confluence, GitHub issues. Source citation per answer so everyone knows who originally had the idea.
Technician asks AI in the field about specifications, installation instructions, warranty conditions. PDF archive with thousands of data sheets becomes searchable — even when the exact word isn't in the document.
Concierge bot answers guest questions 24/7. Accesses internal FAQ, insider recommendations and public data about Mallorca activities. Escalates to you when a request gets too complex.
How it works
One week. We walk through all knowledge sources with you and one to two key employees, rating each by quality, freshness and sensitivity. Result: a list of three to seven sources that go into phase 1.
We build the indexing pipeline. Chunking strategy per source (long documents differ from FAQ), embedding model choice depending on language and domain, vector database setup (pgvector if you already use Postgres, otherwise Qdrant self-hosted).
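The per-source chunking idea can be sketched like this (the sizes and helper names are illustrative, not fixed production values):

```python
# Per-source chunking sketch. Long documents get fixed-size chunks with
# overlap so no passage is cut off at a chunk boundary; FAQ entries stay
# whole because a question/answer pair is already self-contained.
def chunk_long_document(text, size=500, overlap=100):
    """Overlapping fixed-size character chunks, for manuals and PDFs."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def chunk_faq(entries):
    """One chunk per Q/A pair -- never split a question from its answer."""
    return [f"Q: {q}\nA: {a}" for q, a in entries]
```

In practice chunk sizes are tuned per source and per embedding model; the point is that one chunking rule for everything is usually the wrong default.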
We build a test suite from 20-50 real queries you give us beforehand. Success criterion: 80%+ of queries answered correctly with the right source. If not, we iterate on chunking, reranking, prompt.
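The acceptance check itself is mechanically simple; a sketch, with `retrieve` standing in for the real pipeline:

```python
# Eval harness sketch: each real query is paired with the source document
# that should answer it. The suite passes when the expected source shows
# up in the top-k results for at least 80% of queries.
def evaluate(test_cases, retrieve, k=3, threshold=0.8):
    hits, failures = 0, []
    for query, expected_source in test_cases:
        sources = [c["source"] for c in retrieve(query, k)]
        if expected_source in sources:
            hits += 1
        else:
            failures.append((query, sources))
    rate = hits / len(test_cases)
    return rate >= threshold, rate, failures
```

The failure list is the useful part: it tells you which queries to look at when iterating on chunking, reranking or the prompt.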
The system goes live, we track query volume, success rate, drift. Monthly report. Update automation runs in the background. We adjust the pipeline to new sources or new use cases as needed.
Pricing
Entry with one knowledge source (e.g. only Notion or only your PDF archive): from 1,500 EUR one-off plus 99 EUR/mo hosting. Multiple sources or large data volumes (more than 50,000 documents) from 3,500 EUR setup. Monitoring reports and drift detection included. LLM costs (OpenAI, Anthropic) run separately on your account.
See pricing and packages
FAQ
What's the difference between RAG and fine-tuning?
RAG (Retrieval Augmented Generation) fetches the relevant documents from your knowledge base on each query and gives them to the LLM as context. Fine-tuning bakes your knowledge into the model. RAG advantage: you can keep knowledge up to date without retraining, you get source citations, you stay in control. Fine-tuning needs more setup and is unusable with dynamic data. For 95% of SMB cases, RAG is the right choice.
Which vector database do you use?
Default is pgvector if you already use Postgres (no extra service, good up to a few million vectors). Qdrant for larger data volumes or special filter requirements, self-hosted possible. Weaviate when you need multi-modal (text plus images). We decide based on your data volume and hosting preference, not based on hype.
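As a sketch of what the pgvector default looks like in practice (table and column names, and the 1024-dim vector size, are assumptions for illustration):

```python
# Illustrative pgvector setup held as SQL strings. The <=> operator is
# pgvector's cosine-distance operator; the HNSW index keeps
# nearest-neighbour search fast as the table grows.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE doc_chunks (
    id        bigserial PRIMARY KEY,
    source    text NOT NULL,
    content   text NOT NULL,
    embedding vector(1024)
);
CREATE INDEX ON doc_chunks USING hnsw (embedding vector_cosine_ops);
"""

QUERY_SQL = """
SELECT source, content
FROM doc_chunks
ORDER BY embedding <=> %(query_embedding)s
LIMIT 5;
"""
```

The appeal of this route is that retrieval becomes a plain SQL query against a database you already run and back up.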
Which models do you use?
Embeddings: Voyage AI v3 (standard, very good for German and English), OpenAI text-embedding-3-large (if you use OpenAI anyway), Cohere embed-multilingual-v3 if you need many languages. LLM answer layer: Claude Sonnet 4.6 for complex answers with long context, GPT-4 for quick queries, local models (Llama 3.3, Mistral) for highest data sensitivity.
What are the running costs?
Vector database hosting: 0 EUR if pgvector runs on your existing Postgres, 49-149 EUR/mo as a separate service. Embedding updates: 5-30 EUR per 100,000 chunks (one-time per update). LLM costs per query: 0.005-0.05 EUR depending on model and answer length. At 1,000 queries per month that's 5-50 EUR in LLM costs plus hosting. We give you a projection beforehand based on your expected volume.
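The projection is plain arithmetic; a sketch using the per-query ranges quoted above (the figures are the quoted ranges, not measurements):

```python
# Monthly LLM spend = query volume x cost per query.
def monthly_llm_cost(queries_per_month, eur_per_query):
    return queries_per_month * eur_per_query

low = monthly_llm_cost(1000, 0.005)   # small model, short answers
high = monthly_llm_cost(1000, 0.05)   # large model, long answers
print(f"{low:.2f}-{high:.2f} EUR/month")
```

Plug in your own expected volume and the model tier you plan to use; hosting and embedding-update costs come on top.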
How long does setup take?
Simple setups (one source, fewer than 10,000 documents, standard use case): three to four weeks from kick-off to production. More complex setups (multiple sources, permissions, custom reranker): six to ten weeks. After the audit we give you an honest estimate with milestones.
What if our data can't go to a cloud LLM?
Then we go on-premise or into your own cloud (AWS account, Hetzner, etc). Embedding model and LLM can run locally (Llama, Mistral via Ollama). No data leaves your infrastructure. We do a GDPR risk analysis beforehand and document the setup so it's auditable.
Next step
We look at which knowledge sources you have, whether RAG is the right lever for your case and which tools we'd recommend. No sales pressure. If the answer is *RAG doesn't fit*, we say that too.