Case Study

Internal AI Knowledge Platform

Built an internal AI platform for querying company knowledge across documents, SharePoint, and systems using retrieval-augmented generation. Cut typical information retrieval from minutes of searching to seconds.

AI · Python · PostgreSQL · Vector DB

Problem

Information was spread across SharePoint, emails, documents, and internal systems. Staff wasted large amounts of time searching for answers that existed somewhere in the organisation but were effectively inaccessible.

The technical support and operations teams were hit hardest. They needed fast, accurate answers from a knowledge base that spanned years of accumulated documentation.

Approach

Built a retrieval-augmented generation platform. The system ingests documents, creates embeddings, stores them in a vector database, retrieves relevant context at query time, and generates answers with source attribution.

Pipeline

  1. Ingestion. Python workers pull documents from SharePoint, file shares, and internal systems on a schedule
  2. Processing. Documents are chunked, cleaned, and embedded using sentence transformers
  3. Storage. Embeddings stored in PostgreSQL with pgvector for efficient similarity search
  4. Retrieval. User queries are embedded and matched against the knowledge base using hybrid search (semantic + keyword)
  5. Generation. LLM synthesises an answer from retrieved context, with citations back to source documents
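The processing step above (step 2) can be sketched as a fixed-size splitter with overlap, which preserves context that would otherwise be cut at chunk boundaries. The character-based splitting and the sizes here are illustrative assumptions; a production pipeline would typically split on sentence or token boundaries.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping fixed-size chunks.

    chunk_size and overlap are illustrative values, not the
    platform's actual settings.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

The overlap means the tail of one chunk is repeated at the head of the next, so an answer that straddles a boundary is still retrievable from at least one chunk.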

Architecture

Ingestion layer: Python workers with connectors for SharePoint, file systems, and internal APIs

Vector storage: PostgreSQL + pgvector. Chosen for data residency requirements and existing infrastructure
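With pgvector, similarity search is plain SQL over a `vector` column using the extension's cosine-distance operator `<=>`. A minimal sketch of the query the retrieval layer might issue; the table and column names are assumptions, not the platform's actual schema:

```python
def build_similarity_query(table: str = "chunks", top_k: int = 5) -> str:
    """Build a pgvector cosine-distance search query.

    `<=>` is pgvector's cosine-distance operator; `%(query_vec)s`
    is a psycopg-style placeholder for the embedded user query.
    Table and column names are hypothetical.
    """
    return (
        f"SELECT id, source_doc, content, "
        f"embedding <=> %(query_vec)s::vector AS distance "
        f"FROM {table} "
        f"ORDER BY embedding <=> %(query_vec)s::vector "
        f"LIMIT {top_k}"
    )
```

Keeping retrieval as ordinary SQL is part of what made the single-database choice operationally simple.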

LLM reasoning layer: generates answers from retrieved context with source attribution
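Source attribution works by labelling each retrieved chunk before it reaches the model, so citations can be mapped back to documents. A minimal sketch of the prompt assembly; the chunk dict shape (`source`, `content` keys) and the prompt wording are assumptions:

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble an answer-generation prompt with numbered source tags.

    Each retrieved chunk is labelled [1], [2], ... so the model can
    cite sources inline and the UI can link [n] back to the original
    document. The exact instruction wording is illustrative.
    """
    context = "\n\n".join(
        f"[{i}] ({c['source']})\n{c['content']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite sources as [n] after each claim.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```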

Deployment: secure internal deployment behind existing authentication
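The hybrid retrieval in pipeline step 4 has to merge two rankings, one semantic and one keyword-based. Reciprocal rank fusion is a common way to do this; whether the platform uses RRF specifically is an assumption, but it illustrates the idea:

```python
def reciprocal_rank_fusion(
    semantic: list[str], keyword: list[str], k: int = 60
) -> list[str]:
    """Fuse two ranked result lists with reciprocal rank fusion (RRF).

    Each document scores 1/(k + rank) for every list it appears in,
    so items ranked well by both retrievers rise to the top. k=60 is
    the conventional smoothing constant.
    """
    scores: dict[str, float] = {}
    for ranking in (semantic, keyword):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Rank-based fusion avoids having to calibrate the incompatible score scales of vector distance and keyword relevance.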

Results

  • Faster answers. Support and operations staff get answers in seconds rather than searching for minutes
  • Natural language querying. Non-technical users can search internal data without knowing where to look
  • Source attribution. Every answer links back to the original document, building trust in the system

Lessons

The biggest win wasn’t the AI. It was making information findable. Most of the knowledge already existed. It was just buried in places nobody could search effectively.

pgvector was the right choice for our constraints. Data residency meant we couldn’t use managed vector databases, and keeping everything in PostgreSQL simplified the operational burden significantly.

Let's build something together

I'm always interested in hearing about new projects, particularly around AI systems, security, and infrastructure.