Internal AI Knowledge Platform
Built an internal AI platform for querying company knowledge across documents, SharePoint, and internal systems using retrieval-augmented generation. Significantly reduced information retrieval time.
Problem
Information was spread across SharePoint, emails, documents, and internal systems. Staff wasted large amounts of time searching for answers that existed somewhere in the organisation but were effectively inaccessible.
The technical support and operations teams were hit hardest. They needed fast, accurate answers from a knowledge base that spanned years of accumulated documentation.
Approach
Built a retrieval-augmented generation platform. The system ingests documents, creates embeddings, stores them in a vector database, retrieves relevant context at query time, and generates answers with source attribution.
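The query path can be sketched as follows. This is a minimal, self-contained illustration: the in-memory list stands in for the pgvector table, and the embeddings would come from a sentence-transformer model rather than being supplied directly.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query_vec, store, top_k=3):
    """Rank stored chunks by similarity to the query embedding.

    `store` is a list of (chunk_text, embedding) pairs. In the real
    system this is a pgvector similarity query, not a linear scan.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]

def build_prompt(question, context_chunks):
    """Assemble the LLM prompt: retrieved context plus the user question."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer using only the context below. Cite sources.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

The generated answer then carries citations back to whichever chunks made it into the prompt, which is what enables the source attribution described below.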
Pipeline
- Ingestion. Python workers pull documents from SharePoint, file shares, and internal systems on a schedule
- Processing. Documents are chunked, cleaned, and embedded using sentence transformers
- Storage. Embeddings stored in PostgreSQL with pgvector for efficient similarity search
- Retrieval. User queries are embedded and matched against the knowledge base using hybrid search (semantic + keyword)
- Generation. LLM synthesises an answer from retrieved context, with citations back to source documents
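The processing step above can be sketched as a simple overlapping word-window chunker. The window and overlap sizes here are illustrative defaults, not the production values; in the real pipeline each chunk is then embedded with a sentence-transformer model.

```python
def chunk_text(text, chunk_words=200, overlap=20):
    """Split a document into overlapping word-window chunks.

    The overlap keeps sentences that straddle a chunk boundary
    retrievable from both neighbouring chunks.
    """
    words = text.split()
    if not words:
        return []
    step = chunk_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break
    return chunks
```

Chunking by words rather than characters keeps boundaries from splitting tokens, at the cost of slightly variable chunk byte sizes.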
Architecture
Ingestion layer: Python workers with connectors for SharePoint, file systems, and internal APIs
Vector storage: PostgreSQL + pgvector. Chosen for data residency requirements and existing infrastructure
LLM reasoning layer: generates answers from retrieved context with source attribution
Deployment: secure internal deployment behind existing authentication
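Hybrid search in the retrieval step combines a semantic ranking with a keyword ranking. One common way to fuse the two lists (shown here as an illustrative choice, not necessarily the exact method the platform uses) is reciprocal rank fusion:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over every list it
    appears in; k=60 is the conventional damping constant. Documents
    ranked highly by both semantic and keyword search float to the top.
    """
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Rank-based fusion sidesteps the problem that cosine similarities and keyword relevance scores live on incompatible scales.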
Results
- Significantly reduced information retrieval time across support and operations
- Improved operational efficiency. Staff get answers in seconds rather than searching for minutes
- Natural language querying. Non-technical users can search internal data without knowing where to look
- Source attribution. Every answer links back to the original document, building trust in the system
Lessons
The biggest win wasn’t the AI. It was making information findable. Most of the knowledge already existed. It was just buried in places nobody could search effectively.
pgvector was the right choice for our constraints. Data residency meant we couldn’t use managed vector databases, and keeping everything in PostgreSQL simplified the operational burden significantly.