
ResearchRAG

Secure AI-Powered Document Research Platform

The Problem

A federal research organization needed to enable analysts to query thousands of classified and unclassified documents using natural language — without exposing sensitive PII across access tiers.

What We Built

We built a microservices platform on Cloud Run, with Azure Blob Storage for document ingestion, Document Intelligence for parsing, and Azure OpenAI for natural language querying. Google Cloud DLP handles PII detection and sensitive data controls. A classification-based RBAC system ensures analysts see only documents matching their clearance. Vector search powered by Vertex AI embeddings enables fast semantic retrieval across the corpus.
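To make the interplay between the RBAC system and vector search concrete, here is a minimal sketch in Python. It is not the platform's code: the clearance tiers, the `Doc` record, the toy two-dimensional embeddings, and the `search` function are all hypothetical, and plain cosine similarity stands in for the Vertex AI embeddings and the production vector index.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical clearance ordering; the actual tiers are not described in the source.
CLEARANCE_RANK = {"UNCLASSIFIED": 0, "CONFIDENTIAL": 1, "SECRET": 2}


@dataclass(frozen=True)
class Doc:
    doc_id: str
    classification: str
    embedding: Tuple[float, ...]  # toy vector; real embeddings are much larger


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_embedding: Sequence[float], docs: Sequence[Doc],
           analyst_clearance: str, top_k: int = 3) -> List[str]:
    """Return the ids of the most similar documents the analyst may see."""
    limit = CLEARANCE_RANK[analyst_clearance]
    # Filter by clearance first, so restricted documents never become candidates.
    visible = [d for d in docs if CLEARANCE_RANK[d.classification] <= limit]
    ranked = sorted(visible,
                    key=lambda d: cosine(query_embedding, d.embedding),
                    reverse=True)
    return [d.doc_id for d in ranked[:top_k]]


# Illustrative corpus with made-up ids and vectors.
DOCS = [
    Doc("memo-public", "UNCLASSIFIED", (1.0, 0.0)),
    Doc("report-secret", "SECRET", (0.9, 0.1)),
    Doc("notes-confidential", "CONFIDENTIAL", (0.0, 1.0)),
]

print(search((1.0, 0.0), DOCS, "UNCLASSIFIED"))
print(search((1.0, 0.0), DOCS, "SECRET"))
```

The sketch applies the clearance filter before ranking rather than after. With that ordering, a document above the analyst's tier cannot influence the result list or leak through a similarity score.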

Architecture

Stages: Ingestion, Processing, Retrieval
Components: Azure Blob Storage, Doc Intelligence, Cloud DLP, Vertex AI Embed, BigQuery, RBAC Engine, Vector Search, FastAPI, Azure OpenAI

Technology Stack

Cloud Run
Azure Blob Storage
Document Intelligence
Azure OpenAI
BigQuery
Google Cloud DLP
Vertex AI Embeddings
Python
FastAPI

Outcomes

PII detection integrated into document processing workflows
Fast semantic search across large document collections
Role-aware access aligned to clearance and need-to-know
Reduced manual review burden for analysts
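
The first outcome, PII detection inside the document processing workflow, can be pictured as a masking step applied to parsed text. The sketch below is a stand-in only. It uses two regular expressions where the platform uses Google Cloud DLP, and the pattern set, the mask format, and the `mask_pii` function are hypothetical. Where this step sits relative to embedding and indexing is also an assumption.

```python
import re
from typing import Dict, Tuple

# Stand-in detectors; the real platform delegates detection to Google Cloud DLP.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_pii(text: str) -> Tuple[str, Dict[str, int]]:
    """Replace each detected value with a [LABEL] token and count hits per label."""
    counts: Dict[str, int] = {}
    for label, pattern in PII_PATTERNS.items():
        text, hits = pattern.subn(f"[{label}]", text)
        if hits:
            counts[label] = hits
    return text, counts


masked, found = mask_pii("Contact jane@example.gov, SSN 123-45-6789.")
print(masked)
print(found)
```

Returning the per-label counts alongside the masked text gives a reviewer a quick signal about which documents need a closer look.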
