Podcast RAG
Production RAG for 500+ Hours of Audio

Search and chat with 500+ hours of podcast content. Whisper transcription, hybrid retrieval, reranking, and streaming responses with citations.
Overview
A production RAG system that goes beyond tutorial-grade setups: hybrid search (vector + keyword), a reranking pipeline, overlapping chunks, and timestamped citations back to the source audio.
Challenge
Tutorial-style RAG systems tend to fail in production for three reasons: naive chunking loses context, pure vector search misses exact keywords, and skipping reranking leaves noisy results. I wanted to build a RAG system that holds up on real content.
Approach
Built a data pipeline: YouTube download, Whisper transcription on an A100 GPU, and chunking with 25% overlap, with chunk breaks placed at segment boundaries.
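A minimal sketch of overlap chunking at segment boundaries, assuming each transcript segment carries start and end times plus text (as Whisper segments do). The `Segment` and `Chunk` types, the `chunk_segments` name, and the 200-word budget are illustrative assumptions, not the project's actual code.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the episode
    end: float
    text: str


@dataclass
class Chunk:
    start: float
    end: float
    text: str


def chunk_segments(segments, max_words=200, overlap=0.25):
    """Group whole segments into chunks, carrying ~25% of each chunk forward."""
    chunks = []
    i, n = 0, len(segments)
    while i < n:
        # Grow the chunk one whole segment at a time (always take at least one).
        j, words = i, 0
        while j < n and (j == i or words + len(segments[j].text.split()) <= max_words):
            words += len(segments[j].text.split())
            j += 1
        window = segments[i:j]
        chunks.append(
            Chunk(window[0].start, window[-1].end,
                  " ".join(s.text.strip() for s in window))
        )
        if j >= n:
            break
        # Step back over trailing segments until the overlap budget is used,
        # but never back to i, so every iteration makes progress.
        k, carried = j, 0
        while k - 1 > i and carried + len(segments[k - 1].text.split()) <= overlap * words:
            k -= 1
            carried += len(segments[k].text.split())
        i = k
    return chunks
```

Because chunks are built from whole segments, each chunk keeps an exact start and end time, which is what makes timestamped citations possible later.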
Used OpenAI text-embedding-3-large (3072 dimensions) for high-quality semantic search. Pinecone for vector storage at scale.
Implemented hybrid retrieval: vector search (top 20) and keyword search (top 20) are merged, then reranked down to the top 5, so results capture both semantic matches and exact terms.
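A rough sketch of the merge-then-rerank step. The merge method and reranker are not specified here, so this example assumes reciprocal rank fusion (RRF) for merging and a pluggable `rerank_score` callable standing in for a real reranking model. Both function names and the `k=60` constant are illustrative assumptions.

```python
from collections import defaultdict
from typing import Callable, Sequence


def rrf_merge(rankings: Sequence[Sequence[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_retrieve(
    vector_hits: Sequence[str],
    keyword_hits: Sequence[str],
    rerank_score: Callable[[str], float],  # hypothetical reranker hook
    top_k: int = 5,
) -> list[str]:
    """Merge the top-20 from each retriever, then keep the reranker's top_k."""
    candidates = rrf_merge([vector_hits[:20], keyword_hits[:20]])
    # sorted() is stable, so reranker ties keep their fused order.
    return sorted(candidates, key=rerank_score, reverse=True)[:top_k]
```

Merging also deduplicates: a chunk found by both retrievers appears once in the candidate set, so the reranker scores at most 40 candidates per query.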
Added timestamped citations. Every answer links back to the specific moment in the podcast.
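One way to turn a chunk's start time into a clickable citation, assuming the citation points at the original YouTube video and uses YouTube's `t=` URL parameter. The helper names and the `abc123` video ID are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as M:SS, or H:MM:SS once past an hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def citation_link(video_id: str, start_seconds: float) -> str:
    """Build a watch URL that starts playback at the cited moment."""
    return f"https://www.youtube.com/watch?v={video_id}&t={int(start_seconds)}s"


# Example with a placeholder video ID:
label = format_timestamp(754.3)          # "12:34"
url = citation_link("abc123", 754.3)     # ...watch?v=abc123&t=754s
```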
Outcome
The system handles 500+ hours of content with sub-second retrieval. Hybrid search catches both semantic matches and exact keywords, reranking filters out noisy results, and timestamped citations let users check each answer against the source audio.