Build Your Own Podcast RAG Chat with Gemini API
Have you ever wanted to have an intelligent conversation with the wisdom contained in your favorite podcasts? Imagine being able to ask questions and getting contextual answers with direct quotes and episode references.
In this guide, you’ll learn how to build a complete Retrieval-Augmented Generation (RAG) chat application that brings podcast transcripts to life using cutting-edge AI technology. We’ll use Google Gemini AI for both generating embeddings and providing conversational responses, Qdrant for vector similarity search, and a modern React frontend.
What is RAG and Why It Matters for Podcasts
Retrieval-Augmented Generation (RAG) combines the power of large language models with information retrieval from a knowledge base. For podcasts, this means:
- Semantic Search: Finding relevant content across hundreds of hours of audio transcripts
- Contextual Answers: Providing responses grounded in actual podcast content
- Source Citations: Always knowing where the information came from
- Scalability: Handling large volumes of text efficiently
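At its core, the loop is simple: embed the question, retrieve the most similar transcript chunks, and let the model answer using only that context. A minimal Python sketch of the shape (the embed, search, and generate arguments are placeholders for the Gemini and Qdrant calls we build in the steps below, not actual project code):
def answer_question(question, embed, search, generate, top_k=5):
    """Core RAG loop: embed the question, retrieve similar chunks, generate a grounded answer."""
    query_vector = embed(question)          # 1. turn the question into a vector
    chunks = search(query_vector, top_k)    # 2. retrieve the most similar transcript chunks
    context = "\n\n".join(chunks)           # 3. stitch the chunks into one context block
    prompt = (
        f"Context from podcast transcripts:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Answer based on the context provided:"
    )
    return generate(prompt)                 # 4. the LLM answers, grounded in real episodes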
Tech Stack Overview
Our application uses a carefully selected stack of modern technologies:
Frontend
- React 18 with TypeScript for type safety
- Tailwind CSS for responsive, modern UI
- React Query for efficient data fetching and caching
- React Router for navigation
Backend
- Node.js with Express and TypeScript
- Qdrant vector database for similarity search
- Google Gemini AI for embeddings and chat responses
- Winston for structured logging
Processing Pipeline
- Python scripts for transcript processing
- LangChain for text chunking and processing
- TikToken for intelligent text splitting
Prerequisites
Before we start, ensure you have:
- Node.js (v18 or higher)
- Python 3.8+
- Docker & Docker Compose (for local development)
- Google Gemini API key
Step 1: Project Setup
Let’s start by cloning and setting up the project structure:
git clone https://github.com/TMFNK/Founders-Podcast-RAG-Chat-Gemini.git
cd Founders-Podcast-RAG-Chat-Gemini
Create environment variables:
cp .env.example .env
Edit .env with your configuration:
# Google Gemini API Key
GOOGLE_GEMINI_API_KEY=your-gemini-api-key-here
VITE_GOOGLE_GEMINI_API_KEY=your-gemini-api-key-here
# Qdrant Configuration
QDRANT_URL_LOCAL=http://localhost:6333
QDRANT_URL_PRODUCTION=https://your-qdrant-instance.onrender.com
# Environment
NODE_ENV=development
Step 2: Setting Up the Vector Database
Qdrant is our vector database that will store podcast transcript embeddings. For local development:
docker run -p 6333:6333 -v qdrant_storage:/qdrant/storage qdrant/qdrant:v1.7.4
Install Python dependencies and set up the collection:
pip install -r requirements.txt
python scripts/setup_qdrant.py
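The setup_qdrant.py script creates the collection that the rest of the pipeline writes into. A minimal sketch of what it has to do (the collection name and settings follow the rest of this guide, and Gemini's embedding-001 model produces 768-dimensional vectors; the actual script may differ):
import os
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url=os.getenv("QDRANT_URL_LOCAL", "http://localhost:6333"))

# Drop and recreate the collection with 768-dim cosine vectors to match embedding-001
client.recreate_collection(
    collection_name="podcast_transcripts",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)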
Step 3: Processing Podcast Transcripts
The magic happens in the processing pipeline. Let’s examine the key components:
Transcript Processing Script
The process_transcripts.py script handles the heavy lifting:
import os
import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
from google.generativeai import configure, embed_content

# Configure Gemini AI and connect to Qdrant
configure(api_key=os.getenv('GOOGLE_GEMINI_API_KEY'))
client = QdrantClient(url=os.getenv('QDRANT_URL_LOCAL', 'http://localhost:6333'))

def process_transcript(file_path):
    """Process a single podcast transcript"""
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()

    # Extract metadata from filename
    filename = os.path.basename(file_path)
    episode_num = extract_episode_number(filename)
    title = extract_title(filename)

    # Split into chunks
    chunks = split_into_chunks(content)

    # Generate embeddings and store
    for i, chunk in enumerate(chunks):
        embedding = embed_content(
            model="models/embedding-001",
            content=chunk,
            task_type="retrieval_document"
        )

        # Qdrant point IDs must be unsigned integers or UUIDs, so derive
        # a deterministic UUID from the episode/chunk pair
        point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{episode_num}_{i}"))

        # Store in Qdrant
        client.upsert(
            collection_name="podcast_transcripts",
            points=[PointStruct(
                id=point_id,
                vector=embedding['embedding'],
                payload={
                    "episode": episode_num,
                    "title": title,
                    "chunk": chunk,
                    "chunk_id": i
                }
            )]
        )
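The extract_episode_number and extract_title helpers aren't shown in the excerpt above. One simple way to implement them, assuming filenames like 042 - Episode Title.txt (adjust the pattern to your own naming scheme):
import re  # os is already imported in process_transcripts.py

def extract_episode_number(filename):
    """Pull the leading episode number from a filename like '042 - Episode Title.txt'."""
    match = re.match(r"(\d+)", filename)
    return int(match.group(1)) if match else 0

def extract_title(filename):
    """Drop the extension and the leading 'NNN - ' prefix to recover the title."""
    name = os.path.splitext(filename)[0]
    return re.sub(r"^\d+\s*-?\s*", "", name).strip()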
Chunking Strategy
Effective text chunking is crucial for RAG performance:
import tiktoken

def split_into_chunks(text, chunk_size=1000, overlap=200):
    """Split text into overlapping chunks using token counting"""
    encoding = tiktoken.get_encoding("cl100k_base")
    tokens = encoding.encode(text)

    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunk_tokens = tokens[start:end]
        chunk_text = encoding.decode(chunk_tokens)
        chunks.append(chunk_text)

        # Stop once the last tokens are consumed; otherwise the overlap
        # would keep re-reading the tail of the text forever
        if end == len(tokens):
            break
        start = end - overlap

    return chunks
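Since the pipeline already lists LangChain, you can get the same token-aware splitting without the manual loop. A sketch using its tiktoken-backed splitter (the import path assumes a recent LangChain release; older versions expose the same class from langchain.text_splitter):
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",
    chunk_size=1000,
    chunk_overlap=200,
)
chunks = splitter.split_text(content)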
Step 4: Building the Chat API
The backend API handles chat requests and semantic search:
Search Endpoint
// api/search.ts
import { Request, Response } from "express";
import { QdrantClient } from "@qdrant/js-client-rest";
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_GEMINI_API_KEY!);
const qdrant = new QdrantClient({ url: process.env.QDRANT_URL_LOCAL });

export const searchTranscripts = async (req: Request, res: Response) => {
  const { query, limit = 5 } = req.body;

  // Generate embedding for the query
  const embeddingModel = genAI.getGenerativeModel({ model: "embedding-001" });
  const result = await embeddingModel.embedContent(query);
  const queryEmbedding = result.embedding.values;

  // Search Qdrant
  const searchResults = await qdrant.search("podcast_transcripts", {
    vector: queryEmbedding,
    limit,
    with_payload: true,
  });

  res.json(searchResults);
};
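Assuming the handler is mounted at POST /api/search and the API listens on port 3000 (both depend on how you wire up your Express app), you can smoke-test it with:
curl -X POST http://localhost:3000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "What does the host say about hiring?", "limit": 3}'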
Chat Endpoint
// api/chat.ts
import { Request, Response } from "express";
import { QdrantClient } from "@qdrant/js-client-rest";
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_GEMINI_API_KEY!);
const qdrant = new QdrantClient({ url: process.env.QDRANT_URL_LOCAL });

// Embed the user's question with the same model used for the transcript chunks
const generateEmbedding = async (text: string): Promise<number[]> => {
  const embeddingModel = genAI.getGenerativeModel({ model: "embedding-001" });
  const result = await embeddingModel.embedContent(text);
  return result.embedding.values;
};

export const chatWithAI = async (req: Request, res: Response) => {
  const { question } = req.body;

  // Retrieve relevant context
  const searchResults = await qdrant.search("podcast_transcripts", {
    vector: await generateEmbedding(question),
    limit: 5,
    with_payload: true,
  });

  const contextText = searchResults
    .map((result) => result.payload?.chunk)
    .join("\n\n");

  // Generate response with Gemini
  const model = genAI.getGenerativeModel({ model: "gemini-pro" });
  const prompt = `Context from podcast transcripts:\n${contextText}\n\nQuestion: ${question}\n\nAnswer based on the context provided:`;

  const result = await model.generateContent(prompt);
  const response = result.response.text();

  res.json({
    response,
    sources: searchResults.map((r) => ({
      episode: r.payload?.episode,
      title: r.payload?.title,
      score: r.score,
    })),
  });
};
Step 5: Creating the Frontend
The React frontend provides an intuitive chat interface:
// src/components/ChatInterface.tsx
import React, { useState } from "react";
import { useMutation } from "@tanstack/react-query";

interface Source {
  episode: number;
  title: string;
  score: number;
}

interface Message {
  role: "user" | "assistant";
  content: string;
  sources?: Source[];
}

export const ChatInterface = () => {
  const [messages, setMessages] = useState<Message[]>([]);
  const [input, setInput] = useState("");

  const chatMutation = useMutation({
    mutationFn: async (question: string) => {
      const response = await fetch("/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ question }),
      });
      return response.json();
    },
    onSuccess: (data) => {
      setMessages((prev) => [
        ...prev,
        { role: "user", content: input },
        { role: "assistant", content: data.response, sources: data.sources },
      ]);
      setInput("");
    },
  });

  const handleSubmit = (e: React.FormEvent<HTMLFormElement>) => {
    e.preventDefault();
    if (input.trim()) {
      chatMutation.mutate(input);
    }
  };

  return (
    <div className="flex flex-col h-screen">
      <div className="flex-1 overflow-y-auto p-4">
        {messages.map((msg, idx) => (
          <div
            key={idx}
            className={`mb-4 ${
              msg.role === "user" ? "text-right" : "text-left"
            }`}>
            <div
              className={`inline-block p-3 rounded-lg ${
                msg.role === "user" ? "bg-blue-500 text-white" : "bg-gray-200"
              }`}>
              {msg.content}
              {msg.sources && (
                <div className="mt-2 text-sm">
                  {msg.sources.map((source, sidx) => (
                    <div key={sidx}>
                      Episode {source.episode}: {source.title}
                    </div>
                  ))}
                </div>
              )}
            </div>
          </div>
        ))}
      </div>
      <form onSubmit={handleSubmit} className="p-4 border-t">
        <div className="flex">
          <input
            type="text"
            value={input}
            onChange={(e) => setInput(e.target.value)}
            className="flex-1 p-2 border rounded-l-lg"
            placeholder="Ask about the podcast content..."
          />
          <button
            type="submit"
            disabled={chatMutation.isPending}
            className="px-4 py-2 bg-blue-500 text-white rounded-r-lg disabled:opacity-50">
            {chatMutation.isPending ? "Thinking..." : "Send"}
          </button>
        </div>
      </form>
    </div>
  );
};
Step 6: Running the Application
Start all services using Docker Compose:
docker-compose up -d
Process your podcast transcripts:
docker-compose exec processor python scripts/process_transcripts.py --init
Start the development servers:
# Backend
cd api && npm start
# Frontend
npm run dev
Visit http://localhost:5173 to start chatting with your podcast transcripts!
Understanding the Architecture
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Frontend     │    │   Backend API   │    │    Vector DB    │
│   (React/TS)    │◄──►│ (Node/Express)  │◄──►│    (Qdrant)     │
│                 │    │                 │    │                 │
│ • Chat UI       │    │ • RAG Logic     │    │ • Embeddings    │
│ • Search UI     │    │ • Rate Limiting │    │ • Similarity    │
│ • Real-time     │    │ • API Validation│    │ • Metadata      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                │
                      ┌───────────────────┐
                      │     AI Models     │
                      │  (Google Gemini)  │
                      │                   │
                      │ • Text Generation │
                      │ • Embeddings      │
                      └───────────────────┘
Performance Optimization Tips
- Incremental Processing: Only process new transcripts to avoid reprocessing everything
- Chunk Size Tuning: Experiment with different chunk sizes (500-2000 tokens)
- Embedding Caching: Cache embeddings to avoid recomputation (a simple sketch follows this list)
- Database Indexing: Ensure proper vector indexing in Qdrant
- Response Streaming: Implement streaming for better UX with long responses
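For the embedding caching tip, even a small on-disk cache keyed by a hash of the chunk text avoids paying for the same embedding twice when transcripts are reprocessed. A minimal sketch, assuming it wraps the embed_content call from the processing script (the cache directory name is arbitrary):
import hashlib
import json
import os

from google.generativeai import embed_content

CACHE_DIR = ".embedding_cache"  # arbitrary local cache location

def cached_embedding(chunk):
    """Return a cached embedding for this exact chunk text, computing it once if needed."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
    path = os.path.join(CACHE_DIR, f"{key}.json")

    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)

    result = embed_content(
        model="models/embedding-001",
        content=chunk,
        task_type="retrieval_document",
    )
    vector = result["embedding"]

    with open(path, "w", encoding="utf-8") as f:
        json.dump(vector, f)
    return vector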
Deployment to Production
For production deployment:
- Database: Deploy Qdrant to Render or another cloud provider
- Backend: Deploy the API with environment variables
- Frontend: Build and deploy the static site
- Environment Variables: Set production URLs and API keys
Troubleshooting Common Issues
- Qdrant Connection Failed: Check Docker container status and port mapping (a quick connectivity check follows this list)
- API Key Issues: Verify Gemini API key validity and quota
- Processing Errors: Check transcript file encoding (should be UTF-8)
- Poor Search Results: Adjust the chunk size or overlap parameters
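For the Qdrant connection issue, the quickest sanity check is to hit its REST API directly (the URL assumes the default local port mapping from Step 2). A healthy instance returns a JSON list of collections, which should include podcast_transcripts once processing has run:
curl http://localhost:6333/collections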
Extending the Application
Ideas for enhancement:
- Multi-podcast Support: Add support for different podcast series
- Audio Integration: Add audio playback with timestamps
- User Authentication: Add user accounts and conversation history
- Advanced Search: Implement filters by episode, topic, or speaker
- Export Features: Allow exporting chat conversations
This RAG chat application demonstrates the power of combining vector databases with large language models to create intelligent, context-aware applications. The same principles can be applied to documentation, customer support, or any domain with large amounts of text content.