Build Your Own Podcast RAG Chat with Gemini API
Have you ever wanted to have an intelligent conversation with the wisdom contained in your favorite podcasts? Imagine being able to ask questions and getting contextual answers with direct quotes and episode references.
In this guide, you’ll learn how to build a complete Retrieval-Augmented Generation (RAG) chat application that brings podcast transcripts to life using cutting-edge AI technology. We’ll use Google Gemini AI for both generating embeddings and providing conversational responses, Qdrant for vector similarity search, and a modern React frontend.
What is RAG and Why It Matters for Podcasts
Retrieval-Augmented Generation (RAG) combines the power of large language models with information retrieval from a knowledge base. For podcasts, this means:
- Semantic Search: Finding relevant content across hundreds of hours of audio transcripts
- Contextual Answers: Providing responses grounded in actual podcast content
- Source Citations: Always knowing where the information came from
- Scalability: Handling large volumes of text efficiently
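At its core, the loop is simple: embed the question, retrieve the most similar transcript chunks, and let the model answer using only that context. A minimal Python sketch of the shape (the embed, search, and generate arguments are placeholders for the Gemini and Qdrant calls we build in the steps below, not actual project code):
def answer_question(question, embed, search, generate, top_k=5):
    """Core RAG loop: embed the question, retrieve similar chunks, generate a grounded answer."""
    query_vector = embed(question)          # 1. turn the question into a vector
    chunks = search(query_vector, top_k)    # 2. retrieve the most similar transcript chunks
    context = "\n\n".join(chunks)           # 3. stitch the chunks into one context block
    prompt = (
        f"Context from podcast transcripts:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Answer based on the context provided:"
    )
    return generate(prompt)                 # 4. the LLM answers, grounded in real episodes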
Tech Stack Overview
Our application uses a carefully selected stack of modern technologies:
Frontend
- React 18 with TypeScript for type safety
- Tailwind CSS for responsive, modern UI
- React Query for efficient data fetching and caching
- React Router for navigation
Backend
- Node.js with Express and TypeScript
- Qdrant vector database for similarity search
- Google Gemini AI for embeddings and chat responses
- Winston for structured logging
Processing Pipeline
- Python scripts for transcript processing
- LangChain for text chunking and processing
- TikToken for intelligent text splitting
Prerequisites
Before we start, ensure you have:
- Node.js (v18 or higher)
- Python 3.8+
- Docker & Docker Compose (for local development)
- Google Gemini API key
Step 1: Project Setup
Let’s start by cloning and setting up the project structure:
git clone https://github.com/TMFNK/Founders-Podcast-RAG-Chat-Gemini.git
cd Founders-Podcast-RAG-Chat-Gemini
Create environment variables:
cp .env.example .env
Edit .env with your configuration:
# Google Gemini API Key
GOOGLE_GEMINI_API_KEY=your-gemini-api-key-here
VITE_GOOGLE_GEMINI_API_KEY=your-gemini-api-key-here
# Qdrant Configuration
QDRANT_URL_LOCAL=http://localhost:6333
QDRANT_URL_PRODUCTION=https://your-qdrant-instance.onrender.com
# Environment
NODE_ENV=development
Step 2: Setting Up the Vector Database
Qdrant is our vector database that will store podcast transcript embeddings. For local development:
docker run -p 6333:6333 -v qdrant_storage:/qdrant/storage qdrant/qdrant:v1.7.4
Install Python dependencies and set up the collection:
pip install -r requirements.txt
python scripts/setup_qdrant.py
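The setup_qdrant.py script creates the collection that the rest of the pipeline writes into. A minimal sketch of what it has to do (the collection name and settings follow the rest of this guide, and Gemini's embedding-001 model produces 768-dimensional vectors; the actual script may differ):
import os
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url=os.getenv("QDRANT_URL_LOCAL", "http://localhost:6333"))

# Drop and recreate the collection with 768-dim cosine vectors to match embedding-001
client.recreate_collection(
    collection_name="podcast_transcripts",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)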
Step 3: Processing Podcast Transcripts
The magic happens in the processing pipeline. Let’s examine the key components:
Transcript Processing Script
The process_transcripts.py script handles the heavy lifting:
import os
import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
from google.generativeai import configure, embed_content

# Configure Gemini AI and connect to Qdrant
configure(api_key=os.getenv('GOOGLE_GEMINI_API_KEY'))
client = QdrantClient(url=os.getenv('QDRANT_URL_LOCAL', 'http://localhost:6333'))

def process_transcript(file_path):
    """Process a single podcast transcript"""
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()

    # Extract metadata from filename
    filename = os.path.basename(file_path)
    episode_num = extract_episode_number(filename)
    title = extract_title(filename)

    # Split into chunks
    chunks = split_into_chunks(content)

    # Generate embeddings and store
    for i, chunk in enumerate(chunks):
        embedding = embed_content(
            model="models/embedding-001",
            content=chunk,
            task_type="retrieval_document"
        )

        # Qdrant point IDs must be unsigned integers or UUIDs, so derive
        # a deterministic UUID from the episode/chunk pair
        point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{episode_num}_{i}"))

        # Store in Qdrant
        client.upsert(
            collection_name="podcast_transcripts",
            points=[PointStruct(
                id=point_id,
                vector=embedding['embedding'],
                payload={
                    "episode": episode_num,
                    "title": title,
                    "chunk": chunk,
                    "chunk_id": i
                }
            )]
        )
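The extract_episode_number and extract_title helpers aren't shown in the excerpt above. One simple way to implement them, assuming filenames like 042 - Episode Title.txt (adjust the pattern to your own naming scheme):
import re  # os is already imported in process_transcripts.py

def extract_episode_number(filename):
    """Pull the leading episode number from a filename like '042 - Episode Title.txt'."""
    match = re.match(r"(\d+)", filename)
    return int(match.group(1)) if match else 0

def extract_title(filename):
    """Drop the extension and the leading 'NNN - ' prefix to recover the title."""
    name = os.path.splitext(filename)[0]
    return re.sub(r"^\d+\s*-?\s*", "", name).strip()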
Chunking Strategy
Effective text chunking is crucial for RAG performance:
import tiktoken

def split_into_chunks(text, chunk_size=1000, overlap=200):
    """Split text into overlapping chunks using token counting"""
    encoding = tiktoken.get_encoding("cl100k_base")
    tokens = encoding.encode(text)

    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunk_tokens = tokens[start:end]
        chunk_text = encoding.decode(chunk_tokens)
        chunks.append(chunk_text)

        # Stop once the last tokens are consumed; otherwise the overlap
        # would keep re-reading the tail of the text forever
        if end == len(tokens):
            break
        start = end - overlap

    return chunks
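Since the pipeline already lists LangChain, you can get the same token-aware splitting without the manual loop. A sketch using its tiktoken-backed splitter (the import path assumes a recent LangChain release; older versions expose the same class from langchain.text_splitter):
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",
    chunk_size=1000,
    chunk_overlap=200,
)
chunks = splitter.split_text(content)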
Step 4: Building the Chat API
The backend API handles chat requests and semantic search:
Search Endpoint
// api/search.ts
import { Request, Response } from "express";
import { QdrantClient } from "@qdrant/js-client-rest";
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_GEMINI_API_KEY!);
const qdrant = new QdrantClient({ url: process.env.QDRANT_URL_LOCAL });

export const searchTranscripts = async (req: Request, res: Response) => {
  const { query, limit = 5 } = req.body;

  // Generate embedding for the query
  const embeddingModel = genAI.getGenerativeModel({ model: "embedding-001" });
  const result = await embeddingModel.embedContent(query);
  const queryEmbedding = result.embedding.values;

  // Search Qdrant
  const searchResults = await qdrant.search("podcast_transcripts", {
    vector: queryEmbedding,
    limit,
    with_payload: true,
  });

  res.json(searchResults);
};
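Assuming the handler is mounted at POST /api/search and the API listens on port 3000 (both depend on how you wire up your Express app), you can smoke-test it with:
curl -X POST http://localhost:3000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "What does the host say about hiring?", "limit": 3}'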
Chat Endpoint
// api/chat.ts
import { Request, Response } from "express";
import { QdrantClient } from "@qdrant/js-client-rest";
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_GEMINI_API_KEY!);
const qdrant = new QdrantClient({ url: process.env.QDRANT_URL_LOCAL });

// Embed the user's question with the same model used for the transcript chunks
const generateEmbedding = async (text: string): Promise<number[]> => {
  const embeddingModel = genAI.getGenerativeModel({ model: "embedding-001" });
  const result = await embeddingModel.embedContent(text);
  return result.embedding.values;
};

export const chatWithAI = async (req: Request, res: Response) => {
  const { question } = req.body;

  // Retrieve relevant context
  const searchResults = await qdrant.search("podcast_transcripts", {
    vector: await generateEmbedding(question),
    limit: 5,
    with_payload: true,
  });

  const contextText = searchResults
    .map((result) => result.payload?.chunk)
    .join("\n\n");

  // Generate response with Gemini
  const model = genAI.getGenerativeModel({ model: "gemini-pro" });
  const prompt = `Context from podcast transcripts:\n${contextText}\n\nQuestion: ${question}\n\nAnswer based on the context provided:`;

  const result = await model.generateContent(prompt);
  const response = result.response.text();

  res.json({
    response,
    sources: searchResults.map((r) => ({
      episode: r.payload?.episode,
      title: r.payload?.title,
      score: r.score,
    })),
  });
};
Step 5: Creating the Frontend
The React frontend provides an intuitive chat interface:
// src/components/ChatInterface.tsx
import React, { useState } from "react";
import { useMutation } from "@tanstack/react-query";

interface Source {
  episode: number;
  title: string;
  score: number;
}

interface Message {
  role: "user" | "assistant";
  content: string;
  sources?: Source[];
}

export const ChatInterface = () => {
  const [messages, setMessages] = useState<Message[]>([]);
  const [input, setInput] = useState("");

  const chatMutation = useMutation({
    mutationFn: async (question: string) => {
      const response = await fetch("/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ question }),
      });
      return response.json();
    },
    onSuccess: (data) => {
      setMessages((prev) => [
        ...prev,
        { role: "user", content: input },
        { role: "assistant", content: data.response, sources: data.sources },
      ]);
      setInput("");
    },
  });

  const handleSubmit = (e: React.FormEvent<HTMLFormElement>) => {
    e.preventDefault();
    if (input.trim()) {
      chatMutation.mutate(input);
    }
  };

  return (
    <div className="flex flex-col h-screen">
      <div className="flex-1 overflow-y-auto p-4">
        {messages.map((msg, idx) => (
          <div
            key={idx}
            className={`mb-4 ${
              msg.role === "user" ? "text-right" : "text-left"
            }`}>
            <div
              className={`inline-block p-3 rounded-lg ${
                msg.role === "user" ? "bg-blue-500 text-white" : "bg-gray-200"
              }`}>
              {msg.content}
              {msg.sources && (
                <div className="mt-2 text-sm">
                  {msg.sources.map((source, sidx) => (
                    <div key={sidx}>
                      Episode {source.episode}: {source.title}
                    </div>
                  ))}
                </div>
              )}
            </div>
          </div>
        ))}
      </div>
      <form onSubmit={handleSubmit} className="p-4 border-t">
        <div className="flex">
          <input
            type="text"
            value={input}
            onChange={(e) => setInput(e.target.value)}
            className="flex-1 p-2 border rounded-l-lg"
            placeholder="Ask about the podcast content..."
          />
          <button
            type="submit"
            disabled={chatMutation.isPending}
            className="px-4 py-2 bg-blue-500 text-white rounded-r-lg disabled:opacity-50">
            {chatMutation.isPending ? "Thinking..." : "Send"}
          </button>
        </div>
      </form>
    </div>
  );
};
Step 6: Running the Application
Start all services using Docker Compose:
docker-compose up -d
Process your podcast transcripts:
docker-compose exec processor python scripts/process_transcripts.py --init
Start the development servers:
# Backend
cd api && npm start
# Frontend
npm run dev
Visit http://localhost:5173 to start chatting with your podcast transcripts!
Understanding the Architecture
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Frontend     │    │   Backend API   │    │    Vector DB    │
│   (React/TS)    │◄──►│ (Node/Express)  │◄──►│    (Qdrant)     │
│                 │    │                 │    │                 │
│ • Chat UI       │    │ • RAG Logic     │    │ • Embeddings    │
│ • Search UI     │    │ • Rate Limiting │    │ • Similarity    │
│ • Real-time     │    │ • API Validation│    │ • Metadata      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                │
                      ┌───────────────────┐
                      │     AI Models     │
                      │  (Google Gemini)  │
                      │                   │
                      │ • Text Generation │
                      │ • Embeddings      │
                      └───────────────────┘
Performance Optimization Tips
- Incremental Processing: Only process new transcripts to avoid reprocessing everything
- Chunk Size Tuning: Experiment with different chunk sizes (500-2000 tokens)
- Embedding Caching: Cache embeddings to avoid recomputation (a simple sketch follows this list)
- Database Indexing: Ensure proper vector indexing in Qdrant
- Response Streaming: Implement streaming for better UX with long responses
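For the embedding caching tip, even a small on-disk cache keyed by a hash of the chunk text avoids paying for the same embedding twice when transcripts are reprocessed. A minimal sketch, assuming it wraps the embed_content call from the processing script (the cache directory name is arbitrary):
import hashlib
import json
import os

from google.generativeai import embed_content

CACHE_DIR = ".embedding_cache"  # arbitrary local cache location

def cached_embedding(chunk):
    """Return a cached embedding for this exact chunk text, computing it once if needed."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
    path = os.path.join(CACHE_DIR, f"{key}.json")

    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)

    result = embed_content(
        model="models/embedding-001",
        content=chunk,
        task_type="retrieval_document",
    )
    vector = result["embedding"]

    with open(path, "w", encoding="utf-8") as f:
        json.dump(vector, f)
    return vector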
Deployment to Production
For production deployment:
- Database: Deploy Qdrant to Render or another cloud provider
- Backend: Deploy the API with environment variables
- Frontend: Build and deploy the static site
- Environment Variables: Set production URLs and API keys
Troubleshooting Common Issues
- Qdrant Connection Failed: Check Docker container status and port mapping (a quick connectivity check follows this list)
- API Key Issues: Verify Gemini API key validity and quota
- Processing Errors: Check transcript file encoding (should be UTF-8)
- Poor Search Results: Adjust the chunk size or overlap parameters
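For the Qdrant connection issue, the quickest sanity check is to hit its REST API directly (the URL assumes the default local port mapping from Step 2). A healthy instance returns a JSON list of collections, which should include podcast_transcripts once processing has run:
curl http://localhost:6333/collections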
Extending the Application
Ideas for enhancement:
- Multi-podcast Support: Add support for different podcast series
- Audio Integration: Add audio playback with timestamps
- User Authentication: Add user accounts and conversation history
- Advanced Search: Implement filters by episode, topic, or speaker
- Export Features: Allow exporting chat conversations
This RAG chat application demonstrates the power of combining vector databases with large language models to create intelligent, context-aware applications. The same principles can be applied to documentation, customer support, or any domain with large amounts of text content.