If you've ever tried to programmatically access YouTube video transcripts, you know the pain. The YouTube Data API v3 offers no way to download caption text for videos you don't own. You either scrape, reverse-engineer undocumented endpoints, or give up.
I didn't want to give up. I was building ScripTube, a tool that lets anyone paste a YouTube URL and get the full transcript instantly. Here's how I built the backend API using Next.js API routes.
The Problem
YouTube's Data API lets you list caption tracks for a video, but actually downloading the caption text requires OAuth on behalf of the video owner. That's useless if you want transcripts of videos you don't own.
The workaround: YouTube serves auto-generated and manual captions to every viewer through an internal endpoint. Several open-source libraries tap into this.
The Stack
- Next.js 14 (App Router)
- youtube-transcript npm package (or youtube-transcript-api for Python)
- Vercel for deployment
- Rate limiting via Upstash Redis
Step 1: The API Route
Create app/api/transcript/route.ts:
```typescript
import { NextRequest, NextResponse } from 'next/server';
import { YoutubeTranscript } from 'youtube-transcript';

export async function POST(req: NextRequest) {
  try {
    const { url } = await req.json();
    if (!url) {
      return NextResponse.json({ error: 'URL is required' }, { status: 400 });
    }

    // extractVideoId is defined in Step 2, in this same file.
    const videoId = extractVideoId(url);
    if (!videoId) {
      return NextResponse.json({ error: 'Invalid YouTube URL' }, { status: 400 });
    }

    // Each segment carries text, offset, and duration (see Step 4).
    const transcript = await YoutubeTranscript.fetchTranscript(videoId);
    const formatted = transcript.map((entry) => entry.text).join(' ');

    return NextResponse.json({ videoId, transcript: formatted, segments: transcript });
  } catch (error) {
    // Log the real cause server-side and return a generic message to the client.
    console.error('Transcript fetch failed:', error);
    return NextResponse.json({ error: 'Failed to fetch transcript.' }, { status: 500 });
  }
}
```
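The route above trusts the request body. If a client sends malformed JSON, `req.json()` throws and the caller gets a 500 instead of a 400. If `url` is present but not a string, `url.match` inside `extractVideoId` throws in the same way. Here is a minimal sketch of a validation step. `parseTranscriptRequest` and its result shape are hypothetical names of my own and are not part of Next.js.

```typescript
// Hypothetical helper: validate the parsed JSON body before using it.
type ParseResult = { ok: true; url: string } | { ok: false; error: string };

function parseTranscriptRequest(body: unknown): ParseResult {
  // req.json() can yield any JSON value, including null, arrays, or numbers.
  if (typeof body !== 'object' || body === null || Array.isArray(body)) {
    return { ok: false, error: 'Request body must be a JSON object' };
  }
  const url = (body as { url?: unknown }).url;
  // Reject missing, non-string, and whitespace-only values alike.
  if (typeof url !== 'string' || url.trim() === '') {
    return { ok: false, error: 'URL is required' };
  }
  return { ok: true, url: url.trim() };
}
```

In the handler, you might write `const body = await req.json().catch(() => null);`. Then, when `parseTranscriptRequest(body)` returns `ok: false`, respond with its `error` and status 400.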
Step 2: Extracting the Video ID
YouTube URLs come in many flavors. You need a robust parser:
```typescript
function extractVideoId(url: string): string | null {
  const input = url.trim();
  const patterns = [
    // youtube.com/watch?v=ID, including when v is not the first query parameter
    /youtube\.com\/watch\?(?:.*&)?v=([a-zA-Z0-9_-]{11})/,
    /youtu\.be\/([a-zA-Z0-9_-]{11})/,
    /youtube\.com\/(?:embed|shorts)\/([a-zA-Z0-9_-]{11})/,
  ];

  for (const pattern of patterns) {
    const match = input.match(pattern);
    if (match) return match[1];
  }

  // Also accept a bare 11-character video ID.
  if (/^[a-zA-Z0-9_-]{11}$/.test(input)) return input;
  return null;
}
```
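Regexes like these are easy to get subtly wrong. A `v` parameter that isn't first in the query string is one common miss, and a `/shorts/` link is another. As an alternative, here is a sketch that parses with the standard `URL` class instead. The function name `extractVideoIdWithUrlApi`, the host list, and the accepted path prefixes are my own assumptions about common link shapes. This is not necessarily what ScripTube does.

```typescript
// Hypothetical alternative to the regex parser, built on the standard URL class.
// The accepted hosts and path prefixes are assumptions about common link shapes.
const VIDEO_ID = /^[a-zA-Z0-9_-]{11}$/;
const YOUTUBE_HOSTS = new Set(['youtube.com', 'www.youtube.com', 'm.youtube.com', 'music.youtube.com']);

function extractVideoIdWithUrlApi(input: string): string | null {
  const trimmed = input.trim();
  if (VIDEO_ID.test(trimmed)) return trimmed; // already a bare video ID

  let parsed: URL;
  try {
    // Accept links pasted without a scheme, such as "youtu.be/ID".
    parsed = new URL(/^https?:\/\//i.test(trimmed) ? trimmed : `https://${trimmed}`);
  } catch {
    return null; // not parseable as a URL at all
  }

  const host = parsed.hostname.toLowerCase();
  const segments = parsed.pathname.split('/'); // "/shorts/ID" -> ["", "shorts", "ID"]
  let candidate: string | null = null;

  if (host === 'youtu.be') {
    candidate = segments[1] ?? null; // short link: the ID is the first path segment
  } else if (YOUTUBE_HOSTS.has(host)) {
    if (parsed.pathname === '/watch') {
      candidate = parsed.searchParams.get('v'); // v may appear anywhere in the query
    } else if (segments[1] === 'embed' || segments[1] === 'shorts' || segments[1] === 'live') {
      candidate = segments[2] ?? null;
    }
  }

  // Only return values that look like real 11-character IDs.
  return candidate !== null && VIDEO_ID.test(candidate) ? candidate : null;
}
```

The trade-off is that the `URL` parser handles query-string ordering and encoding for you. The cost is an explicit allowlist of hosts, which you have to extend if YouTube adds new link shapes.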
Step 3: Rate Limiting
Without rate limiting, a public endpoint like this will get hammered. I use Upstash Redis with a sliding window of 10 requests per 60 seconds per IP:
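```typescript
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

// Created once at module scope. Redis.fromEnv() reads
// UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN from the environment.
const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, '60 s'), // 10 requests per 60 seconds
});

// Inside the POST handler, before fetching the transcript.
// x-forwarded-for can hold a comma-separated chain; the first entry is the client.
const ip = req.headers.get('x-forwarded-for')?.split(',')[0].trim() ?? '127.0.0.1';
const { success } = await ratelimit.limit(ip);
if (!success) {
  return NextResponse.json({ error: 'Rate limit exceeded.' }, { status: 429 });
}
```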
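To see what a sliding-window limit means in practice, or to test locally without Redis, here is an in-memory sliding-window log sketch. It is not how Upstash implements `slidingWindow`, since that runs inside Redis and its exact algorithm may differ. Serverless instances also don't share memory, so this version is unsuitable for production on Vercel. The class name `InMemorySlidingWindow` and the injectable clock are hypothetical, chosen for illustration.

```typescript
// Hypothetical in-memory sliding-window log limiter (single process only).
class InMemorySlidingWindow {
  // Timestamps (ms) of accepted requests, keyed by identifier (e.g. an IP).
  private hits = new Map<string, number[]>();

  constructor(
    private readonly limit: number,
    private readonly windowMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  /** Returns true if the request is allowed, false if it should get a 429. */
  check(id: string): boolean {
    const t = this.now();
    const windowStart = t - this.windowMs;
    // Keep only requests that still fall inside the window ending at t.
    const recent = (this.hits.get(id) ?? []).filter((ts) => ts > windowStart);
    if (recent.length >= this.limit) {
      this.hits.set(id, recent);
      return false; // rejected requests are not recorded
    }
    recent.push(t);
    this.hits.set(id, recent);
    return true;
  }
}
```

Usage mirrors the Upstash call. If `limiter.check(ip)` returns false, respond with status 429.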
Step 4: Formatting the Output
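Raw transcript data comes as segments with `text`, `offset`, and `duration`. Serve both plain text and timestamped versions. `formatTimestamp` below assumes `offset` is in milliseconds. Check the unit your installed library version actually returns; the Python youtube-transcript-api, for example, reports `start` in seconds.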
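```typescript
// Inside the handler, after fetching `transcript`.
// Collapse runs of whitespace, including line breaks inside segments.
const plainText = transcript.map((s) => s.text).join(' ').replace(/\s+/g, ' ').trim();

const withTimestamps = transcript.map((s) => ({
  time: formatTimestamp(s.offset),
  text: s.text,
}));

// Formats a millisecond offset as m:ss, or h:mm:ss for videos over an hour.
function formatTimestamp(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const ss = (totalSeconds % 60).toString().padStart(2, '0');
  if (hours > 0) {
    return `${hours}:${minutes.toString().padStart(2, '0')}:${ss}`;
  }
  return `${minutes}:${ss}`;
}
```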
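Depending on the library version, segment text can arrive with HTML entities still encoded, sometimes twice (for example `&amp;#39;` where an apostrophe belongs). If you see that, a small decoder can run over each segment, as in `transcript.map((s) => decodeEntities(s.text))`, before joining. `decodeEntities` is a hypothetical helper. It handles only a few named entities plus numeric ones, which is an assumption about what captions contain; full HTML entity coverage would need a dedicated library.

```typescript
// Hypothetical helper: decode the few HTML entities common in caption text.
const NAMED_ENTITIES = new Map<string, string>([
  ['amp', '&'],
  ['lt', '<'],
  ['gt', '>'],
  ['quot', '"'],
  ['apos', "'"],
]);

function decodeOnce(text: string): string {
  return text.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body: string) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      // Leave out-of-range code points untouched rather than throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    return NAMED_ENTITIES.get(body) ?? match; // unknown names stay as-is
  });
}

// Decode repeatedly to undo double encoding: "&amp;#39;" -> "&#39;" -> "'".
function decodeEntities(text: string, maxPasses = 3): string {
  let current = text;
  for (let i = 0; i < maxPasses; i++) {
    const next = decodeOnce(current);
    if (next === current) break;
    current = next;
  }
  return current;
}
```

Repeated decoding also turns a literal `&amp;lt;` into `<`. That is fine for caption text but would be wrong for general HTML.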
Gotchas I Hit
- Not all videos have transcripts. Some creators disable captions entirely.
- Auto-generated captions have errors, especially with technical terms and heavy accents.
- YouTube occasionally changes internal endpoints. Pin your dependency versions.
- Long videos mean large payloads. A 3-hour video can run to tens of thousands of words (at a typical 150 words per minute, about 27,000).
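One option for the payload problem is to return the plain text in word-limited chunks, or to paginate over them, instead of sending one huge string. Here is a minimal sketch. `chunkByWords` is a hypothetical name, and the 2,000-word default is an arbitrary assumption.

```typescript
// Hypothetical helper: split a long transcript into chunks of at most maxWords words.
function chunkByWords(text: string, maxWords = 2000): string[] {
  if (!Number.isInteger(maxWords) || maxWords <= 0) {
    throw new RangeError('maxWords must be a positive integer');
  }
  // Splitting trimmed text on whitespace yields words; the filter drops the
  // single empty string produced by empty input.
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(' '));
  }
  return chunks;
}
```

The route could return `chunkByWords(plainText)` in full, or accept a page index from the client and return one chunk per request.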
Deployment
Deploy to Vercel with `vercel --prod`. The API routes become serverless functions automatically.
This is essentially what powers ScripTube. The architecture is simple — the complexity is in the edge cases.
If you're building something similar, start simple. One input. One button. Ship it.