Building a YouTube Transcript API with Next.js

#nextjs #api #youtube #webdev

If you've ever tried to programmatically access YouTube video transcripts, you know the pain. The YouTube Data API v3 has no endpoint that returns caption text for arbitrary public videos. You either scrape, reverse-engineer undocumented endpoints, or give up.

I didn't want to give up. I was building ScripTube, a tool that lets anyone paste a YouTube URL and get the full transcript instantly. Here's how I built the backend API using Next.js API routes.

The Problem

YouTube's Data API lets you list caption tracks for a video, but actually downloading the caption text requires OAuth on behalf of the video owner. That's useless if you want transcripts of videos you don't own.

The workaround: YouTube serves auto-generated and manual captions to every viewer through an internal endpoint. Several open-source libraries tap into this.

The Stack

Next.js 14 (App Router)
youtube-transcript npm package (or youtube-transcript-api for Python)
Vercel for deployment
Rate limiting via Upstash Redis

Step 1: The API Route

Create app/api/transcript/route.ts:

import { NextRequest, NextResponse } from 'next/server';
import { YoutubeTranscript } from 'youtube-transcript';

export async function POST(req: NextRequest) {
  try {
    const { url } = await req.json();
    // Reject missing or non-string input before it reaches the parser.
    if (typeof url !== 'string' || !url) {
      return NextResponse.json({ error: 'URL is required' }, { status: 400 });
    }
    // extractVideoId is defined in Step 2.
    const videoId = extractVideoId(url);
    if (!videoId) {
      return NextResponse.json({ error: 'Invalid YouTube URL' }, { status: 400 });
    }
    // Each segment carries text, offset, and duration.
    const transcript = await YoutubeTranscript.fetchTranscript(videoId);
    // Join segment text into one plain-text string.
    const formatted = transcript.map((entry) => entry.text).join(' ');
    return NextResponse.json({ videoId, transcript: formatted, segments: transcript });
  } catch (error) {
    // Covers malformed JSON bodies as well as videos without captions.
    console.error('Transcript request failed:', error);
    return NextResponse.json({ error: 'Failed to fetch transcript.' }, { status: 500 });
  }
}

Step 2: Extracting the Video ID

YouTube URLs come in many flavors. You need a robust parser:

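function extractVideoId(url: string): string | null {
  const patterns = [
    // Watch URLs; v= may come after other query parameters.
    /youtube\.com\/watch\?(?:[^#]*&)?v=([a-zA-Z0-9_-]{11})/,
    // Short links.
    /youtu\.be\/([a-zA-Z0-9_-]{11})/,
    // Embed, Shorts, and live URLs share the same path shape.
    /youtube\.com\/(?:embed|shorts|live)\/([a-zA-Z0-9_-]{11})/,
  ];
  for (const pattern of patterns) {
    const match = url.match(pattern);
    if (match) return match[1];
  }
  // Fall back to accepting a bare 11-character video ID.
  if (/^[a-zA-Z0-9_-]{11}$/.test(url)) return url;
  return null;
}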
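
A regex list has to enumerate every URL shape by hand. As a rough sketch of an alternative, the standard URL constructor can do the parsing, so query-parameter order and hostnames like m.youtube.com need no extra patterns. The name parseVideoId and the set of accepted hosts and path prefixes are my assumptions for illustration, not something the youtube-transcript package provides:

```typescript
// Hypothetical alternative to extractVideoId, built on the standard URL API.
const VIDEO_ID = /^[a-zA-Z0-9_-]{11}$/;

function parseVideoId(input: string): string | null {
  const trimmed = input.trim();
  // Accept a bare 11-character ID.
  if (VIDEO_ID.test(trimmed)) return trimmed;

  let parsed: URL;
  try {
    parsed = new URL(trimmed);
  } catch {
    return null; // not a URL at all
  }

  // Assumption: treat www., m., and music. subdomains as equivalent.
  const host = parsed.hostname.replace(/^(www\.|m\.|music\.)/, '');
  const parts = parsed.pathname.split('/'); // '/shorts/abc' -> ['', 'shorts', 'abc']
  let candidate: string | null = null;

  if (host === 'youtu.be') {
    candidate = parts[1] ?? null;
  } else if (host === 'youtube.com') {
    switch (parts[1]) {
      case 'watch':
        candidate = parsed.searchParams.get('v');
        break;
      case 'embed':
      case 'shorts':
      case 'live':
        candidate = parts[2] ?? null;
        break;
    }
  }
  // Only return values that look like a real video ID.
  return candidate !== null && VIDEO_ID.test(candidate) ? candidate : null;
}
```

With this version, parseVideoId('https://m.youtube.com/watch?feature=share&v=dQw4w9WgXcQ') returns 'dQw4w9WgXcQ', while a look-alike host such as https://example.com/watch?v=dQw4w9WgXcQ returns null.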

Step 3: Rate Limiting

Without rate limiting, your API will get hammered. I use Upstash Redis:

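import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

// Create the limiter once at module scope so it is reused across requests.
const ratelimit = new Ratelimit({
  // Reads UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN.
  redis: Redis.fromEnv(),
  // 10 requests per 60-second sliding window, per identifier.
  limiter: Ratelimit.slidingWindow(10, '60 s'),
});

// Inside the POST handler, before fetching the transcript:
// x-forwarded-for can hold a comma-separated proxy chain; the first entry is the client.
const ip = req.headers.get('x-forwarded-for')?.split(',')[0].trim() || '127.0.0.1';
const { success } = await ratelimit.limit(ip);
if (!success) {
  return NextResponse.json({ error: 'Rate limit exceeded.' }, { status: 429 });
}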
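
A bare 429 doesn't tell clients when to retry. The result of ratelimit.limit() also carries limit, remaining, and reset (a Unix timestamp in milliseconds), so one option is to turn those into response headers. This is a minimal sketch: rateLimitHeaders and the LimitResult interface are hypothetical names of mine, and the X-RateLimit-* header names are a common convention rather than anything Upstash mandates:

```typescript
// Only the fields this sketch uses from the limit() result.
interface LimitResult {
  success: boolean;
  limit: number;     // max requests per window
  remaining: number; // requests left in the current window
  reset: number;     // Unix timestamp (ms) when the window resets
}

// Hypothetical helper: build rate-limit headers for a response.
function rateLimitHeaders(
  result: LimitResult,
  nowMs: number = Date.now(),
): Record<string, string> {
  const headers: Record<string, string> = {
    'X-RateLimit-Limit': String(result.limit),
    'X-RateLimit-Remaining': String(Math.max(0, result.remaining)),
    'X-RateLimit-Reset': String(result.reset),
  };
  if (!result.success) {
    // Retry-After is in whole seconds; round up so clients never retry early.
    const seconds = Math.max(1, Math.ceil((result.reset - nowMs) / 1000));
    headers['Retry-After'] = String(seconds);
  }
  return headers;
}
```

To use it, keep the whole result instead of destructuring only success, then pass { status: 429, headers: rateLimitHeaders(result) } as the second argument to NextResponse.json.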

Step 4: Formatting the Output

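Raw transcript data comes as segments with text, offset, and duration. The formatter below assumes offset is in milliseconds; confirm the units for the library version you pin, since a value in seconds would need the division by 1000 removed. Serve both plain text and timestamped versions: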

const plainText = transcript.map((s) => s.text).join(' ').replace(/\s+/g, ' ').trim();

const withTimestamps = transcript.map((s) => ({
  time: formatTimestamp(s.offset),
  text: s.text,
}));

function formatTimestamp(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${seconds.toString().padStart(2, '0')}`;
}
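
For videos longer than an hour, a minutes-only format produces values like 185:07. Here is a sketch of an hours-aware variant. The name formatLongTimestamp is mine, and it keeps the same assumption as formatTimestamp that the input is in milliseconds:

```typescript
// Hypothetical variant of formatTimestamp that adds an hours field when needed.
function formatLongTimestamp(ms: number): string {
  // Clamp negatives to zero so malformed offsets still render.
  const totalSeconds = Math.max(0, Math.floor(ms / 1000));
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  // Zero-pad to two digits.
  const pad = (n: number): string => (n < 10 ? `0${n}` : String(n));
  return hours > 0
    ? `${hours}:${pad(minutes)}:${pad(seconds)}`
    : `${minutes}:${pad(seconds)}`;
}
```

For example, formatLongTimestamp(65000) gives '1:05' and formatLongTimestamp(3725000) gives '1:02:05'.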

Gotchas I Hit

Not all videos have transcripts. Some creators disable captions entirely.
Auto-generated captions have errors. Especially with technical terms and heavy accents.
YouTube occasionally changes internal endpoints. Pin your dependency versions so updates are deliberate, and expect to upgrade when a change breaks the library.
Long videos = large payloads. A 3-hour video can be 50K+ words.
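
One way to keep large payloads manageable is to split the segments into pages and return one page per request. This is only a sketch under my own assumptions: paginateSegments and the Segment interface are hypothetical, and the word budget is a number you would tune yourself:

```typescript
// Segment shape as described in Step 4.
interface Segment {
  text: string;
  offset: number;
  duration: number;
}

// Hypothetical helper: group segments into pages of at most maxWords words.
// A single segment longer than maxWords still gets a page of its own.
function paginateSegments(segments: Segment[], maxWords: number): Segment[][] {
  const pages: Segment[][] = [];
  let current: Segment[] = [];
  let count = 0;
  for (const segment of segments) {
    const words = segment.text.trim().split(/\s+/).filter(Boolean).length;
    // Start a new page when adding this segment would exceed the budget.
    if (current.length > 0 && count + words > maxWords) {
      pages.push(current);
      current = [];
      count = 0;
    }
    current.push(segment);
    count += words;
  }
  if (current.length > 0) pages.push(current);
  return pages;
}
```

The route could then accept a page number in the request body and return pages[page] along with the total page count, so the client fetches long transcripts incrementally.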