After I built ScripTube (scriptube.me) as a web app, the most requested feature was a browser extension. Users wanted to extract transcripts without leaving the YouTube page they were already on.
Here's how I built a Chrome extension for YouTube transcript extraction, and what I learned along the way.
The Architecture
This extension has three components:
- Manifest (manifest.json): declares permissions and entry points
- Content Script: runs on YouTube pages and interacts with the DOM
- Popup: the UI that appears when you click the extension icon
Step 1: The Manifest
{
  "manifest_version": 3,
  "name": "YouTube Transcript Extractor",
  "version": "1.0.0",
  "description": "Extract transcripts from YouTube videos in one click",
  "permissions": ["activeTab"],
  "action": {
    "default_popup": "popup.html",
    "default_icon": "icon.png"
  },
  "content_scripts": [{
    "matches": ["*://www.youtube.com/watch*"],
    "js": ["content.js"]
  }]
}
Note: the only entry in permissions is activeTab. The content script's match pattern scopes the extension's access to YouTube watch pages, so no broad host permissions are needed and the permission prompt at install time stays minimal.
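The match pattern decides which pages get the content script. As a rough illustration of what *://www.youtube.com/watch* covers, here is a simplified matcher. matchPatternToRegExp and urlMatches are hypothetical helpers written for this post, not a Chrome API, and they skip parts of the real rules such as <all_urls>, file:// URLs, and ports.

```javascript
// Simplified illustration of Chrome match patterns, written for this post.
// Handles only "<scheme>://<host><path>"; NOT Chrome's real matcher.
function matchPatternToRegExp(pattern) {
  const parts = /^(\*|https?):\/\/(\*|\*\.[^/*]+|[^/*]+)(\/.*)$/.exec(pattern);
  if (!parts) throw new Error(`Unsupported pattern: ${pattern}`);
  const [, scheme, host, path] = parts;
  // Escape regex metacharacters except "*", which is handled separately.
  const escape = (s) => s.replace(/[.+?^${}()|[\]\\]/g, '\\$&');

  // "*" as the scheme means http or https.
  const schemeRe = scheme === '*' ? 'https?' : scheme;

  // "*" matches any host; "*.example.com" matches example.com and subdomains.
  let hostRe;
  if (host === '*') hostRe = '[^/]+';
  else if (host.startsWith('*.')) hostRe = `([^/]+\\.)?${escape(host.slice(2))}`;
  else hostRe = escape(host);

  // In the path, "*" matches any run of characters.
  const pathRe = path.split('*').map(escape).join('.*');

  return new RegExp(`^${schemeRe}://${hostRe}${pathRe}$`);
}

// The path part is tested against path + query string, which is what lets
// "/watch*" match "/watch?v=...". This sketch ignores the #fragment.
function urlMatches(pattern, url) {
  const u = new URL(url);
  const target = `${u.protocol}//${u.hostname}${u.pathname}${u.search}`;
  return matchPatternToRegExp(pattern).test(target);
}

const pattern = '*://www.youtube.com/watch*';
console.log(urlMatches(pattern, 'https://www.youtube.com/watch?v=abc123')); // true
console.log(urlMatches(pattern, 'https://www.youtube.com/'));               // false
console.log(urlMatches(pattern, 'https://m.youtube.com/watch?v=abc123'));   // false
```

The false result for the home page is the one that matters for the single-page-app gotcha later in this post.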
Step 2: The Content Script
The content script runs on YouTube watch pages. It reads the caption track list embedded in the page, fetches the transcript, and sends it back when the popup asks for it:
// content.js
// Reads the caption track list YouTube embeds in the watch page, fetches one
// track, and returns it as an array of { text, start, duration } segments.
async function extractTranscript() {
  // Rather than clicking through YouTube's UI, read the caption track list
  // from the player response data (ytInitialPlayerResponse) that YouTube
  // embeds in an inline <script> on the page.
  const scripts = document.querySelectorAll('script');
  for (const script of scripts) {
    if (!script.textContent.includes('captionTracks')) continue;

    // Non-greedy match: stops at the first "]", so this breaks if a track
    // object ever contains a nested array.
    const match = script.textContent.match(/"captionTracks":(\[.*?\])/);
    if (!match) continue;

    const tracks = JSON.parse(match[1]);
    const englishTrack =
      tracks.find((t) => t.languageCode === 'en') || tracks[0];
    if (englishTrack) {
      const response = await fetch(englishTrack.baseUrl);
      const xml = await response.text();
      return parseTranscriptXML(xml);
    }
  }
  return null;
}

function parseTranscriptXML(xml) {
  const parser = new DOMParser();
  const doc = parser.parseFromString(xml, 'text/xml');
  const texts = doc.querySelectorAll('text');
  return Array.from(texts).map((node) => ({
    // DOMParser decodes one level of entities, but caption text can still
    // contain HTML entities. Decode &amp; last so "&amp;lt;" becomes "&lt;"
    // rather than being decoded twice into "<".
    text: node.textContent
      .replace(/&lt;/g, '<')
      .replace(/&gt;/g, '>')
      .replace(/&#39;/g, "'")
      .replace(/&quot;/g, '"')
      .replace(/&amp;/g, '&'),
    start: parseFloat(node.getAttribute('start')),
    duration: parseFloat(node.getAttribute('dur')),
  }));
}

// Listen for messages from the popup
chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
  if (request.action === 'getTranscript') {
    extractTranscript()
      .then((transcript) => sendResponse({ transcript }))
      // Reply even when extraction fails (e.g. JSON.parse or fetch errors)
      // so the popup always gets an answer.
      .catch((err) => sendResponse({ transcript: null, error: String(err) }));
    return true; // Keep the message channel open for the async response
  }
});
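The regex above stops at the first "]", which would truncate the array if a track object ever contained a nested array (for example a name stored as a list of text runs). A more tolerant option is to scan for the matching closing bracket. This is a sketch under assumptions: extractJsonArray and pickTrack are hypothetical helpers, and it assumes the page text contains "captionTracks":[...] as valid JSON and that auto-generated tracks carry kind: "asr". The track picker also swaps in a different selection rule from the original: it prefers a manually created track over an auto-generated one.

```javascript
// Hypothetical helpers sketched for this post; not part of content.js.

// Return the JSON array that follows "<key>":, found by tracking bracket
// depth (and skipping string contents) instead of stopping at the first "]".
function extractJsonArray(source, key) {
  const marker = `"${key}":`;
  const at = source.indexOf(marker);
  if (at === -1) return null;

  let begin = at + marker.length;
  while (begin < source.length && /\s/.test(source[begin])) begin++;
  if (source[begin] !== '[') return null;

  let depth = 0;
  let inString = false;
  for (let i = begin; i < source.length; i++) {
    const ch = source[i];
    if (inString) {
      if (ch === '\\') i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === '[') {
      depth++;
    } else if (ch === ']') {
      depth--;
      // May throw if the text is not valid JSON; callers should catch.
      if (depth === 0) return JSON.parse(source.slice(begin, i + 1));
    }
  }
  return null; // no matching "]" found
}

// Prefer a manually created track in the requested language, then an
// auto-generated one in that language, then the first track of any language.
function pickTrack(tracks, languageCode = 'en') {
  if (!Array.isArray(tracks) || tracks.length === 0) return null;
  const sameLanguage = tracks.filter((t) => t.languageCode === languageCode);
  return (
    sameLanguage.find((t) => t.kind !== 'asr') ||
    sameLanguage[0] ||
    tracks[0]
  );
}
```

In extractTranscript, extractJsonArray(script.textContent, 'captionTracks') could stand in for the regex match and pickTrack(tracks) for the find call, wrapped in try/catch since the page data may change.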
Step 3: The Popup UI
<!-- popup.html -->
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <style>
      body {
        width: 400px;
        padding: 16px;
        font-family: -apple-system, sans-serif;
      }
      #transcript {
        max-height: 300px;
        overflow-y: auto;
        white-space: pre-wrap;
        font-size: 13px;
        line-height: 1.5;
        padding: 12px;
        background: #f5f5f5;
        border-radius: 8px;
        margin-top: 12px;
      }
      button {
        background: #ff0000;
        color: white;
        border: none;
        padding: 10px 20px;
        border-radius: 6px;
        cursor: pointer;
        font-size: 14px;
        width: 100%;
      }
      button:hover { background: #cc0000; }
      .copy-btn { background: #333; margin-top: 8px; }
      .copy-btn:hover { background: #555; }
    </style>
  </head>
  <body>
    <h3>📝 Transcript Extractor</h3>
    <button id="extract">Extract Transcript</button>
    <button id="copy" class="copy-btn" style="display:none">
      Copy to Clipboard
    </button>
    <div id="transcript"></div>
    <script src="popup.js"></script>
  </body>
</html>
// popup.js
document.getElementById('extract').addEventListener('click', async () => {
  const [tab] = await chrome.tabs.query({
    active: true,
    currentWindow: true,
  });
  const el = document.getElementById('transcript');

  if (!tab?.url?.includes('youtube.com/watch')) {
    el.textContent = 'Please navigate to a YouTube video first.';
    return;
  }

  document.getElementById('extract').textContent = 'Extracting...';
  chrome.tabs.sendMessage(
    tab.id,
    { action: 'getTranscript' },
    (response) => {
      if (chrome.runtime.lastError) {
        // No content script answered in this tab, e.g. the page was opened
        // before the extension was installed.
        el.textContent = 'Could not reach the page. Reload the video and try again.';
      } else if (response?.transcript) {
        el.textContent = response.transcript.map((s) => s.text).join(' ');
        document.getElementById('copy').style.display = 'block';
      } else {
        el.textContent = 'No transcript available for this video.';
      }
      document.getElementById('extract').textContent = 'Extract Transcript';
    }
  );
});

document.getElementById('copy').addEventListener('click', () => {
  const text = document.getElementById('transcript').textContent;
  navigator.clipboard.writeText(text).then(() => {
    document.getElementById('copy').textContent = 'Copied!';
    setTimeout(() => {
      document.getElementById('copy').textContent = 'Copy to Clipboard';
    }, 2000);
  });
});
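The popup joins segments with spaces, which drops the timing information. If you want timestamps in the copied text, the start field produced by parseTranscriptXML is enough. Here is a minimal sketch; formatTimestamp and formatTranscript are hypothetical helpers, and the plain mode trims each segment, which differs slightly from the original join.

```javascript
// Hypothetical helpers: turn segments of { text, start, duration }
// (times in seconds) into copy-ready text.
function formatTimestamp(seconds) {
  const total = Math.floor(seconds);
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = total % 60;
  // Pad minutes only when there is an hour part, e.g. 1:02:05 vs 2:05.
  const mm = h > 0 ? String(m).padStart(2, '0') : String(m);
  const ss = String(s).padStart(2, '0');
  return h > 0 ? `${h}:${mm}:${ss}` : `${mm}:${ss}`;
}

function formatTranscript(segments, { timestamps = false } = {}) {
  if (!timestamps) {
    // Same idea as popup.js: one paragraph of text.
    return segments.map((s) => s.text.trim()).join(' ');
  }
  // One "[m:ss] text" line per segment.
  return segments
    .map((s) => `[${formatTimestamp(s.start)}] ${s.text.trim()}`)
    .join('\n');
}
```

Because #transcript uses white-space: pre-wrap, the newline-separated version would display one segment per line in the popup.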
Gotchas and Lessons
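1. YouTube is a Single Page Application. When a user navigates between videos, the page doesn't reload; YouTube uses client-side routing, and your content script needs to handle this. Chrome injects content scripts only when a document loads, so with a match pattern limited to /watch*, a user who opens the YouTube home page and then clicks a video has no content script on the watch page (matching all of www.youtube.com avoids this). Likewise, the inline script data the content script reads comes from the initial page load, so after client-side navigation it can still describe the previous video. The simplest approach is still to re-extract data each time the popup opens rather than trying to track page changes, but check that the data belongs to the video in the address bar.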
2. The page data changes. YouTube frequently updates its internal data structures. The captionTracks approach works today but may need adjustments, so build with this fragility in mind: add error handling and fallback methods.
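3. Manifest V3 is async. Most Manifest V3 extension APIs return promises (popup.js awaits chrome.tabs.query), and persistent background pages are replaced by service workers that Chrome can stop when idle, which takes adjustment if you're used to Manifest V2. The popup's lifecycle is short too: it is destroyed when it closes, so don't store state in the popup.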
4. Keep permissions minimal. Users are wary of extension permissions. For this use case, activeTab plus a content script limited to YouTube is all you need; there's no need for access to every site.
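Two of these gotchas can be handled with small, testable helpers. The sketch below is one possible approach, not code from the extension: videoIdFromUrl lets you key results by video so a transcript from the previous video is never reused after client-side navigation, and firstSuccessful runs extraction strategies in order, treating a strategy that throws as a miss. Both names are hypothetical, as is readTranscriptPanel in the usage comment.

```javascript
// Gotcha 1: extract the video ID from a watch URL so results can be keyed
// per video and never reused after the user navigates to another one.
function videoIdFromUrl(url) {
  try {
    const u = new URL(url);
    return u.pathname === '/watch' ? u.searchParams.get('v') : null;
  } catch {
    return null; // not a valid absolute URL
  }
}

// Gotcha 2: try extraction strategies in order and return the first non-null
// result. A strategy that throws (for example because YouTube changed its
// page data) counts as a miss instead of aborting the whole extraction.
async function firstSuccessful(strategies) {
  const errors = [];
  for (const strategy of strategies) {
    try {
      const result = await strategy();
      if (result != null) return result;
    } catch (err) {
      errors.push(err);
    }
  }
  if (errors.length > 0) {
    console.warn('No strategy produced a result', errors);
  }
  return null;
}

// Usage in content.js, with a hypothetical fallback strategy:
// const transcript = await firstSuccessful([
//   () => extractTranscript(),    // the captionTracks approach
//   () => readTranscriptPanel(),  // hypothetical DOM-based fallback
// ]);
```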
Publish to the Chrome Web Store
- Zip the contents of the extension folder (manifest.json must be at the root of the ZIP)
- Pay the one-time $5 developer fee
- Submit through the Chrome Developer Dashboard
- Wait for review (usually 1-3 business days)
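Since each review round takes days, a quick local check before zipping can save a resubmission. This is a hypothetical Node helper (checkManifest is not a Chrome or Web Store tool) and covers only a few fields: manifest_version, name, and Chrome's version format of one to four dot-separated integers, each between 0 and 65535, with no leading zeros on non-zero parts.

```javascript
// Hypothetical pre-submission check; covers only a few manifest fields.
function checkManifest(manifest) {
  const problems = [];
  if (manifest.manifest_version !== 3) {
    problems.push('manifest_version should be 3');
  }
  for (const field of ['name', 'version']) {
    if (typeof manifest[field] !== 'string' || manifest[field].trim() === '') {
      problems.push(`missing required field "${field}"`);
    }
  }
  // Version: 1-4 dot-separated integers, each 0-65535, no leading zeros.
  if (typeof manifest.version === 'string' && manifest.version !== '') {
    const parts = manifest.version.split('.');
    const valid =
      parts.length <= 4 &&
      parts.every((p) => /^(0|[1-9]\d*)$/.test(p) && Number(p) <= 65535);
    if (!valid) problems.push(`invalid version "${manifest.version}"`);
  }
  return problems;
}
```

In Node you could run it as checkManifest(JSON.parse(require('node:fs').readFileSync('manifest.json', 'utf8'))) and only zip when it returns an empty array.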
The Web App Alternative
Building and maintaining a Chrome extension adds complexity — review processes, platform-specific APIs, update distribution. For many users, a web app like ScripTube (scriptube.me) is simpler: paste the URL, get the transcript, done. No installation required.
But for power users who extract transcripts frequently, the extension removes one step (copying the URL) and that convenience matters.