DEV Community

Mate Technologies

🔗 Build a Professional Link Extractor GUI in Python (Step-by-Step)

In this tutorial, we'll build LinkVault, a professional desktop application that:

✅ Recursively scans folders
✅ Extracts URLs from .txt, .pdf, and .html files
✅ Removes duplicate links automatically
✅ Shows a smooth animated progress bar
✅ Lets users export or copy links
✅ Uses a modern PySide6 GUI

This guide is beginner-friendly and explains why each part exists.

🧰 Requirements

Before starting, install the dependencies:

pip install PySide6 PyPDF2

📁 Project Structure
linkvault/
├── main.py
└── logo.ico

🧠 Step 1: Import Required Modules

We start by importing Python's built-in modules and PySide6 components.

import os
import sys
import re
import subprocess
import platform

Why these?

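os → file system access (walking folders, building paths)

sys → command-line arguments, the exit code, and the PyInstaller bundle path

re → extract URLs using regex

platform, subprocess → open links cross-platform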

PySide6 GUI Imports
from PySide6.QtWidgets import (
    QApplication, QWidget, QFileDialog, QVBoxLayout, QHBoxLayout,
    QPushButton, QLabel, QLineEdit, QListWidget, QProgressBar,
    QMessageBox, QCheckBox
)
from PySide6.QtCore import Qt, QThread, Signal, QTimer
from PySide6.QtGui import QIcon, QGuiApplication

These components allow us to build a modern desktop interface.

PDF Support

import PyPDF2

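This library lets us extract the text of each PDF page, which we then search for URLs. Links that exist only as clickable annotations, with no URL visible in the page text, are not picked up this way.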

📦 Step 2: Handle Bundled App Resources

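When packaging with PyInstaller, bundled files such as icons live in PyInstaller's own runtime folder, whose path is exposed as sys._MEIPASS. A helper resolves the right base folder in both cases: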

def resource_path(file_name):
    base_path = getattr(sys, '_MEIPASS', os.path.dirname(os.path.abspath(__file__)))
    return os.path.join(base_path, file_name)

This ensures logo.ico works both in development and packaged builds.

🧵 Step 3: Create a Worker Thread (Very Important!)
Why a Worker Thread?

If we scan files on the main GUI thread, the app will freeze.
We fix this using QThread.

Worker Thread Skeleton

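class LinkExtractWorker(QThread):
    found = Signal(str)      # emitted once per newly discovered URL
    progress = Signal(int)   # emitted with the percentage of files processed
    # Note: QThread already has a built-in finished signal; this one shadows it.
    finished = Signal()      # emitted when the scan ends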

Signals allow safe communication from the worker to the UI.

Worker Initialization

def __init__(self, folder, file_types):
    super().__init__()
    self.folder = folder
    self.file_types = file_types
    self._running = True
    self.seen_links = set()

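seen_links is a set of URLs already reported, so each link is emitted only once.

_running is a flag the scan loop checks, which lets the user cancel a scan in progress.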

Stop the Worker Safely

def stop(self):
    self._running = False
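The _running flag only has an effect if the scanning loop checks it between files. Here is a minimal, Qt-free sketch of that pattern. ScanJob and the process callback are hypothetical stand-ins for the worker and the per-file extraction, not code from LinkVault:

```python
class ScanJob:
    """Hypothetical stand-in for the worker: shows only the stop-flag pattern."""

    def __init__(self, items):
        self.items = list(items)
        self._running = True
        self.processed = []

    def stop(self):
        # Called from outside (in the GUI, by the Cancel button).
        self._running = False

    def run(self, process):
        for item in self.items:
            # Check the flag before each unit of work so a stop request
            # takes effect at the next file boundary.
            if not self._running:
                break
            process(item)
            self.processed.append(item)


job = ScanJob(["a.txt", "b.txt", "c.txt"])


def process(name):
    # Simulate the user pressing Cancel while the second file is handled.
    if name == "b.txt":
        job.stop()


job.run(process)
print(job.processed)
```

The file being processed when the stop request arrives still finishes. Cancellation is cooperative, so it is only as responsive as the longest single file.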

Walk the Folder Recursively

for root, dirs, files in os.walk(self.folder):
    for f in files:
        ext = os.path.splitext(f)[1].lower()

os.walk visits every subfolder, so nested files are included automatically.

Filter Files by Type

if (ext == '.txt' and self.file_types['txt']) or \
   (ext == '.pdf' and self.file_types['pdf']) or \
   (ext in ['.html', '.htm'] and self.file_types['html']):
    all_files.append(os.path.join(root, f))

Checkboxes control what gets scanned.
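Put together, the walk and the filter can be sketched as one function. This assumes file_types is a dict of booleans keyed by 'txt', 'pdf', and 'html', as the snippet above suggests. The collect_files name and the lookup table are illustrative choices, not part of the original code:

```python
import os
import tempfile


def collect_files(folder, file_types):
    """Return paths under `folder` whose extension is enabled in `file_types`.

    `file_types` is assumed to be a dict such as
    {'txt': True, 'pdf': False, 'html': True}.
    """
    # Map each extension to the checkbox key that controls it.
    ext_to_key = {'.txt': 'txt', '.pdf': 'pdf', '.html': 'html', '.htm': 'html'}
    all_files = []
    for root, dirs, files in os.walk(folder):
        for f in files:
            ext = os.path.splitext(f)[1].lower()
            key = ext_to_key.get(ext)
            if key is not None and file_types.get(key, False):
                all_files.append(os.path.join(root, f))
    return sorted(all_files)


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for name in ("notes.txt", "page.HTM", os.path.join("sub", "doc.pdf")):
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as fh:
            fh.write("placeholder")
    found = collect_files(tmp, {'txt': True, 'pdf': False, 'html': True})
    names = [os.path.relpath(p, tmp) for p in found]
    print(names)
```

Collecting the full list before processing is what makes a percentage-based progress bar possible, because the total file count is known up front.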

Extract URLs from Text & HTML

urls = re.findall(r'https?://[^\s"\'>]+', text)

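This regex matches http:// and https:// URLs and stops at whitespace, a quote, or >. It is deliberately simple, so trailing punctuation such as a period or comma can end up inside a match.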
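To see what the pattern captures in practice, here is a small sketch that wraps it in a function and trims punctuation that usually belongs to the surrounding sentence. The extract_urls name and the trimming step are additions for illustration, not part of the original code:

```python
import re

URL_PATTERN = re.compile(r'https?://[^\s"\'>]+')


def extract_urls(text):
    """Find http(s) URLs and trim punctuation that usually belongs to the sentence."""
    urls = []
    for match in URL_PATTERN.findall(text):
        # The pattern stops only at whitespace, quotes, or '>', so a URL at
        # the end of a sentence keeps its trailing '.', ',' or ')'.
        urls.append(match.rstrip('.,;:!?)'))
    return urls


sample = 'See https://example.com/docs, or <a href="http://example.org/a?b=1">here</a>.'
print(extract_urls(sample))
```

Trimming is a trade-off: it cleans up links quoted in prose, but it would also cut a closing parenthesis that is a real part of a URL.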

Extract URLs from PDFs

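reader = PyPDF2.PdfReader(f)
for page in reader.pages:
    # Image-only pages may yield no text, so fall back to an empty string
    text = page.extract_text() or ""

Each page is processed on its own, and the empty-string fallback keeps the URL search from failing on a page with no extractable text.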
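One way to make the per-page scan tolerant of bad pages is to guard each call. The sketch below works on any objects that expose an extract_text() method. urls_from_pages and FakePage are hypothetical names used so the example runs without a real PDF:

```python
import re

URL_PATTERN = re.compile(r'https?://[^\s"\'>]+')


def urls_from_pages(pages):
    """Collect URLs from objects that expose an extract_text() method."""
    urls = []
    for page in pages:
        try:
            # Guard against pages with no extractable text; `or ""` covers
            # a None or empty result.
            text = page.extract_text() or ""
        except Exception:
            # Skip a page that fails to parse instead of aborting the file.
            continue
        urls.extend(URL_PATTERN.findall(text))
    return urls


class FakePage:
    """Hypothetical test double standing in for a PDF page object."""

    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


pages = [FakePage("Docs: https://example.com/a"), FakePage(None), FakePage("no links")]
print(urls_from_pages(pages))
```

With a real file, the same function would presumably be called as urls_from_pages(PyPDF2.PdfReader(f).pages), where f is the PDF opened in binary mode.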

Emit Found Links

if url not in self.seen_links:
    self.seen_links.add(url)
    self.found.emit(url)

No duplicates. Clean output.

Update Progress

percent = int((i + 1) / total_files * 100)
self.progress.emit(percent)
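The percentage formula divides by total_files, so an empty folder (or one where no file matches the selected types) needs a guard. Here is a minimal sketch, assuming i is the zero-based index of the file just finished. The percent_done helper is a hypothetical name:

```python
def percent_done(index, total_files):
    """Percentage complete after finishing the file at zero-based `index`."""
    if total_files <= 0:
        # Nothing to scan: report completion rather than dividing by zero.
        return 100
    return int((index + 1) / total_files * 100)


print([percent_done(i, 3) for i in range(3)])
print(percent_done(0, 0))
```

Because int() truncates, intermediate values round down, but the last file always reports exactly 100.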

🖥 Step 4: Create the Main Application Window

class LinkExtractorApp(QWidget):
    def __init__(self):
        super().__init__()

This class controls everything the user sees.

Window Setup

self.setWindowTitle("LinkVault – Professional Link Extractor")
self.setMinimumSize(1200, 680)
self.setWindowIcon(QIcon(resource_path("logo.ico")))

🎛 Step 5: Build the User Interface
Folder Selection

self.path_input = QLineEdit()
self.path_input.setReadOnly(True)

Users can't type paths manually; they can only browse.

Buttons

browse_btn = QPushButton("📂 Browse Folder")
self.start_btn = QPushButton("🚀 Extract Links")
self.cancel_btn = QPushButton("⏹ Cancel")

Clear, emoji-based UX 👍

File Type Filters

self.txt_checkbox = QCheckBox(".txt")
self.pdf_checkbox = QCheckBox(".pdf")
self.html_checkbox = QCheckBox(".html/.htm")

Results List

self.results_list = QListWidget()
self.results_list.itemDoubleClicked.connect(self.open_item)

Double-click opens links in the browser.

✨ Step 6: Animated Progress Bar
Why Not Default?

We want a smooth glowing animation, not a jumpy bar.

Smooth Progress Logic

def update_progress_smooth(self):
    if self.smooth_value < self.target_progress:
        self.smooth_value += 1
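The idea is that the displayed value chases the real percentage one step at a time. In the app, a timer presumably calls update_progress_smooth repeatedly. The sketch below models only that stepping logic without Qt. SmoothProgress, set_target, and tick are hypothetical names, and the clamping is an added safeguard:

```python
class SmoothProgress:
    """Hypothetical Qt-free model of the animated bar's state."""

    def __init__(self):
        self.smooth_value = 0     # what the bar currently displays
        self.target_progress = 0  # latest percentage reported by the worker

    def set_target(self, percent):
        # Clamp so a stray value cannot push the bar outside 0..100.
        self.target_progress = max(0, min(100, percent))

    def tick(self):
        # One timer tick: move one step toward the target, never past it.
        if self.smooth_value < self.target_progress:
            self.smooth_value += 1
        return self.smooth_value


bar = SmoothProgress()
bar.set_target(3)
frames = [bar.tick() for _ in range(5)]
print(frames)
```

The bar advances by one unit per tick and then holds at the target, which is what turns a jump from 0 to 3 into three small steps.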

Glowing Gradient Effect

QProgressBar::chunk {
    /* x1/y1 to x2/y2 set the gradient direction: left to right here */
    background: qlineargradient(
        x1:0, y1:0, x2:1, y2:0,
        stop:0 #2563eb,
        stop:0.5 #60a5fa,
        stop:1 #2563eb
    );
}

Looks professional and modern.

📤 Step 7: Export & Clipboard Support
Export to TXT

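with open(path, 'w', encoding='utf-8') as f:
    # Write one link per line, in the order shown in the results list
    for i in range(self.results_list.count()):
        f.write(self.results_list.item(i).text() + "\n")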

Copy to Clipboard

QGuiApplication.clipboard().setText(links)

Works instantly across platforms.
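Both features need the links as plain text, one per line. Here is a small sketch of the two helpers, assuming the links are available as a list of strings. links_as_text and export_links are hypothetical names, and the GUI would feed them from the results list:

```python
import os
import tempfile


def links_as_text(links):
    """Join links one per line; the same string can go to a file or the clipboard."""
    return "\n".join(links)


def export_links(links, path):
    """Write links to `path` as UTF-8 text, one per line, with a final newline."""
    with open(path, "w", encoding="utf-8") as f:
        for link in links:
            f.write(link + "\n")


links = ["https://example.com/a", "https://example.org/b"]
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "links.txt")
    export_links(links, out_path)
    with open(out_path, encoding="utf-8") as f:
        written = f.read()

print(repr(written))
print(repr(links_as_text(links)))
```

Keeping these as plain functions, separate from the widgets, makes them easy to test without starting the GUI.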

🌐 Step 8: Open Links Cross-Platform

if platform.system() == "Windows":
    os.startfile(url)
elif platform.system() == "Darwin":
    subprocess.Popen(["open", url])
else:
    subprocess.Popen(["xdg-open", url])

🎨 Step 9: Apply Modern Styling

QWidget {
    background-color: #0f172a;
    color: #e5e7eb;
}

Dark mode, rounded buttons, and soft colors.

▶ Step 10: Run the App

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = LinkExtractorApp()
    window.show()
    sys.exit(app.exec())

✅ Final Features Recap

✔ Recursive folder scanning
✔ URL extraction from multiple formats
✔ Duplicate removal
✔ Cancelable background processing
✔ Animated progress bar
✔ Export & clipboard support
✔ Modern UI

🚀 Next Improvements (Optional)

CSV export

Domain grouping

Regex customization

Drag-and-drop folders

URL validation

