Architecture

System architecture and module design of Loups.


🏗 System Overview

Loups is built as a modular Python application with clear separation of concerns:

graph TB
    subgraph "User Interface Layer"
        CLI[CLI Module<br/>loups.cli]
        API[Python API<br/>loups.Loups]
    end

    subgraph "Core Processing Layer"
        CORE[Core Engine<br/>loups.loups.Loups]
        SCAN[Template Scanner<br/>match_template_scan]
        OCR[OCR Engine<br/>EasyOCR]
        THUMB[Thumbnail Extractor<br/>thumbnail_extractor]
    end

    subgraph "Utility Layer"
        FRAME[Frame Utils<br/>frame_utils]
        GEOM[Geometry<br/>geometry]
        TIME[Time Utils<br/>MilliSecond]
    end

    subgraph "External Dependencies"
        CV[OpenCV<br/>Video I/O]
        EOCR[EasyOCR<br/>Text Recognition]
        SSIM[scikit-image<br/>SSIM Matching]
    end

    CLI --> CORE
    API --> CORE

    CORE --> SCAN
    CORE --> OCR
    CORE --> THUMB
    CORE --> TIME

    SCAN --> FRAME
    SCAN --> GEOM
    THUMB --> FRAME

    FRAME --> CV
    SCAN --> CV
    OCR --> EOCR
    THUMB --> SSIM

    style CLI fill:#00ffff,stroke:#000,color:#000
    style API fill:#00ffff,stroke:#000,color:#000
    style CORE fill:#00b8d4,stroke:#000,color:#fff
    style SCAN fill:#00b8d4,stroke:#000,color:#fff
    style OCR fill:#00b8d4,stroke:#000,color:#fff
    style THUMB fill:#00b8d4,stroke:#000,color:#fff

📦 Module Breakdown

User Interface Layer

CLI Module (loups.cli)

Purpose: Command-line interface using Typer and Rich

Responsibilities:

  • Parse command-line arguments
  • Display rich progress bars and output
  • Handle user interactions
  • Route to main command or thumbnail subcommand

Key Components:

  • app: Typer application instance
  • main(): Main chapter scanning command
  • thumbnail_command(): Thumbnail extraction subcommand

Dependencies:

  • Typer (CLI framework)
  • Rich (terminal formatting)
  • Core Loups engine

Python API (loups.Loups)

Purpose: Programmatic interface for Python developers

Responsibilities:

  • Provide clean Python API
  • Manage video processing workflow
  • Return structured chapter data


Core Processing Layer

Core Engine (loups.loups.Loups)

Purpose: Main orchestration class

Responsibilities:

  • Initialize video capture
  • Coordinate template matching
  • Manage OCR extraction
  • Generate chapter timestamps
  • Handle logging and error management

Key Methods:

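from __future__ import annotations
from typing import List, Optional

# Method outline only; bodies omitted.
class Loups:
    def __init__(self, video_path, template_path, **kwargs): ...
    def scan(self) -> List[Chapter]: ...
    def _process_frame(self, frame_num, frame) -> Optional[Match]: ...
    def _extract_text(self, frame, match_region) -> str: ...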

Data Flow:

  1. Load video and template
  2. Scan frames for template matches
  3. Extract text from matched regions via OCR
  4. Generate timestamped chapters
  5. Return structured results
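
The five steps above can be sketched as a small loop. This is a minimal illustration, not the actual Loups implementation: `scan_chapters`, the `Chapter` fields, and the injected `find_match` and `read_text` callables are hypothetical stand-ins for the template scanner and OCR engine, and frames arrive as `(timestamp_ms, frame)` pairs instead of being read from a video file.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional, Tuple


@dataclass
class Chapter:
    """Hypothetical result record: where a chapter starts and its title."""
    start_ms: int
    title: str


def scan_chapters(
    frames: Iterable[Tuple[int, Any]],
    find_match: Callable[[Any], Optional[Any]],
    read_text: Callable[[Any, Any], str],
) -> List[Chapter]:
    """Walk frames, run OCR on matched regions, and collect chapters."""
    chapters: List[Chapter] = []
    for timestamp_ms, frame in frames:
        region = find_match(frame)  # step 2: template match
        if region is None:
            continue
        title = read_text(frame, region).strip()  # step 3: OCR
        # Assumption: consecutive frames showing the same title belong
        # to one chapter, so only the first occurrence is recorded.
        if title and (not chapters or chapters[-1].title != title):
            chapters.append(Chapter(timestamp_ms, title))  # step 4
    return chapters  # step 5
```

Passing the matcher and OCR step in as callables keeps the loop testable without a video file or an OCR model.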

Template Scanner (match_template_scan)

Purpose: OpenCV template matching logic

Responsibilities:

  • Perform template matching on video frames
  • Calculate match confidence scores
  • Identify match regions (bounding boxes)
  • Filter matches by confidence threshold

Algorithm:

def match_template(frame, template, threshold=0.8):
    """Match template against frame using cv2.matchTemplate."""
    # Normalize frame and template
    # Perform template matching (TM_CCOEFF_NORMED)
    # Find matches above threshold
    # Return match locations and confidence
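
To make the scoring step concrete, here is a pure-Python sketch of the score that `cv2.TM_CCOEFF_NORMED` computes: the correlation between the mean-subtracted template and each mean-subtracted frame window, normalized so that a perfect match scores 1.0. It works on small grayscale grids stored as lists of lists. The names `ncc_score` and `best_match` are illustrative only, and real code would call `cv2.matchTemplate` rather than loop in Python.

```python
import math
from typing import List, Optional, Tuple

Grid = List[List[float]]


def ncc_score(window: Grid, template: Grid) -> float:
    """Zero-mean normalized cross-correlation of two equally sized grids."""
    w = [v for row in window for v in row]
    t = [v for row in template for v in row]
    w_mean = sum(w) / len(w)
    t_mean = sum(t) / len(t)
    num = sum((a - w_mean) * (b - t_mean) for a, b in zip(w, t))
    den = math.sqrt(
        sum((a - w_mean) ** 2 for a in w) * sum((b - t_mean) ** 2 for b in t)
    )
    return num / den if den else 0.0  # flat patches carry no structure


def best_match(
    frame: Grid, template: Grid, threshold: float = 0.8
) -> Optional[Tuple[Tuple[int, int], float]]:
    """Return ((x, y), score) of the best window, or None below threshold."""
    th, tw = len(template), len(template[0])
    best: Optional[Tuple[Tuple[int, int], float]] = None
    for y in range(len(frame) - th + 1):
        for x in range(len(frame[0]) - tw + 1):
            window = [row[x:x + tw] for row in frame[y:y + th]]
            score = ncc_score(window, template)
            if best is None or score > best[1]:
                best = ((x, y), score)
    if best is None or best[1] < threshold:
        return None
    return best
```

Because both patches are mean-subtracted and normalized, the score is unaffected by uniform brightness or contrast changes, which is why a fixed threshold such as 0.8 can be reused across videos.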

OCR Engine (EasyOCR Integration)

Purpose: Text extraction from matched frames

Responsibilities:

  • Initialize EasyOCR reader
  • Extract text from frame regions
  • Apply confidence filtering
  • Sort text left-to-right
  • Combine into chapter titles

Configuration:

  • Languages: English (default)
  • Confidence threshold: 0.6 (configurable)
  • GPU acceleration: Auto-detected
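
The filter, sort, and combine steps can be sketched on EasyOCR-style results, where each detection is a `(bounding_box, text, confidence)` tuple and the bounding box is four `[x, y]` corner points. The function name `build_title` and the choice to join fragments with a single space are assumptions made for illustration, not the project's actual code.

```python
from typing import List, Sequence, Tuple

# Shape of one EasyOCR detection: (four corner points, text, confidence).
Detection = Tuple[Sequence[Sequence[float]], str, float]


def build_title(detections: List[Detection], min_confidence: float = 0.6) -> str:
    """Drop low-confidence fragments, order left to right, join with spaces."""
    kept = [d for d in detections if d[2] >= min_confidence]
    # Sort by the leftmost x coordinate of each bounding box.
    kept.sort(key=lambda d: min(point[0] for point in d[0]))
    return " ".join(text.strip() for _, text, _ in kept)
```

For example, detections for "Smith" (right), "#12" (left), and a low-confidence "J." in between would combine into `#12 Smith` at the default threshold.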

Thumbnail Extractor (thumbnail_extractor)

Purpose: SSIM-based thumbnail extraction

Responsibilities:

  • Load thumbnail template
  • Scan video frames with SSIM matching
  • Find first match above threshold
  • Save matched frame as JPEG

Algorithm:

def extract_thumbnail(video_path, template_path, threshold=0.35):
    """Extract thumbnail using SSIM scoring."""
    # Load template
    # Iterate frames (limited by scan_duration)
    # Calculate SSIM for each frame
    # Select the first frame at or above threshold
    # Save it as JPEG and return
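
The scoring and first-match logic can be sketched in pure Python. Note the simplification: scikit-image's `structural_similarity` averages SSIM over local sliding windows, whereas `global_ssim` below treats the whole image as one window, so its scores will differ from the library's. Both function names are hypothetical, and frames are assumed to be already resized to the template's dimensions.

```python
from typing import Iterable, List, Optional, Tuple

Gray = List[List[float]]  # grayscale image, values in 0..255


def global_ssim(a: Gray, b: Gray, data_range: float = 255.0) -> float:
    """SSIM computed over the whole image as a single window."""
    x = [v for row in a for v in row]
    y = [v for row in b for v in row]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((v - mx) ** 2 for v in x) / n
    vy = sum((v - my) ** 2 for v in y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y)) / n
    # Standard stabilizing constants from the SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )


def first_match(
    frames: Iterable[Gray], template: Gray, threshold: float = 0.35
) -> Optional[Tuple[int, float]]:
    """Return (index, score) of the first frame at or above threshold."""
    for index, frame in enumerate(frames):
        score = global_ssim(frame, template)
        if score >= threshold:
            return index, score
    return None
```

Stopping at the first qualifying frame, instead of searching for the best one, keeps the scan short when the thumbnail appears early in the video.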


Utility Layer

Frame Utils (frame_utils)

Purpose: Video frame manipulation utilities

Functions:

  • extract_frame(video_path, frame_num) - Get specific frame
  • get_video_fps(video_path) - Get video frame rate
  • get_video_duration(video_path) - Get video length
  • resize_frame(frame, width, height) - Resize operations

Geometry (geometry)

Purpose: Bounding box and region calculations

Functions:

  • calculate_region(match_loc, template_size) - Compute match region
  • crop_region(frame, region) - Extract frame region
  • merge_overlapping_regions(regions) - Combine close matches
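
A possible shape for these three helpers is sketched below. The function names come from the list above, but the bodies are assumptions: regions are taken to be `(x, y, width, height)` tuples, `template_size` is taken to be `(width, height)`, and "close matches" is interpreted as boxes that overlap.

```python
from typing import List, Sequence, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height), an assumed layout


def calculate_region(match_loc: Tuple[int, int], template_size: Tuple[int, int]) -> Region:
    """Top-left match location plus template (width, height) gives the box."""
    (x, y), (w, h) = match_loc, template_size
    return (x, y, w, h)


def crop_region(frame: Sequence[Sequence], region: Region) -> List[list]:
    """Slice out the rows, then the columns, covered by the region."""
    x, y, w, h = region
    return [list(row[x:x + w]) for row in frame[y:y + h]]


def _overlaps(a: Region, b: Region) -> bool:
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def _union(a: Region, b: Region) -> Region:
    x1, y1 = min(a[0], b[0]), min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])
    y2 = max(a[1] + a[3], b[1] + b[3])
    return (x1, y1, x2 - x1, y2 - y1)


def merge_overlapping_regions(regions: List[Region]) -> List[Region]:
    """Union overlapping boxes until no two results overlap."""
    merged: List[Region] = []
    for region in regions:
        current = region
        # The union may touch further boxes, so re-scan until a pass
        # finds nothing left to absorb.
        while True:
            hit = next((m for m in merged if _overlaps(current, m)), None)
            if hit is None:
                break
            merged.remove(hit)
            current = _union(current, hit)
        merged.append(current)
    return merged
```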

Time Utils (MilliSecond)

Purpose: Timestamp formatting

Class:

class MilliSecond:
    """Convert milliseconds to YouTube timestamp format."""

    def __init__(self, ms: int): ...
    def yt_format(self) -> str: ...  # Returns "HH:MM:SS" or "MM:SS"
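
One way the conversion could work is shown below. This is a sketch under stated assumptions, not the project's code: fractional seconds are truncated instead of rounded, fields are zero-padded to two digits, the hours field is omitted when it is zero, and negative input is rejected.

```python
class MilliSecond:
    """Convert milliseconds to YouTube timestamp format."""

    def __init__(self, ms: int) -> None:
        if ms < 0:
            raise ValueError("ms must be non-negative")
        self.ms = ms

    def yt_format(self) -> str:
        total_seconds = self.ms // 1000  # assumption: truncate, do not round
        hours, remainder = divmod(total_seconds, 3600)
        minutes, seconds = divmod(remainder, 60)
        if hours:
            return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
        return f"{minutes:02d}:{seconds:02d}"


print(MilliSecond(65_999).yt_format())     # 01:05
print(MilliSecond(3_725_000).yt_format())  # 01:02:05
```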


🔄 Data Flow

Chapter Scanning Flow

graph LR
    A[Video File] --> B[Load Video]
    C[Template File] --> D[Load Template]

    B --> E[Frame Iterator]
    D --> E

    E --> F{For Each Frame}
    F --> G[Template Match]

    G --> H{Match Found?}
    H -->|No| F
    H -->|Yes| I[Extract Region]

    I --> J[Run OCR]
    J --> K[Parse Text]
    K --> L[Create Chapter]

    L --> M{More Frames?}
    M -->|Yes| F
    M -->|No| N[Return Chapters]

    style A fill:#00ffff,stroke:#000,color:#000
    style C fill:#00ffff,stroke:#000,color:#000
    style N fill:#00ffff,stroke:#000,color:#000
    style G fill:#00b8d4,stroke:#000,color:#fff
    style J fill:#00b8d4,stroke:#000,color:#fff

Thumbnail Extraction Flow

graph LR
    A[Video File] --> B[Load Video]
    C[Thumbnail Template] --> D[Load Template]

    B --> E[Frame Sampler<br/>scan_duration, fps]
    D --> E

    E --> F{Sample Frame}
    F --> G[Resize to Template Size]

    G --> H[Calculate SSIM]
    H --> I{Score >= Threshold?}

    I -->|No| J{More Frames?}
    J -->|Yes| F
    J -->|No| K[No Match Found]

    I -->|Yes| L[Match Found]
    L --> M[Save JPEG]

    style A fill:#00ffff,stroke:#000,color:#000
    style C fill:#00ffff,stroke:#000,color:#000
    style M fill:#66bb6a,stroke:#000,color:#000
    style K fill:#ef5350,stroke:#000,color:#fff
    style H fill:#00b8d4,stroke:#000,color:#fff
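
The "Frame Sampler" node can be illustrated with a helper that picks which frame indices to test. The diagram names `scan_duration` and `fps` but does not define them, so this sketch assumes `scan_duration` is a limit in seconds from the start of the video and introduces a hypothetical `sample_fps` for how many frames per second of video to examine.

```python
from typing import List


def sample_frame_indices(
    video_fps: float,
    total_frames: int,
    scan_duration: float,
    sample_fps: float = 1.0,
) -> List[int]:
    """Frame indices to test within the first scan_duration seconds."""
    if video_fps <= 0 or sample_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Never look past the end of the video or the scan window.
    last_frame = min(total_frames, int(scan_duration * video_fps))
    # Skip enough frames to hit the sampling rate, but at least advance by one.
    step = max(1, round(video_fps / sample_fps))
    return list(range(0, last_frame, step))
```

Sampling below the native frame rate reduces the number of SSIM computations, at the cost of possibly missing a thumbnail that is on screen for less than one sampling interval.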

🔌 External Dependencies

OpenCV (opencv-python-headless)

Usage:

  • Video capture and frame extraction
  • Template matching (cv2.matchTemplate)
  • Image operations (resize, crop, color conversion)

Why headless?

  • Smaller package size
  • No GUI dependencies needed
  • Perfect for server/CLI use

EasyOCR

Usage:

  • Optical Character Recognition
  • Text detection and extraction
  • Confidence scoring

Features Used:

  • Multi-language support (English by default)
  • GPU acceleration when available
  • Bounding box detection
  • Confidence scores

scikit-image

Usage:

  • SSIM (Structural Similarity Index) calculation
  • Image comparison for thumbnail matching

Why SSIM?

  • Perceptually meaningful similarity metric
  • Robust to minor variations
  • Better than pixel-by-pixel comparison

Rich

Usage:

  • Beautiful terminal progress bars
  • Colored console output
  • Table formatting
  • Error display

Typer

Usage:

  • CLI framework
  • Argument parsing
  • Subcommand routing
  • Help generation


⚙ Configuration & Settings

Environment Variables

# Debug mode
LOUPS_DEBUG=1

# Custom OCR languages
LOUPS_OCR_LANG=en,es

# GPU usage
LOUPS_USE_GPU=0  # Disable GPU
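
Reading these variables from Python might look like the sketch below. Only the variable names and sample values come from the listing above. The parsing rules are assumptions: `"1"` enables debug, languages are comma-separated, and an unset `LOUPS_USE_GPU` means auto-detect. `EnvSettings` and `load_env_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class EnvSettings:
    debug: bool
    ocr_languages: List[str]
    use_gpu: Optional[bool]  # None means "auto-detect"


def load_env_settings(env: Optional[Mapping[str, str]] = None) -> EnvSettings:
    """Build settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    raw_langs = env.get("LOUPS_OCR_LANG", "en")
    langs = [part.strip() for part in raw_langs.split(",") if part.strip()]
    gpu_raw = env.get("LOUPS_USE_GPU")
    return EnvSettings(
        debug=env.get("LOUPS_DEBUG", "0") == "1",
        ocr_languages=langs or ["en"],
        use_gpu=None if gpu_raw is None else gpu_raw != "0",
    )
```

Accepting the environment as a parameter lets tests pass a plain dictionary instead of mutating `os.environ`.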

Config File Support (Future)

# loups.yaml (planned)
defaults:
  threshold: 0.8
  ocr_confidence: 0.6
  log_level: INFO

templates:
  softball: path/to/softball_template.png
  podcast: path/to/podcast_template.png

🚧 Design Patterns

Separation of Concerns

Each module has a single, well-defined responsibility:

  • CLI - User interaction only
  • Core - Business logic orchestration
  • Utils - Reusable helper functions
  • Dependencies - External library wrappers

Dependency Injection

from typing import Optional

class Loups:
    def __init__(
        self,
        video_path: str,
        template_path: str,
        ocr_engine: Optional[OCREngine] = None,  # Injectable
        video_reader: Optional[VideoReader] = None,  # Injectable
    ):
        # Fall back to the default implementations when nothing is injected
        self.ocr_engine = ocr_engine or DefaultOCREngine()
        self.video_reader = video_reader or OpenCVReader(video_path)
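
The main payoff of injectable collaborators is that tests can substitute lightweight fakes. The self-contained sketch below shows the idea. Everything in it is hypothetical: the `read` and `frames` method names, the `FakeOCR` and `ListReader` test doubles, and the `Scanner` class, which stands in for `Loups` reduced to its two injected dependencies.

```python
from typing import Iterable, List, Protocol, Tuple


class OCREngine(Protocol):
    def read(self, frame: object) -> str: ...


class VideoReader(Protocol):
    def frames(self) -> Iterable[Tuple[int, object]]: ...


class FakeOCR:
    """Test double: treats each frame as already-recognized text."""

    def read(self, frame: object) -> str:
        return str(frame)


class ListReader:
    """Test double: serves (timestamp_ms, frame) pairs from memory."""

    def __init__(self, items: List[Tuple[int, object]]) -> None:
        self._items = items

    def frames(self) -> Iterable[Tuple[int, object]]:
        return iter(self._items)


class Scanner:
    """Stand-in for the Loups class, reduced to the injected collaborators."""

    def __init__(self, ocr_engine: OCREngine, video_reader: VideoReader) -> None:
        self.ocr_engine = ocr_engine
        self.video_reader = video_reader

    def titles(self) -> List[Tuple[int, str]]:
        return [
            (timestamp, self.ocr_engine.read(frame))
            for timestamp, frame in self.video_reader.frames()
        ]


scanner = Scanner(FakeOCR(), ListReader([(0, "Intro"), (5000, "Inning 1")]))
print(scanner.titles())  # [(0, 'Intro'), (5000, 'Inning 1')]
```

Using `Protocol` means the fakes need no shared base class; any object with matching methods satisfies the type.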

Error Handling

Consistent error handling throughout:

try:
    loups = Loups(video_path, template_path)
    chapters = loups.scan()
except FileNotFoundError as e:
    logger.error(f"File not found: {e}")
    raise
except OCRError as e:
    logger.error(f"OCR failed: {e}")
    raise
except Exception as e:
    logger.error(f"Unexpected error: {e}")
    raise
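
The snippet above catches `OCRError` without showing where it is defined or raised. One plausible arrangement, offered as an assumption and not as the project's actual code, is a small exception hierarchy plus a wrapper that converts low-level failures into `OCRError` while preserving the original cause. `LoupsError` and `read_title` are hypothetical names.

```python
import logging

logger = logging.getLogger("loups")


class LoupsError(Exception):
    """Hypothetical base class for application errors."""


class OCRError(LoupsError):
    """Raised when text extraction fails."""


def read_title(ocr, frame):
    """Wrap a low-level OCR failure in OCRError, keeping the original cause."""
    try:
        return ocr(frame)
    except LoupsError:
        raise  # already an application error; pass it through unchanged
    except Exception as exc:
        logger.error("OCR failed: %s", exc)
        raise OCRError(f"could not read frame: {exc}") from exc
```

Chaining with `raise ... from exc` keeps the underlying traceback available on `__cause__`, so callers can handle one exception type without losing diagnostic detail.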

🧪 Testing Architecture

Test Structure

tests/
├── test_loups.py              # Core Loups class
├── test_cli.py                # CLI commands
├── test_thumbnail.py          # Thumbnail extraction
├── test_match_template.py     # Template matching
├── test_frame_utils.py        # Frame utilities
├── fixtures/                  # Test data
│   ├── test_video.mp4
│   ├── test_template.png
│   └── expected_output.txt
└── conftest.py                # Pytest configuration

Test Categories

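| Type              | Coverage             | Tools             |
|-------------------|----------------------|-------------------|
| Unit Tests        | Individual functions | pytest            |
| Integration Tests | Module interactions  | pytest            |
| E2E Tests         | Full workflow        | pytest + fixtures |
| CLI Tests         | Command execution    | Typer CliRunner   |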