Architecture¶

System architecture and module design of Loups.

System Overview¶

Loups is built as a modular Python application with clear separation of concerns:

graph TB
    subgraph "User Interface Layer"
        CLI[CLI Module<br/>loups.cli]
        API[Python API<br/>loups.Loups]
    end

    subgraph "Core Processing Layer"
        CORE[Core Engine<br/>loups.loups.Loups]
        SCAN[Template Scanner<br/>match_template_scan]
        OCR[OCR Engine<br/>EasyOCR]
        THUMB[Thumbnail Extractor<br/>thumbnail_extractor]
    end

    subgraph "Utility Layer"
        FRAME[Frame Utils<br/>frame_utils]
        GEOM[Geometry<br/>geometry]
        TIME[Time Utils<br/>MilliSecond]
    end

    subgraph "External Dependencies"
        CV[OpenCV<br/>Video I/O]
        EOCR[EasyOCR<br/>Text Recognition]
        SSIM[scikit-image<br/>SSIM Matching]
    end

    CLI --> CORE
    API --> CORE

    CORE --> SCAN
    CORE --> OCR
    CORE --> THUMB
    CORE --> TIME

    SCAN --> FRAME
    SCAN --> GEOM
    THUMB --> FRAME

    FRAME --> CV
    SCAN --> CV
    OCR --> EOCR
    THUMB --> SSIM

    style CLI fill:#00ffff,stroke:#000,color:#000
    style API fill:#00ffff,stroke:#000,color:#000
    style CORE fill:#00b8d4,stroke:#000,color:#fff
    style SCAN fill:#00b8d4,stroke:#000,color:#fff
    style OCR fill:#00b8d4,stroke:#000,color:#fff
    style THUMB fill:#00b8d4,stroke:#000,color:#fff

Module Breakdown¶

User Interface Layer¶

CLI Module (`loups.cli`)¶

Purpose: Command-line interface using Typer and Rich

Responsibilities:
- Parse command-line arguments
- Display rich progress bars and output
- Handle user interactions
- Route to main command or thumbnail subcommand

Key Components:
- app: Typer application instance
- main(): Main chapter scanning command
- thumbnail_command(): Thumbnail extraction subcommand

Dependencies:
- Typer (CLI framework)
- Rich (terminal formatting)
- Core Loups engine

Python API (`loups.Loups`)¶

Purpose: Programmatic interface for Python developers

Responsibilities:
- Provide clean Python API
- Manage video processing workflow
- Return structured chapter data

Core Processing Layer¶

Core Engine (`loups.loups.Loups`)¶

Purpose: Main orchestration class

Responsibilities:
- Initialize video capture
- Coordinate template matching
- Manage OCR extraction
- Generate chapter timestamps
- Handle logging and error management

Key Methods:

class Loups:
    # Interface outline only; method bodies are omitted.
    def __init__(self, video_path, template_path, **kwargs): ...
    def scan(self) -> List[Chapter]: ...
    def _process_frame(self, frame_num, frame) -> Optional[Match]: ...
    def _extract_text(self, frame, match_region) -> str: ...

Data Flow:
1. Load video and template
2. Scan frames for template matches
3. Extract text from matched regions via OCR
4. Generate timestamped chapters
5. Return structured results
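
The five steps above can be sketched as a single loop. This is a minimal illustration under stated assumptions, not the actual `Loups.scan()` implementation: `scan_chapters`, `find_match`, `read_title`, and the `Chapter` fields are hypothetical names, and skipping consecutive frames that yield the same title is an assumed way of collapsing repeated detections.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Chapter:
    """Hypothetical chapter record: a start time and a title."""
    timestamp_ms: int
    title: str


def scan_chapters(
    frames: Iterable[Tuple[int, object]],
    fps: float,
    find_match: Callable[[object], Optional[object]],
    read_title: Callable[[object, object], str],
) -> List[Chapter]:
    """Run template matching, then OCR, on each frame and collect chapters."""
    chapters: List[Chapter] = []
    for frame_num, frame in frames:
        region = find_match(frame)          # step 2: template match
        if region is None:
            continue
        title = read_title(frame, region).strip()  # step 3: OCR on the region
        if not title:
            continue
        # Assumption: an overlay stays on screen for many frames, so a title
        # identical to the previous chapter is treated as the same chapter.
        if chapters and chapters[-1].title == title:
            continue
        # Step 4: convert the frame number to a millisecond timestamp.
        chapters.append(Chapter(int(frame_num / fps * 1000), title))
    return chapters  # step 5


# Usage with stand-in frames and callables instead of real video and OCR.
demo_frames = [(0, "a"), (30, "b"), (60, "b"), (90, "c")]
demo_titles = {"b": "Inning 1", "c": "Inning 2"}
demo_chapters = scan_chapters(
    demo_frames,
    fps=30.0,
    find_match=lambda frame: None if frame == "a" else (0, 0, 1, 1),
    read_title=lambda frame, region: demo_titles[frame],
)
```

Passing the matcher and OCR step in as callables keeps the loop testable without a video file.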

Template Scanner (`match_template_scan`)¶

Purpose: OpenCV template matching logic

Responsibilities:
- Perform template matching on video frames
- Calculate match confidence scores
- Identify match regions (bounding boxes)
- Filter matches by confidence threshold

Algorithm:

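import cv2

def match_template(frame, template, threshold=0.8):
    """Match template against frame using cv2.matchTemplate."""
    # Normalize frame and template (assumed here: convert both to grayscale)
    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    template_gray = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
    # Perform template matching (TM_CCOEFF_NORMED); scores lie in [-1, 1]
    scores = cv2.matchTemplate(frame_gray, template_gray, cv2.TM_CCOEFF_NORMED)
    # Keep the best match (top-left x, y) only if it clears the threshold
    _, confidence, _, location = cv2.minMaxLoc(scores)
    return (location, confidence) if confidence >= threshold else None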
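
To make the confidence score concrete, the sketch below computes the quantity that `TM_CCOEFF_NORMED` is based on, a zero-mean normalized cross-correlation, in plain Python on small grayscale grids. It is an illustration only, far slower than OpenCV, and not Loups code: `ccoeff_normed` and `best_match` are hypothetical names, and returning 0.0 for a perfectly flat patch is a choice made here to avoid dividing by zero.

```python
from math import sqrt
from typing import List, Optional, Tuple

Grid = List[List[float]]


def ccoeff_normed(patch: Grid, template: Grid) -> float:
    """Zero-mean normalized cross-correlation of two equal-size grids."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    if len(a) != len(b) or not a:
        raise ValueError("patch and template must have the same non-zero size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # flat patch or template: treated as "no match" in this sketch
    return sum(x * y for x, y in zip(da, db)) / denom


def best_match(frame: Grid, template: Grid,
               threshold: float = 0.8) -> Optional[Tuple[int, int, float]]:
    """Slide the template over the frame; return (x, y, score) of the best hit."""
    th, tw = len(template), len(template[0])
    best: Optional[Tuple[int, int, float]] = None
    for y in range(len(frame) - th + 1):
        for x in range(len(frame[0]) - tw + 1):
            patch = [row[x:x + tw] for row in frame[y:y + th]]
            score = ccoeff_normed(patch, template)
            if score >= threshold and (best is None or score > best[2]):
                best = (x, y, score)
    return best


demo_template = [[10, 200], [200, 10]]
demo_frame = [
    [0, 0, 0, 0, 0],
    [0, 0, 10, 200, 0],
    [0, 0, 200, 10, 0],
    [0, 0, 0, 0, 0],
]
demo_hit = best_match(demo_frame, demo_template)
```

Because both inputs are mean-centered and normalized, the score is insensitive to uniform brightness shifts, which is why a fixed threshold such as 0.8 can work across frames.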

OCR Engine (EasyOCR Integration)¶

Purpose: Text extraction from matched frames

Responsibilities:
- Initialize EasyOCR reader
- Extract text from frame regions
- Apply confidence filtering
- Sort text left-to-right
- Combine into chapter titles

Configuration:
- Languages: English (default)
- Confidence threshold: 0.6 (configurable)
- GPU acceleration: Auto-detected
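
The filtering, sorting, and joining steps listed above might look like the following. EasyOCR's `Reader.readtext` returns `(bounding_box, text, confidence)` tuples, where the bounding box is four `[x, y]` corner points; the sketch works on that shape directly so it runs without EasyOCR installed. `build_title` is a hypothetical helper name, not a function from Loups.

```python
from typing import List, Sequence, Tuple

# Shape of one EasyOCR detection: (four [x, y] corners, text, confidence).
OCRResult = Tuple[Sequence[Sequence[float]], str, float]


def build_title(results: List[OCRResult], min_confidence: float = 0.6) -> str:
    """Drop low-confidence detections, order left to right, join into a title."""
    kept = [r for r in results if r[2] >= min_confidence]
    # Sort by the left-most x coordinate of each bounding box.
    kept.sort(key=lambda r: min(point[0] for point in r[0]))
    return " ".join(r[1].strip() for r in kept if r[1].strip())


demo_results: List[OCRResult] = [
    ([[120, 10], [200, 10], [200, 40], [120, 40]], "Smith", 0.92),
    ([[10, 10], [100, 10], [100, 40], [10, 40]], "Jane", 0.88),
    ([[210, 10], [230, 10], [230, 40], [210, 40]], "~", 0.20),
]
demo_title = build_title(demo_results)
```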

Thumbnail Extractor (`thumbnail_extractor`)¶

Purpose: SSIM-based thumbnail extraction

Responsibilities:
- Load thumbnail template
- Scan video frames with SSIM matching
- Find first match above threshold
- Save matched frame as JPEG

Algorithm:

def extract_thumbnail(video_path, template_path, threshold=0.35):
    """Extract thumbnail using SSIM scoring."""
    # Load template
    # Iterate frames (limited by scan_duration)
    # Calculate SSIM for each frame
    # Save the first frame scoring at or above threshold as JPEG
    # Return the saved frame
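
For readers unfamiliar with the score being thresholded, here is a simplified SSIM computed over the whole image at once. It is a sketch for intuition only: scikit-image's `structural_similarity` averages SSIM over local sliding windows, so its values will differ from this single-window version. `global_ssim` is a hypothetical name.

```python
from typing import List

Grid = List[List[float]]


def global_ssim(img_a: Grid, img_b: Grid, data_range: float = 255.0) -> float:
    """Single-window SSIM over two equal-size grayscale images."""
    a = [float(v) for row in img_a for v in row]
    b = [float(v) for row in img_b for v in row]
    if len(a) != len(b) or not a:
        raise ValueError("images must have the same non-zero size")
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((v - mu_a) ** 2 for v in a) / n
    var_b = sum((v - mu_b) ** 2 for v in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    # Standard stabilizing constants from the SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


same = global_ssim([[10, 50], [90, 130]], [[10, 50], [90, 130]])
inverted = global_ssim([[0, 255], [255, 0]], [[255, 0], [0, 255]])
```

Identical images score 1, and structurally opposite images score below zero, so a threshold such as 0.35 accepts frames that are broadly similar to the template without requiring a pixel-exact match.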

Utility Layer¶

Frame Utils (`frame_utils`)¶

Purpose: Video frame manipulation utilities

Functions:
- extract_frame(video_path, frame_num) - Get specific frame
- get_video_fps(video_path) - Get video frame rate
- get_video_duration(video_path) - Get video length
- resize_frame(frame, width, height) - Resize operations

Geometry (`geometry`)¶

Purpose: Bounding box and region calculations

Functions:
- calculate_region(match_loc, template_size) - Compute match region
- crop_region(frame, region) - Extract frame region
- merge_overlapping_regions(regions) - Combine close matches
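
One possible shape for these helpers is sketched below. The function names come from the list above, but everything else is an assumption: regions are represented as `(x, y, width, height)` tuples, frames as nested lists, and "close matches" is interpreted as rectangles that overlap. The merge is a single greedy pass, so chains of regions that only connect through a later merge may need a second pass.

```python
from typing import List, Sequence, Tuple

Region = Tuple[int, int, int, int]  # assumed layout: x, y, width, height


def calculate_region(match_loc: Tuple[int, int],
                     template_size: Tuple[int, int]) -> Region:
    """Combine a top-left match location with the template's (width, height)."""
    x, y = match_loc
    width, height = template_size
    return (x, y, width, height)


def crop_region(frame: Sequence[Sequence[int]], region: Region) -> List[List[int]]:
    """Return the rows and columns of the frame covered by the region."""
    x, y, width, height = region
    return [list(row[x:x + width]) for row in frame[y:y + height]]


def _overlaps(a: Region, b: Region) -> bool:
    return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2]
            and a[1] < b[1] + b[3] and b[1] < a[1] + a[3])


def _union(a: Region, b: Region) -> Region:
    x0, y0 = min(a[0], b[0]), min(a[1], b[1])
    x1 = max(a[0] + a[2], b[0] + b[2])
    y1 = max(a[1] + a[3], b[1] + b[3])
    return (x0, y0, x1 - x0, y1 - y0)


def merge_overlapping_regions(regions: Sequence[Region]) -> List[Region]:
    """Greedy single pass: fold each region into the first one it overlaps."""
    merged: List[Region] = []
    for region in sorted(regions):
        for i, other in enumerate(merged):
            if _overlaps(region, other):
                merged[i] = _union(region, other)
                break
        else:
            merged.append(region)
    return merged
```

With NumPy frames, as returned by OpenCV, the crop would typically be written as `frame[y:y + height, x:x + width]` instead.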

Time Utils (`MilliSecond`)¶

Purpose: Timestamp formatting

Class:

class MilliSecond:
    """Convert milliseconds to YouTube timestamp format."""

    def __init__(self, ms: int): ...
    def yt_format(self) -> str: ...  # Returns "HH:MM:SS" or "MM:SS"
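
A minimal sketch of how such a class could be implemented follows. The real `MilliSecond` may differ; in particular, zero-padding every field, dropping the hours field when it is zero, and truncating fractional seconds are assumptions made to match the "HH:MM:SS" or "MM:SS" description.

```python
class MilliSecond:
    """Convert milliseconds to a YouTube-style timestamp (illustrative sketch)."""

    def __init__(self, ms: int) -> None:
        if ms < 0:
            raise ValueError("ms must be non-negative")
        self.ms = ms

    def yt_format(self) -> str:
        total_seconds = self.ms // 1000  # fractional seconds are truncated
        hours, remainder = divmod(total_seconds, 3600)
        minutes, seconds = divmod(remainder, 60)
        if hours:
            return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
        return f"{minutes:02d}:{seconds:02d}"


print(MilliSecond(75_000).yt_format())     # 01:15
print(MilliSecond(3_723_500).yt_format())  # 01:02:03
```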

Data Flow¶

Chapter Scanning Flow¶

graph LR
    A[Video File] --> B[Load Video]
    C[Template File] --> D[Load Template]

    B --> E[Frame Iterator]
    D --> E

    E --> F{For Each Frame}
    F --> G[Template Match]

    G --> H{Match Found?}
    H -->|No| F
    H -->|Yes| I[Extract Region]

    I --> J[Run OCR]
    J --> K[Parse Text]
    K --> L[Create Chapter]

    L --> M{More Frames?}
    M -->|Yes| F
    M -->|No| N[Return Chapters]

    style A fill:#00ffff,stroke:#000,color:#000
    style C fill:#00ffff,stroke:#000,color:#000
    style N fill:#00ffff,stroke:#000,color:#000
    style G fill:#00b8d4,stroke:#000,color:#fff
    style J fill:#00b8d4,stroke:#000,color:#fff

Thumbnail Extraction Flow¶

graph LR
    A[Video File] --> B[Load Video]
    C[Thumbnail Template] --> D[Load Template]

    B --> E[Frame Sampler<br/>scan_duration, fps]
    D --> E

    E --> F{Sample Frame}
    F --> G[Resize to Template Size]

    G --> H[Calculate SSIM]
    H --> I{Score >= Threshold?}

    I -->|No| J{More Frames?}
    J -->|Yes| F
    J -->|No| K[No Match Found]

    I -->|Yes| L[Match Found]
    L --> M[Save JPEG]

    style A fill:#00ffff,stroke:#000,color:#000
    style C fill:#00ffff,stroke:#000,color:#000
    style M fill:#66bb6a,stroke:#000,color:#000
    style K fill:#ef5350,stroke:#000,color:#fff
    style H fill:#00b8d4,stroke:#000,color:#fff
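
The sampling and first-match logic in the diagram can be sketched independently of video decoding. The names below are hypothetical, and the reading of the sampler's `fps` parameter as "samples per second of video" is an assumption; the scoring function is passed in so the loop can be exercised without OpenCV or scikit-image.

```python
from typing import Callable, Iterable, Optional, Tuple


def sample_frame_indices(video_fps: float, scan_duration: float,
                         sample_fps: float) -> range:
    """Frame indices covering the first scan_duration seconds of the video."""
    step = max(1, round(video_fps / sample_fps))  # frames skipped between samples
    last = int(video_fps * scan_duration)          # first frame past the scan window
    return range(0, last, step)


def find_first_match(indices: Iterable[int],
                     score_frame: Callable[[int], float],
                     threshold: float = 0.35) -> Optional[Tuple[int, float]]:
    """Return (frame_index, score) for the first frame at or above threshold."""
    for index in indices:
        score = score_frame(index)
        if score >= threshold:
            return index, score
    return None  # corresponds to the "No Match Found" branch


demo_indices = sample_frame_indices(video_fps=30.0, scan_duration=2.0, sample_fps=5.0)
demo_match = find_first_match(demo_indices, lambda i: 0.9 if i >= 18 else 0.1)
```

Stopping at the first qualifying frame, rather than searching for the best one, keeps extraction fast when the thumbnail appears early in the video.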

External Dependencies¶

OpenCV (`opencv-python-headless`)¶

Usage:
- Video capture and frame extraction
- Template matching (cv2.matchTemplate)
- Image operations (resize, crop, color conversion)

Why headless?
- Smaller package size
- No GUI dependencies needed
- Well suited to server and CLI use

EasyOCR¶

Usage:
- Optical Character Recognition
- Text detection and extraction
- Confidence scoring

Features Used:
- Multi-language support (English by default)
- GPU acceleration when available
- Bounding box detection
- Confidence scores

scikit-image¶

Usage:
- SSIM (Structural Similarity Index) calculation
- Image comparison for thumbnail matching

Why SSIM?
- Perceptually meaningful similarity metric
- Robust to minor variations
- Compares luminance, contrast, and structure, so it tolerates small differences that would defeat a pixel-by-pixel comparison

Rich¶

Usage:
- Beautiful terminal progress bars
- Colored console output
- Table formatting
- Error display

Typer¶

Usage:
- CLI framework
- Argument parsing
- Subcommand routing
- Help generation

Configuration & Settings¶

Environment Variables¶

# Debug mode
LOUPS_DEBUG=1

# Custom OCR languages
LOUPS_OCR_LANG=en,es

# GPU usage
LOUPS_USE_GPU=0  # Disable GPU
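
As a rough idea of how these variables could be read at startup, consider the sketch below. The variable names are taken from the listing above; the `Settings` class, the `load_settings` function, and the defaults used when a variable is unset are all assumptions rather than documented Loups behavior.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for environment-driven options."""
    debug: bool
    ocr_languages: List[str]
    gpu_allowed: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Parse the LOUPS_* variables from a mapping (os.environ by default)."""
    languages = [
        code.strip()
        for code in env.get("LOUPS_OCR_LANG", "en").split(",")
        if code.strip()
    ]
    return Settings(
        debug=env.get("LOUPS_DEBUG", "0") == "1",
        ocr_languages=languages,
        # Only an explicit "0" disables the GPU; otherwise auto-detection applies.
        gpu_allowed=env.get("LOUPS_USE_GPU", "1") != "0",
    )


demo_settings = load_settings(
    {"LOUPS_DEBUG": "1", "LOUPS_OCR_LANG": "en,es", "LOUPS_USE_GPU": "0"}
)
```

Accepting any mapping instead of reading `os.environ` directly makes the parser easy to test.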

Config File Support (Future)¶

# loups.yaml (planned)
defaults:
  threshold: 0.8
  ocr_confidence: 0.6
  log_level: INFO

templates:
  softball: path/to/softball_template.png
  podcast: path/to/podcast_template.png

Design Patterns¶

Separation of Concerns¶

Each module has a single, well-defined responsibility:

CLI - User interaction only
Core - Business logic orchestration
Utils - Reusable helper functions
Dependencies - External library wrappers

Dependency Injection¶

from typing import Optional


class Loups:
    def __init__(
        self,
        video_path: str,
        template_path: str,
        ocr_engine: Optional[OCREngine] = None,  # Injectable
        video_reader: Optional[VideoReader] = None  # Injectable
    ):
        self.ocr_engine = ocr_engine or DefaultOCREngine()
        self.video_reader = video_reader or OpenCVReader(video_path)
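
The main payoff of injectable collaborators is that tests can substitute a stub for the real OCR engine. The sketch below shows the idea with self-contained stand-ins: `OCREngine` is modeled as a `Protocol`, and `ChapterScanner`, `FakeOCREngine`, `NullOCREngine`, and the `read_text` method are hypothetical names, not the actual Loups classes.

```python
from typing import List, Optional, Protocol, Sequence


class OCREngine(Protocol):
    """Anything with a read_text(frame) method can act as the OCR engine."""
    def read_text(self, frame: object) -> str: ...


class NullOCREngine:
    """Stand-in default that recognizes nothing."""
    def read_text(self, frame: object) -> str:
        return ""


class FakeOCREngine:
    """Test double that replays canned titles and counts calls."""
    def __init__(self, titles: Sequence[str]) -> None:
        self._titles = list(titles)
        self.calls = 0

    def read_text(self, frame: object) -> str:
        title = self._titles[self.calls % len(self._titles)]
        self.calls += 1
        return title


class ChapterScanner:
    """Simplified stand-in for a class that accepts an injected engine."""
    def __init__(self, frames: Sequence[object],
                 ocr_engine: Optional[OCREngine] = None) -> None:
        self.frames = frames
        self.ocr_engine = ocr_engine or NullOCREngine()

    def titles(self) -> List[str]:
        return [self.ocr_engine.read_text(frame) for frame in self.frames]


fake = FakeOCREngine(["Intro", "Inning 1"])
scanner = ChapterScanner(["frame-1", "frame-2"], ocr_engine=fake)
demo_titles = scanner.titles()
```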

Error Handling¶

Consistent error handling throughout:

try:
    loups = Loups(video_path, template_path)
    chapters = loups.scan()
except FileNotFoundError as e:
    logger.error(f"File not found: {e}")
    raise
except OCRError as e:
    logger.error(f"OCR failed: {e}")
    raise
except Exception as e:
    logger.error(f"Unexpected error: {e}")
    raise

Testing Architecture¶

Test Structure¶

tests/
├── test_loups.py              # Core Loups class
├── test_cli.py                # CLI commands
├── test_thumbnail.py          # Thumbnail extraction
├── test_match_template.py     # Template matching
├── test_frame_utils.py        # Frame utilities
├── fixtures/                  # Test data
│   ├── test_video.mp4
│   ├── test_template.png
│   └── expected_output.txt
└── conftest.py                # Pytest configuration

Test Categories¶

Type	Coverage	Tools
Unit Tests	Individual functions	pytest
Integration Tests	Module interactions	pytest
E2E Tests	Full workflow	pytest + fixtures
CLI Tests	Command execution	Typer CliRunner

How It Works - Detailed implementation
Contributing - Contribution guide
API Reference - API documentation