Mason K · Posted on May 31

# Pick a better video thumbnail automatically with FFmpeg, PySceneDetect, and CLIP

#ai #tutorial #video #python

**TL;DR** We'll build a pipeline that takes any video file, extracts candidate frames with FFmpeg and PySceneDetect, filters out blurry ones with OpenCV, scores each candidate with OpenCLIP against a small prompt set, and picks the top-K thumbnails with a diversity constraint. About 200 lines of Python, GPU-accelerated, fully local.

📦 Code: github.com/USER/auto-thumbnail-picker (replace before publishing)

The default thumbnail many encoders and upload tools produce is "the middle frame." For a lot of videos, that frame lands on motion blur, a transition, or someone mid-blink. We can do much better with about an hour of effort. Here's the pipeline.

**Versions**

```text
python                  3.12
ffmpeg                  7.1
pyscenedetect           0.6.x
open-clip-torch         2.x
opencv-python-headless  4.x
torch                   2.x (with CUDA or MPS)
```

The pipeline runs on CPU, but the CLIP step is noticeably slower there. A consumer GPU (or Apple Silicon MPS) cuts the per-frame image encode to a few milliseconds.

## What we're building

1. Extract candidate frames (shot boundaries + uniform sampling).
2. Filter out blurry frames with Laplacian variance.
3. Score each remaining frame with OpenCLIP using a positive / negative prompt set.
4. Apply structural rules (prefer frames with faces).
5. Pick the top-K with shot-level diversity.

## 1. Setup

```bash
python -m venv .venv && source .venv/bin/activate
pip install \
  open-clip-torch \
  "scenedetect[opencv-headless]" \
  pillow \
  torch torchvision
```

FFmpeg itself is a system dependency, so install it with your OS package manager; pip does not provide it.

For face detection we'll keep it simple and use OpenCV's built-in Haar cascades. If you need higher accuracy on small faces, swap in InsightFace later.

## 2. Extract candidate frames

There are two sources of candidates: shot boundaries (one per shot) and uniform sampling (one every N seconds). Combine both, then dedupe by timestamp.

```python
# extract.py
import subprocess, os, pathlib
```
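The candidate step described above (shot-based picks plus uniform sampling, merged and deduped by timestamp) can be sketched like this. It is a minimal sketch, not the author's code: the helper names (`shot_midpoints`, `video_duration`, `merge_candidates`, `extract_frame`) are hypothetical, taking each shot's *midpoint* is an assumption (it avoids landing on the cut itself), and "dedupe" is interpreted as dropping any timestamp within `min_gap` seconds of the previous kept one.

```python
# extract.py (sketch): candidate timestamps from shots + uniform sampling.
# Helper names and the midpoint/min_gap choices are assumptions, not the article's code.
import pathlib
import subprocess


def shot_midpoints(video_path):
    """One candidate per detected shot: its midpoint in seconds (assumption)."""
    from scenedetect import ContentDetector, detect  # PySceneDetect 0.6 API

    # detect() returns (start, end) FrameTimecode pairs; empty if no cuts are found.
    scenes = detect(str(video_path), ContentDetector())
    return [(start.get_seconds() + end.get_seconds()) / 2 for start, end in scenes]


def video_duration(video_path):
    """Container duration in seconds, read with ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(video_path)],
        capture_output=True, text=True, check=True,
    )
    return float(out.stdout.strip())


def merge_candidates(shot_times, duration, every_s=5.0, min_gap=1.0):
    """Merge shot times with one sample every `every_s` seconds, dropping near-duplicates."""
    uniform = [i * every_s for i in range(int(duration // every_s) + 1)]
    merged = []
    for t in sorted(list(shot_times) + uniform):
        # Keep in-range times that are at least min_gap after the last kept one.
        if 0 <= t < duration and (not merged or t - merged[-1] >= min_gap):
            merged.append(round(t, 3))
    return merged


def extract_frame(video_path, t, out_dir):
    """Write the frame at time t (seconds) to out_dir as a JPEG and return its path."""
    out = pathlib.Path(out_dir) / f"cand_{t:09.3f}.jpg"
    subprocess.run(
        ["ffmpeg", "-y", "-loglevel", "error", "-ss", f"{t:.3f}",
         "-i", str(video_path), "-frames:v", "1", "-q:v", "2", str(out)],
        check=True,
    )
    return out
```

For example, `merge_candidates([2.5, 7.2], 12.0)` yields `[0.0, 2.5, 5.0, 7.2, 10.0]`, while `merge_candidates([4.6, 9.0], 10.0)` drops the uniform sample at 5.0 s because it sits within 1 s of the shot pick at 4.6 s.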
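Step 2 of the plan (blur filtering with Laplacian variance) can be sketched as below. The score is the variance of `cv2.Laplacian` on the grayscale frame: sharp frames have many strong edges and therefore a high variance. Because absolute values depend on resolution and content, this sketch assumes a combined rule, an absolute floor (`min_abs=50.0`, an arbitrary illustrative value) plus keeping only the sharpest `keep_fraction` of the batch. The function names are hypothetical.

```python
# blur.py (sketch): drop blurry candidates using Laplacian variance.
# Thresholds are illustrative assumptions; tune them on your own footage.


def laplacian_variance(image_path):
    """Variance of the Laplacian of the grayscale frame; low values mean blur."""
    import cv2  # opencv-python-headless

    gray = cv2.imread(str(image_path), cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())


def keep_sharp(scores, min_abs=50.0, keep_fraction=0.6):
    """Keep frames in the sharpest `keep_fraction` that also clear `min_abs`.

    scores: dict mapping frame path -> Laplacian variance.
    Falls back to the single sharpest frame so a dark or soft video
    never ends up with zero candidates.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction))
    kept = [p for p in ranked[:n_keep] if scores[p] >= min_abs]
    return kept or ranked[:1]
```

Usage: build `scores = {p: laplacian_variance(p) for p in frame_paths}` and pass only `keep_sharp(scores)` on to the CLIP step.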
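Step 3 of the plan (OpenCLIP scoring against positive and negative prompts) could look like the following sketch. Assumptions, all labeled: the prompt lists are illustrative placeholders rather than the article's prompt set, the `ViT-B-32` / `laion2b_s34b_b79k` model and pretrained tag are one common OpenCLIP choice, and the score is the mean cosine similarity to the positive prompts minus the mean similarity to the negative ones. `clip_scores` and `prompt_margin` are hypothetical names.

```python
# score.py (sketch): rank frames by CLIP similarity, positive minus negative prompts.
# Prompts, model name, and pretrained tag are illustrative assumptions.
POSITIVE = [
    "a sharp, well-lit video thumbnail",
    "a close-up of a person's face looking at the camera",
    "a visually striking, high-contrast photo",
]
NEGATIVE = [
    "a blurry, out-of-focus frame",
    "a black or nearly black screen",
    "a frame showing only text or a title card",
]


def prompt_margin(sims, n_pos):
    """Mean similarity to the first n_pos prompts minus mean similarity to the rest."""
    pos, neg = sims[:n_pos], sims[n_pos:]
    return sum(pos) / len(pos) - sum(neg) / len(neg)


def clip_scores(image_paths, model_name="ViT-B-32", pretrained="laion2b_s34b_b79k"):
    """Return {path: margin} for each image, computed with OpenCLIP."""
    import open_clip
    import torch
    from PIL import Image

    if torch.cuda.is_available():
        device = "cuda"
    elif torch.backends.mps.is_available():
        device = "mps"
    else:
        device = "cpu"

    model, _, preprocess = open_clip.create_model_and_transforms(
        model_name, pretrained=pretrained, device=device)
    tokenizer = open_clip.get_tokenizer(model_name)
    model.eval()

    with torch.no_grad():
        # Encode all prompts once and L2-normalise so dot products are cosines.
        text = model.encode_text(tokenizer(POSITIVE + NEGATIVE).to(device))
        text = text / text.norm(dim=-1, keepdim=True)
        batch = torch.stack(
            [preprocess(Image.open(p).convert("RGB")) for p in image_paths]).to(device)
        img = model.encode_image(batch)
        img = img / img.norm(dim=-1, keepdim=True)
        sims = (img @ text.T).cpu().tolist()  # rows: images, columns: prompts

    return {str(p): prompt_margin(row, len(POSITIVE)) for p, row in zip(image_paths, sims)}
```

The sketch encodes every candidate in one batch for brevity; for long videos, split `image_paths` into chunks to bound GPU memory. Using a positive-minus-negative margin rather than a single prompt means a frame must look like a good thumbnail *and* unlike the known failure modes.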
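Steps 4 and 5 of the plan (prefer frames with faces, then pick the top-K with shot-level diversity) can be combined in one greedy pass. In this sketch, which is not the author's implementation, faces come from OpenCV's bundled `haarcascade_frontalface_default.xml`, a face adds a small fixed bonus (`face_bonus=0.02`, an arbitrary assumption) to the CLIP score, and selection allows at most one pick per shot. The extra `min_gap_s` rule (no two picks within 10 s) is an added assumption to avoid near-identical thumbnails. Function names and the candidate dict layout are hypothetical.

```python
# pick.py (sketch): face bonus + greedy top-K with one pick per shot.
# face_bonus, min_gap_s, and the candidate dict keys are illustrative assumptions.
from functools import lru_cache


@lru_cache(maxsize=1)
def _face_cascade():
    """Load OpenCV's bundled frontal-face Haar cascade once."""
    import cv2

    return cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")


def count_faces(image_path):
    """Number of frontal faces the Haar cascade finds in the frame."""
    import cv2

    gray = cv2.imread(str(image_path), cv2.IMREAD_GRAYSCALE)
    faces = _face_cascade().detectMultiScale(
        gray, scaleFactor=1.1, minNeighbors=5, minSize=(40, 40))
    return len(faces)


def pick_top_k(candidates, k=3, face_bonus=0.02, min_gap_s=10.0):
    """candidates: dicts with keys "path", "t" (s), "shot" (id), "clip", "faces".

    Sort by CLIP score plus a bonus when a face is present, then take the best
    candidates, skipping any whose shot is already used or that lies within
    min_gap_s seconds of an earlier pick.
    """
    def total(c):
        return c["clip"] + (face_bonus if c["faces"] > 0 else 0.0)

    picks, used_shots = [], set()
    for c in sorted(candidates, key=total, reverse=True):
        if c["shot"] in used_shots:
            continue
        if any(abs(c["t"] - p["t"]) < min_gap_s for p in picks):
            continue
        picks.append(c)
        used_shots.add(c["shot"])
        if len(picks) == k:
            break
    return picks
```

A greedy pass is not guaranteed to find the best-scoring diverse set, but with a handful of picks it is simple, fast, and easy to reason about.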

Blizine Admin
📰Dev.to — dev.to