category: tutorial

Turn Any Video Into a Frame-by-Frame Breakdown

// a Claude Code skill that watches videos so you can dissect them

Jun 10, 2026 7 min read
🎥 🔍 🤖

> break down this video: https://...

What if you could drop a YouTube link or Instagram Reel into your terminal and get back two things: a frame-by-frame transcript of everything happening on screen, and an expert content-strategist analysis of why the video works (or doesn't)?

I built a Claude Code skill that does exactly that, and it's open source. This post covers what it does, how it works, and how to set it up for yourself in about two minutes.

Repo: github.com/MannJadwani/video-breakdown

What it does

You ask Claude Code something like:

break down this video: https://www.youtube.com/watch?v=...

And the skill:

  1. Downloads the video with yt-dlp (works with YouTube, Instagram Reels/posts, and anything else yt-dlp supports).
  2. Uploads it to the Gemini Files API and waits for processing.
  3. Streams a structured breakdown from gemini-2.5-flash, one block per 1–2 second interval:
       [Timestamp] MM:SS
       [Action Description] What is happening in this moment.
       [Scene Description] Setting, subjects, lighting, colors, composition, mood.
       [Image Generation Prompt] A prompt to recreate this exact frame.
  4. Then (and this is the part I like most) Claude reads that transcript and analyzes it like a content strategist: hook analysis, structure and pacing with timestamps, audience and intent, strengths, weaknesses, and a prioritized list of improvements.
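You may want to post-process the transcript yourself, for example to pull out every image prompt. The labelled format above is easy to parse. Below is a minimal sketch that is not part of the repo. It assumes each field sits on a single line and every block starts with a `[Timestamp] MM:SS` line. `parse_breakdown` and `FrameBlock` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Matches one labelled line such as "[Timestamp] 00:03". The label set is the
# four fields listed in the breakdown format above.
FIELD_RE = re.compile(
    r"^\[(Timestamp|Action Description|Scene Description|Image Generation Prompt)\]\s*(.*)$"
)

# Maps the non-timestamp labels onto FrameBlock attribute names.
KEYS = {
    "Action Description": "action",
    "Scene Description": "scene",
    "Image Generation Prompt": "image_prompt",
}


@dataclass
class FrameBlock:
    timestamp: str      # original "MM:SS" string
    seconds: int        # same moment as an offset in seconds
    action: str = ""
    scene: str = ""
    image_prompt: str = ""


def parse_breakdown(text: str) -> list[FrameBlock]:
    """Split a breakdown into one FrameBlock per [Timestamp] line."""
    blocks: list[FrameBlock] = []
    for raw in text.splitlines():
        match = FIELD_RE.match(raw.strip())
        if not match:
            continue  # skip blank lines or any extra prose from the model
        label, value = match.group(1), match.group(2).strip()
        if label == "Timestamp":
            minutes, secs = value.split(":")
            blocks.append(FrameBlock(value, int(minutes) * 60 + int(secs)))
        elif blocks:
            # Attach the field to the most recent timestamp block.
            setattr(blocks[-1], KEYS[label], value)
    return blocks
```

For example, `[b.image_prompt for b in parse_breakdown(transcript)]` gives you a storyboard's worth of prompts in timestamp order.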

The raw transcript is useful on its own (the image-generation prompts let you re-storyboard a video shot by shot), but the analysis layer is the real deliverable. It's the difference between "here's what's in the video" and "here's why your first three seconds are losing viewers."

Why a skill and not just a script?

The repo actually started as a plain Python script. But Claude Code skills turn a script into something better: a capability Claude knows it has.

A skill is just a folder with a SKILL.md file — frontmatter describing when to use it, plus instructions for how. Once installed, you don't run anything manually. You mention a video URL in conversation, Claude recognizes the task matches the skill, bootstraps the environment, runs the pipeline, and writes the analysis.
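Here is roughly what that frontmatter looks like and how it can be read. The SKILL.md content below is made up for illustration and is not copied from the repo. The parser handles only flat `key: value` lines, not full YAML. As far as I know, `name` and `description` are the fields Claude Code uses to decide when a skill applies. That is why the description spells out the trigger ("Use when...").

```python
from textwrap import dedent

# Hypothetical SKILL.md contents; the repo's real wording may differ.
EXAMPLE_SKILL_MD = dedent("""\
    ---
    name: video-breakdown
    description: Break down a video frame by frame and critique it like a content strategist. Use when the user shares a video URL and asks for a breakdown.
    ---
    # Video breakdown

    1. Run scripts/setup.sh and wait for READY.
    2. Run scripts/analyze.py with the video URL.
    3. Analyze the transcript like a content strategist.
    """)


def read_frontmatter(skill_md: str) -> tuple[dict[str, str], str]:
    """Return (frontmatter fields, body) for a SKILL.md-style document.

    Only flat `key: value` lines are handled. Real frontmatter is YAML, so a
    YAML parser is the robust choice for anything more complex.
    """
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, skill_md  # no frontmatter at all
    fields: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter: everything after it is the instruction body.
            return fields, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}, skill_md  # no closing delimiter: treat it all as body
```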

The clean split:

The script extracts; Claude analyzes. Claude never spends effort transcribing what's on screen (Gemini does that), and the script never pretends to have opinions (that's Claude's job).

Setup

1. Get the skill

git clone https://github.com/MannJadwani/video-breakdown.git

# Global — available in every project:
cp -r video-breakdown/.claude/skills/video-breakdown ~/.claude/skills/

# — or — project-local, committed alongside one project:
cp -r video-breakdown/.claude/skills/video-breakdown your-project/.claude/skills/

# Optional: the /analyze-video slash command too:
cp video-breakdown/.claude/commands/analyze-video.md ~/.claude/commands/

2. Get a Gemini API key

Grab a free key at aistudio.google.com/apikey and put it in a .env file in your project root:

GEMINI_API_KEY=your_key_here
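The repo lists python-dotenv in requirements.txt, presumably to read this file. The stdlib-only sketch below shows what that step amounts to. `load_env_file` is a hypothetical helper, and it understands only simple `KEY=value` lines, not python-dotenv's full syntax.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Read KEY=value lines from a .env file into os.environ.

    Blank lines and # comments are skipped, and surrounding quotes are
    stripped. Variables already set in the environment win unless
    override=True, matching python-dotenv's default behaviour.
    Returns every key/value pair parsed from the file.
    """
    parsed: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return parsed
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip('"').strip("'")
        if override or key not in os.environ:
            os.environ[key] = value
        parsed[key] = value
    return parsed
```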

(The setup script scaffolds this file for you and gitignores it — never commit API keys.)

3. Install ffmpeg (if you don't have it)

yt-dlp needs ffmpeg to merge video and audio streams:

sudo apt install -y ffmpeg   # Debian/Ubuntu
brew install ffmpeg          # macOS

That's the only step that needs sudo. Everything else (the Python venv, dependencies, .env scaffolding, .gitignore entries) is handled by an idempotent setup script that the skill runs automatically on first use:

bash .claude/skills/video-breakdown/scripts/setup.sh

It prints READY when everything is in place, or ACTION NEEDED with exactly what's missing. It's safe to run any number of times; it only fixes what's broken.
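The READY / ACTION NEEDED contract is simple enough to sketch. The real setup.sh is a bash script that also repairs what it finds. This hypothetical Python version only checks. It assumes a POSIX `.venv/bin/python` layout and accepts the key from either `.env` or the environment. `check_environment` and `report` are illustrative names.

```python
import os
import shutil
from pathlib import Path


def check_environment(project_root: str = ".") -> list[str]:
    """Return a list of missing prerequisites; an empty list means ready."""
    root = Path(project_root)
    missing: list[str] = []
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg not on PATH: install it with apt or brew")
    if not (root / ".venv" / "bin" / "python").exists():
        missing.append("no .venv: create one and install requirements.txt")
    env_file = root / ".env"
    # A key counts as present only if GEMINI_API_KEY= has a non-empty value.
    has_key = env_file.is_file() and any(
        line.strip().startswith("GEMINI_API_KEY=") and line.split("=", 1)[1].strip()
        for line in env_file.read_text(encoding="utf-8").splitlines()
    )
    if not has_key and not os.environ.get("GEMINI_API_KEY"):
        missing.append("GEMINI_API_KEY not set in .env or the environment")
    return missing


def report(missing: list[str]) -> str:
    """Format the result the way the setup script's output is described."""
    if not missing:
        return "READY"
    return "ACTION NEEDED\n" + "\n".join(f"- {item}" for item in missing)
```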

Using it

Inside Claude Code (the intended way)

Just ask:

break down this video: https://www.instagram.com/reel/...

Or, with the slash command installed:

/analyze-video https://www.youtube.com/watch?v=...

Claude runs setup, extracts the transcript, and hands you the full strategist analysis — summary, beats with timestamps, hook critique, audience read, and concrete suggested improvements. Ask it to save everything to breakdown.md if you want an artifact.

Standalone (no Claude required)

The extraction script works fine on its own:

python3 -m venv .venv
.venv/bin/pip install -r .claude/skills/video-breakdown/requirements.txt
.venv/bin/python .claude/skills/video-breakdown/scripts/analyze.py "<VIDEO_URL>"

Flags:

Flag                              Purpose
--model gemini-2.5-flash          Choose the Gemini model (default shown).
--output clip.mp4                 Set the downloaded filename.
--cookies-from-browser firefox    Read cookies from your browser for private or login-walled content.
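Here's one way those flags could be wired up with argparse and mapped onto yt-dlp's Python options. This is a sketch, not analyze.py's actual code. The `video.mp4` default and the format string are assumptions. `outtmpl`, `format`, `merge_output_format`, and `cookiesfrombrowser` are real `YoutubeDL` option keys.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Frame-by-frame video breakdown")
    parser.add_argument("url", help="YouTube, Instagram, or other yt-dlp URL")
    parser.add_argument("--model", default="gemini-2.5-flash",
                        help="Gemini model to use")
    parser.add_argument("--output", default="video.mp4",  # hypothetical default
                        help="filename for the downloaded video")
    parser.add_argument("--cookies-from-browser", metavar="BROWSER",
                        help="read cookies from this browser, e.g. firefox or chrome")
    return parser


def ydl_options(args: argparse.Namespace) -> dict:
    """Translate parsed CLI args into a yt-dlp YoutubeDL options dict."""
    opts = {
        "outtmpl": args.output,                # exact output path for the download
        "format": "bestvideo+bestaudio/best",  # separate streams are merged by ffmpeg
        "merge_output_format": "mp4",
    }
    if args.cookies_from_browser:
        # yt-dlp takes a (browser, profile, keyring, container) tuple here;
        # the trailing fields have defaults, so the browser name alone suffices.
        opts["cookiesfrombrowser"] = (args.cookies_from_browser,)
    return opts
```

To actually download, hand the dict to yt-dlp: `with yt_dlp.YoutubeDL(ydl_options(args)) as ydl: ydl.download([args.url])`.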

Tips from real use

  • Start short. Frame-by-frame analysis of a 20-minute video is slow and token-heavy. A 15–60 second clip is the sweet spot for a first run — which conveniently is exactly the length of the short-form content this is most useful for.
  • Instagram rate-limits hard. Public Reels work directly, but space out repeated downloads. For private posts, --cookies-from-browser firefox (or chrome) authenticates the download using your existing browser session.
  • "still processing..." is normal. Gemini's video processing time scales with clip length. The script polls until it's done — let it.
  • Recreate, don't just analyze. Each frame block includes a self-contained image-generation prompt. Feed those to an image model and you can rebuild a video's storyboard from scratch — useful for pitching a remake or studying a competitor's visual style.
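The polling behind "still processing..." can be sketched as a loop over a state getter. `wait_until_active` is hypothetical. It takes the file state as a plain string so it can be tested without network access. The commented lines show how it might be hooked up to the google-genai SDK, which may differ from what analyze.py actually does. The 5-second interval and 10-minute timeout are assumptions.

```python
import time
from typing import Callable


def wait_until_active(
    get_state: Callable[[], str],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until the file is ACTIVE, then return "ACTIVE".

    Raises RuntimeError if the state becomes FAILED, and TimeoutError once
    the accumulated wait reaches `timeout` seconds.
    """
    waited = 0.0
    while True:
        state = get_state()
        if state == "ACTIVE":
            return state
        if state == "FAILED":
            raise RuntimeError("Gemini failed to process the file")
        if waited >= timeout:
            raise TimeoutError(f"file still {state} after {waited:.0f}s")
        print("still processing...")
        sleep(interval)
        waited += interval


# Possible wiring with the google-genai SDK (pip install google-genai):
# from google import genai
# client = genai.Client()  # picks up GEMINI_API_KEY from the environment
# video = client.files.upload(file="clip.mp4")
# wait_until_active(lambda: client.files.get(name=video.name).state.name)
# for chunk in client.models.generate_content_stream(
#         model="gemini-2.5-flash", contents=[video, "<your breakdown prompt>"]):
#     print(chunk.text or "", end="")
```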

How it's put together

The whole repo is small enough to read in one sitting:

.claude/
  skills/video-breakdown/
    SKILL.md              # what the skill is + how Claude should run it
    requirements.txt      # yt-dlp, google-genai, python-dotenv
    scripts/
      analyze.py          # download → upload → poll → stream breakdown
      setup.sh            # idempotent environment bootstrap
  commands/
    analyze-video.md      # the /analyze-video slash command
analyze.py                # the original standalone script
README.md

Want to build your own skill?

If you've been thinking about packaging your own tooling as a Claude Code skill, this repo is a decent minimal template: one SKILL.md, one script, one setup script, and a clear contract about what the code does versus what the model does.

Clone it, break down a Reel, and tell me what your hook analysis says.

Mann Jadwani

GenAI Gremlin. I build things that shouldn't work, but somehow do. Currently breaking prod at 3am.