---
name: summarize
description: Content extraction skill for AI coding agents. YouTube transcripts, podcasts, PDFs, images (OCR), audio/video via the summarize CLI.
---
# Summarize
Extract clean text and media transcripts from URLs, files, and streams so your AI workflow can reason over reliable source content without hand-coding brittle scraper logic.
Use this skill when you need deterministic extraction for YouTube, podcast feeds, PDFs, scanned images, or local media files.
Terminology used in this file:
- DOM: Document Object Model, the page element structure used by browser-based extractors.
- OCR: Optical character recognition (extracting text from images/scans).
- ANSI codes: Terminal color/control sequences; `--plain` removes them for machine parsing.
## Setup
```bash
brew tap steipete/tap
brew install summarize
```
- Claude Code: copy this skill folder into `.claude/skills/summarize/`
- Codex CLI: append this SKILL.md content to your project's root `AGENTS.md`
For the full installation walkthrough (prerequisites, optional dependencies, verification, troubleshooting), see `references/installation-guide.md`.
## Staying Updated
This skill ships with an UPDATES.md changelog and UPDATE-GUIDE.md for your AI agent.
After installing, tell your agent: "Check UPDATES.md in the summarize skill for any new features or changes."
When updating, tell your agent: "Read UPDATE-GUIDE.md and apply the latest changes from UPDATES.md."
Follow UPDATE-GUIDE.md so customized local files are diffed before any overwrite.
## Quick Start
Run one extraction flow end-to-end:
```bash
summarize --version
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --plain
summarize --extract "/path/to/document.pdf" --plain
```
Use `--extract --plain` as the default pattern for deterministic, non-ANSI output.
## Decision Tree: summarize vs Other Tools
```
Need content from the web?
|
+-- Static web page (article, docs, blog)?
|     --> WebFetch (built-in, zero deps, faster)
|     --> Jina r.jina.ai (zero-install alternative)
|     --> summarize ONLY if the above tools fail or return garbage
|
+-- JS-heavy SPA / dynamic content?
|     --> Crawl4AI crwl (full browser rendering)
|     --> summarize will NOT help here (no JS rendering)
|
+-- Anti-bot / paywalled / Cloudflare-protected?
|     --> summarize --firecrawl always (requires FIRECRAWL_API_KEY)
|     --> browser-based workflow as fallback
|
+-- YouTube video?
|     --> summarize --extract (ONLY option for transcript)
|     --> Add --youtube web for captions-only (faster)
|     --> Add --slides for visual slide extraction
|
+-- Podcast / RSS feed?
|     --> summarize --extract (ONLY option)
|     --> Supports Apple Podcasts, Spotify, RSS feeds, Podbean, etc.
|
+-- PDF (URL or local file)?
|     --> summarize --extract (ONLY CLI option)
|     --> Requires: uvx/markitdown (brew install uv)
|
+-- Image (OCR)?
|     --> summarize --extract (ONLY CLI option)
|     --> Requires: tesseract
|
+-- Audio / video file?
      --> summarize --extract (ONLY CLI option)
      --> Requires: whisper-cli (local) or OPENAI_API_KEY (cloud)
```
Rule of thumb: summarize is the default for media extraction (YouTube, podcasts, audio, video, images). For web pages, prefer WebFetch/Jina/Crawl4AI depending on DOM complexity (how hard the page structure is to parse). Use summarize for web only when other tools fail.
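As a hedged illustration of that fallback order, here is a shell sketch. The Jina reader URL pattern is public; the exact failure handling is an assumption, not a prescribed flow:

```bash
# Sketch: try the zero-install Jina reader first, fall back to summarize
# only when the fetch fails. Error handling here is illustrative.
url="https://example.com/article"
curl -fsS "https://r.jina.ai/$url" -o article.md \
  || summarize --extract "$url" --plain > article.md
```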
## Extraction Mode (Primary)
`--extract` prints raw extracted content and exits. No LLM is involved.
Use this first. You can handle any downstream synthesis in your own workflow.
```bash
# Web page extraction (plain text, default)
summarize --extract "https://example.com" --plain

# Web page extraction (markdown format)
summarize --extract "https://example.com" --format md --plain

# YouTube transcript
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --plain

# YouTube transcript with timestamps
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --timestamps --plain

# YouTube transcript formatted as markdown (requires LLM -- uses API key)
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --format md --markdown-mode llm --plain

# YouTube slides + transcript
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --slides --plain

# Podcast (RSS feed)
summarize --extract "https://feeds.example.com/podcast.xml" --plain

# Apple Podcasts episode
summarize --extract "https://podcasts.apple.com/us/podcast/EPISODE_ID" --plain

# PDF from URL
summarize --extract "https://example.com/document.pdf" --plain

# PDF from local file
summarize --extract "/path/to/document.pdf" --plain

# Image OCR
summarize --extract "/path/to/image.png" --plain

# Audio transcription
summarize --extract "/path/to/audio.mp3" --plain

# Video transcription
summarize --extract "/path/to/video.mp4" --plain

# Stdin (pipe content)
pbpaste | summarize --extract - --plain
cat document.pdf | summarize --extract - --plain
```
Always use `--plain` when extracting for agent consumption. It suppresses ANSI/OSC rendering.
Extraction defaults:

- URLs default to `--format md` in extract mode
- Files default to `--format text`
- PDF requires uvx/markitdown (`--preprocess auto`, which is the default)
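A quick sketch of overriding those defaults; the inputs are placeholders:

```bash
# Force plain text for a URL (markdown is the URL default in extract mode)
summarize --extract "https://example.com" --format text --plain

# Force markdown for a local file (text is the file default)
summarize --extract "/path/to/document.pdf" --format md --plain
```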
## LLM Summarization Mode (Secondary)
Use this mode only when you explicitly want summarize to perform synthesis itself.
```bash
# Summarize a URL (requires API key for the chosen model)
summarize "https://example.com" --model anthropic/claude-sonnet-4-5 --length long

# Summarize with a custom prompt
summarize "https://example.com" --prompt "Extract key technical decisions and their rationale"

# Summarize YouTube video
summarize "https://www.youtube.com/watch?v=VIDEO_ID" --length xl

# JSON output with metrics
summarize "https://example.com" --json --model openai/gpt-5-mini
```
API keys for LLM mode (set in `~/.summarize/config.json` or env vars):

- `ANTHROPIC_API_KEY` -- for anthropic/ models
- `OPENAI_API_KEY` -- for openai/ models
- `GEMINI_API_KEY` -- for google/ models
- `XAI_API_KEY` -- for xai/ models
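For a one-off run, the key can also be passed inline rather than stored; a sketch (the key value is a placeholder):

```bash
# Inline env var for a single invocation; nothing is written to config
OPENAI_API_KEY="sk-..." summarize "https://example.com" --model openai/gpt-5-mini
```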
## Dependency Matrix
| Feature | Required Deps |
|---|---|
| Web page extraction | None |
| YouTube transcript (captions) | None (web mode) |
| YouTube transcript (no captions) | yt-dlp + whisper or API key |
| YouTube slides | yt-dlp + ffmpeg |
| Podcast transcription | yt-dlp + whisper or API key |
| PDF extraction | uvx/markitdown |
| Image OCR | tesseract |
| Audio/video transcription | whisper-cli (local) or OPENAI_API_KEY |
| Anti-bot sites (Firecrawl) | FIRECRAWL_API_KEY |
| Slide OCR | tesseract |
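A minimal preflight sketch to see which optional dependencies from the matrix are present. The binary names are assumed from the table above; `whisper-cli` naming can vary by install method:

```bash
# Report which optional extraction dependencies are missing
for dep in yt-dlp ffmpeg tesseract uvx whisper-cli; do
  command -v "$dep" >/dev/null 2>&1 || echo "missing: $dep"
done
```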
What is not installed (by design):

- whisper-cli / whisper.cpp -- heavy binary; install when audio transcription is needed
- Firecrawl API key -- paid service; configure when anti-bot extraction is needed
- LLM API keys in summarize config -- only add if you use LLM Summarization Mode
## Key Flags Quick Reference
| Flag | Purpose | Example |
|---|---|---|
| `--extract` | Raw content extraction, no LLM | `summarize --extract URL` |
| `--plain` | No ANSI rendering (agent-safe output) | Always use for agents |
| `--format md\|text` | Output format (md default for URLs in extract) | `--format md` |
| `--youtube auto\|web\|yt-dlp` | YouTube transcript source | `--youtube web` (captions only) |
| `--slides` | Extract video slides with ffmpeg | `--slides --slides-ocr` |
| `--timestamps` | Include timestamps in transcripts | `--timestamps` |
| `--firecrawl off\|auto\|always` | Firecrawl for anti-bot sites | `--firecrawl always` |
| `--preprocess off\|auto\|always` | Preprocessing (markitdown for PDFs) | Default `auto` |
| `--markdown-mode` | HTML-to-MD conversion mode | `--markdown-mode readability` |
| `--timeout` | Fetch/LLM timeout | `--timeout 2m` |
| `--verbose` | Debug output to stderr | Troubleshooting |
| `--json` | Structured JSON output with metrics | `--json` |
| `--length` | Summary length (LLM mode only) | `--length xl` |
| `--model` | LLM model (LLM mode only) | `--model anthropic/claude-sonnet-4-5` |
| `--max-extract-characters` | Limit extract output length | `--max-extract-characters 50000` |
| `--language\|--lang` | Output language | `--lang en` |
| `--video-mode` | Video handling mode | `--video-mode transcript` |
| `--transcriber` | Audio backend | `--transcriber whisper` |
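Flags compose; a hedged example combining several from the table (the URL and values are illustrative):

```bash
# Long video: markdown output, timestamps, generous timeout, capped length
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" \
  --format md --timestamps --timeout 5m \
  --max-extract-characters 50000 --plain
```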
## Verified Services (YouTube/Podcasts)
YouTube: All public videos with captions. Falls back to yt-dlp audio download + transcription for videos without captions.
Podcasts (verified):
- Apple Podcasts
- Spotify (best-effort; may fail for exclusives)
- Amazon Music / Audible podcast pages
- Podbean
- Podchaser
- RSS feeds (Podcasting 2.0 transcripts when available)
- Embedded YouTube podcast pages
## Common Patterns
### 1. YouTube Transcript for Analysis
```bash
# Quick: captions only (fastest, no deps beyond summarize)
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --youtube web --plain

# Full: with timestamps
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --timestamps --plain

# Formatted as clean markdown (requires LLM API key)
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --format md --markdown-mode llm --plain
```
### 2. Podcast Episode Transcript
```bash
# From RSS feed (transcribes latest episode)
summarize --extract "https://feeds.example.com/podcast.xml" --plain

# From Apple Podcasts link
summarize --extract "https://podcasts.apple.com/us/podcast/SHOW/EPISODE" --plain
```
### 3. PDF Content Extraction
```bash
# From URL
summarize --extract "https://example.com/report.pdf" --plain

# From local file
summarize --extract "/path/to/file.pdf" --plain

# Limit output length
summarize --extract "/path/to/huge.pdf" --max-extract-characters 50000 --plain
```
### 4. Image OCR
summarize --extract "/path/to/screenshot.png" --plain
summarize --extract "/path/to/scanned-doc.jpg" --plain
### 5. Anti-Bot Website (Firecrawl Fallback)
```bash
# Requires FIRECRAWL_API_KEY in env or config
summarize --extract "https://paywalled-site.com/article" --firecrawl always --plain
```
### 6. Batch Extraction (Shell Loop)
```bash
# Extract multiple YouTube videos
for url in "URL1" "URL2" "URL3"; do
  echo "=== $url ==="
  summarize --extract "$url" --plain
done
```
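A variant sketch that writes each transcript to its own file and logs failures instead of aborting (the numeric file naming is an assumption):

```bash
# Save each transcript separately; continue past failures
mkdir -p transcripts
i=0
for url in "URL1" "URL2" "URL3"; do
  i=$((i + 1))
  summarize --extract "$url" --plain > "transcripts/$i.txt" \
    || echo "failed: $url" >&2
done
```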
## Error Handling
| Symptom | Cause | Fix |
|---|---|---|
| `Missing uvx/markitdown` | PDF preprocessing not available | `brew install uv` |
| `does not support extracting binary files` | Preprocessing disabled for PDF | Use `--preprocess auto` (default) with uvx installed |
| YouTube returns empty transcript | No captions available, no yt-dlp/whisper | Install yt-dlp; for whisper fallback, install whisper-cli or set `OPENAI_API_KEY` |
| `FIRECRAWL_API_KEY` not set | Anti-bot mode requires Firecrawl | Set key in env or `~/.summarize/config.json` |
| Timeout on large content | Default 2m timeout too short | Use `--timeout 5m` |
| Audio transcription fails | No whisper backend available | Install whisper-cli locally or set `OPENAI_API_KEY`/`FAL_KEY` |
| Podcast extraction fails | Audio download failed | Check yt-dlp is installed and updated: `brew upgrade yt-dlp` |
| Garbled web extraction | JS-rendered content | summarize has no JS engine; use Crawl4AI instead |
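When a failure isn't obvious from the table, `--verbose` diagnostics go to stderr and can be captured separately from the extracted text; a sketch:

```bash
# Keep extracted text on stdout, diagnostics in a log; dump the log on failure
summarize --extract "$url" --verbose --plain > out.txt 2> debug.log \
  || cat debug.log >&2
```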
## Configuration
Config file: `~/.summarize/config.json`
```json
{
  "model": "auto",
  "env": {
    "FIRECRAWL_API_KEY": "fc-..."
  },
  "ui": {
    "theme": "mono"
  }
}
```
Configure only what your workflow needs. If you use LLM Summarization Mode, add the required API keys.
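A sketch that seeds a fresh config with one LLM key under `env` (note this overwrites any existing file; the key value is a placeholder):

```bash
# Create a minimal config with an Anthropic key (overwrites existing config)
mkdir -p ~/.summarize
cat > ~/.summarize/config.json <<'EOF'
{
  "model": "auto",
  "env": { "ANTHROPIC_API_KEY": "sk-ant-..." }
}
EOF
```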
## Anti-Patterns
| Do NOT | Do Instead |
|---|---|
| Use summarize for static web pages | WebFetch or Jina (faster, zero deps) |
| Use summarize for JS-heavy SPAs | Crawl4AI `crwl` (has browser rendering) |
| Use summarize's LLM mode as default | Use `--extract` and run synthesis in your own workflow unless explicitly required |
| Skip `--plain` for any non-interactive run | Always use `--plain` to avoid ANSI escape codes |
| Install whisper.cpp preemptively | Install only when the audio transcription use case arises |
| Forget `--timeout` for large media | Podcasts/videos can take minutes; set `--timeout 5m` |
| Use summarize when WebFetch works | summarize is heavier; reserve it for media and fallback |
| Use summarize for local repo/codebase search | Use your local knowledge search tools |
## Bundled Resources Index
| Path | What | When to Load |
|---|---|---|
| `./UPDATES.md` | Structured changelog for AI agents | When checking for new features or updates |
| `./UPDATE-GUIDE.md` | Instructions for AI agents performing updates | When updating this skill |
| `./references/installation-guide.md` | Detailed install walkthrough for Claude Code and Codex CLI | First-time setup or environment repair |
| `./references/commands.md` | Full CLI flag reference with all options | When you need exact flag syntax or env var names |