---
name: firecrawl-mcp
description: >
  Using the Firecrawl MCP server to scrape, search, crawl, extract, and browse the web.
  Use this skill whenever the Firecrawl MCP tools are available and you need to retrieve
  web content, discover URLs on a site, search the web with full-page content retrieval,
  extract structured data from pages, perform autonomous multi-source web research, or
  interact with web pages through a remote browser sandbox. Trigger this skill for any
  task involving firecrawl_scrape, firecrawl_search, firecrawl_map, firecrawl_crawl,
  firecrawl_extract, firecrawl_agent, or firecrawl_browser_* tools. Also trigger when the
  user asks you to "scrape", "crawl", "map a site", "extract data from a page", "search
  with Firecrawl", or "use the browser sandbox", even if they don't mention Firecrawl by
  name — provided the MCP tools are connected.
---
# Firecrawl MCP — Agent Skill
This skill governs how to use the Firecrawl MCP server tools effectively. It assumes the MCP server is already connected and authenticated.
## Tool inventory
The Firecrawl MCP exposes 12 tools across seven capabilities:
| Capability | Tools | Async? |
|---|---|---|
| Scrape | firecrawl_scrape | No |
| Search | firecrawl_search | No |
| Map | firecrawl_map | No |
| Crawl | firecrawl_crawl, firecrawl_check_crawl_status | Yes |
| Extract | firecrawl_extract | No |
| Agent | firecrawl_agent, firecrawl_agent_status | Yes |
| Browser | firecrawl_browser_create, firecrawl_browser_execute, firecrawl_browser_delete, firecrawl_browser_list | Session |
## Choosing the right tool
Apply this decision tree top-to-bottom. Pick the first match.
- You have a single URL and need its content → firecrawl_scrape
- You need to find pages on the open web by query → firecrawl_search
- You need to discover URLs within a single domain → firecrawl_map
- You need content from many pages under one domain → firecrawl_crawl
- You need structured fields from one or more known URLs → firecrawl_extract
- You have a complex, open-ended research question spanning multiple unknown sources → firecrawl_agent
- You need to interact with a page (fill forms, click, authenticate) → firecrawl_browser_*
When in doubt between scrape and search: if you already have the URL, scrape. If you need to find the URL first, search.
When in doubt between extract and scrape-with-JSON-format: firecrawl_extract
operates on multiple URLs and uses Firecrawl's server-side LLM. The JSON
format on firecrawl_scrape works on a single page and also uses server-side
extraction. Prefer firecrawl_extract when pulling uniform structured data
from several pages. Prefer scrape with JSON format when you want markdown
and structured data from the same single page in one call.
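When both are needed from a single page, a minimal sketch of the combined call looks like this (json is listed among the available scrape formats; options for steering the extraction, such as a prompt or schema, are covered in references/scrape-options.md):

```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com/product/1",
    "formats": ["markdown", "json"],
    "onlyMainContent": true
  }
}
```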
When in doubt between crawl and map-then-scrape: crawl is a single async job that handles traversal and scraping together. Map-then-scrape gives you more control (you can filter the URL list before scraping selectively). Prefer map-then-scrape when you only need a subset of pages; prefer crawl when you want everything under a domain up to a depth/limit.
## Credit costs — be frugal
Every tool call consumes API credits. Minimise unnecessary calls.
| Tool | Base cost |
|---|---|
| firecrawl_scrape | 1 credit per page |
| firecrawl_search | 1 credit per result (+ scrape costs if scrapeOptions used) |
| firecrawl_map | 1 credit per call (regardless of URL count returned) |
| firecrawl_crawl | 1 credit per page crawled |
| firecrawl_extract | Varies; LLM extraction adds cost |
| firecrawl_agent | Varies by research scope |
| firecrawl_browser_* | Session-based billing |
Additional surcharges: JSON mode adds 4 credits/page. Enhanced proxy adds 4 credits/page. PDF parsing adds 1 credit per PDF page.
Always set limit on crawl and map calls. The default crawl limit is 10,000
pages — a runaway crawl will burn through credits fast. Start with a low limit
(10–50) and increase only if needed.
## Core patterns

### Pattern 1: Scrape a known URL

```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com/pricing",
    "formats": ["markdown"],
    "onlyMainContent": true
  }
}
```
Set onlyMainContent: true to strip nav, footer, and sidebar boilerplate.
This reduces token count and improves downstream processing.
Available formats: markdown, html, rawHtml, screenshot,
links, json, images, branding, summary.
Request only the formats you need. Multiple formats in one call are fine — the page is fetched once.
For pages that require JavaScript rendering or contain dynamic content,
Firecrawl handles this automatically. If a standard scrape fails or returns
incomplete content, consider using waitFor (milliseconds) to let JS finish,
or use actions for pages that need interaction before content appears.
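As a rough sketch, a scrape of a JavaScript-heavy page might add a wait and a click before capture. The waitFor value and the action shapes below are illustrative assumptions; the supported action types are documented in references/scrape-options.md.

```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com/dashboard",
    "formats": ["markdown"],
    "onlyMainContent": true,
    "waitFor": 3000,
    "actions": [
      { "type": "wait", "milliseconds": 2000 },
      { "type": "click", "selector": "#load-more" }
    ]
  }
}
```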
→ For full scrape options, read references/scrape-options.md.
### Pattern 2: Search the web

```json
{
  "name": "firecrawl_search",
  "arguments": {
    "query": "Rust async runtime benchmarks 2025",
    "limit": 5
  }
}
```
Without scrapeOptions, search returns metadata only (URL, title,
description, position). Add scrapeOptions to get full page content
from each result in one operation — but note this multiplies credit cost.
Time-based filtering with tbs: qdr:d (past day), qdr:w (past
week), qdr:m (past month). Essential for finding recent content.
Source types via sources: ["web"] (default), ["news"],
["images"], or combinations. The limit applies per source type.
Category filtering via categories: ["github"], ["research"],
["pdf"]. Narrows results to specific domains (GitHub repos, academic
sites, PDF documents respectively).
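Putting those options together, a sketch of a filtered search that also scrapes each result (parameter names follow the descriptions above; confirm the exact schema in references/search-options.md):

```json
{
  "name": "firecrawl_search",
  "arguments": {
    "query": "Rust async runtime benchmarks",
    "limit": 5,
    "tbs": "qdr:m",
    "sources": ["web"],
    "categories": ["github"],
    "scrapeOptions": {
      "formats": ["markdown"],
      "onlyMainContent": true
    }
  }
}
```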
→ For full search options, read references/search-options.md.
### Pattern 3: Map a site's URL structure

```json
{
  "name": "firecrawl_map",
  "arguments": {
    "url": "https://docs.example.com",
    "search": "authentication",
    "limit": 100
  }
}
```
Map returns an array of URLs (with optional title/description). It does not return page content. Use it as a reconnaissance step before selective scraping.
The search parameter filters returned URLs by relevance to a term —
useful when you only need the authentication docs from a large site,
for instance.
Set ignoreQueryParameters: true to deduplicate URLs that differ only
by query string.
### Pattern 4: Crawl an entire site (async)

```json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://docs.example.com",
    "maxDiscoveryDepth": 2,
    "limit": 50,
    "deduplicateSimilarURLs": true
  }
}
```
Crawl is asynchronous. It returns a job ID immediately. Poll with
firecrawl_check_crawl_status using that ID. Allow 15–30 seconds between
polls. The status will be scraping, completed, or failed.
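A status check is a single call carrying the job ID from the crawl response. The parameter is shown here as id, which is an assumption; follow the tool's input schema if it names it differently.

```json
{
  "name": "firecrawl_check_crawl_status",
  "arguments": {
    "id": "<job-id-from-crawl-response>"
  }
}
```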
By default, crawl stays within the URL's path hierarchy. Set
allowExternalLinks: true to follow links to other domains (use with
caution — credit implications). Set allowSubdomains: true to include
subdomains like blog.example.com when crawling example.com.
All scrape options (formats, onlyMainContent, actions, location, tags)
can be passed via scrapeOptions and apply to every page the crawler
visits.
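A sketch of a crawl with nested scrape options, following the description above (see references/crawl-options.md for the full option set):

```json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://docs.example.com",
    "limit": 50,
    "scrapeOptions": {
      "formats": ["markdown"],
      "onlyMainContent": true
    }
  }
}
```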
→ For full crawl options, read references/crawl-options.md.
### Pattern 5: Extract structured data

```json
{
  "name": "firecrawl_extract",
  "arguments": {
    "urls": ["https://example.com/product/1", "https://example.com/product/2"],
    "prompt": "Extract the product name, price, and availability status",
    "schema": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "price": { "type": "number" },
        "in_stock": { "type": "boolean" }
      },
      "required": ["name", "price"]
    }
  }
}
```
The schema follows JSON Schema format. If omitted, the LLM chooses
its own structure guided by prompt. Providing a schema is strongly
recommended for consistent, parseable output.
### Pattern 6: Autonomous research agent (async)

```json
{
  "name": "firecrawl_agent",
  "arguments": {
    "prompt": "Find the pricing tiers and feature limits for Vercel, Netlify, and Cloudflare Pages. Compare them.",
    "schema": {
      "type": "object",
      "properties": {
        "providers": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string" },
              "tiers": { "type": "array", "items": { "type": "object" } }
            }
          }
        }
      }
    }
  }
}
```
The agent is async — it returns a job ID. Poll firecrawl_agent_status
every 15–30 seconds. Allow at least 2–3 minutes before treating it as
failed. The agent autonomously searches, navigates, and extracts.
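Polling takes the same shape as crawl status checks. As before, the id parameter name is an assumption; defer to the tool schema if it differs.

```json
{
  "name": "firecrawl_agent_status",
  "arguments": {
    "id": "<job-id-from-agent-response>"
  }
}
```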
Provide urls to focus the agent on specific pages. Omit urls to let
it search freely. The prompt is limited to 10,000 characters.
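A focused run that restricts the agent to known pages might look like this sketch (urls is the parameter described above; the URLs themselves are placeholders):

```json
{
  "name": "firecrawl_agent",
  "arguments": {
    "prompt": "Summarise the pricing tiers and feature limits on these pages.",
    "urls": [
      "https://example.com/pricing",
      "https://example.org/pricing"
    ]
  }
}
```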
Best for: complex cross-site research where you don't know the exact URLs in advance, or where content is spread across many pages.
### Pattern 7: Browser sandbox sessions
For interactive web tasks (form filling, authentication, multi-step navigation), use the browser sandbox.
Lifecycle:
- `firecrawl_browser_create` — start a session (returns session ID)
- `firecrawl_browser_execute` — run code in the session (repeatable)
- `firecrawl_browser_delete` — destroy the session when finished
Always delete sessions when done. Sessions have TTLs but leaving them open wastes resources.
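A typical session, sketched with illustrative argument names: the session ID, code, and language parameter names below are assumptions, and the real schemas are defined by the tools themselves (see references/browser-options.md).

```json
[
  { "name": "firecrawl_browser_create", "arguments": {} },
  {
    "name": "firecrawl_browser_execute",
    "arguments": {
      "sessionId": "<id-from-create>",
      "code": "await page.goto('https://example.com/login');",
      "language": "javascript"
    }
  },
  { "name": "firecrawl_browser_delete", "arguments": { "sessionId": "<id-from-create>" } }
]
```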
→ For full browser options and commands, read references/browser-options.md.
## Handling asynchronous tools
Both firecrawl_crawl and firecrawl_agent are async. The workflow is:
- Call the tool → receive a job ID.
- Poll the status tool with that ID every 15–30 seconds.
- On `completed`, the response includes the results.
- On `failed`, report the error. Consider retrying with adjusted parameters.
Do not poll more frequently than every 15 seconds — it wastes rate-limit budget and the status endpoints have their own rate limits.
## Error handling
The MCP server handles retries internally with exponential backoff (default: 3 attempts, starting at 1s, doubling each time, capped at 10s). If a call still fails after retries, you will receive an error response.
Common errors:
- Rate limit exceeded: Back off and retry after the indicated delay. Check whether you're making unnecessary calls that can be consolidated.
- Credit limit warnings: The server emits warnings at configurable thresholds. If you see a credit warning, inform the user and stop non-essential operations.
- Timeout: Increase the `timeout` parameter or simplify the request (fewer actions, simpler schema, lower page count); see the sketch below.
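For instance, a retried scrape with a larger time budget might look like this sketch. It assumes timeout is accepted in milliseconds on scrape calls; confirm in references/scrape-options.md.

```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com/slow-page",
    "formats": ["markdown"],
    "timeout": 60000
  }
}
```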
## Anti-patterns
- Scraping then extracting the same page: Use `firecrawl_scrape` with `formats: ["markdown", "json"]` to get both in one call, or use `firecrawl_extract` if you only need structured data.
- Crawling an entire domain to find one page: Use `firecrawl_map` with the `search` parameter first, then scrape the specific URL.
- Polling status every 2 seconds: Wastes rate-limit budget. Use 15–30 second intervals.
- Omitting `limit` on crawl: The default is 10,000 pages. Always set an explicit limit.
- Using `firecrawl_agent` for single-page tasks: The agent is designed for multi-source research. For single pages, `firecrawl_scrape` or `firecrawl_extract` are faster, cheaper, and more predictable.
- Requesting `rawHtml` when `markdown` suffices: `rawHtml` is large and rarely needed for LLM consumption. Use `markdown` by default; `html` (cleaned) if you need structure; `rawHtml` only for debugging or when you need the exact original markup.
- Leaving browser sessions open: Always call `firecrawl_browser_delete` when your task is complete. Use `firecrawl_browser_list` to check for orphaned sessions.
## Caching
Firecrawl caches scraped pages with a default freshness window of 2 days
(maxAge: 172800000 ms). Cached responses are significantly faster (up to
5×). Set maxAge: 0 to force a fresh scrape — but only when you genuinely
need the absolute latest content. A non-zero maxAge is almost always the
right choice.
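A sketch of forcing a fresh fetch when staleness genuinely matters (maxAge is in milliseconds, per the description above):

```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com/status",
    "formats": ["markdown"],
    "maxAge": 0
  }
}
```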
## Reference files
For detailed parameter documentation on each tool, read the appropriate reference file:
| File | Contents |
|---|---|
| references/scrape-options.md | All firecrawl_scrape parameters, formats, actions, and location settings |
| references/search-options.md | All firecrawl_search parameters, source types, categories, and scrape integration |
| references/crawl-options.md | All firecrawl_crawl parameters, path filtering, scope, and status polling |
| references/browser-options.md | Browser session lifecycle, execute languages, agent-browser commands, TTL config |
| references/extract-agent-options.md | firecrawl_extract schema design and firecrawl_agent usage patterns |
Read these files when you need parameter-level detail beyond what this document covers. For most tasks, the patterns above are sufficient.