---
name: firecrawl-mcp
description: >
  Using the Firecrawl MCP server to scrape, search, crawl, extract, and browse the web.
  Use this skill whenever the Firecrawl MCP tools are available and you need to retrieve
  web content, discover URLs on a site, search the web with full-page content retrieval,
  extract structured data from pages, perform autonomous multi-source web research, or
  interact with web pages through a remote browser sandbox. Trigger this skill for any
  task involving firecrawl_scrape, firecrawl_search, firecrawl_map, firecrawl_crawl,
  firecrawl_extract, firecrawl_agent, or firecrawl_browser_* tools. Also trigger when the
  user asks you to "scrape", "crawl", "map a site", "extract data from a page", "search
  with Firecrawl", or "use the browser sandbox", even if they don't mention Firecrawl by
  name — provided the MCP tools are connected.
---
# Firecrawl MCP — Agent Skill
This skill governs how to use the Firecrawl MCP server tools effectively. It assumes the MCP server is already connected and authenticated.
## Tool inventory
The Firecrawl MCP exposes 12 tools across seven capabilities:
| Capability | Tools | Async? |
|---|---|---|
| Scrape | firecrawl_scrape | No |
| Search | firecrawl_search | No |
| Map | firecrawl_map | No |
| Crawl | firecrawl_crawl, firecrawl_check_crawl_status | Yes |
| Extract | firecrawl_extract | No |
| Agent | firecrawl_agent, firecrawl_agent_status | Yes |
| Browser | firecrawl_browser_create, firecrawl_browser_execute, firecrawl_browser_delete, firecrawl_browser_list | Session |
## Choosing the right tool
Apply this decision tree top-to-bottom. Pick the first match.
- You have a single URL and need its content → firecrawl_scrape
- You need to find pages on the open web by query → firecrawl_search
- You need to discover URLs within a single domain → firecrawl_map
- You need content from many pages under one domain → firecrawl_crawl
- You need structured fields from one or more known URLs → firecrawl_extract
- You have a complex, open-ended research question spanning multiple unknown sources → firecrawl_agent
- You need to interact with a page (fill forms, click, authenticate) → firecrawl_browser_*
When in doubt between scrape and search: if you already have the URL, scrape. If you need to find the URL first, search.
When in doubt between extract and scrape-with-JSON-format: firecrawl_extract
operates on multiple URLs and uses Firecrawl's server-side LLM. The JSON
format on firecrawl_scrape works on a single page and also uses server-side
extraction. Prefer firecrawl_extract when pulling uniform structured data
from several pages. Prefer scrape with JSON format when you want markdown
and structured data from the same single page in one call.
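When both are needed from a single page, a minimal sketch of the combined call looks like this (json is listed among the available scrape formats; options for steering the extraction, such as a prompt or schema, are covered in references/scrape-options.md):

```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com/product/1",
    "formats": ["markdown", "json"],
    "onlyMainContent": true
  }
}
```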
When in doubt between crawl and map-then-scrape: crawl is a single async job that handles traversal and scraping together. Map-then-scrape gives you more control (you can filter the URL list before scraping selectively). Prefer map-then-scrape when you only need a subset of pages; prefer crawl when you want everything under a domain up to a depth/limit.
## Credit costs — be frugal
Every tool call consumes API credits. Minimise unnecessary calls.
| Tool | Base cost |
|---|---|
| firecrawl_scrape | 1 credit per page |
| firecrawl_search | 1 credit per result (+ scrape costs if scrapeOptions used) |
| firecrawl_map | 1 credit per call (regardless of URL count returned) |
| firecrawl_crawl | 1 credit per page crawled |
| firecrawl_extract | Varies; LLM extraction adds cost |
| firecrawl_agent | Varies by research scope |
| firecrawl_browser_* | Session-based billing |
Additional surcharges: JSON mode adds 4 credits/page. Enhanced proxy adds 4 credits/page. PDF parsing adds 1 credit per PDF page.
Always set limit on crawl and map calls. The default crawl limit is 10,000
pages — a runaway crawl will burn through credits fast. Start with a low limit
(10–50) and increase only if needed.
## Core patterns

### Pattern 1: Scrape a known URL

```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com/pricing",
    "formats": ["markdown"],
    "onlyMainContent": true
  }
}
```
Set onlyMainContent: true to strip nav, footer, and sidebar boilerplate.
This reduces token count and improves downstream processing.
Available formats: markdown, html, rawHtml, screenshot,
links, json, images, branding, summary.
Request only the formats you need. Multiple formats in one call are fine — the page is fetched once.
For pages that require JavaScript rendering or contain dynamic content,
Firecrawl handles this automatically. If a standard scrape fails or returns
incomplete content, consider using waitFor (milliseconds) to let JS finish,
or use actions for pages that need interaction before content appears.
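As a rough sketch, a scrape of a JavaScript-heavy page might add a wait and a click before capture. The waitFor value and the action shapes below are illustrative assumptions; the supported action types are documented in references/scrape-options.md.

```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com/dashboard",
    "formats": ["markdown"],
    "onlyMainContent": true,
    "waitFor": 3000,
    "actions": [
      { "type": "wait", "milliseconds": 2000 },
      { "type": "click", "selector": "#load-more" }
    ]
  }
}
```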
→ For full scrape options, read references/scrape-options.md.
### Pattern 2: Search the web

```json
{
  "name": "firecrawl_search",
  "arguments": {
    "query": "Rust async runtime benchmarks 2025",
    "limit": 5
  }
}
```
Without scrapeOptions, search returns metadata only (URL, title,
description, position). Add scrapeOptions to get full page content
from each result in one operation — but note this multiplies credit cost.
Time-based filtering with tbs: qdr:d (past day), qdr:w (past
week), qdr:m (past month). Essential for finding recent content.
Source types via sources: ["web"] (default), ["news"],
["images"], or combinations. The limit applies per source type.
Category filtering via categories: ["github"], ["research"],
["pdf"]. Narrows results to specific domains (GitHub repos, academic
sites, PDF documents respectively).
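Putting those options together, a sketch of a filtered search that also scrapes each result (parameter names follow the descriptions above; confirm the exact schema in references/search-options.md):

```json
{
  "name": "firecrawl_search",
  "arguments": {
    "query": "Rust async runtime benchmarks",
    "limit": 5,
    "tbs": "qdr:m",
    "sources": ["web"],
    "categories": ["github"],
    "scrapeOptions": {
      "formats": ["markdown"],
      "onlyMainContent": true
    }
  }
}
```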
→ For full search options, read references/search-options.md.
### Pattern 3: Map a site's URL structure

```json
{
  "name": "firecrawl_map",
  "arguments": {
    "url": "https://docs.example.com",
    "search": "authentication",
    "limit": 100
  }
}
```
Map returns an array of URLs (with optional title/description). It does not return page content. Use it as a reconnaissance step before selective scraping.
The search parameter filters returned URLs by relevance to a term —
useful when you only need the authentication docs from a large site,
for instance.
Set ignoreQueryParameters: true to deduplicate URLs that differ only
by query string.
### Pattern 4: Crawl an entire site (async)

```json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://docs.example.com",
    "maxDiscoveryDepth": 2,
    "limit": 50,
    "deduplicateSimilarURLs": true
  }
}
```
Crawl is asynchronous. It returns a job ID immediately. Poll with
firecrawl_check_crawl_status using that ID. Allow 15–30 seconds between
polls. The status will be scraping, completed, or failed.
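A status check is a single call carrying the job ID from the crawl response. The parameter is shown here as id, which is an assumption; follow the tool's input schema if it names it differently.

```json
{
  "name": "firecrawl_check_crawl_status",
  "arguments": {
    "id": "<job-id-from-crawl-response>"
  }
}
```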
By default, crawl stays within the URL's path hierarchy. Set
allowExternalLinks: true to follow links to other domains (use with
caution — credit implications). Set allowSubdomains: true to include
subdomains like blog.example.com when crawling example.com.
All scrape options (formats, onlyMainContent, actions, location, tags)
can be passed via scrapeOptions and apply to every page the crawler
visits.
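A sketch of a crawl with nested scrape options, following the description above (see references/crawl-options.md for the full option set):

```json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://docs.example.com",
    "limit": 50,
    "scrapeOptions": {
      "formats": ["markdown"],
      "onlyMainContent": true
    }
  }
}
```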
→ For full crawl options, read references/crawl-options.md.
### Pattern 5: Extract structured data

```json
{
  "name": "firecrawl_extract",
  "arguments": {
    "urls": ["https://example.com/product/1", "https://example.com/product/2"],
    "prompt": "Extract the product name, price, and availability status",
    "schema": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "price": { "type": "number" },
        "in_stock": { "type": "boolean" }
      },
      "required": ["name", "price"]
    }
  }
}
```
The schema follows JSON Schema format. If omitted, the LLM chooses
its own structure guided by prompt. Providing a schema is strongly
recommended for consistent, parseable output.
### Pattern 6: Autonomous research agent (async)

```json
{
  "name": "firecrawl_agent",
  "arguments": {
    "prompt": "Find the pricing tiers and feature limits for Vercel, Netlify, and Cloudflare Pages. Compare them.",
    "schema": {
      "type": "object",
      "properties": {
        "providers": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string" },
              "tiers": { "type": "array", "items": { "type": "object" } }
            }
          }
        }
      }
    }
  }
}
```
The agent is async — it returns a job ID. Poll firecrawl_agent_status
every 15–30 seconds. Allow at least 2–3 minutes before treating it as
failed. The agent autonomously searches, navigates, and extracts.
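Polling takes the same shape as crawl status checks. As before, the id parameter name is an assumption; defer to the tool schema if it differs.

```json
{
  "name": "firecrawl_agent_status",
  "arguments": {
    "id": "<job-id-from-agent-response>"
  }
}
```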
Provide urls to focus the agent on specific pages. Omit urls to let
it search freely. The prompt is limited to 10,000 characters.
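A focused run that restricts the agent to known pages might look like this sketch (urls is the parameter described above; the URLs themselves are placeholders):

```json
{
  "name": "firecrawl_agent",
  "arguments": {
    "prompt": "Summarise the pricing tiers and feature limits on these pages.",
    "urls": [
      "https://example.com/pricing",
      "https://example.org/pricing"
    ]
  }
}
```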
Best for: complex cross-site research where you don't know the exact URLs in advance, or where content is spread across many pages.
### Pattern 7: Browser sandbox sessions
For interactive web tasks (form filling, authentication, multi-step navigation), use the browser sandbox.
Lifecycle:
- `firecrawl_browser_create` — start a session (returns session ID)
- `firecrawl_browser_execute` — run code in the session (repeatable)
- `firecrawl_browser_delete` — destroy the session when finished
Always delete sessions when done. Sessions have TTLs but leaving them open wastes resources.
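A typical session, sketched with illustrative argument names: the session ID, code, and language parameter names below are assumptions, and the real schemas are defined by the tools themselves (see references/browser-options.md).

```json
[
  { "name": "firecrawl_browser_create", "arguments": {} },
  {
    "name": "firecrawl_browser_execute",
    "arguments": {
      "sessionId": "<id-from-create>",
      "code": "await page.goto('https://example.com/login');",
      "language": "javascript"
    }
  },
  { "name": "firecrawl_browser_delete", "arguments": { "sessionId": "<id-from-create>" } }
]
```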
→ For full browser options and commands, read references/browser-options.md.
## Handling asynchronous tools
Both firecrawl_crawl and firecrawl_agent are async. The workflow is:
- Call the tool → receive a job ID.
- Poll the status tool with that ID every 15–30 seconds.
- On `completed`, the response includes the results.
- On `failed`, report the error. Consider retrying with adjusted parameters.
Do not poll more frequently than every 15 seconds — it wastes rate-limit budget and the status endpoints have their own rate limits.
## Error handling
The MCP server handles retries internally with exponential backoff (default: 3 attempts, starting at 1s, doubling each time, capped at 10s). If a call still fails after retries, you will receive an error response.
Common errors:
- Rate limit exceeded: Back off and retry after the indicated delay. Check whether you're making unnecessary calls that can be consolidated.
- Credit limit warnings: The server emits warnings at configurable thresholds. If you see a credit warning, inform the user and stop non-essential operations.
- Timeout: Increase the `timeout` parameter or simplify the request (fewer actions, simpler schema, lower page count); see the sketch below.
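For instance, a retried scrape with a larger time budget might look like this sketch. It assumes timeout is accepted in milliseconds on scrape calls; confirm in references/scrape-options.md.

```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com/slow-page",
    "formats": ["markdown"],
    "timeout": 60000
  }
}
```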
## Anti-patterns
- Scraping then extracting the same page: Use `firecrawl_scrape` with `formats: ["markdown", "json"]` to get both in one call, or use `firecrawl_extract` if you only need structured data.
- Crawling an entire domain to find one page: Use `firecrawl_map` with the `search` parameter first, then scrape the specific URL.
- Polling status every 2 seconds: Wastes rate-limit budget. Use 15–30 second intervals.
- Omitting `limit` on crawl: The default is 10,000 pages. Always set an explicit limit.
- Using `firecrawl_agent` for single-page tasks: The agent is designed for multi-source research. For single pages, `firecrawl_scrape` or `firecrawl_extract` are faster, cheaper, and more predictable.
- Requesting `rawHtml` when `markdown` suffices: `rawHtml` is large and rarely needed for LLM consumption. Use `markdown` by default; `html` (cleaned) if you need structure; `rawHtml` only for debugging or when you need the exact original markup.
- Leaving browser sessions open: Always call `firecrawl_browser_delete` when your task is complete. Use `firecrawl_browser_list` to check for orphaned sessions.
## Caching
Firecrawl caches scraped pages with a default freshness window of 2 days
(maxAge: 172800000 ms). Cached responses are significantly faster (up to
5×). Set maxAge: 0 to force a fresh scrape — but only when you genuinely
need the absolute latest content. A non-zero maxAge is almost always the
right choice.
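A sketch of forcing a fresh fetch when staleness genuinely matters (maxAge is in milliseconds, per the description above):

```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com/status",
    "formats": ["markdown"],
    "maxAge": 0
  }
}
```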
## Reference files
For detailed parameter documentation on each tool, read the appropriate reference file:
| File | Contents |
|---|---|
| references/scrape-options.md | All firecrawl_scrape parameters, formats, actions, and location settings |
| references/search-options.md | All firecrawl_search parameters, source types, categories, and scrape integration |
| references/crawl-options.md | All firecrawl_crawl parameters, path filtering, scope, and status polling |
| references/browser-options.md | Browser session lifecycle, execute languages, agent-browser commands, TTL config |
| references/extract-agent-options.md | firecrawl_extract schema design and firecrawl_agent usage patterns |
Read these files when you need parameter-level detail beyond what this document covers. For most tasks, the patterns above are sufficient.