Format options control what data types are included in your response. Specify one or more formats to receive HTML, markdown, screenshots, or extracted links alongside your extracted data. Common uses:
  • Full page content: Get raw HTML for custom processing
  • Readable text: Convert pages to clean markdown format
  • Visual records: Capture screenshots for monitoring or archival
  • Link extraction: Get all URLs from the page for further crawling
You can combine multiple formats in a single request. All specified formats will be included in the response.

Supported parameters

Available in: Extract and Crawl.

| Parameter | Type | Description | Default |
|---|---|---|---|
| formats | List (String) | Sets what data types are included in your response | ["html"] |

Available formats

| Format | Description | Use Case |
|---|---|---|
| html | Raw HTML content | Custom parsing, archival, full DOM access |
| markdown | Clean markdown conversion | Readable text, content analysis, LLM processing |
| screenshot | Page screenshot (base64) | Visual verification, monitoring, documentation |
| links | All extracted URLs | Link discovery, crawling, sitemap building |

Usage

Single HTML format

Request a single format, html (the default). Best for:
  • Custom HTML parsing
  • Preserving exact page structure
  • Accessing all DOM elements and attributes
  • Archival purposes
from nimble import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.extract({
    "url": "https://www.example.com",
    "formats": ["html"] # default
})

# Access HTML content
html_content = result["data"]["html"]
print(html_content)

Multiple formats

Combine multiple formats to get different data representations:
from nimble import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.extract({
    "url": "https://www.example.com",
    "formats": ["html", "markdown", "screenshot", "links"]
})

print(result)
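
When several formats are requested, it can be convenient to collect whichever ones actually came back in a single pass. The following is a minimal sketch, assuming the response is a dict shaped like the example response in this document (a `data` object keyed by format name). `collect_formats` is a hypothetical helper and not part of the Nimble SDK, and the sample values are illustrative.

```python
def collect_formats(result, requested):
    """Return {format: value} for each requested format present in result["data"]."""
    data = result.get("data") or {}
    return {name: data[name] for name in requested if data.get(name) is not None}


# Sample response shaped like the example in this document (illustrative values).
sample = {
    "status": "success",
    "data": {
        "html": "<html><body>Hello</body></html>",
        "markdown": "# Hello",
        "links": ["https://www.example.com/about"],
    },
}

found = collect_formats(sample, ["html", "markdown", "screenshot", "links"])
print(sorted(found))
```

Formats that are missing from the response are skipped rather than raising a `KeyError`, which may be preferable when some formats are optional for the caller.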

Markdown format

Convert the page to clean, readable markdown. Best for:
  • Clean text extraction
  • Content analysis
  • LLM processing
  • Human-readable output
from nimble import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.extract({
    "url": "https://www.example.com/article",
    "formats": ["markdown"]
})

# Access markdown content
markdown_content = result["data"]["markdown"]
print(markdown_content)

Screenshot format

Capture a visual snapshot of the page. Best for:
  • Visual verification
  • Monitoring page changes
  • Documentation and reporting
  • Debugging layout issues
from nimble import Nimble
import base64

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.extract({
    "url": "https://www.example.com",
    "formats": ["screenshot"],
    "render": True
})

# Access screenshot (base64 encoded)
screenshot_data = result["data"]["screenshot"]

# Decode and save
with open("screenshot.png", "wb") as f:
    f.write(base64.b64decode(screenshot_data))
Screenshots require page rendering to be enabled (render: true). The image is returned as base64-encoded PNG data.
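
Because the screenshot is described as base64-encoded PNG data, the decoded bytes can be sanity-checked before they are written to disk. This is a sketch only: `decode_png` is a hypothetical helper, and the check relies on the standard 8-byte PNG file signature rather than on anything specific to the API. The self-check uses synthetic bytes, not a real screenshot.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decode_png(screenshot_b64):
    """Decode a base64 string and verify it starts with the PNG signature."""
    raw = base64.b64decode(screenshot_b64)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded screenshot is not PNG data")
    return raw


# Self-check with synthetic data rather than a real screenshot.
fake_png = base64.b64encode(PNG_SIGNATURE + b"rest-of-image").decode("ascii")
png_bytes = decode_png(fake_png)
print(len(png_bytes))
```

In practice the argument would be `result["data"]["screenshot"]`, and the returned bytes could be written with `open("screenshot.png", "wb")` as in the example above.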

Links format

Extract all URLs found on the page. Best for:
  • Link discovery
  • Building sitemaps
  • Crawling workflows
  • Finding internal/external links
from nimble import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.extract({
    "url": "https://www.example.com",
    "formats": ["links"]
})

# Access extracted links
links = result["data"]["links"]
for link in links:
    print(link)
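
Before feeding links into a crawl queue, it may be worth removing duplicates. The sketch below assumes the list can contain repeated URLs or URLs that differ only by a `#fragment`; the source does not say whether it does. `unique_links` is a hypothetical helper built on the standard library's `urllib.parse.urldefrag`.

```python
from urllib.parse import urldefrag


def unique_links(links):
    """Drop #fragments and duplicates while preserving first-seen order."""
    seen = set()
    unique = []
    for link in links:
        url, _fragment = urldefrag(link)
        if url and url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


sample_links = [
    "https://www.example.com/about",
    "https://www.example.com/about#team",
    "https://www.example.com/contact",
]
deduped = unique_links(sample_links)
print(deduped)
```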

Combining with other features

Formats can be combined with parsing, browser actions, and other features in the same request:
from nimble import Nimble
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.extract({
    "url": "https://www.example.com/product",
    "render": True,
    "formats": ["html", "markdown", "screenshot"],
    "schema": Product,
    "browser_actions": [
        {
            "wait": {
                "delay": 1000
            }
        }
    ]
})

# Access different formats
product_data = result["data"]["parsed"]
html = result["data"]["html"]
markdown = result["data"]["markdown"]
screenshot = result["data"]["screenshot"]

Example response

When formats are specified, all requested data is included in the response. The response includes:
  • html: Raw HTML if requested
  • markdown: Converted markdown if requested
  • screenshot: Base64-encoded PNG if requested
  • links: Array of extracted URLs if requested
  • parsed: Structured data if parsing was used
  • metadata: Execution details and the formats included
{
  "status": "success",
  "data": {
    "html": "<!DOCTYPE html><html><head>...</head><body>...</body></html>",
    "markdown": "# Article Title\n\nThis is the article content...",
    "screenshot": "iVBORw0KGgoAAAANSUhEUgAAA...",
    "links": [
      "https://www.example.com/about",
      "https://www.example.com/contact",
      "https://www.example.com/products",
      "https://external-site.com"
    ],
    "parsed": {
      "title": "Example Article",
      "author": "John Doe"
    }
  },
  "metadata": {
    "driver": "vx8",
    "execution_time_ms": 1850,
    "formats_included": ["html", "markdown", "screenshot", "links"]
  }
}
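
The `formats_included` field in the example response suggests a simple way to confirm that every requested format was delivered. The following is a sketch under that assumption; `missing_formats` is a hypothetical helper, and it falls back to the keys of `data` if the metadata field is absent.

```python
def missing_formats(result, requested):
    """Return the requested formats that are absent from the response."""
    data = result.get("data") or {}
    metadata = result.get("metadata") or {}
    # Fall back to the keys of "data" if the metadata field is absent.
    included = set(metadata.get("formats_included", data.keys()))
    return [name for name in requested if name not in included or name not in data]


# Sample response shaped like the example above (illustrative values).
sample = {
    "status": "success",
    "data": {"html": "<html></html>", "markdown": "# Title"},
    "metadata": {"formats_included": ["html", "markdown"]},
}

missing = missing_formats(sample, ["html", "markdown", "screenshot"])
print(missing)
```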

Best practices

Format selection

Choose formats based on your needs:
  • Use html when you need full DOM access
  • Use markdown for clean text and content analysis
  • Use screenshot for visual verification
  • Use links for discovering URLs to crawl
Avoid unnecessary formats:
# ❌ Don't request all formats if you only need one
formats=["html", "markdown", "screenshot", "links"]

# ✅ Request only what you need
formats=["markdown"]
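
One way to keep requests minimal is to build the request dict through a small function that rejects unknown format names and enables rendering only when a screenshot is requested. This is a sketch, not SDK behavior: `build_extract_request` and `KNOWN_FORMATS` are hypothetical names, and the format list and the `render` requirement are taken from this page.

```python
KNOWN_FORMATS = ("html", "markdown", "screenshot", "links")


def build_extract_request(url, formats=("html",)):
    """Build a request dict with validated, de-duplicated formats."""
    unknown = [name for name in formats if name not in KNOWN_FORMATS]
    if unknown:
        raise ValueError(f"unsupported formats: {unknown}")
    # dict.fromkeys removes duplicates while keeping the original order.
    request = {"url": url, "formats": list(dict.fromkeys(formats))}
    if "screenshot" in formats:
        request["render"] = True  # screenshots require rendering
    return request


request = build_extract_request(
    "https://www.example.com", ["markdown", "screenshot", "markdown"]
)
print(request)
```

The resulting dict could then be passed to `nimble.extract(...)` in the same way as the examples above.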

Performance considerations

  • Each format adds processing time
  • Screenshots require rendering and are slower
  • HTML and markdown are faster to generate
  • Request only needed formats for optimal performance
Process links after extraction:
# Filter internal links only
internal_links = [
    link for link in result["data"]["links"]
    if link.startswith("https://www.example.com")
]

# Filter by file type
pdf_links = [
    link for link in result["data"]["links"]
    if link.endswith(".pdf")
]
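
Plain string matching can misclassify some URLs: a prefix check also accepts a host such as `www.example.com.evil.test`, and a suffix check misses PDF links that carry a query string. A slightly more robust sketch using the standard library's `urllib.parse.urlparse` is shown below; `is_internal` and `is_pdf` are hypothetical helpers, and the sample URLs are made up for illustration.

```python
from urllib.parse import urlparse


def is_internal(link, host="www.example.com"):
    """True if the link's hostname exactly matches the given host."""
    return urlparse(link).hostname == host


def is_pdf(link):
    """True if the URL path ends in .pdf, ignoring case and query strings."""
    return urlparse(link).path.lower().endswith(".pdf")


sample_links = [
    "https://www.example.com/report.pdf?download=1",
    "https://www.example.com.evil.test/login",
    "https://external-site.com/guide.PDF",
]

internal_links = [link for link in sample_links if is_internal(link)]
pdf_links = [link for link in sample_links if is_pdf(link)]
print(internal_links)
print(pdf_links)
```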