Parsing Schema converts HTML and network responses into clean, structured JSON that matches your requirements. You can either define a precise schema or describe what you need in natural language.
Common uses:
  • Product extraction: Get prices, names, ratings, and availability
  • Content scraping: Extract articles, reviews, or listings
  • Data collection: Structured data from tables, lists, or complex layouts
  • API-like responses: Transform HTML into clean JSON objects

Two modes

Parsing Schema offers two approaches to extract structured data:
  1. Schema Mode
  2. LLM Mode

Schema Mode

Define the exact data structure using parsers and CSS selectors for predictable, low-cost extraction. Best for:
  • Stable page structures
  • Cost-sensitive applications (lower pricing)
  • Precise control over data extraction
  • High-volume, consistent extraction
How it works:
  1. Create a parser with CSS selectors mapping to your data model
  2. Define data types and structure
  3. Nimble extracts data exactly as specified
Pricing: Based on driver usage (vx6, vx8, vx10, etc.) - no token costs
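
The idea behind the steps above (selectors mapped to typed fields) can be sketched locally. This is a minimal illustration using only the Python standard library, not Nimble's parser format or implementation: the PARSER mapping, the class names, the sample HTML, and the apply_parser helper are all hypothetical, and only simple ".class" selectors on non-nested elements are handled.

```python
from html.parser import HTMLParser

# Hypothetical parser definition: field name -> (class selector, converter).
# Only ".class" selectors are supported in this sketch.
PARSER = {
    "name": (".product-title", str),
    "price": (".price", lambda s: float(s.lstrip("$"))),
    "in_stock": (".stock", lambda s: s.lower() == "in stock"),
}


class ClassTextCollector(HTMLParser):
    """Collects the text of the first element carrying each wanted class."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.current = None  # class whose text is being captured, if any
        self.found = {}      # class name -> captured text

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.wanted and cls not in self.found:
                self.current = cls
                self.found[cls] = ""
                break

    def handle_data(self, data):
        if self.current is not None:
            self.found[self.current] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the capture (no nesting support).
        self.current = None


def apply_parser(html, parser_def):
    """Return a dict with one typed value per field, or None if not found."""
    collector = ClassTextCollector(sel.lstrip(".") for sel, _ in parser_def.values())
    collector.feed(html)
    result = {}
    for field, (selector, convert) in parser_def.items():
        raw = collector.found.get(selector.lstrip("."))
        result[field] = convert(raw.strip()) if raw is not None else None
    return result


SAMPLE_HTML = """
<div class="product">
  <h1 class="product-title">Desk Lamp</h1>
  <span class="price">$24.99</span>
  <span class="stock">In stock</span>
</div>
"""

record = apply_parser(SAMPLE_HTML, PARSER)
print(record)  # {'name': 'Desk Lamp', 'price': 24.99, 'in_stock': True}
```

The sketch also shows why this mode depends on stable pages: if a class name changes, the corresponding field silently becomes None until the mapping is updated.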

LLM Mode

Describe the data you need in natural language or provide a loose schema. An AI model analyzes each page dynamically to extract and structure the information. Best for:
  • Pages with varying structures
  • Quick setup without selector maintenance
  • Complex extraction requirements
  • Adaptive data extraction
How it works:
  1. Define a data model (optional) or describe what to extract
  2. AI analyzes the page and identifies relevant data
  3. Data is extracted and structured automatically
Pricing: Uses vx14 driver + token consumption (higher cost than schema mode)
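
Because the AI decides how each record is structured, it can be useful to check the output against your data model before using it. The following is a minimal sketch under stated assumptions: the extracted record is assumed to be available as a plain dict (the result shape is not described here), a standard-library dataclass stands in for the data model, and validate_record is a hypothetical helper, not part of the Nimble SDK.

```python
from dataclasses import dataclass, fields


@dataclass
class ProductInfo:
    name: str
    price: float
    rating: float
    in_stock: bool


def validate_record(raw, model=ProductInfo):
    """Return (instance, problems); instance is None if any problem is found."""
    problems = []
    values = {}
    for f in fields(model):
        if f.name not in raw:
            problems.append(f"missing field: {f.name}")
            continue
        value = raw[f.name]
        # JSON does not distinguish 25 from 25.0, so accept ints for float fields.
        if f.type is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, f.type):
            problems.append(
                f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}"
            )
            continue
        values[f.name] = value
    return (model(**values) if not problems else None), problems


# A record that matches the model (note the integer price).
good, good_problems = validate_record(
    {"name": "Desk Lamp", "price": 25, "rating": 4.5, "in_stock": True}
)
print(good, good_problems)

# A record with a string price and a missing rating.
bad, bad_problems = validate_record(
    {"name": "Desk Lamp", "price": "24.99", "in_stock": True}
)
print(bad, bad_problems)
```

With a pydantic model, as used in the usage examples in this document, the same check would normally be done by the model's own validation rather than by hand.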

Quick comparison

Feature     | Schema Mode                 | LLM Mode
Setup       | Manual parser creation      | Natural language or loose schema
Cost        | Lower (driver only)         | Higher (driver + tokens)
Accuracy    | Precise (if maintained)     | Adaptive
Maintenance | Update when page changes    | Self-healing
Speed       | Faster (no inference)       | Slower (LLM processing)
Control     | Full control over selectors | AI-determined extraction
Best for    | Stable pages, cost control  | Varying pages, quick setup

Both modes extract data within the same extract method call. Choose the approach that matches your needs for cost, control, and maintenance.

Choosing the right mode

Use Schema Mode when:
  • You need low-cost, predictable extraction
  • Page structure is stable and CSS selectors work reliably
  • You want complete control over what gets extracted
  • Processing high volumes with consistent structure
Use LLM Mode when:
  • You want quick setup with a prompt or loose schema instead of a hand-built parser
  • Pages have varying structures or layouts
  • Maintaining CSS selectors is challenging
  • Higher cost is acceptable for reduced maintenance

Usage examples

Schema Mode

Define the exact structure with a data model:
from nimble import Nimble
from pydantic import BaseModel

class ProductInfo(BaseModel):
    name: str
    price: float
    rating: float
    in_stock: bool

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.extract(
    url="https://www.example.com/product",
    schema=ProductInfo
)

print(result)

LLM Mode

Let the AI work out the extraction from a data model and a natural language prompt:
from nimble import Nimble
from pydantic import BaseModel

class ProductInfo(BaseModel):
    name: str
    price: float
    rating: float
    in_stock: bool

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.extract(
    url="https://www.example.com/product",
    schema=ProductInfo,
    schema_prompt="Extract product details including name, price, customer ratings, and stock availability"
)

print(result)
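
Judging from the two examples above, the only difference between the modes in the call itself is the schema_prompt argument. A small helper can make that switch explicit. This is a sketch, not part of the Nimble SDK: build_extract_kwargs is a hypothetical name, the ProductInfo class here is a bare placeholder for the data model, and the helper only assembles the keyword arguments already shown in this document.

```python
from typing import Any, Optional


def build_extract_kwargs(url: str, schema: Any, prompt: Optional[str] = None) -> dict:
    """Assemble keyword arguments for an extract call.

    Without a prompt, the arguments match the Schema Mode example;
    with a prompt, schema_prompt is added as in the LLM Mode example.
    """
    kwargs = {"url": url, "schema": schema}
    if prompt:
        kwargs["schema_prompt"] = prompt
    return kwargs


class ProductInfo:
    """Placeholder for the data model used in the examples."""


schema_mode_args = build_extract_kwargs(
    "https://www.example.com/product", ProductInfo
)
llm_mode_args = build_extract_kwargs(
    "https://www.example.com/product",
    ProductInfo,
    prompt="Extract product details including name, price, and stock availability",
)

print(sorted(schema_mode_args))  # ['schema', 'url']
print(sorted(llm_mode_args))     # ['schema', 'schema_prompt', 'url']

# The result would then be passed on as in the examples above, for instance:
# result = nimble.extract(**llm_mode_args)
```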

Combining with other features

Parsing can be combined with other Extract features, such as rendering and browser actions, in the same call:
from nimble import Nimble
from pydantic import BaseModel

class ProductInfo(BaseModel):
    name: str
    price: float
    rating: float

nimble = Nimble(api_key="YOUR-API-KEY")

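# Keyword arguments are used here for consistency with the examples above.
result = nimble.extract(
    url="https://www.example.com",
    render=True,
    browser_actions=[
        # Scroll the page to load more items before parsing
        {"infinite_scroll": {"duration": 5000}}
    ],
    schema=ProductInfo,
)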

print(result)
