Schema mode gives you full control over data extraction using CSS selectors and parsers. Define exact data structures for predictable, low-cost extraction that executes consistently across thousands of pages.
When to use
Use schema mode when you need:
- Low cost: Pay per driver usage, not token consumption
- Predictable extraction: Same selectors extract same data every time
- Full control: Specify exact CSS paths and data types
- High volume: Process large datasets efficiently
Parsers can break when a page's structure changes and their selectors no longer match. Monitor source pages and update parsers as needed.
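One way to catch a broken selector early is to check parsed results for fields that came back empty. The sketch below is minimal. It assumes the parsed data arrives as a dictionary like data.parsing in the example response. It also assumes that a selector which matches nothing yields None or an empty value. Both are assumptions, and find_missing_fields is a hypothetical helper, not part of the Nimble SDK.

```python
from typing import Any, Dict, Iterable, List


def find_missing_fields(parsing: Dict[str, Any], required: Iterable[str]) -> List[str]:
    """Return required field names that are absent, None, or empty in a parsed result."""
    missing = []
    for field in required:
        value = parsing.get(field)
        # Assumption: None, "", [] or {} means the selector found nothing
        if value is None or (isinstance(value, (str, list, dict)) and len(value) == 0):
            missing.append(field)
    return missing


# Example: a parsed product where the price selector stopped matching
parsed = {"name": "Wireless Headphones", "price": None, "rating": 4.5}
print(find_missing_fields(parsed, ["name", "price", "rating"]))  # ['price']
```

Running a check like this on a sample of pages after each crawl gives you an early signal that a parser needs updating.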
Supported parameters
Available in: Extract.
| Parameter | Type | Description | Default |
|---|---|---|---|
| parse | Boolean | Enables HTML parsing. Must be set to true for schema mode. | false |
| parser | Object | Selector-based HTML parser object | - |
Parser structure
Each parser definition is built from three required properties:
- type: the return format (item, list, json, table, object, or object-list)
- selectors: one or more CSS selectors identifying the target elements; list several to provide fallbacks
- extractor: what to extract from each matched element (text, html, or an attribute such as [href])
Available types:

| Type | Purpose | Returns |
|---|---|---|
| item | Single element | First matched element |
| list | Multiple elements | Array of all matches |
| json | JSON-formatted content | First element as JSON |
| table | HTML tables | JSON using headers as keys |
| object | Custom structure | Single element with multiple fields |
| object-list | Multiple custom structures | Array of objects |
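The difference between the single-result and multi-result types comes down to how many matches survive. The sketch below illustrates this in plain Python. It is not how Nimble implements parsing. The matches list stands in for values a selector has already extracted. Decoding the first match for json is an assumption based on the "First element as JSON" row. The hypothetical shape_result function does not cover table, which is illustrated separately.

```python
import json
from typing import Any, List


def shape_result(parser_type: str, matches: List[Any]) -> Any:
    """Illustrate how a parser type shapes the values its selector matched.

    `matches` stands in for the already-extracted values, in document order.
    """
    if parser_type in ("item", "object"):
        # Single-result types keep only the first match
        return matches[0] if matches else None
    if parser_type in ("list", "object-list"):
        # Multi-result types keep every match
        return list(matches)
    if parser_type == "json":
        # Assumption: the first match holds JSON text that gets decoded
        return json.loads(matches[0]) if matches else None
    raise ValueError(f"type not covered by this sketch: {parser_type}")


cards = [{"name": "Wireless Headphones"}, {"name": "Bluetooth Speaker"}]
print(shape_result("object", cards))       # {'name': 'Wireless Headphones'}
print(shape_result("object-list", cards))  # both dicts, as a list
```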
Extractors specify what to extract from matched elements:
- text: the element's text, without HTML tags
- html: the element's full inner HTML
- [attribute]: the value of a specific attribute (e.g., [href], [src], [data-id])
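You can catch mistakes in a parser definition before sending a request. The sketch below uses a hypothetical check_field helper, which is not part of the Nimble SDK. It validates one field definition against the documented types and extractors. It follows the usage examples on this page rather than a formal spec, and that choice rests on three assumptions:
- it accepts either selector or selectors
- it treats an omitted type as item
- it does not require an extractor for table or object types

```python
import re
from typing import Any, Dict, List

VALID_TYPES = {"item", "list", "json", "table", "object", "object-list"}
# Matches attribute extractors such as [href], [src], or [data-id]
ATTRIBUTE_EXTRACTOR = re.compile(r"^\[[A-Za-z_][\w:.-]*\]$")


def check_field(definition: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one parser field definition (empty if none)."""
    problems = []
    # Assumption: an omitted type behaves like "item"
    parser_type = definition.get("type", "item")
    if parser_type not in VALID_TYPES:
        problems.append(f"unknown type: {parser_type!r}")
    # The examples use both "selector" (one string) and "selectors" (a list of fallbacks)
    if "selector" not in definition and "selectors" not in definition:
        problems.append("missing selector(s)")
    if parser_type in ("object", "object-list"):
        fields = definition.get("fields")
        if not isinstance(fields, dict) or not fields:
            problems.append("object types need a non-empty 'fields' mapping")
        else:
            # Check nested fields recursively, prefixing each problem with the field name
            for name, sub in fields.items():
                problems.extend(f"{name}: {p}" for p in check_field(sub))
    elif parser_type != "table":
        extractor = definition.get("extractor")
        is_attribute = isinstance(extractor, str) and ATTRIBUTE_EXTRACTOR.match(extractor)
        if extractor not in ("text", "html") and not is_attribute:
            problems.append(f"invalid extractor: {extractor!r}")
    return problems


print(check_field({"selector": "a.product-link", "extractor": "href"}))
# ["invalid extractor: 'href'"]  (attribute extractors need brackets: [href])
```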
Usage
Extract a single field:
```python
from nimble import Nimble
from pydantic import BaseModel


class Product(BaseModel):
    name: str


nimble = Nimble(api_key="YOUR-API-KEY")

# Map the "name" schema field to the text of the .product-name element
result = nimble.extract(
    url="https://www.example.com/product",
    schema=Product,
    parser={
        "name": {
            "selector": ".product-name",
            "extractor": "text"
        }
    }
)
print(result)
```
Extract multiple fields from a single element:
```python
from nimble import Nimble
from pydantic import BaseModel


class Product(BaseModel):
    name: str
    price: float
    rating: float
    url: str


nimble = Nimble(api_key="YOUR-API-KEY")

# "object" builds one result with several fields from the .product-card element
result = nimble.extract(
    url="https://www.example.com/product",
    schema=Product,
    parser={
        "type": "object",
        "selector": ".product-card",
        "fields": {
            "name": {
                "selector": ".product-name",
                "extractor": "text"
            },
            "price": {
                "selector": ".price",
                "extractor": "text"
            },
            "rating": {
                "selector": ".rating",
                "extractor": "text"
            },
            # [href] returns the link's href attribute instead of its text
            "url": {
                "selector": "a.product-link",
                "extractor": "[href]"
            }
        }
    }
)
print(result)
```
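The price and rating fields above use the text extractor but are declared as float in the schema. Page text often includes currency symbols or labels (for example $79.99), so it may not convert to a float cleanly. This page does not say whether Nimble normalizes such text. The sketch below shows a hypothetical parse_price helper you could apply in post-processing. It assumes US-style number formatting, with commas as thousands separators.

```python
import re
from typing import Optional


def parse_price(text: str) -> Optional[float]:
    """Pull the first number out of text such as '$1,299.99' or 'Rating: 4.5 / 5'."""
    # Drop thousands separators, then find the first run of digits with an optional decimal part
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


print(parse_price("$1,299.99"))        # 1299.99
print(parse_price("Rating: 4.5 / 5"))  # 4.5
print(parse_price("Out of stock"))     # None
```

European-style prices such as 1.299,99 would be misread by this sketch. Adjust the separator handling to match your source pages.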
Extract multiple items with the same structure:
```python
from nimble import Nimble
from pydantic import BaseModel
from typing import List


class Product(BaseModel):
    name: str
    price: float


class ProductList(BaseModel):
    products: List[Product]


nimble = Nimble(api_key="YOUR-API-KEY")

# "object-list" returns one object per .product-item element
result = nimble.extract(
    url="https://www.example.com/products",
    schema=ProductList,
    parser={
        "products": {
            "type": "object-list",
            "selector": ".product-item",
            "fields": {
                "name": {
                    "selector": ".product-name",
                    "extractor": "text"
                },
                "price": {
                    "selector": ".price",
                    "extractor": "text"
                }
            }
        }
    }
)
print(result)
```
Convert HTML tables to JSON:
```python
from nimble import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

# "table" converts the matched table to JSON, using its headers as keys
result = nimble.extract(
    url="https://www.example.com/table",
    parser={
        "data": {
            "type": "table",
            "selector": "table.data-table"
        }
    }
)
print(result)
```
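This page does not document how Nimble converts tables internally. The sketch below uses only Python's standard-library html.parser to illustrate the documented behavior: header cells become keys, and each data row becomes one object. It assumes a simple table with a single header row of th cells. TableToRecords and table_to_records are hypothetical names, and Nimble's actual output shape may differ.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class TableToRecords(HTMLParser):
    """Collect header (<th>) and data (<td>) cells from a simple HTML table."""

    def __init__(self) -> None:
        super().__init__()
        self.headers: List[str] = []
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None   # cells of the <tr> being read
        self._cell: Optional[List[str]] = None  # text chunks of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = []

    def handle_data(self, data):
        # Keep only text inside a cell; whitespace between tags is ignored
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            text = "".join(self._cell).strip()
            if tag == "th":
                self.headers.append(text)
            elif self._row is not None:
                self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # the header row has no <td> cells, so it is skipped
                self.rows.append(self._row)
            self._row = None


def table_to_records(html: str) -> List[Dict[str, str]]:
    """Turn each data row into a dict keyed by the table's header cells."""
    parser = TableToRecords()
    parser.feed(html)
    parser.close()
    return [dict(zip(parser.headers, row)) for row in parser.rows]


html = """
<table class="data-table">
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Wireless Headphones</td><td>79.99</td></tr>
  <tr><td>Bluetooth Speaker</td><td>49.99</td></tr>
</table>
"""
print(table_to_records(html))
# [{'Name': 'Wireless Headphones', 'Price': '79.99'}, {'Name': 'Bluetooth Speaker', 'Price': '49.99'}]
```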
Fallback selectors
Provide multiple selectors for robustness:
```python
from nimble import Nimble
from pydantic import BaseModel


class Product(BaseModel):
    price: float


nimble = Nimble(api_key="YOUR-API-KEY")

# Selectors are tried in order; the first one that matches is used
result = nimble.extract(
    url="https://www.example.com/product",
    schema=Product,
    parser={
        "price": {
            "selectors": [".price-new", ".price", "[data-price]"],
            "extractor": "text"
        }
    }
)
print(result)
```
Nimble tries each selector in order and uses the first one that matches, so a single parser can handle pages whose structure varies.
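The fallback rule can be sketched in a few lines of plain Python. This is an illustration of the documented behavior, not Nimble's implementation. The query argument is a hypothetical stand-in for a real DOM lookup, and here it is backed by a dictionary. The assumption is that a selector "matches" when it returns at least one element.

```python
from typing import Callable, List, Optional, Sequence


def first_match(selectors: Sequence[str], query: Callable[[str], List[str]]) -> Optional[str]:
    """Return the first result from the first selector that matches anything."""
    for selector in selectors:
        matches = query(selector)
        if matches:
            # Stop at the first selector with a match; later selectors are never tried
            return matches[0]
    return None


# Stand-in for a real DOM query: this page has no .price-new element
page = {".price": ["$79.99"], "[data-price]": ["79.99"]}
print(first_match([".price-new", ".price", "[data-price]"], lambda s: page.get(s, [])))  # $79.99
```

Because order decides the winner, list the most specific or most recent selector first and keep the broadest one last.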
Example response
When parsing succeeds, the response contains structured data matching your schema:
- data: all extracted data, including the raw HTML (html) and the parsed result (parsing)
- data.parsing: structured JSON matching your schema
- metadata: execution details, including the task ID, the driver used, the execution time, and more
```json
{
  "status": "success",
  "data": {
    "html": "<!DOCTYPE html><html>...</html>",
    "parsing": {
      "products": [
        {
          "name": "Wireless Headphones",
          "price": 79.99,
          "rating": 4.5,
          "url": "/product/wireless-headphones",
          "in_stock": true
        },
        {
          "name": "Bluetooth Speaker",
          "price": 49.99,
          "rating": 4.2,
          "url": "/product/bluetooth-speaker",
          "in_stock": true
        }
      ]
    }
  },
  "metadata": {
    "task_id": ".....",
    "country": "US",
    "driver": "vx14",
    "execution_time_ms": 2100
  }
}
```
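The sketch below reads the parsed products out of a response shaped like the example above. It works on the raw JSON text. This page does not say whether the SDK's result object exposes the same keys as a dictionary, so treat that as an assumption. Other status values are not documented here, so the failure check is only a placeholder.

```python
import json

# Trimmed copy of the example response above
raw = """
{
  "status": "success",
  "data": {
    "parsing": {
      "products": [
        {"name": "Wireless Headphones", "price": 79.99, "in_stock": true},
        {"name": "Bluetooth Speaker", "price": 49.99, "in_stock": true}
      ]
    }
  },
  "metadata": {"driver": "vx14", "execution_time_ms": 2100}
}
"""

response = json.loads(raw)
# Placeholder check: only "success" is documented on this page
if response["status"] != "success":
    raise RuntimeError(f"extraction failed: {response['status']}")

products = response["data"]["parsing"]["products"]
for product in products:
    print(f"{product['name']}: {product['price']}")
print("driver:", response["metadata"]["driver"])
```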
Best practices
Selector specificity
Use specific selectors:
✅ ".product-card .price-value"
❌ ".price"
Combine classes and attributes:
✅ "button[data-action='add-to-cart']"
❌ "button"