Schema mode gives you full control over data extraction using CSS selectors and parsers. Define exact data structures for predictable, low-cost extraction that executes consistently across thousands of pages.

When to use

Use schema mode when you need:
  • Low cost: Pay per driver usage, not token consumption
  • Predictable extraction: Same selectors extract same data every time
  • Full control: Specify exact CSS paths and data types
  • High volume: Process large datasets efficiently
Parsers can break when a page's structure changes and their selectors no longer match. Monitor source pages and update parsers as needed.
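One lightweight way to monitor is to track how often each field comes back empty across a batch of results. A sudden rise usually means a selector stopped matching. The sketch below assumes you already hold the parsed records as plain dictionaries; the helper names and the 20% default threshold are hypothetical choices, not part of the Nimble SDK.

```python
from typing import Any, Dict, List

# Values treated as "nothing extracted" (an assumption; adjust for your data).
EMPTY_VALUES = (None, "", [], {})


def empty_field_rates(records: List[Dict[str, Any]], fields: List[str]) -> Dict[str, float]:
    """For each field, return the fraction of records where it is missing or empty."""
    total = len(records)
    rates: Dict[str, float] = {}
    for field in fields:
        empty = sum(1 for record in records if record.get(field) in EMPTY_VALUES)
        rates[field] = empty / total if total else 0.0
    return rates


def fields_to_review(
    records: List[Dict[str, Any]], fields: List[str], threshold: float = 0.2
) -> List[str]:
    """Flag fields whose empty rate exceeds the threshold, a common sign of a stale selector."""
    rates = empty_field_rates(records, fields)
    return [field for field in fields if rates[field] > threshold]
```

Running this after each batch and alerting on a non-empty result gives an early signal before bad data reaches downstream systems.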

Supported parameters

Available in: Extract.
Parameter | Type | Description | Default
parse | Boolean | Enable or disable HTML parsing (must be true for schema mode) | false
parser | Object | Selector-based HTML parser definition | -

Parser structure

Parsers define how to extract data using three required properties:
  • type - Return format: item, list, json, table, object, or object-list
  • selector / selectors - CSS selector identifying the target element; pass a list under selectors to define fallbacks
  • extractor - What to extract: text, html, or an attribute such as [href]
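Because a malformed parser only fails once a request runs, it can help to check definitions locally first. This is a hypothetical client-side check built only from the values listed in this section; it is not Nimble's validation logic, and the service may accept or reject cases it does not cover. It treats `type` and `extractor` as optional so it also accepts the shapes used in this page's examples (the basic example omits `type`, and the table example omits `extractor`).

```python
import re
from typing import Any, Dict, List

# Values taken from the lists in this section.
VALID_TYPES = {"item", "list", "json", "table", "object", "object-list"}
BASIC_EXTRACTORS = {"text", "html"}
ATTRIBUTE_EXTRACTOR = re.compile(r"^\[[A-Za-z_][\w:.-]*\]$")  # e.g. [href], [data-id]


def check_parser(spec: Dict[str, Any], path: str = "parser") -> List[str]:
    """Return the problems found in one parser definition (an empty list means none)."""
    problems: List[str] = []

    parser_type = spec.get("type")
    if parser_type is not None and parser_type not in VALID_TYPES:
        problems.append(f"{path}: unknown type {parser_type!r}")

    if "selector" not in spec and "selectors" not in spec:
        problems.append(f"{path}: needs 'selector' or 'selectors'")
    selectors = spec.get("selectors")
    if selectors is not None and not (isinstance(selectors, list) and selectors):
        problems.append(f"{path}: 'selectors' must be a non-empty list")

    extractor = spec.get("extractor")
    if extractor is not None:
        valid = isinstance(extractor, str) and (
            extractor in BASIC_EXTRACTORS or ATTRIBUTE_EXTRACTOR.match(extractor) is not None
        )
        if not valid:
            problems.append(f"{path}: unsupported extractor {extractor!r}")

    # object and object-list definitions nest further field parsers under "fields".
    for name, field_spec in spec.get("fields", {}).items():
        problems.extend(check_parser(field_spec, f"{path}.{name}"))
    return problems
```

For the keyed form used in the examples (for instance `{"products": {...}}`), run the check on each value.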

Extraction types

Type | Purpose | Returns
item | Single element | First matched element
list | Multiple elements | Array of all matches
json | JSON formatted | First element as JSON
table | HTML tables | JSON using headers as keys
object | Custom structure | Single element with multiple fields
object-list | Multiple custom structures | Array of objects
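As a mental model of how these types shape a result, the sketch below assumes selector matching has already happened and represents each matched container (such as one `.product-card`) as a dict from selector to the text of its matches. The function names are hypothetical, and `json` and `table` are left out; this illustrates the Returns column, not how Nimble implements it.

```python
from typing import Any, Dict, List, Optional

# A matched container (for example one .product-card) modelled as
# {css_selector: [text of each match inside it, in document order]}.
Container = Dict[str, List[str]]


def as_item(matches: List[Any]) -> Optional[Any]:
    """item: only the first match is kept."""
    return matches[0] if matches else None


def as_list(matches: List[Any]) -> List[Any]:
    """list: every match is kept, in order."""
    return list(matches)


def build_object(container: Container, fields: Dict[str, str]) -> Dict[str, Any]:
    """Resolve each field (name -> selector) as an item inside one container."""
    return {name: as_item(container.get(selector, [])) for name, selector in fields.items()}


def as_object(containers: List[Container], fields: Dict[str, str]) -> Optional[Dict[str, Any]]:
    """object: one structure built from the first matched container."""
    return build_object(containers[0], fields) if containers else None


def as_object_list(containers: List[Container], fields: Dict[str, str]) -> List[Dict[str, Any]]:
    """object-list: one structure per matched container."""
    return [build_object(container, fields) for container in containers]
```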

Extractors

Specify what to extract from matched elements:
  • text - Element text without HTML tags
  • html - Full inner HTML content
  • [attribute] - Specific attributes (e.g., [href], [src], [data-id])
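To see what each extractor yields for the same element, here is a local sketch using Python's standard `xml.etree.ElementTree` on a small, well-formed snippet. Real pages are rarely well-formed XML, so treat this purely as an illustration of the three extractor kinds; the `extract` helper is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Optional


def extract(element: ET.Element, extractor: str) -> Optional[str]:
    """Apply a text, html, or [attribute] extractor to one parsed element."""
    if extractor == "text":
        # All descendant text with the tags stripped.
        return "".join(element.itertext())
    if extractor == "html":
        # Inner HTML: leading text, then each serialized child followed by its trailing text.
        parts = [element.text or ""]
        for child in element:
            bare = copy.copy(child)
            bare.tail = None  # serialize the child alone, then append its tail exactly once
            parts.append(ET.tostring(bare, encoding="unicode"))
            parts.append(child.tail or "")
        return "".join(parts)
    if extractor.startswith("[") and extractor.endswith("]"):
        # Attribute value, or None when the attribute is absent.
        return element.get(extractor[1:-1])
    raise ValueError(f"unsupported extractor: {extractor!r}")


link = ET.fromstring(
    '<a class="product-link" href="/product/wireless-headphones">Wireless <b>Headphones</b> (new)</a>'
)
```

Here `text` gives `Wireless Headphones (new)`, `html` keeps the `<b>` tag, and `[href]` returns only the link target.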

Usage

Basic extraction

Extract a single field:
from nimble import Nimble
from pydantic import BaseModel

class Product(BaseModel):
    name: str

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.extract(
    url="https://www.example.com/product",
    schema=Product,
    parser={
        "name": {
            "selector": ".product-name",
            "extractor": "text"
        }
    }
)

print(result)

Object extraction

Extract multiple fields from a single element:
from nimble import Nimble
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    rating: float
    url: str

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.extract(
    url="https://www.example.com/product",
    schema=Product,
    parser={
        # Build one object from the .product-card element using the fields below
        "type": "object",
        "selector": ".product-card",
        "fields": {
            "name": {
                "selector": ".product-name",
                "extractor": "text"
            },
            "price": {
                "selector": ".price",
                "extractor": "text"
            },
            "rating": {
                "selector": ".rating",
                "extractor": "text"
            },
            "url": {
                "selector": "a.product-link",
                "extractor": "[href]"
            }
        }
    }
)

print(result)

List extraction

Extract multiple items with the same structure:
from nimble import Nimble
from pydantic import BaseModel
from typing import List

class Product(BaseModel):
    name: str
    price: float

class ProductList(BaseModel):
    products: List[Product]

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.extract(
    url="https://www.example.com/products",
    schema=ProductList,
    parser={
        "products": {
            "type": "object-list",  # one object per matched .product-item
            "selector": ".product-item",
            "fields": {
                "name": {
                    "selector": ".product-name",
                    "extractor": "text"
                },
                "price": {
                    "selector": ".price",
                    "extractor": "text"
                }
            }
        }
    }
)

print(result)

Table extraction

Convert HTML tables to JSON:
from nimble import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.extract(
    url="https://www.example.com/table",
    parser={
        "data": {
            "type": "table",  # header cells become the keys of each row object
            "selector": "table.data-table"
        }
    }
)

print(result)
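The "headers as keys" rule for `table` can be sketched with the standard library's `html.parser`. This assumes the first row holds the headers, every cell is explicitly closed, and tables are not nested; the class and function names are hypothetical, and Nimble's exact output for tables may differ.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class TableCells(HTMLParser):
    """Collect the text of each th/td cell, grouped by table row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td") and self.rows:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_records(html: str) -> List[Dict[str, str]]:
    """Use the first row as headers and map every later row onto those keys."""
    parser = TableCells()
    parser.feed(html)
    parser.close()
    rows = [row for row in parser.rows if row]
    if not rows:
        return []
    headers, *body = rows
    return [dict(zip(headers, row)) for row in body]
```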

Fallback selectors

Provide multiple selectors for robustness:
from nimble import Nimble
from pydantic import BaseModel

class Product(BaseModel):
    price: float

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.extract(
    url="https://www.example.com/product",
    schema=Product,
    parser={
        "price": {
            # Tried in order; the first selector that matches is used
            "selectors": [".price-new", ".price", "[data-price]"],
            "extractor": "text"
        }
    }
)

print(result)
Nimble tries each selector in order and uses the first match. This creates fallback logic for varying page structures.
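The fallback behavior amounts to a short loop. In this sketch, `query` is a hypothetical stand-in for whatever runs one CSS selector against the page and returns matched values in document order; the fake page is just a dict from selector to matches.

```python
from typing import Callable, List, Optional, Sequence


def first_match(selectors: Sequence[str], query: Callable[[str], List[str]]) -> Optional[str]:
    """Try selectors in order and return the first value of the first one that matches."""
    for selector in selectors:
        matches = query(selector)
        if matches:
            return matches[0]
    return None  # no selector matched, so the field comes back empty


# Usage with a fake page where only the older .price class is present.
page = {".price": ["79.99"]}
price = first_match([".price-new", ".price", "[data-price]"], lambda s: page.get(s, []))
```

Put the selector you trust most first; later entries only run when every earlier one matches nothing.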

Example response

When parsing completes successfully, you receive structured data matching your schema. The response includes:
  • data: All extracted data, including the raw page HTML
    • data.parsing: Structured JSON matching your schema
  • metadata: Execution details, including the task ID, country, driver used, and execution time
{
  "status": "success",
  "data": {
    "html": "<!DOCTYPE html><html>...</html>",
    "parsing": {
      "products": [
        {
          "name": "Wireless Headphones",
          "price": 79.99,
          "rating": 4.5,
          "url": "/product/wireless-headphones",
          "in_stock": true
        },
        {
          "name": "Bluetooth Speaker",
          "price": 49.99,
          "rating": 4.2,
          "url": "/product/bluetooth-speaker",
          "in_stock": true
        }
      ]
    }
  },
  "metadata": {
    "task_id": ".....",
    "country": "US",
    "driver": "vx14",
    "execution_time_ms": 2100
  }
}
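If you handle the response as raw JSON text, reading the parsed output means checking `status` and descending into `data.parsing`. A minimal sketch, assuming the body has exactly the shape of the example response; the Python SDK's `result` object may expose this differently, failure payloads are not documented here, and the helper names are hypothetical.

```python
import json
from typing import Any, Dict


def read_parsing(raw_body: str) -> Dict[str, Any]:
    """Return data.parsing from a response body shaped like the example response."""
    response = json.loads(raw_body)
    if response.get("status") != "success":
        # Surface whatever status came back rather than guessing at an error format.
        raise RuntimeError(f"extraction did not succeed: status={response.get('status')!r}")
    return response.get("data", {}).get("parsing", {})


def describe_run(raw_body: str) -> str:
    """Summarize the metadata block, for example for logging."""
    meta = json.loads(raw_body).get("metadata", {})
    return f"task {meta.get('task_id')} via {meta.get('driver')} in {meta.get('execution_time_ms')} ms"
```

The key under `parsing` (here `products`) comes from your parser definition, so look it up by the name you chose.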

Best practices

Selector specificity

Use specific selectors:
✅ ".product-card .price-value"
❌ ".price"
Combine classes and attributes:
✅ "button[data-action='add-to-cart']"
❌ "button"