LLM mode enables intelligent data extraction using AI. Describe what data you need in natural language or provide a loose schema, and the AI dynamically analyzes each page to extract and structure information - adapting to variations without schema maintenance.
When to use
Use LLM mode when you need:
- Zero setup: No CSS selector configuration required
- Self-healing: Automatically adapts when pages change
- Flexible extraction: Works across varying page structures
- Complex data: AI understands context and relationships
LLM mode uses the vx14 driver and incurs additional token consumption costs. This is more expensive than schema mode but requires zero maintenance when pages change.
How it works
- Define what you need: Provide a schema model and/or natural language prompt
- AI analyzes the page: The LLM understands page structure and identifies relevant data
- Dynamic extraction: Data is extracted and structured automatically
- Adaptive behavior: Works across different page layouts and structures
The AI analyzes each page individually, adapting to its specific structure. You can provide a schema model for structure, a prompt for guidance, or both.
Supported parameters
Available in: Extract.
| Parameter | Type | Description | Default |
|---|---|---|---|
| parse | Boolean | Enable or disable HTML parsing (must be true for LLM mode) | false |
| schema | Object | Set the desired output JSON schema | - |
| schema_prompt | String | Describe the required parsing with a natural language prompt | - |
Usage
With schema model
Define your data structure and let AI find the data:
```python
from nimble import Nimble
from pydantic import BaseModel

class ProductInfo(BaseModel):
    name: str
    price: float
    rating: float
    in_stock: bool

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.extract(
    url="https://www.example.com/product",
    schema=ProductInfo
)
print(result)
```
With natural language prompt
Describe what to extract without defining structure:
```python
from nimble import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.extract(
    url="https://www.example.com/product",
    schema_prompt="Extract product details including name, price, customer ratings, and stock availability"
)
print(result)
```
Combining schema and prompt
Provide both structure and guidance for best results:
```python
from nimble import Nimble
from pydantic import BaseModel

class ProductInfo(BaseModel):
    name: str
    price: float
    rating: float
    reviews_count: int
    in_stock: bool

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.extract(
    url="https://www.example.com/product",
    schema=ProductInfo,
    schema_prompt="Extract product information. For rating, use the average customer rating. Count total number of reviews."
)
print(result)
```
Extracting multiple items
Extract multiple items with AI intelligence:
```python
from nimble import Nimble
from pydantic import BaseModel
from typing import List

class Product(BaseModel):
    name: str
    price: float
    rating: float

class ProductList(BaseModel):
    products: List[Product]

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.extract(
    url="https://www.example.com/products",
    schema=ProductList,
    schema_prompt="Extract all products visible on the page with their names, prices, and ratings"
)
print(result)
```
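If the SDK returns the parsed payload in the shape of the schema, you can re-validate it locally with the same Pydantic models. The payload below is hypothetical sample data for illustration, not real API output:

```python
from typing import List
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    rating: float

class ProductList(BaseModel):
    products: List[Product]

# Hypothetical payload shaped like the documented data.parsing field.
payload = {
    "products": [
        {"name": "Wireless Headphones", "price": 79.99, "rating": 4.5},
        {"name": "Bluetooth Speaker", "price": 49.99, "rating": 4.2},
    ]
}

# Raises a ValidationError if the AI output drifts from the schema.
catalog = ProductList.model_validate(payload)
print(len(catalog.products), catalog.products[0].name)
```

Re-validating gives you typed objects and an early failure signal instead of silently passing malformed data downstream.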
Complex data relationships
Let AI handle intricate data relationships:
```python
from nimble import Nimble

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.extract(
    url="https://www.example.com/article",
    schema_prompt="Extract the article title, author name, publication date, main content, and all image URLs. Also extract any related articles mentioned with their titles and links."
)
print(result)
```
Combining with browser actions
Use AI extraction after dynamic interactions:
```python
from nimble import Nimble
from pydantic import BaseModel
from typing import List

class Product(BaseModel):
    name: str
    price: float

class ProductList(BaseModel):
    products: List[Product]

nimble = Nimble(api_key="YOUR-API-KEY")

result = nimble.extract({
    "url": "https://www.example.com",
    "render": True,
    "browser_actions": [
        {
            "infinite_scroll": {
                "duration": 10000
            }
        }
    ],
    "schema": ProductList,
    "schema_prompt": "Extract all loaded products with names and prices"
})
print(result)
```
Example response
When LLM parsing completes, you receive structured data with AI execution details. The response includes:
- data: All related extracted data
- data.parsing: Extracted data matching your schema or prompt
- metadata: Execution details including task ID, driver used, execution time, and more
```json
{
  "status": "success",
  "data": {
    "html": "<!DOCTYPE html><html>...</html>",
    "parsing": {
      "name": "Wireless Bluetooth Headphones",
      "price": 79.99,
      "rating": 4.5,
      "reviews_count": 1234,
      "in_stock": true,
      "features": [
        "Active Noise Cancellation",
        "30-hour battery life",
        "Quick charge technology"
      ]
    }
  },
  "metadata": {
    "task_id": ".....",
    "country": "US",
    "driver": "vx14",
    "execution_time_ms": 2100
  }
}
```
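A minimal sketch of consuming a response with this shape. The dict below copies the example above, and dict-style access is an assumption about the SDK's return type:

```python
# Response dict copied from the example above (abridged).
response = {
    "status": "success",
    "data": {
        "html": "<!DOCTYPE html><html>...</html>",
        "parsing": {
            "name": "Wireless Bluetooth Headphones",
            "price": 79.99,
            "rating": 4.5,
            "reviews_count": 1234,
            "in_stock": True,
        },
    },
    "metadata": {"driver": "vx14", "execution_time_ms": 2100},
}

if response["status"] == "success":
    parsed = response["data"]["parsing"]     # data matching your schema/prompt
    driver = response["metadata"]["driver"]  # vx14 for LLM mode
    print(parsed["name"], parsed["price"], driver)
```

Checking `status` before reading `data.parsing` avoids KeyErrors on failed requests.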
Best practices
Writing effective prompts
Be specific about requirements:
✅ "Extract product name, current price (not original price), average rating, and stock status"
❌ "Get product info"
Describe data relationships:
✅ "Extract all review comments with their associated ratings and dates"
❌ "Get reviews"
Clarify ambiguities:
✅ "For price, use the discounted price if available, otherwise use the regular price"
❌ "Extract price"
Schema design
Use descriptive field names:
```python
class Product(BaseModel):
    current_price: float  # ✅ Clear intent
    price: float          # ❌ Ambiguous
```
Provide context with prompts:
```python
result = nimble.extract(
    url="https://www.example.com/product",
    schema=Product,
    schema_prompt="Extract current selling price (not MSRP)"  # ✅ Clarifies which price
)
```
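Pydantic field descriptions are another place to put per-field context, since they become part of the model's JSON schema. Whether the extractor forwards these descriptions to the LLM is an assumption here; schema_prompt is the documented mechanism for adding guidance:

```python
from pydantic import BaseModel, Field

class Product(BaseModel):
    # Field descriptions end up in the generated JSON schema.
    current_price: float = Field(description="Discounted price if shown, otherwise the regular price")
    rating: float = Field(description="Average customer rating on a 0-5 scale")

# Inspect the schema that would be serialized for this model.
schema = Product.model_json_schema()
print(schema["properties"]["current_price"]["description"])
```

Even if the descriptions are not forwarded, they document field intent for anyone maintaining the schema.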
Pricing
LLM mode costs include:
- vx14 driver usage: Higher tier driver for AI capabilities
- Token consumption: Based on page size and prompt length
- API call: Standard request fee
Typical costs are 3-5x higher than schema mode's, but LLM mode eliminates maintenance when pages change.