Batch Processing

Process multiple URLs simultaneously for efficient large-scale data extraction.

Basic Batch Request

Endpoint

POST https://api.nimbleway.com/v1/batch

Simple Batch

curl -X POST "https://api.nimbleway.com/v1/batch" \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example1.com",
      "https://example2.com", 
      "https://example3.com"
    ],
    "delivery_method": "webhook",
    "webhook_url": "https://your-app.com/webhook"
  }'
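If you are not using an SDK, you can assemble the same request with Python's standard library. This is a minimal sketch. `build_batch_request` is a hypothetical helper, not part of any Nimble SDK, and it only builds the request object. Actually sending the request needs network access and a real API key.

```python
import json
import urllib.request

BATCH_ENDPOINT = "https://api.nimbleway.com/v1/batch"


def build_batch_request(api_key, urls, webhook_url):
    """Build (but do not send) a POST request for a simple webhook batch.

    Hypothetical helper mirroring the curl example above.
    """
    body = {
        "urls": list(urls),
        "delivery_method": "webhook",
        "webhook_url": webhook_url,
    }
    return urllib.request.Request(
        BATCH_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_batch_request(
    "your-api-key",
    ["https://example1.com", "https://example2.com", "https://example3.com"],
    "https://your-app.com/webhook",
)

# To submit it (requires network access and a valid key):
# with urllib.request.urlopen(req) as resp:
#     batch = json.load(resp)
```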

Response

{
  "batch_id": "batch_123456",
  "status": "queued",
  "total_urls": 3,
  "estimated_completion": "2024-01-15T10:35:00Z",
  "urls": [
    {
      "url": "https://example1.com",
      "job_id": "job_001",
      "status": "queued"
    },
    {
      "url": "https://example2.com", 
      "job_id": "job_002",
      "status": "queued"
    },
    {
      "url": "https://example3.com",
      "job_id": "job_003", 
      "status": "queued"
    }
  ]
}
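Each URL receives its own `job_id`, and later webhook payloads and failure reports refer to jobs by that ID. It is therefore worth keeping the URL-to-job mapping from this response. Below is a minimal sketch that parses the response shape shown above. The helper name `job_ids_by_url` is illustrative.

```python
import json

# Submission response in the shape shown above, trimmed to two URLs.
response = json.loads("""
{
  "batch_id": "batch_123456",
  "status": "queued",
  "total_urls": 2,
  "urls": [
    {"url": "https://example1.com", "job_id": "job_001", "status": "queued"},
    {"url": "https://example2.com", "job_id": "job_002", "status": "queued"}
  ]
}
""")


def job_ids_by_url(batch_response):
    """Map each submitted URL to the job_id the batch assigned it."""
    return {entry["url"]: entry["job_id"] for entry in batch_response["urls"]}


jobs = job_ids_by_url(response)
print(jobs["https://example2.com"])  # job_002
```

If the same URL appears twice in one batch, this dictionary keeps only the last job. In that case, key by `job_id` instead.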

Advanced Batch Configuration

Per-URL Settings

{
  "requests": [
    {
      "url": "https://ecommerce1.com/products",
      "country": "US",
      "render": true,
      "actions": [
        {
          "type": "wait_and_click",
          "selector": ".load-more"
        }
      ]
    },
    {
      "url": "https://ecommerce2.com/products",
      "country": "UK", 
      "format": "json",
      "headers": {
        "Accept": "application/json"
      }
    }
  ],
  "delivery_method": "s3",
  "s3_config": {
    "bucket": "my-scraping-results",
    "key_prefix": "batch_{{batch_id}}/"
  }
}

Batch-Level Settings

{
  "urls": ["https://site1.com", "https://site2.com"],
  "global_settings": {
    "timeout": 30,
    "retries": 2,
    "render": false,
    "country": "US"
  },
  "concurrency": 10,
  "rate_limit": "5/second"
}
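The `rate_limit` string caps how quickly requests start, regardless of `concurrency`. That puts a rough floor under a batch's duration. The sketch below assumes the `<count>/<unit>` format seen in `"5/second"`. Only `second` appears on this page, so the `minute` and `hour` units are assumptions.

```python
import math

# Seconds per unit. "minute" and "hour" are assumed, not documented here.
_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}


def parse_rate_limit(rate_limit):
    """Turn a string like "5/second" into requests per second."""
    count, unit = rate_limit.split("/")
    return int(count) / _UNIT_SECONDS[unit.strip()]


def min_duration_seconds(num_urls, rate_limit):
    """Rough minimum wall-clock time imposed by the rate limit alone.

    Ignores page load time, retries and concurrency, so real batches take longer.
    """
    return math.ceil(num_urls / parse_rate_limit(rate_limit))


print(min_duration_seconds(200, "5/second"))   # 40
print(min_duration_seconds(200, "10/second"))  # 20
```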

Batch Parameters

Required Parameters

Parameter	Type	Description
`urls` or `requests`	array	List of URLs or detailed request objects

Optional Parameters

Parameter	Type	Default	Description
`delivery_method`	string	`webhook`	How results are delivered, e.g. `webhook` or `s3`
`concurrency`	integer	`5`	Number of simultaneous requests
`rate_limit`	string	`10/second`	Maximum request rate, e.g. `5/second`
`priority`	string	`normal`	Batch priority: `low`, `normal`, `high`
`tags`	array	`[]`	Tags for organizing batches
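Below is a client-side sketch that fills in the defaults from the table above and catches obvious mistakes before submission. `prepare_batch` is a hypothetical helper, and the API remains the authority on validation. Treating `urls` and `requests` as mutually exclusive is an assumption based on the "`urls` or `requests`" wording.

```python
import copy

# Defaults taken from the Optional Parameters table.
DEFAULTS = {
    "delivery_method": "webhook",
    "concurrency": 5,
    "rate_limit": "10/second",
    "priority": "normal",
    "tags": [],
}
PRIORITIES = {"low", "normal", "high"}


def prepare_batch(payload):
    """Return a copy of payload with documented defaults filled in."""
    # Assumption: exactly one of "urls" or "requests" is expected.
    if ("urls" in payload) == ("requests" in payload):
        raise ValueError("provide exactly one of 'urls' or 'requests'")
    if not payload.get("urls", payload.get("requests")):
        raise ValueError("batch must contain at least one URL")
    # Deep-copy the defaults so callers cannot mutate the shared tags list.
    result = copy.deepcopy(DEFAULTS)
    result.update(payload)
    if result["priority"] not in PRIORITIES:
        raise ValueError(f"unknown priority: {result['priority']!r}")
    return result
```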

Monitoring Batch Progress

Check Batch Status

curl -X GET "https://api.nimbleway.com/v1/batch/batch_123456" \
  -H "Authorization: Bearer your-api-key"

Response

{
  "batch_id": "batch_123456",
  "status": "processing",
  "progress": {
    "completed": 150,
    "failed": 5,
    "pending": 45,
    "total": 200
  },
  "started_at": "2024-01-15T10:30:00Z",
  "estimated_completion": "2024-01-15T10:45:00Z"
}
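For large batches, a simple loop that polls the status endpoint is often enough. It keeps polling while the batch is `queued` or `processing`. The sketch below injects the fetch function so the loop can be exercised without network access. Treating every other status as final is an assumption based on the statuses shown on this page. `wait_for_batch` and `progress_fraction` are illustrative names.

```python
import time

# Batch statuses this page shows for a batch that is still running.
IN_PROGRESS = {"queued", "processing"}


def progress_fraction(status):
    """Share of URLs that have finished, successfully or not."""
    p = status["progress"]
    return (p["completed"] + p["failed"]) / p["total"]


def wait_for_batch(fetch_status, poll_seconds=10.0, sleep=time.sleep):
    """Poll until the batch leaves the queued/processing states.

    fetch_status: zero-argument callable returning the parsed status JSON,
    e.g. a hypothetical wrapper around GET /v1/batch/{batch_id}.
    """
    while True:
        status = fetch_status()
        if status["status"] not in IN_PROGRESS:
            return status
        # A freshly queued batch may not report progress yet.
        if "progress" in status:
            print(f"{progress_fraction(status):.1%} of URLs finished")
        sleep(poll_seconds)
```

Using the example response above, `progress_fraction` counts both completed and failed URLs: (150 + 5) / 200 = 0.775.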

Individual Job Status

curl -X GET "https://api.nimbleway.com/v1/jobs/job_001" \
  -H "Authorization: Bearer your-api-key"

Batch Results

Webhook Delivery

Results are delivered individually as each URL completes:

{
  "batch_id": "batch_123456",
  "job_id": "job_001",
  "url": "https://example1.com",
  "status": "completed",
  "html": "<!DOCTYPE html>...",
  "completed_at": "2024-01-15T10:32:15Z"
}
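A webhook receiver only needs to accept each payload and file it by `job_id`. The sketch below uses Python's standard `http.server`. It assumes results arrive as a JSON body on a POST request, which this page does not state explicitly. `record_result`, `BatchWebhookHandler`, and the in-memory `RESULTS` dict are illustrative names.

```python
import json
from http.server import BaseHTTPRequestHandler

# In-memory store keyed by job_id; replace with durable storage in practice.
RESULTS = {}


def record_result(raw_body, store):
    """Parse one webhook payload and store it under its job_id.

    Keying by job_id means a payload received twice overwrites itself.
    """
    payload = json.loads(raw_body)
    store[payload["job_id"]] = payload
    return payload["job_id"]


class BatchWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        record_result(self.rfile.read(length), RESULTS)
        # Acknowledge quickly; do heavy processing elsewhere.
        self.send_response(200)
        self.end_headers()


# To serve: http.server.HTTPServer(("0.0.0.0", 8000), BatchWebhookHandler).serve_forever()
```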

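S3 Delivery Structure

Each job's result is written as a separate JSON file under the batch's key prefix, alongside a `batch_summary.json` file: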

my-bucket/
├── batch_123456/
│   ├── job_001.json
│   ├── job_002.json
│   ├── job_003.json
│   └── batch_summary.json
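With this layout, you can check whether every job has landed in the bucket by comparing expected keys against a bucket listing. One source for that listing is boto3's `list_objects_v2` with a `Prefix`. The sketch below works on a plain list of keys. It assumes the `<prefix><job_id>.json` naming shown in the tree above. `missing_s3_keys` is a hypothetical helper.

```python
def missing_s3_keys(prefix, job_ids, existing_keys):
    """Return expected result keys that are not yet present under prefix.

    Expected layout (from the tree above): <prefix><job_id>.json per job,
    plus <prefix>batch_summary.json.
    """
    expected = [f"{prefix}{job_id}.json" for job_id in job_ids]
    expected.append(f"{prefix}batch_summary.json")
    present = set(existing_keys)
    return [key for key in expected if key not in present]


listing = ["batch_123456/job_001.json", "batch_123456/job_003.json"]
print(missing_s3_keys("batch_123456/", ["job_001", "job_002", "job_003"], listing))
```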

Polling Results

# Get all completed results
curl -X GET "https://api.nimbleway.com/v1/batch/batch_123456/results" \
  -H "Authorization: Bearer your-api-key"

Error Handling

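Partial Failures

When some URLs fail, the batch finishes with status `completed_with_errors`. The response lists each failed URL with its error code and job ID: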

{
  "batch_id": "batch_123456",
  "status": "completed_with_errors",
  "results": {
    "successful": 180,
    "failed": 20,
    "total": 200
  },
  "failed_urls": [
    {
      "url": "https://timeout-site.com",
      "error": "TIMEOUT_EXCEEDED",
      "job_id": "job_105"
    },
    {
      "url": "https://blocked-site.com", 
      "error": "ACCESS_DENIED",
      "job_id": "job_147"
    }
  ]
}

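Retry Failed URLs

Resubmit selected failed jobs, optionally overriding settings such as the timeout or country:

curl -X POST "https://api.nimbleway.com/v1/batch/batch_123456/retry" \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \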
  -d '{
    "job_ids": ["job_105", "job_147"],
    "retry_settings": {
      "timeout": 60,
      "country": "CA"
    }
  }'
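You can build the retry body directly from the partial-failure response. This sketch selects failed jobs by error code. Which errors are worth retrying, and with what overrides, is a judgment call this page does not specify. The default set below simply mirrors the example, which retries both a timeout and an access denial. `build_retry_payload` is a hypothetical helper.

```python
import json

# Error codes retried by default; adjust to your own policy.
RETRYABLE = frozenset({"TIMEOUT_EXCEEDED", "ACCESS_DENIED"})


def build_retry_payload(summary, retry_settings, retryable=RETRYABLE):
    """Build a body for POST /v1/batch/{batch_id}/retry.

    summary: parsed batch response containing a "failed_urls" list.
    """
    job_ids = [
        failure["job_id"]
        for failure in summary.get("failed_urls", [])
        if failure["error"] in retryable
    ]
    return {"job_ids": job_ids, "retry_settings": dict(retry_settings)}


summary = {
    "failed_urls": [
        {"url": "https://timeout-site.com", "error": "TIMEOUT_EXCEEDED", "job_id": "job_105"},
        {"url": "https://blocked-site.com", "error": "ACCESS_DENIED", "job_id": "job_147"},
    ]
}
# Retry only the timeouts, giving them more time.
print(json.dumps(build_retry_payload(summary, {"timeout": 60}, {"TIMEOUT_EXCEEDED"})))
```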

Use Cases

E-commerce Price Monitoring

{
  "requests": [
    {
      "url": "https://shop1.com/product/123",
      "actions": [
        {"type": "wait_for_selector", "selector": ".price"}
      ]
    },
    {
      "url": "https://shop2.com/item/456", 
      "actions": [
        {"type": "wait_and_click", "selector": ".show-price"}
      ]
    }
  ],
  "delivery_method": "webhook",
  "tags": ["price-monitoring", "daily-check"]
}

SEO Content Analysis

{
  "urls": [
    "https://competitor1.com/blog",
    "https://competitor2.com/resources",
    "https://competitor3.com/guides"
  ],
  "global_settings": {
    "render": true,
    "actions": [
      {"type": "scroll", "direction": "down"},
      {"type": "wait", "duration": 3000}
    ]
  },
  "tags": ["seo-analysis", "content-audit"]
}

Data Migration

{
  "requests": [
    /* 1000+ URLs with different configurations */
  ],
  "concurrency": 20,
  "delivery_method": "s3",
  "s3_config": {
    "bucket": "migration-data",
    "key_prefix": "scraped_content/{{date}}/"
  },
  "priority": "high"
}

Performance Optimization

Optimal Concurrency

Low concurrency (1-5): Gentle on target sites and less likely to trigger their rate limits
Medium concurrency (6-15): Balanced speed and stability
High concurrency (16-50): Maximum speed for robust targets

Resource Management

{
  "urls": ["https://heavy-site1.com", "https://heavy-site2.com"],
  "global_settings": {
    "timeout": 120,
    "render": false
  },
  "concurrency": 3,
  "memory_limit": "high"
}

SDK Examples

Node.js

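// Assumes `client` is an already-initialized Nimble SDK client and that this
// code runs inside an async function (or an ES module with top-level await).
// Submit batch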
const batch = await client.batch({
  urls: [
    'https://site1.com',
    'https://site2.com', 
    'https://site3.com'
  ],
  delivery_method: 'webhook',
  webhook_url: 'https://your-app.com/webhook',
  concurrency: 10
});

// Monitor progress
const status = await client.getBatchStatus(batch.batch_id);
console.log(`Progress: ${status.progress.completed}/${status.progress.total}`);

Python

# Assumes `client` is an already-initialized Nimble SDK client.
# Submit batch with per-URL settings
batch = client.batch({
    'requests': [
        {
            'url': 'https://site1.com',
            'render': True,
            'country': 'US'
        },
        {
            'url': 'https://site2.com',
            'format': 'json',
            'country': 'UK'
        }
    ],
    'delivery_method': 's3',
    's3_config': {
        'bucket': 'my-results',
        'key_prefix': 'batch_{{batch_id}}/'
    }
})

# Check results
results = client.get_batch_results(batch['batch_id'])

Rate Limits & Pricing

Rate Limits

Plan	Concurrent Batches	URLs per Batch	Max Concurrency
Starter	1	100	5
Professional	5	1,000	20
Enterprise	Unlimited	10,000+	100+
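Jobs larger than your plan's URLs-per-batch limit, such as the data-migration case above, need to be split across several batches. Below is a minimal sketch using the limits from the table. Enterprise is listed as "10,000+", so 10,000 is used here only as a conservative floor. `chunk_urls` is a hypothetical helper.

```python
# URLs-per-batch limits from the Rate Limits table.
MAX_URLS_PER_BATCH = {"starter": 100, "professional": 1_000, "enterprise": 10_000}


def chunk_urls(urls, plan):
    """Split a URL list into consecutive chunks no larger than the plan's limit."""
    size = MAX_URLS_PER_BATCH[plan]
    return [urls[i:i + size] for i in range(0, len(urls), size)]


urls = [f"https://site{i}.com" for i in range(250)]
print([len(chunk) for chunk in chunk_urls(urls, "starter")])  # [100, 100, 50]
```

Starter allows only one concurrent batch, so on that plan submit the chunks one after another and wait for each to finish. Higher plans can run several chunks at once, up to their concurrent-batch limit.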

Pricing

Batch processing fee: $0.01 per batch
Per-URL pricing: Same as individual requests
Priority processing: +25% for high priority batches
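Below is a rough cost estimate for one batch, written as a worked version of the list above. The per-URL price depends on your plan, so it is passed in as a parameter. This sketch assumes the +25% high-priority surcharge applies to the per-URL charges only, not to the $0.01 batch fee; the list does not say which. `estimate_batch_cost` is a hypothetical helper, and `Decimal` keeps the currency arithmetic exact.

```python
from decimal import Decimal

BATCH_FEE = Decimal("0.01")                 # per batch
HIGH_PRIORITY_MULTIPLIER = Decimal("1.25")  # +25% for high priority (assumed on per-URL charges)


def estimate_batch_cost(num_urls, per_url_price, priority="normal"):
    """Estimate one batch's cost; pass per_url_price as a string, e.g. "0.001"."""
    url_cost = Decimal(num_urls) * Decimal(per_url_price)
    if priority == "high":
        url_cost *= HIGH_PRIORITY_MULTIPLIER
    return BATCH_FEE + url_cost
```

For example, 1,000 URLs at a hypothetical $0.001 each comes to $0.01 + $1.00 = $1.01. At high priority, under the assumption above, it comes to $0.01 + $1.25 = $1.26.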

Best Practices

Group similar URLs for optimal performance
Use appropriate concurrency for target sites
Implement proper error handling for partial failures
Monitor batch progress for large operations
Tag batches for easy organization and filtering
Set realistic timeouts based on target complexity
