Batch Processing

Process multiple URLs simultaneously for efficient large-scale data extraction.

Basic Batch Request

Endpoint

POST https://api.nimbleway.com/v1/batch

Simple Batch

curl -X POST "https://api.nimbleway.com/v1/batch" \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example1.com",
      "https://example2.com", 
      "https://example3.com"
    ],
    "delivery_method": "webhook",
    "webhook_url": "https://your-app.com/webhook"
  }'
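If you are not using an SDK, you can build the same request with Python's standard library. This is a minimal sketch that mirrors the curl example above. The helper names build_batch_request and submit_batch are illustrative and are not part of any Nimble SDK. Submitting requires network access and a valid API key.

```python
import json
import urllib.request

API_BASE = "https://api.nimbleway.com/v1"


def build_batch_request(api_key: str, urls: list, webhook_url: str) -> urllib.request.Request:
    """Build (without sending) a POST /v1/batch request equivalent to the curl example."""
    payload = {
        "urls": urls,
        "delivery_method": "webhook",
        "webhook_url": webhook_url,
    }
    return urllib.request.Request(
        f"{API_BASE}/batch",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_batch(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())
```

Calling submit_batch(build_batch_request("your-api-key", [...], "https://your-app.com/webhook")) should return a body shaped like the response below. The request is built separately from the call that sends it, so you can inspect it before anything goes over the network.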

Response

{
  "batch_id": "batch_123456",
  "status": "queued",
  "total_urls": 3,
  "estimated_completion": "2024-01-15T10:35:00Z",
  "urls": [
    {
      "url": "https://example1.com",
      "job_id": "job_001",
      "status": "queued"
    },
    {
      "url": "https://example2.com", 
      "job_id": "job_002",
      "status": "queued"
    },
    {
      "url": "https://example3.com",
      "job_id": "job_003", 
      "status": "queued"
    }
  ]
}
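Keep the job_id for each URL. You will need it to match later webhook deliveries to their URLs or to query a single job. The minimal sketch below indexes the response above. It assumes each URL appears only once per batch, because duplicates would overwrite each other in the mapping. job_ids_by_url is a hypothetical helper.

```python
import json

# The response shown above, trimmed to the fields used here.
SAMPLE_RESPONSE = json.loads("""
{
  "batch_id": "batch_123456",
  "status": "queued",
  "total_urls": 3,
  "urls": [
    {"url": "https://example1.com", "job_id": "job_001", "status": "queued"},
    {"url": "https://example2.com", "job_id": "job_002", "status": "queued"},
    {"url": "https://example3.com", "job_id": "job_003", "status": "queued"}
  ]
}
""")


def job_ids_by_url(batch_response: dict) -> dict:
    """Map each submitted URL to the job_id assigned to it in the batch response."""
    return {entry["url"]: entry["job_id"] for entry in batch_response.get("urls", [])}


print(job_ids_by_url(SAMPLE_RESPONSE)["https://example2.com"])  # job_002
```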

Advanced Batch Configuration

Per-URL Settings

{
  "requests": [
    {
      "url": "https://ecommerce1.com/products",
      "country": "US",
      "render": true,
      "actions": [
        {
          "type": "wait_and_click",
          "selector": ".load-more"
        }
      ]
    },
    {
      "url": "https://ecommerce2.com/products",
      "country": "UK", 
      "format": "json",
      "headers": {
        "Accept": "application/json"
      }
    }
  ],
  "delivery_method": "s3",
  "s3_config": {
    "bucket": "my-scraping-results",
    "key_prefix": "batch_{{batch_id}}/"
  }
}

Batch-Level Settings

{
  "urls": ["https://site1.com", "https://site2.com"],
  "global_settings": {
    "timeout": 30,
    "retries": 2,
    "render": false,
    "country": "US"
  },
  "concurrency": 10,
  "rate_limit": "5/second"
}
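Converting the rate_limit string to a number makes it easier to reason about. The sketch below rests on three assumptions:

- The value always has the form count/unit.
- The unit can be second, minute, or hour. Only second appears on this page, so minute and hour are guesses.
- Requests are spaced evenly at the limit.

Under these assumptions, the last of N requests cannot start before (N - 1) / rate seconds.

```python
# Seconds per rate-limit unit. Only "second" appears on this page;
# "minute" and "hour" are assumed for illustration.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}


def parse_rate_limit(rate_limit: str) -> float:
    """Convert a rate_limit string such as "5/second" into requests per second."""
    count_text, _, unit = rate_limit.partition("/")
    count = float(count_text)  # raises ValueError for a non-numeric count
    if unit not in UNIT_SECONDS or count <= 0:
        raise ValueError(f"unsupported rate_limit: {rate_limit!r}")
    return count / UNIT_SECONDS[unit]


def earliest_last_start(num_urls: int, rate_limit: str) -> float:
    """Seconds until the last request can start, if requests are spaced evenly at the limit."""
    return max(num_urls - 1, 0) / parse_rate_limit(rate_limit)
```

For example, 200 URLs at "5/second" give 199 / 5 = 39.8 seconds before the last request can start under the even-spacing assumption, no matter how high concurrency is set.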

Batch Parameters

Required Parameters

| Parameter | Type | Description |
|---|---|---|
| urls or requests | array | Either urls (a list of URL strings) or requests (a list of request objects with per-URL settings) |

Optional Parameters

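| Parameter | Type | Default | Description |
|---|---|---|---|
| delivery_method | string | webhook | How results are delivered (for example webhook or s3) |
| concurrency | integer | 5 | Number of requests processed simultaneously |
| rate_limit | string | 10/second | Maximum request rate (for example 5/second) |
| priority | string | normal | Batch priority: low, normal, or high |
| tags | array | [] | Tags for organizing and filtering batches |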
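The minimal sketch below assembles a request body. It enforces the rule that a batch has either urls or requests, and it writes the documented defaults into the body explicitly. Requiring exactly one of the two fields, rather than allowing both, is an assumption. build_batch_payload is a hypothetical helper.

```python
import copy

# Defaults from the Optional Parameters table.
DEFAULTS = {
    "delivery_method": "webhook",
    "concurrency": 5,
    "rate_limit": "10/second",
    "priority": "normal",
    "tags": [],
}
PRIORITIES = {"low", "normal", "high"}


def build_batch_payload(*, urls=None, requests=None, **options) -> dict:
    """Return a batch body with exactly one of urls/requests plus explicit defaults."""
    if (urls is None) == (requests is None):
        raise ValueError("provide exactly one of 'urls' or 'requests'")
    payload = {"urls": list(urls)} if urls is not None else {"requests": list(requests)}
    # Deep-copy the defaults so callers never mutate the shared "tags" list.
    settings = {**copy.deepcopy(DEFAULTS), **options}
    if settings["priority"] not in PRIORITIES:
        raise ValueError(f"priority must be one of {sorted(PRIORITIES)}")
    if not isinstance(settings["concurrency"], int) or settings["concurrency"] < 1:
        raise ValueError("concurrency must be a positive integer")
    payload.update(settings)
    return payload
```

Sending the defaults explicitly should not change behavior if the server applies the same defaults. Doing so also makes every body self-describing in logs.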

Monitoring Batch Progress

Check Batch Status

curl -X GET "https://api.nimbleway.com/v1/batch/batch_123456" \
  -H "Authorization: Bearer your-api-key"

Response

{
  "batch_id": "batch_123456",
  "status": "processing",
  "progress": {
    "completed": 150,
    "failed": 5,
    "pending": 45,
    "total": 200
  },
  "started_at": "2024-01-15T10:30:00Z",
  "estimated_completion": "2024-01-15T10:45:00Z"
}
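The progress counters can be turned into percentages for dashboards or logs. This sketch counts completed plus failed jobs as finished. That matches the example (150 + 5 + 45 = 200), but it is an assumption about how the counters relate. progress_report is a hypothetical helper.

```python
def progress_report(status: dict) -> dict:
    """Derive percentages from the "progress" object of GET /v1/batch/{batch_id}."""
    progress = status["progress"]
    total = progress["total"]
    finished = progress["completed"] + progress["failed"]  # jobs no longer pending
    return {
        "percent_finished": 100.0 * finished / total if total else 0.0,
        "failure_rate": progress["failed"] / finished if finished else 0.0,
        "pending": progress["pending"],
    }


report = progress_report(
    {"status": "processing", "progress": {"completed": 150, "failed": 5, "pending": 45, "total": 200}}
)
print(f"{report['percent_finished']:.1f}% finished, {report['failure_rate']:.1%} failed")
```

With the response above, this prints "77.5% finished, 3.2% failed" (5 failures out of 155 finished jobs).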

Individual Job Status

curl -X GET "https://api.nimbleway.com/v1/jobs/job_001" \
  -H "Authorization: Bearer your-api-key"

Batch Results

Webhook Delivery

Results are delivered individually as each URL completes:

{
  "batch_id": "batch_123456",
  "job_id": "job_001",
  "url": "https://example1.com",
  "status": "completed",
  "html": "<!DOCTYPE html>...",
  "completed_at": "2024-01-15T10:32:15Z"
}
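Below is a minimal webhook receiver, sketched with Python's built-in http.server, that stores each delivery by job_id. It assumes deliveries arrive as HTTP POST requests with a JSON body shaped like the example above. This page does not describe how deliveries are authenticated, so the sketch performs no verification. A real deployment would also need authentication and persistent storage. The port and the in-memory RESULTS store are illustrative.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory store for illustration: job_id -> delivered payload.
RESULTS: dict = {}


def handle_delivery(payload: dict, store: dict) -> str:
    """Record one delivery keyed by job_id and return that job_id."""
    job_id = payload["job_id"]
    # If the same job_id arrives twice, the later payload replaces the earlier one.
    store[job_id] = payload
    return job_id


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            handle_delivery(json.loads(self.rfile.read(length)), RESULTS)
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, or a body that is not an object with a job_id.
            self.send_response(400)
        else:
            self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Listen for deliveries until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

The parsing and storage logic sits in handle_delivery, separate from the HTTP handler, so you can test it without starting a server.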

S3 Delivery Structure

my-bucket/
└── batch_123456/
    ├── job_001.json
    ├── job_002.json
    ├── job_003.json
    └── batch_summary.json

Polling Results

# Get all completed results
curl -X GET "https://api.nimbleway.com/v1/batch/batch_123456/results" \
  -H "Authorization: Bearer your-api-key"
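Rather than checking status by hand, a client can poll the status endpoint with exponential backoff and fetch the results once the batch finishes. This sketch makes two assumptions:

- completed and completed_with_errors (the final statuses shown on this page) are the only terminal states.
- The /results response is returned as-is, without interpreting its shape.

The fetch and sleep parameters let you test the loop without network access.

```python
import json
import time
import urllib.request

API_BASE = "https://api.nimbleway.com/v1"
# Final statuses shown on this page; treating them as the only terminal
# states is an assumption.
TERMINAL_STATUSES = {"completed", "completed_with_errors"}


def fetch_json(url: str, api_key: str) -> dict:
    """GET a URL with the bearer token and decode the JSON body."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())


def wait_for_results(batch_id, api_key, fetch=fetch_json, sleep=time.sleep,
                     initial_delay=2.0, max_delay=60.0, max_polls=100):
    """Poll batch status with exponential backoff, then return the results payload."""
    delay = initial_delay
    for _ in range(max_polls):
        status = fetch(f"{API_BASE}/batch/{batch_id}", api_key)
        if status["status"] in TERMINAL_STATUSES:
            return fetch(f"{API_BASE}/batch/{batch_id}/results", api_key)
        sleep(delay)
        delay = min(delay * 2, max_delay)  # double the wait, up to max_delay
    raise TimeoutError(f"batch {batch_id} still running after {max_polls} polls")
```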

Error Handling

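Partial Failures

If some URLs fail, the batch ends with status completed_with_errors. Each failure is listed in failed_urls with its URL, error code, and job_id: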

{
  "batch_id": "batch_123456",
  "status": "completed_with_errors",
  "results": {
    "successful": 180,
    "failed": 20,
    "total": 200
  },
  "failed_urls": [
    {
      "url": "https://timeout-site.com",
      "error": "TIMEOUT_EXCEEDED",
      "job_id": "job_105"
    },
    {
      "url": "https://blocked-site.com", 
      "error": "ACCESS_DENIED",
      "job_id": "job_147"
    }
  ]
}

Retry Failed URLs

curl -X POST "https://api.nimbleway.com/v1/batch/batch_123456/retry" \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "job_ids": ["job_105", "job_147"],
    "retry_settings": {
      "timeout": 60,
      "country": "CA"
    }
  }'
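When failures have different causes, you can retry each group with settings suited to that cause. The mapping below is a hypothetical policy modelled on the curl example, not documented API behavior:

- TIMEOUT_EXCEEDED gets a longer timeout.
- ACCESS_DENIED gets a different country.

The helper produces one retry body per error code. Send each body in its own POST to the retry endpoint.

```python
from collections import defaultdict

# Hypothetical policy: retry_settings to use for each error code.
RETRY_POLICY = {
    "TIMEOUT_EXCEEDED": {"timeout": 60},
    "ACCESS_DENIED": {"country": "CA"},
}


def build_retry_bodies(batch_result: dict, policy: dict = RETRY_POLICY) -> list:
    """Group failed jobs by error code into bodies for POST /v1/batch/{batch_id}/retry.

    Failures whose error code has no policy entry are left out for manual review.
    """
    job_ids_by_error = defaultdict(list)
    for failure in batch_result.get("failed_urls", []):
        if failure["error"] in policy:
            job_ids_by_error[failure["error"]].append(failure["job_id"])
    return [
        {"job_ids": job_ids, "retry_settings": dict(policy[error])}
        for error, job_ids in job_ids_by_error.items()
    ]
```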

Use Cases

E-commerce Price Monitoring

{
  "requests": [
    {
      "url": "https://shop1.com/product/123",
      "actions": [
        {"type": "wait_for_selector", "selector": ".price"}
      ]
    },
    {
      "url": "https://shop2.com/item/456", 
      "actions": [
        {"type": "wait_and_click", "selector": ".show-price"}
      ]
    }
  ],
  "delivery_method": "webhook",
  "tags": ["price-monitoring", "daily-check"]
}

SEO Content Analysis

{
  "urls": [
    "https://competitor1.com/blog",
    "https://competitor2.com/resources",
    "https://competitor3.com/guides"
  ],
  "global_settings": {
    "render": true,
    "actions": [
      {"type": "scroll", "direction": "down"},
      {"type": "wait", "duration": 3000}
    ]
  },
  "tags": ["seo-analysis", "content-audit"]
}

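Data Migration

The /* ... */ line below is a placeholder for your 1,000+ request objects. Replace it before sending, because JSON does not allow comments.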

{
  "requests": [
    /* 1000+ URLs with different configurations */
  ],
  "concurrency": 20,
  "delivery_method": "s3",
  "s3_config": {
    "bucket": "migration-data",
    "key_prefix": "scraped_content/{{date}}/"
  },
  "priority": "high"
}
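A migration may have more URLs than your plan allows in one batch (the Rate Limits table lists 1,000 URLs per batch for Professional). In that case, split the requests into several batches that share the same settings. The sketch below shows this; split_into_batches is a hypothetical helper. When you submit the resulting batches, you still need to stay within your plan's concurrent-batch limit.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def split_into_batches(requests, max_urls_per_batch, **batch_settings):
    """Build one batch body per chunk, repeating the shared batch-level settings."""
    return [
        {"requests": chunk, **batch_settings}
        for chunk in chunked(requests, max_urls_per_batch)
    ]


migration_requests = [{"url": f"https://example.com/page-{i}"} for i in range(2500)]
batches = split_into_batches(migration_requests, 1000, concurrency=20, delivery_method="s3")
print([len(b["requests"]) for b in batches])  # [1000, 1000, 500]
```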

Performance Optimization

Optimal Concurrency

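  • Low concurrency (1-5): gentle on target sites and least likely to hit rate limits
  • Medium concurrency (6-15): a balance between speed and stability
  • High concurrency (16-50): maximum speed for robust targets that tolerate heavy traffic

Your plan's Max Concurrency (see Rate Limits & Pricing) caps the value you can set.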

Resource Management

{
  "urls": ["https://heavy-site1.com", "https://heavy-site2.com"],
  "global_settings": {
    "timeout": 120,
    "render": false
  },
  "concurrency": 3,
  "memory_limit": "high"
}

SDK Examples

Node.js

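// Assumes `client` is an already-initialized Nimble SDK client and that
// this code runs inside an async function or an ES module (for `await`).
// Submit batch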
const batch = await client.batch({
  urls: [
    'https://site1.com',
    'https://site2.com', 
    'https://site3.com'
  ],
  delivery_method: 'webhook',
  webhook_url: 'https://your-app.com/webhook',
  concurrency: 10
});

// Monitor progress
const status = await client.getBatchStatus(batch.batch_id);
console.log(`Progress: ${status.progress.completed}/${status.progress.total}`);

Python

# Assumes `client` is an already-initialized Nimble SDK client
# Submit batch with per-URL settings
batch = client.batch({
    'requests': [
        {
            'url': 'https://site1.com',
            'render': True,
            'country': 'US'
        },
        {
            'url': 'https://site2.com',
            'format': 'json',
            'country': 'UK'
        }
    ],
    'delivery_method': 's3',
    's3_config': {
        'bucket': 'my-results',
        'key_prefix': 'batch_{{batch_id}}/'
    }
})

# Check results
results = client.get_batch_results(batch['batch_id'])

Rate Limits & Pricing

Rate Limits

| Plan | Concurrent Batches | URLs per Batch | Max Concurrency |
|---|---|---|---|
| Starter | 1 | 100 | 5 |
| Professional | 5 | 1,000 | 20 |
| Enterprise | Unlimited | 10,000+ | 100+ |
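You can check a batch against these limits before submitting it, so that problems surface early. The sketch below encodes the table. Modelling Enterprise's "Unlimited", "10,000+" and "100+" as no fixed cap (None) is an assumption. check_batch is a hypothetical helper, and it does not check the concurrent-batch count.

```python
import math

# Limits from the Rate Limits table. None means "no fixed cap": an assumption
# used for Enterprise's "Unlimited", "10,000+" and "100+" entries.
PLAN_LIMITS = {
    "starter": {"concurrent_batches": 1, "urls_per_batch": 100, "max_concurrency": 5},
    "professional": {"concurrent_batches": 5, "urls_per_batch": 1_000, "max_concurrency": 20},
    "enterprise": {"concurrent_batches": None, "urls_per_batch": None, "max_concurrency": None},
}


def check_batch(plan: str, num_urls: int, concurrency: int) -> list:
    """Return human-readable problems with a batch for the given plan (empty if none)."""
    limits = PLAN_LIMITS[plan]
    problems = []
    cap = limits["urls_per_batch"]
    if cap is not None and num_urls > cap:
        problems.append(
            f"{num_urls} URLs exceed {cap} per batch; split into {math.ceil(num_urls / cap)} batches"
        )
    max_concurrency = limits["max_concurrency"]
    if max_concurrency is not None and concurrency > max_concurrency:
        problems.append(f"concurrency {concurrency} exceeds the plan maximum of {max_concurrency}")
    return problems
```

For example, check_batch("starter", 250, 10) reports two problems: the batch must be split into 3 batches, and concurrency must drop to 5 or less.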

Pricing

  • Batch processing fee: $0.01 per batch
  • Per-URL pricing: Same as individual requests
  • Priority processing: +25% for high priority batches
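These three rules are enough for a rough cost estimate. The sketch below assumes the +25% surcharge applies only to per-URL charges; the page does not say whether it also applies to the $0.01 batch fee. The per-URL price is an input because it is set by individual-request pricing, which is not listed here. The $0.002 price used below is made up for illustration. The code uses Decimal to avoid binary floating-point rounding in currency arithmetic.

```python
from decimal import Decimal

BATCH_FEE = Decimal("0.01")               # flat fee per batch
HIGH_PRIORITY_SURCHARGE = Decimal("0.25")  # +25% for high priority


def estimate_batch_cost(num_urls: int, per_url_price: Decimal, priority: str = "normal") -> Decimal:
    """Estimate batch cost in dollars; assumes the surcharge applies to URL charges only."""
    url_charges = per_url_price * num_urls
    if priority == "high":
        url_charges *= 1 + HIGH_PRIORITY_SURCHARGE
    return BATCH_FEE + url_charges
```

At the hypothetical $0.002 per URL, 100 URLs cost $0.21 at normal priority and $0.26 at high priority under these assumptions.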

Best Practices

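  • Group URLs that need similar settings into the same batch so global_settings can cover them
  • Match concurrency to what each target site can handle (see Optimal Concurrency)
  • Handle partial failures: inspect failed_urls and retry the affected jobs
  • Monitor the progress of large batches with the batch status endpoint
  • Tag batches so they are easy to organize and filter
  • Set timeouts to match target complexity; heavy pages need longer timeouts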