WebExtrator Webpage Rendering and Extraction API

WebExtrator is a high-concurrency webpage rendering and intelligent content extraction service from Ace Data Cloud. It combines a headless browser with LLM post-processing, so a single call returns the target page as it appears after JavaScript rendering. Page content can be extracted into several formats, including structured fields, Markdown body, and plain text.

Core Features

  • Real Browser Rendering: Uses Chromium to load pages completely; supports SPA / asynchronous rendering, Cookie injection, custom User-Agent, and custom Headers.
  • Multiple Outputs: Returns page HTML, visible text, Markdown, link list, page screenshot (base64 PNG), etc.
  • Intelligent Extraction: An optional LLM extraction mode automatically extracts structured fields such as “article content,” “product info,” and “comment list” based on a user-supplied description.
  • Sync / Async Dual Modes: Responses are synchronous by default; passing callback_url switches the call to asynchronous mode, where results are pushed back via HTTP POST.
  • Unified Task Records: Each call carries a task_id / trace_id. Task history from the past 7 days can be queried via /webextrator/tasks, individually or in batches.
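
The sync / async switch described above can be sketched in Python. This is a minimal illustration, not the service's official client: the helper names are hypothetical, and the shape of the callback body (UTF-8 JSON with top-level task_id / trace_id keys) is an assumption that should be checked against the Integration Guide.

```python
import json
from typing import Optional


def build_render_payload(url: str, callback_url: Optional[str] = None) -> dict:
    """Build a request body; including callback_url selects asynchronous mode."""
    payload = {"url": url}
    if callback_url is not None:
        # Per the feature list, passing callback_url switches the call to async.
        payload["callback_url"] = callback_url
    return payload


def parse_callback(body: bytes) -> dict:
    """Decode a callback POST body. ASSUMPTION: the body is UTF-8 encoded JSON."""
    result = json.loads(body.decode("utf-8"))
    # ASSUMPTION: task_id / trace_id appear as top-level keys in the pushed result.
    return {
        "task_id": result.get("task_id"),
        "trace_id": result.get("trace_id"),
        "raw": result,
    }


# Synchronous call: no callback_url, the result comes back in the response.
sync_payload = build_render_payload("https://example.com")

# Asynchronous call: the result is later POSTed to the (placeholder) callback URL.
async_payload = build_render_payload(
    "https://example.com", "https://your-server.example/hook"
)
```

Keeping the callback parsing in its own function makes it easy to adjust once the real callback schema is confirmed.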

Use Cases

  • Large-scale crawling of e-commerce / news / government webpages
  • Backend infrastructure for AI agent browsing capabilities
  • Structuring webpage data for Knowledge Base / RAG systems
  • SEO monitoring, competitor analysis, public opinion analysis

API List

Interface         | Path                      | Description
Webpage Rendering | POST /webextrator/render  | Render and return HTML / screenshot / text only
Smart Extraction  | POST /webextrator/extract | Rendered page plus structured / Markdown / article content extraction
Task Query        | POST /webextrator/tasks   | Query historical tasks by task_id or trace_id (free)
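
As a rough sketch, the three endpoints can be wrapped in one small Python helper that prepares a request without sending it. The host and Bearer header follow the Quick Start example; the helper name build_request is hypothetical, and the assumption that the task query takes "task_id" as a JSON body field is not confirmed by this page.

```python
import json
import urllib.request

BASE_URL = "https://api.acedata.cloud"  # host as used in the Quick Start example

# Paths copied from the API list.
ENDPOINTS = {
    "render": "/webextrator/render",
    "extract": "/webextrator/extract",
    "tasks": "/webextrator/tasks",
}


def build_request(endpoint: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST for one of the three endpoints."""
    return urllib.request.Request(
        BASE_URL + ENDPOINTS[endpoint],
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# ASSUMPTION: the task query accepts "task_id" as a JSON body field.
req = build_request("tasks", {"task_id": "YOUR_TASK_ID"}, "YOUR_API_KEY")

# To actually send it, pass req to urllib.request.urlopen() and decode the
# response body with json.loads().
```

Separating request construction from sending keeps the example usable offline and makes the URL, headers, and body easy to inspect before any credits are spent.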

Billing Details

  • Rendering (/webextrator/render): 0.005 Credits per call
  • Extraction (/webextrator/extract): 0.005 Credits per call
  • Task Query (/webextrator/tasks): Free

Failed calls (4xx / 5xx) are not charged by default. When asynchronous mode is enabled via callback_url, the initial response includes x-usage-exempt: true, and charges are settled after the callback completes.
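
The pricing above lends itself to a quick estimate. The following Python sketch uses the listed per-call rates and counts only successful calls, since failed calls are not charged by default; the function name and the example call volumes are illustrative, not real usage data.

```python
from decimal import Decimal

# Per-call rates copied from the billing list (in Credits).
CREDITS_PER_CALL = {
    "render": Decimal("0.005"),
    "extract": Decimal("0.005"),
    "tasks": Decimal("0"),  # task queries are free
}


def estimate_credits(successful_calls: dict) -> Decimal:
    """Sum Credits over successful calls only; 4xx / 5xx calls are not billed by default."""
    return sum(
        (CREDITS_PER_CALL[name] * count for name, count in successful_calls.items()),
        Decimal("0"),
    )


# Hypothetical workload: 1000 renders, 200 extractions, 50 task queries.
total = estimate_credits({"render": 1000, "extract": 200, "tasks": 50})
print(total)
```

For this hypothetical workload the total comes to 6 Credits (1000 × 0.005 + 200 × 0.005 + 0). Decimal is used instead of float so that small per-call rates do not accumulate rounding error.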

Quick Start

Pass your API Key as a Bearer token in the Authorization header:

curl -X POST https://api.acedata.cloud/webextrator/render \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "wait_until": "networkidle"
  }'

For more usage details, refer to the Integration Guide section.