WebExtrator Webpage Rendering and Extraction API
WebExtrator is a high-concurrency webpage rendering and intelligent content extraction service from Ace Data Cloud. It combines a headless browser with LLM post-processing, so a single call returns the target page as it appears after JavaScript rendering. The page content can be extracted into several formats, including structured fields, a Markdown body, and plain text.
¶ Core Features
- Real Browser Rendering: Uses Chromium to load pages completely; supports SPAs and asynchronous rendering, cookie injection, a custom User-Agent, and custom headers.
- Multiple Outputs: Returns page HTML, visible text, Markdown, link list, page screenshot (base64 PNG), etc.
- Intelligent Extraction: LLM extraction mode can optionally be enabled to extract structured fields such as "article content," "product info," or "comment list" from a user-supplied description.
- Sync / Async Dual Modes: Responses are synchronous by default; passing `callback_url` switches to asynchronous mode, where results are pushed back via HTTP POST.
- Unified Task Records: Each call includes a `task_id` / `trace_id`. Task history from the last 7 days can be queried via `/webextrator/tasks`, in batch or individually.
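The sync / async switch described above can be sketched as a small payload builder. This is a minimal sketch, not the official client: only the `url` and `callback_url` field names come from this page, and the helper name `build_render_payload` is hypothetical.

```python
import json
from typing import Optional


def build_render_payload(url: str, callback_url: Optional[str] = None) -> str:
    """Return a JSON request body for a render or extract call.

    Omitting callback_url keeps the default synchronous mode; supplying it
    asks the service to push the result to that address via HTTP POST.
    """
    payload = {"url": url}
    if callback_url is not None:
        # callback_url is the documented switch for asynchronous mode.
        payload["callback_url"] = callback_url
    return json.dumps(payload)


# Synchronous call: the body carries only the target URL.
sync_body = build_render_payload("https://example.com")

# Asynchronous call: the callback address below is a placeholder.
async_body = build_render_payload(
    "https://example.com", callback_url="https://your-service.example/hook"
)
```

The shape of the body that the service posts to the callback address is not described in this section, so a receiver should not assume particular fields without checking the Integration Guide.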
¶ Use Cases
- Large-scale crawling of e-commerce, news, and government webpages
- Backend infrastructure for AI agent browsing capabilities
- Structuring webpage data for Knowledge Base / RAG systems
- SEO monitoring, competitor analysis, public opinion analysis
¶ API List
| Interface | Path | Description |
|---|---|---|
| Webpage Rendering | `POST /webextrator/render` | Render and return HTML / screenshot / text only |
| Smart Extraction | `POST /webextrator/extract` | Rendered page plus structured / Markdown / article content extraction |
| Task Query | `POST /webextrator/tasks` | Query historical tasks by `task_id` or `trace_id` (free) |
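A task lookup against the Task Query interface might be built as follows. This is a sketch under stated assumptions: the host and Bearer authentication are taken from the Quick Start example, while the JSON body shape (a single `task_id` or `trace_id` field) is an assumption, since this page names the identifiers but not the request schema. `build_task_query` is a hypothetical helper.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.acedata.cloud"  # host taken from the Quick Start example


def build_task_query(
    api_key: str,
    task_id: Optional[str] = None,
    trace_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request to /webextrator/tasks."""
    if (task_id is None) == (trace_id is None):
        raise ValueError("provide exactly one of task_id or trace_id")
    # ASSUMPTION: the body field names mirror the identifiers named in the table.
    body = {"task_id": task_id} if task_id is not None else {"trace_id": trace_id}
    return urllib.request.Request(
        BASE_URL + "/webextrator/tasks",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key and identifier, for illustration only.
request = build_task_query("YOUR_API_KEY", task_id="example-task-id")
```

Passing the resulting object to `urllib.request.urlopen` would send it; the builder is kept separate so the request can be inspected first.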
¶ Billing Details
- Rendering (`/webextrator/render`): 0.005 Credits per call
- Extraction (`/webextrator/extract`): 0.005 Credits per call
- Task Query (`/webextrator/tasks`): Free
Failed calls (4xx / 5xx) are not charged by default. When asynchronous mode is enabled via `callback_url`, the initial response includes `x-usage-exempt: true`, and charges are settled after the callback completes.
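The billing rules above reduce to simple arithmetic, sketched here for budgeting purposes. The prices are the ones listed in this section; the helper names `estimate_credits` and `is_usage_exempt` are hypothetical, and treating `x-usage-exempt` as a response header is an assumption based on its name.

```python
from decimal import Decimal

# Per-call prices as listed in the billing details.
CREDITS_PER_CALL = {
    "/webextrator/render": Decimal("0.005"),
    "/webextrator/extract": Decimal("0.005"),
    "/webextrator/tasks": Decimal("0"),
}


def estimate_credits(path: str, successful_calls: int) -> Decimal:
    """Estimate Credits for a number of successful calls to one endpoint.

    Failed calls (4xx / 5xx) are not charged by default, so only
    successful calls are counted.
    """
    if successful_calls < 0:
        raise ValueError("successful_calls must be non-negative")
    return CREDITS_PER_CALL[path] * successful_calls


def is_usage_exempt(headers: dict) -> bool:
    """Return True if the headers carry x-usage-exempt: true.

    ASSUMPTION: the flag arrives as an HTTP response header; header names
    are compared case-insensitively.
    """
    for name, value in headers.items():
        if name.lower() == "x-usage-exempt":
            return value.strip().lower() == "true"
    return False
```

Under these prices, 1,000 successful render calls come to 5 Credits. `Decimal` is used instead of `float` so that fractional prices add up without rounding drift.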
¶ Quick Start
Send a request to the interface with your API key in the `Authorization` header:
```shell
curl -X POST https://api.acedata.cloud/webextrator/render \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "wait_until": "networkidle"
  }'
```
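The same request can be issued from Python using only the standard library. This is a minimal sketch: the endpoint, headers, and body fields mirror the curl call, while the function names and the assumption that the response body is JSON are not confirmed by this section.

```python
import json
import urllib.request

RENDER_URL = "https://api.acedata.cloud/webextrator/render"


def build_render_request(
    api_key: str, page_url: str, wait_until: str = "networkidle"
) -> urllib.request.Request:
    """Build the POST request shown in the curl example, without sending it."""
    body = {"url": page_url, "wait_until": wait_until}
    return urllib.request.Request(
        RENDER_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def render(api_key: str, page_url: str, timeout: float = 60.0) -> dict:
    """Send the render request and decode the response.

    ASSUMPTION: the service replies with a JSON object; its fields are not
    documented in this section. HTTP errors raise urllib.error.HTTPError.
    """
    request = build_render_request(api_key, page_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `render("YOUR_API_KEY", "https://example.com")` performs the network request; `build_render_request` alone is useful for inspecting the outgoing payload.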
For more usage details, refer to the Integration Guide section.
