WebExtrator Intelligent Extraction API Integration Guide
POST https://api.acedata.cloud/webextrator/extract
Content extraction built on /webextrator/render. It accepts every parameter of the render interface, plus the following:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| expected_type | string | ❌ | markdown | Expected extraction output: markdown / article / text / links / structured |
| enable_llm | boolean | ❌ | false | Enable LLM post-processing (suitable for article / structured) |
| instruction | string | ❌ | - | LLM extraction instructions, e.g., "extract product title, price, specifications" |
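
The fields in the table can be assembled and checked on the client before a request is sent. The following is a minimal Python sketch, not an official client: the helper name `build_extract_payload` is hypothetical, and it assumes the five values listed for `expected_type` are the only ones accepted. The guide does not say whether `instruction` is ignored when `enable_llm` is false, so the sketch does not enforce any rule there.

```python
import json

# Values listed for expected_type in the parameter table.
EXPECTED_TYPES = {"markdown", "article", "text", "links", "structured"}


def build_extract_payload(url, expected_type="markdown", enable_llm=False, instruction=None):
    """Build the JSON body for POST /webextrator/extract (hypothetical helper)."""
    if expected_type not in EXPECTED_TYPES:
        raise ValueError(f"unsupported expected_type: {expected_type!r}")
    payload = {"url": url, "expected_type": expected_type, "enable_llm": enable_llm}
    # instruction is optional; include it only when the caller provides one.
    if instruction is not None:
        payload["instruction"] = instruction
    return payload


if __name__ == "__main__":
    body = build_extract_payload(
        "https://shop.example.com/item/123",
        expected_type="structured",
        enable_llm=True,
        instruction="extract product title, price, specifications",
    )
    print(json.dumps(body, indent=2))
```

Failing fast on an unknown `expected_type` avoids spending a request on a call that the service would presumably reject.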
Synchronous Response
{
  "success": true,
  "task_id": "550e8400-...",
  "trace_id": "550e8400-...",
  "started_at": "2026-05-02T10:30:00.123Z",
  "finished_at": "2026-05-02T10:30:08.789Z",
  "elapsed": 8.666,
  "data": {
    "kind": "extract",
    "expected_type": "article",
    "url": "https://example.com/post/1",
    "title": "Sample Article",
    "author": "Zhang San",
    "published_at": "2026-05-01",
    "content": "# Sample Article\n\nBody text ...",
    "summary": "This article introduces ..."
  }
}
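
To illustrate how this envelope might be consumed, here is a short Python sketch that parses a response of the shape shown and recomputes the duration from `started_at` and `finished_at`. The helper name `parse_extract_response` is hypothetical. The sketch assumes `elapsed` is expressed in seconds (which matches the sample timestamps) and that `data` should only be read when `success` is true.

```python
import json
from datetime import datetime


def _parse_ts(value):
    # fromisoformat() accepts a trailing "Z" only on Python 3.11+, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_extract_response(raw):
    """Return (data, seconds) from a synchronous extract response (hypothetical helper)."""
    resp = json.loads(raw)
    if not resp.get("success"):
        raise RuntimeError(f"extraction failed, trace_id={resp.get('trace_id')}")
    delta = _parse_ts(resp["finished_at"]) - _parse_ts(resp["started_at"])
    return resp["data"], delta.total_seconds()


if __name__ == "__main__":
    # Trimmed copy of the sample response; the elided IDs are left as-is.
    sample = json.dumps({
        "success": True,
        "task_id": "550e8400-...",
        "trace_id": "550e8400-...",
        "started_at": "2026-05-02T10:30:00.123Z",
        "finished_at": "2026-05-02T10:30:08.789Z",
        "elapsed": 8.666,
        "data": {"kind": "extract", "expected_type": "article", "title": "Sample Article"},
    })
    data, seconds = parse_extract_response(sample)
    print(data["title"], seconds)
```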
Asynchronous mode, error codes, and billing rules are exactly the same as /webextrator/render.
Example: Extract Article Content (LLM Enabled)
curl -X POST https://api.acedata.cloud/webextrator/extract \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/news/1",
    "expected_type": "article",
    "enable_llm": true
  }'
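
The same call can be made from Python using only the standard library. This is a sketch under a few assumptions rather than an official SDK: the function names `make_extract_request` and `extract` are hypothetical, the 60-second timeout is an arbitrary choice, and the API key is expected to be supplied by the caller (for example from the `API_KEY` environment variable used in the curl command).

```python
import json
import urllib.request

EXTRACT_URL = "https://api.acedata.cloud/webextrator/extract"


def make_extract_request(payload, api_key):
    """Build (but do not send) the POST request; the helper name is hypothetical."""
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract(payload, api_key, timeout=60):
    """Send the request and decode the JSON response body.

    urlopen() raises urllib.error.HTTPError for non-2xx status codes,
    so callers may want to catch that and inspect the error body.
    """
    request = make_extract_request(payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A call such as `extract({"url": "https://example.com/news/1", "expected_type": "article", "enable_llm": True}, os.environ["API_KEY"])` would then mirror the curl example. Keeping request construction separate from sending makes the headers and body easy to inspect without touching the network.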
Example: Async + Custom Structured Extraction
curl -X POST https://api.acedata.cloud/webextrator/extract \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://shop.example.com/item/123",
    "expected_type": "structured",
    "enable_llm": true,
    "instruction": "Extract the product title, price, stock, and 3 main image URLs",
    "callback_url": "https://your-domain.com/wbx-callback"
  }'
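
On the receiving side, the endpoint behind `callback_url` needs to accept the result when the task finishes. The sketch below is one possible receiver built on Python's standard `http.server`. It rests on an assumption this guide does not confirm: that the callback is an HTTP POST whose JSON body mirrors the synchronous response envelope (`success`, `task_id`, `data`). The actual callback format is defined by the /webextrator/render documentation. The names `summarize_callback`, `CallbackHandler`, and `serve` are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_callback(raw):
    """Pull the fields of interest out of a callback body.

    ASSUMPTION: the body has the same shape as the synchronous response.
    """
    body = json.loads(raw)
    if not isinstance(body, dict):
        raise ValueError("callback body must be a JSON object")
    return {
        "task_id": body.get("task_id"),
        "success": bool(body.get("success")),
        "data": body.get("data"),
    }


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            result = summarize_callback(self.rfile.read(length))
        except ValueError:  # also covers json.JSONDecodeError
            self.send_response(400)
            self.end_headers()
            return
        print("task", result["task_id"], "success:", result["success"])
        # Acknowledge quickly; heavy processing belongs outside the request handler.
        self.send_response(200)
        self.end_headers()


def serve(port=8000):
    """Start a blocking HTTP server for local testing (not production-ready)."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

Calling `serve(8000)` starts the receiver locally. A production deployment would sit behind HTTPS and should verify that each request really originates from the service.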
