How do I call the OpenAI Chat Completions API endpoint?

Send an HTTP request to https://api.acedata.cloud/openai/chat/completions with a Bearer token in the Authorization header. See the documentation for request parameters and response format.

What authentication does OpenAI generation require?

All Ace Data Cloud APIs use Bearer token authentication. Get your API key from the platform dashboard at https://platform.acedata.cloud and include it as: Authorization: Bearer YOUR_TOKEN

How much does OpenAI generation cost?

OpenAI generation uses pay-as-you-go pricing with no monthly fees. Each API call costs credits based on the model and parameters used. Visit the service page for detailed pricing.

Ace Data Cloud

OpenAI Chat Completion API

POSThttps://api.acedata.cloud/openai/chat/completions

POSThttps://api.acedata.cloud/v1/chat/completions

OpenAI Chat Completion API, compatible with the official API format.

Related Products

ChatGPT - Ace Data Cloud

ChatGPT dialogue generation platform, directly connected to this API, fast, stable, and available.

https://hub.acedata.cloud/chatgpt/conversations

Request Headers

acceptstring

Specify the response format returned by the server.

Please select

authorizationstring

Bearer token

Request Body

modelstringRequired parameter

Model ID to be used.

messagesarrayRequired parameter

List of messages that make up the current conversation.

streamboolean

If set to true, it will send partial messages incrementally like ChatGPT.

Please select

stream_optionsobject

Options for streaming responses. Use only when setting `stream: true`.

include_usageboolean

If set, an additional chunk will be streamed back before the `data: [DONE]` message. The `usage` field of this chunk displays the total token usage statistics for the entire request, while the `choices` field will always be an empty array.

Please select

max_tokensnumber

The maximum number of tokens that can be generated in dialogue completion.

max_completion_tokensinteger

The maximum number of tokens that can be generated for a single completion, including visible output tokens and inference tokens. For newer models (o1/o3/o4/gpt-5 series and other inference models), this parameter replaces `max_tokens`.

reasoning_effortstring

The inference input level of the constraint reasoning model (o1/o3/o4/gpt-5 series). Currently supports `minimal`, `low`, `medium`, and `high`. Reducing the inference input can lead to faster responses and decrease the number of tokens used for inference in the responses.

Please select

modalitiesarray

You hope for the output type generated by the model. Most models default to `["text"]`. The `gpt-4o-audio-preview` model can also be used to generate audio. If you want the model to generate both text and audio responses, you can use `["text", "audio"]`.

audioobject

Audio output parameters. Required when requesting audio output via `modalities: ["audio"]`.

voicestringRequired parameter

The tone used by the model in its responses.

formatstringRequired parameter

Specify the output audio format.

predictionobject

Static prediction output content, such as the content of a text file that is being regenerated. It can be used to shorten response time when most of the response content is already known in advance.

typestringRequired parameter

Predict the type of content, fixed as `content`.

contentRequired parameter

Content for prediction.

nnumber

Generate how many dialogue completion options for each input message.

toolsarray

List of tools that can be called by the model. Currently, only functions are supported as tools.

tool_choice

Control which tool the model calls (if any). `none` means the model will not call any tools; `auto` means the model can choose between generating a message or calling one/multiple tools; `required` means the model must call one or more tools.

parallel_tool_callsboolean

Whether to enable parallel function calls during tool invocation.

Please select

web_search_optionsobject

Options for web search tools, for use with the GPT-4o model that has built-in search (`gpt-4o-search-preview`, `gpt-4o-mini-search-preview`).

user_locationobject

Approximate location parameters for search.

typestring

Please select

approximateobject

citystring

regionstring

countrystring

timezonestring

search_context_sizestring

High-level guidelines on the size of the context window space occupied by the search.

Please select

response_format

The object format that the specified model must output. Setting it to `{ "type": "json_schema", "json_schema": {...} }` enables structured output; setting it to `{ "type": "json_object" }` enables the older JSON Object mode.

temperaturenumber

The sampling temperature used ranges from 0 to 2. Higher values (such as 0.8) will make the output more random, while lower values (such as 0.2) will make the output more focused and certain.

top_pnumber

An alternative method for temperature sampling, called nucleus sampling, considers only the portion of tokens whose cumulative probability mass reaches top_p. For example, 0.1 means only considering the tokens that account for the top 10% of cumulative probability mass. It is generally recommended to adjust only one of these parameters or `temperature`, rather than adjusting both simultaneously.

frequency_penaltynumber

The value ranges from -2.0 to 2.0. Positive values will penalize based on the frequency of the token appearing in the generated text, thereby reducing the likelihood of the model repeating the same line word for word.

presence_penaltynumber

The value ranges from -2.0 to 2.0. Positive values will penalize based on whether the token has already appeared in the generated text, thereby increasing the model's likelihood of discussing new topics.

seedinteger

If specified, the system will strive to perform deterministic sampling, so that repeated requests using the same `seed` and parameters return the same results. Determinism cannot be guaranteed; please refer to the `system_fingerprint` in the response to monitor backend changes.

stop

A maximum of 4 sequences, the API will stop generating when it reaches these sequences. The returned text does not include the stopping sequences.

logprobsboolean

Whether to return the logarithmic probability of the output tokens. If true, it will return the logarithmic probability of each output token in the `content` of `message`.

Please select

top_logprobsinteger

The integer value between 0 and 20 specifies the number of most likely tokens to return at each token position, each accompanied by the corresponding log probability. When using this parameter, `logprobs` must be set to `true`.

logit_biasobject

Modify the likelihood of a specified token appearing in the completion. Accept a JSON object that maps the token (specified by its token ID in the tokenizer) to a bias value between -100 and 100.

userstring

A unique identifier representing your end users, which helps the platform monitor and detect abusive behavior.

service_tierstring

Specify the processing type used for the designated service request. Set to `auto` to let the platform choose automatically, `default` for standard pricing/performance, `flex` for a slower but cheaper Flex processing layer, and `priority` for the fastest processing layer on supported models.

Please select

storeboolean

Whether to store the output of this conversation completion request for model distillation or product evaluation.

Please select

metadataobject

A set of key-value pairs that can be attached to an object, with a maximum of 16 pairs. It can be used to store additional information about the object in a structured format. The key is a string of up to 64 characters, and the value is a string of up to 512 characters.

Response

OK, request successful.

Response Body

idstring

The unique identifier for dialogue completion.

modelstring

Model used for completing this conversation.

usageobject

Usage statistics for this completion request.

total_tokensinteger

The total number of tokens used in this request (prompt + completion).

prompt_tokensinteger

The number of tokens in the prompt.

completion_tokensinteger

The number of tokens in the generated completion content.

prompt_tokens_detailsobject

Details of token usage for prompts.

audio_tokensinteger

The number of audio input tokens in the prompt.

cached_tokensinteger

The number of tokens hit in the cache in the prompt.

completion_tokens_detailsobject

Complete the token usage details.

audio_tokensinteger

The number of tokens in the audio output generated by the model.

reasoning_tokensinteger

The number of tokens generated by the model for inference.

accepted_prediction_tokensinteger

The number of tokens that appear in the completion of the predicted content when using predicted outputs.

rejected_prediction_tokensinteger

The number of tokens in the predicted content that do not appear in the completion when using predicted outputs.

objectstring

Object type, fixed as `chat.completion`.

choicesarray

Dialogue completion options list. If `n` is greater than 1, it may include multiple options.

Array of

indexnumber

The index of the option in the list of options.

messageobject

A message completing a dialogue generated by the model.

rolestring

The role of the author of the message.

audioobject

When the audio output modality is requested, this object contains the audio response data returned by the model.

idstring

The unique identifier for the audio reply.

datastring

Base64 encoded audio bytes generated by the model.

expires_atinteger

The audio reply is no longer available on the server (for multi-turn conversations) Unix timestamp (seconds).

transcriptstring

Text transcription of audio generated by the model.

contentstring

Content of the message.

refusalstring

The message generated by the model rejecting the answer.

tool_callsarray

Tool calls generated by the model, such as function calls.

Array of

idstring

The ID of the tool call.

typestring

The type of tool invocation. Currently, only `function` is supported.

functionobject

Function for model invocation.

namestring

The name of the function to be called.

argumentsstring

The parameters used when calling the function are generated by the model in JSON format.

logprobsobject

The logarithmic probability information of this option.

finish_reasonstring

Reasons for the model to stop generating tokens. If the model reaches a natural stopping point or hits the provided stopping sequence, it is `stop`; if it reaches the maximum token number specified in the request, it is `length`; if content is omitted due to content filter flags, it is `content_filter`; if the model called a tool, it is `tool_calls`; if the model called a function (deprecated), it is `function_call`.

creatednumber

Create the Unix timestamp (seconds) when this conversation is completed.

system_fingerprintstring

This fingerprint indicates the backend configuration used for the model run. It can be used in conjunction with the request parameter `seed` to understand when changes that may affect determinism occurred in the backend.

Allow Use General Balance

When 'Allow General Balance' is enabled, the general balance is used automatically if an app's balance is insufficient.

Shell

Python

JavaScript

Java

PHP

Kind reminder: For streaming requests, the above code may not be fully applicable. Please refer to the integration documentation for changes.

Related Products

Request Headers

Request Body

Response

Response Body

Array of

Array of

Response Body

Example

Response Body

Example

Response Body

Example

Response Body

Example

Response Body

Example