Create Chat Completion (Non-Streaming)
This endpoint takes a conversation, expressed as a list of messages, and returns one or more model-generated completions. The non-streaming endpoint returns the entire response in a single payload once generation is complete.
- Endpoint URL: https://api.codingplanx.ai/v1/chat/completions
- HTTP Method: POST
- Content-Type: application/json
Request Parameters
Headers
| Parameter | Type | Required | Example | Description |
|---|---|---|---|---|
| Content-Type | string | Yes | application/json | The format of the request body. |
| Accept | string | Yes | application/json | The format of the response body. |
| Authorization | string | Yes | Bearer {{YOUR_API_KEY}} | Authentication token; pass your API key as a Bearer token. |
Request Body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | The ID of the model to use (e.g., gpt-4o, gpt-3.5-turbo). |
| messages | array | Yes | - | A list of messages comprising the conversation. Each object must contain a role (system/user/assistant) and content. |
| temperature | number | No | 1 | Sampling temperature (0 to 2). Higher values make the output more random, while lower values make it more deterministic. |
| top_p | number | No | 1 | Nucleus sampling. Considers only the tokens comprising the top_p probability mass. We recommend altering this OR temperature, but not both. |
| n | integer | No | 1 | How many completion choices to generate for each input message. |
| stream | boolean | No | false | Whether to stream back partial progress. For this endpoint, it should be set to false. |
| stop | string/array | No | null | Up to 4 sequences where the API will stop generating further tokens. |
| max_tokens | integer | No | inf | The maximum number of tokens to generate in the completion. Restricted by the model's context length. |
| presence_penalty | number | No | 0 | Number between -2.0 and 2.0. Positive values increase the model's likelihood to talk about new topics. |
| frequency_penalty | number | No | 0 | Number between -2.0 and 2.0. Positive values decrease the model's likelihood to repeat the same lines verbatim. |
| logit_bias | object | No | null | Modifies the likelihood of specified tokens appearing in the completion. |
| user | string | No | - | A unique identifier representing your end-user, which can help monitor and detect abuse. |
| response_format | object | No | - | An object specifying the format that the model must output (e.g., { "type": "json_object" } enables JSON mode). |
| tools | array | No | - | A list of tools the model may call. Currently, only functions are supported. |
| tool_choice | string/object | No | auto | Controls which (if any) tool is called by the model (none/auto/specific function). |
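To make the tables above concrete, here is a minimal Python sketch of a complete request. It assumes only what is documented on this page (the endpoint URL, headers, and body fields); the requests library and the YOUR_API_KEY placeholder are illustrative choices, not part of the API.

```python
import requests  # third-party HTTP client; any HTTP library works

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your real key

payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, please introduce yourself."},
    ],
    "temperature": 0.7,   # tune this OR top_p, not both (see table above)
    "n": 1,               # number of completion choices to generate
    "max_tokens": 1000,
    "stream": False,      # this endpoint is the non-streaming variant
}

resp = requests.post(
    "https://api.codingplanx.ai/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```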
Response Parameters
Response Body
| Parameter | Type | Description |
|---|---|---|
| id | string | A unique identifier for the chat completion request. |
| object | string | The object type, which is always chat.completion. |
| created | integer | The Unix timestamp (in seconds) of when the completion was created. |
| choices | array | A list of chat completion choices. |
| ├─ index | integer | The index of the choice in the list of choices. |
| ├─ message | object | A chat completion message generated by the model. |
| │ ├─ role | string | The role of the author of this message (usually assistant). |
| │ └─ content | string | The textual content of the message. |
| └─ finish_reason | string | The reason the model stopped generating tokens (e.g., stop, length, tool_calls). |
| usage | object | Usage statistics for the completion request. |
| ├─ prompt_tokens | integer | Number of tokens in the prompt. |
| ├─ completion_tokens | integer | Number of tokens in the generated completion. |
| └─ total_tokens | integer | Total number of tokens used in the request (prompt + completion). |
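As a sketch of how the fields above nest, the helper below walks a parsed response body; the function name is ours for illustration, not part of any SDK.

```python
def summarize_completion(body: dict) -> None:
    """Print the documented fields of a chat.completion response body."""
    print(f"id={body['id']}  created={body['created']}")
    for choice in body["choices"]:
        msg = choice["message"]
        print(f"choice {choice['index']} (finish_reason={choice['finish_reason']}):")
        print(f"  [{msg['role']}] {msg['content']}")
    u = body["usage"]
    print(f"tokens: {u['prompt_tokens']} prompt + {u['completion_tokens']} "
          f"completion = {u['total_tokens']} total")

# e.g. summarize_completion(resp.json()) after a successful request
```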
Examples
Request Example (cURL)
```bash
curl --location --request POST 'https://api.codingplanx.ai/v1/chat/completions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": "Hello, please introduce yourself."
        }
    ],
    "max_tokens": 1000
}'
```
Response Example
```json
{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! I am an AI assistant provided by CodingPlanX. I am very happy to help you."
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 14,
        "completion_tokens": 20,
        "total_tokens": 34
    }
}
```
FAQs (Frequently Asked Questions)
Q1: How do I enable JSON mode?
A1: Set "response_format": { "type": "json_object" } in your request body. Note: When using JSON mode, you must explicitly instruct the model to produce JSON in your system or user message. Otherwise, the model may generate an endless stream of whitespace until it reaches the token limit.
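A sketch of a JSON-mode request body, following A1; note the explicit "reply with JSON" instruction in the system message. The example prompt and schema are ours, purely illustrative.

```python
payload = {
    "model": "gpt-4o",
    "response_format": {"type": "json_object"},  # enables JSON mode
    "messages": [
        # JSON mode still needs an explicit instruction to produce JSON;
        # otherwise the model may emit whitespace until the token limit.
        {"role": "system", "content": "Reply with a single JSON object only."},
        {"role": "user", "content": 'List three primary colors as {"colors": [...]}.'},
    ],
}

# The assistant's message content arrives as a JSON string; parse it yourself,
# e.g. json.loads(response["choices"][0]["message"]["content"])
```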
Q2: How should I choose between temperature and top_p?
A2: We generally recommend altering only one of these parameters. If you need the model to be more creative and diverse, increase the temperature (e.g., 0.8). If you need more precise, factual, and deterministic answers, decrease it (e.g., 0.2).
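For illustration, two presets built on the same request; the specific values are conventional suggestions, not API-mandated thresholds.

```python
base = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Suggest a name for a coffee shop."}],
}

creative = {**base, "temperature": 0.8}       # more varied, surprising output
deterministic = {**base, "temperature": 0.2}  # focused, repeatable output

# Prefer tuning top_p instead? Use e.g. {**base, "top_p": 0.1}, but not both.
```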
Q3: Why is the finish_reason returned as length?
A3: This indicates that the response was cut off mid-generation: either it reached the max_tokens limit you specified, or the request hit the model's maximum context window.
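A small defensive pattern (ours, not from any SDK) for detecting truncation via finish_reason:

```python
def extract_text(response: dict) -> str:
    """Return the first choice's text, flagging max_tokens truncation."""
    choice = response["choices"][0]
    if choice["finish_reason"] == "length":
        # Remedies: raise max_tokens, shorten the prompt, or send a
        # follow-up message asking the model to continue.
        print("warning: completion was truncated at the token limit")
    return choice["message"]["content"]
```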
Q4: What is a token?
A4: Tokens are the fundamental units of text processed by the model. For Chinese text, one character is roughly 1 to 2 tokens; for English, one token is approximately 4 characters or 0.75 words. API billing and context-length limits are both calculated from total token usage.
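If you want a local estimate before sending a request, and assuming the gpt-4o-style model IDs here tokenize like their OpenAI namesakes (an assumption; billing always follows the usage field in the actual response), the tiktoken library can count tokens offline:

```python
import tiktoken  # pip install tiktoken

# Assumption: "gpt-4o" here uses the same tokenizer as OpenAI's gpt-4o.
enc = tiktoken.encoding_for_model("gpt-4o")

for text in ["Hello, world!", "你好，世界"]:
    print(f"{text!r} -> {len(enc.encode(text))} tokens")
```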
Q5: What is the difference between presence_penalty and frequency_penalty?
A5: presence_penalty applies a penalty if a token has appeared at all, encouraging the model to "talk about new topics." frequency_penalty applies a penalty proportional to how many times a token has already appeared, discouraging "repetitive phrasing."
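Both penalties are plain request-body numbers; a sketch with illustrative (not prescribed) values:

```python
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Brainstorm ten blog post ideas."}],
    "presence_penalty": 0.6,   # nudge toward topics not yet mentioned
    "frequency_penalty": 0.4,  # damp verbatim repetition of frequent tokens
}
```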
Q6: What if I want a real-time, typewriter-like generation effect?
A6: Set the stream parameter to true. Note that streaming responses are delivered via Server-Sent Events (SSE), and their payload format differs from the non-streaming format documented on this page.
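Since the streaming payload is documented elsewhere, the sketch below assumes an OpenAI-compatible SSE format (lines prefixed with "data: ", a "[DONE]" terminator, and delta chunks); verify against the streaming documentation before relying on it.

```python
import json
import requests

# Hypothetical sketch: assumes OpenAI-style SSE chunks; the real streaming
# payload format is defined in the streaming documentation, not here.
with requests.post(
    "https://api.codingplanx.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "gpt-4o",
        "stream": True,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    stream=True,
    timeout=60,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        # Assumed delta shape; print tokens as they arrive (typewriter effect).
        print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
```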