Chat Completions API Documentation
Given a prompt and the conversation history, this endpoint returns one or more predicted completions. You can control the randomness, length, and format of the generated output.
- Endpoint Name: List Models / Chat Completions
- HTTP Method: GET (Note: based on the provided data definition)
- Endpoint URL: https://api.codingplanx.ai/v1/models
- Status: Released
1. Request Parameters
1.1 Header Parameters
| Parameter | Required | Type | Example Value | Description |
|---|---|---|---|---|
| Content-Type | Yes | String | application/json | Request body format. |
| Accept | Yes | String | application/json | Expected response format. |
| Authorization | No | String | Bearer {{YOUR_API_KEY}} | API key used for authentication. |
1.2 Body Parameters
Note: Although this endpoint is defined as a GET request, in typical usage the following parameters are passed in the JSON body.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | ID of the model to use. Example: gpt-3.5-turbo. |
| messages | array | Yes | - | A list of messages comprising the conversation so far. Each message contains a role and content. |
| temperature | number | No | 1 | Sampling temperature, between 0 and 2. Higher values will make the output more random. |
| top_p | number | No | 1 | Nucleus sampling. 0.1 means only the tokens comprising the top 10% probability mass are considered. |
| n | integer | No | 1 | How many chat completion choices to generate for each input message. |
| stream | boolean | No | false | Whether to stream back partial progress. If set to true, partial message deltas will be sent via Server-Sent Events (SSE). |
| stop | string/array | No | null | Stop sequence(s). The model will stop generating further tokens when it encounters these characters. |
| max_tokens | integer | No | inf | The maximum number of tokens to generate in the chat completion. |
| presence_penalty | number | No | 0 | Number between -2.0 and 2.0. Positive values increase the model's likelihood to talk about new topics. |
| frequency_penalty | number | No | 0 | Number between -2.0 and 2.0. Positive values decrease the model's likelihood to repeat the same line verbatim. |
| logit_bias | object | No | null | Modify the likelihood of specified tokens appearing in the completion. |
| user | string | No | - | A unique identifier representing your end-user, which can help monitor and detect abuse. |
| response_format | object | No | - | Specifies the format that the model must output. E.g., { "type": "json_object" } enables JSON mode. |
| tools | array | No | - | A list of tools the model may call (such as functions). |
| tool_choice | string/object | No | auto | Controls which (if any) function is called by the model. |
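To illustrate how these body parameters fit together, here is a minimal sketch in Python that builds a request body using the parameter names from the table above. The values are illustrative, not required defaults:

```python
import json

# Minimal request body for the endpoint documented above. All keys come
# from the Body Parameters table; the values shown are only examples.
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize SSE in one sentence."},
    ],
    "temperature": 0.2,  # lower = more deterministic (tune temperature OR top_p, not both)
    "n": 1,
    "stream": False,
    "max_tokens": 256,
    "user": "user-1234",
}

body = json.dumps(payload)
```

The serialized `body` string is what you would send with the request, alongside the header parameters from section 1.1.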
2. Response Parameters
| Parameter | Type | Description |
|---|---|---|
| id | string | A unique identifier for the chat completion request. |
| object | string | The object type, which is typically chat.completion. |
| created | integer | The Unix timestamp (in seconds) of when the request was created. |
| choices | array | A list of chat completion choices. |
| ├─ index | integer | The index of the choice in the list. |
| ├─ message | object | The specific message generated by the model. |
| │ ├─ role | string | The role of the author of this message, typically assistant. |
| │ └─ content | string | The textual content of the message. |
| └─ finish_reason | string | The reason the model stopped generating tokens. E.g., stop (natural stop) or length (reached maximum length). |
| usage | object | Usage statistics for the completion request. |
| ├─ prompt_tokens | integer | Number of tokens consumed by the input prompt. |
| ├─ completion_tokens | integer | Number of tokens generated in the output. |
| └─ total_tokens | integer | Total number of tokens consumed (prompt + completion). |
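A short sketch of reading these response fields in Python, using a sample payload shaped like the table above (the sample mirrors the response example later in this document):

```python
import json

# Sample response shaped per the Response Parameters table.
raw = """
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello!"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21}
}
"""

resp = json.loads(raw)
answer = resp["choices"][0]["message"]["content"]   # the generated text
finish = resp["choices"][0]["finish_reason"]        # why generation stopped
usage = resp["usage"]

# Sanity check: total is prompt + completion, as the table describes.
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
```

When `n` is greater than 1, iterate over `resp["choices"]` instead of reading only index 0.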
3. Request Examples
Successful Request Example
```shell
curl --location --request GET 'https://api.codingplanx.ai/v1/models' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--data '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {
            "role": "user",
            "content": "Hello there, how may I assist you today?"
        }
    ],
    "temperature": 1,
    "top_p": 1,
    "n": 1,
    "stream": false,
    "user": "user-1234"
}'
```
Successful Response Example
```json
{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\n\nHello there, how may I assist you today?"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 12,
        "total_tokens": 21
    }
}
```
4. FAQs (Frequently Asked Questions)
Q: Why does the documentation show a GET method, but it looks like a POST request?
A: In standard OpenAI specifications, creating completions generally uses the POST method. However, based on the provided metadata, this endpoint is defined as GET. Please verify the actual server-side configuration when making API calls. If GET does not work, try using POST.
Q: How should I choose between temperature and top_p?
A: We generally recommend altering either temperature or top_p, but not both. If you want precise and deterministic answers, lower the temperature (e.g., 0.2). If you want more diverse and creative responses, increase it.
Q: How do I enable JSON mode?
A: You need to set { "type": "json_object" } within the response_format parameter, and explicitly instruct the model (via System or User messages) to output in JSON format. Otherwise, the model may generate parsing errors or an endless stream of whitespace.
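Following the FAQ answer above, a JSON-mode request pairs `response_format` with an explicit instruction in the system message. A hedged sketch (the prompt wording is just an example):

```python
import json

# JSON-mode request: response_format enables JSON mode, and the system
# message explicitly asks for JSON output, as the FAQ above recommends.
payload = {
    "model": "gpt-3.5-turbo",
    "response_format": {"type": "json_object"},
    "messages": [
        {"role": "system",
         "content": "Reply only with a JSON object containing the keys 'city' and 'country'."},
        {"role": "user", "content": "Where is the Eiffel Tower?"},
    ],
}

body = json.dumps(payload)
```

Without the explicit instruction in the messages, JSON mode alone is not sufficient and may produce the failure modes described above.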
Q: What is the difference between presence_penalty and frequency_penalty?
A:
- presence_penalty: penalizes tokens that have already appeared at all, encouraging the model to discuss new topics.
- frequency_penalty: penalizes tokens in proportion to how often they have appeared, preventing the model from repeating the same sentences within a short span.
Q: How does streaming output (stream: true) work?
A: When streaming is enabled, the API returns data incrementally, token by token, via Server-Sent Events (SSE). Each data chunk is a JSON string prefixed with `data: `, and the stream is terminated with a `data: [DONE]` message. This is highly useful for displaying long-form text generation in real time in user interfaces.
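The SSE handling described above can be sketched as follows. This parses a small hard-coded stream rather than a live connection; the chunk shape (`choices[0].delta.content`) is an assumption based on common chat-completion streaming formats, so verify it against actual responses:

```python
import json

# Simulated SSE stream: each chunk is prefixed with "data: " and the
# stream ends with "data: [DONE]", as described in the FAQ above.
# NOTE: the delta chunk shape is an assumption, not confirmed by this doc.
sse_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]

pieces = []
for line in sse_lines:
    if not line.startswith("data: "):
        continue                      # skip comments / keep-alive lines
    data = line[len("data: "):]
    if data == "[DONE]":
        break                         # terminator: stream is complete
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"].get("content")
    if delta:
        pieces.append(delta)          # accumulate partial message deltas

text = "".join(pieces)
```

In a real client you would read these lines from the HTTP response body as they arrive and render each delta immediately.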