Chat Completions API Documentation
Given a prompt and the conversation history, this endpoint returns one or more predicted completions. You can control the randomness, length, and format of the generated output.
- Endpoint Name: List Models / Chat Completions
- HTTP Method: GET (Note: based on the provided data definition)
- Endpoint URL: https://api.codingplanx.ai/v1/models
- Status: Released
1. Request Parameters
1.1 Header Parameters
| Parameter | Required | Type | Example Value | Description |
|---|---|---|---|---|
| Content-Type | Yes | String | application/json | Request body format. |
| Accept | Yes | String | application/json | Expected response format. |
| Authorization | No | String | Bearer {{YOUR_API_KEY}} | API key used for authentication. |
1.2 Body Parameters
Note: Although this endpoint is defined as a GET request, in typical usage the following parameters are passed in the JSON body.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | ID of the model to use. Example: gpt-3.5-turbo. |
| messages | array | Yes | - | A list of messages comprising the conversation so far. Each message contains a role and content. |
| temperature | number | No | 1 | Sampling temperature, between 0 and 2. Higher values will make the output more random. |
| top_p | number | No | 1 | Nucleus sampling. 0.1 means only the tokens comprising the top 10% probability mass are considered. |
| n | integer | No | 1 | How many chat completion choices to generate for each input message. |
| stream | boolean | No | false | Whether to stream back partial progress. If set to true, partial message deltas will be sent via Server-Sent Events (SSE). |
| stop | string/array | No | null | Stop sequence(s). The model will stop generating further tokens when it encounters these characters. |
| max_tokens | integer | No | inf | The maximum number of tokens to generate in the chat completion. |
| presence_penalty | number | No | 0 | Number between -2.0 and 2.0. Positive values increase the model's likelihood to talk about new topics. |
| frequency_penalty | number | No | 0 | Number between -2.0 and 2.0. Positive values decrease the model's likelihood to repeat the same line verbatim. |
| logit_bias | object | No | null | Modify the likelihood of specified tokens appearing in the completion. |
| user | string | No | - | A unique identifier representing your end-user, which can help monitor and detect abuse. |
| response_format | object | No | - | Specifies the format that the model must output. E.g., { "type": "json_object" } enables JSON mode. |
| tools | array | No | - | A list of tools the model may call (such as functions). |
| tool_choice | string/object | No | auto | Controls which (if any) function is called by the model. |
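To illustrate how these body parameters fit together, here is a minimal sketch in Python that builds a request body using the parameter names from the table above. The values are illustrative, not required defaults:

```python
import json

# Minimal request body for the endpoint documented above. All keys come
# from the Body Parameters table; the values shown are only examples.
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize SSE in one sentence."},
    ],
    "temperature": 0.2,  # lower = more deterministic (tune temperature OR top_p, not both)
    "n": 1,
    "stream": False,
    "max_tokens": 256,
    "user": "user-1234",
}

body = json.dumps(payload)
```

The serialized `body` string is what you would send with the request, alongside the header parameters from section 1.1.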
2. Response Parameters
| Parameter | Type | Description |
|---|---|---|
| id | string | A unique identifier for the chat completion request. |
| object | string | The object type, which is typically chat.completion. |
| created | integer | The Unix timestamp (in seconds) of when the request was created. |
| choices | array | A list of chat completion choices. |
| ├─ index | integer | The index of the choice in the list. |
| ├─ message | object | The specific message generated by the model. |
| │ ├─ role | string | The role of the author of this message, typically assistant. |
| │ └─ content | string | The textual content of the message. |
| └─ finish_reason | string | The reason the model stopped generating tokens. E.g., stop (natural stop) or length (reached maximum length). |
| usage | object | Usage statistics for the completion request. |
| ├─ prompt_tokens | integer | Number of tokens consumed by the input prompt. |
| ├─ completion_tokens | integer | Number of tokens generated in the output. |
| └─ total_tokens | integer | Total number of tokens consumed (prompt + completion). |
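A short sketch of reading these response fields in Python, using a sample payload shaped like the table above (the sample mirrors the response example later in this document):

```python
import json

# Sample response shaped per the Response Parameters table.
raw = """
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello!"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21}
}
"""

resp = json.loads(raw)
answer = resp["choices"][0]["message"]["content"]   # the generated text
finish = resp["choices"][0]["finish_reason"]        # why generation stopped
usage = resp["usage"]

# Sanity check: total is prompt + completion, as the table describes.
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
```

When `n` is greater than 1, iterate over `resp["choices"]` instead of reading only index 0.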
3. Request Examples
Successful Request Example
```shell
curl --location --request GET 'https://api.codingplanx.ai/v1/models' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--data '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {
            "role": "user",
            "content": "Hello there, how may I assist you today?"
        }
    ],
    "temperature": 1,
    "top_p": 1,
    "n": 1,
    "stream": false,
    "user": "user-1234"
}'
```
Successful Response Example
```json
{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\n\nHello there, how may I assist you today?"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 12,
        "total_tokens": 21
    }
}
```
4. FAQs (Frequently Asked Questions)
Q: Why does the documentation show a GET method, but it looks like a POST request?
A: In standard OpenAI specifications, creating completions generally uses the POST method. However, based on the provided metadata, this endpoint is defined as GET. Please verify the actual server-side configuration when making API calls. If GET does not work, try using POST.
Q: How should I choose between temperature and top_p?
A: We generally recommend altering either temperature or top_p, but not both. If you want precise and deterministic answers, lower the temperature (e.g., 0.2). If you want more diverse and creative responses, increase it.
Q: How do I enable JSON mode?
A: You need to set { "type": "json_object" } within the response_format parameter, and explicitly instruct the model (via System or User messages) to output in JSON format. Otherwise, the model may generate parsing errors or an endless stream of whitespace.
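Following the FAQ answer above, a JSON-mode request pairs `response_format` with an explicit instruction in the system message. A hedged sketch (the prompt wording is just an example):

```python
import json

# JSON-mode request: response_format enables JSON mode, and the system
# message explicitly asks for JSON output, as the FAQ above recommends.
payload = {
    "model": "gpt-3.5-turbo",
    "response_format": {"type": "json_object"},
    "messages": [
        {"role": "system",
         "content": "Reply only with a JSON object containing the keys 'city' and 'country'."},
        {"role": "user", "content": "Where is the Eiffel Tower?"},
    ],
}

body = json.dumps(payload)
```

Without the explicit instruction in the messages, JSON mode alone is not sufficient and may produce the failure modes described above.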
Q: What is the difference between presence_penalty and frequency_penalty?
A:
- presence_penalty: penalizes tokens that have already appeared at all, encouraging the model to discuss new topics.
- frequency_penalty: penalizes tokens in proportion to how often they have appeared, preventing the model from repeating the same sentences within a short span.
Q: How does streaming output (stream: true) work?
A: When streaming is enabled, the API returns data incrementally, token by token, via Server-Sent Events (SSE). Each data chunk is a JSON string prefixed with `data: `, and the stream is terminated with a `data: [DONE]` message. This is highly useful for displaying long-form text generation in real time in user interfaces.
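The SSE handling described above can be sketched as follows. This parses a small hard-coded stream rather than a live connection; the chunk shape (`choices[0].delta.content`) is an assumption based on common chat-completion streaming formats, so verify it against actual responses:

```python
import json

# Simulated SSE stream: each chunk is prefixed with "data: " and the
# stream ends with "data: [DONE]", as described in the FAQ above.
# NOTE: the delta chunk shape is an assumption, not confirmed by this doc.
sse_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]

pieces = []
for line in sse_lines:
    if not line.startswith("data: "):
        continue                      # skip comments / keep-alive lines
    data = line[len("data: "):]
    if data == "[DONE]":
        break                         # terminator: stream is complete
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"].get("content")
    if delta:
        pieces.append(delta)          # accumulate partial message deltas

text = "".join(pieces)
```

In a real client you would read these lines from the HTTP response body as they arrive and render each delta immediately.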