Chat Completions API Documentation

Given a list of conversation messages, the model returns one or more predicted completions. The API supports streaming output, tool calling (Function Calling), and a range of sampling parameter adjustments.

1. Interface Information

  • Interface Name: Official N Test
  • HTTP Method: POST
  • Endpoint URL: https://api.codingplanx.ai/v1/chat/completions
  • Content-Type: application/json
  • Authentication: Bearer {{YOUR_API_KEY}}

2. Request Headers

| Parameter | Required | Type | Description | Example |
|---|---|---|---|---|
| Content-Type | Yes | string | Media type identifier | application/json |
| Accept | Yes | string | Response format accepted by the client | application/json |
| Authorization | Yes | string | API access token | Bearer sk-xxxxxx |

3. Request Body

| Parameter | Required | Type | Description |
|---|---|---|---|
| model | Yes | string | ID of the model to use (e.g., gpt-4o, gpt-3.5-turbo). |
| messages | Yes | array | A list of messages comprising the conversation. Each object contains role (system/user/assistant) and content. |
| tools | No | array | A list of tools the model may call. Currently, only functions are supported. |
| tool_choice | No | string or object | Controls which (if any) tool is called by the model. Options: none, auto, or a specific function. |
| temperature | No | number | Sampling temperature (0-2). Higher values are more random; lower values are more deterministic. It is recommended to alter this or top_p, but not both. |
| top_p | No | number | Nucleus sampling. 0.1 means only tokens comprising the top 10% probability mass are considered. |
| n | No | integer | Defaults to 1. The number of chat completion choices to generate for each input message. |
| stream | No | boolean | Defaults to false. If true, tokens are sent as data-only server-sent events as they become available. |
| stop | No | string or array | Stop sequences. The model stops generating further tokens when one of these sequences is encountered. |
| max_tokens | No | integer | The maximum number of tokens to generate. Defaults to inf. |
| presence_penalty | No | number | Between -2.0 and 2.0. Penalizes new tokens based on whether they appear in the text so far, increasing the likelihood of talking about new topics. |
| frequency_penalty | No | number | Between -2.0 and 2.0. Penalizes new tokens based on their existing frequency in the text, decreasing the likelihood of repetition. |
| logit_bias | No | object | Modifies the likelihood of specified tokens appearing in the completion. |
| user | No | string | A unique identifier representing your end-user, which can help in monitoring and detecting abuse. |
| response_format | No | object | Specifies the format that the model must output. For example, { "type": "json_object" } enables JSON mode. |
| seed | No | integer | Experimental. If specified, the system will make a best effort to sample deterministically for reproducible outputs. |
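As a sketch, a minimal request can be assembled and sent with Python's standard library. The endpoint, headers, and body fields are taken from this document; YOUR_API_KEY is a placeholder for a real key.

```python
import json
import urllib.request

API_URL = "https://api.codingplanx.ai/v1/chat/completions"
API_KEY = "YOUR_API_KEY"  # placeholder: substitute a real key


def build_payload(user_text: str) -> dict:
    """Assemble a request body using the required fields plus a few sampling options."""
    return {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": user_text}],
        "n": 1,
        "max_tokens": 100,
        "temperature": 0.8,
        "stream": False,
    }


def chat(user_text: str) -> dict:
    """POST the payload with the headers from section 2 and decode the JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(user_text)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```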

4. Response Parameters

| Parameter | Type | Description |
|---|---|---|
| id | string | A unique identifier for the chat completion. |
| object | string | The object type, which is always chat.completion. |
| created | integer | The Unix timestamp (in seconds) of when the chat completion was created. |
| choices | array | A list of chat completion choices. |
| ├─ index | integer | The index of the choice in the list of choices. |
| ├─ message | object | A chat completion message generated by the model. |
| │ ├─ role | string | The role of the author of this message (usually assistant). |
| │ └─ content | string | The contents of the message. |
| └─ finish_reason | string | The reason the model stopped generating (e.g., stop, length, tool_calls). |
| usage | object | Usage statistics for the completion request. |
| ├─ prompt_tokens | integer | Number of tokens in the prompt. |
| ├─ completion_tokens | integer | Number of tokens in the generated completion. |
| └─ total_tokens | integer | Total number of tokens used in the request (prompt + completion). |
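The fields above can be read out of a decoded response like so. This sketch uses a hard-coded copy of the example response from section 5 in place of an actual HTTP call.

```python
import json

# Example response body, matching the schema in the table above.
raw = """
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "choices": [
    {"index": 0,
     "message": {"role": "assistant",
                 "content": "Hello there, how may I assist you today?"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21}
}
"""

resp = json.loads(raw)
first = resp["choices"][0]            # n defaults to 1, so there is one choice
answer = first["message"]["content"]  # the generated text
reason = first["finish_reason"]       # "stop", "length", or "tool_calls"
used = resp["usage"]["total_tokens"]  # prompt_tokens + completion_tokens
```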

5. Example

Request Example

{
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": "who are u?"
        }
    ],
    "n": 1,
    "max_tokens": 100,
    "temperature": 0.8,
    "stream": false
}

Response Example

{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\r
\r
Hello there, how may I assist you today?"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 12,
        "total_tokens": 21
    }
}

6. FAQs

Q: Why do I get an error when I set response_format: { "type": "json_object" }? A: When using JSON mode, you must explicitly instruct the model to produce JSON via a system or user message (e.g., "Please reply in JSON format"). Otherwise, the request may fail, or the model may generate output (such as an unending stream of whitespace) until the token limit is reached without ever producing valid JSON.
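A sketch of a JSON-mode request body that satisfies this requirement; the exact wording of the system prompt is illustrative.

```python
import json

# JSON mode: response_format alone is not enough;
# the prompt itself must also instruct the model to produce JSON.
payload = {
    "model": "gpt-4o",
    "response_format": {"type": "json_object"},
    "messages": [
        {"role": "system",
         "content": "You are a helpful assistant. Reply only with valid JSON."},
        {"role": "user", "content": "List three primary colors."},
    ],
}
body = json.dumps(payload)
```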

Q: Can I adjust temperature and top_p at the same time? A: Technically yes, but it is generally recommended to adjust only one of them. Adjusting one is usually sufficient to change the randomness of the output; adjusting both can lead to unpredictable results.

Q: How do I receive data when stream: true is enabled? A: When streaming is enabled, the server sends a series of Server-Sent Events (SSE). The data portion of each event is a JSON object, and the stream ends with data: [DONE]. You need to use a streaming library (like Python's response.iter_lines()) to process the response line by line.
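A minimal sketch of that SSE parsing, fed here with a simulated byte stream in place of `response.iter_lines()`. The chunk shape (a `delta` object inside `choices`) follows the common streaming chat-completion format and is an assumption about this API's output.

```python
import json


def collect_stream(lines):
    """Accumulate content from data-only SSE lines until 'data: [DONE]'."""
    parts = []
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)


# Simulated stream, standing in for the lines of a real SSE response:
fake = [
    b'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    b'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    b'data: {"choices":[{"delta":{"content":"lo"}}]}',
    b'data: [DONE]',
]
```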

Q: How is token consumption calculated? A: For English, 1 token is approximately 4 characters or 0.75 words. For Chinese, one character may correspond to 1-2 tokens. The final consumption is based on the usage field in the response body.

Q: What does finish_reason: "length" mean? A: This indicates that the generated content exceeded the limit set in max_tokens or reached the model's maximum context length limit, resulting in the content being truncated.

Q: Does the API support Function Calling? A: Yes. By defining function prototypes via the tools parameter, the model will return tool_calls in choices[0].message instead of plain content when a function call is required.
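A sketch of defining a tool and detecting a tool call in the assistant message. The get_weather function and the simulated message below are hypothetical, modeled on the tools format described in section 3.

```python
import json

# Illustrative tools definition to pass in the request body.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function name
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def extract_tool_call(message: dict):
    """Return (name, arguments) if the model requested a tool call, else None."""
    calls = message.get("tool_calls")
    if not calls:
        return None
    fn = calls[0]["function"]
    return fn["name"], json.loads(fn["arguments"])  # arguments arrive as a JSON string


# Simulated choices[0].message containing a tool call instead of plain content.
msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": "{\"city\": \"Paris\"}"},
    }],
}
```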