Chat Completions API Documentation

Given a list of conversation messages, the model returns one or more predicted completions. The API supports streaming output, tool calling (Function Calling), and a range of sampling parameter adjustments.

1. Interface Information

  • Interface Name: Official N Test
  • HTTP Method: POST
  • Endpoint URL: https://api.codingplanx.ai/v1/chat/completions
  • Content-Type: application/json
  • Authentication: Bearer {{YOUR_API_KEY}}

2. Request Headers

| Parameter | Required | Type | Description | Example |
|---|---|---|---|---|
| Content-Type | Yes | string | Media type identifier | application/json |
| Accept | Yes | string | Response format accepted by the client | application/json |
| Authorization | Yes | string | API access token | Bearer sk-xxxxxx |

3. Request Body

| Parameter | Required | Type | Description |
|---|---|---|---|
| model | Yes | string | ID of the model to use (e.g., gpt-4o, gpt-3.5-turbo). |
| messages | Yes | array | A list of messages comprising the conversation. Each object contains role (system/user/assistant) and content. |
| tools | No | array | A list of tools the model may call. Currently, only functions are supported. |
| tool_choice | No | string or object | Controls which (if any) tool is called by the model. Options: none, auto, or a specific function. |
| temperature | No | number | Sampling temperature (0-2). Higher values are more random; lower values are more deterministic. It is recommended to alter this or top_p, but not both. |
| top_p | No | number | Nucleus sampling. 0.1 means only tokens comprising the top 10% probability mass are considered. |
| n | No | integer | Defaults to 1. The number of chat completion choices to generate for each input message. |
| stream | No | boolean | Defaults to false. If true, tokens are sent as data-only server-sent events as they become available. |
| stop | No | string or array | Stop sequences. The model stops generating further tokens when one of these sequences is encountered. |
| max_tokens | No | integer | The maximum number of tokens to generate. Defaults to inf. |
| presence_penalty | No | number | Between -2.0 and 2.0. Penalizes new tokens based on whether they appear in the text so far, increasing the likelihood of talking about new topics. |
| frequency_penalty | No | number | Between -2.0 and 2.0. Penalizes new tokens based on their existing frequency in the text, decreasing the likelihood of repetition. |
| logit_bias | No | object | Modifies the likelihood of specified tokens appearing in the completion. |
| user | No | string | A unique identifier representing your end-user, which can help in monitoring and detecting abuse. |
| response_format | No | object | Specifies the format that the model must output. For example, { "type": "json_object" } enables JSON mode. |
| seed | No | integer | Experimental. If specified, the system will make a best effort to sample deterministically for reproducible outputs. |
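As a sketch, a minimal request can be assembled and sent with Python's standard library. The endpoint, headers, and body fields are taken from this document; YOUR_API_KEY is a placeholder for a real key.

```python
import json
import urllib.request

API_URL = "https://api.codingplanx.ai/v1/chat/completions"
API_KEY = "YOUR_API_KEY"  # placeholder: substitute a real key


def build_payload(user_text: str) -> dict:
    """Assemble a request body using the required fields plus a few sampling options."""
    return {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": user_text}],
        "n": 1,
        "max_tokens": 100,
        "temperature": 0.8,
        "stream": False,
    }


def chat(user_text: str) -> dict:
    """POST the payload with the headers from section 2 and decode the JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(user_text)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```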

4. Response Parameters

| Parameter | Type | Description |
|---|---|---|
| id | string | A unique identifier for the chat completion. |
| object | string | The object type, which is always chat.completion. |
| created | integer | The Unix timestamp (in seconds) of when the chat completion was created. |
| choices | array | A list of chat completion choices. |
| ├─ index | integer | The index of the choice in the list of choices. |
| ├─ message | object | A chat completion message generated by the model. |
| │ ├─ role | string | The role of the author of this message (usually assistant). |
| │ └─ content | string | The contents of the message. |
| └─ finish_reason | string | The reason the model stopped generating (e.g., stop, length, tool_calls). |
| usage | object | Usage statistics for the completion request. |
| ├─ prompt_tokens | integer | Number of tokens in the prompt. |
| ├─ completion_tokens | integer | Number of tokens in the generated completion. |
| └─ total_tokens | integer | Total number of tokens used in the request (prompt + completion). |
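The fields above can be read out of a decoded response like so. This sketch uses a hard-coded copy of the example response from section 5 in place of an actual HTTP call.

```python
import json

# Example response body, matching the schema in the table above.
raw = """
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "choices": [
    {"index": 0,
     "message": {"role": "assistant",
                 "content": "Hello there, how may I assist you today?"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21}
}
"""

resp = json.loads(raw)
first = resp["choices"][0]            # n defaults to 1, so there is one choice
answer = first["message"]["content"]  # the generated text
reason = first["finish_reason"]       # "stop", "length", or "tool_calls"
used = resp["usage"]["total_tokens"]  # prompt_tokens + completion_tokens
```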

5. Example

Request Example

{
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": "who are u?"
        }
    ],
    "n": 1,
    "max_tokens": 100,
    "temperature": 0.8,
    "stream": false
}

Response Example

{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\r
\r
Hello there, how may I assist you today?"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 12,
        "total_tokens": 21
    }
}

6. FAQs

Q: Why do I get an error when I set response_format: { "type": "json_object" }? A: When using JSON mode, you must explicitly instruct the model to produce JSON via a system or user message (e.g., "Please reply in JSON format"). Otherwise, the request may fail, or the model may generate output (such as an unending stream of whitespace) until the token limit is reached without ever producing valid JSON.
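A sketch of a JSON-mode request body that satisfies this requirement; the exact wording of the system prompt is illustrative.

```python
import json

# JSON mode: response_format alone is not enough;
# the prompt itself must also instruct the model to produce JSON.
payload = {
    "model": "gpt-4o",
    "response_format": {"type": "json_object"},
    "messages": [
        {"role": "system",
         "content": "You are a helpful assistant. Reply only with valid JSON."},
        {"role": "user", "content": "List three primary colors."},
    ],
}
body = json.dumps(payload)
```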

Q: Can I adjust temperature and top_p at the same time? A: Technically yes, but it is generally recommended to adjust only one of them. Adjusting one is usually sufficient to change the randomness of the output; adjusting both can lead to unpredictable results.

Q: How do I receive data when stream: true is enabled? A: When streaming is enabled, the server sends a series of Server-Sent Events (SSE). The data portion of each event is a JSON object, and the stream ends with data: [DONE]. You need to use a streaming library (like Python's response.iter_lines()) to process the response line by line.
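A minimal sketch of that SSE parsing, fed here with a simulated byte stream in place of `response.iter_lines()`. The chunk shape (a `delta` object inside `choices`) follows the common streaming chat-completion format and is an assumption about this API's output.

```python
import json


def collect_stream(lines):
    """Accumulate content from data-only SSE lines until 'data: [DONE]'."""
    parts = []
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)


# Simulated stream, standing in for the lines of a real SSE response:
fake = [
    b'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    b'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    b'data: {"choices":[{"delta":{"content":"lo"}}]}',
    b'data: [DONE]',
]
```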

Q: How is token consumption calculated? A: For English, 1 token is approximately 4 characters or 0.75 words. For Chinese, one character may correspond to 1-2 tokens. The final consumption is based on the usage field in the response body.

Q: What does finish_reason: "length" mean? A: This indicates that the generated content exceeded the limit set in max_tokens or reached the model's maximum context length limit, resulting in the content being truncated.

Q: Does the API support Function Calling? A: Yes. By defining function prototypes via the tools parameter, the model will return tool_calls in choices[0].message instead of plain content when a function call is required.
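A sketch of defining a tool and detecting a tool call in the assistant message. The get_weather function and the simulated message below are hypothetical, modeled on the tools format described in section 3.

```python
import json

# Illustrative tools definition to pass in the request body.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function name
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def extract_tool_call(message: dict):
    """Return (name, arguments) if the model requested a tool call, else None."""
    calls = message.get("tool_calls")
    if not calls:
        return None
    fn = calls[0]["function"]
    return fn["name"], json.loads(fn["arguments"])  # arguments arrive as a JSON string


# Simulated choices[0].message containing a tool call instead of plain content.
msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": "{\"city\": \"Paris\"}"},
    }],
}
```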