Create Chat Completion (Non-Streaming)
This endpoint takes a conversation, expressed as a list of messages, and returns one or more model-generated completions. The non-streaming endpoint returns the entire response in a single payload once generation is complete.
- Endpoint URL: https://api.codingplanx.ai/v1/chat/completions
- HTTP Method: POST
- Content-Type: application/json
Request Parameters
Headers
| Parameter | Type | Required | Example | Description |
|---|---|---|---|---|
| Content-Type | string | Yes | application/json | The format of the request body. |
| Accept | string | Yes | application/json | The format of the response body. |
| Authorization | string | Yes | Bearer {{YOUR_API_KEY}} | Authentication token; pass your API key as a Bearer token. |
Request Body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | The ID of the model to use (e.g., gpt-4o, gpt-3.5-turbo). |
| messages | array | Yes | - | A list of messages comprising the conversation. Each object must contain a role (system/user/assistant) and content. |
| temperature | number | No | 1 | Sampling temperature (0 to 2). Higher values make the output more random, while lower values make it more deterministic. |
| top_p | number | No | 1 | Nucleus sampling. Considers only the tokens comprising the top_p probability mass. We recommend altering this OR temperature, but not both. |
| n | integer | No | 1 | How many completion choices to generate for each input message. |
| stream | boolean | No | false | Whether to stream back partial progress. For this endpoint, it should be set to false. |
| stop | string/array | No | null | Up to 4 sequences where the API will stop generating further tokens. |
| max_tokens | integer | No | inf | The maximum number of tokens to generate in the completion. Restricted by the model's context length. |
| presence_penalty | number | No | 0 | Number between -2.0 and 2.0. Positive values increase the model's likelihood to talk about new topics. |
| frequency_penalty | number | No | 0 | Number between -2.0 and 2.0. Positive values decrease the model's likelihood to repeat the same lines verbatim. |
| logit_bias | object | No | null | Modifies the likelihood of specified tokens appearing in the completion. |
| user | string | No | - | A unique identifier representing your end-user, which can help monitor and detect abuse. |
| response_format | object | No | - | An object specifying the format that the model must output (e.g., { "type": "json_object" } enables JSON mode). |
| tools | array | No | - | A list of tools the model may call. Currently, only functions are supported. |
| tool_choice | string/object | No | auto | Controls which (if any) tool is called by the model (none/auto/specific function). |
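To make the tables above concrete, here is a minimal Python sketch of a complete request. It assumes only what is documented on this page (the endpoint URL, headers, and body fields); the requests library and the YOUR_API_KEY placeholder are illustrative choices, not part of the API.

```python
import requests  # third-party HTTP client; any HTTP library works

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your real key

payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, please introduce yourself."},
    ],
    "temperature": 0.7,   # tune this OR top_p, not both (see table above)
    "n": 1,               # number of completion choices to generate
    "max_tokens": 1000,
    "stream": False,      # this endpoint is the non-streaming variant
}

resp = requests.post(
    "https://api.codingplanx.ai/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```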
Response Parameters
Response Body
| Parameter | Type | Description |
|---|---|---|
| id | string | A unique identifier for the chat completion request. |
| object | string | The object type, which is always chat.completion. |
| created | integer | The Unix timestamp (in seconds) of when the completion was created. |
| choices | array | A list of chat completion choices. |
| ├─ index | integer | The index of the choice in the list of choices. |
| ├─ message | object | A chat completion message generated by the model. |
| │ ├─ role | string | The role of the author of this message (usually assistant). |
| │ └─ content | string | The textual content of the message. |
| └─ finish_reason | string | The reason the model stopped generating tokens (e.g., stop, length, tool_calls). |
| usage | object | Usage statistics for the completion request. |
| ├─ prompt_tokens | integer | Number of tokens in the prompt. |
| ├─ completion_tokens | integer | Number of tokens in the generated completion. |
| └─ total_tokens | integer | Total number of tokens used in the request (prompt + completion). |
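As a sketch of how the fields above nest, the helper below walks a parsed response body; the function name is ours for illustration, not part of any SDK.

```python
def summarize_completion(body: dict) -> None:
    """Print the documented fields of a chat.completion response body."""
    print(f"id={body['id']}  created={body['created']}")
    for choice in body["choices"]:
        msg = choice["message"]
        print(f"choice {choice['index']} (finish_reason={choice['finish_reason']}):")
        print(f"  [{msg['role']}] {msg['content']}")
    u = body["usage"]
    print(f"tokens: {u['prompt_tokens']} prompt + {u['completion_tokens']} "
          f"completion = {u['total_tokens']} total")

# e.g. summarize_completion(resp.json()) after a successful request
```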
Examples
Request Example (cURL)
```bash
curl --location --request POST 'https://api.codingplanx.ai/v1/chat/completions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": "Hello, please introduce yourself."
        }
    ],
    "max_tokens": 1000
}'
```
Response Example
```json
{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! I am an AI assistant provided by CodingPlanX. I am very happy to help you."
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 14,
        "completion_tokens": 20,
        "total_tokens": 34
    }
}
```
FAQs (Frequently Asked Questions)
Q1: How do I enable JSON mode?
A1: Set "response_format": { "type": "json_object" } in your request body. Note: When using JSON mode, you must explicitly instruct the model to produce JSON in your system or user message. Otherwise, the model may generate an endless stream of whitespace until it reaches the token limit.
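A sketch of a JSON-mode request body, following A1; note the explicit "reply with JSON" instruction in the system message. The example prompt and schema are ours, purely illustrative.

```python
payload = {
    "model": "gpt-4o",
    "response_format": {"type": "json_object"},  # enables JSON mode
    "messages": [
        # JSON mode still needs an explicit instruction to produce JSON;
        # otherwise the model may emit whitespace until the token limit.
        {"role": "system", "content": "Reply with a single JSON object only."},
        {"role": "user", "content": 'List three primary colors as {"colors": [...]}.'},
    ],
}

# The assistant's message content arrives as a JSON string; parse it yourself,
# e.g. json.loads(response["choices"][0]["message"]["content"])
```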
Q2: How should I choose between temperature and top_p?
A2: We generally recommend altering only one of these parameters. If you need the model to be more creative and diverse, increase the temperature (e.g., 0.8). If you need more precise, factual, and deterministic answers, decrease it (e.g., 0.2).
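For illustration, two presets built on the same request; the specific values are conventional suggestions, not API-mandated thresholds.

```python
base = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Suggest a name for a coffee shop."}],
}

creative = {**base, "temperature": 0.8}       # more varied, surprising output
deterministic = {**base, "temperature": 0.2}  # focused, repeatable output

# Prefer tuning top_p instead? Use e.g. {**base, "top_p": 0.1}, but not both.
```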
Q3: Why is the finish_reason returned as length?
A3: This indicates that the response was cut off mid-generation: either it reached the max_tokens limit you specified, or the request hit the model's maximum context window.
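A small defensive pattern (ours, not from any SDK) for detecting truncation via finish_reason:

```python
def extract_text(response: dict) -> str:
    """Return the first choice's text, flagging max_tokens truncation."""
    choice = response["choices"][0]
    if choice["finish_reason"] == "length":
        # Remedies: raise max_tokens, shorten the prompt, or send a
        # follow-up message asking the model to continue.
        print("warning: completion was truncated at the token limit")
    return choice["message"]["content"]
```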
Q4: What is a token?
A4: Tokens are the fundamental units of text processed by the model. For Chinese text, one character is roughly 1 to 2 tokens; for English, one token is approximately 4 characters or 0.75 words. API billing and context-length limits are both calculated from total token usage.
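If you want a local estimate before sending a request, and assuming the gpt-4o-style model IDs here tokenize like their OpenAI namesakes (an assumption; billing always follows the usage field in the actual response), the tiktoken library can count tokens offline:

```python
import tiktoken  # pip install tiktoken

# Assumption: "gpt-4o" here uses the same tokenizer as OpenAI's gpt-4o.
enc = tiktoken.encoding_for_model("gpt-4o")

for text in ["Hello, world!", "你好，世界"]:
    print(f"{text!r} -> {len(enc.encode(text))} tokens")
```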
Q5: What is the difference between presence_penalty and frequency_penalty?
A5: presence_penalty applies a penalty if a token has appeared at all, encouraging the model to "talk about new topics." frequency_penalty applies a penalty proportional to how many times a token has already appeared, discouraging "repetitive phrasing."
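Both penalties are plain request-body numbers; a sketch with illustrative (not prescribed) values:

```python
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Brainstorm ten blog post ideas."}],
    "presence_penalty": 0.6,   # nudge toward topics not yet mentioned
    "frequency_penalty": 0.4,  # damp verbatim repetition of frequent tokens
}
```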
Q6: What if I want a real-time, typewriter-like generation effect?
A6: Set the stream parameter to true. Note that streaming responses are delivered via Server-Sent Events (SSE), and their payload format differs from the non-streaming format documented on this page.
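Since the streaming payload is documented elsewhere, the sketch below assumes an OpenAI-compatible SSE format (lines prefixed with "data: ", a "[DONE]" terminator, and delta chunks); verify against the streaming documentation before relying on it.

```python
import json
import requests

# Hypothetical sketch: assumes OpenAI-style SSE chunks; the real streaming
# payload format is defined in the streaming documentation, not here.
with requests.post(
    "https://api.codingplanx.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "gpt-4o",
        "stream": True,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    stream=True,
    timeout=60,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        # Assumed delta shape; print tokens as they arrive (typewriter effect).
        print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
```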