Create Chat Completion (Non-Streaming)

This endpoint takes a list of chat messages and returns one or more completions generated by the model. In non-streaming mode, the entire response is returned in a single payload once generation is complete.

  • Endpoint URL: https://api.codingplanx.ai/v1/chat/completions
  • HTTP Method: POST
  • Content-Type: application/json

Request Parameters

Headers

Parameter     | Type   | Required | Example                 | Description
--------------|--------|----------|-------------------------|------------
Content-Type  | string | Yes      | application/json        | The format of the request body.
Accept        | string | Yes      | application/json        | The format of the response body.
Authorization | string | No       | Bearer {{YOUR_API_KEY}} | The authentication token.

Request Body

Parameter         | Type          | Required | Default | Description
------------------|---------------|----------|---------|------------
model             | string        | Yes      | -       | The ID of the model to use (e.g., gpt-4o, gpt-3.5-turbo).
messages          | array         | Yes      | -       | A list of messages comprising the conversation. Each object must contain a role (system/user/assistant) and content.
temperature       | number        | No       | 1       | Sampling temperature (0 to 2). Higher values make the output more random; lower values make it more deterministic.
top_p             | number        | No       | 1       | Nucleus sampling: considers only the tokens comprising the top_p probability mass. We recommend altering this or temperature, but not both.
n                 | integer       | No       | 1       | How many completion choices to generate for each input message.
stream            | boolean       | No       | false   | Whether to stream back partial progress. For this endpoint, it should be set to false.
stop              | string/array  | No       | null    | Up to 4 sequences where the API will stop generating further tokens.
max_tokens        | integer       | No       | inf     | The maximum number of tokens to generate in the completion. Restricted by the model's context length.
presence_penalty  | number        | No       | 0       | Number between -2.0 and 2.0. Positive values increase the model's likelihood to talk about new topics.
frequency_penalty | number        | No       | 0       | Number between -2.0 and 2.0. Positive values decrease the model's likelihood to repeat the same lines verbatim.
logit_bias        | object        | No       | null    | Modifies the likelihood of specified tokens appearing in the completion.
user              | string        | No       | -       | A unique identifier representing your end-user, which can help monitor and detect abuse.
response_format   | object        | No       | -       | An object specifying the format that the model must output (e.g., { "type": "json_object" } enables JSON mode).
tools             | array         | No       | -       | A list of tools the model may call. Currently, only functions are supported.
tool_choice       | string/object | No       | auto    | Controls which (if any) tool is called by the model (none/auto/specific function).
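
The required/optional rules in the table above can be captured in a small client-side builder. `build_chat_payload` below is a hypothetical helper, not part of any official SDK; it only includes options the caller actually sets, so omitted parameters fall back to the server-side defaults listed above.

```python
import json

VALID_ROLES = {"system", "user", "assistant"}

def build_chat_payload(model, messages, **options):
    """Build a request body matching the table above.

    `model` and `messages` are required; all keyword options are
    optional and are passed through verbatim.
    """
    if not model:
        raise ValueError("'model' is required")
    for msg in messages:
        if msg.get("role") not in VALID_ROLES:
            raise ValueError(f"invalid role: {msg.get('role')!r}")
        if "content" not in msg:
            raise ValueError("each message needs a 'content' field")
    return {"model": model, "messages": messages, **options}

payload = build_chat_payload(
    "gpt-4o",
    [{"role": "user", "content": "Hello"}],
    temperature=0.2,
    max_tokens=1000,
)
body = json.dumps(payload)  # ready to send as the POST body
```

Because unset options are simply absent from the payload, the server applies the defaults from the table (e.g., stream defaults to false).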

Response Parameters

Response Body

Parameter             | Type    | Description
----------------------|---------|------------
id                    | string  | A unique identifier for the chat completion request.
object                | string  | The object type, which is always chat.completion.
created               | integer | The Unix timestamp (in seconds) of when the completion was created.
choices               | array   | A list of chat completion choices.
├─ index              | integer | The index of the choice in the list of choices.
├─ message            | object  | A chat completion message generated by the model.
│  ├─ role            | string  | The role of the author of this message (usually assistant).
│  └─ content         | string  | The textual content of the message.
└─ finish_reason      | string  | The reason the model stopped generating tokens (e.g., stop, length, tool_calls).
usage                 | object  | Usage statistics for the completion request.
├─ prompt_tokens      | integer | Number of tokens in the prompt.
├─ completion_tokens  | integer | Number of tokens in the generated completion.
└─ total_tokens       | integer | Total number of tokens used in the request (prompt + completion).
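
The nested structure above can be unpacked as follows; the `response` dict here is a canned sample matching the documented shape, not a live API result.

```python
# A response shaped like the table above (values are illustrative).
response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 14, "completion_tokens": 20, "total_tokens": 34},
}

# The generated text lives at choices[0].message.content.
first = response["choices"][0]
text = first["message"]["content"]

# Always check finish_reason: "length" means the output was truncated.
truncated = first["finish_reason"] == "length"

# usage.total_tokens is the figure billing is based on.
cost_basis = response["usage"]["total_tokens"]
```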

Examples

Request Example (cURL)

curl --location --request POST 'https://api.codingplanx.ai/v1/chat/completions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data-raw '{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "Hello, please introduce yourself."
    }
  ],
  "max_tokens": 1000
}'

Response Example

{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! I am an AI assistant provided by CodingPlanX. I am very happy to help you."
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 14,
        "completion_tokens": 20,
        "total_tokens": 34
    }
}
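
For reference, a Python sketch equivalent to the cURL request above, using only the standard library's urllib. `make_request` is an illustrative helper, not an official client; the actual network call is left commented out since it needs a valid API key.

```python
import json
import urllib.request

API_URL = "https://api.codingplanx.ai/v1/chat/completions"

def make_request(api_key: str) -> urllib.request.Request:
    """Build a POST request mirroring the cURL example."""
    body = {
        "model": "gpt-4o",
        "messages": [
            {"role": "user", "content": "Hello, please introduce yourself."}
        ],
        "max_tokens": 1000,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )

# To actually send it (requires network access and a valid key):
#   with urllib.request.urlopen(make_request("YOUR_API_KEY")) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```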

FAQs (Frequently Asked Questions)

Q1: How do I enable JSON mode? A1: Set "response_format": { "type": "json_object" } in your request body. Note: When using JSON mode, you must explicitly instruct the model to produce JSON in your system or user message. Otherwise, the model may generate an endless stream of whitespace until it reaches the token limit.
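
Following the answer above, a JSON-mode request must pair response_format with an explicit instruction to produce JSON. A sketch of such a payload (the message wording is only an example):

```python
import json

payload = {
    "model": "gpt-4o",
    "messages": [
        # JSON mode requires an explicit instruction like this one;
        # without it, the model may emit whitespace until max_tokens.
        {"role": "system",
         "content": "You are a helpful assistant. Reply only with valid JSON."},
        {"role": "user",
         "content": "List three primary colors under the key 'colors'."},
    ],
    "response_format": {"type": "json_object"},
}
```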

Q2: How should I choose between temperature and top_p? A2: We generally recommend altering only one of these parameters. If you need the model to be more creative and diverse, increase the temperature (e.g., 0.8). If you need more precise, factual, and deterministic answers, decrease it (e.g., 0.2).

Q3: Why is the finish_reason returned as length? A3: This indicates that the response was truncated: generation hit either your specified max_tokens limit or the model's maximum context window.

Q4: What is a Token? A4: Tokens are the fundamental units of text processed by the model. For Chinese text, one character is roughly equal to 1~2 tokens (for English, 1 token is approximately 4 characters or 0.75 words). API billing and context limitations are calculated based on the total token usage.
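
The rough ratios above can be turned into a quick client-side estimate. `estimate_tokens` is a heuristic sketch only; authoritative counts come from the usage object in the response.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate using the ratios above:
    CJK characters count as ~1.5 tokens each, everything else
    as ~1 token per 4 characters (the English average)."""
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    other = len(text) - cjk
    return round(cjk * 1.5 + other / 4)
```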

Q5: What is the difference between presence_penalty and frequency_penalty? A5: presence_penalty applies a penalty if a token has appeared at all, encouraging the model to "talk about new topics." frequency_penalty applies a penalty proportional to how many times a token has already appeared, discouraging "repetitive phrasing."

Q6: What if I want a real-time, typewriter-like generation effect? A6: Set the stream parameter to true. Please note that streaming responses use Server-Sent Events (SSE), and their payload format differs from the non-streaming format documented here.
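
In streaming mode the server emits "data:" lines over SSE, each carrying a JSON chunk whose choices[0].delta holds an incremental piece of the message, terminated by "data: [DONE]". A minimal parsing sketch (the exact chunk shape is an assumption based on the common chat.completion.chunk convention, not something this document specifies):

```python
import json

def collect_stream(sse_lines):
    """Concatenate delta content from SSE 'data:' lines until [DONE]."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content", ""))
    return "".join(parts)

# Canned example stream:
lines = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]
```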