Chat Completions API Documentation

Given a prompt (a list of conversation messages), this endpoint returns one or more predicted completions. You can generate text based on the conversation history and control the randomness, length, and format of the generated output.

  • Endpoint Name: List Models / Chat Completions
  • HTTP Method: GET (note: this follows the provided data definition; standard chat-completion APIs typically use POST — see the FAQ below)
  • Endpoint URL: https://api.codingplanx.ai/v1/models
  • Status: Released

1. Request Parameters

1.1 Header Parameters

Parameter     | Required | Type   | Example Value            | Description
------------- | -------- | ------ | ------------------------ | -------------------------------
Content-Type  | Yes      | String | application/json         | Request body format.
Accept        | Yes      | String | application/json         | Expected response format.
Authorization | No       | String | Bearer {{YOUR_API_KEY}}  | API key used for authentication.

1.2 Body Parameters

Note: Although this endpoint is defined as a GET request, in typical usage the following parameters are passed in the JSON request body.

Parameter         | Type         | Required | Default | Description
----------------- | ------------ | -------- | ------- | -----------------------------------------------------------
model             | string       | Yes      | -       | ID of the model to use. Example: gpt-3.5-turbo.
messages          | array        | Yes      | -       | A list of messages comprising the conversation so far. Each message contains a role and content.
temperature       | number       | No       | 1       | Sampling temperature, between 0 and 2. Higher values make the output more random.
top_p             | number       | No       | 1       | Nucleus sampling. 0.1 means only the tokens comprising the top 10% probability mass are considered.
n                 | integer      | No       | 1       | How many chat completion choices to generate for each input message.
stream            | boolean      | No       | false   | Whether to stream back partial progress. If set to true, partial message deltas are sent via Server-Sent Events (SSE).
stop              | string/array | No       | null    | Stop sequence(s). The model stops generating further tokens when it encounters these sequences.
max_tokens        | integer      | No       | inf     | The maximum number of tokens to generate in the chat completion.
presence_penalty  | number       | No       | 0       | Number between -2.0 and 2.0. Positive values increase the model's likelihood to talk about new topics.
frequency_penalty | number       | No       | 0       | Number between -2.0 and 2.0. Positive values decrease the model's likelihood to repeat the same line verbatim.
logit_bias        | object       | No       | null    | Modify the likelihood of specified tokens appearing in the completion.
user              | string       | No       | -       | A unique identifier representing your end-user, which can help monitor and detect abuse.
response_format   | object       | No       | -       | Specifies the format that the model must output. E.g., { "type": "json_object" } enables JSON mode.
tools             | array        | No       | -       | A list of tools the model may call (such as functions).
tool_choice       | object       | No       | auto    | Controls which (if any) function is called by the model.

2. Response Parameters

Parameter             | Type    | Description
--------------------- | ------- | ----------------------------------------------------------
id                    | string  | A unique identifier for the chat completion request.
object                | string  | The object type, which is typically chat.completion.
created               | integer | The Unix timestamp (in seconds) of when the request was created.
choices               | array   | A list of chat completion choices.
├─ index              | integer | The index of the choice in the list.
├─ message            | object  | The specific message generated by the model.
│ ├─ role             | string  | The role of the author of this message, typically assistant.
│ └─ content          | string  | The textual content of the message.
└─ finish_reason      | string  | The reason the model stopped generating tokens. E.g., stop (natural stop) or length (reached maximum length).
usage                 | object  | Usage statistics for the completion request.
├─ prompt_tokens      | integer | Number of tokens consumed by the input prompt.
├─ completion_tokens  | integer | Number of tokens generated in the output.
└─ total_tokens       | integer | Total number of tokens consumed (prompt + completion).

3. Request Examples

Successful Request Example

curl --location --request GET 'https://api.codingplanx.ai/v1/models' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--data '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "Hello there, how may I assist you today?"
    }
  ],
  "temperature": 1,
  "top_p": 1,
  "n": 1,
  "stream": false,
  "user": "user-1234"
}'
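
The same request can be sketched in Python using only the standard library. This is a minimal illustration: the endpoint URL, headers, and body come from the example above, and the API key is a placeholder.

```python
import json
import urllib.request

# Endpoint and key from the example above; the key is a placeholder.
API_URL = "https://api.codingplanx.ai/v1/models"
API_KEY = "YOUR_API_KEY"

headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}

payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "user", "content": "Hello there, how may I assist you today?"}
    ],
    "temperature": 1,
    "top_p": 1,
    "n": 1,
    "stream": False,
    "user": "user-1234",
}

# The docs define GET; if the server rejects it, change method to "POST".
req = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers=headers,
    method="GET",
)
# urllib.request.urlopen(req) would perform the call (network access required).
```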

Successful Response Example

{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\r
\r
Hello there, how may I assist you today?"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 12,
        "total_tokens": 21
    }
}
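
Walking the response structure described in Section 2 can be sketched in plain Python, using the sample response above as input (a raw string, so the \n escapes reach the JSON parser unchanged):

```python
import json

# The sample response from above.
sample = r"""
{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\n\nHello there, how may I assist you today?"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 12,
        "total_tokens": 21
    }
}
"""

resp = json.loads(sample)
choice = resp["choices"][0]              # first (and here only) choice
content = choice["message"]["content"]   # the generated text
print(choice["finish_reason"], resp["usage"]["total_tokens"])  # → stop 21
```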

4. FAQs (Frequently Asked Questions)

Q: Why does the documentation show a GET method, but it looks like a POST request? A: In standard OpenAI specifications, creating completions generally uses the POST method. However, based on the provided metadata, this endpoint is defined as GET. Please verify the actual server-side configuration when making API calls. If GET does not work, try using POST.

Q: How should I choose between temperature and top_p? A: We generally recommend altering either temperature or top_p, but not both. If you want precise and deterministic answers, lower the temperature (e.g., 0.2). If you want more diverse and creative responses, increase it.

Q: How do I enable JSON mode? A: You need to set { "type": "json_object" } within the response_format parameter, and explicitly instruct the model (via System or User messages) to output in JSON format. Otherwise, the model may generate parsing errors or an endless stream of whitespace.
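
As a sketch, a request body with JSON mode enabled might look like the following. The model name and messages are illustrative, and the reply string stands in for real model output:

```python
import json

# Hypothetical request body enabling JSON mode. Note the explicit JSON
# instruction in the system message, as recommended above.
payload = {
    "model": "gpt-3.5-turbo",
    "response_format": {"type": "json_object"},
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant. Always reply with a JSON object.",
        },
        {"role": "user", "content": "List three primary colors."},
    ],
}

# With JSON mode active, the returned content should parse cleanly.
reply = '{"colors": ["red", "yellow", "blue"]}'  # illustrative model output
print(json.loads(reply)["colors"][0])  # → red
```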

Q: What is the difference between presence_penalty and frequency_penalty? A:

  • presence_penalty: Focuses on penalizing topics that have already appeared, encouraging the model to discuss new topics.
  • frequency_penalty: Focuses on penalizing exact words/phrases that appear frequently, preventing the model from repeating the same sentences within a short period.

Q: How does streaming output (stream: true) work? A: When streaming is enabled, the API returns data incrementally via Server-Sent Events (SSE). Each data chunk is a JSON string prefixed with data: , and the stream is terminated with a data: [DONE] message. This is very useful for displaying long-form text generation in real time in user interfaces.
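
The client-side parsing loop can be sketched as follows. The delta chunk shape shown here follows the common chat.completion.chunk convention and is an assumption, since this document does not specify the streaming schema:

```python
import json

# Simulated SSE lines; in practice these arrive incrementally over HTTP.
stream_lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": " there"}}]}',
    "data: [DONE]",
]

pieces = []
for line in stream_lines:
    if not line.startswith("data: "):
        continue                      # skip SSE comments/keep-alives
    data = line[len("data: "):]
    if data == "[DONE]":              # explicit end-of-stream marker
        break
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:            # role-only deltas carry no text
        pieces.append(delta["content"])

print("".join(pieces))  # → Hello there
```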