Control Reasoning Effort (Chat Completions)
Interface Description
Given a prompt, the model returns one or more predicted completions, with support for controlling the model's reasoning depth via the reasoning_effort parameter.
- Official Documentation Reference: OpenAI Reasoning Guides
- Interface Status: Released
Request Specifications
- Method: POST
- Endpoint: https://api.codingplanx.ai/v1/chat/completions
Request Headers
| Parameter | Required | Type | Example | Description |
|---|---|---|---|---|
| Content-Type | Yes | string | application/json | Data format |
| Accept | Yes | string | application/json | Response format |
| Authorization | Yes | string | Bearer {{YOUR_API_KEY}} | Authentication credentials |
Request Body
Data Format: application/json
| Parameter | Required | Type | Description |
|---|---|---|---|
| model | Yes | string | ID of the model to use (e.g., o4-mini). |
| messages | Yes | array | A list of messages comprising the conversation so far. Each message includes a role (e.g., user/assistant) and content. |
| tools | No | array | A list of tools (functions) the model may call. Used to provide functions for which the model can generate JSON inputs. |
| tool_choice | No | string / object | Controls which function is called by the model. none means no call; auto means automatic selection. |
| reasoning_effort | No | string | Core parameter: controls the level of effort (depth of thought) the reasoning model spends on the reply. Common values: low, medium, high. |
| temperature | No | number | Sampling temperature between 0 and 2. Higher values (e.g., 0.8) make the output more random; lower values (e.g., 0.2) make it more focused and deterministic. Modify either this or top_p, but not both. |
| top_p | No | number | Nucleus sampling. 0.1 means only tokens comprising the top 10% probability mass are considered. Modify either this or temperature, but not both. |
| n | No | integer | Defaults to 1. How many chat completion choices to generate for each input message. |
| stream | No | boolean | Defaults to false. If true, partial message deltas are sent as Server-Sent Events (SSE), terminated by data: [DONE]. |
| stop | No | string / array | Up to 4 sequences where the API will stop generating further tokens. |
| max_tokens | No | integer | The maximum number of tokens to generate in the completion. Total length is limited by the model's context length. |
| presence_penalty | No | number | Number between -2.0 and 2.0. Positive values penalize tokens that have already appeared, increasing the likelihood of talking about new topics. |
| frequency_penalty | No | number | Number between -2.0 and 2.0. Positive values penalize tokens based on their existing frequency, decreasing the likelihood of repeating the same lines. |
| logit_bias | No | object | A JSON object that maps token IDs to bias values (-100 to 100) to modify the likelihood of specified tokens appearing. |
| user | No | string | A unique identifier representing your end-user, helping to monitor and detect abuse. |
| response_format | No | object | Specifies the format the model must output. For example, {"type": "json_object"} enables JSON mode. |
| seed | No | integer | Beta feature. If specified, the system makes a best effort to sample deterministically so that repeated requests with the same parameters return the same result. |
Request Example
```json
{
  "model": "o4-mini",
  "max_tokens": 500,
  "messages": [
    {
      "role": "user",
      "content": "Hello, please explain the principles of quantum computing in detail."
    }
  ],
  "temperature": 1.0,
  "stream": false,
  "reasoning_effort": "medium"
}
```
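The request above can be sent with a short Python helper. This is a minimal sketch using only the standard library; `YOUR_API_KEY` is a placeholder you must replace, and the `ask` function performs a live network call against the endpoint documented above.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your real key
URL = "https://api.codingplanx.ai/v1/chat/completions"

def build_payload(prompt, effort="medium", max_tokens=500):
    # Assemble a request body matching the parameter table above.
    return {
        "model": "o4-mini",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "reasoning_effort": effort,  # low / medium / high
    }

def ask(prompt, effort="medium"):
    # POST the JSON payload with the required headers and return
    # the assistant's message content from the first choice.
    req = urllib.request.Request(
        URL,
        data=json.dumps(build_payload(prompt, effort)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Raising `reasoning_effort` to `"high"` trades latency for deeper internal reasoning, so reserve it for prompts that actually need it.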
Response Specifications
Response Body Parameters
Data Format: application/json
| Parameter | Type | Description |
|---|---|---|
| id | string | A unique identifier for the request. |
| object | string | The object type, usually chat.completion. |
| created | integer | The Unix timestamp of when the completion was created. |
| choices | array | A list of completion choices. |
| └ index | integer | The index of the choice in the list. |
| └ message | object | The message generated by the model. Contains role and content. |
| └ finish_reason | string | The reason the model stopped generating (e.g., stop, length). |
| usage | object | Token usage statistics. |
| └ prompt_tokens | integer | Number of tokens in the prompt. |
| └ completion_tokens | integer | Number of tokens in the generated completion. |
| └ total_tokens | integer | Total number of tokens used in the request. |
Response Example (HTTP 200 OK)
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "\n\nHello there, how may I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```
FAQs
1. What is the purpose of the reasoning_effort parameter?
reasoning_effort is used to control how much computational resource and time a model with advanced reasoning capabilities (such as OpenAI's o1/o4 series) spends on "internal thinking" before providing a final answer. Typically, it supports three levels: low, medium, and high. Setting it to high allows the model to perform deeper logical derivation, making it suitable for complex math or coding problems, though response latency will increase.
2. Why don't I receive a standard JSON response when stream: true is enabled?
When streaming is enabled (stream: true), the API uses the Server-Sent Events (SSE) protocol to return data chunks continuously. Each chunk is sent as data: {...}, and a data: [DONE] message is sent when the transmission is complete. Developers must read the stream line-by-line and parse the JSON strings following the data: prefix to reconstruct the full response.
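The line-by-line parsing described above can be sketched as follows. This is an illustrative helper, not an official client; the chunk shape (`choices[0].delta.content`) follows the standard streaming format, and `lines` stands for whatever line iterator your HTTP library exposes over the response body.

```python
import json

def parse_sse_chunks(lines):
    """Parse raw SSE lines into JSON chunk objects, stopping at [DONE]."""
    chunks = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # server signals end of stream
        chunks.append(json.loads(payload))
    return chunks

def collect_content(chunks):
    # Concatenate the delta fragments into the full message text.
    return "".join(
        chunk["choices"][0].get("delta", {}).get("content", "")
        for chunk in chunks
    )
```

In a real application you would render each delta as it arrives rather than collecting them all first.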
3. The documentation suggests modifying either temperature or top_p, but not both. Why?
Both parameters control the randomness and diversity of the model's output. temperature changes the probability distribution by scaling logits, while top_p (nucleus sampling) limits the candidate pool by truncating low-probability cumulative tokens. Adjusting both simultaneously makes the model's behavior difficult to predict and control. The best practice is to keep one at its default value and adjust the other.
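The interaction described above can be made concrete with a toy sampler over raw logits. This is a conceptual sketch of the two mechanisms, not the provider's actual implementation: temperature rescales logits before the softmax, while top_p truncates the resulting distribution.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Dividing logits by T before normalizing: T > 1 flattens the
    # distribution (more random), T < 1 sharpens it (more deterministic).
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p=0.9):
    # Keep the smallest set of tokens whose cumulative probability
    # reaches p (nucleus sampling), then renormalize over that set.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}
```

Because both knobs reshape the same distribution, stacking a low temperature with a tight top_p compounds unpredictably, which is why the guidance is to adjust one at a time.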
4. What does it mean if finish_reason returns "length"?
This indicates that the model's response was forcibly truncated. This usually happens for two reasons: either the number of generated tokens reached the max_tokens limit set in the request, or the total tokens (prompt + completion) exceeded the model's maximum context window limit. In such cases, consider shortening the conversation history or increasing the max_tokens parameter.
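A client can detect this truncation condition directly from the response body documented above. A minimal check, assuming the standard response shape:

```python
def is_truncated(response):
    # finish_reason == "length" means generation stopped because it hit
    # max_tokens or the model's context window, not a natural endpoint.
    return any(
        choice.get("finish_reason") == "length"
        for choice in response.get("choices", [])
    )
```

When this returns true, retry with a higher max_tokens or a shorter conversation history.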
5. How can I force the model to return data in JSON format?
You can enable JSON mode by passing "response_format": {"type": "json_object"} in the request body. Extremely Important: When enabling this mode, you must also explicitly instruct the model to output JSON via natural language in the messages (usually in the system or user prompt). Otherwise, the model might generate an infinite sequence of whitespace until it hits the token limit.
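A payload builder that satisfies both requirements (the response_format field and the explicit natural-language instruction) might look like this sketch; the system prompt wording is an assumption, any phrasing that clearly demands JSON output will do.

```python
def build_json_mode_payload(question):
    # JSON mode needs BOTH pieces: response_format alone is not enough,
    # the messages must also tell the model in plain language to emit JSON.
    return {
        "model": "o4-mini",
        "response_format": {"type": "json_object"},
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant. Always reply with a valid JSON object.",
            },
            {"role": "user", "content": question},
        ],
    }
```

With both in place, the returned message content can be passed straight to a JSON parser.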