API Documentation: Creating Structured Outputs
Interface Description
Given a prompt, the model returns one or more predicted completion responses. It supports forcing the model to output structured data according to a specified JSON Schema.
Official Reference: OpenAI Structured Outputs
Base Information
- Method: POST
- Path: https://api.codingplanx.ai/v1/chat/completions
- Format: application/json
Request Headers
| Parameter | Required | Type | Example | Description |
|---|---|---|---|---|
Content-Type | Yes | string | application/json | Specifies the request body data type. |
Accept | Yes | string | application/json | Specifies the data type the client accepts. |
Authorization | Yes | string | Bearer {{YOUR_API_KEY}} | Authentication credentials; pass your API key as a Bearer token. |
Request Body
| Parameter | Required | Type | Description |
|---|---|---|---|
model | Yes | string | ID of the model to use (e.g., gpt-4.1-2025-04-14). |
messages | Yes | array | A list of messages comprising the conversation so far. |
∟ messages[].role | Yes | string | The role of the message author, e.g., system, user, assistant. |
∟ messages[].content | Yes | string | The contents of the message. |
tools | No | array | A list of tools the model may call. Currently, only functions are supported. Used to provide a list of functions for which the model can generate JSON inputs. |
tool_choice | No | string/object | Controls which (if any) function is called by the model. none means no function is called; auto means the model chooses between generating a message and calling a function. A specific function can be forced via {"type": "function", "function": {"name": "my_function"}}. |
temperature | No | number | Sampling temperature between 0 and 2. Higher values (e.g., 0.8) make output more random, while lower values (e.g., 0.2) make it more focused and deterministic. Modify this or top_p, but not both. |
top_p | No | number | Nucleus sampling parameter. 0.1 means only tokens comprising the top 10% probability mass are considered. Modify this or temperature, but not both. |
n | No | integer | How many completion choices to generate for each input message. Defaults to 1. |
stream | No | boolean | Whether to enable streaming. If true, partial message deltas are sent as Server-Sent Events (SSE) terminated by data: [DONE]. Defaults to false. |
stop | No | string/array | Up to 4 sequences where the API will stop generating further tokens. Defaults to null. |
max_tokens | No | integer | The maximum number of tokens to generate in the chat completion. Defaults to inf (infinite). |
presence_penalty | No | number | Penalty (-2.0 to 2.0). Positive values penalize new tokens based on whether they appear in the text so far, increasing the likelihood of talking about new topics. |
frequency_penalty | No | number | Penalty (-2.0 to 2.0). Positive values penalize new tokens based on their existing frequency in the text, decreasing the likelihood of repetition. |
logit_bias | No | object | Accepts a JSON object that maps tokens to bias values (-100 to 100) to modify the likelihood of specified tokens appearing in the completion. |
user | No | string | A unique identifier representing your end-user, helping to monitor and detect abuse. |
response_format | No | object | Specifies the format the model must output. Enable Structured Outputs by setting {"type": "json_schema", "json_schema": {...}} to ensure valid JSON output. |
seed | No | integer | (Beta) Sets a random seed. The system will make a best effort to sample deterministically; repeated requests with the same seed and parameters should return the same result. |
Request Example
{
"model": "gpt-4.1-2025-04-14",
"messages": [
{
"role": "system",
"content": "Determine if the user input violates specific guidelines and explain if they do."
},
{
"role": "user",
"content": "How do I prepare for a job interview?"
}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "content_compliance",
"description": "Determines if content is violating specific moderation rules",
"schema": {
"type": "object",
"properties": {
"is_violating": {
"type": "boolean",
"description": "Indicates if the content is violating guidelines"
},
"category": {
"type": ["string", "null"],
"description": "Type of violation, if the content is violating guidelines. Null otherwise.",
"enum": ["violence", "sexual", "self_harm"]
},
"explanation_if_violating": {
"type": ["string", "null"],
"description": "Explanation of why the content is violating"
}
},
"required": ["is_violating", "category", "explanation_if_violating"],
"additionalProperties": false
},
"strict": true
}
}
}
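As a sketch, the request above could be sent from Python. The endpoint, headers, and body follow the tables above; the `requests` package is a third-party assumption, and the API key is a placeholder:

```python
import json


def build_payload(user_text: str) -> dict:
    """Build the request body shown in the example above."""
    return {
        "model": "gpt-4.1-2025-04-14",
        "messages": [
            {
                "role": "system",
                "content": "Determine if the user input violates specific "
                           "guidelines and explain if they do.",
            },
            {"role": "user", "content": user_text},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "content_compliance",
                "description": "Determines if content is violating specific moderation rules",
                "schema": {
                    "type": "object",
                    "properties": {
                        "is_violating": {"type": "boolean"},
                        "category": {
                            "type": ["string", "null"],
                            "enum": ["violence", "sexual", "self_harm"],
                        },
                        "explanation_if_violating": {"type": ["string", "null"]},
                    },
                    "required": ["is_violating", "category", "explanation_if_violating"],
                    "additionalProperties": False,
                },
                "strict": True,
            },
        },
    }


def send(payload: dict, api_key: str) -> dict:
    """POST the payload to the endpoint listed under Base Information.

    Requires network access and the third-party `requests` package.
    """
    import requests

    resp = requests.post(
        "https://api.codingplanx.ai/v1/chat/completions",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {api_key}",  # YOUR_API_KEY placeholder
        },
        data=json.dumps(payload),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```

Calling `send(build_payload("How do I prepare for a job interview?"), "YOUR_API_KEY")` would return the parsed response body.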
Response Body
| Parameter | Type | Description |
|---|---|---|
id | string | A unique identifier for the completion request. |
object | string | The object type, always chat.completion. |
created | integer | The Unix timestamp (in seconds) of when the completion was created. |
choices | array | A list of completion choices. |
∟ choices[].index | integer | The index of the choice in the list. |
∟ choices[].message | object | The message object generated by the model. |
∟ choices[].message.role | string | The role of the author (usually assistant). |
∟ choices[].message.content | string | The contents of the message. |
∟ choices[].finish_reason | string | The reason the model stopped generating (e.g., stop, length). |
usage | object | Usage statistics for the completion request. |
∟ usage.prompt_tokens | integer | Number of tokens consumed by the prompt. |
∟ usage.completion_tokens | integer | Number of tokens consumed by the generated completion. |
∟ usage.total_tokens | integer | Total number of tokens used (prompt + completion). |
Response Example
{
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1677652288,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "{\"is_violating\": false, \"category\": null, \"explanation_if_violating\": null}"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 9,
"completion_tokens": 12,
"total_tokens": 21
}
}
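Note that choices[0].message.content is itself a JSON-encoded string, so clients decode it a second time. A minimal sketch of extracting the structured result from a response body shaped like the example above:

```python
import json


def extract_structured(response: dict) -> dict:
    """Decode the JSON string carried inside choices[0].message.content."""
    content = response["choices"][0]["message"]["content"]
    return json.loads(content)


# The Response Example above, reduced to the fields this helper reads:
example = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "{\"is_violating\": false, \"category\": null, "
                           "\"explanation_if_violating\": null}",
            },
            "finish_reason": "stop",
        }
    ]
}

result = extract_structured(example)
```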
Frequently Asked Questions (FAQs)
1. What is response_format and how do I ensure strict JSON output?
The response_format parameter lets you request output in a specific format. To use Structured Outputs, set it to {"type": "json_schema", "json_schema": {...}}, define your JSON Schema inside it, and enable strict: true. The model's output is then guaranteed to conform to your defined fields and data types, eliminating the need for complex JSON error handling in your code.
2. Why is my JSON output truncated?
If the value of choices[0].finish_reason is length, it means the generated text reached the max_tokens limit or exceeded the model's maximum context length. We recommend increasing the max_tokens parameter in your request.
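A small guard for this check, sketched in Python (field names follow the response table above):

```python
def was_truncated(response: dict) -> bool:
    """True if the first choice stopped because the token limit was hit,
    i.e. finish_reason is "length" rather than "stop"."""
    return response["choices"][0]["finish_reason"] == "length"


# Using the shape of the Response Example above:
ok = was_truncated({"choices": [{"finish_reason": "stop"}]})       # complete output
cut = was_truncated({"choices": [{"finish_reason": "length"}]})    # retry with higher max_tokens
```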
3. What is the difference between temperature and top_p? Which should I adjust?
Both parameters control the randomness of the output:
- temperature scales the output probability distribution. Lower values make the output more precise and deterministic; higher values make it more creative and diverse.
- top_p (nucleus sampling) limits the model to the smallest set of tokens whose cumulative probability reaches p.

Official Recommendation: adjust only one of these parameters; do not modify both temperature and top_p simultaneously.
4. What happens when stream: true is enabled?
When stream: true is set, the API will not wait for the entire response to be generated. Instead, it returns data fragments in real-time, similar to a typewriter. The data format changes to Server-Sent Events (SSE). Clients must listen to the stream and concatenate delta.content until the [DONE] signal is received.
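The concatenation step can be sketched independently of any HTTP client. This assumes each SSE line looks like `data: {...}` with chunks carrying choices[0].delta.content (the shape used by OpenAI-style streaming) and the stream ending with `data: [DONE]`:

```python
import json


def assemble_stream(sse_lines) -> str:
    """Concatenate delta.content fragments from SSE 'data:' lines until [DONE]."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content") is not None:
            parts.append(delta["content"])
    return "".join(parts)


# A synthetic stream for illustration:
lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = assemble_stream(lines)  # "Hello"
```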
5. How can I get the exact same answer for the same prompt every time?
While LLMs are inherently stochastic, you can achieve maximum determinism by passing a specific seed parameter and setting temperature to 0. Note that if the system_fingerprint in the response changes, it indicates a back-end configuration update, which may result in slight differences even with the same seed.
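For example, a request payload aiming for maximum reproducibility might combine a fixed seed with temperature 0 (the seed value 42 is arbitrary; the other fields follow the request table above):

```python
# Request body for a best-effort deterministic completion.
payload = {
    "model": "gpt-4.1-2025-04-14",
    "messages": [
        {"role": "user", "content": "How do I prepare for a job interview?"}
    ],
    "seed": 42,        # arbitrary fixed seed for best-effort determinism
    "temperature": 0,  # removes sampling randomness
}
```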