API Documentation: Creating Structured Outputs
Interface Description
Given a prompt, the model returns one or more predicted completion responses. It supports forcing the model to output structured data according to a specified JSON Schema.
Official Reference: OpenAI Structured Outputs
Base Information
- Method: POST
- Path: https://api.codingplanx.ai/v1/chat/completions
- Format: application/json
Request Headers
| Parameter | Required | Type | Example | Description |
|---|---|---|---|---|
Content-Type | Yes | string | application/json | Specifies the request body data type. |
Accept | Yes | string | application/json | Specifies the data type the client accepts. |
Authorization | Yes | string | Bearer {{YOUR_API_KEY}} | Authentication credentials; pass your API key as a Bearer token. |
Request Body
| Parameter | Required | Type | Description |
|---|---|---|---|
model | Yes | string | ID of the model to use (e.g., gpt-4.1-2025-04-14). |
messages | Yes | array | A list of messages comprising the conversation so far. |
∟ messages[].role | Yes | string | The role of the message author, e.g., system, user, assistant. |
∟ messages[].content | Yes | string | The contents of the message. |
tools | No | array | A list of tools the model may call. Currently, only functions are supported. Used to provide a list of functions for which the model can generate JSON inputs. |
tool_choice | No | string/object | Controls which (if any) function is called by the model. none means no function is called; auto means the model chooses between generating a message and calling a function. A specific function can be forced via {"type": "function", "function": {"name": "my_function"}}. |
temperature | No | number | Sampling temperature between 0 and 2. Higher values (e.g., 0.8) make output more random, while lower values (e.g., 0.2) make it more focused and deterministic. Modify this or top_p, but not both. |
top_p | No | number | Nucleus sampling parameter. 0.1 means only tokens comprising the top 10% probability mass are considered. Modify this or temperature, but not both. |
n | No | integer | How many completion choices to generate for each input message. Defaults to 1. |
stream | No | boolean | Whether to enable streaming. If true, partial message deltas are sent as Server-Sent Events (SSE) terminated by data: [DONE]. Defaults to false. |
stop | No | string/array | Up to 4 sequences where the API will stop generating further tokens. Defaults to null. |
max_tokens | No | integer | The maximum number of tokens to generate in the chat completion. Defaults to inf (infinite). |
presence_penalty | No | number | Penalty (-2.0 to 2.0). Positive values penalize new tokens based on whether they appear in the text so far, increasing the likelihood of talking about new topics. |
frequency_penalty | No | number | Penalty (-2.0 to 2.0). Positive values penalize new tokens based on their existing frequency in the text, decreasing the likelihood of repetition. |
logit_bias | No | object | Accepts a JSON object that maps tokens to bias values (-100 to 100) to modify the likelihood of specified tokens appearing in the completion. |
user | No | string | A unique identifier representing your end-user, helping to monitor and detect abuse. |
response_format | No | object | Specifies the format the model must output. Enable Structured Outputs by setting {"type": "json_schema", "json_schema": {...}} to ensure valid JSON output. |
seed | No | integer | (Beta) Sets a random seed. The system will make a best effort to sample deterministically; repeated requests with the same seed and parameters should return the same result. |
Request Example
{
"model": "gpt-4.1-2025-04-14",
"messages": [
{
"role": "system",
"content": "Determine if the user input violates specific guidelines and explain if they do."
},
{
"role": "user",
"content": "How do I prepare for a job interview?"
}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "content_compliance",
"description": "Determines if content is violating specific moderation rules",
"schema": {
"type": "object",
"properties": {
"is_violating": {
"type": "boolean",
"description": "Indicates if the content is violating guidelines"
},
"category": {
"type": ["string", "null"],
"description": "Type of violation, if the content is violating guidelines. Null otherwise.",
"enum": ["violence", "sexual", "self_harm"]
},
"explanation_if_violating": {
"type": ["string", "null"],
"description": "Explanation of why the content is violating"
}
},
"required": ["is_violating", "category", "explanation_if_violating"],
"additionalProperties": false
},
"strict": true
}
}
}
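As a sketch, the request above could be sent from Python. The endpoint, headers, and body follow the tables above; the `requests` package is a third-party assumption, and the API key is a placeholder:

```python
import json


def build_payload(user_text: str) -> dict:
    """Build the request body shown in the example above."""
    return {
        "model": "gpt-4.1-2025-04-14",
        "messages": [
            {
                "role": "system",
                "content": "Determine if the user input violates specific "
                           "guidelines and explain if they do.",
            },
            {"role": "user", "content": user_text},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "content_compliance",
                "description": "Determines if content is violating specific moderation rules",
                "schema": {
                    "type": "object",
                    "properties": {
                        "is_violating": {"type": "boolean"},
                        "category": {
                            "type": ["string", "null"],
                            "enum": ["violence", "sexual", "self_harm"],
                        },
                        "explanation_if_violating": {"type": ["string", "null"]},
                    },
                    "required": ["is_violating", "category", "explanation_if_violating"],
                    "additionalProperties": False,
                },
                "strict": True,
            },
        },
    }


def send(payload: dict, api_key: str) -> dict:
    """POST the payload to the endpoint listed under Base Information.

    Requires network access and the third-party `requests` package.
    """
    import requests

    resp = requests.post(
        "https://api.codingplanx.ai/v1/chat/completions",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {api_key}",  # YOUR_API_KEY placeholder
        },
        data=json.dumps(payload),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```

Calling `send(build_payload("How do I prepare for a job interview?"), "YOUR_API_KEY")` would return the parsed response body.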
Response Body
| Parameter | Type | Description |
|---|---|---|
id | string | A unique identifier for the completion request. |
object | string | The object type, always chat.completion. |
created | integer | The Unix timestamp (in seconds) of when the completion was created. |
choices | array | A list of completion choices. |
∟ choices[].index | integer | The index of the choice in the list. |
∟ choices[].message | object | The message object generated by the model. |
∟ choices[].message.role | string | The role of the author (usually assistant). |
∟ choices[].message.content | string | The contents of the message. |
∟ choices[].finish_reason | string | The reason the model stopped generating (e.g., stop, length). |
usage | object | Usage statistics for the completion request. |
∟ usage.prompt_tokens | integer | Number of tokens consumed by the prompt. |
∟ usage.completion_tokens | integer | Number of tokens consumed by the generated completion. |
∟ usage.total_tokens | integer | Total number of tokens used (prompt + completion). |
Response Example
{
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1677652288,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "{\"is_violating\": false, \"category\": null, \"explanation_if_violating\": null}"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 9,
"completion_tokens": 12,
"total_tokens": 21
}
}
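Note that choices[0].message.content is itself a JSON-encoded string, so clients decode it a second time. A minimal sketch of extracting the structured result from a response body shaped like the example above:

```python
import json


def extract_structured(response: dict) -> dict:
    """Decode the JSON string carried inside choices[0].message.content."""
    content = response["choices"][0]["message"]["content"]
    return json.loads(content)


# The Response Example above, reduced to the fields this helper reads:
example = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "{\"is_violating\": false, \"category\": null, "
                           "\"explanation_if_violating\": null}",
            },
            "finish_reason": "stop",
        }
    ]
}

result = extract_structured(example)
```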
Frequently Asked Questions (FAQs)
1. What is response_format and how do I ensure strict JSON output?
The response_format parameter lets you request output in a specific format. To use Structured Outputs, set it to {"type": "json_schema", "json_schema": {...}}, define your JSON Schema inside it, and enable strict: true. The model's output is then guaranteed to conform to your defined fields and data types, eliminating the need for complex JSON error handling in your code.
2. Why is my JSON output truncated?
If the value of choices[0].finish_reason is length, it means the generated text reached the max_tokens limit or exceeded the model's maximum context length. We recommend increasing the max_tokens parameter in your request.
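A small guard for this check, sketched in Python (field names follow the response table above):

```python
def was_truncated(response: dict) -> bool:
    """True if the first choice stopped because the token limit was hit,
    i.e. finish_reason is "length" rather than "stop"."""
    return response["choices"][0]["finish_reason"] == "length"


# Using the shape of the Response Example above:
ok = was_truncated({"choices": [{"finish_reason": "stop"}]})       # complete output
cut = was_truncated({"choices": [{"finish_reason": "length"}]})    # retry with higher max_tokens
```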
3. What is the difference between temperature and top_p? Which should I adjust?
Both parameters control the randomness of the output:
- temperature scales the output probability distribution. Lower values make the output more precise and deterministic; higher values make it more creative and diverse.
- top_p (nucleus sampling) limits the model to the smallest set of tokens whose cumulative probability reaches p.

Official Recommendation: adjust only one of these parameters; do not modify both temperature and top_p simultaneously.
4. What happens when stream: true is enabled?
When stream: true is set, the API will not wait for the entire response to be generated. Instead, it returns data fragments in real-time, similar to a typewriter. The data format changes to Server-Sent Events (SSE). Clients must listen to the stream and concatenate delta.content until the [DONE] signal is received.
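The concatenation step can be sketched independently of any HTTP client. This assumes each SSE line looks like `data: {...}` with chunks carrying choices[0].delta.content (the shape used by OpenAI-style streaming) and the stream ending with `data: [DONE]`:

```python
import json


def assemble_stream(sse_lines) -> str:
    """Concatenate delta.content fragments from SSE 'data:' lines until [DONE]."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content") is not None:
            parts.append(delta["content"])
    return "".join(parts)


# A synthetic stream for illustration:
lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = assemble_stream(lines)  # "Hello"
```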
5. How can I get the exact same answer for the same prompt every time?
While LLMs are inherently stochastic, you can achieve maximum determinism by passing a specific seed parameter and setting temperature to 0. Note that if the system_fingerprint in the response changes, it indicates a back-end configuration update, which may result in slight differences even with the same seed.
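For example, a request payload aiming for maximum reproducibility might combine a fixed seed with temperature 0 (the seed value 42 is arbitrary; the other fields follow the request table above):

```python
# Request body for a best-effort deterministic completion.
payload = {
    "model": "gpt-4.1-2025-04-14",
    "messages": [
        {"role": "user", "content": "How do I prepare for a job interview?"}
    ],
    "seed": 42,        # arbitrary fixed seed for best-effort determinism
    "temperature": 0,  # removes sampling randomness
}
```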