API Documentation: Creating Structured Outputs

Interface Description

Given a prompt, the model returns one or more predicted completion responses. It supports forcing the model to output structured data according to a specified JSON Schema.

Official Reference: OpenAI Structured Outputs

Base Information

  • Method: POST
  • Path: https://api.codingplanx.ai/v1/chat/completions
  • Format: application/json

Request Headers

Parameter     | Required | Type   | Example                 | Description
--------------|----------|--------|-------------------------|--------------------------------------------
Content-Type  | Yes      | string | application/json        | Specifies the request body data type.
Accept        | Yes      | string | application/json        | Specifies the data type the client accepts.
Authorization | Yes      | string | Bearer {{YOUR_API_KEY}} | Authentication credentials.

Request Body

Parameter         | Required | Type          | Description
------------------|----------|---------------|------------------------------------------------------------
model             | Yes      | string        | ID of the model to use (e.g., gpt-4.1-2025-04-14).
messages          | Yes      | array         | A list of messages comprising the conversation so far.
messages[].role   | Yes      | string        | The role of the message author, e.g., system, user, assistant.
messages[].content| Yes      | string        | The contents of the message.
tools             | No       | array         | A list of tools the model may call. Currently, only functions are supported. Used to provide a list of functions for which the model can generate JSON inputs.
tool_choice       | No       | string/object | Controls which (if any) function is called by the model. none means no call, auto means automatic selection. A specific function can be forced via {"type": "function", "function": {"name": "my_function"}}.
temperature       | No       | number        | Sampling temperature between 0 and 2. Higher values (e.g., 0.8) make output more random, while lower values (e.g., 0.2) make it more focused and deterministic. Modify this or top_p, but not both.
top_p             | No       | number        | Nucleus sampling parameter. 0.1 means only tokens comprising the top 10% probability mass are considered. Modify this or temperature, but not both.
n                 | No       | integer       | How many completion choices to generate for each input message. Defaults to 1.
stream            | No       | boolean       | Whether to enable streaming. If true, partial message deltas are sent as Server-Sent Events (SSE) terminated by data: [DONE]. Defaults to false.
stop              | No       | string/array  | Up to 4 sequences where the API will stop generating further tokens. Defaults to null.
max_tokens        | No       | integer       | The maximum number of tokens to generate in the chat completion. Defaults to inf (infinite).
presence_penalty  | No       | number        | Penalty (-2.0 to 2.0). Positive values penalize new tokens based on whether they appear in the text so far, increasing the likelihood of talking about new topics.
frequency_penalty | No       | number        | Penalty (-2.0 to 2.0). Positive values penalize new tokens based on their existing frequency in the text, decreasing the likelihood of repetition.
logit_bias        | No       | object        | Accepts a JSON object that maps tokens to bias values (-100 to 100) to modify the likelihood of specified tokens appearing in the completion.
user              | No       | string        | A unique identifier representing your end-user, helping to monitor and detect abuse.
response_format   | No       | object        | Specifies the format the model must output. Enable Structured Outputs by setting {"type": "json_schema", "json_schema": {...}} to ensure valid JSON output.
seed              | No       | integer       | (Beta) Sets a random seed. The system will make a best effort to sample deterministically; repeated requests with the same seed and parameters should return the same result.

Request Example

{
  "model": "gpt-4.1-2025-04-14",
  "messages": [
    {
      "role": "system",
      "content": "Determine if the user input violates specific guidelines and explain if they do."
    },
    {
      "role": "user",
      "content": "How do I prepare for a job interview?"
    }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "content_compliance",
      "description": "Determines if content is violating specific moderation rules",
      "schema": {
        "type": "object",
        "properties": {
          "is_violating": {
            "type": "boolean",
            "description": "Indicates if the content is violating guidelines"
          },
          "category": {
            "type": ["string", "null"],
            "description": "Type of violation, if the content is violating guidelines. Null otherwise.",
            "enum": ["violence", "sexual", "self_harm"]
          },
          "explanation_if_violating": {
            "type": ["string", "null"],
            "description": "Explanation of why the content is violating"
          }
        },
        "required": ["is_violating", "category", "explanation_if_violating"],
        "additionalProperties": false
      },
      "strict": true
    }
  }
}
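The same request body can be assembled programmatically before sending it with any HTTP client. A minimal Python sketch (the endpoint path and API key placeholder come from this document; the helper name is illustrative):

```python
import json

API_URL = "https://api.codingplanx.ai/v1/chat/completions"

def build_structured_request(api_key: str, user_input: str) -> tuple[dict, dict]:
    """Return (headers, payload) for a Structured Outputs request."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    payload = {
        "model": "gpt-4.1-2025-04-14",
        "messages": [
            {"role": "system",
             "content": "Determine if the user input violates specific guidelines and explain if they do."},
            {"role": "user", "content": user_input},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "content_compliance",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "is_violating": {"type": "boolean"},
                        "category": {"type": ["string", "null"],
                                     "enum": ["violence", "sexual", "self_harm"]},
                        "explanation_if_violating": {"type": ["string", "null"]},
                    },
                    "required": ["is_violating", "category", "explanation_if_violating"],
                    "additionalProperties": False,
                },
            },
        },
    }
    return headers, payload

# Send with any HTTP client, e.g.:
# requests.post(API_URL, headers=headers, data=json.dumps(payload))
```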

Response Body

Parameter                | Type    | Description
-------------------------|---------|------------------------------------------------------------
id                       | string  | A unique identifier for the completion request.
object                   | string  | The object type, always chat.completion.
created                  | integer | The Unix timestamp (in seconds) of when the completion was created.
choices                  | array   | A list of completion choices.
choices[].index          | integer | The index of the choice in the list.
choices[].message        | object  | The message object generated by the model.
choices[].message.role   | string  | The role of the author (usually assistant).
choices[].message.content| string  | The contents of the message.
choices[].finish_reason  | string  | The reason the model stopped generating (e.g., stop, length).
usage                    | object  | Usage statistics for the completion request.
usage.prompt_tokens      | integer | Number of tokens consumed by the prompt.
usage.completion_tokens  | integer | Number of tokens consumed by the generated completion.
usage.total_tokens       | integer | Total number of tokens used (prompt + completion).

Response Example

{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "{\"is_violating\": false, \"category\": null, \"explanation_if_violating\": null}"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 12,
        "total_tokens": 21
    }
}

Frequently Asked Questions (FAQs)

1. What is response_format and how do I ensure strict JSON output?

The response_format parameter lets you request data in a specific format. To use Structured Outputs, set it to {"type": "json_schema", "json_schema": {...}}, define your JSON Schema inside it, and enable "strict": true. The model's output will then conform to your defined fields and data types, so you can parse it directly without complex JSON error handling in your code.
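For example, because strict mode yields schema-conforming JSON, decoding the message content is a single json.loads call (a sketch assuming the HTTP response has already been decoded into a dict):

```python
import json

def parse_structured_content(response: dict) -> dict:
    """Extract and decode the JSON the model placed in message.content."""
    content = response["choices"][0]["message"]["content"]
    return json.loads(content)  # valid JSON under strict mode
```

Applied to the response example above, this returns a plain dict with is_violating, category, and explanation_if_violating keys.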

2. Why is my JSON output truncated?

If the value of choices[0].finish_reason is length, it means the generated text reached the max_tokens limit or exceeded the model's maximum context length. We recommend increasing the max_tokens parameter in your request.
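A simple guard for this case (a sketch; the retry or re-prompt policy is up to you):

```python
def is_truncated(response: dict) -> bool:
    """True when generation stopped because the max_tokens limit was hit."""
    return response["choices"][0]["finish_reason"] == "length"
```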

3. What is the difference between temperature and top_p? Which should I adjust?

Both parameters control the randomness of the output:

  • temperature scales the output probability distribution. Lower values make the output more precise and deterministic; higher values make it more creative and diverse.
  • top_p (nucleus sampling) restricts the model to the smallest set of most-probable tokens whose cumulative probability reaches p. Official Recommendation: Adjust only one of these parameters; do not modify both temperature and top_p simultaneously.
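As a toy illustration of the top_p truncation described above (not the provider's actual sampler):

```python
def nucleus(token_probs: dict[str, float], top_p: float) -> list[str]:
    """Return the tokens kept by top_p truncation, most probable first."""
    kept, cumulative = [], 0.0
    for token, prob in sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append(token)
        cumulative += prob
        if cumulative >= top_p:  # stop once the cumulative mass reaches p
            break
    return kept
```

With a lower top_p, fewer candidate tokens survive, which is why the output becomes more deterministic.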

4. What happens when stream: true is enabled?

When stream: true is set, the API will not wait for the entire response to be generated. Instead, it returns data fragments in real-time, similar to a typewriter. The data format changes to Server-Sent Events (SSE). Clients must listen to the stream and concatenate delta.content until the [DONE] signal is received.
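Concatenating the stream can be sketched as follows (assuming each data: line carries a chunk object whose delta may contain a content fragment):

```python
import json

def assemble_stream(sse_lines: list[str]) -> str:
    """Join delta.content fragments from SSE lines until data: [DONE]."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data: "):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)
```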

5. How can I get the exact same answer for the same prompt every time?

While LLMs are inherently stochastic, you can achieve maximum determinism by passing a fixed seed parameter and setting temperature to 0. Note that if the system_fingerprint in the response changes, it indicates a back-end configuration update, which may result in slight differences even with the same seed.
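In practice this means adding two fields to the request body (a sketch; the seed value is arbitrary, and determinism remains best-effort while seed is in beta):

```python
payload = {
    "model": "gpt-4.1-2025-04-14",
    "messages": [{"role": "user", "content": "How do I prepare for a job interview?"}],
    "seed": 12345,       # fixed seed: best-effort deterministic sampling
    "temperature": 0,    # remove sampling randomness
}
```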