Create Chat Completion with Vision (Streaming) - Base64
Interface Description: Given a list of messages comprising a conversation, the model returns one or more predicted chat completions; it can also return the probabilities of alternative tokens at each position. This endpoint is used here to create streaming chat completions that combine text prompts with image recognition via Base64-encoded images.
Official Reference: OpenAI API Documentation
Endpoint Details
- Method: POST
- URL: https://api.codingplanx.ai/v1/chat/completions
- Content-Type: application/json
Request Headers
| Parameter | Required | Type | Example Value | Description |
|---|---|---|---|---|
| Content-Type | Yes | string | application/json | Data format |
| Accept | Yes | string | application/json | Accepted response format |
| Authorization | Yes | string | Bearer {{YOUR_API_KEY}} | Authentication token |
Request Body
| Field Name | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | ID of the model to use (e.g., gpt-4o-mini). |
| messages | array | Yes | A list of messages comprising the conversation so far. Includes role and content. For vision, content can be an array containing text and image_url (Base64). |
| tools | array | No | A list of tools the model may call. Currently, only functions are supported. Used to provide a list of functions for which the model can generate JSON inputs. |
| tool_choice | string/object | No | Controls which tool is called by the model. none means no tool; auto means automatic selection; a specific function can also be forced. |
| temperature | number | No | Sampling temperature between 0 and 2. Higher values (e.g., 0.8) make output more random, while lower values (e.g., 0.2) make it more deterministic. It is recommended not to modify this and top_p simultaneously. |
| top_p | number | No | Nucleus sampling parameter. 0.1 means only tokens comprising the top 10% probability mass are considered. It is recommended not to modify this and temperature simultaneously. |
| n | integer | No | How many chat completion choices to generate for each input message. Defaults to 1. |
| stream | boolean | No | Streaming switch. Defaults to false. If set to true, partial message deltas will be sent as Server-Sent Events (SSE), terminating with data: [DONE]. |
| stop | string/array | No | Up to 4 sequences where the API will stop generating further tokens. Defaults to null. |
| max_tokens | integer | No | The maximum number of tokens to generate in the chat completion. Defaults to inf (infinite). |
| presence_penalty | number | No | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
| frequency_penalty | number | No | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the likelihood of repeating the same content. |
| logit_bias | object | No | Modifies the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps token IDs to bias values (-100 to 100). |
| user | string | No | A unique identifier representing your end-user, helping to monitor and detect abuse. |
| response_format | object | No | Specifies the format that the model must output. Pass { "type": "json_object" } to enable JSON mode. When used, you must also instruct the model via prompt to output JSON. |
| seed | integer | No | Beta feature. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. |
Request Example
{
  "model": "gpt-4o-mini",
  "stream": true,
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image? Please describe it in detail."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAiEAAAIhCAYAAACYF2qHAAAACXBIWXMAAA7EAAAOxAGVKw4bAAAgAElEQVR4nOy9ebxtWVXf+x1zrrX2Pud2VRQUjUVf0glIYvOJLdJ8iCagIor... (omitted long base64 string)"
          }
        }
      ]
    }
  ]
}
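As a sketch of how the request body above can be assembled programmatically (the helper name and defaults are illustrative, not part of the API):

```python
import base64


def build_vision_payload(image_bytes: bytes, prompt: str,
                         mime: str = "image/png",
                         model: str = "gpt-4o-mini") -> dict:
    """Assemble a streaming vision request body matching the example above."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "stream": True,  # deliver the response as Server-Sent Events
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url",
                     "image_url": {"url": f"data:{mime};base64,{b64}"}},
                ],
            },
        ],
    }
```

The resulting dict can be serialized with `json.dumps` and sent as the POST body with the headers listed above.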
Response Body
| Field Name | Type | Description |
|---|---|---|
| id | string | A unique identifier for the chat completion. |
| object | string | The object type, usually chat.completion (or chat.completion.chunk for streaming). |
| created | integer | The Unix timestamp of when the completion was created. |
| choices | array | A list of chat completion choices. |
| └─ index | integer | The index of the choice in the list. |
| └─ message | object | A message object generated by the model, containing role and content. |
| └─ finish_reason | string | The reason the model stopped generating (e.g., stop for natural end, length for reaching max_tokens). |
| usage | object | Usage statistics for the completion request. |
| └─ prompt_tokens | integer | Number of tokens in the prompt (including image tokens). |
| └─ completion_tokens | integer | Number of tokens in the generated content. |
| └─ total_tokens | integer | Total tokens used in the request. |
Response Example (Successful 200 OK, non-streaming shape; with stream set to true, the same fields arrive incrementally as chat.completion.chunk objects)
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello there, how may I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
Frequently Asked Questions (FAQs)
Q1: How do I enable streaming output?
A: Set the stream parameter to true in the request body. Once enabled, the API will no longer wait for the entire response to be generated. Instead, it will return data chunks in data: {...} format via Server-Sent Events (SSE) until the data: [DONE] flag is received.
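A minimal sketch of consuming such a stream, assuming each SSE `data:` line carries a chat.completion.chunk object with a `choices[0].delta` field as described above:

```python
import json


def iter_sse_deltas(lines):
    """Yield content fragments from the raw SSE lines of a streaming response.

    Each data line carries a chat.completion.chunk object; the stream ends
    with the literal sentinel 'data: [DONE]'.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]
```

In practice `lines` would come from iterating over the HTTP response body line by line; concatenating the yielded fragments reconstructs the full assistant message.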
Q2: What are the requirements for Base64 encoding in vision tasks?
A: The image Base64 string must include the correct MIME type prefix, such as data:image/png;base64, or data:image/jpeg;base64,. It is recommended to compress the image dimensions appropriately before converting to Base64 to reduce network latency and prompt_tokens consumption (excessively large images consume many tokens and may exceed context limits).
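A sketch of producing a correctly prefixed data URI from raw image bytes (the helper name is illustrative; the MIME type is inferred from the filename):

```python
import base64
import mimetypes


def to_data_uri(image_bytes: bytes, filename: str = "image.png") -> str:
    """Encode raw image bytes as a data URI with the MIME prefix the API expects."""
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"
```

Downscaling or recompressing the image before calling this helper keeps the resulting string (and prompt_tokens) small.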
Q3: Why is the model's response truncated?
A: Check the value of choices[0].finish_reason in the response body. If it is length, the generated text reached the max_tokens limit you set, or the whole conversation exceeded the model's maximum context window. Increase max_tokens, or shorten the prompt (for vision requests, downscaling the image significantly reduces prompt tokens).
Q4: How can I force the model to output data in JSON format?
A: First, set "response_format": { "type": "json_object" } in the request body. Important: In addition to setting this parameter, you must explicitly instruct the model to output JSON using natural language in the messages (either in the system prompt or user prompt), e.g., "Please output the result in JSON format." Failure to do so may result in the model generating endless whitespace until tokens are exhausted.
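A sketch of a request body that satisfies both requirements from Q4, pairing response_format with an explicit JSON instruction in the system prompt (the helper name is illustrative):

```python
def build_json_mode_payload(question: str, model: str = "gpt-4o-mini") -> dict:
    """Pair response_format with an explicit JSON instruction, as Q4 requires."""
    return {
        "model": model,
        "response_format": {"type": "json_object"},  # enables JSON mode
        "messages": [
            # The prompt must also say "JSON", or the model may emit
            # whitespace until tokens run out.
            {"role": "system",
             "content": "You are a helpful assistant. Always reply in JSON format."},
            {"role": "user", "content": question},
        ],
    }
```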
Q5: What should I do if I encounter a 401 Unauthorized error?
A: Ensure that the Authorization parameter in the request header correctly carries the API Key. The format must strictly be Bearer YOUR_API_KEY (note the single space between Bearer and the Key).