Create Chat Completion with Vision (Streaming) - Base64
Interface Description: Given a list of messages comprising a conversation, the model returns one or more predicted chat completions; it can also return the probabilities of alternative tokens at each position. This endpoint is used here to create streaming chat completions that combine text prompts with image recognition via Base64-encoded images.
Official Reference: OpenAI API Documentation
Endpoint Details
- Method: POST
- URL: https://api.codingplanx.ai/v1/chat/completions
- Content-Type: application/json
Request Headers
| Parameter | Required | Type | Example Value | Description |
|---|---|---|---|---|
| Content-Type | Yes | string | application/json | Data format |
| Accept | Yes | string | application/json | Accepted response format |
| Authorization | Yes | string | Bearer {{YOUR_API_KEY}} | Authentication token |
Request Body
| Field Name | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | ID of the model to use (e.g., gpt-4o-mini). |
| messages | array | Yes | A list of messages comprising the conversation so far. Includes role and content. For vision, content can be an array containing text and image_url (Base64). |
| tools | array | No | A list of tools the model may call. Currently, only functions are supported. Used to provide a list of functions for which the model can generate JSON inputs. |
| tool_choice | string/object | No | Controls which tool is called by the model. none means no tool; auto means automatic selection; a specific function can also be forced. |
| temperature | number | No | Sampling temperature between 0 and 2. Higher values (e.g., 0.8) make output more random, while lower values (e.g., 0.2) make it more deterministic. It is recommended not to modify this and top_p simultaneously. |
| top_p | number | No | Nucleus sampling parameter. 0.1 means only tokens comprising the top 10% probability mass are considered. It is recommended not to modify this and temperature simultaneously. |
| n | integer | No | How many chat completion choices to generate for each input message. Defaults to 1. |
| stream | boolean | No | Streaming switch. Defaults to false. If set to true, partial message deltas will be sent as Server-Sent Events (SSE), terminating with data: [DONE]. |
| stop | string/array | No | Up to 4 sequences where the API will stop generating further tokens. Defaults to null. |
| max_tokens | integer | No | The maximum number of tokens to generate in the chat completion. Defaults to inf (infinite). |
| presence_penalty | number | No | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
| frequency_penalty | number | No | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the likelihood of repeating the same content. |
| logit_bias | object | No | Modifies the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps token IDs to bias values (-100 to 100). |
| user | string | No | A unique identifier representing your end-user, helping to monitor and detect abuse. |
| response_format | object | No | Specifies the format that the model must output. Pass { "type": "json_object" } to enable JSON mode. When used, you must also instruct the model via prompt to output JSON. |
| seed | integer | No | Beta feature. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. |
Request Example
{
  "model": "gpt-4o-mini",
  "stream": true,
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image? Please describe it in detail."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAiEAAAIhCAYAAACYF2qHAAAACXBIWXMAAA7EAAAOxAGVKw4bAAAgAElEQVR4nOy9ebxtWVXf+x1zrrX2Pud2VRQUjUVf0glIYvOJLdJ8iCagIor... (omitted long base64 string)"
          }
        }
      ]
    }
  ]
}
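As a sketch of how the request body above can be assembled programmatically (the helper name and defaults are illustrative, not part of the API):

```python
import base64


def build_vision_payload(image_bytes: bytes, prompt: str,
                         mime: str = "image/png",
                         model: str = "gpt-4o-mini") -> dict:
    """Assemble a streaming vision request body matching the example above."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "stream": True,  # deliver the response as Server-Sent Events
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url",
                     "image_url": {"url": f"data:{mime};base64,{b64}"}},
                ],
            },
        ],
    }
```

The resulting dict can be serialized with `json.dumps` and sent as the POST body with the headers listed above.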
Response Body
| Field Name | Type | Description |
|---|---|---|
| id | string | A unique identifier for the chat completion. |
| object | string | The object type, usually chat.completion (or chat.completion.chunk for streaming). |
| created | integer | The Unix timestamp of when the completion was created. |
| choices | array | A list of chat completion choices. |
| └─ index | integer | The index of the choice in the list. |
| └─ message | object | A message object generated by the model, containing role and content. |
| └─ finish_reason | string | The reason the model stopped generating (e.g., stop for natural end, length for reaching max_tokens). |
| usage | object | Usage statistics for the completion request. |
| └─ prompt_tokens | integer | Number of tokens in the prompt (including image tokens). |
| └─ completion_tokens | integer | Number of tokens in the generated content. |
| └─ total_tokens | integer | Total tokens used in the request. |
Response Example (Successful 200 OK, non-streaming shape; with stream set to true, the same fields arrive incrementally as chat.completion.chunk objects)
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello there, how may I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
Frequently Asked Questions (FAQs)
Q1: How do I enable streaming output?
A: Set the stream parameter to true in the request body. Once enabled, the API will no longer wait for the entire response to be generated. Instead, it will return data chunks in data: {...} format via Server-Sent Events (SSE) until the data: [DONE] flag is received.
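A minimal sketch of consuming such a stream, assuming each SSE `data:` line carries a chat.completion.chunk object with a `choices[0].delta` field as described above:

```python
import json


def iter_sse_deltas(lines):
    """Yield content fragments from the raw SSE lines of a streaming response.

    Each data line carries a chat.completion.chunk object; the stream ends
    with the literal sentinel 'data: [DONE]'.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]
```

In practice `lines` would come from iterating over the HTTP response body line by line; concatenating the yielded fragments reconstructs the full assistant message.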
Q2: What are the requirements for Base64 encoding in vision tasks?
A: The image Base64 string must include the correct MIME type prefix, such as data:image/png;base64, or data:image/jpeg;base64,. It is recommended to compress the image dimensions appropriately before converting to Base64 to reduce network latency and prompt_tokens consumption (excessively large images consume many tokens and may exceed context limits).
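A sketch of producing a correctly prefixed data URI from raw image bytes (the helper name is illustrative; the MIME type is inferred from the filename):

```python
import base64
import mimetypes


def to_data_uri(image_bytes: bytes, filename: str = "image.png") -> str:
    """Encode raw image bytes as a data URI with the MIME prefix the API expects."""
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"
```

Downscaling or recompressing the image before calling this helper keeps the resulting string (and prompt_tokens) small.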
Q3: Why is the model's response truncated?
A: Check the value of choices[0].finish_reason in the response body. If it is length, the generated text reached the max_tokens limit you set, or the whole conversation exceeded the model's maximum context window. Increase max_tokens, or shorten the prompt (for vision requests, downscaling the image significantly reduces prompt tokens).
Q4: How can I force the model to output data in JSON format?
A: First, set "response_format": { "type": "json_object" } in the request body. Important: In addition to setting this parameter, you must explicitly instruct the model to output JSON using natural language in the messages (either in the system prompt or user prompt), e.g., "Please output the result in JSON format." Failure to do so may result in the model generating endless whitespace until tokens are exhausted.
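A sketch of a request body that satisfies both requirements from Q4, pairing response_format with an explicit JSON instruction in the system prompt (the helper name is illustrative):

```python
def build_json_mode_payload(question: str, model: str = "gpt-4o-mini") -> dict:
    """Pair response_format with an explicit JSON instruction, as Q4 requires."""
    return {
        "model": model,
        "response_format": {"type": "json_object"},  # enables JSON mode
        "messages": [
            # The prompt must also say "JSON", or the model may emit
            # whitespace until tokens run out.
            {"role": "system",
             "content": "You are a helpful assistant. Always reply in JSON format."},
            {"role": "user", "content": question},
        ],
    }
```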
Q5: What should I do if I encounter a 401 Unauthorized error?
A: Ensure that the Authorization parameter in the request header correctly carries the API Key. The format must strictly be Bearer YOUR_API_KEY (note the single space between Bearer and the Key).