Create Chat Vision API (Streaming / Non-Streaming)

This API supports multimodal inputs, allowing users to send text and image URLs within a conversation. The model will generate appropriate responses based on the provided prompts and image content. It supports streaming output to provide a smoother, more interactive experience.


Request Parameters

Header Parameters

| Parameter | Required | Type | Example | Description |
| --- | --- | --- | --- | --- |
| Content-Type | Yes | string | application/json | Request body format |
| Accept | Yes | string | application/json | Response format |
| Authorization | Yes | string | Bearer {{YOUR_API_KEY}} | API key used to authenticate the request |

Body Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | - | ID of the model to use (e.g., gpt-4o, gpt-4-vision-preview). |
| messages | array | Yes | - | A list of messages comprising the conversation so far. See the messages structure below. |
| stream | boolean | No | false | Whether to enable streaming output. If enabled, partial message deltas are sent via Server-Sent Events (SSE). |
| temperature | number | No | 1 | Sampling temperature (0-2). Higher values make output more random; lower values make it more deterministic. |
| top_p | number | No | 1 | Nucleus sampling probability. It is recommended to alter this or temperature, but not both. |
| max_tokens | integer | No | inf | The maximum number of tokens to generate in the chat completion. |
| n | integer | No | 1 | How many chat completion choices to generate for each input message. |
| presence_penalty | number | No | 0 | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
| frequency_penalty | number | No | 0 | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
| response_format | object | No | - | Specifies the output format, e.g., {"type": "json_object"}. |
| stop | string/array | No | null | Up to 4 sequences where the API will stop generating further tokens. |
| user | string | No | - | A unique identifier representing your end user, which can help monitor and detect abuse. |

messages Object Structure

Each item in the messages array contains the following fields:

  • role: (string) The role of the message's author. One of system, user, assistant, or tool.
  • content: (string or array) The contents of the message.
    • In Vision mode, content is an array of objects, each with a type of either text or image_url.
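As a concrete sketch, a Vision-mode user message can be assembled in Python like this (`build_vision_message` is a hypothetical helper, not part of the API):

```python
def build_vision_message(text, image_urls):
    """Build a user message mixing a text prompt with image URL parts."""
    # The first content part carries the text prompt.
    content = [{"type": "text", "text": text}]
    # Each image URL becomes its own image_url content part.
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}

msg = build_vision_message(
    "What is in this image? Please describe it in detail.",
    ["https://example.com/image.png"],
)
```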

Request Example

Mixed Text and Vision Request (JSON)

```json
{
    "model": "gpt-4o",
    "messages": [
        {
            "role": "system",
            "content": "You are a professional image analysis assistant."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in this image? Please describe it in detail."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/image.png"
                    }
                }
            ]
        }
    ],
    "stream": true
}
```
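A request like the one above can be issued from Python with the standard library alone. The base URL and the `/v1/chat/completions` path are assumptions for an OpenAI-compatible endpoint; substitute your provider's actual values. This sketch builds a non-streaming variant and stops short of executing the network call:

```python
import json
import urllib.request

API_BASE = "https://api.example.com"  # assumption: replace with your provider's base URL
API_KEY = "YOUR_API_KEY"              # placeholder: use your real key

payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a professional image analysis assistant."},
        {"role": "user", "content": [
            {"type": "text", "text": "What is in this image? Please describe it in detail."},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
        ]},
    ],
    "stream": False,  # set to True to receive an SSE stream instead
}

request = urllib.request.Request(
    API_BASE + "/v1/chat/completions",  # assumption: OpenAI-compatible path
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": "Bearer " + API_KEY,
    },
    method="POST",
)

# To actually send it (not executed here):
# with urllib.request.urlopen(request) as resp:
#     body = json.loads(resp.read())
```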

Response Explanation

Non-Streaming Response (stream: false)

| Parameter | Type | Description |
| --- | --- | --- |
| id | string | A unique identifier for the chat completion. |
| object | string | The object type, which is always chat.completion. |
| created | integer | The Unix timestamp (in seconds) of when the chat completion was created. |
| choices | array | A list of chat completion choices generated by the model. |
| choices[n].message | object | A chat completion message generated by the model (contains role and content). |
| choices[n].finish_reason | string | The reason the model stopped generating tokens (e.g., stop, length). |
| usage | object | Usage statistics for the completion request. |

Response Example

```json
{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "This image shows a tranquil lake with a backdrop of rolling mountains."
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 120,
        "completion_tokens": 35,
        "total_tokens": 155
    }
}
```
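As a sketch, the assistant's reply and the token usage can be pulled out of the parsed response body like this (`parse_completion` is a hypothetical helper; the example response from this section is reused as a plain dict):

```python
def parse_completion(response):
    """Extract the reply, stop reason, and token usage from a parsed response dict."""
    choice = response["choices"][0]
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": response.get("usage", {}).get("total_tokens"),
    }

example = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "This image shows a tranquil lake with a backdrop of rolling mountains.",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 120, "completion_tokens": 35, "total_tokens": 155},
}

result = parse_completion(example)
```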

Streaming Response (stream: true)

When stream is set to true, the API returns a text/event-stream. Each line begins with data: followed by a JSON string. The stream is terminated by a data: [DONE] message.
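A minimal sketch of consuming such a stream in Python, assuming each chunk follows the OpenAI-compatible shape where partial text lives in `choices[0].delta.content` (`iter_stream_content` is a hypothetical helper; the sample lines below are illustrative, not real API output):

```python
import json

def iter_stream_content(lines):
    """Yield content deltas from 'data: ...' SSE lines, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # stream terminator
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]

# Illustrative sample of what the wire format looks like:
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"choices":[{"delta":{"content":" world"}}]}',
    "data: [DONE]",
]
text = "".join(iter_stream_content(sample))  # → "Hello world"
```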


FAQs (Frequently Asked Questions)

Q1: How can I upload local images instead of using URLs?
A: This API primarily supports image URLs. If you only have local images, it is recommended to upload them to an image hosting service or cloud storage (e.g., OSS) first. Alternatively, you can encode the image as a Base64 string, format it as data:image/jpeg;base64,{base64_encoded_data}, and pass that string in the url field.
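The Base64 approach can be sketched as follows (`to_data_url` is a hypothetical helper; whether your provider accepts data URLs in the url field is an assumption to verify):

```python
import base64

def to_data_url(image_bytes, mime="image/jpeg"):
    """Encode raw image bytes as a data URL suitable for the image_url.url field."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Usage with a file on disk (not executed here):
# with open("photo.png", "rb") as f:
#     url = to_data_url(f.read(), mime="image/png")
```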

Q2: Why are there no usage statistics in my streaming response?
A: In standard OpenAI-compatible streaming protocols, usage is typically returned only in the final data chunk, or when explicitly requested via the stream_options parameter (e.g., {"include_usage": true}). Check whether your model version supports returning token counts within a stream.

Q3: What are the size and format limits for image recognition?
A: Generally, PNG, JPEG, WEBP, and non-animated GIF formats are supported. It is recommended that image files not exceed 20 MB. For optimal recognition results, a resolution of 512x512 or higher is advised.

Q4: What should I do if my request returns a "401 Unauthorized" error?
A: Ensure that you are passing the Authorization header correctly: the value must be Bearer, followed by a space and your API key. Also verify that the key is valid and that your account balance is sufficient.

Q5: Does the temperature parameter affect image recognition?
A: Yes. The temperature affects the linguistic creativity of the model when describing an image. For highly objective, rigorous descriptions, use a lower value (e.g., 0.2); for more vivid and engaging descriptions, use a higher value (e.g., 0.8).

Q6: How do I process multiple images in a single request?
A: Place multiple objects of type image_url into the user message's content array. The model will attempt to understand the context and content of all provided images simultaneously.
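For example, a single user message carrying two hypothetical image URLs alongside a text prompt:

```python
# Two illustrative image URLs plus one text prompt in a single user message.
urls = ["https://example.com/before.png", "https://example.com/after.png"]

content = [{"type": "text", "text": "Compare these two images."}]
content += [{"type": "image_url", "image_url": {"url": u}} for u in urls]

message = {"role": "user", "content": content}
```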