Create Image Recognition Chat Completion (Non-Streaming)
This endpoint is used to send prompts containing both text and images to the model, and retrieve a single, non-streaming chat completion response. The model will return a detailed description or answer based on the provided context and image content.
- Endpoint URL: https://api.codingplanx.ai/v1/chat/completions
- HTTP Method: POST
- Status: Released
Request Headers
| Parameter | Required | Type | Example | Description |
|---|---|---|---|---|
| Content-Type | Yes | String | application/json | Request body format |
| Accept | Yes | String | application/json | Response body format |
| Authorization | Yes | String | Bearer {{YOUR_API_KEY}} | API key used for authentication |
Request Body (Payload)
| Parameter | Required | Type | Default | Description |
|---|---|---|---|---|
| model | Yes | String | - | ID of the model to use (e.g., gpt-4o). |
| messages | Yes | Array | - | A list of messages comprising the conversation. For image recognition, content must be an array of objects. |
| temperature | No | Number | 1 | Sampling temperature (0 to 2). Higher values make output more random, lower values make it more deterministic. |
| top_p | No | Number | 1 | Nucleus sampling probability. It is recommended to alter this or temperature but not both. |
| max_tokens | No | Integer | inf | The maximum number of tokens to generate in the chat completion. |
| n | No | Integer | 1 | How many chat completion choices to generate for each input message. |
| stream | No | Boolean | false | Whether to stream back partial progress. Fixed to false for this endpoint. |
| stop | No | String/Array | null | Up to 4 sequences where the API will stop generating further tokens. |
| presence_penalty | No | Number | 0 | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
| frequency_penalty | No | Number | 0 | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
| response_format | No | Object | - | Specifies the format that the model must output, e.g., { "type": "json_object" } to enable JSON mode. |
| user | No | String | - | A unique identifier representing your end-user. |
| tools | No | Array | - | A list of tools the model may call (currently only function is supported). |
| tool_choice | No | Object | - | Controls which (if any) tool is called by the model (none/auto/specific function). |
messages Image Recognition Object Structure (Content Array)
When making an image recognition request, messages.content should contain the following structure:
- type: Either "text" or "image_url".
- text: When type is "text", put the text prompt/question here.
- image_url: When type is "image_url", provide an object of the form { "url": "Image URL" }.
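The content array above can be assembled programmatically. Below is a minimal Python sketch; the helper name `build_vision_content` is illustrative, not part of the API.

```python
def build_vision_content(prompt: str, image_url: str) -> list:
    """Build a messages[].content array mixing a text prompt and an image URL,
    following the type/text/image_url structure described above."""
    return [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": image_url}},
    ]
```

The resulting list is used directly as the `content` value of a `"role": "user"` message.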
Request Example
```json
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image? Please describe it in detail."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://lsky.zhongzhuan.chat/i/2024/10/17/6711068a14527.png"
          }
        }
      ]
    }
  ]
}
```
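The request above can be sent from Python. The sketch below builds the same payload; the actual `requests.post` call is shown commented out because it needs the third-party `requests` package, a valid API key, and network access.

```python
import json

API_URL = "https://api.codingplanx.ai/v1/chat/completions"

payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What is in this image? Please describe it in detail."},
                {"type": "image_url",
                 "image_url": {"url": "https://lsky.zhongzhuan.chat/i/2024/10/17/6711068a14527.png"}},
            ],
        },
    ],
}

# Uncomment to send the request (requires `pip install requests` and a real key):
# import requests
# resp = requests.post(
#     API_URL,
#     headers={
#         "Content-Type": "application/json",
#         "Accept": "application/json",
#         "Authorization": "Bearer YOUR_API_KEY",
#     },
#     json=payload,
#     timeout=60,
# )
# print(resp.json()["choices"][0]["message"]["content"])

body = json.dumps(payload)  # the serialized request body
```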
Response Body
| Parameter | Type | Description |
|---|---|---|
| id | String | A unique identifier for the chat completion request. |
| object | String | The object type, which is always chat.completion. |
| created | Integer | The Unix timestamp (in seconds) of when the completion was created. |
| choices | Array | A list of chat completion choices generated by the model. |
| └ index | Integer | The index of the choice in the list. |
| └ message | Object | The chat completion message object (contains role and content). |
| └ finish_reason | String | The reason the model stopped generating tokens (e.g., stop, length). |
| usage | Object | Usage statistics for the completion request. |
| └ prompt_tokens | Integer | Number of tokens consumed by the prompt. |
| └ completion_tokens | Integer | Number of tokens consumed by the generated result. |
| └ total_tokens | Integer | Total number of tokens used in the request. |
Response Example
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "This image shows a neat office desk with a laptop, a cup of coffee, and a succulent plant on it."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 120,
    "completion_tokens": 45,
    "total_tokens": 165
  }
}
```
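Reading the documented fields out of a decoded response body looks like this in Python (the dict below hard-codes the sample response; in practice it would come from `resp.json()`):

```python
# Sample decoded response body, matching the Response Example above.
resp = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "This image shows a neat office desk with a laptop, "
                           "a cup of coffee, and a succulent plant on it.",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 120, "completion_tokens": 45, "total_tokens": 165},
}

answer = resp["choices"][0]["message"]["content"]   # the model's reply text
finish = resp["choices"][0]["finish_reason"]        # why generation stopped
total_tokens = resp["usage"]["total_tokens"]        # billed token count
```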
Frequently Asked Questions (FAQs)
Q: Which models support the image recognition (vision) feature?
A: Currently, multi-modal models such as gpt-4o, gpt-4-turbo, and gpt-4-vision-preview are supported. Please ensure you pass the correct ID in the model field.
Q: What image formats are supported?
A: We support common image formats including PNG, JPEG, WEBP, and non-animated GIFs.
Q: Are there access restrictions on image links?
A: The provided image URLs must be publicly accessible. If the image is hosted on private cloud storage, please generate a temporary pre-signed URL or convert the image to a Base64 encoded string before sending (Note: Base64 strings must follow the data:image/jpeg;base64,... format).
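Producing the `data:image/jpeg;base64,...` string mentioned above takes only the standard library. A minimal sketch (the helper takes raw bytes; when reading from disk, pass `open(path, "rb").read()`):

```python
import base64

def to_data_url(data: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL suitable for the image_url field."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string is used in place of a public URL: `{"type": "image_url", "image_url": {"url": to_data_url(image_bytes)}}`.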
Q: How is image recognition billed?
A: Images are converted into tokens for billing based on their resolution and level of detail. Typically, low-resolution images consume a fixed, small number of tokens, while high-resolution images are billed based on the number of 512x512 tiles required to process them.
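The tile count for a high-resolution image can be estimated as shown below. This is only a sketch of the tiling rule described above; the exact per-tile token cost is model-specific and not documented here.

```python
import math

def tile_count(width: int, height: int, tile: int = 512) -> int:
    """Number of 512x512 tiles needed to cover an image (ceiling division
    in each dimension), per the billing rule described above."""
    return math.ceil(width / tile) * math.ceil(height / tile)
```

For example, a 1024x1024 image requires 4 tiles, while an 800x600 image still needs 4 (two tiles in each dimension).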
Q: What if I want streaming output (typewriter effect)?
A: Please set the stream parameter to true in your request. However, note that this specific documentation covers non-streaming responses. If streaming is enabled, the data format will change to Server-Sent Events (SSE).
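When streaming is enabled, each SSE event carries a `data:` line with a JSON chunk, terminated by `data: [DONE]`. The sketch below assembles the full reply from such lines; the chunk shape (`choices[0].delta.content`) follows the common convention for this kind of API and is an assumption, since this document only specifies the non-streaming format.

```python
import json

# Hypothetical SSE payload lines, as received when "stream": true.
sse_lines = [
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": " world"}}]}',
    "data: [DONE]",
]

text = ""
for line in sse_lines:
    payload = line[len("data: "):]
    if payload == "[DONE]":        # sentinel marking the end of the stream
        break
    delta = json.loads(payload)["choices"][0]["delta"]
    text += delta.get("content", "")  # delta may omit "content" in some chunks
```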
Q: Why did the API return "finish_reason": "length"?
A: This indicates that the generated response hit the max_tokens limit you configured, or it exceeded the model's context window limit, resulting in truncated content.