Create Image Recognition Chat Completion (Non-Streaming)

This endpoint sends a prompt containing both text and images to the model and returns a single, non-streaming chat completion. The model replies with a description or answer based on the text prompt and the image content.

  • Endpoint URL: https://api.codingplanx.ai/v1/chat/completions
  • HTTP Method: POST
  • Status: Released

Request Headers

| Parameter | Required | Type | Example | Description |
| --- | --- | --- | --- | --- |
| Content-Type | Yes | String | application/json | Request body format |
| Accept | Yes | String | application/json | Response body format |
| Authorization | Yes | String | Bearer {{YOUR_API_KEY}} | API key used for authentication |

Request Body (Payload)

| Parameter | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| model | Yes | String | - | ID of the model to use (e.g., gpt-4o). |
| messages | Yes | Array | - | A list of messages comprising the conversation. For image recognition, content must be an array of objects. |
| temperature | No | Number | 1 | Sampling temperature (0 to 2). Higher values make output more random, lower values make it more deterministic. |
| top_p | No | Number | 1 | Nucleus sampling probability. It is recommended to alter this or temperature, but not both. |
| max_tokens | No | Integer | inf | The maximum number of tokens to generate in the chat completion. |
| n | No | Integer | 1 | How many chat completion choices to generate for each input message. |
| stream | No | Boolean | false | Whether to stream back partial progress. Fixed to false for this endpoint. |
| stop | No | String/Array | null | Up to 4 sequences where the API will stop generating further tokens. |
| presence_penalty | No | Number | 0 | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
| frequency_penalty | No | Number | 0 | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
| response_format | No | Object | - | Specifies the format that the model must output, e.g., { "type": "json_object" } to enable JSON mode. |
| user | No | String | - | A unique identifier representing your end-user. |
| tools | No | Array | - | A list of tools the model may call (currently only function is supported). |
| tool_choice | No | Object | - | Controls which (if any) tool is called by the model (none/auto/specific function). |

messages Image Recognition Object Structure (Content Array)

When making an image recognition request, each element of the user message's content array should have the following structure:

  • type: "text" or "image_url"
  • text: When type is "text", input the text prompt/question here.
  • image_url: When type is "image_url", input an object { "url": "Image URL" }.

Request Example

{
    "model": "gpt-4o",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in this image? Please describe it in detail."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://lsky.zhongzhuan.chat/i/2024/10/17/6711068a14527.png"
                    }
                }
            ]
        }
    ]
}
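The request above can be sent with Python's standard library alone. A hedged sketch (the helper names `build_request` and `send` are ours; replace the API key placeholder with a real key before calling `send`):

```python
import json
import urllib.request

API_URL = "https://api.codingplanx.ai/v1/chat/completions"

def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request for a non-streaming chat completion."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )

def send(api_key: str, payload: dict) -> dict:
    """Perform the live HTTP call and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(api_key, payload)) as resp:
        return json.load(resp)
```

Third-party HTTP clients such as `requests` work the same way; only the header and payload shapes matter to the API.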

Response Body

| Parameter | Type | Description |
| --- | --- | --- |
| id | String | A unique identifier for the chat completion request. |
| object | String | The object type, which is always chat.completion. |
| created | Integer | The Unix timestamp (in seconds) of when the completion was created. |
| choices | Array | A list of chat completion choices generated by the model. |
| choices[].index | Integer | The index of the choice in the list. |
| choices[].message | Object | The chat completion message object (contains role and content). |
| choices[].finish_reason | String | The reason the model stopped generating tokens (e.g., stop, length). |
| usage | Object | Usage statistics for the completion request. |
| usage.prompt_tokens | Integer | Number of tokens consumed by the prompt. |
| usage.completion_tokens | Integer | Number of tokens consumed by the generated result. |
| usage.total_tokens | Integer | Total number of tokens used in the request. |

Response Example

{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "This image shows a neat office desk with a laptop, a cup of coffee, and a succulent plant on it."
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 120,
        "completion_tokens": 45,
        "total_tokens": 165
    }
}
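Pulling the useful fields out of this response takes only a couple of lines. A minimal sketch (the helper name `extract_reply` is ours):

```python
def extract_reply(response: dict):
    """Return the assistant's text and total token usage from a completion response."""
    content = response["choices"][0]["message"]["content"]
    total_tokens = response["usage"]["total_tokens"]
    return content, total_tokens

# Example using the response shape documented above.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "A desk with a laptop."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 120, "completion_tokens": 45, "total_tokens": 165},
}
text, tokens = extract_reply(sample)
```

In production code, check `finish_reason` as well: a value of `length` means the reply was truncated (see the FAQ below).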

Frequently Asked Questions (FAQs)

Q: Which models support the image recognition (vision) feature? A: Currently, multi-modal models such as gpt-4o, gpt-4-turbo, and gpt-4-vision-preview are supported. Please ensure you pass the correct ID in the model field.

Q: What image formats are supported? A: We support common image formats including PNG, JPEG, WEBP, and non-animated GIFs.

Q: Are there access restrictions on image links? A: The provided image URLs must be publicly accessible. If the image is hosted on private cloud storage, please generate a temporary pre-signed URL or convert the image to a Base64 encoded string before sending (Note: Base64 strings must follow the data:image/jpeg;base64,... format).
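When an image cannot be hosted publicly, it can be inlined as a data URL in the format the FAQ describes (`data:image/...;base64,...`). A sketch using only the standard library (the helper name `image_to_data_url` is ours):

```python
import base64
import mimetypes

def image_to_data_url(path: str) -> str:
    """Read a local image file and return it as a data: URL for the image_url field."""
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"
```

The result is used in place of a remote URL: `{"type": "image_url", "image_url": {"url": image_to_data_url("photo.png")}}`. Note that Base64 inlining increases the request size by roughly a third compared to the raw image bytes.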

Q: How is image recognition billed? A: Images are converted into tokens for billing based on their resolution and level of detail. Typically, low-resolution images consume a fixed, small number of tokens, while high-resolution images are billed based on the number of 512x512 tiles required to process them.
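As a rough illustration of tile-based billing (a deliberately simplified model: real providers typically resize the image first and charge a base cost plus a per-tile rate, neither of which is specified here):

```python
import math

def estimate_tiles(width: int, height: int, tile: int = 512) -> int:
    """Count the 512x512 tiles needed to cover an image (simplified billing model)."""
    return math.ceil(width / tile) * math.ceil(height / tile)

# A 1024x768 image covers ceil(1024/512) * ceil(768/512) = 2 * 2 = 4 tiles.
tiles = estimate_tiles(1024, 768)
```

Consult your provider's pricing page for the actual per-tile and base token costs.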

Q: What if I want streaming output (typewriter effect)? A: Please set the stream parameter to true in your request. However, note that this specific documentation covers non-streaming responses. If streaming is enabled, the data format will change to Server-Sent Events (SSE).
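For reference, when stream is true the response arrives as SSE lines of the form `data: {json}`, terminated by `data: [DONE]`. A minimal line-parsing sketch (the chunk field names follow the common chat.completion.chunk shape; verify them against the streaming documentation):

```python
import json

def parse_sse_line(line: str):
    """Parse one SSE line; return the JSON chunk, or None for blanks and [DONE]."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return None
    return json.loads(data)
```

Each parsed chunk typically carries an incremental `choices[0].delta.content` fragment, which the client appends to build the full reply.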

Q: Why did the API return "finish_reason": "length"? A: This indicates that the generated response hit the max_tokens limit you configured, or it exceeded the model's context window limit, resulting in truncated content.