Create Image Recognition Chat Completion (Non-Streaming)

This endpoint sends a prompt containing both text and images to the model and returns a single, non-streaming chat completion. The model replies with a description or answer based on the text prompt and the image content.

  • Endpoint URL: https://api.codingplanx.ai/v1/chat/completions
  • HTTP Method: POST
  • Status: Released

Request Headers

| Parameter | Required | Type | Example | Description |
| --- | --- | --- | --- | --- |
| Content-Type | Yes | String | application/json | Request body format |
| Accept | Yes | String | application/json | Response body format |
| Authorization | Yes | String | Bearer {{YOUR_API_KEY}} | API key used for authentication |

Request Body (Payload)

| Parameter | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| model | Yes | String | - | ID of the model to use (e.g., gpt-4o). |
| messages | Yes | Array | - | A list of messages comprising the conversation. For image recognition, content must be an array of objects. |
| temperature | No | Number | 1 | Sampling temperature (0 to 2). Higher values make output more random, lower values make it more deterministic. |
| top_p | No | Number | 1 | Nucleus sampling probability. It is recommended to alter this or temperature, but not both. |
| max_tokens | No | Integer | inf | The maximum number of tokens to generate in the chat completion. |
| n | No | Integer | 1 | How many chat completion choices to generate for each input message. |
| stream | No | Boolean | false | Whether to stream back partial progress. Fixed to false for this endpoint. |
| stop | No | String/Array | null | Up to 4 sequences where the API will stop generating further tokens. |
| presence_penalty | No | Number | 0 | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
| frequency_penalty | No | Number | 0 | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
| response_format | No | Object | - | Specifies the format that the model must output, e.g., { "type": "json_object" } to enable JSON mode. |
| user | No | String | - | A unique identifier representing your end-user. |
| tools | No | Array | - | A list of tools the model may call (currently only function is supported). |
| tool_choice | No | Object | - | Controls which (if any) tool is called by the model (none/auto/specific function). |

messages Image Recognition Object Structure (Content Array)

When making an image recognition request, each element of the user message's content array should have the following structure:

  • type: "text" or "image_url"
  • text: When type is "text", input the text prompt/question here.
  • image_url: When type is "image_url", input an object { "url": "Image URL" }.

Request Example

{
    "model": "gpt-4o",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in this image? Please describe it in detail."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://lsky.zhongzhuan.chat/i/2024/10/17/6711068a14527.png"
                    }
                }
            ]
        }
    ]
}
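The request above can be sent with Python's standard library alone. A hedged sketch (the helper names `build_request` and `send` are ours; replace the API key placeholder with a real key before calling `send`):

```python
import json
import urllib.request

API_URL = "https://api.codingplanx.ai/v1/chat/completions"

def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request for a non-streaming chat completion."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )

def send(api_key: str, payload: dict) -> dict:
    """Perform the live HTTP call and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(api_key, payload)) as resp:
        return json.load(resp)
```

Third-party HTTP clients such as `requests` work the same way; only the header and payload shapes matter to the API.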

Response Body

| Parameter | Type | Description |
| --- | --- | --- |
| id | String | A unique identifier for the chat completion request. |
| object | String | The object type, which is always chat.completion. |
| created | Integer | The Unix timestamp (in seconds) of when the completion was created. |
| choices | Array | A list of chat completion choices generated by the model. |
| choices[].index | Integer | The index of the choice in the list. |
| choices[].message | Object | The chat completion message object (contains role and content). |
| choices[].finish_reason | String | The reason the model stopped generating tokens (e.g., stop, length). |
| usage | Object | Usage statistics for the completion request. |
| usage.prompt_tokens | Integer | Number of tokens consumed by the prompt. |
| usage.completion_tokens | Integer | Number of tokens consumed by the generated result. |
| usage.total_tokens | Integer | Total number of tokens used in the request. |

Response Example

{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "This image shows a neat office desk with a laptop, a cup of coffee, and a succulent plant on it."
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 120,
        "completion_tokens": 45,
        "total_tokens": 165
    }
}
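Pulling the useful fields out of this response takes only a couple of lines. A minimal sketch (the helper name `extract_reply` is ours):

```python
def extract_reply(response: dict):
    """Return the assistant's text and total token usage from a completion response."""
    content = response["choices"][0]["message"]["content"]
    total_tokens = response["usage"]["total_tokens"]
    return content, total_tokens

# Example using the response shape documented above.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "A desk with a laptop."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 120, "completion_tokens": 45, "total_tokens": 165},
}
text, tokens = extract_reply(sample)
```

In production code, check `finish_reason` as well: a value of `length` means the reply was truncated (see the FAQ below).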

Frequently Asked Questions (FAQs)

Q: Which models support the image recognition (vision) feature? A: Currently, multi-modal models such as gpt-4o, gpt-4-turbo, and gpt-4-vision-preview are supported. Please ensure you pass the correct ID in the model field.

Q: What image formats are supported? A: We support common image formats including PNG, JPEG, WEBP, and non-animated GIFs.

Q: Are there access restrictions on image links? A: The provided image URLs must be publicly accessible. If the image is hosted on private cloud storage, please generate a temporary pre-signed URL or convert the image to a Base64 encoded string before sending (Note: Base64 strings must follow the data:image/jpeg;base64,... format).
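When an image cannot be hosted publicly, it can be inlined as a data URL in the format the FAQ describes (`data:image/...;base64,...`). A sketch using only the standard library (the helper name `image_to_data_url` is ours):

```python
import base64
import mimetypes

def image_to_data_url(path: str) -> str:
    """Read a local image file and return it as a data: URL for the image_url field."""
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"
```

The result is used in place of a remote URL: `{"type": "image_url", "image_url": {"url": image_to_data_url("photo.png")}}`. Note that Base64 inlining increases the request size by roughly a third compared to the raw image bytes.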

Q: How is image recognition billed? A: Images are converted into tokens for billing based on their resolution and level of detail. Typically, low-resolution images consume a fixed, small number of tokens, while high-resolution images are billed based on the number of 512x512 tiles required to process them.
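As a rough illustration of tile-based billing (a deliberately simplified model: real providers typically resize the image first and charge a base cost plus a per-tile rate, neither of which is specified here):

```python
import math

def estimate_tiles(width: int, height: int, tile: int = 512) -> int:
    """Count the 512x512 tiles needed to cover an image (simplified billing model)."""
    return math.ceil(width / tile) * math.ceil(height / tile)

# A 1024x768 image covers ceil(1024/512) * ceil(768/512) = 2 * 2 = 4 tiles.
tiles = estimate_tiles(1024, 768)
```

Consult your provider's pricing page for the actual per-tile and base token costs.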

Q: What if I want streaming output (typewriter effect)? A: Please set the stream parameter to true in your request. However, note that this specific documentation covers non-streaming responses. If streaming is enabled, the data format will change to Server-Sent Events (SSE).
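For reference, when stream is true the response arrives as SSE lines of the form `data: {json}`, terminated by `data: [DONE]`. A minimal line-parsing sketch (the chunk field names follow the common chat.completion.chunk shape; verify them against the streaming documentation):

```python
import json

def parse_sse_line(line: str):
    """Parse one SSE line; return the JSON chunk, or None for blanks and [DONE]."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return None
    return json.loads(data)
```

Each parsed chunk typically carries an incremental `choices[0].delta.content` fragment, which the client appends to build the full reply.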

Q: Why did the API return "finish_reason": "length"? A: This indicates that the generated response hit the max_tokens limit you configured, or it exceeded the model's context window limit, resulting in truncated content.