Create Chat Completion with Vision (Streaming) - Base64

Interface Description: Given a prompt, the model returns one or more predicted completions and can also return the probabilities of alternative tokens at each position. This variant of the interface creates chat completions that combine text prompts with image recognition, where the image is passed inline as a Base64 data URL and the response is delivered incrementally via streaming.

Official Reference: OpenAI API Documentation (https://platform.openai.com/docs/api-reference/chat)


Endpoint Details

  • Method: POST
  • URL: https://api.codingplanx.ai/v1/chat/completions
  • Content-Type: application/json

Request Headers

| Parameter | Required | Type | Example Value | Description |
|---|---|---|---|---|
| Content-Type | Yes | string | application/json | Data format |
| Accept | Yes | string | application/json | Accepted response format |
| Authorization | Yes | string | Bearer {{YOUR_API_KEY}} | Authentication token |

Request Body

| Field Name | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | ID of the model to use (e.g., gpt-4o-mini). |
| messages | array | Yes | A list of messages comprising the conversation so far. Each message includes a role and content. For vision, content can be an array containing text and image_url (Base64 data URL) parts. |
| tools | array | No | A list of tools the model may call. Currently, only functions are supported. Used to provide a list of functions for which the model can generate JSON inputs. |
| tool_choice | string/object | No | Controls which tool is called by the model. none means no tool; auto means automatic selection; a specific function can also be forced. |
| temperature | number | No | Sampling temperature between 0 and 2. Higher values (e.g., 0.8) make output more random, while lower values (e.g., 0.2) make it more deterministic. It is recommended not to modify this and top_p simultaneously. |
| top_p | number | No | Nucleus sampling parameter. 0.1 means only tokens comprising the top 10% probability mass are considered. It is recommended not to modify this and temperature simultaneously. |
| n | integer | No | How many chat completion choices to generate for each input message. Defaults to 1. |
| stream | boolean | No | Streaming switch. Defaults to false. If set to true (as in this document's examples), partial message deltas are sent as Server-Sent Events (SSE), terminating with data: [DONE]. |
| stop | string/array | No | Up to 4 sequences where the API will stop generating further tokens. Defaults to null. |
| max_tokens | integer | No | The maximum number of tokens to generate in the chat completion. Defaults to inf (unlimited). |
| presence_penalty | number | No | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
| frequency_penalty | number | No | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the likelihood of repeating the same content. |
| logit_bias | object | No | Modifies the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps token IDs to bias values (-100 to 100). |
| user | string | No | A unique identifier representing your end-user, helping to monitor and detect abuse. |
| response_format | object | No | Specifies the format that the model must output. Pass { "type": "json_object" } to enable JSON mode. When used, you must also instruct the model via prompt to output JSON. |
| seed | integer | No | Beta feature. If specified, the system makes a best effort to sample deterministically, so repeated requests with the same seed and parameters should return the same result. |

Request Example

{
  "model": "gpt-4o-mini",
  "stream": true,
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image? Please describe it in detail."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAiEAAAIhCAYAAACYF2qHAAAACXBIWXMAAA7EAAAOxAGVKw4bAAAgAElEQVR4nOy9ebxtWVXf+x1zrrX2Pud2VRQUjUVf0glIYvOJLdJ8iCagIor... (omitted long base64 string)"
          }
        }
      ]
    }
  ]
}
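
For reference, here is a minimal end-to-end sketch of the same request in Python. It is illustrative only: it assumes the requests library is installed, uses a hypothetical local image file cat.png, and a placeholder API key.

```python
import base64

import requests  # pip install requests

API_URL = "https://api.codingplanx.ai/v1/chat/completions"
API_KEY = "YOUR_API_KEY"  # replace with your real key

# Read a local image (hypothetical path) and encode it as a Base64 data URL.
with open("cat.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")
data_url = f"data:image/png;base64,{b64}"

payload = {
    "model": "gpt-4o-mini",
    "stream": True,  # enable Server-Sent Events
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image? Please describe it in detail."},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        },
    ],
}

headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}

# stream=True tells requests not to buffer the whole response body.
with requests.post(API_URL, headers=headers, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if line:  # SSE frames are separated by blank lines
            print(line)  # each frame looks like: data: {...} or data: [DONE]
```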

Response Body

| Field Name | Type | Description |
|---|---|---|
| id | string | A unique identifier for the chat completion. |
| object | string | The object type: chat.completion for a non-streaming response, chat.completion.chunk for streaming. |
| created | integer | The Unix timestamp of when the completion was created. |
| choices | array | A list of chat completion choices. |
| └─ index | integer | The index of the choice in the list. |
| └─ message | object | A message object generated by the model, containing role and content. (Streaming chunks carry a partial delta object instead.) |
| └─ finish_reason | string | The reason the model stopped generating (e.g., stop for natural end, length for reaching max_tokens). |
| usage | object | Usage statistics for the completion request. |
| └─ prompt_tokens | integer | Number of tokens in the prompt (including image tokens). |
| └─ completion_tokens | integer | Number of tokens in the generated content. |
| └─ total_tokens | integer | Total tokens used in the request. |

Response Example (Non-Streaming, 200 OK)

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "\n\nHello there, how may I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
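
Streaming Response Example (SSE)

When the request sets "stream": true, the API returns chat.completion.chunk objects as SSE frames instead of a single JSON body; each chunk carries a delta rather than a full message, and finish_reason stays null until the final chunk. The frames below are illustrative of the format, not a verbatim capture:

```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-4o-mini","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-4o-mini","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-4o-mini","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```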

Frequently Asked Questions (FAQs)

Q1: How do I enable streaming output? A: Set the stream parameter to true in the request body. Once enabled, the API will no longer wait for the entire response to be generated. Instead, it will return data chunks in data: {...} format via Server-Sent Events (SSE) until the data: [DONE] flag is received.
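
As a sketch of consuming that stream in Python (assuming a requests response opened with stream=True, as in the request example above):

```python
import json

def collect_stream(resp):
    """Reassemble the full reply from an SSE response (requests, stream=True)."""
    parts = []
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip blank keep-alive lines and any non-data frames
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content") or "")
    return "".join(parts)
```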

Q2: What are the requirements for Base64 encoding in vision tasks? A: The image Base64 string must include the correct MIME type prefix, such as data:image/png;base64, or data:image/jpeg;base64,. It is recommended to compress the image dimensions appropriately before converting to Base64 to reduce network latency and prompt_tokens consumption (excessively large images consume many tokens and may exceed context limits).
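
A small helper along these lines (a sketch using only Python's standard library; the file path is hypothetical) builds a correctly prefixed data URL:

```python
import base64
import mimetypes

def to_data_url(path: str) -> str:
    """Encode a local image file as a data URL with the correct MIME prefix."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {path}")
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{b64}"

# to_data_url("photo.jpg") -> "data:image/jpeg;base64,/9j/4AAQ..."
```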

Q3: Why is the model's response truncated? A: Check the value of choices[0].finish_reason in the response body. If it is length, the generated text reached the max_tokens limit you set, or the conversation exceeded the model's maximum context window. Increase max_tokens in the first case; in the second, shorten the conversation history or reduce the image size, since raising max_tokens cannot enlarge the context window.
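
A minimal guard in client code might look like this (a sketch, assuming the non-streaming response has already been parsed into a dict):

```python
def check_truncation(response: dict) -> None:
    """Warn when the completion stopped because the token limit was hit."""
    reason = response["choices"][0]["finish_reason"]
    if reason == "length":
        print("Warning: reply was cut off; raise max_tokens or shorten the prompt.")
```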

Q4: How can I force the model to output data in JSON format? A: First, set "response_format": { "type": "json_object" } in the request body. Important: In addition to setting this parameter, you must explicitly instruct the model to output JSON using natural language in the messages (either in the system prompt or user prompt), e.g., "Please output the result in JSON format." Failure to do so may result in the model generating endless whitespace until tokens are exhausted.
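
A minimal request body illustrating both steps (the parameter and the prompt instruction; model and prompts are placeholders) might look like this:

```json
{
  "model": "gpt-4o-mini",
  "response_format": { "type": "json_object" },
  "messages": [
    { "role": "system", "content": "You are a helpful assistant. Always output the result in JSON format." },
    { "role": "user", "content": "List three primary colors." }
  ]
}
```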

Q5: What should I do if I encounter a 401 Unauthorized error? A: Ensure that the Authorization parameter in the request header correctly carries the API Key. The format must strictly be Bearer YOUR_API_KEY (note the single space between Bearer and the Key).
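
For reference, the header line should look exactly like the first example below; the other two are common mistakes:

```
Authorization: Bearer {{YOUR_API_KEY}}      <- correct (single space after "Bearer")
Authorization: Bearer{{YOUR_API_KEY}}       <- wrong (missing space)
Authorization: {{YOUR_API_KEY}}             <- wrong (missing "Bearer" prefix)
```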