DeepSeek-OCR Image Recognition API Documentation

This endpoint provides image-to-text (OCR) capabilities based on the DeepSeek-OCR model. The API adopts an OpenAI-compatible Chat Completions format and supports image uploads via URL or Base64 encoding, converting images into plain text or Markdown.

  • Base URL: https://api.codingplanx.ai
  • Endpoint Path: /v1/chat/completions
  • Method: POST
  • Status: Released

1. Request Parameters

1.1 Headers

| Parameter | Required | Type | Example | Description |
| --- | --- | --- | --- | --- |
| Content-Type | Yes | String | application/json | Request body format |
| Accept | Yes | String | application/json | Response body format |
| Authorization | Yes | String | Bearer YOUR_API_KEY | API Key for authentication |

1.2 Request Body

| Parameter | Required | Type | Description |
| --- | --- | --- | --- |
| model | Yes | String | The model ID to use. For OCR recognition, specify deepseek-ocr. |
| messages | Yes | Array | A list of messages comprising the conversation. For OCR, image information must be included in the messages. |
| stream | No | Boolean | Defaults to false. If true, the response is streamed back in real time. |
| temperature | No | Number | Sampling temperature (0-2). Lower values are more focused and deterministic; higher values make the output more random. |
| top_p | No | Number | Nucleus sampling probability threshold. |
| max_tokens | No | Integer | The maximum number of tokens to generate. |
| response_format | No | Object | Specifies the return format, e.g., {"type": "json_object"} to enable JSON mode. |
| tools | No | Array | A list of tools the model may call (currently mainly function calling). |

2. Request Example

2.1 OCR Image Recognition Request (JSON)

{
  "model": "deepseek-ocr",
  "stream": false,
  "messages": [
    {
      "role": "system",
      "content": "<image>\nFree OCR."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://s3.ffire.cc/files/pdf_to_markdown.jpg"
          }
        }
      ]
    }
  ]
}

Note: The image_url.url field accepts either a publicly accessible image URL or a Base64-encoded data URL (format: data:image/jpeg;base64,...).
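As a sketch, the request above can be assembled in Python using only the standard library. The endpoint and header values follow this document; `build_ocr_request` is an illustrative helper, not part of any official SDK, and `YOUR_API_KEY` is a placeholder.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your real key
BASE_URL = "https://api.codingplanx.ai"

def build_ocr_request(image_url: str) -> urllib.request.Request:
    """Assemble a POST request matching the JSON example above."""
    payload = {
        "model": "deepseek-ocr",
        "stream": False,
        "messages": [
            {"role": "system", "content": "<image>\nFree OCR."},
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}}
                ],
            },
        ],
    }
    return urllib.request.Request(
        url=BASE_URL + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_ocr_request("https://s3.ffire.cc/files/pdf_to_markdown.jpg")
# body = urllib.request.urlopen(req).read()  # uncomment to actually send
```

Building the `Request` object separately from sending it makes the payload easy to inspect or log before the network call.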


3. Response Description

3.1 Response Body Structure

| Field | Type | Description |
| --- | --- | --- |
| id | String | Unique identifier for the request. |
| object | String | Object type, typically chat.completion. |
| created | Integer | Unix timestamp (in seconds) of when the response was created. |
| choices | Array | A list containing the generated choices. |
| └─ message | Object | The message generated by the model, containing role and content. |
| └─ finish_reason | String | The reason the model stopped generating tokens (e.g., stop or length). |
| usage | Object | Token usage statistics, including prompt, completion, and total tokens. |

3.2 Response Example

{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Here is the extracted text from the OCR process, typically returned in Markdown format to preserve the layout of the image."
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 512,
        "completion_tokens": 128,
        "total_tokens": 640
    }
}
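Extracting the recognized text from a response body of this shape is straightforward. The sketch below uses a shortened version of the example above; `extract_text` is an illustrative helper, not part of the API.

```python
import json

# Shortened response body in the shape shown in Section 3.2.
raw = """{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "Extracted text..."},
         "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 512, "completion_tokens": 128, "total_tokens": 640}
}"""

def extract_text(response_body: str) -> tuple:
    """Return (content, finish_reason) from the first choice."""
    data = json.loads(response_body)
    choice = data["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]

text, reason = extract_text(raw)
```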

4. Frequently Asked Questions (FAQs)

Q1: What image formats does DeepSeek-OCR support? A: Currently, mainstream image formats such as JPEG, PNG, WebP, and BMP are supported. It is recommended to use clear images with appropriate resolution to ensure high recognition accuracy.

Q2: How do I upload an image using Base64? A: Simply replace the value of image_url.url with your Base64 string. Example format: "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...".
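A minimal sketch of producing that Base64 value from raw image bytes, using only the standard library; `to_data_url` is an illustrative helper name:

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URL for the image_url.url field."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Typical usage: read a local file and embed the result in the request payload.
# with open("page.png", "rb") as f:
#     url_value = to_data_url(f.read())
```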

Q3: Why is the recognition result cut off? A: Please check the finish_reason in the response. If it is length, it means the generated content exceeded the max_tokens limit. It is recommended to increase the max_tokens parameter.
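The truncation check described above can be expressed as a small guard before a retry with a larger max_tokens; `is_truncated` is an illustrative helper, not part of the API:

```python
def is_truncated(response: dict) -> bool:
    """True when the model stopped because it hit the max_tokens limit."""
    return response["choices"][0]["finish_reason"] == "length"

# Minimal usage against a response-shaped dict:
truncated = is_truncated({"choices": [{"finish_reason": "length"}]})
```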

Q4: Does this endpoint support batch image recognition? A: You can include multiple image_url objects in the messages array of a single request, but it is recommended to make a separate API call per image: smaller contexts are processed more reliably and produce more stable responses.

Q5: How can I get more accurate formatting (e.g., for tables or math formulas)? A: You can add specific prompt instructions in the system or user message. For example: "Please use Markdown format to preserve the table layout in the image and recognize the mathematical formulas within it."

Q6: Is streaming output supported? A: Yes. Set the stream parameter to true in your request payload and the model will return recognized text fragments in real time. This is ideal for interactive scenarios that require a low Time-To-First-Byte (TTFB).
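Streamed responses in OpenAI-compatible APIs are usually delivered as server-sent events, one `data: {...}` line per chunk, terminated by `data: [DONE]`. The exact chunk schema is not specified in this document, so the field names below (`delta`, `content`) are assumptions based on that convention:

```python
import json

def iter_stream_text(lines):
    """Yield text fragments from SSE-style 'data: {...}' lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments, blank keep-alive lines, etc.
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]

# Simulated stream; the real one comes from the HTTP response body.
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    'data: [DONE]',
]
text = "".join(iter_stream_text(sample))
```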