DeepSeek-OCR Image Recognition API Documentation

This endpoint provides image-to-text (OCR) capabilities based on the DeepSeek-OCR model. The API adopts an OpenAI-compatible Chat Completions format and supports image uploads via URL or Base64 encoding, converting images into plain text or Markdown.

  • Base URL: https://api.codingplanx.ai
  • Endpoint Path: /v1/chat/completions
  • Method: POST
  • Status: Released

1. Request Parameters

1.1 Headers

| Parameter | Required | Type | Example | Description |
| --- | --- | --- | --- | --- |
| Content-Type | Yes | String | application/json | Request body format |
| Accept | Yes | String | application/json | Response body format |
| Authorization | Yes | String | Bearer YOUR_API_KEY | API Key for authentication |

1.2 Request Body

| Parameter | Required | Type | Description |
| --- | --- | --- | --- |
| model | Yes | String | The model ID to use. For OCR recognition, specify deepseek-ocr. |
| messages | Yes | Array | A list of messages comprising the conversation. For OCR, image information must be included in the messages. |
| stream | No | Boolean | Defaults to false. If true, the response is streamed back in real time. |
| temperature | No | Number | Sampling temperature (0-2). Lower values are more focused and deterministic; higher values make the output more random. |
| top_p | No | Number | Nucleus sampling probability threshold. |
| max_tokens | No | Integer | The maximum number of tokens to generate. |
| response_format | No | Object | Specifies the return format, e.g., {"type": "json_object"} to enable JSON mode. |
| tools | No | Array | A list of tools the model may call (currently mainly function calling). |

2. Request Example

2.1 OCR Image Recognition Request (JSON)

{
  "model": "deepseek-ocr",
  "stream": false,
  "messages": [
    {
      "role": "system",
      "content": "<image>\nFree OCR."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://s3.ffire.cc/files/pdf_to_markdown.jpg"
          }
        }
      ]
    }
  ]
}

Note: The image_url.url field accepts either a publicly accessible image URL or a Base64-encoded data URL (format: data:image/jpeg;base64,...).
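As a sketch, the request above can be assembled in Python using only the standard library. The endpoint and header values follow this document; `build_ocr_request` is an illustrative helper, not part of any official SDK, and `YOUR_API_KEY` is a placeholder.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your real key
BASE_URL = "https://api.codingplanx.ai"

def build_ocr_request(image_url: str) -> urllib.request.Request:
    """Assemble a POST request matching the JSON example above."""
    payload = {
        "model": "deepseek-ocr",
        "stream": False,
        "messages": [
            {"role": "system", "content": "<image>\nFree OCR."},
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}}
                ],
            },
        ],
    }
    return urllib.request.Request(
        url=BASE_URL + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_ocr_request("https://s3.ffire.cc/files/pdf_to_markdown.jpg")
# body = urllib.request.urlopen(req).read()  # uncomment to actually send
```

Building the `Request` object separately from sending it makes the payload easy to inspect or log before the network call.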


3. Response Description

3.1 Response Body Structure

| Field | Type | Description |
| --- | --- | --- |
| id | String | Unique identifier for the request. |
| object | String | Object type, typically chat.completion. |
| created | Integer | Unix timestamp (in seconds) of when the response was created. |
| choices | Array | A list containing the generated choices. |
| └─ message | Object | The message generated by the model, containing role and content. |
| └─ finish_reason | String | The reason the model stopped generating tokens (e.g., stop or length). |
| usage | Object | Token usage statistics, including prompt, completion, and total tokens. |

3.2 Response Example

{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Here is the extracted text from the OCR process, typically returned in Markdown format to preserve the layout of the image."
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 512,
        "completion_tokens": 128,
        "total_tokens": 640
    }
}
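Extracting the recognized text from a response body of this shape is straightforward. The sketch below uses a shortened version of the example above; `extract_text` is an illustrative helper, not part of the API.

```python
import json

# Shortened response body in the shape shown in Section 3.2.
raw = """{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "Extracted text..."},
         "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 512, "completion_tokens": 128, "total_tokens": 640}
}"""

def extract_text(response_body: str) -> tuple:
    """Return (content, finish_reason) from the first choice."""
    data = json.loads(response_body)
    choice = data["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]

text, reason = extract_text(raw)
```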

4. Frequently Asked Questions (FAQs)

Q1: What image formats does DeepSeek-OCR support? A: Currently, mainstream image formats such as JPEG, PNG, WebP, and BMP are supported. It is recommended to use clear images with appropriate resolution to ensure high recognition accuracy.

Q2: How do I upload an image using Base64? A: Simply replace the value of image_url.url with your Base64 string. Example format: "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...".
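A minimal sketch of producing that Base64 value from raw image bytes, using only the standard library; `to_data_url` is an illustrative helper name:

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URL for the image_url.url field."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Typical usage: read a local file and embed the result in the request payload.
# with open("page.png", "rb") as f:
#     url_value = to_data_url(f.read())
```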

Q3: Why is the recognition result cut off? A: Please check the finish_reason in the response. If it is length, it means the generated content exceeded the max_tokens limit. It is recommended to increase the max_tokens parameter.
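The truncation check described above can be expressed as a small guard before a retry with a larger max_tokens; `is_truncated` is an illustrative helper, not part of the API:

```python
def is_truncated(response: dict) -> bool:
    """True when the model stopped because it hit the max_tokens limit."""
    return response["choices"][0]["finish_reason"] == "length"

# Minimal usage against a response-shaped dict:
truncated = is_truncated({"choices": [{"finish_reason": "length"}]})
```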

Q4: Does this endpoint support batch image recognition? A: You can include multiple image_url objects in the messages array of a single request, but it is recommended to make a separate API call per image: smaller contexts are processed more reliably and produce more stable responses.

Q5: How can I get more accurate formatting (e.g., for tables or math formulas)? A: You can add specific prompt instructions in the system or user message. For example: "Please use Markdown format to preserve the table layout in the image and recognize the mathematical formulas within it."

Q6: Is streaming output supported? A: Yes. Set the stream parameter to true in your request payload and the model will return recognized text fragments in real time. This is ideal for interactive scenarios that require a low Time-To-First-Byte (TTFB).
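Streamed responses in OpenAI-compatible APIs are usually delivered as server-sent events, one `data: {...}` line per chunk, terminated by `data: [DONE]`. The exact chunk schema is not specified in this document, so the field names below (`delta`, `content`) are assumptions based on that convention:

```python
import json

def iter_stream_text(lines):
    """Yield text fragments from SSE-style 'data: {...}' lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments, blank keep-alive lines, etc.
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]

# Simulated stream; the real one comes from the HTTP response body.
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    'data: [DONE]',
]
text = "".join(iter_stream_text(sample))
```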