DeepSeek-OCR Image Recognition API Documentation
This endpoint provides image-to-text (OCR) capabilities based on the DeepSeek-OCR model. The API uses the OpenAI-compatible Chat Completions format and accepts images by URL or Base64 encoding, converting them into plain text or Markdown.
- Base URL: https://api.codingplanx.ai
- Endpoint Path: /v1/chat/completions
- Method: POST
- Status: Released
1. Request Parameters
1.1 Headers
| Parameter | Required | Type | Example | Description |
|---|---|---|---|---|
| Content-Type | Yes | String | application/json | Request body format |
| Accept | Yes | String | application/json | Response body format |
| Authorization | Yes | String | Bearer YOUR_API_KEY | API Key for authentication |
1.2 Request Body
| Parameter | Required | Type | Description |
|---|---|---|---|
| model | Yes | String | The model ID to use. For OCR, specify deepseek-ocr. |
| messages | Yes | Array | A list of messages comprising the conversation. For OCR, image information must be included in the messages. |
| stream | No | Boolean | Defaults to false. If true, the response will be streamed back in real-time. |
| temperature | No | Number | Sampling temperature (0-2). Lower values are more focused and deterministic; higher values make the output more random. |
| top_p | No | Number | Nucleus sampling probability threshold. |
| max_tokens | No | Integer | The maximum number of tokens to generate. |
| response_format | No | Object | Specifies the return format, e.g., {"type": "json_object"} to enable JSON mode. |
| tools | No | Array | A list of tools the model may call (currently, function calling is the main supported tool type). |
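The parameters in the table above can be assembled programmatically. The following is a minimal sketch, not an official client: the function name `build_ocr_payload` and its defaults are illustrative assumptions, and the default system prompt mirrors the request example in section 2.

```python
from typing import Any, Optional


def build_ocr_payload(
    image_url: str,
    prompt: str = "<image>\nFree OCR.",
    stream: bool = False,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble a request body from the documented parameters (hypothetical helper)."""
    # The table documents a sampling temperature range of 0-2.
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    payload: dict[str, Any] = {
        "model": "deepseek-ocr",
        "stream": stream,
        "messages": [
            {"role": "system", "content": prompt},
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}}
                ],
            },
        ],
    }
    # Optional parameters are only included when the caller sets them.
    if temperature is not None:
        payload["temperature"] = temperature
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    return payload
```

Leaving optional fields out of the body, rather than sending null values, lets the server apply its own defaults.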
2. Request Example
2.1 OCR Image Recognition Request (JSON)
{
  "model": "deepseek-ocr",
  "stream": false,
  "messages": [
    {
      "role": "system",
      "content": "<image>\nFree OCR."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://s3.ffire.cc/files/pdf_to_markdown.jpg"
          }
        }
      ]
    }
  ]
}
Note: The image_url.url field accepts either a publicly accessible image URL or a Base64-encoded data URI (format: data:image/jpeg;base64,...).
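As an illustration of how the headers from section 1.1 and the request body fit together, here is a hedged sketch using only the Python standard library. The helper name `build_request` is an assumption for this example; it prepares the request without sending it, so it can be inspected before any network call is made.

```python
import json
import urllib.request

API_URL = "https://api.codingplanx.ai/v1/chat/completions"


def build_request(api_key: str, image_url: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request for one image (hypothetical helper)."""
    payload = {
        "model": "deepseek-ocr",
        "stream": False,
        "messages": [
            {"role": "system", "content": "<image>\nFree OCR."},
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}}
                ],
            },
        ],
    }
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


# Sending requires a valid API key and network access, for example:
# req = build_request("YOUR_API_KEY", "https://s3.ffire.cc/files/pdf_to_markdown.jpg")
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```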
3. Response Description
3.1 Response Body Structure
| Field | Type | Description |
|---|---|---|
| id | String | Unique identifier for the request. |
| object | String | Object type, typically chat.completion. |
| created | Integer | Unix timestamp (in seconds) of when the response was created. |
| choices | Array | A list containing the generated choices. |
| └─ message | Object | The message generated by the model, containing role and content. |
| └─ finish_reason | String | The reason the model stopped generating tokens (e.g., stop or length). |
| usage | Object | Token usage statistics, including prompt, completion, and total tokens. |
3.2 Response Example
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here is the extracted text from the OCR process, typically returned in Markdown format to preserve the layout of the image."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 512,
    "completion_tokens": 128,
    "total_tokens": 640
  }
}
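Reading the recognized text out of a response body can be sketched as follows. The function name `extract_text` is an illustrative assumption; the field names come from the response structure in section 3.1.

```python
from typing import Any


def extract_text(response: dict[str, Any]) -> tuple[str, bool]:
    """Return (recognized text, truncated flag) from a parsed response body."""
    choice = response["choices"][0]
    text = choice["message"]["content"]
    # finish_reason == "length" means generation stopped at the token limit.
    truncated = choice.get("finish_reason") == "length"
    return text, truncated


sample = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "# Title\n\nBody text"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 512, "completion_tokens": 128, "total_tokens": 640},
}

text, truncated = extract_text(sample)
```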
4. Frequently Asked Questions (FAQs)
Q1: What image formats does DeepSeek-OCR support?
A: Currently, mainstream image formats such as JPEG, PNG, WebP, and BMP are supported. It is recommended to use clear images with appropriate resolution to ensure high recognition accuracy.
Q2: How do I upload an image using Base64?
A: Replace the value of image_url.url with a data URI that contains your Base64-encoded image. Example format: "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...".
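One way to produce such a value from a local file is sketched below. The helper names and the extension-to-MIME mapping are assumptions for illustration; the mapping covers only the formats listed in Q1.

```python
import base64
from pathlib import Path

# Assumed mapping for the formats named in Q1; extend as needed.
MIME_BY_EXTENSION = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
    ".bmp": "image/bmp",
}


def mime_for(path: str) -> str:
    """Look up the MIME type from the file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    if suffix not in MIME_BY_EXTENSION:
        raise ValueError(f"unsupported image extension: {suffix!r}")
    return MIME_BY_EXTENSION[suffix]


def bytes_to_data_url(data: bytes, mime: str) -> str:
    """Encode raw image bytes as a data URI suitable for image_url.url."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def file_to_data_url(path: str) -> str:
    """Read an image file and return it as a data URI."""
    return bytes_to_data_url(Path(path).read_bytes(), mime_for(path))
```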
Q3: Why is the recognition result cut off?
A: Please check the finish_reason in the response. If it is length, it means the generated content exceeded the max_tokens limit. It is recommended to increase the max_tokens parameter.
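A retry loop that raises max_tokens when the output is truncated might look like the sketch below. It assumes a caller-supplied `send` function (hypothetical) that posts a payload and returns the parsed response; the starting value of 1024 tokens and the doubling strategy are arbitrary choices for illustration, not documented defaults.

```python
from typing import Any, Callable


def ocr_with_retry(
    send: Callable[[dict[str, Any]], dict[str, Any]],
    payload: dict[str, Any],
    start_tokens: int = 1024,
    max_attempts: int = 3,
) -> str:
    """Resend the request with a doubled max_tokens while finish_reason is 'length'."""
    request = dict(payload)  # shallow copy so the caller's payload is not modified
    request.setdefault("max_tokens", start_tokens)
    for _ in range(max_attempts):
        response = send(request)
        choice = response["choices"][0]
        if choice.get("finish_reason") != "length":
            return choice["message"]["content"]
        request["max_tokens"] *= 2
    raise RuntimeError("output still truncated after retries")
```

Each retry reprocesses the whole image, so setting a generous max_tokens up front may be cheaper for dense pages.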
Q4: Does this endpoint support batch image recognition?
A: You can try adding multiple image_url objects to the content array of the user message in a single request. However, making a separate API call for each image is strongly recommended, since it gives the model a clean context per image and produces more stable responses.
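The one-call-per-image approach can be sketched with a small thread pool. Here `ocr_one` is a hypothetical caller-supplied function that sends a single image and returns its text; the worker count of 4 is an arbitrary assumption, and actual rate limits for this endpoint are not documented here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def ocr_each(
    ocr_one: Callable[[str], str],
    image_urls: Sequence[str],
    workers: int = 4,
) -> list[str]:
    """Run one OCR call per image and return results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map preserves the order of the input sequence.
        return list(pool.map(ocr_one, image_urls))
```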
Q5: How can I get more accurate formatting (e.g., for tables or math formulas)?
A: You can add specific prompt instructions in the system or user message. For example: "Please use Markdown format to preserve the table layout in the image and recognize the mathematical formulas within it."
Q6: Is streaming output supported?
A: Yes. By setting the stream parameter to true in your request payload, the model will return recognized text fragments in real-time. This is ideal for interactive scenarios requiring low Time-To-First-Byte (TTFB) and minimal initial latency.
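Consuming a streamed response can be sketched as below. This assumes the stream follows the OpenAI-style server-sent events convention (lines beginning with `data:`, text fragments under `choices[0].delta.content`, and a final `data: [DONE]` marker); this documentation does not show the chunk format, so treat those details as assumptions to verify against real output.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (assumed chunk format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        fragment = choices[0].get("delta", {}).get("content")
        if fragment:
            yield fragment


sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
streamed = "".join(iter_stream_text(sample_lines))
```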