Create Speech (Text-to-Speech) API Documentation
1. API Overview
This endpoint provides Text-to-Speech (TTS) capabilities, allowing you to convert input text into highly realistic synthetic speech. This API is fully compatible with OpenAI's standard specifications.
- Endpoint Name: Create Speech gpt-4o-mini-tts
- Official Reference: OpenAI Text-to-Speech Guides
- Current Status: Released
2. Request Specifications
- Protocol: HTTP / HTTPS
- Method: POST
- Endpoint URL: https://api.codingplanx.ai/v1/audio/speech
- Data Format: application/json
2.1 Request Headers
| Parameter | Required | Example Value | Description |
|---|---|---|---|
| Content-Type | Yes | application/json | Declares the data format of the request body. |
| Authorization | Yes | Bearer {YOUR_API_KEY} | Authentication token used to authorize the API request. |
2.2 Request Body Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | One of the available TTS models, such as: gpt-4o-mini-tts, tts-1, or tts-1-hd. |
| input | string | Yes | - | The text to generate audio for. The maximum length is 4096 characters. |
| voice | string | Yes | - | The voice to use when generating the audio. Supported voices are: alloy, echo, fable, onyx, nova, and shimmer. |
| response_format | string | No | mp3 | The format of the output audio file. Supported formats are: mp3, opus, aac, and flac. |
| speed | number | No | 1.0 | The playback speed of the generated audio. Supported values range from 0.25 to 4.0. |
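The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API itself: the function name `build_speech_payload` is hypothetical, rejecting an empty `input` is an assumption, and `len()` counts Unicode code points, which may differ from how the server counts characters.

```python
ALLOWED_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac"}
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_payload(text, voice, model="gpt-4o-mini-tts",
                         response_format="mp3", speed=1.0):
    """Return the JSON request body as a dict; raise ValueError on bad values."""
    # The table lists models as examples ("such as"), so only an empty
    # model name is rejected here.
    if not model:
        raise ValueError("model is required")
    # Assumption: an empty input string is treated as missing.
    if not text:
        raise ValueError("input is required")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(
            f"input has {len(text)} characters; the limit is {MAX_INPUT_CHARS}"
        )
    if voice not in ALLOWED_VOICES:
        raise ValueError(f"unsupported voice: {voice!r}")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

Calling `build_speech_payload("Hello", "alloy")` returns a dict with the documented defaults filled in, ready to be serialized as the request body.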
3. Request Example
cURL Request Example
curl --request POST \
  --url https://api.codingplanx.ai/v1/audio/speech \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gpt-4o-mini-tts",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "alloy",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output output.mp3
Note: Because this endpoint returns a binary audio stream, it is highly recommended to use the --output option in your cURL command to save the response directly as an audio file.
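For clients that do not shell out to cURL, the same request can be assembled with Python's standard library. This is a sketch under stated assumptions: the helper names `build_speech_request` and `fetch_speech` are hypothetical, and the 60-second timeout is an arbitrary choice rather than a documented value.

```python
import json
import urllib.request

API_URL = "https://api.codingplanx.ai/v1/audio/speech"


def build_speech_request(api_key, payload):
    """Build (but do not send) the POST request for the speech endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_speech(api_key, payload, timeout=60):
    """Send the request and return the binary audio body as bytes.

    urllib raises urllib.error.HTTPError for non-2xx responses.
    """
    request = build_speech_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

A caller would pass the same JSON fields as the cURL example and write the returned bytes to a file opened in binary mode (`"wb"`), which mirrors what `--output` does.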
4. Response Specifications
- HTTP Status Code: 200 OK
- Content-Type: Corresponds to the requested response_format (e.g., audio/mpeg).
4.1 Success Response
Upon a successful request, the API will directly return the binary data stream of the generated audio file. You can play this stream directly in the client application or save it locally as a file.
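Saving the stream locally does not require holding the whole audio file in memory. The sketch below copies any binary file-like object (for example an open HTTP response) to disk in fixed-size chunks; the name `save_audio_stream` and the 64 KiB chunk size are illustrative assumptions.

```python
def save_audio_stream(stream, out_path, chunk_size=64 * 1024):
    """Copy a binary file-like object to out_path; return the bytes written."""
    total = 0
    with open(out_path, "wb") as out_file:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                # An empty read signals the end of the stream.
                break
            out_file.write(chunk)
            total += len(chunk)
    return total
```

The return value gives a quick sanity check: a result of 0 means the response body was empty and the output file is not playable.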
4.2 Error Response Example
If the request fails, the API returns a JSON-formatted error message:
{
  "error": {
    "message": "Invalid authorization key.",
    "type": "invalid_request_error",
    "param": null,
    "code": "invalid_api_key"
  }
}
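Because a successful response is binary audio and a failed one is JSON, a client typically needs to decode the error body defensively. The following sketch extracts the `error` object shown above; the function name `parse_api_error` is hypothetical, and returning an empty dict for unexpected bodies is a design choice, not documented behavior.

```python
import json


def parse_api_error(body):
    """Return the fields of the "error" object, or {} if the body does not
    have the expected JSON shape. Accepts str or bytes."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes (e.g. audio data).
        return {}
    error = payload.get("error") if isinstance(payload, dict) else None
    return error if isinstance(error, dict) else {}
```

With the sample response above, `parse_api_error(body)["code"]` yields `"invalid_api_key"`, which is easier to branch on than the human-readable `message`.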
5. Frequently Asked Questions (FAQs)
Q1: What is the length limit for the input text? What should I do if my text exceeds the limit?
A1: The maximum length for the input parameter is 4096 characters. If you need to convert longer texts, we recommend that your client application split the text into smaller chunks (at most 4096 characters each) based on paragraphs or punctuation marks. You can then make a separate API request for each chunk and concatenate the resulting audio files.
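One way to implement the splitting step is to pack whole sentences greedily into chunks and fall back to a hard cut only when a single sentence exceeds the limit. This is a minimal sketch: the name `split_text` is hypothetical, the sentence boundary is approximated as whitespace following `.`, `!`, `?`, or `;`, and `len()` counts Unicode code points, which may not match the server's character count exactly.

```python
import re

MAX_INPUT_CHARS = 4096


def split_text(text, max_chars=MAX_INPUT_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?;])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split by length.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The sentence does not fit; start a new chunk with it.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can be sent as the `input` of its own request, in order. How the resulting audio files are joined depends on the chosen `response_format` and is left to an audio tool.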
Q2: What are the differences between the response_format options, and which one should I choose?
A2:
- mp3 (Default): Highest compatibility, ideal for the vast majority of web browsers and mobile devices.
- opus: Ultra-low latency and high compression, perfect for real-time network streaming.
- aac: Excellent audio quality, performs best on iOS devices, within the Apple ecosystem, and on platforms like YouTube.
- flac: Lossless format, provides the best audio quality but results in the largest file size; ideal for audio archiving or post-production.
Q3: How can I control the tone, emotion, or pauses of the voice?
A3: The current models do not support direct manipulation of emotion or pause duration via parameters (such as SSML tags). The model automatically infers appropriate pauses and intonations based on the text's context and punctuation (e.g., commas, periods, exclamation marks, question marks). You can improve the final voiceover effect by optimizing the punctuation within your text.
Q4: Can I adjust the playback speed of the audio?
A4: Yes. Set the speed parameter to any number between 0.25 and 4.0. A value of 1.0 is normal speed; values below 1.0 slow down the audio, while values above 1.0 speed it up.
Q5: Why did my request return a 401 Unauthorized error?
A5: Check your request headers to ensure that you have correctly included Authorization: Bearer {YOUR_API_KEY}. Also verify that your API key has not expired and that your account is active.
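Some of these header mistakes can be caught locally before the request is sent. The sketch below is a hypothetical pre-flight check (`authorization_problems` is not part of the API); it assumes the scheme must be written exactly as `Bearer`, and it cannot detect an expired key or an inactive account, which only the server knows.

```python
def authorization_problems(headers):
    """Return a list of human-readable problems with the Authorization header."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    value = next(
        (v for k, v in headers.items() if k.lower() == "authorization"), None
    )
    if value is None:
        return ["Authorization header is missing"]
    problems = []
    scheme, _, token = value.partition(" ")
    if scheme != "Bearer":
        problems.append("header value should start with 'Bearer '")
    if not token.strip():
        problems.append("API key is empty")
    elif "YOUR_API_KEY" in token:
        problems.append("placeholder was not replaced with a real key")
    elif any(c.isspace() for c in token):
        problems.append("API key contains whitespace")
    return problems
```

An empty list means the header is well-formed; a 401 in that case points to the key itself or the account status.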