Create Speech (Text-to-Speech) API Documentation

1. API Overview

This endpoint provides Text-to-Speech (TTS) capabilities: it converts input text into natural-sounding synthetic speech. The request and response formats are compatible with OpenAI's Create Speech API.


2. Request Specifications

  • Protocol: HTTP / HTTPS
  • Method: POST
  • Endpoint URL: https://api.codingplanx.ai/v1/audio/speech
  • Data Format: application/json

2.1 Request Headers

Parameter      Required  Example Value          Description
Content-Type   Yes       application/json       Declares the data format of the request body.
Authorization  Yes       Bearer {YOUR_API_KEY}  Authentication token used to authorize the API request.

2.2 Request Body Parameters

Parameter        Type    Required  Default  Description
model            string  Yes       -        One of the available TTS models, such as: gpt-4o-mini-tts, tts-1, or tts-1-hd.
input            string  Yes       -        The text to generate audio for. The maximum length is 4096 characters.
voice            string  Yes       -        The voice to use when generating the audio. Supported voices are: alloy, echo, fable, onyx, nova, and shimmer.
response_format  string  No        mp3      The format of the output audio file. Supported formats are: mp3, opus, aac, and flac.
speed            number  No        1.0      The playback speed of the generated audio. Supported values range from 0.25 to 4.0.
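
The limits in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: build_speech_payload is a hypothetical helper name, and it assumes the 4096-character limit can be approximated with Python's len(), which may not match how the server counts characters.

```python
# Values copied from the request body parameter table.
SUPPORTED_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
SUPPORTED_FORMATS = {"mp3", "opus", "aac", "flac"}
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_payload(model, text, voice, response_format="mp3", speed=1.0):
    """Validate arguments locally and return the JSON request body as a dict.

    Hypothetical helper: it only mirrors the documented limits and does not
    check whether the model name is actually available.
    """
    if not model:
        raise ValueError("model is required")
    if not text:
        raise ValueError("input is required")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if voice not in SUPPORTED_VOICES:
        raise ValueError(f"unsupported voice: {voice!r}")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

Calling build_speech_payload("tts-1", "Hello", "alloy") returns a dict that can be serialized with json.dumps and sent as the request body; invalid values raise ValueError before any network call is made.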

3. Request Example

cURL Request Example

curl --request POST \
  --url https://api.codingplanx.ai/v1/audio/speech \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gpt-4o-mini-tts",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "alloy",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output output.mp3

Note: Because this endpoint returns a binary audio stream, use the --output option in your cURL command to save the response directly as an audio file.
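
For clients that do not use cURL, the same request can be sketched with Python's standard library. This is an illustrative equivalent under stated assumptions rather than an official client: make_speech_request and save_speech are hypothetical helper names, and YOUR_API_KEY is a placeholder.

```python
import json
import urllib.request

API_URL = "https://api.codingplanx.ai/v1/audio/speech"


def make_speech_request(api_key, payload):
    """Build (but do not send) the POST request with the documented headers."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(api_key, payload, path):
    """Send the request and write the binary audio response to `path`.

    urllib raises urllib.error.HTTPError for non-2xx status codes, so a
    failed request never reaches the file-writing step.
    """
    request = make_speech_request(api_key, payload)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(path, "wb") as out:
        out.write(audio)


if __name__ == "__main__":
    example_payload = {
        "model": "gpt-4o-mini-tts",
        "input": "The quick brown fox jumped over the lazy dog.",
        "voice": "alloy",
        "response_format": "mp3",
        "speed": 1.0,
    }
    save_speech("YOUR_API_KEY", example_payload, "output.mp3")
```

The file is opened in binary mode ("wb") for the same reason the cURL example uses --output: the response body is audio data, not text.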


4. Response Specifications

  • HTTP Status Code: 200 OK
  • Content-Type: Corresponds to the requested response_format (e.g., audio/mpeg for mp3).

4.1 Success Response

On success, the API returns the generated audio as a binary stream in the response body. You can play this stream directly in the client application or save it locally as a file.

4.2 Error Response Example

If the request fails, the API returns a JSON-formatted error message:

{
  "error": {
    "message": "Invalid authorization key.",
    "type": "invalid_request_error",
    "param": null,
    "code": "invalid_api_key"
  }
}
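
A client can read the fields of this error object before deciding how to react. The sketch below assumes that error bodies always follow the shape shown above; parse_error is a hypothetical helper name, and fields missing from the body are returned as None.

```python
import json

ERROR_FIELDS = ("message", "type", "param", "code")


def parse_error(body):
    """Extract message, type, param, and code from a JSON error body.

    `body` may be str or bytes. If the top-level "error" object is absent,
    every field is returned as None.
    """
    data = json.loads(body)
    error = data.get("error") if isinstance(data, dict) else None
    if not isinstance(error, dict):
        error = {}
    return {field: error.get(field) for field in ERROR_FIELDS}
```

When using urllib.request, the error body is available by catching urllib.error.HTTPError and calling its read() method; the result can be passed to parse_error directly.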

5. Frequently Asked Questions (FAQs)

Q1: What is the length limit for the input text? What should I do if my text exceeds the limit?
A1: The maximum length for the input parameter is 4096 characters. To convert longer texts, have your client application split the text into chunks of at most 4096 characters at paragraph breaks or punctuation marks. Then make a separate API request for each chunk and concatenate the resulting audio files in order.
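
The splitting step can be sketched as follows. This is one possible approach, not a prescribed algorithm: split_text is a hypothetical helper, it treats ".", "!", and "?" as the only sentence boundaries, it joins pieces with a single space (so paragraph breaks are not preserved), and it counts characters with Python's len().

```python
import re

MAX_INPUT_CHARS = 4096


def split_text(text, limit=MAX_INPUT_CHARS):
    """Split text into chunks of at most `limit` characters.

    Breaks at paragraph gaps or after sentence-ending punctuation, packs
    consecutive pieces greedily, and hard-cuts a piece only when a single
    sentence is longer than the limit.
    """
    raw_pieces = re.split(r"\n\s*\n|(?<=[.!?])\s+", text)
    pieces = [piece.strip() for piece in raw_pieces if piece.strip()]

    chunks, current = [], ""
    for piece in pieces:
        # A single oversized piece cannot be packed, so cut it at the limit.
        while len(piece) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:limit])
            piece = piece[limit:]
        if not piece:
            continue
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can be sent as the input of its own request. The chunks are returned in document order, which is the order in which the resulting audio files should be concatenated.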

Q2: What are the differences between the response_format options, and which one should I choose?
A2:

  • mp3 (Default): Highest compatibility, ideal for the vast majority of web browsers and mobile devices.
  • opus: Ultra-low latency and high compression, perfect for real-time network streaming.
  • aac: Excellent audio quality, performs best on iOS devices, within the Apple ecosystem, and on platforms like YouTube.
  • flac: Lossless format, provides the best audio quality but results in the largest file size; ideal for audio archiving or post-production.

Q3: How can I control the tone, emotion, or pauses of the voice?
A3: The current models do not support direct control of emotion or pause duration, either through request parameters or through markup such as SSML tags. The model infers appropriate pauses and intonation from the text's context and punctuation (e.g., commas, periods, exclamation marks, question marks). You can improve the resulting voiceover by adjusting the punctuation in your text.

Q4: Can I adjust the playback speed of the audio?
A4: Yes. Set the speed parameter to any number between 0.25 and 4.0. A value of 1.0 is normal speed; values below 1.0 slow down the audio, while values above 1.0 speed it up.

Q5: Why did my request return a 401 Unauthorized error?
A5: Check that your request headers include Authorization: Bearer {YOUR_API_KEY}. Also verify that your API key has not expired and that your account is active.