GPT-4o-Audio API Documentation
1. Overview
This endpoint calls the GPT-4o model with audio generation capabilities. A single request can produce both a text response and a spoken audio response.
- Status: Released
- Official Reference Documentation: OpenAI API Reference
2. Request Details
- Protocol: HTTP / HTTPS
- Method: POST
- Endpoint: https://api.codingplanx.ai/v1/chat/completions
3. Request Headers
| Parameter | Type | Required | Example | Description |
|---|---|---|---|---|
| Content-Type | string | Yes | application/json | Specifies the data format of the request body. |
| Authorization | string | Yes | Bearer $OPENAI_API_KEY | Authentication credentials. Please replace with your actual API Key. |
4. Request Body
The request body must be in application/json format.
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The ID of the model to use. Example: gpt-4o-audio-preview. |
| modalities | array[string] | No | The output modalities generated by the model. Default: ["text"].<br>To request the model to generate both text and audio responses, use: ["text", "audio"]. |
| audio | object | No | Audio output parameters. Required when "audio" is included in modalities. |
| audio.voice | string | Conditional | The voice the model uses to respond. Required when audio is provided. Available options: alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer. |
| audio.format | string | Conditional | The output audio format. Required when audio is provided. Must be one of: wav, mp3, flac, opus, or pcm16. |
| messages | array[object] | Yes | The list of messages that make up the conversation so far. |
| messages[].role | string | Yes | The role of the message's author (e.g., system, user, assistant). |
| messages[].content | string | Yes | The content of the message. |
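The conditional rules in this table can be checked on the client before a request is sent: once "audio" appears in modalities, the audio object and both of its fields become required. The Python sketch below is illustrative only; validate_payload is a hypothetical helper, not part of any SDK, and it assumes the voice and format lists above are complete for this gateway. The server remains the final authority.

```python
from typing import Any

# Allowed values copied from the request-body table; assumed to be exhaustive.
VOICES = {"alloy", "ash", "ballad", "coral", "echo", "fable",
          "nova", "onyx", "sage", "shimmer"}
FORMATS = {"wav", "mp3", "flac", "opus", "pcm16"}


def validate_payload(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems in a request body (empty list if none found)."""
    problems: list[str] = []
    if not payload.get("model"):
        problems.append("'model' is required")

    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("'messages' must be a non-empty array")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or "role" not in msg or "content" not in msg:
                problems.append(f"messages[{i}] needs 'role' and 'content'")

    # modalities defaults to ["text"] when omitted, so audio rules only apply
    # when the caller explicitly asks for audio output.
    modalities = payload.get("modalities", ["text"])
    if "audio" in modalities:
        audio = payload.get("audio")
        if not isinstance(audio, dict):
            problems.append("'audio' object is required when modalities includes 'audio'")
        else:
            if audio.get("voice") not in VOICES:
                problems.append(f"audio.voice must be one of {sorted(VOICES)}")
            if audio.get("format") not in FORMATS:
                problems.append(f"audio.format must be one of {sorted(FORMATS)}")
    return problems
```

Note that a payload which leaves "audio" out of modalities passes this check cleanly but will only ever return text, which is the situation described in FAQ Q1.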
5. Request Example (cURL)
```bash
curl -X POST "https://api.codingplanx.ai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o-audio-preview",
    "modalities": ["text", "audio"],
    "audio": { "voice": "alloy", "format": "wav" },
    "messages": [
      {
        "role": "user",
        "content": "Is a golden retriever a good family dog?"
      }
    ]
  }'
```
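The same request can be issued without cURL using only Python's standard library. This is a sketch, assuming the gateway accepts the body exactly as in the cURL example and that your key is available to the caller (for example in the OPENAI_API_KEY environment variable). build_request and send are illustrative names, and the 120-second timeout is an arbitrary choice.

```python
import json
import urllib.request

# Endpoint from section 2; the body below mirrors the cURL example.
ENDPOINT = "https://api.codingplanx.ai/v1/chat/completions"


def build_request(api_key: str, prompt: str,
                  voice: str = "alloy", fmt: str = "wav") -> urllib.request.Request:
    """Build (without sending) the POST request from the cURL example."""
    body = {
        "model": "gpt-4o-audio-preview",
        "modalities": ["text", "audio"],
        "audio": {"voice": voice, "format": fmt},
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Scheme "Bearer", exactly one space, then the key.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and return the parsed JSON body.

    urlopen raises urllib.error.HTTPError for non-2xx statuses such as 401.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, with os imported, send(build_request(os.environ["OPENAI_API_KEY"], "Is a golden retriever a good family dog?")) returns the parsed response. Keeping request construction separate from sending makes the payload easy to inspect or log before any network call.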
6. Response
- HTTP Status Code: 200 OK (success)
- Content-Type: application/json

(Note: The response follows the standard OpenAI Chat Completions format. When the request asks for audio output, choices[0].message in the returned JSON contains an audio object. In the standard format this object holds the Base64-encoded audio (data), a transcript of the spoken reply (transcript), an identifier (id), and an expiry timestamp (expires_at).)
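Assuming the response uses the standard OpenAI field names, with the Base64 string in choices[0].message.audio.data and the text in choices[0].message.audio.transcript, the audio can be written to disk as shown in this sketch. save_audio is a hypothetical helper; choose a file extension that matches the audio.format you requested.

```python
import base64
from pathlib import Path
from typing import Any, Optional


def save_audio(response: dict[str, Any], path: str) -> Optional[str]:
    """Decode choices[0].message.audio.data and write the bytes to `path`.

    Returns the transcript when present. Raises ValueError when the message
    carries no audio, e.g. because "audio" was missing from modalities.
    """
    message = response["choices"][0]["message"]
    audio = message.get("audio") or {}
    if "data" not in audio:
        raise ValueError("no audio in response; check 'modalities' and 'audio'")
    Path(path).write_bytes(base64.b64decode(audio["data"]))
    return audio.get("transcript")
```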
7. Frequently Asked Questions (FAQs)
Q1: Why does my request only return text and no audio?
A1: Please check if the modalities parameter in the request body explicitly includes "audio" (i.e., ["text", "audio"]). If this parameter is not configured, the system defaults to outputting only "text". Additionally, ensure that the voice and format properties under the audio field are configured correctly.
Q2: How should I choose between the different audio formats?
A2:
- mp3: The most universally compatible format, with small file sizes. Suitable for web and general-purpose applications.
- wav: Lossless, uncompressed format offering full audio quality but the largest file size. Ideal for professional audio processing that requires high fidelity.
- flac: Lossless, compressed format. Smaller files than wav with identical audio quality.
- opus: Designed for internet streaming, with very low latency and high compression. Well suited to real-time voice calls and low-bandwidth environments.
- pcm16: Raw, uncompressed audio samples with no container. Suitable for low-level data transmission or further hardware-level development.
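Because pcm16 output is raw samples with no header, most players cannot open it directly. One option is to wrap it in a WAV container with Python's wave module, as in this sketch. The sample parameters are assumptions, not taken from this document: 16-bit little-endian samples, mono, at 24 kHz. Confirm the actual sample rate and channel count for this gateway before relying on the defaults. pcm16_to_wav is a hypothetical helper.

```python
import io
import wave


def pcm16_to_wav(raw: bytes, sample_rate: int = 24_000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM samples in a WAV container.

    sample_rate and channels are assumed defaults; adjust to the real stream.
    """
    bytes_per_frame = 2 * channels  # 16-bit samples = 2 bytes each
    if len(raw) % bytes_per_frame:
        raise ValueError("raw length is not a whole number of frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(raw)  # header sizes are finalized when the writer closes
    return buf.getvalue()
```

If the samples will be processed further rather than played, keeping the raw bytes and recording the sample parameters separately avoids the container entirely.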
Q3: How should I format the Authorization request header?
A3: Use the Bearer authentication scheme: the word Bearer, a single space, then your API Key, for example: Authorization: Bearer sk-xxxxxxxxx.
Q4: What voice options are supported by this API?
A4: Currently, it supports 10 preset voices: alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, and shimmer. We recommend testing these voices in your specific business scenarios to select the tone that best aligns with your application's style.
Q5: What causes a 401 Unauthorized error?
A5: This is typically caused by an invalid or expired API Key, or a formatting/spelling error in the Authorization header. Please check your console dashboard to confirm your API Key is active, and ensure your code includes the correct "Bearer " prefix and spacing.
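The header formatting mistakes mentioned in Q3 and Q5 (a missing Bearer prefix or incorrect spacing) can be caught before a request is sent. This is a minimal sketch; check_authorization is hypothetical, and it only checks the format of the header. Whether the key itself is valid and active can only be determined by the server, so a well-formed header can still produce a 401.

```python
def check_authorization(value: str) -> list[str]:
    """Return formatting problems in an Authorization header value.

    Only the "Bearer <key>" shape is checked; key validity is not.
    """
    prefix = "Bearer "
    if not value.startswith(prefix):
        return ['expected the form "Bearer <key>"']
    key = value[len(prefix):]
    if not key:
        return ["API key is empty"]
    # A double space after Bearer, or a trailing newline copied from a
    # file, both leave whitespace inside the key.
    if any(c.isspace() for c in key):
        return ["API key contains whitespace"]
    return []
```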