GPT-4o-Audio API Documentation

1. Overview

This endpoint calls the GPT-4o model with audio generation enabled. A single request can return both a text response and an audio response.

Status: Released
Official Reference Documentation: OpenAI API Reference

2. Request Details

Protocol: HTTP / HTTPS
Method: POST
Endpoint:

POST https://api.codingplanx.ai/v1/chat/completions

3. Request Headers

Parameter	Type	Required	Example	Description
Content-Type	`string`	Yes	`application/json`	Specifies the data format of the request body.
Authorization	`string`	Yes	`Bearer $OPENAI_API_KEY`	Authentication credentials. Please replace with your actual API Key.

4. Request Body

The request body must be in application/json format.

Parameter	Type	Required	Description
model	`string`	Yes	The ID of the model to use. Example: `gpt-4o-audio-preview`.
modalities	`array[string]`	No	The output modalities generated by the model. Default: `["text"]`. To request the model to generate both text and audio responses, use: `["text", "audio"]`.
audio	`object`	No	Audio output parameters. Required when `"audio"` is included in `modalities`.
audio.voice	`string`	Yes (conditional)	The voice the model uses to respond. Available options: `alloy`, `ash`, `ballad`, `coral`, `echo`, `fable`, `nova`, `onyx`, `sage`, `shimmer`.
audio.format	`string`	Yes (conditional)	Specifies the output audio format. Must be one of: `wav`, `mp3`, `flac`, `opus`, or `pcm16`.
messages	`array[object]`	Yes	A list of messages comprising the conversation so far.
messages[].role	`string`	Yes	The role of the message's author (e.g., `system`, `user`, `assistant`).
messages[].content	`string`	Yes	The contents of the message.
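
The conditional rules in the table above (audio settings are only needed when audio output is requested) can be sketched as a small payload builder. This is a minimal illustration, not part of the API: `build_payload` and its arguments are hypothetical names, and the allowed values are copied from the table.

```python
# Allowed values copied from the request body table.
VOICES = {"alloy", "ash", "ballad", "coral", "echo",
          "fable", "nova", "onyx", "sage", "shimmer"}
FORMATS = {"wav", "mp3", "flac", "opus", "pcm16"}


def build_payload(prompt, voice=None, audio_format=None,
                  model="gpt-4o-audio-preview"):
    """Build a request body (hypothetical helper).

    Audio output is requested only when a voice and a format are supplied.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if voice is None and audio_format is None:
        # modalities omitted: the documented default is ["text"]
        return payload
    if voice not in VOICES:
        raise ValueError(f"unsupported voice: {voice!r}")
    if audio_format not in FORMATS:
        raise ValueError(f"unsupported format: {audio_format!r}")
    payload["modalities"] = ["text", "audio"]
    payload["audio"] = {"voice": voice, "format": audio_format}
    return payload
```

Calling `build_payload("Is a golden retriever a good family dog?", "alloy", "wav")` produces the same body as the cURL example in section 5.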

5. Request Example (cURL)

curl -X POST "https://api.codingplanx.ai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
      "model": "gpt-4o-audio-preview",
      "modalities": ["text", "audio"],
      "audio": { "voice": "alloy", "format": "wav" },
      "messages": [
        {
          "role": "user",
          "content": "Is a golden retriever a good family dog?"
        }
      ]
    }'
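
For callers who prefer Python over cURL, the same request might look like the sketch below. It uses only the standard library; `build_request` and `send` are hypothetical helper names, and the endpoint and headers are taken from the sections above.

```python
import json
import urllib.request

ENDPOINT = "https://api.codingplanx.ai/v1/chat/completions"


def build_request(api_key, payload):
    """Return a POST request with the same headers and body as the cURL example."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the parsed JSON response (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Building the request is kept separate from sending it so the headers and body can be inspected without touching the network. A typical call would read the key from the `OPENAI_API_KEY` environment variable and pass the result of `build_request` to `send`.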

6. Response

HTTP Status Code: 200 OK (Success)
Content-Type: application/json

(Note: The response structure follows the standard OpenAI Chat Completions specification. When the request includes an audio configuration, the choices[0].message in the returned JSON will contain an audio object, which includes Base64-encoded audio data and related IDs.)
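
As a rough sketch of how the audio might be pulled out of the response: the note above says the audio object carries Base64-encoded data and IDs, and the field names `data` and `id` used here are assumed from the OpenAI Chat Completions specification rather than stated in this document. `extract_audio` is a hypothetical helper.

```python
import base64


def extract_audio(response_json):
    """Return (audio_bytes, audio_id), or (None, None) when no audio is present.

    Assumes the audio object exposes Base64 content under "data" and an
    identifier under "id", as in the OpenAI Chat Completions specification.
    """
    message = response_json["choices"][0]["message"]
    audio = message.get("audio")
    if not audio:
        return None, None
    return base64.b64decode(audio["data"]), audio.get("id")
```

The returned bytes are in whatever format was requested in `audio.format`, so they can be written straight to a file with the matching extension.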

7. Frequently Asked Questions (FAQs)

Q1: Why does my request only return text and no audio? A1: Please check if the modalities parameter in the request body explicitly includes "audio" (i.e., ["text", "audio"]). If this parameter is not configured, the system defaults to outputting only "text". Additionally, ensure that the voice and format properties under the audio field are configured correctly.

Q2: How should I choose between the different audio formats? A2:

mp3: Widely supported lossy format with small file sizes. Suitable for web and general applications.
wav: Lossless, uncompressed format. Full audio quality but the largest file size. Suited to professional audio processing that requires high fidelity.
flac: Lossless compressed format. Smaller file size than wav with identical audio quality.
opus: Designed for internet streaming, with low latency and high compression. A good fit for real-time voice calls or low-bandwidth environments.
pcm16: Raw, uncompressed 16-bit PCM samples with no container header. Suitable for low-level data pipelines or custom hardware integration.
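
Because pcm16 output has no header, most players will not open it directly. One way to make it playable is to wrap the samples in a WAV container, sketched below with Python's standard `wave` module. The 24 kHz mono defaults are an assumption based on OpenAI's documented PCM output, not something this document states, so confirm them against real output; `pcm16_to_wav` is a hypothetical helper.

```python
import io
import wave


def pcm16_to_wav(pcm_bytes, sample_rate=24000, channels=1):
    """Wrap raw 16-bit PCM samples in a WAV container and return the file bytes.

    sample_rate and channels are assumptions; adjust them if the audio
    plays back too fast, too slow, or distorted.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 16 bits = 2 bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()
```

If the wrapper is not needed, requesting `wav` directly avoids this step.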

Q3: How should I format the Authorization request header? A3: You must use the Bearer authentication scheme. Place your API Key after the word Bearer, separated by a single space, formatting it as: Authorization: Bearer sk-xxxxxxxxx.

Q4: What voice options are supported by this API? A4: Currently, it supports 10 preset voices: alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, and shimmer. We recommend testing these voices in your specific business scenarios to select the tone that best aligns with your application's style.

Q5: What causes a 401 Unauthorized error? A5: This is typically caused by an invalid or expired API Key, or a formatting/spelling error in the Authorization header. Please check your console dashboard to confirm your API Key is active, and ensure your code includes the correct "Bearer " prefix and spacing.
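
The formatting checks mentioned in A3 and A5 can be automated before a request is sent. The sketch below is a hypothetical helper (`check_authorization_header` is not part of the API) that only inspects the header's shape; it cannot tell whether a key is valid or expired.

```python
def check_authorization_header(value):
    """Return a list of formatting problems found in an Authorization header value."""
    problems = []
    if value != value.strip():
        problems.append("leading or trailing whitespace (for example a stray newline)")
    scheme, _, key = value.strip().partition(" ")
    if scheme != "Bearer":
        problems.append('value should start with "Bearer " followed by the API key')
    elif not key.strip():
        problems.append("API key is missing after the Bearer prefix")
    elif key != key.strip() or " " in key:
        problems.append("API key contains extra spaces")
    return problems
```

An empty list means the header matches the documented `Bearer sk-xxxxxxxxx` shape; a 401 in that case points to the key itself rather than the formatting.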