GPT-4o-Audio API Documentation

1. Overview

This endpoint calls the GPT-4o model with audio generation capabilities. A single request can return both text and audio output.

2. Request Details

  • Protocol: HTTP / HTTPS
  • Method: POST
  • Endpoint:
POST https://api.codingplanx.ai/v1/chat/completions

3. Request Headers

Parameter     | Type   | Required | Example                | Description
Content-Type  | string | Yes      | application/json       | Specifies the data format of the request body.
Authorization | string | Yes      | Bearer $OPENAI_API_KEY | Authentication credentials. Replace $OPENAI_API_KEY with your actual API key.

4. Request Body

The request body must be in application/json format.

Parameter          | Type          | Required           | Description
model              | string        | Yes                | The ID of the model to use. Example: gpt-4o-audio-preview.
modalities         | array[string] | No                 | The output modalities the model should generate. Default: ["text"]. To have the model generate both text and audio, use ["text", "audio"].
audio              | object        | Conditional        | Audio output parameters. Required when modalities includes "audio".
audio.voice        | string        | Yes (within audio) | The voice the model uses to respond. Available options: alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer.
audio.format       | string        | Yes (within audio) | The output audio format. Must be one of: wav, mp3, flac, opus, or pcm16.
messages           | array[object] | Yes                | The list of messages that make up the conversation so far.
messages[].role    | string        | Yes                | The role of the message's author (e.g., system, user, assistant).
messages[].content | string        | Yes                | The contents of the message.
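The conditional rules in the request-body table (the audio object is required once "audio" appears in modalities, and voice and format must come from fixed lists) can be checked on the client before a request is sent. A minimal sketch, assuming Python 3; build_body is a hypothetical helper, and the allowed values are copied from this page rather than fetched from the API:

```python
import json

# Allowed values as listed in this document's request-body table.
VOICES = {"alloy", "ash", "ballad", "coral", "echo",
          "fable", "nova", "onyx", "sage", "shimmer"}
FORMATS = {"wav", "mp3", "flac", "opus", "pcm16"}


def build_body(messages, voice=None, audio_format=None,
               model="gpt-4o-audio-preview"):
    """Return a request body; audio output is requested only when voice/format are given."""
    body = {"model": model, "messages": messages}
    if voice is None and audio_format is None:
        # No modalities field: the API default ["text"] applies (text-only output).
        return body
    if voice not in VOICES:
        raise ValueError(f"unsupported voice: {voice!r}")
    if audio_format not in FORMATS:
        raise ValueError(f"unsupported audio format: {audio_format!r}")
    # Audio output needs both the "audio" modality and a complete audio object.
    body["modalities"] = ["text", "audio"]
    body["audio"] = {"voice": voice, "format": audio_format}
    return body


if __name__ == "__main__":
    request_body = build_body(
        [{"role": "user", "content": "Is a golden retriever a good family dog?"}],
        voice="alloy",
        audio_format="wav",
    )
    print(json.dumps(request_body, indent=2))
```

Raising early on a bad voice or format surfaces the mistake locally instead of as a text-only or error response from the server.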

5. Request Example (cURL)

curl -X POST "https://api.codingplanx.ai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
      "model": "gpt-4o-audio-preview",
      "modalities": ["text", "audio"],
      "audio": { "voice": "alloy", "format": "wav" },
      "messages": [
        {
          "role": "user",
          "content": "Is a golden retriever a good family dog?"
        }
      ]
    }'
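The cURL request above can be reproduced with Python's standard library alone. This is a sketch, not an official SDK: build_request is a hypothetical helper that only constructs the request, so it can be inspected before anything goes over the network.

```python
import json
import urllib.request

API_URL = "https://api.codingplanx.ai/v1/chat/completions"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the cURL example."""
    payload = {
        "model": "gpt-4o-audio-preview",
        "modalities": ["text", "audio"],
        "audio": {"voice": "alloy", "format": "wav"},
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),  # JSON body, as with curl -d
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

To send it, pass the result to urllib.request.urlopen and parse the body, for example json.load(urllib.request.urlopen(build_request(os.environ["OPENAI_API_KEY"], "Is a golden retriever a good family dog?"))). Reading the key from the OPENAI_API_KEY environment variable mirrors the cURL example.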

6. Response

  • HTTP Status Code: 200 OK (Success)
  • Content-Type: application/json

(Note: The response follows the standard OpenAI Chat Completions format. When the request asks for audio output, that is, modalities includes "audio" and the audio object is set, choices[0].message in the returned JSON contains an audio object holding the Base64-encoded audio data and related IDs.)
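To play or store the audio, decode the Base64 string and write the bytes to a file whose extension matches the requested format. A minimal sketch, assuming the Base64 payload sits in choices[0].message.audio.data as in the OpenAI Chat Completions specification; save_audio is a hypothetical helper:

```python
import base64
from pathlib import Path


def save_audio(response: dict, path: str) -> bytes:
    """Decode the Base64 audio in choices[0].message.audio.data and write it to path."""
    message = response["choices"][0]["message"]
    audio = message.get("audio")
    if not audio:
        # No audio object: "audio" was most likely missing from modalities.
        raise ValueError("response has no audio; check the modalities parameter")
    # Field name "data" is assumed from the OpenAI Chat Completions spec.
    raw = base64.b64decode(audio["data"])
    Path(path).write_bytes(raw)
    return raw
```

For the wav request in the cURL example, save_audio(response, "reply.wav") writes a file that standard players can open.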


7. Frequently Asked Questions (FAQs)

Q1: Why does my request return only text and no audio?
A1: Check that the modalities parameter in the request body explicitly includes "audio", for example ["text", "audio"]. If modalities is omitted, it defaults to ["text"] and only text is returned. Also make sure the audio object is present and that its voice and format values are set to supported options.

Q2: How should I choose between the different audio formats?
A2:

  • mp3: Lossy compressed format with the broadest compatibility and small files. Suitable for web and general-purpose applications.
  • wav: Uncompressed format with full audio quality but the largest files. Suitable for professional audio processing that needs high fidelity.
  • flac: Lossless compressed format. Same audio quality as wav, with smaller files.
  • opus: Lossy codec designed for internet streaming, with low latency and high compression. Well suited to real-time voice and low-bandwidth environments.
  • pcm16: Raw 16-bit PCM samples with no file header or container. Suitable for low-level processing or for streaming audio directly to hardware or custom pipelines.
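Because pcm16 output has no header, most players cannot open it directly. The sketch below wraps the raw bytes in a WAV container using Python's standard wave module. The 24 kHz sample rate, mono channel count, and little-endian byte order are assumptions, not values stated on this page; confirm them with your provider. pcm16_to_wav is a hypothetical helper.

```python
import io
import wave


def pcm16_to_wav(pcm: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit PCM samples in a WAV container.

    sample_rate and channels are assumptions; the bytes are written as-is,
    so they must already be little-endian, as WAV requires.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples = 2 bytes each
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # Closing the writer finalizes the header but leaves buf open.
    return buf.getvalue()
```

If the rate or channel count is wrong, the file still opens but plays at the wrong speed or pitch, which is a quick way to spot a mismatch.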

Q3: How should I format the Authorization request header?
A3: Use the Bearer authentication scheme: the word Bearer, a single space, then your API key, for example: Authorization: Bearer sk-xxxxxxxxx.
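A small guard can catch the most common header mistakes: a doubled "Bearer" prefix, a missing or extra space, or a trailing newline copied along with the key. auth_header is a hypothetical helper; it checks only the header's shape, not whether the key itself is valid.

```python
import re

# "Bearer", exactly one space, then a key containing no whitespace.
_BEARER = re.compile(r"Bearer \S+")


def auth_header(api_key: str) -> str:
    """Return the Authorization header value for api_key, rejecting malformed keys."""
    key = api_key.strip()  # drop stray spaces/newlines, e.g. from a copied key or .env file
    if key.lower().startswith("bearer "):
        raise ValueError("pass the bare key; the 'Bearer ' prefix is added here")
    value = f"Bearer {key}"
    if not _BEARER.fullmatch(value):
        raise ValueError("API key must be non-empty and contain no whitespace")
    return value
```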

Q4: Which voices does this API support?
A4: It currently supports 10 preset voices: alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, and shimmer. Test them in your own use case and choose the one whose tone best fits your application.

Q5: What causes a 401 Unauthorized error?
A5: Usually an invalid or expired API key, or a formatting or spelling error in the Authorization header. Confirm in your console dashboard that the API key is active, and make sure your code sends the "Bearer " prefix, followed by a single space, before the key.
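When requests are sent with Python's standard library, a 401 surfaces as urllib.error.HTTPError. The sketch below turns it into an error that points at the two usual causes. post_json and AuthError are hypothetical names, and the opener parameter exists only so the function can be exercised without a network call.

```python
import json
import urllib.error
import urllib.request


class AuthError(RuntimeError):
    """Raised when the API rejects the credentials (HTTP 401)."""


def post_json(req, opener=urllib.request.urlopen, timeout=60):
    """Send req and return the decoded JSON body; map HTTP 401 to AuthError."""
    try:
        with opener(req, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 401:
            raise AuthError(
                "401 Unauthorized: check that the API key is active and that the "
                "header is exactly 'Authorization: Bearer <key>'"
            ) from err
        raise  # other HTTP errors propagate unchanged
```

Other status codes are re-raised untouched, so callers still see the original HTTPError for those.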