April 16th, 2025

Get started with the new Advanced Audio Models in Azure OpenAI Service

Announcing the public preview of Azure OpenAI Audio models

We’re excited to announce the preview availability of Azure OpenAI’s advanced audio models—GPT-4o-Transcribe, GPT-4o-Mini-Transcribe, and GPT-4o-Mini-TTS. This guide provides developers with essential insights and steps to effectively leverage these advanced audio capabilities in their applications.

What’s New in Azure OpenAI Audio Models?

Azure OpenAI introduces three powerful new audio models, available for deployment today in East US 2 on Azure AI Foundry.

  • GPT-4o-Transcribe and GPT-4o-Mini-Transcribe: Speech-to-text models that surpass earlier speech recognition models on transcription accuracy benchmarks.
  • GPT-4o-Mini-TTS: A customizable text-to-speech model that accepts detailed instructions about how the generated speech should sound.

Model Comparison

| Feature | GPT-4o-Transcribe | GPT-4o-Mini-Transcribe | GPT-4o-Mini-TTS |
|---|---|---|---|
| Performance | Best Quality | Great Quality | Best Quality |
| Speed | Fast | Fastest | Fastest |
| Input | Text, Audio | Text, Audio | Text |
| Output | Text | Text | Audio |
| Streaming | Supported | Supported | Supported |
| Ideal Use Cases | Accurate transcription for challenging environments like customer call centers and automated meeting notes | Rapid transcription for live captioning, quick-response apps, and budget-sensitive scenarios | Customizable interactive voice outputs for chatbots, virtual assistants, accessibility tools, and educational apps |
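
If you just want to see what a call looks like, here is a minimal sketch of transcribing an audio file against a GPT-4o-Transcribe deployment with the openai Python package. The deployment name, input file name, and environment variables are illustrative assumptions; substitute the values for your own resource.

import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2025-03-01-preview",
)

# "meeting-recording.wav" is a hypothetical input file
with open("meeting-recording.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # your deployment name (assumed here to match the model name)
        file=audio_file,
    )

print(result.text)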

Technical Innovations

  • Targeted Audio Pretraining: OpenAI’s GPT-4o audio models leverage extensive pretraining on specialized audio datasets, significantly enhancing understanding of speech nuances.
  • Advanced Distillation Techniques: Sophisticated distillation methods transfer knowledge from larger models into efficient, smaller models while preserving high performance.
  • Reinforcement Learning: Integrated RL techniques dramatically improve transcription accuracy and reduce misrecognition, achieving state-of-the-art performance for the speech-to-text models in complex speech recognition tasks.

Getting Started Guide for Developers

Use the Azure OpenAI TTS Demo repository to explore GPT‑4o audio models through practical, hands‑on examples.

Step 1: Clone the Repository

git clone https://github.com/Azure-Samples/azure-openai-tts-demo.git
cd azure-openai-tts-demo

Step 2: Configure Your Environment

Create your virtual environment and install dependencies:

python -m venv .venv
source .venv/bin/activate   # macOS/Linux
.venv\Scripts\activate      # Windows
pip install -r requirements.txt

Set up your Azure credentials by creating a .env file:

cp .env.example .env
# Edit .env with your Azure OpenAI endpoint and API key

Example .env:

AZURE_OPENAI_ENDPOINT="https://<your-resource-name>.openai.azure.com/"
AZURE_OPENAI_API_KEY="your-azure-openai-api-key"
AZURE_OPENAI_API_VERSION="2025-03-01-preview"
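
For context, here is a minimal sketch of how these .env values are typically loaded and used in Python. It assumes the python-dotenv and openai packages (the former is commonly included in requirements.txt); the demo's actual wiring may differ slightly.

import os

from dotenv import load_dotenv
from openai import AzureOpenAI

load_dotenv()  # reads the .env file into the process environment

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION", "2025-03-01-preview"),
)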

Step 3: Run the Interactive Gradio Soundboard

Launch the demo to experiment interactively:

python soundboard.py 

Select different voices and vibes, and listen to the generated speech.
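
Behind the soundboard, the calls look roughly like the sketch below: GPT-4o-Mini-TTS takes an instructions prompt (the "vibe") alongside the voice selection. The deployment name, voice, input text, and output file are illustrative assumptions, and client is assumed to be the AzureOpenAI client configured in Step 2.

# assumes `client` is the AzureOpenAI client configured in Step 2
speech = client.audio.speech.create(
    model="gpt-4o-mini-tts",  # your deployment name (assumed here to match the model name)
    voice="coral",            # one of the built-in voices
    input="Thanks for calling. How can I help you today?",
    instructions="Speak in a warm, upbeat customer-service tone.",  # the "vibe"
)

# write the returned audio bytes to a hypothetical output file
with open("greeting.mp3", "wb") as f:
    f.write(speech.content)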

Step 4: Explore Additional Sample Scripts

Run sample scripts for specific audio tasks:

  • Streaming audio to a file (see the sketch below)
python streaming-tts-to-file-sample.py 
  • Asynchronous streaming and playback
python async-streaming-tts-sample.py 
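
As a hedged sketch of what the streaming-to-file sample likely does, the openai Python package provides a streaming response helper that writes audio to disk as it is generated rather than waiting for the full response. The deployment name, voice, and output path are illustrative assumptions, and client is again the AzureOpenAI client configured in Step 2.

# assumes `client` is the AzureOpenAI client configured in Step 2
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",  # your deployment name (assumed here to match the model name)
    voice="alloy",
    input="Streaming lets playback or saving begin before synthesis finishes.",
) as response:
    response.stream_to_file("streamed-output.mp3")  # hypothetical output path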

Developer Impact

Integrating Azure OpenAI's advanced audio models allows developers to:

  • Easily incorporate advanced transcription and TTS functionality.
  • Create highly interactive, intuitive voice-driven applications.
  • Enhance user experience with customizable and expressive audio interactions.

Further Exploration

We encourage developers to leverage these innovative audio models and share their insights and feedback!

Author

Nick Brady
Senior Program Manager, Developer Strategy

Since 2018, Nick Brady has been the unofficial spokesperson for Azure AI, having delivered hundreds of engagements to customers and partners in his time at Microsoft. Nick leads developer strategy for Azure AI Foundry, focusing on content, code, and community to empower every developer to build, customize, and scale AI applications.
