We’re excited to announce the preview availability of Azure OpenAI’s advanced audio models—GPT-4o-Transcribe, GPT-4o-Mini-Transcribe, and GPT-4o-Mini-TTS. This guide provides developers with essential insights and steps to effectively leverage these advanced audio capabilities in their applications.
What’s New in Azure OpenAI Audio Models?
Azure OpenAI introduces three powerful new audio models, available for deployment today in East US2 on Azure AI Foundry.
- GPT-4o-Transcribe and GPT-4o-Mini-Transcribe: Speech-to-text models outperforming previous benchmarks.
- GPT-4o-Mini-TTS: A customizable text-to-speech model enabling detailed instructions on speech characteristics.
Model Comparison
| Feature | GPT-4o-Transcribe | GPT-4o-Mini-Transcribe | GPT-4o-Mini-TTS |
|---|---|---|---|
| Performance | Best Quality | Great Quality | Best Quality |
| Speed | Fast | Fastest | Fastest |
| Input | Text, Audio | Text, Audio | Text |
| Output | Text | Text | Audio |
| Streaming | ✅ | ✅ | ✅ |
| Ideal Use Cases | Accurate transcription for challenging environments like customer call centers and automated meeting notes | Rapid transcription for live captioning, quick-response apps, and budget-sensitive scenarios | Customizable interactive voice outputs for chatbots, virtual assistants, accessibility tools, and educational apps |
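To make the speech-to-text side of the table concrete, here is a minimal sketch of a transcription call through the `openai` Python SDK's Azure client. The deployment names mirror the model names above, but your actual deployment name, endpoint, and audio file path are assumptions you must adapt; the SDK import is deferred so the small model-selection helper works on its own.

```python
import os

def pick_transcribe_deployment(prioritize_speed: bool) -> str:
    """Choose between the two speech-to-text models per the table above."""
    return "gpt-4o-mini-transcribe" if prioritize_speed else "gpt-4o-transcribe"

def transcribe(path: str, prioritize_speed: bool = False) -> str:
    """Send an audio file to an Azure OpenAI transcription deployment."""
    from openai import AzureOpenAI  # pip install openai
    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version=os.environ.get("AZURE_OPENAI_API_VERSION", "2025-03-01-preview"),
    )
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(
            model=pick_transcribe_deployment(prioritize_speed),  # Azure deployment name (assumed)
            file=audio,
        )
    return result.text
```

For a call-center archive you would likely call `transcribe("call.wav")` for best quality, and `transcribe("call.wav", prioritize_speed=True)` for live captioning.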
Technical Innovations
- Targeted Audio Pretraining: OpenAI’s GPT-4o audio models leverage extensive pretraining on specialized audio datasets, significantly enhancing understanding of speech nuances.
- Advanced Distillation Techniques: Employing sophisticated distillation methods, knowledge from larger models is transferred to efficient, smaller models, preserving high performance.
- Reinforcement Learning: Integrated RL techniques dramatically improve transcription accuracy and reduce misrecognition, achieving state-of-the-art performance for the speech-to-text models in complex speech recognition tasks.
Getting Started Guide for Developers
Use the Azure OpenAI TTS Demo repository to explore GPT‑4o audio models through practical, hands‑on examples.
Step 1: Clone the Repository
git clone https://github.com/Azure-Samples/azure-openai-tts-demo.git
cd azure-openai-tts-demo
Step 2: Configure Your Environment
Create your virtual environment and install dependencies:
python -m venv .venv
source .venv/bin/activate   # macOS/Linux
.venv\Scripts\activate      # Windows
pip install -r requirements.txt
Set up your Azure credentials by creating a .env file:
cp .env.example .env
# Edit .env with your Azure OpenAI endpoint and API key
Example .env:
AZURE_OPENAI_ENDPOINT="https://<your-resource-name>.openai.azure.com/"
AZURE_OPENAI_API_KEY="your-azure-openai-api-key"
AZURE_OPENAI_API_VERSION="2025-03-01-preview"
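The sample scripts read these values at startup, most likely via python-dotenv. As a dependency-free illustration of what that loading amounts to, here is a small parser for the simple KEY="value" format shown above (a stand-in, not the repo's actual code):

```python
from pathlib import Path

def load_env(path: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping comments and stripping surrounding quotes."""
    env = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env
```

In practice, `from dotenv import load_dotenv; load_dotenv()` followed by `os.environ[...]` accomplishes the same thing.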
Step 3: Run the Interactive Gradio Soundboard
Launch the demo to experiment interactively:
python soundboard.py
Select different voices and vibes, then listen to the generated speech.
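Under the hood, the soundboard's "vibe" selection maps to the `instructions` field that GPT-4o-Mini-TTS accepts for describing how speech should sound. The sketch below is a plausible core of such a call; the vibe presets, voice name, and deployment name are illustrative assumptions, not taken from the repo:

```python
import os

# Illustrative vibe presets (assumed, not the repo's actual list)
VIBES = {
    "cheerful": "Speak with an upbeat, friendly tone and a quick pace.",
    "calm": "Speak slowly and softly, with a soothing, reassuring tone.",
}

def build_instructions(vibe: str) -> str:
    """Map a vibe name to a speech-style instruction, defaulting to calm."""
    return VIBES.get(vibe, VIBES["calm"])

def speak(text: str, voice: str = "alloy", vibe: str = "cheerful") -> bytes:
    """Synthesize speech with gpt-4o-mini-tts and return the audio bytes."""
    from openai import AzureOpenAI  # pip install openai
    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version=os.environ.get("AZURE_OPENAI_API_VERSION", "2025-03-01-preview"),
    )
    response = client.audio.speech.create(
        model="gpt-4o-mini-tts",  # Azure deployment name (assumed)
        voice=voice,
        input=text,
        instructions=build_instructions(vibe),
    )
    return response.content
```

Because the style lives in plain-language `instructions` rather than fixed enum values, adding a new vibe is just adding a dictionary entry.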
Step 4: Explore Additional Sample Scripts
Run sample scripts for specific audio tasks:
- Streaming audio to a file
python streaming-tts-to-file-sample.py
- Asynchronous streaming and playback
python async-streaming-tts-sample.py
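As a rough sketch of what the streaming-to-file sample plausibly does, the snippet below streams synthesized audio to disk as chunks arrive instead of buffering the whole response in memory. The deployment name, voice, and filename scheme are assumptions for illustration:

```python
import os

def output_name(text: str, ext: str = "mp3") -> str:
    """Derive a filesystem-safe output filename from the first words of the text."""
    stem = "-".join(text.lower().split()[:4]) or "speech"
    safe = "".join(c for c in stem if c.isalnum() or c == "-")
    return f"{safe}.{ext}"

def stream_to_file(text: str) -> str:
    """Stream gpt-4o-mini-tts output straight to a local audio file."""
    from openai import AzureOpenAI  # pip install openai
    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version=os.environ.get("AZURE_OPENAI_API_VERSION", "2025-03-01-preview"),
    )
    path = output_name(text)
    # with_streaming_response writes chunks as they arrive rather than all at once
    with client.audio.speech.with_streaming_response.create(
        model="gpt-4o-mini-tts",  # Azure deployment name (assumed)
        voice="alloy",
        input=text,
    ) as response:
        response.stream_to_file(path)
    return path
```

Streaming matters most for long inputs, where waiting for the full synthesis before writing would add both latency and memory pressure.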
Developer Impact
Integrating Azure OpenAI's advanced audio models allows developers to:
- Easily incorporate advanced transcription and TTS functionality.
- Create highly interactive, intuitive voice-driven applications.
- Enhance user experience with customizable and expressive audio interactions.
Further Exploration
We encourage developers to leverage these innovative audio models and share their insights and feedback!