Name		Name	Last commit message	Last commit date
parent directory ..
async_samples		async_samples
README.md		README.md
example_chat.json		example_chat.json
sample1.png		sample1.png
sample2.png		sample2.png
sample_chat_completions.py		sample_chat_completions.py
sample_chat_completions_azure_openai.py		sample_chat_completions_azure_openai.py
sample_chat_completions_from_input_bytes.py		sample_chat_completions_from_input_bytes.py
sample_chat_completions_from_input_json.py		sample_chat_completions_from_input_json.py
sample_chat_completions_from_input_json_with_image_url.py		sample_chat_completions_from_input_json_with_image_url.py
sample_chat_completions_streaming.py		sample_chat_completions_streaming.py
sample_chat_completions_streaming_with_entra_id_auth.py		sample_chat_completions_streaming_with_entra_id_auth.py
sample_chat_completions_with_history.py		sample_chat_completions_with_history.py
sample_chat_completions_with_image_data.py		sample_chat_completions_with_image_data.py
sample_chat_completions_with_image_url.py		sample_chat_completions_with_image_url.py
sample_chat_completions_with_model_extras.py		sample_chat_completions_with_model_extras.py
sample_chat_completions_with_tools.py		sample_chat_completions_with_tools.py
sample_embeddings.py		sample_embeddings.py
sample_embeddings_azure_openai.py		sample_embeddings_azure_openai.py
sample_get_model_info.py		sample_get_model_info.py
sample_image_embeddings.py		sample_image_embeddings.py
sample_load_client.py		sample_load_client.py

README.md

page_type

languages

products

urlFragment

sample

python

azure

azure-ai

model-inference-samples

Samples for Azure AI Inference client library for Python

These are runnable console Python scripts that show how to do chat completion and text embeddings against Serverless API endpoints and Managed Compute endpoints.

Samples with azure_openai in their name show how to do chat completions and text embeddings against Azure OpenAI endpoints.

Samples in this folder use the a synchronous clients. Samples in the subfolder async_samples use the asynchronous clients. The concepts are similar, you can easily modify any of the synchronous samples to asynchronous.

Prerequisites

See Prerequisites here.

Setup

Clone or download this sample repository
Open a command prompt / terminal window in this samples folder
Install the client library for Python with pip:
```
pip install azure-ai-inference
```
or update an existing installation:
```
pip install --upgrade azure-ai-inference
```
If you plan to run the asynchronous client samples, insall the additional package aiohttp:
```
pip install aiohttp
```

Set environment variables

To construct any of the clients, you will need to pass in the endpoint URL. If you are using key authentication, you also need to pass in the key associated with your deployed AI model.

For Serverless API and Managed Compute endpoints, the endpoint URL has the form https://your-unique-resouce-name.your-azure-region.inference.ai.azure.com, where your-unique-resource-name is your globally unique Azure resource name and your-azure-region is the Azure region where the model is deployed (e.g. eastus2).
For Azure OpenAI endpoints, the endpoint URL has the form https://your-unique-resouce-name.openai.azure.com/openai/deployments/your-deployment-name, where your-unique-resource-name is your globally unique Azure OpenAI resource name, and your-deployment-name is your AI Model deployment name.
The key is a 32-character string.

For convenience, and to promote the practice of not hard-coding secrets in your source code, all samples here assume the endpoint URL and key are stored in environment variables. You will need to set these environment variables before running the samples as-is. The environment variables are mentioned in the tables below.

Note that the client library does not directly read these environment variable at run time. The sample code reads the environment variables and constructs the relevant client with these values.

Serverless API and Managed Compute endpoints

Sample type	Endpoint environment variable name	Key environment variable name
Chat completions	`AZURE_AI_CHAT_ENDPOINT`	`AZURE_AI_CHAT_KEY`
Embeddings	`AZURE_AI_EMBEDDINGS_ENDPOINT`	`AZURE_AI_EMBEDDINGS_KEY`

To run against a Managed Compute Endpoint, some samples also have an optional environment variable AZURE_AI_CHAT_DEPLOYMENT_NAME. This is the value used to set the HTTP request header azureml-model-deployment when constructing the client.

Azure OpenAI endpoints

Sample type	Endpoint environment variable name	Key environment variable name
Chat completions	`AZURE_OPENAI_CHAT_ENDPOINT`	`AZURE_OPENAI_CHAT_KEY`
Embeddings	`AZURE_OPENAI_EMBEDDINGS_ENDPOINT`	`AZURE_OPENAI_EMBEDDINGS_KEY`

Running the samples

To run the first sample, type:

python sample_chat_completions.py

similarly for the other samples.

Synchronous client samples

Chat completions

File Name	Description
sample_chat_completions_streaming.py	One chat completion operation using a synchronous client and streaming response.
sample_chat_completions_streaming_with_entra_id_auth.py	One chat completion operation using a synchronous client and streaming response, using Entra ID authentication. This sample also shows setting the `azureml-model-deployment` HTTP request header, which may be required for some Managed Compute endpoint.
sample_chat_completions.py	One chat completion operation using a synchronous client.
sample_chat_completions_with_image_url.py	One chat completion operation using a synchronous client, which includes sending an input image URL.
sample_chat_completions_with_image_data.py	One chat completion operation using a synchronous client, which includes sending input image data read from a local file.
sample_chat_completions_with_history.py	Two chat completion operations using a synchronous client, with the second completion using chat history from the first.
sample_chat_completions_from_input_bytes.py	One chat completion operation using a synchronous client, with input messages provided as `IO[bytes]`.
sample_chat_completions_from_input_json.py	One chat completion operation using a synchronous client, with input messages provided as a dictionary (type `MutableMapping[str, Any]`)
sample_chat_completions_from_input_json_with_image_url.py	One chat completion operation using a synchronous client, with input messages provided as a dictionary (type `MutableMapping[str, Any]`). Includes sending an input image URL.
sample_chat_completions_with_tools.py	Shows how do use a tool (function) in chat completions, for an AI model that supports tools
sample_load_client.py	Shows how do use the function `load_client` to create the appropriate synchronous client based on the provided endpoint URL. In this example, it creates a synchronous `ChatCompletionsClient`.
sample_get_model_info.py	Get AI model information using the chat completions client. Similarly can be done with all other clients.
sample_chat_completions_with_model_extras.py	Chat completions with additional model-specific parameters.
sample_chat_completions_azure_openai.py	Chat completions against Azure OpenAI endpoint.

Text embeddings

File Name	Description
sample_embeddings.py	One embeddings operation using a synchronous client.
sample_embeddings_azure_openai.py	One embeddings operation using a synchronous client, against Azure OpenAI endpoint.

Asynchronous client samples

Chat completions

File Name	Description
sample_chat_completions_streaming_async.py	One chat completion operation using an asynchronous client and streaming response.
sample_chat_completions_async.py	One chat completion operation using an asynchronous client.
sample_load_client_async.py	Shows how do use the function `load_client` to create the appropriate asynchronous client based on the provided endpoint URL. In this example, it creates an asynchronous `ChatCompletionsClient`.
sample_chat_completions_from_input_bytes_async.py	One chat completion operation using an asynchronous client, with input messages provided as `IO[bytes]`.
sample_chat_completions_from_input_json_async.py	One chat completion operation using an asynchronous client, with input messages provided as a dictionary (type `MutableMapping[str, Any]`)
sample_chat_completions_streaming_azure_openai_async.py	One chat completion operation using an asynchronous client and streaming response against an Azure OpenAI endpoint

Text embeddings

File Name	Description
sample_embeddings_async.py	One embeddings operation using an asynchronous client.

Troubleshooting

See Troubleshooting here.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

samples

samples

README.md

Samples for Azure AI Inference client library for Python

Prerequisites

Setup

Set environment variables

Serverless API and Managed Compute endpoints

Azure OpenAI endpoints

Running the samples

Synchronous client samples

Chat completions

Text embeddings

Asynchronous client samples

Chat completions

Text embeddings

Troubleshooting

Files

samples

Directory actions

More options

Directory actions

More options

Latest commit

History

samples

Folders and files

parent directory

README.md

Samples for Azure AI Inference client library for Python

Prerequisites

Setup

Set environment variables

Serverless API and Managed Compute endpoints

Azure OpenAI endpoints

Running the samples

Synchronous client samples

Chat completions

Text embeddings

Asynchronous client samples

Chat completions

Text embeddings

Troubleshooting