---
page_type: sample
languages:
- python
products:
- azure
- azure-ai
urlFragment: model-inference-samples
---

Samples for Azure AI Inference client library for Python

These are runnable console Python scripts that show how to do chat completion and text embeddings against Serverless API endpoints and Managed Compute endpoints.

Samples with azure_openai in their name show how to do chat completions and text embeddings against Azure OpenAI endpoints.

Samples in this folder use the synchronous clients. Samples in the subfolder async_samples use the asynchronous clients. The concepts are similar, and you can easily modify any of the synchronous samples to use the asynchronous clients.
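As a sketch of that conversion (assuming the AZURE_AI_CHAT_ENDPOINT and AZURE_AI_CHAT_KEY environment variables described below are set, and the aiohttp package is installed), a synchronous chat completion becomes:

```python
import asyncio
import os

async def main():
    # Asynchronous clients live in the azure.ai.inference.aio namespace;
    # the message types and method names match the synchronous ones.
    from azure.ai.inference.aio import ChatCompletionsClient
    from azure.ai.inference.models import SystemMessage, UserMessage
    from azure.core.credentials import AzureKeyCredential

    async with ChatCompletionsClient(
        endpoint=os.environ["AZURE_AI_CHAT_ENDPOINT"],
        credential=AzureKeyCredential(os.environ["AZURE_AI_CHAT_KEY"]),
    ) as client:
        # The main differences from the synchronous sample: `await` the call,
        # and close the client with `async with` (or `await client.close()`).
        response = await client.complete(
            messages=[
                SystemMessage(content="You are a helpful assistant."),
                UserMessage(content="How many feet are in a mile?"),
            ]
        )
        print(response.choices[0].message.content)

# To run: asyncio.run(main())
```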

Prerequisites

See Prerequisites here.

Setup

  • Clone or download this sample repository
  • Open a command prompt / terminal window in this samples folder
  • Install the client library for Python with pip:
    pip install azure-ai-inference
    or update an existing installation:
    pip install --upgrade azure-ai-inference
  • If you plan to run the asynchronous client samples, install the additional package aiohttp:
    pip install aiohttp

Set environment variables

To construct any of the clients, you will need to pass in the endpoint URL. If you are using key authentication, you also need to pass in the key associated with your deployed AI model.

  • For Serverless API and Managed Compute endpoints, the endpoint URL has the form https://your-unique-resource-name.your-azure-region.inference.ai.azure.com, where your-unique-resource-name is your globally unique Azure resource name and your-azure-region is the Azure region where the model is deployed (e.g. eastus2).

  • For Azure OpenAI endpoints, the endpoint URL has the form https://your-unique-resource-name.openai.azure.com/openai/deployments/your-deployment-name, where your-unique-resource-name is your globally unique Azure OpenAI resource name, and your-deployment-name is your AI model deployment name.

  • The key is a 32-character string.

For convenience, and to promote the practice of not hard-coding secrets in your source code, all samples here assume the endpoint URL and key are stored in environment variables. You will need to set these environment variables before running the samples as-is. The environment variables are listed in the tables below.
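For example, to set the chat completions variables in a bash shell (the values here are placeholders; on Windows, use `set` in cmd or `$env:NAME = "..."` in PowerShell instead of `export`):

```shell
# Placeholder values; substitute your own endpoint URL and key.
export AZURE_AI_CHAT_ENDPOINT="https://your-unique-resource-name.your-azure-region.inference.ai.azure.com"
export AZURE_AI_CHAT_KEY="your-32-character-key"
```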

Note that the client library does not directly read these environment variables at run time. The sample code reads the environment variables and constructs the relevant client with these values.
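A minimal sketch of that pattern (the helper names read_settings and make_chat_client are ours, not part of the library):

```python
import os

def read_settings(endpoint_var: str, key_var: str) -> tuple[str, str]:
    """Read the endpoint URL and key from environment variables,
    failing with a clear message if either is unset."""
    try:
        return os.environ[endpoint_var], os.environ[key_var]
    except KeyError as exc:
        raise RuntimeError(
            f"Set the environment variable {exc.args[0]} before running the samples"
        ) from None

def make_chat_client():
    # The sample code, not the client library, passes the values to the constructor.
    from azure.ai.inference import ChatCompletionsClient
    from azure.core.credentials import AzureKeyCredential

    endpoint, key = read_settings("AZURE_AI_CHAT_ENDPOINT", "AZURE_AI_CHAT_KEY")
    return ChatCompletionsClient(endpoint=endpoint, credential=AzureKeyCredential(key))
```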

Serverless API and Managed Compute endpoints

| Sample type | Endpoint environment variable name | Key environment variable name |
| --- | --- | --- |
| Chat completions | AZURE_AI_CHAT_ENDPOINT | AZURE_AI_CHAT_KEY |
| Embeddings | AZURE_AI_EMBEDDINGS_ENDPOINT | AZURE_AI_EMBEDDINGS_KEY |

To run against a Managed Compute endpoint, some samples also take an optional environment variable AZURE_AI_CHAT_DEPLOYMENT_NAME. Its value is used to set the HTTP request header azureml-model-deployment when constructing the client.
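For example (a sketch; the helper name deployment_headers is ours, and the deployment value shown is read from the optional environment variable):

```python
import os

def deployment_headers(deployment_name):
    """Extra HTTP headers for a Managed Compute deployment, or {} if none."""
    return {"azureml-model-deployment": deployment_name} if deployment_name else {}

def make_managed_compute_client():
    from azure.ai.inference import ChatCompletionsClient
    from azure.core.credentials import AzureKeyCredential

    return ChatCompletionsClient(
        endpoint=os.environ["AZURE_AI_CHAT_ENDPOINT"],
        credential=AzureKeyCredential(os.environ["AZURE_AI_CHAT_KEY"]),
        # Applies the azureml-model-deployment header to requests.
        headers=deployment_headers(os.environ.get("AZURE_AI_CHAT_DEPLOYMENT_NAME")),
    )
```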

Azure OpenAI endpoints

| Sample type | Endpoint environment variable name | Key environment variable name |
| --- | --- | --- |
| Chat completions | AZURE_OPENAI_CHAT_ENDPOINT | AZURE_OPENAI_CHAT_KEY |
| Embeddings | AZURE_OPENAI_EMBEDDINGS_ENDPOINT | AZURE_OPENAI_EMBEDDINGS_KEY |

Running the samples

To run the first sample, type:

python sample_chat_completions.py

and similarly for the other samples.

Synchronous client samples

Chat completions

| File Name | Description |
| --- | --- |
| sample_chat_completions_streaming.py | One chat completion operation using a synchronous client and streaming response. |
| sample_chat_completions_streaming_with_entra_id_auth.py | One chat completion operation using a synchronous client and streaming response, using Entra ID authentication. This sample also shows setting the azureml-model-deployment HTTP request header, which may be required for some Managed Compute endpoints. |
| sample_chat_completions.py | One chat completion operation using a synchronous client. |
| sample_chat_completions_with_image_url.py | One chat completion operation using a synchronous client, which includes sending an input image URL. |
| sample_chat_completions_with_image_data.py | One chat completion operation using a synchronous client, which includes sending input image data read from a local file. |
| sample_chat_completions_with_history.py | Two chat completion operations using a synchronous client, with the second completion using chat history from the first. |
| sample_chat_completions_from_input_bytes.py | One chat completion operation using a synchronous client, with input messages provided as IO[bytes]. |
| sample_chat_completions_from_input_json.py | One chat completion operation using a synchronous client, with input messages provided as a dictionary (type MutableMapping[str, Any]). |
| sample_chat_completions_from_input_json_with_image_url.py | One chat completion operation using a synchronous client, with input messages provided as a dictionary (type MutableMapping[str, Any]). Includes sending an input image URL. |
| sample_chat_completions_with_tools.py | Shows how to use a tool (function) in chat completions, for an AI model that supports tools. |
| sample_load_client.py | Shows how to use the load_client function to create the appropriate synchronous client based on the provided endpoint URL. In this example, it creates a synchronous ChatCompletionsClient. |
| sample_get_model_info.py | Gets AI model information using the chat completions client. The same can be done with all other clients. |
| sample_chat_completions_with_model_extras.py | Chat completions with additional model-specific parameters. |
| sample_chat_completions_azure_openai.py | Chat completions against an Azure OpenAI endpoint. |
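To illustrate what the streaming samples do, here is a minimal sketch (assuming the chat environment variables above are set; the helper collect_deltas is ours, not part of the library):

```python
import os

def collect_deltas(updates):
    """Join the incremental content chunks of a streamed chat response."""
    chunks = []
    for update in updates:
        if update.choices and update.choices[0].delta.content:
            chunks.append(update.choices[0].delta.content)
    return "".join(chunks)

def run_streaming_sample():
    from azure.ai.inference import ChatCompletionsClient
    from azure.ai.inference.models import UserMessage
    from azure.core.credentials import AzureKeyCredential

    with ChatCompletionsClient(
        endpoint=os.environ["AZURE_AI_CHAT_ENDPOINT"],
        credential=AzureKeyCredential(os.environ["AZURE_AI_CHAT_KEY"]),
    ) as client:
        # With stream=True, complete() returns an iterable of partial updates.
        updates = client.complete(
            stream=True,
            messages=[UserMessage(content="How many feet are in a mile?")],
        )
        print(collect_deltas(updates))
```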

Text embeddings

| File Name | Description |
| --- | --- |
| sample_embeddings.py | One embeddings operation using a synchronous client. |
| sample_embeddings_azure_openai.py | One embeddings operation using a synchronous client, against an Azure OpenAI endpoint. |
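And a sketch of what the embeddings samples do (assuming the embeddings environment variables above are set; the function name run_embeddings_sample is ours):

```python
import os

def run_embeddings_sample():
    from azure.ai.inference import EmbeddingsClient
    from azure.core.credentials import AzureKeyCredential

    client = EmbeddingsClient(
        endpoint=os.environ["AZURE_AI_EMBEDDINGS_ENDPOINT"],
        credential=AzureKeyCredential(os.environ["AZURE_AI_EMBEDDINGS_KEY"]),
    )
    response = client.embed(input=["first phrase", "second phrase", "third phrase"])
    for item in response.data:
        # Each item holds the embedding vector for one input string.
        print(f"input {item.index}: vector of length {len(item.embedding)}")
```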

Asynchronous client samples

Chat completions

| File Name | Description |
| --- | --- |
| sample_chat_completions_streaming_async.py | One chat completion operation using an asynchronous client and streaming response. |
| sample_chat_completions_async.py | One chat completion operation using an asynchronous client. |
| sample_load_client_async.py | Shows how to use the load_client function to create the appropriate asynchronous client based on the provided endpoint URL. In this example, it creates an asynchronous ChatCompletionsClient. |
| sample_chat_completions_from_input_bytes_async.py | One chat completion operation using an asynchronous client, with input messages provided as IO[bytes]. |
| sample_chat_completions_from_input_json_async.py | One chat completion operation using an asynchronous client, with input messages provided as a dictionary (type MutableMapping[str, Any]). |
| sample_chat_completions_streaming_azure_openai_async.py | One chat completion operation using an asynchronous client and streaming response, against an Azure OpenAI endpoint. |

Text embeddings

| File Name | Description |
| --- | --- |
| sample_embeddings_async.py | One embeddings operation using an asynchronous client. |

Troubleshooting

See Troubleshooting here.
