Use GitHub Actions with Azure Machine Learning

APPLIES TO: Azure CLI ml extension v2 (current), Python SDK azure-ai-ml v2 (current)

Get started with GitHub Actions to train a model on Azure Machine Learning.

This article shows you how to create a GitHub Actions workflow that submits a machine learning pipeline to Azure Machine Learning. The pipeline trains a scikit-learn linear regression model on the NYC Taxi dataset.

A GitHub Actions workflow is defined in a YAML (.yml) file in the .github/workflows/ directory of your repository. This file contains the steps and parameters that make up the workflow.

Prerequisites

  • An Azure Machine Learning workspace. For steps to create one, see Create the workspace.

  • The Azure Machine Learning SDK for Python v2. To install the SDK, use the following command:

    pip install azure-ai-ml azure-identity 

    To update an existing installation of the SDK to the latest version, use the following command:

    pip install --upgrade azure-ai-ml azure-identity 

    For more information, see Azure Machine Learning Package client library for Python.

  • A GitHub account. If you don't have one, sign up for free.
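
To confirm that both packages import in the Python environment you plan to use, a check like the following may help. This is a minimal sketch. The function name check_azureml_sdk is hypothetical, and the check assumes python3 is the interpreter the packages were installed into.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that the Azure ML SDK v2 packages are importable
# by python3 and print the versions that pip reports for them.
check_azureml_sdk() {
  # A failed import means at least one package is missing from this environment.
  if ! python3 -c "import azure.ai.ml, azure.identity" 2>/dev/null; then
    echo "azure-ai-ml or azure-identity is not installed for python3" >&2
    return 1
  fi
  # Print only the Name and Version lines of pip's package metadata.
  python3 -m pip show azure-ai-ml azure-identity | grep -E '^(Name|Version):'
}
```

After you source the snippet, run check_azureml_sdk. It prints a Name line and a Version line for each package, or an error if the import fails.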

Step 1: Get the code

Fork the following repo on GitHub:

https://github.com/azure/azureml-examples 

Clone your forked repo locally.

git clone https://github.com/YOUR-USERNAME/azureml-examples 
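
If you also want to pull later changes from the original repository into your fork, one common approach is to add the original as a second remote. The sketch below assumes your fork keeps the name azureml-examples. The function names fork_url and clone_fork and the remote name upstream are illustrative choices, not part of the article.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for cloning your fork and tracking the original repo.

# Print the clone URL of the azureml-examples fork owned by GitHub user $1.
fork_url() {
  printf 'https://github.com/%s/azureml-examples\n' "$1"
}

# Clone the fork into ./azureml-examples, then register the original
# repository as the "upstream" remote so later changes can be fetched.
clone_fork() {
  local user="$1"
  git clone "$(fork_url "$user")" &&
    git -C azureml-examples remote add upstream https://github.com/Azure/azureml-examples
}
```

For example, clone_fork YOUR-USERNAME clones the fork into ./azureml-examples. Later, git -C azureml-examples fetch upstream retrieves new commits from the original repository.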

Step 2: Authenticate with Azure

First, define how to authenticate with Azure. The recommended, more secure option is to sign in with OpenID Connect (OIDC) using a Microsoft Entra application or a user-assigned managed identity. If necessary, you can also sign in with a service principal and secret, but this approach is less secure and not recommended.

Generate deployment credentials

To use the Azure Login action with OIDC, you need to configure a federated identity credential on a Microsoft Entra application or a user-assigned managed identity.

Option 1: Microsoft Entra application

Option 2: User-assigned managed identity
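
The detailed steps for each option aren't reproduced here. The following is a rough Azure CLI sketch that assumes the Microsoft Entra application or user-assigned managed identity already exists. The federated credential needs three values:

  • The issuer, which is GitHub's token issuer, https://token.actions.githubusercontent.com.
  • The audience, api://AzureADTokenExchange.
  • A subject that matches the workflow run. Scheduled or manual runs on the main branch use repo:OWNER/REPO:ref:refs/heads/main. Pull request runs use repo:OWNER/REPO:pull_request, so they need a second credential.

All function names and arguments below are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: build a federated identity credential for GitHub Actions OIDC and
# register it on an Entra app (Option 1) or a user-assigned managed identity
# (Option 2). Function names and argument values are placeholders.

GITHUB_ISSUER="https://token.actions.githubusercontent.com"
AZURE_AUDIENCE="api://AzureADTokenExchange"

# Subject claim for runs on a branch: $1 = OWNER/REPO, $2 = branch name.
branch_subject() {
  printf 'repo:%s:ref:refs/heads/%s\n' "$1" "$2"
}

# JSON body for `az ad app federated-credential create --parameters`.
federated_credential_json() {
  local name="$1" subject="$2"
  printf '{"name":"%s","issuer":"%s","subject":"%s","audiences":["%s"]}\n' \
    "$name" "$GITHUB_ISSUER" "$subject" "$AZURE_AUDIENCE"
}

# Option 1: add the credential to an existing Microsoft Entra application.
add_app_credential() {
  local app_id="$1" name="$2" subject="$3"
  federated_credential_json "$name" "$subject" > credential.json
  az ad app federated-credential create --id "$app_id" --parameters credential.json
}

# Option 2: add the credential to an existing user-assigned managed identity.
add_identity_credential() {
  local identity="$1" group="$2" name="$3" subject="$4"
  az identity federated-credential create --name "$name" \
    --identity-name "$identity" --resource-group "$group" \
    --issuer "$GITHUB_ISSUER" --subject "$subject" --audiences "$AZURE_AUDIENCE"
}
```

For example, to register the main-branch credential on a managed identity:

add_identity_credential my-identity my-rg github-main "$(branch_subject YOUR-USERNAME/azureml-examples main)"

The application or identity also needs an Azure role on the resources the workflow uses. For example, az role assignment create --assignee CLIENT_ID --role Contributor --scope /subscriptions/SUBSCRIPTION_ID/resourceGroups/GROUP grants Contributor on one resource group. Choose the narrowest role that works for you.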

Create secrets

You need to provide your application's Client ID, Directory (tenant) ID, and Subscription ID to the login action. You can provide these values directly in the workflow, or store them in GitHub secrets and reference them in your workflow. Storing them as GitHub secrets is the more secure option.

  1. In GitHub, go to your repository.

  2. Select Settings, then in the Security section of the sidebar select Secrets and variables > Actions.

  3. Select New repository secret.

    Note

    To enhance workflow security in public repositories, use environment secrets instead of repository secrets. If the environment requires approval, a job cannot access environment secrets until one of the required reviewers approves it.

  4. Create secrets for AZURE_CLIENT_ID, AZURE_TENANT_ID, and AZURE_SUBSCRIPTION_ID. Copy each value from your Microsoft Entra application or user-assigned managed identity:

    GitHub secret            Microsoft Entra application or user-assigned managed identity
    AZURE_CLIENT_ID          Client ID
    AZURE_SUBSCRIPTION_ID    Subscription ID
    AZURE_TENANT_ID          Directory (tenant) ID

    Note

    For security reasons, we recommend using GitHub Secrets rather than passing values directly to the workflow.
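
If you prefer the command line to the web UI, the GitHub CLI can create the same three secrets. This sketch makes three assumptions:

  • You have already run az login and gh auth login.
  • The active Azure CLI subscription is the one that contains your workspace.
  • You pass the client ID yourself.

The helper names is_guid and set_azure_secrets are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: store the three OIDC values as repository secrets with the GitHub CLI.
# Assumes `az login` and `gh auth login` have already been run.

# Succeed only for a GUID such as 00000000-0000-0000-0000-000000000000.
is_guid() {
  [[ "$1" =~ ^[0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}$ ]]
}

# $1 = client ID of the Entra app or managed identity, $2 = OWNER/REPO of your fork.
set_azure_secrets() {
  local client_id="$1" repo="$2" tenant_id subscription_id value
  # Read the tenant and subscription IDs from the active Azure CLI account.
  tenant_id="$(az account show --query tenantId -o tsv)"
  subscription_id="$(az account show --query id -o tsv)"
  # Refuse to store anything that doesn't look like a GUID.
  for value in "$client_id" "$tenant_id" "$subscription_id"; do
    is_guid "$value" || { echo "not a GUID: $value" >&2; return 1; }
  done
  gh secret set AZURE_CLIENT_ID --repo "$repo" --body "$client_id"
  gh secret set AZURE_TENANT_ID --repo "$repo" --body "$tenant_id"
  gh secret set AZURE_SUBSCRIPTION_ID --repo "$repo" --body "$subscription_id"
}
```

You can look up the client ID in one of two ways:

  • For an application: az ad app list --display-name APP_NAME --query "[0].appId" -o tsv
  • For a managed identity: az identity show --name IDENTITY --resource-group GROUP --query clientId -o tsv

Then run set_azure_secrets CLIENT_ID YOUR-USERNAME/azureml-examples.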

Step 3: Update setup.sh to connect to your Azure Machine Learning workspace

You need to update the CLI setup file variables to match your workspace.

  1. In your forked repository, go to azureml-examples/cli/.

  2. Edit setup.sh and update these variables in the file.

    Variable     Description
    GROUP        Name of the resource group
    LOCATION     Location of your workspace (for example, eastus2)
    WORKSPACE    Name of the Azure Machine Learning workspace
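
You can edit the file by hand or script the change. The helper below is a sketch with two assumptions:

  • setup.sh assigns each variable on its own line in the form NAME="value".
  • Your values contain no | or & characters, because sed would interpret them.

The function name set_setup_var is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite a NAME="value" assignment in a setup script.
# Assumes the variable is assigned on a line that starts with NAME= and that
# the new value contains no "|" or "&" characters (special to sed here).
set_setup_var() {
  local file="$1" name="$2" value="$3"
  # Fail early if the variable isn't assigned in the file.
  grep -q "^${name}=" "$file" || { echo "$name not found in $file" >&2; return 1; }
  # -i.bak works with both GNU and BSD sed; delete the backup afterwards.
  sed -i.bak "s|^${name}=.*|${name}=\"${value}\"|" "$file" && rm -f "$file.bak"
}
```

From the repository root, for example, run set_setup_var cli/setup.sh GROUP my-resource-group. Then repeat the command for LOCATION and WORKSPACE.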

Step 4: Update pipeline.yml with your compute cluster name

You use a pipeline.yml file to deploy your Azure Machine Learning pipeline. This is a machine learning pipeline, not a DevOps pipeline. You only need to make this update if your compute cluster is named something other than cpu-cluster.

  1. In your forked repository, go to azureml-examples/cli/jobs/pipelines/nyc-taxi/pipeline.yml.
  2. Each time you see compute: azureml:cpu-cluster, replace cpu-cluster with your compute cluster name. For example, if your cluster is named my-cluster, the new value is azureml:my-cluster. There are five occurrences to update.
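
A scripted version of this replacement might look like the following sketch. The helper set_pipeline_compute is hypothetical. It rewrites every azureml:cpu-cluster reference and then prints how many lines reference your cluster. You can compare that count with the five updates the article expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point every azureml:cpu-cluster reference in a pipeline
# file at your own compute cluster, then print how many lines now use it.
set_pipeline_compute() {
  local file="$1" cluster="$2"
  # -i.bak works with both GNU and BSD sed; delete the backup afterwards.
  sed -i.bak "s|azureml:cpu-cluster|azureml:${cluster}|g" "$file" && rm -f "$file.bak"
  # grep -c counts matching lines, which is one per compute: setting here.
  grep -c "compute: azureml:${cluster}" "$file"
}
```

For example, run set_pipeline_compute cli/jobs/pipelines/nyc-taxi/pipeline.yml my-cluster. To confirm that the cluster exists in your workspace, run az ml compute show --name my-cluster --resource-group GROUP --workspace-name WORKSPACE, which should return the cluster's details.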

Step 5: Run your GitHub Actions workflow

Your workflow authenticates with Azure, sets up the Azure Machine Learning CLI, and uses the CLI to train a model in Azure Machine Learning.

Your workflow file is made up of a trigger section and jobs:

  • The on section defines the triggers that start the workflow. By default, the workflow runs on a cron schedule, when you start it manually (workflow_dispatch), and when a pull request targets a matching branch and changes a matching path. Learn more about events that trigger workflows.
  • In the jobs section of the workflow, you check out the code and sign in to Azure with the Azure Login action using OpenID Connect.
  • The jobs section also includes a setup step that runs setup.sh to install and configure the Machine Learning CLI (v2). Once the CLI is installed, the run job step submits your Azure Machine Learning pipeline.yml file, which trains a model with NYC taxi data.
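
The login step uses OpenID Connect, so the workflow must allow the job to request an ID token from GitHub. You grant this with permissions: id-token: write at the workflow or job level. The check below is a rough text search based on that requirement, not a YAML parser. check_oidc_permission is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if a workflow file never grants the id-token
# permission that azure/login needs to request an OIDC token from GitHub.
# This is a plain text search, not a YAML parser.
check_oidc_permission() {
  local workflow="$1"
  if grep -Eq '^[[:space:]]*id-token:[[:space:]]*write' "$workflow"; then
    echo "ok: $workflow grants id-token: write"
  else
    echo "missing: add 'permissions: id-token: write' to $workflow" >&2
    return 1
  fi
}
```

For example: check_oidc_permission .github/workflows/cli-jobs-pipelines-nyc-taxi-pipeline.yml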

Enable your workflow

  1. In your forked repository, open .github/workflows/cli-jobs-pipelines-nyc-taxi-pipeline.yml and verify that your workflow looks like this.

    name: cli-jobs-pipelines-nyc-taxi-pipeline
    on:
      workflow_dispatch:
      schedule:
        - cron: "0 0/4 * * *"
      pull_request:
        branches:
          - main
          - sdk-preview
        paths:
          - cli/jobs/pipelines/nyc-taxi/**
          - .github/workflows/cli-jobs-pipelines-nyc-taxi-pipeline.yml
          - cli/run-pipeline-jobs.sh
          - cli/setup.sh
    # id-token: write lets azure/login request an OpenID Connect token.
    permissions:
      id-token: write
      contents: read
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
          - name: check out repo
            uses: actions/checkout@v2
          - name: azure login
            uses: azure/login@v2
            with:
              client-id: ${{ secrets.AZURE_CLIENT_ID }}
              tenant-id: ${{ secrets.AZURE_TENANT_ID }}
              subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
          - name: setup
            run: bash setup.sh
            working-directory: cli
            continue-on-error: true
          - name: run job
            run: bash -x ../../../run-job.sh pipeline.yml
            working-directory: cli/jobs/pipelines/nyc-taxi
  2. Select View runs.

  3. Enable workflows by selecting I understand my workflows, go ahead and enable them.

  4. Select the cli-jobs-pipelines-nyc-taxi-pipeline workflow, and then select Enable workflow.

  5. Select Run workflow and choose the option to Run workflow now.

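
You can also do these steps with the GitHub CLI. This sketch assumes gh is installed and authenticated, and that you pass your fork as OWNER/REPO. You may still need to accept the fork-level prompt on the Actions tab first. run_nyc_taxi_workflow is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: enable and start the workflow with the GitHub CLI instead of the web UI.
WORKFLOW_FILE="cli-jobs-pipelines-nyc-taxi-pipeline.yml"

run_nyc_taxi_workflow() {
  local repo="$1"   # your fork, for example YOUR-USERNAME/azureml-examples
  # Workflows in forks start out disabled; enable this one.
  gh workflow enable "$WORKFLOW_FILE" --repo "$repo"
  # Fire the workflow_dispatch trigger on the main branch.
  gh workflow run "$WORKFLOW_FILE" --repo "$repo" --ref main
  # List the most recent run of this workflow.
  gh run list --repo "$repo" --workflow "$WORKFLOW_FILE" --limit 1
}
```

gh workflow run returns as soon as GitHub accepts the request. The new run can therefore take a few seconds to appear in gh run list.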
Step 6: Verify your workflow run

  1. Open your completed workflow run and verify that the build job ran successfully. A green checkmark appears next to the job.

  2. Open Azure Machine Learning studio and go to the nyc-taxi-pipeline-example job. Verify that each part of the job (prep, transform, train, predict, score) completed and shows a green checkmark.

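
You can also check both results from a terminal. In this sketch the function names are hypothetical. REPO is your fork as OWNER/REPO, and JOB_NAME is the Azure Machine Learning job name as it appears in studio or in the workflow log.

```shell
#!/usr/bin/env bash
# Sketch: check the workflow outcome and the Azure ML job status from a terminal.

# Print the conclusion of the newest run, such as "success" or "failure"
# (empty while the run is still in progress). $1 = OWNER/REPO of your fork.
latest_run_conclusion() {
  local repo="$1"
  gh run list --repo "$repo" --workflow cli-jobs-pipelines-nyc-taxi-pipeline.yml \
    --limit 1 --json conclusion --jq '.[0].conclusion'
}

# Print the status of an Azure ML job in the given resource group and workspace.
azureml_job_status() {
  local job_name="$1" group="$2" workspace="$3"
  az ml job show --name "$job_name" --resource-group "$group" \
    --workspace-name "$workspace" --query status -o tsv
}
```

A finished, successful workflow run prints success, and a finished pipeline job reports Completed.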
Clean up resources

When your resource group and repository are no longer needed, clean up the resources you deployed by deleting the resource group and your GitHub repository.
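
From a terminal, the cleanup could look like the following sketch. Both deletions are irreversible. The function name cleanup is hypothetical. Deleting a repository with gh requires the delete_repo scope, which you can add with gh auth refresh -s delete_repo.

```shell
#!/usr/bin/env bash
# Sketch: delete the resource group and your fork. Both operations are
# irreversible, so double-check the names before running this.
cleanup() {
  local group="$1" repo="$2"   # repo is your fork, OWNER/azureml-examples
  # --no-wait returns immediately while Azure deletes the group in the background.
  az group delete --name "$group" --yes --no-wait
  # Needs the delete_repo scope: gh auth refresh -s delete_repo
  gh repo delete "$repo" --yes
}
```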

Next steps