| title | description | ms.service | services | author | ms.author | ms.topic | ms.date |
| --- | --- | --- | --- | --- | --- | --- | --- |
| IoT solution scalability and high availability | An overview of the scalability, high availability, and disaster recovery options for an IoT solution. | azure-iot | iot | asergaz | sergaz | overview | 03/13/2025 |

IoT solution scalability, high availability, and disaster recovery

This overview introduces the key options for scalability, high availability, and disaster recovery in an Azure IoT solution. Each section includes links to content that provides further detail and guidance.

The following diagram shows a high-level view of the components in a typical edge-based IoT solution. This article focuses on the areas relevant to scalability, high availability, and disaster recovery in an edge-based IoT solution:

:::image type="content" source="media/iot-overview-scalability-high-availability/iot-edge-scalability-architecture.svg" alt-text="Diagram that shows the high-level IoT edge-based solution architecture highlighting scalability, high availability, and disaster recovery." border="false":::

The following diagram shows a high-level view of the components in a typical cloud-based IoT solution. This article focuses on the areas relevant to scalability, high availability, and disaster recovery in a cloud-based IoT solution:

:::image type="content" source="media/iot-overview-scalability-high-availability/iot-cloud-scalability-architecture.svg" alt-text="Diagram that shows the high-level IoT cloud-based solution architecture highlighting scalability, high availability, and disaster recovery." border="false":::


Scalability

An IoT solution might need to support millions of connected assets and devices. You need to ensure that the components in your solution can scale to meet the demands.

Deploy Azure IoT Operations on a multi-node cluster to ensure that you can handle increased traffic or workload demands. When Azure IoT Operations runs on a multi-node cluster, it can process more data and take advantage of the scalability and high-availability capabilities of Kubernetes.

You can horizontally scale the MQTT broker of Azure IoT Operations by adding more frontend replicas and backend partitions. The frontend replicas are responsible for accepting MQTT connections from clients and forwarding them to the backend partitions. The backend partitions are responsible for storing and delivering messages to the clients. The frontend pods distribute message traffic across the backend pods. The backend redundancy factor determines the number of data copies to provide resiliency against node failures in the cluster. To learn more, see Configure broker settings for high availability, scaling, and memory usage.
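
As a rough illustration of how these settings interact, the following sketch models the broker's scaling settings as a plain Python data class and works out how many backend pods they imply. The class and its field names (`frontend_replicas`, `backend_partitions`, `redundancy_factor`) are hypothetical names chosen for this example, not the actual resource schema, and the assumption that each partition runs one pod per data copy is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BrokerCardinality:
    """Hypothetical model of MQTT broker scaling settings (not the real schema)."""

    frontend_replicas: int
    backend_partitions: int
    redundancy_factor: int

    def __post_init__(self):
        values = (self.frontend_replicas, self.backend_partitions, self.redundancy_factor)
        if min(values) < 1:
            raise ValueError("all cardinality values must be at least 1")

    def backend_pods(self) -> int:
        # Assumption: each partition keeps `redundancy_factor` copies, one pod per copy.
        return self.backend_partitions * self.redundancy_factor

    def tolerated_copy_losses_per_partition(self) -> int:
        # With N copies of the data, a partition can lose N - 1 copies and still hold one.
        return self.redundancy_factor - 1


settings = BrokerCardinality(frontend_replicas=2, backend_partitions=3, redundancy_factor=2)
print(settings.backend_pods())
print(settings.tolerated_copy_losses_per_partition())
```

Under these assumptions, raising the redundancy factor increases resiliency at the cost of more backend pods, while adding partitions spreads the message load.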

Azure Device Registry is a backend service that enables the cloud and edge management of assets. Device Registry projects assets defined in your edge environment as Azure resources in the cloud. It provides a single unified registry so that all apps and services that interact with your assets can connect to a single source. Device Registry also manages the synchronization between assets in the cloud and assets as custom resources in Kubernetes on the edge, allowing you to scale your solution to millions of connected assets.

You can scale the data flow profile to adjust the number of instances that run the data flows. Increasing the instance count can improve the throughput of the data flows by creating multiple clients to process the data. When using data flows with cloud services that have rate limits per client, increasing the instance count can help you stay within the rate limits. Scaling can also improve the resiliency of the data flows by providing redundancy in case of failures. To learn more, see Scaling data flow profiles.
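
The rate-limit reasoning above can be sketched as a small calculation. The function name `min_instances` and its parameters are hypothetical, and the sketch assumes that each instance acts as one client and that load spreads evenly across instances, which might not hold in a real deployment.

```python
import math


def min_instances(target_msgs_per_sec: float, per_client_limit: float) -> int:
    """Smallest instance count that keeps each client at or below its rate limit.

    Assumes one client per instance and an even split of traffic across instances.
    """
    if target_msgs_per_sec < 0:
        raise ValueError("target throughput can't be negative")
    if per_client_limit <= 0:
        raise ValueError("per-client limit must be positive")
    # Always keep at least one instance, even when there's no traffic.
    return max(1, math.ceil(target_msgs_per_sec / per_client_limit))


# Example: 2,500 messages per second against a 1,000 messages per second per-client limit.
print(min_instances(2500, 1000))
```

You might add an extra instance on top of this minimum to leave headroom for failures.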

Use the Device Provisioning Service (DPS) to provision devices at scale. DPS is a helper service for IoT Hub and IoT Central that enables zero-touch device provisioning. To learn more, see Best practices for large-scale IoT device deployments.

Use the Device Update for IoT Hub helper service to manage over-the-air updates to your devices at scale.

You can scale the IoT Hub service vertically and horizontally. For an automated approach, see the IoT Hub autoscaler sample. Use IoT Hub routing to handle scaling out the services that IoT Hub delivers messages to. To learn more, see IoT Hub message routing.
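
To make the sizing decision concrete, the following sketch estimates how many units a hub needs for a given daily message volume. It isn't the autoscaler sample. It assumes that scaling vertically means changing tier and scaling horizontally means adding units, and the per-unit daily quotas in `DAILY_QUOTA_PER_UNIT` are illustrative values that you should confirm against the current IoT Hub quota documentation.

```python
import math

# Illustrative per-unit daily message quotas. Treat these as assumptions and
# confirm them against the current IoT Hub quota documentation.
DAILY_QUOTA_PER_UNIT = {"S1": 400_000, "S2": 6_000_000, "S3": 300_000_000}


def units_needed(tier: str, messages_per_day: int) -> int:
    """Number of units of `tier` needed to cover `messages_per_day`."""
    if messages_per_day < 0:
        raise ValueError("messages_per_day can't be negative")
    quota = DAILY_QUOTA_PER_UNIT[tier]  # raises KeyError for an unknown tier
    return max(1, math.ceil(messages_per_day / quota))


# Compare adding units to the current tier with moving to a larger tier.
for tier in ("S1", "S2"):
    print(tier, units_needed(tier, 1_000_000))
```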

For a guide to scalability in an IoT Central solution, see IoT Central scalability. If you're using private endpoints with your IoT Central solution, you need to plan the size of the subnet in your virtual network.

For devices that connect to an IoT hub directly or to an IoT hub in an IoT Central application, make sure that the devices continue to connect as your solution scales. To learn more, see Manage device reconnections after autoscale and Handle connection failures.
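
One common way to keep a large fleet from reconnecting all at once is exponential backoff with random jitter. The following is a minimal sketch of that general technique, not the retry policy of any particular device SDK. The function name `backoff_delays` and the `base` and `cap` defaults are assumptions chosen for illustration.

```python
import random


def backoff_delays(attempts, base=1.0, cap=60.0, rng=None):
    """Return a list of retry delays in seconds using capped exponential backoff.

    Each delay is drawn uniformly between 0 and min(cap, base * 2**attempt), so
    devices that disconnect together spread their reconnection attempts out.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)
        delays.append(rng.uniform(0, ceiling))
    return delays


# A device would sleep for each delay in turn before its next connection attempt.
for delay in backoff_delays(5):
    print(f"wait {delay:.2f}s before retrying")
```

The cap keeps the longest wait bounded, and the jitter matters most after a scale operation or outage, when many devices lose their connections at the same moment.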

IoT Edge can help scale your solution. IoT Edge lets you move cloud analytics and custom business logic from the cloud to your devices. This approach lets your cloud solution focus on business insights instead of data management. Scale out your IoT solution by packaging your business logic into standard containers, deploying those containers to your devices, and monitoring them from the cloud. For more information, see Azure IoT Edge.

Service tiers and pricing plans:

Service limits and quotas:


High availability and disaster recovery

IoT solutions are often business-critical. You need to ensure that your solution can continue to operate if a failure occurs. You also need to ensure that you can recover your solution following a disaster.

Azure IoT Operations features an MQTT broker that's enterprise grade and compliant with standards. The MQTT broker is scalable, highly available, and Kubernetes-native. It provides the messaging plane for IoT Operations, enables bidirectional edge/cloud communication, and powers event-driven applications at the edge. To ensure zero data loss and high availability during deployment upgrades, the MQTT broker implements rolling updates across the MQTT broker pods.

The state store is a distributed storage system that's deployed as part of Azure IoT Operations. Using the state store, applications can get, set, and delete key-value pairs without needing to install more services, such as Redis. The state store also provides data versioning and the primitives for building distributed locks, which makes it well suited to highly available applications. To learn more, see Persisting data in the state store.
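
To illustrate how a lock can be built from key-value primitives, the following sketch uses an in-memory stand-in for a versioned store. It isn't the state store API: the `VersionedStore` class, its method names, and the `only_if_absent` and `only_if_value` options are all hypothetical, and a real distributed lock would also need expiry and protection against concurrent access.

```python
class VersionedStore:
    """In-memory stand-in for a versioned key-value store (illustration only)."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def get(self, key):
        # Returns (value, version), or None if the key doesn't exist.
        return self._data.get(key)

    def set(self, key, value, only_if_absent=False):
        current = self._data.get(key)
        if only_if_absent and current is not None:
            return None  # conditional write refused
        version = 1 if current is None else current[1] + 1
        self._data[key] = (value, version)
        return version

    def delete(self, key, only_if_value=None):
        current = self._data.get(key)
        if current is None:
            return False
        if only_if_value is not None and current[0] != only_if_value:
            return False  # someone else's value, leave it in place
        del self._data[key]
        return True


def try_acquire(store, lock_name, owner):
    # The lock is held by whoever first writes the key.
    return store.set(lock_name, owner, only_if_absent=True) is not None


def release(store, lock_name, owner):
    # Only the current holder can remove the key.
    return store.delete(lock_name, only_if_value=owner)


store = VersionedStore()
print(try_acquire(store, "maintenance-lock", "app-a"))
print(try_acquire(store, "maintenance-lock", "app-b"))
```

In this sketch, the second caller fails to acquire the lock until the first caller releases it.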

On multi-node clusters with at least three nodes, you have the option of enabling fault tolerance for storage with Azure Container Storage enabled by Azure Arc when you deploy Azure IoT Operations.

Dapr is offered as part of the MQTT broker. It abstracts away the details of MQTT session management, message QoS and acknowledgment, and built-in key-value stores, which makes it a practical choice for developing a highly available application.

The Azure IoT Operations SDKs (preview) are a suite of tools and libraries across multiple languages designed to aid the development of highly available applications for Azure IoT Operations.

For information on high availability across availability zones and regions for Azure Device Registry, see Reliability in Azure Device Registry.

To learn more about the high availability and disaster recovery capabilities of the cloud-based IoT services in your solution, see the following articles:

The following tutorials and guides provide more detail and guidance:


Related content
