
Build vision AI applications at the edge with NVIDIA Metropolis microservices and APIs

Author: NVIDIA China

NVIDIA Metropolis microservices provide powerful, customizable, cloud-native APIs and microservices for developing vision AI applications and solutions. The framework, now available for NVIDIA Jetson, enables developers to quickly build and productize powerful, mature vision AI applications at the edge.

APIs enable seamless communication and integration between different applications and services, increasing flexibility, interoperability, and efficiency in software development. Video streaming and AI-based insights and analytics are two common capabilities for building video analytics applications.

In this article, we'll look at the API workflow for building vision AI apps and integrating them into any client app. We'll cover three key steps to building an app:

  • Stream video from the edge to any device over WebRTC using APIs.
  • Use tripwires to generate insights and alerts on the movement of people and objects via API access.
  • Use a reference cloud for secure remote device API access.

Modular architecture

NVIDIA Metropolis microservices for Jetson provide a modular architecture with a rich collection of software, including customizable and reusable microservices for building vision AI applications. The suite also provides platform services for infrastructure functions and a reference cloud. The microservices include the Video Storage Toolkit (VST), an AI perception service based on NVIDIA DeepStream, and an analytics service. Each service provides APIs for configuring and accessing its functionality.

These APIs are exposed outside the system through the Ingress platform service, following the standard cloud-native pattern of using a single gateway to expose APIs within the system. Client applications access microservice functionality by calling the corresponding APIs through the Ingress service. In addition, NVIDIA Metropolis microservices provide an IoT cloud module that authenticates and authorizes clients when they access these APIs remotely.


Figure 1. Cloud-native NVIDIA Metropolis microservices for Jetson

Stream video via WebRTC

Video analytics systems often require a client application, such as a mobile app or browser, to view the video stream from a camera connected to the system. This capability is supported by a standardized call flow based on the VST API. The VST microservice supports remote streaming using WebRTC (Web Real-Time Communication), a protocol designed to reliably transmit video and other data peer to peer over the internet.

This section provides an overview of the basic concepts of the WebRTC protocol and the use of the VST API for WebRTC streaming. WebRTC is a powerful, open-source project that enables direct real-time communication between two peers, such as a web browser and a Jetson device running VST.

Entities used for WebRTC streaming

A typical WebRTC session involves a few different entities:

User agent: A phone, browser, or web app that initiates communication using the VST API.

Signaling server: A web server implemented within VST that establishes the session communication channel for WebRTC.

ICE (Interactive Connectivity Establishment) server: A logical module in the VST WebRTC stack that determines the best connection path between peers, which is necessary to traverse firewalls and NATs (network address translators).

STUN (Session Traversal Utilities for NAT) server: An ICE server that helps discover public IP addresses and ports, which is essential when a peer uses a private (NAT-based) IP address. The server is a third-party entity hosted on a public cloud network.

TURN (Traversal Using Relays around NAT) server: Acts as a relay when direct peer-to-peer communication fails; it is needed only when the peers are on different networks. Support is available through third-party services such as Twilio.


Figure 2. Entities used for WebRTC streaming

WebRTC session stages

WebRTC sessions use a control path and a data path to establish sessions and stream media.

The control path establishes and manages sessions between peers, with phases including initialization, signaling, ICE candidate exchange, and connection establishment. VST enables user agents to perform these operations remotely through its APIs.

The data path supports real-time media transfer, along with adaptation and quality control, and ultimately connection teardown.

WebRTC streaming via the VST API

Figure 3 shows the call flow between the client and VST, capturing the control and data paths that implement the WebRTC session.


Figure 3. WebRTC call flow using VST

The call flow begins with the client discovering the available video streams using the api/v1/sensor/list API.

The control and data paths are implemented based on the following call flows:

  • The client calls GET api/v1/live/iceServers or api/v1/replay/iceServers to get the list of ICE servers from VST.
  • The client creates a local offer and sends it to VST using POST api/v1/live/stream/start or api/v1/replay/stream/start.
  • VST creates an answer for the client and returns it in the response.
  • The client completes the ICE exchange through GET and POST requests to api/v1/live/iceCandidate or api/v1/replay/iceCandidate, with peerid as a query parameter.
  • Once the peer connection is established, video data begins to flow.

Once streaming begins, clients can control the stream using the following APIs (see the sketch after this list):

  • Pause the video pipeline: api/v1/replay/stream/pause
  • Resume the video pipeline: api/v1/replay/stream/resume
  • Seek to a specific time in the video: api/v1/replay/stream/seek
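
As a concrete illustration, these endpoints can be called with plain HTTP requests from JavaScript. The sketch below is a minimal example only: the POST method and the body fields (peerid, offsetSeconds) are assumptions made for illustration, so consult the VST API reference for the exact schema.

// Minimal sketch of replay stream control calls.
// Assumption: the POST method and body fields below are illustrative,
// not the documented schema.
const VST_BASE = "http://<device-ip>:30080/vst"; // placeholder address

async function replayControl(action, body) {
  // action is one of "pause", "resume", or "seek"
  return fetch(`${VST_BASE}/api/v1/replay/stream/${action}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
}

// Example usage with hypothetical field names:
await replayControl("pause", { peerid: "<peer-id>" });
await replayControl("seek", { peerid: "<peer-id>", offsetSeconds: 30 });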

Build client apps

You can use these concepts to add video streaming to browser-based web applications by calling the VST APIs over HTTP from JavaScript and taking advantage of the WebRTC support built into most browsers. A similar approach can be used to build native WebRTC client applications.

If you're using JavaScript to set up WebRTC streaming, follow these steps (a consolidated sketch appears after the last step):

Initialize the peer connection

Create a new RTCPeerConnection object with the appropriate configuration.

Handle track additions

  • Set up an event listener for the ontrack event.
  • When a new track is added, update the remote video element to display the incoming video stream.

Generate an offer

  • Use the createOffer method to generate an offer.
  • Set the local description of the peer connection to the resulting offer.

Send the offer to VST

  • Use peerConnection.localDescription to get the local description (offer).
  • Send the offer to VST using the appropriate start API (for example, api/v1/live/stream/start).

Receive the answer from VST

Once you receive the answer SDP from VST in the start API response, set it as the remote description using peerConnection.setRemoteDescription.

Process ICE candidates

  • Exchange ICE candidates using GET and POST requests to the api/v1/live/iceCandidate API.
  • Use peerConnection.addIceCandidate to add received ICE candidates to the peer connection.
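
Putting these steps together, the sketch below shows one way to wire them up in the browser. Only the endpoints and the standard browser WebRTC APIs come from the flow above; the request and response payload shapes (for example, the sessionDescription and peerId fields) are assumptions for illustration, so consult the VST API reference for the authoritative schemas.

// Minimal browser-side sketch of the WebRTC call flow with VST.
// Payload shapes marked as assumptions are not the documented schema.
const VST_BASE = "http://<device-ip>:30080/vst"; // placeholder address

async function startLiveStream(sensorId, videoElement) {
  // Step 1: get the ICE server list from VST.
  const iceServers = await (
    await fetch(`${VST_BASE}/api/v1/live/iceServers`)
  ).json();

  // Step 2: initialize the peer connection and ask to receive video.
  const pc = new RTCPeerConnection({ iceServers });
  pc.addTransceiver("video", { direction: "recvonly" });

  // Step 3: display the incoming stream when a track is added.
  pc.ontrack = (event) => {
    videoElement.srcObject = event.streams[0];
  };

  // Steps 4-5: create a local offer and send it to VST
  // (request body shape is an assumption).
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  const response = await fetch(`${VST_BASE}/api/v1/live/stream/start`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ sensorId, sessionDescription: pc.localDescription }),
  });

  // Step 6: set VST's answer as the remote description
  // (response shape is an assumption).
  const answer = await response.json();
  await pc.setRemoteDescription(new RTCSessionDescription(answer.sessionDescription));

  // Step 7: send local ICE candidates to VST; remote candidates are
  // retrieved with GET requests against the same endpoint.
  pc.onicecandidate = (event) => {
    if (event.candidate) {
      fetch(`${VST_BASE}/api/v1/live/iceCandidate?peerid=${answer.peerId}`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(event.candidate),
      });
    }
  };

  return pc;
}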

Generate spatial insights and alerts for object movement

The Analytics microservice supports three people or object analytics modules:

  • Field of View (FOV): Counts people or objects within the camera's field of view.
  • Tripwires: Detects people or objects that cross a user-defined tripwire segment.
  • Region of Interest (ROI): Counts people or objects within a defined area of interest.

Together, these modules form a powerful set of tools for understanding the movement of people or objects in physical space, with use cases ranging from retail and warehousing to security and safety. Client apps use the APIs to identify the list of sensors, create tripwires, and retrieve counts and alerts for each feature.

This section describes these operations for tripwires with an end-to-end example. A similar approach can be used for FOV and ROI. In each case, call the HTTP API using the programming language or HTTP client of your choice.

Retrieve the list of sensors

The first step is to retrieve the name of the sensor for which the tripwire needs to be configured.

Call the VST API to list all sensors. Identify the sensor of interest from the returned list; the name attribute of the sensor object will be used as the sensor ID in subsequent steps to configure and retrieve tripwire counts and alerts. Replace <device-ip> with your device's IP address.

http://<device-ip>:30080/vst/api/v1/sensor/list
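
In JavaScript, the call and the sensor ID lookup might look like the following sketch. The response is assumed to be JSON describing sensor objects with a name attribute, as noted above; <device-ip> remains a placeholder.

// Minimal sketch: retrieve the sensor list and extract a sensor ID.
// Assumption: the response is a JSON array of sensor objects with a
// `name` attribute; check the actual response shape on your system.
const response = await fetch("http://<device-ip>:30080/vst/api/v1/sensor/list");
const sensors = await response.json();
const sensorId = sensors[0].name; // e.g. "Amcrest_3", used in later steps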

Create a tripwire configuration

In this step, you configure a tripwire that specifies the line on which you want to count the number of people crossing the line.

When configuring tripwires, specify the following properties:

  • Sensor ID: The ID of the sensor on which the tripwire is defined.
  • Tripwire ID: The identifier of the tripwire. A sensor may have multiple tripwires defined, but each tripwire needs a unique identifier.
  • Wire: A sequence of points forming the segments that make up the tripwire.
  • Direction: A vector (two points) describing the direction of crossing (entry/exit).

Note that the point coordinates are camera coordinates (image plane), with the origin (0,0) at the upper-left corner.

Client apps, such as the reference mobile app provided with NVIDIA Metropolis microservices, provide visual aids for selecting points, eliminating the need to manually determine (x, y) locations. Figure 4 shows an example of creating and rendering a tripwire through the mobile app. The user uses the app's touch interface to select the tripwire anchor points and draw the tripwire (green line) and direction (red arrow).


Figure 4. Tripwire rendering in the reference mobile app provided with NVIDIA Metropolis microservices

To configure a tripwire with ID main_door for a sensor with ID Amcrest_3, call the following HTTP API using the programming language of your choice, with the request body shown below:

http://<device-ip>:30080/emdx/api/config/tripwire?sensorId=Amcrest_3

{
  "deleteIfPresent": false,
  "tripwires": [
    {
        "direction": {
          "entry": {
            "name": "Inside the room"
          },
          "exit": {
            "name": "Outside of the room"
          },
          "p1": { "x": 753, "y": 744},
          "p2": { "x": 448, "y": 856}
        },
        "id": "main_door",
        "name": "Main door",
        "wire": [
          { "x": 321, "y": 664 },
          { "x": 544, "y": 648 },
          { "x": 656, "y": 953 },
          { "x": 323, "y": 1067}
        ]
      }
  ],
  "sensorId": "Amcrest_3"
}           
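
In JavaScript, the same request might be issued as in the sketch below. The POST method is an assumption here; the endpoint and the request body are the ones shown above.

// Minimal sketch: submit the tripwire configuration shown above.
// Assumption: the endpoint accepts a POST with a JSON body.
const tripwireConfig = {
  deleteIfPresent: false,
  sensorId: "Amcrest_3",
  tripwires: [ /* the main_door tripwire definition from above */ ],
};

await fetch(
  "http://<device-ip>:30080/emdx/api/config/tripwire?sensorId=Amcrest_3",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(tripwireConfig),
  }
);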

Configure tripwire alert rules (optional)

Configuring alert rules for a given tripwire is optional. An alert rule is a specific condition that needs to be met to generate an alert event.

To configure an alert rule that generates an alert whenever a person crosses the tripwire (main door) in the entry direction, call the following API:

http://<device-ip>:30080/emdx/api/config/rule/alerts/tripwire

{
  "sensorId": "Amcrest_3",
  "rules": [
    {
      "rule_id": "cd2218f6-e4d2-4ad4-9b15-3396e4336064",
      "id": "main_door",
      "type": "tripwire",
      "rule_type": "increment",
      "time_interval": 1,
      "count_threshold": 1,
      "direction": "entry"
    }
  ]
}           

Retrieve tripwire counts and alerts

This step explains how to retrieve the number of people who crossed the tripwire defined previously. You can also retrieve the alerts generated based on the alert rules configured for that tripwire.

Counts can be queried for a specific tripwire (sensorId, tripwireId), time range (fromTimestamp, toTimestamp), and time window size (fixedInterval). You can also set the alerts query parameter to true to retrieve alerts along with counts:

http://<device-ip>:30080/emdx/api/metrics/tripwire/histogram?sensorId=Amcrest_3&tripwireId=main_door&fromTimestamp=2020-10-30T20:00:00.000Z&toTimestamp=2020-10-30T20:01:00.000Z&fixedInterval=1000&alerts=true

{
    "alerts": [
     {
            "count": 1,
            "description": "1 people entered tripwire",
            "duration": 1.000,
            "startTimestamp": "2020-10-30T20:00:59.000Z",
            "endTimestamp": "2020-10-30T20:01:00.000Z",
            "id": "unique-alert-id",
            "rule_type": "increment",
            "rule_id": "cd2218f6-e4d2-4ad4-9b15-3396e4336064",
            "sensorId": "Amcrest_3",
            "type": "tripwire",
            "direction": "entry",
            "directionName": "Inside the room", 
            "attributes": [..],
        }
     ],
    "counts": [
      {
        "agg_window": "1 sec",
        "histogram": [
          {
            "end": "2020-10-30T20:00:01.000Z",
            "start": "2020-10-30T20:00:00.000Z",
            "sum_count": 1
          }
        ],
        "attributes": [...],
        "sensorId": "Amcrest_3",
        "type": "exit"
      },
      {
        "agg_window": "1 sec",
        "histogram": [
          {
            "end": "2020-10-30T20:00:01.000Z",
            "start": "2020-10-30T20:00:00.000Z",
            "sum_count": 0
          },
          …..
        ],
        "attributes": [.. ],
        "sensorId": "Amcrest_3",
        "type": "entry"
      }
    ]
  }           

The histogram is returned separately for each direction. The full time range is divided into windows of fixedInterval milliseconds, and the number of crossings within each window (start, end) is reported as sum_count.
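
A short JavaScript sketch of this query, totaling the per-window counts for each direction, follows. Field names mirror the sample response above; the time range and <device-ip> are placeholders.

// Minimal sketch: query per-direction tripwire counts and total them.
// Field names follow the sample response shown above.
const params = new URLSearchParams({
  sensorId: "Amcrest_3",
  tripwireId: "main_door",
  fromTimestamp: "2020-10-30T20:00:00.000Z",
  toTimestamp: "2020-10-30T20:01:00.000Z",
  fixedInterval: "1000",
  alerts: "true",
});
const data = await (
  await fetch(`http://<device-ip>:30080/emdx/api/metrics/tripwire/histogram?${params}`)
).json();

for (const series of data.counts) {
  // Sum the per-window counts to get the total crossings per direction.
  const total = series.histogram.reduce((sum, bin) => sum + bin.sum_count, 0);
  console.log(`${series.type}: ${total} crossing(s)`);
}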

Retrieve tripwire alerts

To retrieve all alerts for a given sensor, call the following API:

http://<device-ip>:30080/emdx/api/alerts?sensorId=Amcrest_3&fromTimestamp=2020-10-30T20:00:00.000Z&toTimestamp=2020-10-31T01:00:00.000Z

Secure remote cloud API access

The APIs enable clients to remotely access device configuration and features over HTTP. During development, it is convenient to call the APIs by pointing HTTP requests at the device IP address. In production scenarios, however, the client is often unaware of the device's IP address.

In addition, Jetson devices may be behind firewalls, rendering them inaccessible, or they may use NAT-based IP addresses that are not valid externally. The IoT cloud enables production-grade remote API calls by providing a mechanism to forward requests from network-separated clients to devices in a secure manner.

In this section, we describe the mechanism by which clients obtain secure tokens and use those tokens to issue HTTP requests through the cloud, which forwards them to the appropriate device.

While the focus of this section is to show how clients can call device APIs through the cloud, note that the cloud architecture provides a secure device-claim mechanism for gaining authorized access to a specific device through the cloud. All access to user devices via the cloud is authenticated and authorized, and users can only access devices they have previously claimed.

The feature is highly customizable and integrates seamlessly with the existing security frameworks and cloud backend infrastructure of ODMs and OEMs.

Workflow for calling device APIs through the IoT cloud

The reference IoT cloud implementation uses Amazon Cognito as the identity provider (IdP), but any third-party identity provider can be used. To access the device APIs through a cloud endpoint, use the authentication and authorization call flow outlined below.

Use Amazon Cognito for authentication

Log in at the login page URI using the web console and authenticate with Amazon Cognito. After successful authentication, Amazon Cognito returns a unique authorization code. Exchange the authorization code with Amazon Cognito for a time-limited ID token. Present this ID token when calling the IoT cloud security API.


Figure 5. Call flow that uses the IdP for authorization

Generate a JWT token and call the device API

To access the IoT device APIs, you must first request an authorization token from the IoT cloud security system. Once the request is validated, IoT cloud security issues a short-lived signed authorization JWT. The token is then used to call the device API through the IoT cloud transport, which verifies the token and forwards the request to the device.

Note that if the user doesn't have permission to perform an action based on a device claim, an unauthorized HTTP error code is returned.
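
The sketch below illustrates this flow in JavaScript. The cloud endpoint paths and field names here are hypothetical placeholders, not the reference cloud's documented API; only the pattern of exchanging the Cognito ID token for a JWT and presenting it as a Bearer token follows the flow described above.

// Hypothetical sketch of the authorized call flow; endpoint paths and
// field names are illustrative placeholders, not the documented API.
const CLOUD_BASE = "https://<cloud-endpoint>"; // placeholder

async function callDeviceApiViaCloud(cognitoIdToken) {
  // Step 1: exchange the Cognito ID token for a short-lived JWT issued
  // by IoT cloud security (hypothetical /token endpoint).
  const { jwt } = await (
    await fetch(`${CLOUD_BASE}/token`, {
      method: "POST",
      headers: { Authorization: `Bearer ${cognitoIdToken}` },
    })
  ).json();

  // Step 2: call a device API through the IoT cloud transport, which
  // verifies the JWT and forwards the request to the claimed device.
  return (
    await fetch(`${CLOUD_BASE}/device/<device-id>/vst/api/v1/sensor/list`, {
      headers: { Authorization: `Bearer ${jwt}` },
    })
  ).json();
}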


Figure 6. A call flow that authorizes users through the IoT cloud

Summary

Use NVIDIA Metropolis APIs and microservices to build powerful, market-ready vision AI applications at the edge. The APIs expose the various NVIDIA Metropolis microservice functions in a standardized, secure, and distributed manner. The reference mobile app included in this release showcases a mature end-user app built with these APIs, with a user-friendly interface covering configuration, video streaming, analytics, alerting, cloud integration, and device claims. The app's source code is included, and detailed instructions for using the various modules are available in the mobile app section of this release's documentation.

Download NVIDIA Metropolis microservices for Jetson:

Please register for our two webinars:

Accelerate edge AI development with Metropolis APIs and microservices for Jetson

How to build with Metropolis microservices for Jetson
