
Best Practices for Monitoring and Observability on AWS ECS on Fargate

Author: Observation Cloud

Overview

Amazon ECS on Fargate gives users a simple, efficient, and reliable way to run containers, so they can focus on developing and running applications rather than on managing infrastructure. Users still need real-time visibility into the performance, availability, health, and resource usage of the applications running in that environment. With that visibility, they can detect potential problems early, locate bottlenecks, optimize resource usage, and improve overall performance and user experience.

Observation Cloud fully supports observability for Amazon ECS on Fargate, including basic resource monitoring, application tracing, and log monitoring. This article explains and demonstrates how to implement observability in this environment.

Introduction to Amazon ECS on Fargate

Amazon Elastic Container Service (Amazon ECS) is a highly scalable, high-performance container orchestration service that supports Docker containers and makes it easy to run and scale containerized applications. Combined with Fargate, Amazon ECS provides a way to run containers without having to manage infrastructure.

Fargate is a serverless compute engine that can run containers in Amazon ECS. It allows you to run containers without the need to provision or manage servers. Fargate is responsible for managing the underlying computing infrastructure, such as virtual machines, kernel patches, security updates, and more. You only need to focus on the packaging and deployment of containerized applications.

Key benefits of using Amazon ECS and Fargate include:

  • Serverless: There's no need to provision or manage infrastructure, and containerized applications can be launched and scaled quickly.
  • Simplicity: There's no need to manage the underlying operating system, clusters, or virtual machines. Fargate manages the infrastructure.
  • Scalability: Container instances can be automatically scaled up and down based on application needs.
  • High availability: Fargate runs containers across multiple Availability Zones, providing high availability.
  • Security: Fargate provides a secure computing environment and integrates with AWS security services.
  • Integrations: Seamless integration with other AWS services such as ALB, CloudWatch, IAM, and more.

Amazon ECS, combined with AWS Fargate, makes it easier to deploy, manage, and scale containerized applications without worrying about managing the underlying infrastructure. This serverless approach reduces operational costs, improves resource utilization, and accelerates application delivery.

Monitoring data collection in AWS ECS on Fargate

DataKit is the open-source, all-in-one data collection agent of Observation Cloud. It supports Linux, Windows, and macOS and collects data from hosts, containers, and middleware, as well as traces and logs.

In the ECS environment, monitoring data is collected by DataKit and uploaded to Observation Cloud.

[Figure: monitoring data collection architecture for AWS ECS on Fargate]

As shown in the preceding figure, in the ECS on Fargate environment, each ECS task that needs to report observability data must contain a DataKit container and a log-router container in addition to the business containers. DataKit collects the running metrics of the containers and the application trace data in the running task. Log data is collected by the log-router (AWS FireLens with Fluent Bit) and sent to the logstreaming collector of DataKit, which processes the data and uploads it to Observation Cloud for query and analysis. Each type of monitoring data collection is described below:

Metric collection

DataKit collects the running metrics of AWS ECS Fargate when the environment variable ENV_ECS_FARGATE is set to "on". In addition, the statsd collector in DataKit can collect JVM running metrics from Java applications and runtime metrics from Node.js applications for monitoring and analysis.
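To make the statsd path more concrete, the following minimal sketch shows what a single StatsD datagram looks like and how it could be sent over UDP to a collector in the same task. It is an illustration only: the metric name, the tag, and the UDP port 8125 are assumptions that do not come from this article, and a real JVM or Node.js agent emits its own metric names.

```python
import socket


def format_statsd(name, value, metric_type="g", tags=None):
    """Render one StatsD datagram; tags use the DogStatsD '|#k:v' extension."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        tag_part = ",".join(f"{k}:{v}" for k, v in tags.items())
        line += f"|#{tag_part}"
    return line


def send_metric(line, host="localhost", port=8125):
    """Send the datagram over UDP (fire-and-forget, no delivery guarantee).

    Port 8125 is an assumed statsd listen port, not taken from the article.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # Hypothetical gauge and tag, for illustration only.
    send_metric(format_statsd("jvm.heap_used", 1048576, "g", {"service": "java_service"}))
```

Because containers in the same Fargate task share a network namespace, "localhost" reaches the DataKit sidecar in the same way as the DD_AGENT_HOST setting used later in this article.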

Trace collection

In the AWS ECS environment, the application container loads the ddtrace agent at startup, which generates trace data for each call and sends it to DataKit. DataKit runs as a sidecar in the same Amazon ECS task as the application container, receives the trace data generated by the application, and uploads it to Observation Cloud for query and analysis.

Log collection

In the AWS ECS environment, AWS FireLens (Fluent Bit plugin) runs as a sidecar in the same Amazon ECS task as the application container. It collects the application log data and sends it to the logstreaming collector of DataKit, which uploads it to Observation Cloud for log query and analysis. There is no need to modify application deployment scripts, manually install additional software, or write additional code.
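As a rough illustration of what the log-router does on the wire, the sketch below builds the kind of HTTP request that would deliver a batch of JSON log records to the DataKit logstreaming endpoint. The record fields ("log", "container_name", "source") are assumed FireLens-style fields and the sample message is invented; the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_log_request(records, host="localhost", port=9529):
    """Build (but do not send) a POST carrying a JSON array of log records."""
    url = f"http://{host}:{port}/v1/write/logstreaming?type=firelens&source=java"
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical record; real records are produced by the log-router.
records = [{"log": "order created", "container_name": "javatest", "source": "stdout"}]
req = build_log_request(records)
# urllib.request.urlopen(req) would send it, which requires DataKit
# to be listening on that port.
```

In an actual task this request is issued by Fluent Bit itself, which is why the application needs no logging changes.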

AWS ECS task configuration

This section describes how to deploy and configure observability data collection in an ECS on Fargate environment.

Prerequisites

Application image

Taking a Java application as an example: to collect its trace data, add the ddtrace Java agent file to the application image in advance, and reserve an entry point for custom Java startup parameters so that they can later be adjusted flexibly through environment variables. Here is an example Dockerfile:

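Dockerfile
# Add the ddtrace Java agent to the image
COPY dd-java-agent.jar /dd-java-agent.jar
# JAVA_OPTS is injected at run time; replace your_app.jar with your application JAR
ENTRYPOINT ["sh", "-ec", "exec java ${JAVA_OPTS} -jar your_app.jar"]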

The latest ddtrace Java agent provided by Observation Cloud can be downloaded from:

https://static.guance.com/dd-image/dd-java-agent.jar

Create an ECS task definition

Three containers are created in the same AWS ECS task: the application container, the DataKit container, and the log-router container. The detailed container configuration is as follows:

Application container

The configuration information of the application container in JSON format is as follows:

JSON
{
    "name": "javatest",
    "image": "registry.cn-xxx.com/test/javatest:v2.0",
    "cpu": 1024,
    "portMappings": [
        {
            "name": "javservice",
            "containerPort": 9080,
            "hostPort": 9080,
            "protocol": "tcp",
            "appProtocol": "http"
        }
    ],
    "essential": true,
    "environment": [
        {
            "name": "DD_SERVICE",
            "value": "java_service"
        },
        {
            "name": "DD_ENV",
            "value": "test"
        },
        {
            "name": "JAVA_OPTS",
            "value": "-javaagent:/dd-java-agent.jar"
        },
        {
            "name": "DD_AGENT_HOST",
            "value": "localhost"
        },
        {
            "name": "DD_TRACE_AGENT_PORT",
            "value": "9529"
        }
    ],
    "mountPoints": [],
    "volumesFrom": [],
    "logConfiguration": {
        "logDriver": "awsfirelens",
        "options": {
            "Format": "json",
            "Host": "localhost",
            "Name": "http",
            "Port": "9529",
            "URI": "/v1/write/logstreaming?type=firelens&source=java&service=javatest&tags=project=test,app_name=java_app,cloud=amazon"
        }
    },
    "systemControls": []
}

There are two main configuration items, which are described below:

  • environment: use the JAVA_OPTS variable to configure the Java startup parameters and the ddtrace-related parameters. For the parameters that ddtrace supports, see the following link: https://docs.guance.com/integrations/ddtrace-java/
  • logConfiguration: collects and forwards application logs through the AWS FireLens Fluent Bit plugin (logstash). For details about the AWS FireLens plugin, see the following link: https://github.com/aws-samples/amazon-ecs-firelens-examples/tree/mainline/examples/fluent-bit/logstash

The following describes the parameters supported by logstreaming in the URI configuration:

  • type: the data format. Currently influxdb and firelens are supported. When type is influxdb ( /v1/write/logstreaming?type=influxdb ), the data is already in line protocol format, so only the built-in tags are added and no other processing is done. When type is firelens ( /v1/write/logstreaming?type=firelens ), the data should be multiple logs in JSON format. When this value is empty, the data is split into lines and processed by the Pipeline
  • source: identifies the source of the data
  • service: adds the service tag field
  • tags: adds custom tags; separate multiple tags with commas
  • pipeline: specifies the name of the Pipeline to apply to the data
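The parameters above can be combined programmatically. The following sketch assembles the URI value from its parts and reproduces the one used in the application container configuration. The helper name build_logstreaming_uri is hypothetical, and leaving "=" and "," unescaped is a choice made here only so that the output matches the URI shown in this article.

```python
from urllib.parse import urlencode


def build_logstreaming_uri(type_=None, source=None, service=None, tags=None, pipeline=None):
    """Assemble the logstreaming path and query string from optional parts."""
    params = {}
    if type_:
        params["type"] = type_
    if source:
        params["source"] = source
    if service:
        params["service"] = service
    if tags:
        # Custom tags are written as key=value pairs separated by commas.
        params["tags"] = ",".join(f"{k}={v}" for k, v in tags.items())
    if pipeline:
        params["pipeline"] = pipeline
    # Keep '=' and ',' literal so the tags value stays readable.
    query = urlencode(params, safe="=,")
    return "/v1/write/logstreaming" + (f"?{query}" if query else "")


uri = build_logstreaming_uri(
    type_="firelens",
    source="java",
    service="javatest",
    tags={"project": "test", "app_name": "java_app", "cloud": "amazon"},
)
```

Generating the URI this way keeps the tag list in one place when several services share the same task definition template.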

DataKit container

The configuration information of the DataKit container in JSON format is as follows:

JSON
{
    "name": "datakit",
    "image": "pubrepo.guance.com/datakit/datakit:latest",
    "cpu": 0,
    "portMappings": [],
    "essential": false,
    "environment": [
        {
            "name": "ENV_DATAKIT_INPUTS",
            "value": "[[inputs.logstreaming]] \n ignore_url_tags = false"
        },
        {
            "name": "ENV_DATAWAY",
            "value": "https://openway.guance.com?token=tkn_098042exxxx"
        },
        {
            "name": "ENV_HTTP_LISTEN",
            "value": "0.0.0.0:9529"
        },
        {
            "name": "ENV_DEFAULT_ENABLED_INPUTS",
            "value": "dk,container,ddtrace,statsd"
        },
        {
            "name": "ENV_ECS_FARGATE",
            "value": "on"
        }
    ],
    "mountPoints": [],
    "volumesFrom": [],
    "systemControls": []
}

DataKit is configured through the entries in environment. Some of the variables are described below:

  • ENV_DATAKIT_INPUTS: enables the logstreaming collector, which receives the log data collected by AWS FireLens (Fluent Bit)
  • ENV_DATAWAY: the URL of the gateway that data is uploaded to. Each workspace has its own unique token
  • ENV_HTTP_LISTEN: the address and port that DataKit listens on
  • ENV_DEFAULT_ENABLED_INPUTS: the collectors that are enabled by default
  • ENV_ECS_FARGATE: specifies whether to collect the running metrics of containers in ECS
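Both the trace settings and the log settings of the application container must point at the port that DataKit listens on (9529 in this article). The following sketch is one way to check that before registering a task definition. The helper names are hypothetical and the two dictionaries are trimmed copies of the container definitions shown earlier.

```python
def env_map(container):
    """Flatten a container's environment list into a name -> value dict."""
    return {e["name"]: e["value"] for e in container.get("environment", [])}


def check_ports(app, datakit):
    """Return a list of mismatches between the app settings and DataKit's listen port."""
    problems = []
    # "0.0.0.0:9529" -> "9529"
    listen_port = env_map(datakit).get("ENV_HTTP_LISTEN", "").rpartition(":")[2]
    trace_port = env_map(app).get("DD_TRACE_AGENT_PORT")
    log_port = app.get("logConfiguration", {}).get("options", {}).get("Port")
    if trace_port != listen_port:
        problems.append(f"DD_TRACE_AGENT_PORT={trace_port} but DataKit listens on {listen_port}")
    if log_port != listen_port:
        problems.append(f"log Port={log_port} but DataKit listens on {listen_port}")
    return problems


# Trimmed copies of the two container definitions (only the relevant fields).
app = {
    "environment": [{"name": "DD_TRACE_AGENT_PORT", "value": "9529"}],
    "logConfiguration": {"logDriver": "awsfirelens", "options": {"Port": "9529"}},
}
datakit = {"environment": [{"name": "ENV_HTTP_LISTEN", "value": "0.0.0.0:9529"}]}
problems = check_ports(app, datakit)
```

An empty list means the three values agree; any mismatch would otherwise show up only at run time as missing traces or logs.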

Log-router container

The configuration information of the log-router container in JSON format is as follows:

JSON
{
    "name": "log_router",
    "image": "public.ecr.aws/aws-observability/aws-for-fluent-bit:stable",
    "cpu": 0,
    "memoryReservation": 50,
    "portMappings": [],
    "essential": true,
    "environment": [],
    "mountPoints": [],
    "volumesFrom": [],
    "user": "0",
    "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
            "awslogs-create-group": "true",
            "awslogs-group": "/ecs/ecs-aws-firelens-sidecar-container",
            "awslogs-region": "cn-northwest-1",
            "awslogs-stream-prefix": "firelens"
        }
    },
    "systemControls": [],
    "firelensConfiguration": {
        "type": "fluentbit"
    }
}

This configuration can generally be kept as is, without additional adjustments.
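The three container definitions above still need to be wrapped in a task definition. The following sketch shows one possible wrapper, under stated assumptions: the family name, the task-level cpu and memory values, and the helper names are hypothetical, and fields such as the execution role are omitted and would need to be supplied for a real deployment.

```python
import json


def build_task_definition(family, containers, cpu="1024", memory="2048"):
    """Wrap container definitions in a Fargate-compatible task definition."""
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # Fargate tasks use the awsvpc network mode
        "cpu": cpu,               # assumed task-level size, adjust as needed
        "memory": memory,
        "containerDefinitions": containers,
    }


def missing_containers(task_def, required=("javatest", "datakit", "log_router")):
    """List the expected container names that are absent from the task definition."""
    names = {c["name"] for c in task_def["containerDefinitions"]}
    return [n for n in required if n not in names]


if __name__ == "__main__":
    # Placeholders: substitute the full JSON objects shown above.
    containers = [{"name": "javatest"}, {"name": "datakit"}, {"name": "log_router"}]
    task_def = build_task_definition("java-observability-demo", containers)
    print(missing_containers(task_def))
    print(json.dumps(task_def, indent=4))
```

The resulting JSON could then be saved to a file and registered through the ECS console or the AWS CLI.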

Run the AWS ECS Fargate task

Create an ECS Fargate service based on the newly created task definition. After the service is successfully started, you will see the following three containers in the running state:

[Figure: the three containers of the task in the running state]

Results in Observation Cloud

Once the configuration described above is complete and the task has started successfully on AWS, the services running in AWS ECS can be fully monitored through Observation Cloud. The results are described below:

Metrics

After you enable metric collection for ECS Fargate, you can monitor the running status of ECS containers in real time in the default infrastructure monitoring of Observation Cloud.

[Figure: ECS container status in infrastructure monitoring]

You can also build custom ECS dashboards with the scenario dashboard capability of Observation Cloud, as shown in the figure below:

[Figure: custom ECS dashboard]

Logs

You can use the log viewer of Observation Cloud to quickly view log information, with support for fuzzy search and regular-expression search. You can also apply filter conditions to narrow down the logs, as shown in the figure below:

[Figure: log viewer]

In addition, if you want to learn more about the trace that generated a log, Observation Cloud can associate the log details page directly with the corresponding trace, as shown in the figure below:

[Figure: trace associated from the log details page]

Application traces

When a service call occurs, the trace information appears in the viewer of the Application Performance Monitoring section of Observation Cloud, as shown in the following figure:

[Figure: trace list in Application Performance Monitoring]

You can click a trace to view its details, including the flame graph of the call and the call dependencies. You can also associate built-in views for quick correlation analysis.

As shown in the following figure, when you associate the running metrics view of ECS Fargate, you can see the resource usage and running status of the containers in ECS at the point in time when the trace was called (the position marked by the red vertical line).

[Figure: trace details with the associated ECS Fargate metrics view]

If you associate the logs of the application, you can see the log information generated while the service was invoked in the trace, for quick correlation analysis.

[Figure: trace details with the associated application logs]

For a Java application, you can also associate the running metrics of the JVM to quickly check whether the JVM was behaving abnormally when the service was called, as shown in the following figure:

[Figure: trace details with the associated JVM metrics]

These correlation analysis views can be customized, configured, added, and removed according to the user's analysis needs.

At this point, the basic observability capabilities for the AWS ECS on Fargate environment are in place.