Read the pros and cons of Blue-Green Release, A/B Testing, and Canary Release

author:Enterprises go to the cloud those things

Introduction: The industry has settled on several common service release strategies to avoid losing traffic while a version is being upgraded. This article first explains the principles behind these release strategies and then puts them into practice with Alibaba Cloud's cloud-native gateway.

Author | Young less

Background

The industry has settled on several common service release strategies to avoid losing traffic while a version is being upgraded. This article first explains the principles behind these release strategies and then puts them into practice with Alibaba Cloud's cloud-native gateway.

Release strategies

Widely adopted service release strategies include blue-green release, A/B testing, and canary release.

1. Blue-green release

A blue-green release requires a redundant deployment of the new version. The new version generally gets the same machine specifications and instance count as the old version, so the service effectively has two identical deployment environments. At this point only the old version serves external traffic, and the new version acts as a hot standby. To upgrade the service, we simply switch all traffic to the new, hot-standby version. Because the deployment is redundant, there is no need to worry about the new version running short of resources. If a serious bug appears after the new version goes live, we only need to switch all traffic back to the old version, which greatly shortens recovery time. Once the bug is fixed and the new version is redeployed, traffic is switched from the old version to the new version again.

Blue-green release spends additional machine resources to avoid unavailability during a release, and it allows traffic to be switched back to the old version quickly when the new version fails.

As shown in the figure, the old version of a service is v1, and the new version v2 is deployed redundantly. During the upgrade, all existing traffic is switched to the new version v2.

When the new version v2 has a bug or fails, traffic can be switched back to the old version v1 quickly.
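
The switch and the rollback can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular gateway implements it; the class name BlueGreenRouter and its methods are hypothetical.

```python
class BlueGreenRouter:
    """Sends every request to exactly one active version."""

    def __init__(self, active: str, standby: str) -> None:
        self.active = active    # version that currently serves all traffic
        self.standby = standby  # fully deployed version that receives no traffic

    def route(self) -> str:
        # No per-request decision: all traffic goes to the active version.
        return self.active

    def switch(self) -> None:
        # Release (or roll back) by swapping the two roles.
        self.active, self.standby = self.standby, self.active


router = BlueGreenRouter(active="v1", standby="v2")
router.switch()                  # release: all traffic now goes to v2
after_release = router.route()
router.switch()                  # v2 is faulty: switch everything back to v1
after_rollback = router.route()
print(after_release, after_rollback)  # prints "v2 v1"
```

Because the decision is a single swap, both the release and the rollback take effect for all traffic at once.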

Advantages of a blue-green deployment:

1. The deployment structure is simple and the operation and maintenance are convenient;

2. The service upgrade process is simple to operate and the cycle is short.

Disadvantages of blue-green deployments:

1. Resource redundancy, two sets of production environments need to be deployed;

2. A fault in the new version has a wide impact, because all traffic is switched at once.

2. A/B test

Unlike the all-at-once traffic switch of a blue-green release, A/B testing routes traffic to the new version based on the metadata of the user's request. It is a grayscale release strategy based on request content matching: only requests that match specific rules are routed to the new version. Common matching criteria are HTTP headers and cookies. As an HTTP header example, requests whose User-Agent contains Android (requests from Android devices) access the new version, while requests from other systems still access the old version. As a cookie example, cookies often carry user information with business semantics, so regular users can be given the new version while VIP users stay on the old version.

As shown in the figure, the current version of a service is v1, and the new version v2 is about to go live. We want Android users to try the new features while users on other systems stay on the old version.
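
The Android rule can be written as a small routing function. The sketch below is a hypothetical illustration in Python; the function name choose_version is an assumption, and a real gateway evaluates such rules from its route configuration rather than from application code.

```python
from typing import Mapping


def choose_version(headers: Mapping[str, str]) -> str:
    """Pick the target version for one request from its HTTP headers."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    if "Android" in normalized.get("user-agent", ""):
        return "v2"  # requests from Android devices try the new version
    return "v1"      # all other requests stay on the old version


print(choose_version({"User-Agent": "Mozilla/5.0 (Linux; Android 4.0.3)"}))  # prints "v2"
print(choose_version({"User-Agent": "curl/7.79.1"}))                         # prints "v1"
```

A cookie-based rule would have the same shape, with the condition reading a value from the Cookie header instead of User-Agent.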

By comparing the success rate and response time (RT) of the old and new versions on the monitoring platform, we can judge whether the new version behaves as expected. Once it does, all requests can be switched to the new version v2, and the old version v1 can then be taken offline gradually to save resources.
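
A comparison of this kind could be automated along the following lines. This is a minimal sketch: the names VersionStats and ready_to_promote and the two thresholds are illustrative assumptions, not values taken from any monitoring product.

```python
from dataclasses import dataclass


@dataclass
class VersionStats:
    requests: int        # requests observed for this version
    errors: int          # requests that failed
    total_rt_ms: float   # sum of response times in milliseconds

    @property
    def success_rate(self) -> float:
        return 1.0 - self.errors / self.requests

    @property
    def mean_rt_ms(self) -> float:
        return self.total_rt_ms / self.requests


def ready_to_promote(old: VersionStats, new: VersionStats,
                     max_success_drop: float = 0.001,
                     max_rt_ratio: float = 1.2) -> bool:
    """True if the new version is not noticeably worse than the old one."""
    if old.requests == 0 or new.requests == 0:
        return False  # not enough data to compare
    success_ok = new.success_rate >= old.success_rate - max_success_drop
    rt_ok = new.mean_rt_ms <= old.mean_rt_ms * max_rt_ratio
    return success_ok and rt_ok


v1 = VersionStats(requests=10000, errors=10, total_rt_ms=500000.0)
v2 = VersionStats(requests=1000, errors=1, total_rt_ms=55000.0)
print(ready_to_promote(v1, v2))  # prints "True"
```

The thresholds should be chosen per service; the point is only that the decision to switch all traffic to v2 is driven by the measured difference between the two versions.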

Advantages of A/B testing:

1. The new version can be offered to specific requests or users only, so a failure of the new version has a small impact;

2. Differences in request status between versions can be compared, although this requires a complete monitoring platform to be built.

Disadvantages of A/B testing:

1. There is still resource redundancy, because the request capacity cannot be accurately assessed;

2. Long release cycle.

3. Canary release

In a blue-green release, traffic is switched as a whole, so an environment of the same machine scale as the original service must be cloned for the new version, which doubles the machine resources in use. In A/B testing, as long as we can estimate the volume of requests that match a particular rule, we can allocate additional machine resources to the new version on demand. Compared with these two strategies, the idea of a canary release is to divert a small share of requests to the new version, so deploying the new version requires only a very small number of machines. After verifying that the new version meets expectations, the traffic weights are adjusted step by step so that traffic migrates slowly from the old version to the new version. During this period the new version can be scaled out according to the configured traffic ratio while the old version is scaled in, which maximizes the utilization of the underlying resources.

As shown in the figure, the current version of a service is v1, and the new version v2 is about to go live. To keep traffic smooth and lossless during the upgrade, the canary release scheme is used to migrate traffic gradually from the old version to the new version.
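
Weight-based splitting can be illustrated with a small router. The sketch below uses smooth weighted round-robin because it makes the split deterministic and easy to check; this choice is an assumption made for the example, and the article does not state which algorithm the gateway uses. The class name WeightedRouter is hypothetical.

```python
from typing import Dict


class WeightedRouter:
    """Splits requests between versions in proportion to integer weights."""

    def __init__(self, weights: Dict[str, int]) -> None:
        self.set_weights(weights)

    def set_weights(self, weights: Dict[str, int]) -> None:
        # Called at each release step to shift more traffic to the new version.
        if sum(weights.values()) <= 0:
            raise ValueError("at least one version needs a positive weight")
        self.weights = dict(weights)
        self._current = {version: 0 for version in weights}

    def route(self) -> str:
        # Smooth weighted round-robin: every version earns its weight per
        # request; the version with the highest balance is picked and pays
        # back the total weight.
        total = sum(self.weights.values())
        for version, weight in self.weights.items():
            self._current[version] += weight
        chosen = max(self._current, key=self._current.get)
        self._current[chosen] -= total
        return chosen


router = WeightedRouter({"v1": 80, "v2": 20})
first_ten = [router.route() for _ in range(10)]
print(first_ten.count("v1"), first_ten.count("v2"))  # prints "8 2"
```

A release would then call set_weights with a growing share for v2, for example 20, 50, and finally 100, verifying the new version after each step.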

Advantages of Canary Release:

1. Traffic is directed to the new version by proportion, without distinguishing between requests, and a failure of the new version has a small impact;

2. The new version is scaled out gradually during the release while the old version is scaled in, so resource utilization is high.

Disadvantages of canary release:

1. Because traffic is directed to the new version indiscriminately, the experience of important users may be affected.

Practice

Next, we put the three release strategies described above into practice on Alibaba Cloud's container platform ACK and the MSE cloud-native gateway. We use the simplest possible architecture for the demonstration: a cloud-native gateway, a backend service (which returns its current version in the response), and a registry. The registry determines the service discovery method of the architecture, so we practice the release strategies with two service discovery mechanisms: K8s Service and Nacos.

1. Prerequisites

  • An Alibaba Cloud ACK cluster has been created
  • An MSE cloud-native gateway has been created
  • An MSE Nacos registry has been created (required only when the service discovery method is Nacos)

2. Service discovery method: K8s container service

In this example we use the native K8s service discovery method, in which the backend service is registered with CoreDNS through a declarative Service API resource. The backend service in the example exposes an interface, /version, that returns its current version, which is v1 at this point. The cloud-native gateway is deeply integrated with ACK and obtains service information from the ACK cluster dynamically in real time, which makes it easy to expose the backend service to external users through the gateway.

The business architecture is as follows:

1. Deployment

Apply the following resources (Service and Deployment) to the ACK cluster to deploy and publish the backend service. The current application version is v1.

apiVersion: v1
kind: Service
metadata:
  name: httpbin
spec:
  ports:
  - port: 8080
    protocol: TCP
  selector:
    app: httpbin
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: httpbin
      version: v1
  template:
    metadata:
      labels:
        app: httpbin
        version: v1
    spec:
      containers:
      - image: specialyang/spring-cloud-httpbin-k8s:v1
        imagePullPolicy: Always
        name: spring-cloud-httpbin-k8s
        ports:
        - containerPort: 8080           

In the cloud-native gateway console, go to Service Management -> Source Management and add the target ACK cluster.

In Service Management, import the service httpbin that is to be exposed through the cloud-native gateway.

In the policy configuration of the httpbin service, add the service version v1. Note that you need to select the corresponding label to filter out the nodes of version v1. Because only v1 is deployed at this point, v1 nodes account for 100% of all instances.

In Route Management, create a routing rule for the service to expose it to external users. The httpbin service exposes its API at the path /version, and requests are forwarded to version v1 of the httpbin service.

Execute the following script to test the response results of the request.

for i in {1..10}; do curl "${GATEWAY_EXTERNAL_IP}/version"; echo "";  done
version: v1
version: v1
version: v1
version: v1
version: v1
version: v1
version: v1
version: v1
version: v1
version: v1           

2. Blue-green deployment

A blue-green deployment has to request the same resource specifications for the new version as the current version occupies. After the deployment is complete, traffic is switched to the new version as a whole.

Deploy the new version v2 of the httpbin service with a declarative K8s API resource, again with 3 replicas.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin-v2
spec:
  replicas: 3
  selector:
    matchLabels:
      app: httpbin
      version: v2
  template:
    metadata:
      labels:
        app: httpbin
        version: v2
    spec:
      containers:
      - image: specialyang/spring-cloud-httpbin-k8s:v2
        imagePullPolicy: Always
        name: spring-cloud-httpbin-k8s
        ports:
        - containerPort: 8080           

In the policy configuration of the httpbin service, add the service version v2. Note that you need to select the corresponding label to filter out the nodes of version v2. The cluster now has the same number of v1 and v2 nodes, so each version accounts for 50%.
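
The node share per version follows directly from the version labels on the Pods. The sketch below reproduces that calculation in Python on plain label dictionaries; the function name version_share is hypothetical, and the snippet does not call the Kubernetes API.

```python
from collections import Counter
from typing import Dict, List


def version_share(pod_labels: List[Dict[str, str]], app: str = "httpbin") -> Dict[str, float]:
    """Share of nodes per version among the Pods that belong to `app`."""
    counts = Counter(
        labels["version"]
        for labels in pod_labels
        if labels.get("app") == app and "version" in labels
    )
    total = sum(counts.values())
    return {version: count / total for version, count in counts.items()}


pods = (
    [{"app": "httpbin", "version": "v1"}] * 3
    + [{"app": "httpbin", "version": "v2"}] * 3
)
print(version_share(pods))  # prints "{'v1': 0.5, 'v2': 0.5}"
```

With the two Deployments above (3 replicas each), both versions come out at 50%, which is the figure shown in the policy configuration.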

Now we apply the blue-green approach and switch traffic from v1 to v2 as a whole, simply by changing the target service in the routing rule created above.

for i in {1..10}; do curl "${GATEWAY_EXTERNAL_IP}/version"; echo "";  done
version: v2
version: v2
version: v2
version: v2
version: v2
version: v2
version: v2
version: v2
version: v2
version: v2           

Now we find that the traffic for requests to access the API resource /version has all switched from v1 to v2.

3. A/B test

A/B testing routes traffic to the new version based on the metadata of the user's request; in other words, requests are routed dynamically according to their content. For example, we want requests whose User-Agent contains Android (requests from Android devices) to access the new version, while other systems still access the old version.

We reuse the v1 and v2 deployments of httpbin from the previous steps. In addition, two routing rules need to be created:

  • Requests whose path matches /version access service version v1
  • Requests whose path matches /version and whose User-Agent header contains Android access service version v2

Note that, compared with the routing rule named version, the routing rule named version-v2 needs an additional request header matching rule.

Test the effect of the A/B test with the following script.

# User-Agent does not contain Android
curl "${GATEWAY_EXTERNAL_IP}/version"
version: v1

# User-Agent contains Android
curl -H "User-Agent: Mozilla/5.0 (Linux; Android 4.0.3)" "${GATEWAY_EXTERNAL_IP}/version"
version: v2

As you can see, requests are now routed according to the operating system of the client.

4. Canary release

A canary release diverts a small part of the traffic to the new version of the service. After verification passes, the share is increased step by step until all traffic has been migrated. During this period the new version can be scaled out and the old version scaled in, to maximize resource utilization.

In a canary release, the initial number of replicas of the new version does not need to match the original. The resources only have to be sufficient for the grayscale traffic at any time, so we set the number of replicas of the new version to 1. The share of nodes that each version currently holds can be seen in the service version module of the service policy.
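
How many replicas each version needs at a given traffic weight can be estimated with simple arithmetic. The sketch below assumes that the original replica count was sized for 100% of the traffic and that load is spread evenly across replicas; both are simplifying assumptions, and the function name replica_plan is hypothetical.

```python
from typing import Tuple


def replica_plan(baseline_replicas: int, new_weight_percent: int) -> Tuple[int, int]:
    """Return (old_replicas, new_replicas) for a canary weight given in percent."""
    if not 0 <= new_weight_percent <= 100:
        raise ValueError("weight must be between 0 and 100")

    def ceil_share(percent: int) -> int:
        # Integer ceiling of baseline_replicas * percent / 100.
        return -(-baseline_replicas * percent // 100)

    return ceil_share(100 - new_weight_percent), ceil_share(new_weight_percent)


print(replica_plan(3, 20))   # prints "(3, 1)"
print(replica_plan(3, 50))   # prints "(2, 2)"
print(replica_plan(3, 100))  # prints "(0, 3)"
```

For the 3-replica httpbin service, a 20% canary therefore needs a single v2 replica, which matches the replica count chosen here.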

After clearing the routing rules left over from the other release strategies, we create a new routing rule that forwards traffic to the old and new versions according to the weights set on the target services.

The target service needs two destinations, versions v1 and v2 of httpbin, each with its traffic ratio.

Test the effect of the canary release with the following script.

for i in {1..10}; do curl "${GATEWAY_EXTERNAL_IP}/version"; echo "";  done
version: v1
version: v1
version: v1
version: v1
version: v1
version: v2
version: v1
version: v2
version: v1
version: v1           

In the test results above, 2 of the 10 requests reach the new version v2, which matches the expected traffic ratio of 8:2.
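
Checking the ratio by eye works for ten requests; for larger samples the responses can be tallied automatically. The following sketch parses lines of the form "version: vN" and compares the observed shares with the configured weights. The function names and the tolerance value are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable


def tally_versions(lines: Iterable[str]) -> Counter:
    """Count responses of the form 'version: vN' per version."""
    counts: Counter = Counter()
    for line in lines:
        key, _, value = line.partition(":")
        if key.strip() == "version" and value.strip():
            counts[value.strip()] += 1
    return counts


def matches_ratio(counts: Counter, expected: Dict[str, float],
                  tolerance: float = 0.05) -> bool:
    """True if every observed share is within `tolerance` of the expected share."""
    total = sum(counts.values())
    if total == 0:
        return False
    return all(
        abs(counts[version] / total - share) <= tolerance
        for version, share in expected.items()
    )


output = ["version: v1"] * 5 + ["version: v2", "version: v1",
                                "version: v2", "version: v1", "version: v1"]
counts = tally_versions(output)
print(counts["v1"], counts["v2"])                      # prints "8 2"
print(matches_ratio(counts, {"v1": 0.8, "v2": 0.2}))   # prints "True"
```

If the gateway picks a version at random according to the weights (an assumption, not something the article states), a sample of ten requests will not always split exactly 8:2, so a larger sample gives a more reliable check.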

In a real business scenario, once the new version has been verified you can keep increasing the traffic weight of the new version. Remember to scale out the new version along the way and to scale in the old version as needed.

3. Service discovery method: Nacos registry

The Kubernetes platform brings dynamic elasticity to containerized applications, speeds up application delivery, and improves the utilization of underlying resources. In terms of service discovery, however, its functional completeness and availability fall slightly short of mainstream registries such as Nacos and Consul. For this reason, many teams still choose to keep their original registry even after most of their business applications have been migrated to Kubernetes.

For this scenario, we provide additional examples of how to perform blue-green releases, A/B testing, and canary releases when a Nacos registry is used. The backend service in the example exposes an interface, /version, that returns its current version, which is v1 at this point. The cloud-native gateway is deeply integrated with the MSE Nacos registry and obtains service information from the Nacos instance dynamically in real time, which makes it easy to expose the backend service to external users through the gateway.

Apply the following resource (Deployment) to the ACK cluster to deploy the backend service and register it with the Nacos registry. The current application version is v1. Note the following points:

1. The variable ${NACOS_SERVER_ADDRESS} in the YAML resource needs to be replaced with the address of your MSE Nacos instance. If it is in the same VPC as the gateway, the intranet domain name is sufficient; otherwise, you need to use the public domain name.

2. In K8s service discovery, the labels of a Pod can be regarded as the metadata of the node. In a Nacos registry, the metadata of a node depends on the information the service carries when it registers. In the Spring Cloud framework, metadata can be added to a node non-intrusively through environment variables of the form spring.cloud.nacos.discovery.metadata.xxx. In this example we use version as the marker that distinguishes nodes of different versions, so the business container needs the environment variable spring.cloud.nacos.discovery.metadata.version=v1.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: httpbin
  template:
    metadata:
      labels:
        app: httpbin
    spec:
      containers:
      - image: specialyang/spring-cloud-httpbin-nacos:v1
        imagePullPolicy: Always
        name: spring-cloud-httpbin-nacos
        ports:
        - containerPort: 8080
        env:
        - name: spring.cloud.nacos.discovery.server-addr
          value: ${NACOS_SERVER_ADDRESS}
        - name: spring.cloud.nacos.discovery.metadata.version
          value: v1           

In the cloud-native gateway console, go to Service Management -> Source Management and add the target MSE Nacos registry cluster.

In Service Management, import the service httpbin that is to be exposed through the cloud-native gateway. Note that the service source must be set to the MSE Nacos registry.

As in the K8s Service discovery example, add the service version v1 in the policy configuration. For the tag name and tag value, select the metadata that was added when the httpbin service registered, version=v1. Then configure requests that match the path /version to be forwarded to version v1 of the httpbin service.

for i in {1..10}; do curl "${GATEWAY_EXTERNAL_IP}/version"; echo "";  done
version: v1
version: v1
version: v1
version: v1
version: v1
version: v1
version: v1
version: v1
version: v1
version: v1           

The blue-green release is shown in the following figure:

Deploy the new version v2 of the httpbin service. Note that the registry is the same as above and that the business container needs the environment variable spring.cloud.nacos.discovery.metadata.version=v2. When the business application starts, the service registers with the specified Nacos instance and carries the user-defined metadata. The cloud-native gateway can use this metadata to distinguish nodes of different versions.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin-v2
spec:
  replicas: 3
  selector:
    matchLabels:
      app: httpbin
  template:
    metadata:
      labels:
        app: httpbin
    spec:
      containers:
      - image: specialyang/spring-cloud-httpbin-nacos:v2
        imagePullPolicy: Always
        name: spring-cloud-httpbin-nacos
        ports:
        - containerPort: 8080
        env:
        - name: spring.cloud.nacos.discovery.server-addr
          value: ${NACOS_SERVER_ADDRESS}
        - name: spring.cloud.nacos.discovery.metadata.version
          value: v2           

As with the blue-green release in the K8s Service example, add the service version v2 in the policy configuration of the httpbin service, and then change the target service in the routing rule from version v1 of httpbin to version v2. After the change is published, every request returns version: v2.

3. A/B test

We use the same example as before: requests whose User-Agent contains Android (requests from Android devices) access the new version, while other systems still access the old version. The routing rules involved are configured and verified in the same way as in the K8s Service example.

Similarly, for the canary release, the routing rules involved are configured and verified in the same way as in the K8s Service example.

Summary

This article briefly introduces and analyzes the common release strategies and walks through each of them with figures and text. In summary:

  • Blue-green release: in short, traffic switching. Following the idea of hot standby, the new version of the service is deployed redundantly.
  • A/B testing: in short, request traffic is routed to different versions of the service based on the content of the request (header, cookie).
  • Canary release: a release strategy based on traffic proportion. One instance or a small batch of the new version is deployed, a small share (such as 1%) of requests is diverted to the new version, and the share is increased gradually until all user traffic has been switched to the new version.

The cloud-native gateway serves as your traffic entry point in a fully managed form. It provides rich traffic governance capabilities, supports multiple service discovery methods such as K8s Service, Nacos, Eureka, ECS, and domain names, and supports service versions and grayscale release in a unified model. As the practice above shows, the two service discovery methods differ only in where the metadata is stored, while the grayscale release model in service version management and routing rules is the same. You can therefore easily apply grayscale releases to services with different service discovery methods and keep the version upgrade smooth and lossless.

This article is the original content of Alibaba Cloud and may not be reproduced without permission.
