NGINX Tutorial: Improve Uptime and Resilience with a Canary Deployment

Original: https://www.nginx.com/blog/microservices-march-improve-kubernetes-uptime-and-resilience-with-a-canary-deployment/

Note: This tutorial is part of Microservices March 2022: Kubernetes Networking.


Your organization is successfully delivering apps in Kubernetes and now the team is ready to roll out v2 of a backend service. But there are valid concerns about traffic interruptions (a.k.a. downtime) and the possibility that v2 might be unstable. As the Kubernetes engineer, you need to find a way to ensure v2 can be tested and rolled out with little to no impact on customers.

You decide to implement a gradual, controlled migration using the traffic splitting technique “canary deployment” because it provides a safe and agile way to test the stability of a new feature or version. Your use case involves traffic moving between two Kubernetes services, so you choose to use NGINX Service Mesh because it’s easy and delivers reliable results. You send 10% of your traffic to v2 with the remaining 90% still routed to v1. Stability looks good, so you gradually transition larger and larger percentages of traffic to v2 until you reach 100%. Problem solved!

The easiest way to do this lab is to register for Microservices March 2022 and use the browser-based lab that’s provided. If you want to do it as a tutorial in your own environment, you need a machine with:

Note: This blog is written for minikube running on a desktop/laptop that can launch a browser window. If you’re in an environment where that’s not possible, then you’ll need to troubleshoot how to get to the services via a browser.

To get the most out of the lab and tutorial, we recommend that before beginning you:

This tutorial uses these technologies:

This tutorial includes three challenges:

  1. Deploy a Cluster and NGINX Service Mesh
  2. Deploy Two Apps (a Frontend and a Backend)
  3. Use NGINX Service Mesh to Implement a Canary Deployment

Challenge 1: Deploy a Cluster and NGINX Service Mesh

Deploy a minikube cluster. After a few seconds, a message confirms the deployment was successful.

$ minikube start \ 
--extra-config=apiserver.service-account-signing-key-file=/var/lib/minikube/certs/sa.key \ 
  --extra-config=apiserver.service-account-key-file=/var/lib/minikube/certs/sa.pub \ 
  --extra-config=apiserver.service-account-issuer=kubernetes/serviceaccount \ 
  --extra-config=apiserver.service-account-api-audiences=api 
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default 

Did you notice this minikube command looks different from the one in other Microservices March tutorials? The extra --extra-config flags enable Service Account Token Volume Projection on the Kubernetes API server, which NGINX Service Mesh requires for its SPIRE-based workload identity.

Deploy NGINX Service Mesh

NGINX Service Mesh is maintained by F5 NGINX and uses NGINX Plus as the sidecar. Although NGINX Plus is a commercial product, you get to use it for free as part of NGINX Service Mesh.

There are two installation options; Helm is the simplest and fastest method, so it’s what we use in this tutorial.
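
The helm install command in step 1 below expects the chart to be available locally as ./nginx-service-mesh. If you don’t already have it, one way to fetch it (a sketch that assumes the chart is published in NGINX’s public Helm repository; check the NGINX Service Mesh docs for the current location) is:

$ helm repo add nginx-stable https://helm.nginx.com/stable 
$ helm repo update 
$ helm pull nginx-stable/nginx-service-mesh --untar 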

  1. Download and install NGINX Service Mesh:

    $ helm install nms ./nginx-service-mesh --namespace nginx-mesh --create-namespace 

  2. Confirm that the NGINX Service Mesh pods are deployed, as indicated by the value Running in the STATUS column.

    $ kubectl get pods --namespace nginx-mesh 
    NAME                                  READY   STATUS 
    grafana-7c6c88b959-62r72              1/1     Running 
    jaeger-86b56bf686-gdjd8               1/1     Running 
    nats-server-6d7b6779fb-j8qbw          2/2     Running 
    nginx-mesh-api-7864df964-669s2        1/1     Running 
    nginx-mesh-metrics-559b6b7869-pr4pz   1/1     Running 
    prometheus-8d5fb5879-8xlnf            1/1     Running 
    spire-agent-9m95d                     1/1     Running 
    spire-server-0                        2/2     Running 
    

It may take up to 1.5 minutes for all the pods to deploy. In addition to the NGINX Service Mesh pods, there are pods for Grafana, Jaeger, NATS, Prometheus, and SPIRE. Check out the docs for information on how these tools work with NGINX Service Mesh.
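
If you’d rather not poll the pod list by hand, one optional way to wait for everything to become ready (standard kubectl, not specific to NGINX Service Mesh) is:

$ kubectl wait --namespace nginx-mesh --for=condition=Ready pods --all --timeout=120s 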

Challenge 2: Deploy Two Apps (a Frontend and a Backend)

The app deployment consists of two microservices: a backend that returns its name and version as a JSON response, and a frontend that issues a request to the backend every second.

Install the Backend-v1 App

  1. Using the text editor of your choice, create a YAML file called 1-backend-v1.yaml with the following contents:

    apiVersion: v1 
    kind: ConfigMap 
    metadata: 
      name: backend-v1 
    data: 
      nginx.conf: |- 
        events {} 
        http { 
            server { 
                listen 80; 
                location / { 
                    return 200 '{"name":"backend","version":"1"}'; 
                } 
            } 
        } 
    --- 
    apiVersion: apps/v1 
    kind: Deployment 
    metadata: 
      name: backend-v1 
    spec: 
      replicas: 1 
      selector: 
        matchLabels: 
          app: backend 
          version: "1" 
      template: 
        metadata: 
          labels: 
            app: backend 
            version: "1" 
          annotations: 
        spec: 
          containers: 
            - name: backend-v1 
              image: "nginx" 
              ports: 
                - containerPort: 80 
              volumeMounts: 
                - mountPath: /etc/nginx 
                  name: nginx-config 
          volumes: 
            - name: nginx-config 
              configMap: 
                name: backend-v1 
    --- 
    apiVersion: v1 
    kind: Service 
    metadata: 
      name: backend-svc 
      labels: 
        app: backend 
    spec: 
      ports: 
        - port: 80 
          targetPort: 80 
      selector: 
        app: backend 
    
  2. Deploy backend-v1:

    $ kubectl apply -f 1-backend-v1.yaml 
    configmap/backend-v1 created 
    deployment.apps/backend-v1 created 
    service/backend-svc created 
    
  3. Confirm that the backend-v1 pod and the backend-svc service deployed, as indicated by the value Running in the pod’s STATUS column.

    $ kubectl get pods,services 
    NAME                              READY   STATUS 
    pod/backend-v1-745597b6f9-hvqht   2/2     Running 
    
    NAME                  TYPE        CLUSTER-IP       PORT(S) 
    service/backend-svc   ClusterIP   10.102.173.77    80/TCP 
    service/kubernetes    ClusterIP   10.96.0.1        443/TCP 
    

You may be wondering, “Why does the backend-v1 pod show 2/2 in the READY column?” The pod is running two containers because NGINX Service Mesh injects its NGINX Plus sidecar proxy alongside the app container.

Deploy the Frontend App

  1. Create a YAML file called 2-frontend.yaml with the following contents. Notice the pod uses cURL to issue a request to the backend service (backend-svc) every second.

    apiVersion: apps/v1 
    kind: Deployment 
    metadata: 
      name: frontend 
    spec: 
      selector: 
        matchLabels: 
          app: frontend 
      template: 
        metadata: 
          labels: 
            app: frontend 
        spec: 
          containers: 
          - name: frontend 
            image: curlimages/curl:7.72.0 
            command: [ "/bin/sh", "-c", "--" ] 
            args: [ "sleep 10; while true; do curl -s http://backend-svc/; sleep 1 && echo ' '; done" ] 
    
  2. Deploy frontend:

    $ kubectl apply -f 2-frontend.yaml 
    deployment.apps/frontend created 
    
  3. Confirm that the frontend pod deployed, as indicated by the value Running in the STATUS column. Again, note that each pod shows two containers (2/2 in the READY column) because NGINX Service Mesh injects its sidecar into every pod.

    $ kubectl get pods 
    NAME                         READY   STATUS    RESTARTS 
    backend-v1-5cdbf9586-s47kx   2/2     Running   0 
    frontend-6c64d7446-mmgpv     2/2     Running   0 
    

Check Logs

Next, you will inspect the logs to verify that traffic is flowing from frontend to backend-v1. The command to retrieve logs requires you to piece it together using this format:

kubectl logs -c frontend <insert the full pod ID displayed in your terminal>

TIP: Once you’ve created this command, save it somewhere that’s easy to retrieve, as it will be used repeatedly in this tutorial.

The full pod ID is available in the previous step (frontend-6c64d7446-mmgpv) and is unique to your deployment. When you submit the command, the logs should report that all traffic is routing to backend-v1, which is expected since it’s your only backend.

$ kubectl logs -c frontend frontend-6c64d7446-mmgpv 

{"name":"backend","version":"1"} 
{"name":"backend","version":"1"} 
{"name":"backend","version":"1"} 
{"name":"backend","version":"1"} 
{"name":"backend","version":"1"} 
{"name":"backend","version":"1"} 

Inspect the Dependency Graph with Jaeger

What’s more interesting is that the NGINX Service Mesh sidecars deployed alongside the two apps are collecting metrics as the traffic flows. You can use this data to derive a dependency graph of your architecture with Jaeger.

  1. Use minikube service jaeger to open the Jaeger dashboard in a browser (if minikube reports that it can’t find the service, see the note after this list).
  2. Click the “System Architecture” tab, where you should see a very simple architecture (hover your cursor over the graph to get the labels). The DAG tab provides a magnified view. Imagine if you had dozens or even hundreds of backend services being accessed by frontend – this would be a very interesting graph!
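
Note: In this setup the Jaeger service was deployed into the nginx-mesh namespace along with the other mesh components, so you may need to pass the namespace to minikube explicitly:

$ minikube service jaeger --namespace nginx-mesh 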

Add Backend-v2

You will now deploy a second backend app backend-v2 that will also serve frontend. As the version number suggests, backend-v2 is a new version of backend-v1.

  1. Create a YAML file called 3-backend-v2.yaml with the following contents. Notice that it is nearly identical to 1-backend-v1.yaml: only the version label and the JSON response change to "2", and it does not yet define a Service (you create that in Challenge 3).

    apiVersion: v1 
    kind: ConfigMap 
    metadata: 
      name: backend-v2 
    data: 
      nginx.conf: |- 
        events {} 
        http { 
            server { 
                listen 80; 
                location / { 
                    return 200 '{"name":"backend","version":"2"}'; 
                } 
            } 
        } 
    --- 
    apiVersion: apps/v1 
    kind: Deployment 
    metadata: 
      name: backend-v2 
    spec: 
      replicas: 1 
      selector: 
        matchLabels: 
          app: backend 
          version: "2" 
      template: 
        metadata: 
          labels: 
            app: backend 
            version: "2" 
          annotations: 
        spec: 
          containers: 
            - name: backend-v2 
              image: "nginx" 
              ports: 
                - containerPort: 80 
              volumeMounts: 
                - mountPath: /etc/nginx 
                  name: nginx-config 
          volumes: 
            - name: nginx-config 
              configMap: 
                name: backend-v2 
    
  2. Deploy backend-v2:

    $ kubectl apply -f 3-backend-v2.yaml 
    configmap/backend-v2 created 
    deployment.apps/backend-v2 created 
    
  3. Confirm that the backend-v2 pod deployed, as indicated by the value Running in the STATUS column.
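
    As in the earlier steps, you can check with kubectl get pods (your pod names will differ from the examples shown in this tutorial):

    $ kubectl get pods 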

Inspect the Logs

Using the same command as earlier, inspect the logs. You should now see responses evenly distributed between both backend versions, because the backend-svc selector (app: backend) matches the pods of both v1 and v2.

$ kubectl logs -c frontend frontend-6c64d7446-mmgpv 

{"name":"backend","version":"1"} 
{"name":"backend","version":"2"} 
{"name":"backend","version":"1"} 
{"name":"backend","version":"2"} 
{"name":"backend","version":"1"} 
{"name":"backend","version":"2"} 
{"name":"backend","version":"1"} 

Return to Jaeger

Return to the Jaeger tab to see if NGINX Service Mesh was able to correctly map both backend versions.

Challenge 3: Use NGINX Service Mesh to Implement a Canary Deployment

So far in this tutorial, you deployed two versions of backend: v1 and v2. While you could immediately move all traffic to v2, it’s best practice to test stability before entrusting a new version with production traffic. A canary deployment is a perfect technique for this use case.

What is a Canary Deployment?

As discussed in the blog How to Improve Resilience in Kubernetes with Advanced Traffic Management, a canary deployment is a type of traffic split that provides a safe and agile way to test the stability of a new feature or version. A typical canary deployment starts with a high share (say, 99%) of your users on the stable version and moves a tiny group (the other 1%) to the new version. If the new version fails, for example crashing or returning errors to clients, you can immediately move the test group back to the stable version. If it succeeds, you can switch users from the stable version to the new one, either all at once or (as is more common) in a gradual, controlled migration.

This diagram depicts a canary deployment using an Ingress controller to split traffic.

Canary Deployments Between Services

In this tutorial, you’re going to set up a traffic split so 90% of the traffic goes to backend-v1 and 10% goes to backend-v2.

While an Ingress controller is used to split traffic when it’s flowing from clients to a Kubernetes service, it can’t be used to split traffic between services. There are two options for implementing this type of canary deployment:

Option 1: The Hard Way
You could instruct a proxy on the frontend pod to send 9 out of 10 requests to backend-v1. But imagine if you have dozens of replicas of frontend. Do you really want to be manually updating all those proxies? No! It would be error-prone and time-consuming.
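
To make the pain concrete, here is a purely hypothetical sketch (not part of this tutorial) of what that hand-maintained proxy configuration might look like in NGINX terms, using the per-version service names you create later in Challenge 3; every frontend replica would need its own copy, updated for every weight change:

events {}
http {
    # Weighted load balancing: 9 of every 10 requests go to v1, 1 goes to v2
    upstream backend {
        server backend-v1 weight=9;
        server backend-v2 weight=1;
    }
    server {
        listen 80;
        location / {
            proxy_pass http://backend;
        }
    }
}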

Option 2: The Better Way
Observability and control are excellent reasons to use a service mesh, and you can do much more with one. A service mesh is also the ideal tool for implementing a traffic split between services, because you can apply a single policy to all of the frontend replicas served by the mesh!

Using NGINX Service Mesh for Traffic Splits

NGINX Service Mesh implements the Service Mesh Interface (SMI), a specification that defines a standard interface for service meshes on Kubernetes, with typed resources such as TrafficSplit, TrafficTarget, and HTTPRouteGroup. With these standard Kubernetes configurations, NGINX Service Mesh and the NGINX SMI extensions make traffic splitting policies, like canary deployment, simple to deploy with minimal interruption to production traffic.

In this diagram from the blog How Do I Choose? API Gateway vs. Ingress Controller vs. Service Mesh, you can see how NGINX Service Mesh implements a canary deployment between services with conditional routing based on HTTP/S criteria.

NGINX Service Mesh’s architecture – like all meshes – has a data plane and a control plane. Because NGINX Service Mesh leverages NGINX Plus for the data plane, it’s able to perform advanced deployment scenarios.

Create the Canary Deployment

The NGINX Service Mesh control plane is configured with Kubernetes custom resources (defined by CRDs). It uses the Kubernetes Services to retrieve the list of pod IP addresses and ports, then combines that information with the instructions in the custom resource and tells the sidecars to route traffic directly to the pods.
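
You can list the SMI CRDs that NGINX Service Mesh installed (exact names may vary by release; the one that matters here is the TrafficSplit CRD, which matches the split.smi-spec.io apiVersion used in the next step):

$ kubectl get crds | grep smi-spec.io 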

  1. Using the text editor of your choice, create a YAML file called 5-split.yaml that defines the traffic split, using the SMI TrafficSplit resource:

    apiVersion: split.smi-spec.io/v1alpha3 
    kind: TrafficSplit 
    metadata: 
      name: backend-ts 
    spec: 
      service: backend-svc 
      backends: 
      - service: backend-v1 
        weight: 90 
      - service: backend-v2 
        weight: 10 
    

    Notice that the TrafficSplit references three services (backend-svc, backend-v1, and backend-v2), but so far you have created only backend-svc.

  2. Before implementing the traffic split, you must create the missing services. Create a YAML file called 4-services.yaml with the following contents:

    apiVersion: v1 
    kind: Service 
    metadata: 
      name: backend-v1 
      labels: 
        app: backend 
        version: "1" 
    spec: 
      ports: 
        - port: 80 
          targetPort: 80 
      selector: 
        app: backend 
        version: "1" 
    --- 
    apiVersion: v1 
    kind: Service 
    metadata: 
      name: backend-v2 
      labels: 
        app: backend 
        version: "2" 
    spec: 
      ports: 
        - port: 80 
          targetPort: 80 
      selector: 
        app: backend 
        version: "2" 
    
  3. Add your services:

    $ kubectl apply -f 4-services.yaml 
    service/backend-v1 created 
    service/backend-v2 created 
    

Observe the Logs

Before you implement the traffic split, check the logs to see how traffic is flowing without a TrafficSplit in place. You should see that traffic is still being evenly split between v1 and v2.

$ kubectl logs -c frontend frontend-6c64d7446-mmgpv 

{"name":"backend","version":"1"} 
{"name":"backend","version":"2"} 
{"name":"backend","version":"1"} 
{"name":"backend","version":"2"} 
{"name":"backend","version":"1"} 
{"name":"backend","version":"2"} 

Implement the Canary Deployment

  1. Apply the TrafficSplit resource:

    $ kubectl apply -f 5-split.yaml 
    trafficsplit.split.smi-spec.io/backend-ts created 
    
  2. Observe the logs again. Now, you should see 90% of traffic being delivered to v1, as defined in 5-split.yaml.

    $ kubectl logs -c frontend frontend-6c64d7446-mmgpv 
    
    {"name":"backend","version":"1"} 
    {"name":"backend","version":"2"} 
    {"name":"backend","version":"1"} 
    {"name":"backend","version":"1"} 
    {"name":"backend","version":"1"} 
    

Execute a Rollover to V2

It’s rare that you’ll want to do a 90/10 traffic split and then immediately move all your traffic over to the new version. Instead, best practice is to move traffic incrementally. For example: 0%, 5%, 10%, 25%, 50%, and 100%. To illustrate how easy it can be to implement an incremental rollover, you’ll change the weighting to 20/80 and then 0/100.

  1. Edit 5-split.yaml so that backend-v1 gets 20% of traffic and backend-v2 gets the remaining 80%.

    apiVersion: split.smi-spec.io/v1alpha3 
    kind: TrafficSplit 
    metadata: 
      name: backend-ts 
    spec: 
      service: backend-svc 
      backends: 
      - service: backend-v1 
        weight: 20 
      - service: backend-v2 
        weight: 80 
    
  2. Apply the changes:

    $ kubectl apply -f 5-split.yaml 
    trafficsplit.split.smi-spec.io/backend-ts configured 
    
  3. Observe the logs to see the changes in action:

    $ kubectl logs -c frontend frontend-6c64d7446-mmgpv 
    
    {"name":"backend","version":"2"} 
    {"name":"backend","version":"1"} 
    {"name":"backend","version":"2"} 
    {"name":"backend","version":"2"} 
    {"name":"backend","version":"2"} 
    
  4. To complete the rollover, edit 5-split.yaml so that backend-v1 gets 0% of the traffic and backend-v2 gets 100%.

    apiVersion: split.smi-spec.io/v1alpha3 
    kind: TrafficSplit 
    metadata: 
      name: backend-ts 
    spec: 
      service: backend-svc 
      backends: 
      - service: backend-v1 
        weight: 0 
      - service: backend-v2 
        weight: 100 
    
  5. Apply the changes:

    $ kubectl apply -f 5-split.yaml 
    trafficsplit.split.smi-spec.io/backend-ts configured 
    
  6. Observe the logs to see the change in action. All responses have shifted to backend-v2, which means your rollover is complete!

    $ kubectl logs -c frontend frontend-6c64d7446-mmgpv 
    
    {"name":"backend","version":"2"} 
    {"name":"backend","version":"2"} 
    {"name":"backend","version":"2"} 
    {"name":"backend","version":"2"} 
    {"name":"backend","version":"2"} 
    

Next Steps

You can use this blog to implement the tutorial in your own environment or try it out in our browser-based lab (register here). To learn more on the topic of exposing Kubernetes services, follow along with the other activities in Unit 4: Advanced Kubernetes Deployment Strategies:

  1. Watch the high-level overview webinar
  2. Review the collection of technical blogs and videos

NGINX Service Mesh is completely free. You can download it using Helm (the method leveraged in this tutorial) or through F5 Downloads.

Retrieved by Nick Shadrin from nginx.com website.