How to Use NGINX Service Mesh for Rate Limiting

Original: https://www.nginx.com/blog/how-to-use-nginx-service-mesh-for-rate-limiting/

It doesn’t matter whether the intent is malicious (brute‑force password guessing and DDoS attacks) or benign (customers flocking to a sale) – a high volume of HTTP requests can overwhelm your services and cause your apps to crash. An easy solution to the problem is rate limiting, which restricts the number of requests each user can make in a given time period. In a Kubernetes environment, however, a significant part of the total volume of traffic reaching a service might be outside of the purview of the Ingress controller, in the form of communication with other services. In this situation it often makes sense to set up rate‑limiting policies using a service mesh.

Configuring rate limiting with NGINX Service Mesh is a simple task which you can complete in less than 10 minutes. Check out this demo to see it in action and read on to learn how to define and apply a rate‑limiting policy.

Demo: Configuring Rate Limiting with NGINX Service Mesh

The demo uses three containers injected with the NGINX Service Mesh sidecar: a backend service, a frontend service, and a bash terminal. The NGINX Service Mesh control plane has also been deployed.
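
Although not shown in the demo, a quick way to verify this setup is to list the pods: with the sidecar injected, each application pod reports two ready containers (2/2 in the READY column), and the control plane pods run in their own namespace (nginx-mesh by default; adjust the name if the mesh was deployed to a custom namespace):

$ kubectl get pods
$ kubectl get pods -n nginx-mesh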

The frontend service sends a request to the backend service every second, and we can see the responses in the frontend service’s log:

backend v1
backend v1
backend v1
backend v1
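
To follow this log ourselves, we can tail the frontend Deployment. Because the sidecar has been injected, each pod runs more than one container, so we name the application container explicitly (assumed here to be called frontend):

$ kubectl logs -f deploy/frontend -c frontend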

Applying a Rate-Limiting Policy (1:00)

Suppose we don’t want the backend service to receive so many requests. We can define a rate‑limiting policy as a custom resource with the following fields:

apiVersion: specs.smi.nginx.com/v1alpha1
kind: RateLimit
metadata:
  name: backend-rate-limit
  namespace: default
spec:
  destination:
    kind: Service
    name: backend-svc
    namespace: default
  sources:
  - kind: Deployment
    name: frontend
    namespace: default
  name: 10rm
  rate: 10r/m
  burst: 0
  delay: nodelay

We run this command to activate the policy:

$ kubectl create -f rate-limit.yaml
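
We can confirm that the policy was created by listing RateLimit resources:

$ kubectl get ratelimits.specs.smi.nginx.com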

In the log for the frontend, we see that five of every six requests are denied with this message (the 10r/m limit allows one request every 6 seconds, while the frontend sends one per second):

<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body>
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>nginx/1.19.5</center>
</body>
</html>

Applying the Rate Limit to All Clients (2:32)

The rate limit applies only to the client named in the sources field (our frontend service). The backend service accepts requests from all other clients at whatever rate they send them. We can illustrate this by repeatedly sending requests in the bash terminal; each request receives the backend v1 response that indicates success.
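
For example, a request from the bash terminal might look like this (the service’s port is not shown in the demo; we assume it listens on the default HTTP port 80):

$ curl http://backend-svc.default
backend v1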

There are two ways to apply the rate limit to all clients. The first is to add their names to the sources field. The second, and much simpler, way is to remove the sources field entirely. We do that by running this command to edit the policy:

$ kubectl edit ratelimits.specs.smi.nginx.com backend-rate-limit
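
With the sources field removed, the resource looks roughly like this (a sketch that keeps the other field values from the original policy):

apiVersion: specs.smi.nginx.com/v1alpha1
kind: RateLimit
metadata:
  name: backend-rate-limit
  namespace: default
spec:
  destination:
    kind: Service
    name: backend-svc
    namespace: default
  name: 10rm
  rate: 10r/m
  burst: 0
  delay: nodelay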

After saving the edited policy, we again make requests in the bash terminal and see that requests exceeding the rate limit are now rejected with the formatted 503 error shown above.

Allowing Bursts of Requests (3:43)

There are a couple of other fields we can add to the policy to customize the rate limit. We know that some apps are “bursty”, sending multiple requests in rapid succession. To accommodate this, we can add the burst field. Here we set it to 3, meaning that the backend service accepts that many additional requests in each six‑second period. Requests beyond that are rejected.
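
As an excerpt, that change looks like this (only the relevant fields of the spec are shown; the rest of the policy is unchanged):

  rate: 10r/m
  burst: 3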

The delay field controls how the allowed burst of requests is fed to the backend service. Without it (that is, by default), burst requests are queued and are sent according to the rate limit, interleaved with new requests. To send burst requests immediately, we set the delay field to the value nodelay.

You can also set the delay field to an integer. For example, if we set it to 3 and increase the burst field to 5, then when five or more burst requests arrive within a six‑second period, three are sent immediately, two are queued, and the rest are rejected.
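
As an excerpt, that configuration would be:

  rate: 10r/m
  burst: 5
  delay: 3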

The log below shows the effect of setting burst: 3 and delay: nodelay; three extra requests are accepted before a request is rejected:

backend v1
backend v1
backend v1
backend v1
<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body>
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>nginx/1.19.5</center>
</body>
</html>
. . .

Removing the Rate Limit (6:30)

The final action in our demo is to run this command, which deactivates the rate‑limiting policy so that all requests are again accepted:

$ kubectl delete -f rate-limit.yaml

Try NGINX Service Mesh for Rate Limiting

For details on the burst and delay parameters, see the reference documentation. For a discussion of other traffic‑management patterns, read How to Improve Resilience in Kubernetes with Advanced Traffic Management on our blog.

NGINX Service Mesh is completely free, available for immediate download, and can be deployed in less than 10 minutes! To get started, check out the docs and let us know how it goes via GitHub.

Retrieved by Nick Shadrin from nginx.com website.