Introduction
Load balancing is the process of distributing network traffic efficiently across a group of backend servers.
NGINX can proxy and load balance TCP (Transmission Control Protocol) traffic and UDP (User Datagram Protocol) traffic.
NGINX and NGINX Plus can continually test the health of upstream servers, avoid the servers that have failed, and add the recovered servers into the load‑balanced group.
In this article, we will learn about TCP and UDP health checks. So let's dive in.
TCP Health Checks
We can send periodic health checks, including customized active health checks in NGINX Plus, to observe the health of TCP servers in the upstream group.
Passive TCP health checks
If an attempt to connect to an upstream server times out or fails with an error, NGINX can mark the server as unavailable and stop forwarding client connections to it for a set period of time. Include the following parameters in the server directive to define the conditions under which NGINX considers an upstream server unavailable:
fail_timeout: The time during which the specified number of connection attempts must fail for the server to be considered unavailable. This is also the time for which NGINX continues to consider the server unavailable after marking it as such.
max_fails: The number of failed connection attempts within fail_timeout required for NGINX to consider the server unavailable.
The defaults are 10 seconds and 1 attempt: NGINX marks a server as unavailable for 10 seconds if even a single connection attempt times out or fails within a 10-second interval. The following example shows how to set these parameters to allow two failures within 30 seconds:
upstream stream_backend {
    server backend1.example.com:12345 weight=5;
    server backend2.example.com:12345 max_fails=2 fail_timeout=30s;
    server backend3.example.com:12346 max_conns=3;
}
Server Slow Start: A recently recovered upstream server can easily be overwhelmed by connections, which may cause it to be marked unavailable again. Slow start allows an upstream server to gradually recover its weight from zero to its nominal value after it becomes available again. This is configured with the slow_start parameter of the upstream server directive:
upstream backend {
    server backend1.example.com:12345 slow_start=30s;
    server backend2.example.com;
    server 192.0.0.1 backup;
}
Active TCP Health Checks
Health checks can test for a wide range of failure types. For example, NGINX Plus can periodically check upstream servers for responsiveness and avoid servers that have failed.
NGINX Plus sends special health check requests to each upstream server and checks for a response that satisfies certain conditions. If a connection to the server cannot be established, the health check fails and the server is considered unhealthy. NGINX Plus does not proxy client connections to unhealthy servers. If more than one health check is defined for an upstream group, the failure of any check is enough to consider the corresponding server unhealthy.
To enable active health checks, follow these steps:
1. Set up a shared memory zone, a special area in which the NGINX Plus worker processes share counters and connection state information. In the upstream server group, add the zone directive, specifying the zone name (here, stream_backend) and the amount of memory (64 KB):
stream {
    #...
    upstream stream_backend {
        zone stream_backend 64k;
        server backend1.example.com:12345;
        server backend2.example.com:12345;
        server backend3.example.com:12345;
    }
    #...
}
2. Enable active health checks for the upstream group with the health_check directive:
stream {
    #...
    server {
        listen 12345;
        proxy_pass stream_backend;
        health_check;
        #...
    }
}
3. If necessary, reduce the timeout between two consecutive health checks with the health_check_timeout directive. For health checks, this directive overrides the proxy_timeout value, as the timeout for health checks needs to be significantly shorter:
stream {
    #...
    server {
        listen 12345;
        proxy_pass stream_backend;
        health_check;
        health_check_timeout 5s;
    }
}
4. By default, NGINX Plus sends health check probes to the port specified by the server directive in the upstream block. You can specify a different port for health checks, which is useful, for example, when monitoring the health of many services on the same host. To override the port, use the port parameter of the health_check directive:
stream {
    #...
    server {
        listen 12345;
        proxy_pass stream_backend;
        health_check port=12346;
        health_check_timeout 5s;
    }
}
Fine-tuning TCP health checks
By default, NGINX Plus tries to connect to each server in an upstream server group every 5 seconds. If the connection cannot be established, NGINX Plus considers the health check failed, marks the server as unhealthy, and stops forwarding client connections to it.
To change the default behavior, add the following parameters to the health_check directive:
- interval: How often, in seconds, NGINX Plus sends health check requests (default: 5 seconds).
- passes: The number of consecutive health checks the server must pass to be considered healthy (default: 1).
- fails: The number of consecutive health checks the server must fail to be considered unhealthy (default: 1).
In the example below, the interval between TCP health checks is increased to 10 seconds, the server is considered unhealthy after three consecutive failed checks, and it must pass two consecutive checks to be considered healthy again.
stream {
    #...
    server {
        listen 12345;
        proxy_pass stream_backend;
        health_check interval=10 passes=2 fails=3;
    }
    #...
}
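The directives covered above can be combined in a single stream configuration. The sketch below reuses the illustrative hostnames and ports from the earlier examples; it pairs passive parameters (max_fails, fail_timeout, slow_start) with an active health check probing a dedicated port:

```nginx
stream {
    upstream stream_backend {
        # Shared memory zone, required for active health checks
        zone stream_backend 64k;

        # Passive checks: mark a server unavailable after 2 failed
        # connection attempts in 30s, then ramp its weight back up
        # over 30s once it recovers
        server backend1.example.com:12345 max_fails=2 fail_timeout=30s slow_start=30s;
        server backend2.example.com:12345 max_fails=2 fail_timeout=30s slow_start=30s;
    }

    server {
        listen 12345;
        proxy_pass stream_backend;

        # Active checks every 10s on a separate monitoring port:
        # unhealthy after 3 consecutive failures, healthy after 2 passes
        health_check interval=10 passes=2 fails=3 port=12346;
        health_check_timeout 5s;
    }
}
```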