Introduction
Load balancing is the process of distributing network traffic efficiently across a group of backend servers.
NGINX can proxy and load balance TCP (Transmission Control Protocol) traffic and UDP (User Datagram Protocol) traffic.
NGINX and NGINX Plus can continually test the health of upstream servers, avoid the servers that have failed, and add the recovered servers into the load‑balanced group.
In this article, we will learn about TCP and UDP health checks. So let's dive in.
TCP Health Checks
We can send periodic health checks, including customized active health checks in NGINX Plus, to observe the health of TCP servers in the upstream group.
Passive TCP health checks
If an attempt to connect to an upstream server times out or fails with an error, NGINX can mark the server as unavailable and stop forwarding client connections to it for a set period of time. Include the following parameters in the server directive to define the conditions under which NGINX considers an upstream server unavailable:
fail_timeout: The time during which the specified number of connection attempts must fail for the server to be considered unavailable. This is also the time for which NGINX continues to consider the server unavailable after marking it as such.
max_fails: The number of failed connection attempts within fail_timeout required for NGINX to consider the server unavailable.
The defaults are 10 seconds and 1 attempt: NGINX marks a server as unavailable for 10 seconds if even a single connection attempt times out or fails within a 10-second interval. The following example shows how to set these parameters to allow two failures within 30 seconds:
upstream stream_backend {
    server backend1.example.com:12345 weight=5;
    server backend2.example.com:12345 max_fails=2 fail_timeout=30s;
    server backend3.example.com:12346 max_conns=3;
}
Server Slow Start: A recently recovered upstream server can easily be overwhelmed by connections, which may cause it to be marked unavailable again. Slow start allows an upstream server to gradually recover its weight from zero to its nominal value after it becomes available again. This is configured with the slow_start parameter of the upstream server directive:
upstream backend {
    server backend1.example.com:12345 slow_start=30s;
    server backend2.example.com;
    server 192.0.0.1 backup;
}
Active TCP Health Checks
Health checks can test for a wide range of failure types. For example, NGINX Plus can periodically check upstream servers for responsiveness and avoid servers that have failed.
NGINX Plus sends special health check requests to each upstream server and checks for a response that satisfies certain conditions. If a connection to the server cannot be established, the health check fails and the server is considered unhealthy. NGINX Plus does not proxy client connections to unhealthy servers. If more than one health check is defined for an upstream group, the failure of any check is enough to consider the corresponding server unhealthy.
To enable active health checks, follow these steps:
1. Set up a shared memory zone, a special area in which the NGINX Plus worker processes share counters and connection state information. In the upstream server group, add the zone directive, specifying the zone name (here, stream_backend) and the amount of memory (64 KB):
stream {
    #...
    upstream stream_backend {
        zone stream_backend 64k;
        server backend1.example.com:12345;
        server backend2.example.com:12345;
        server backend3.example.com:12345;
    }
    #...
}
2. Enable active health checks for the upstream group with the health_check directive:
stream {
    #...
    server {
        listen 12345;
        proxy_pass stream_backend;
        health_check;
        #...
    }
}
3. If necessary, reduce the timeout between two consecutive health checks with the health_check_timeout directive. For health checks, this directive overrides the proxy_timeout value, as the timeout for health checks needs to be significantly shorter:
stream {
    #...
    server {
        listen 12345;
        proxy_pass stream_backend;
        health_check;
        health_check_timeout 5s;
    }
}
4. By default, NGINX Plus sends health check probes to the port specified by the server directive in the upstream block. You can specify a different port for health checks, which is useful, for example, when monitoring the health of many services on the same host. To override the port, use the port parameter of the health_check directive:
stream {
    #...
    server {
        listen 12345;
        proxy_pass stream_backend;
        health_check port=12346;
        health_check_timeout 5s;
    }
}
Fine-tuning TCP health checks
By default, NGINX Plus tries to connect to each server in an upstream server group every 5 seconds. If the connection cannot be established, NGINX Plus considers the health check failed, marks the server as unhealthy, and stops forwarding client connections to it.
To change the default behavior, add the following parameters to the health_check directive:
- interval: How often, in seconds, NGINX Plus sends health check requests (default: 5 seconds).
- passes: The number of consecutive health checks the server must pass to be considered healthy (default: 1).
- fails: The number of consecutive health checks the server must fail to be considered unhealthy (default: 1).
In the example below, the interval between TCP health checks is increased to 10 seconds, the server is considered unhealthy after three consecutive failed checks, and it must pass two consecutive checks to be considered healthy again.
stream {
    #...
    server {
        listen 12345;
        proxy_pass stream_backend;
        health_check interval=10 passes=2 fails=3;
    }
    #...
}
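The directives covered above can be combined in a single stream configuration. The sketch below reuses the illustrative hostnames and ports from the earlier examples; it pairs passive parameters (max_fails, fail_timeout, slow_start) with an active health check probing a dedicated port:

```nginx
stream {
    upstream stream_backend {
        # Shared memory zone, required for active health checks
        zone stream_backend 64k;

        # Passive checks: mark a server unavailable after 2 failed
        # connection attempts in 30s, then ramp its weight back up
        # over 30s once it recovers
        server backend1.example.com:12345 max_fails=2 fail_timeout=30s slow_start=30s;
        server backend2.example.com:12345 max_fails=2 fail_timeout=30s slow_start=30s;
    }

    server {
        listen 12345;
        proxy_pass stream_backend;

        # Active checks every 10s on a separate monitoring port:
        # unhealthy after 3 consecutive failures, healthy after 2 passes
        health_check interval=10 passes=2 fails=3 port=12346;
        health_check_timeout 5s;
    }
}
```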