Modern Load Balancing: L4 vs L7, Algorithms, and Health Checks
Load balancing distributes incoming traffic across many backend servers so no single one becomes a bottleneck. It’s also how you get high availability — if one server dies, the load balancer simply stops sending traffic to it.
Layer 4 vs Layer 7
Load balancers operate at one of two layers of the OSI model:
| | Layer 4 (Transport) | Layer 7 (Application) |
|---|---|---|
| What it sees | IPs and ports only | Full HTTP, headers, paths, cookies |
| Speed | Very fast (millions of conn/sec) | Slower (terminates TLS, parses HTTP) |
| Routing decisions | By IP/port hash | By URL path, host, header, cookie |
| Examples | AWS NLB, HAProxy TCP mode | NGINX, Envoy, AWS ALB, Cloudflare |
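To make the "what it sees" row concrete, here is a minimal sketch of the two kinds of routing decision. The backend addresses, hostnames, pool names, and the functions `l4_pick` and `l7_pick` are all hypothetical, and real load balancers use their own hash functions and rule engines; this only illustrates which inputs each layer has available.

```python
import hashlib
from typing import List, Optional

BACKENDS = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]  # hypothetical pool


def l4_pick(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
            backends: List[str]) -> str:
    """Layer 4: only addresses and ports are visible, so hash the connection tuple."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Layer 7: the request has been parsed, so rules can match on host and path.
# (host, path prefix, pool name); hypothetical rules, first match wins.
L7_ROUTES = [
    ("api.example.com", "/v1/", "api-pool"),
    ("www.example.com", "/static/", "static-pool"),
    ("www.example.com", "/", "web-pool"),
]


def l7_pick(host: str, path: str) -> Optional[str]:
    """Return the pool for the first matching rule, or None if nothing matches."""
    for rule_host, prefix, pool in L7_ROUTES:
        if host == rule_host and path.startswith(prefix):
            return pool
    return None
```

The Layer 4 function never looks past the connection tuple, which is why it is cheap. The Layer 7 function needs a parsed request, which in practice means the load balancer has already terminated TLS.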
The classic algorithms
- Round-robin: rotate through servers in order. Simple, fair when all servers are equal.
- Least connections: send the next request to the server with the fewest open connections. Better when request times vary wildly.
- IP hash: a hash of the client’s IP address picks the server. Provides session stickiness without cookies, though many clients behind one NAT will all land on the same server.
- Weighted: bigger servers get a higher share. Useful for heterogeneous fleets.
- Power-of-two-choices: pick two servers at random, send to the less loaded. Surprisingly close to optimal at very low cost.
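The five algorithms above fit in a few lines each. The sketch below is illustrative rather than how any particular load balancer implements them: the `Server` class and its fields are assumptions, the caller is assumed to keep `active` up to date, and "weighted" is shown as weighted random selection (production systems often use a weighted round-robin variant instead).

```python
import hashlib
import itertools
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    name: str
    weight: int = 1   # relative capacity, used by weighted selection
    active: int = 0   # open connections, maintained by the caller


class RoundRobin:
    """Rotate through servers in a fixed order."""

    def __init__(self, servers: List[Server]) -> None:
        self._cycle = itertools.cycle(servers)

    def pick(self) -> Server:
        return next(self._cycle)


def least_connections(servers: List[Server]) -> Server:
    """Pick the server with the fewest open connections."""
    return min(servers, key=lambda s: s.active)


def ip_hash(servers: List[Server], client_ip: str) -> Server:
    """The same client IP maps to the same server while the pool is unchanged."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


def weighted(servers: List[Server], rng=random) -> Server:
    """Weighted random choice: a server with weight 3 gets about 3x the share."""
    return rng.choices(servers, weights=[s.weight for s in servers], k=1)[0]


def power_of_two(servers: List[Server], rng=random) -> Server:
    """Sample two distinct servers and keep the less loaded (needs >= 2 servers)."""
    a, b = rng.sample(servers, 2)
    return a if a.active <= b.active else b
```

Note that `power_of_two` only inspects two load counters per request, whereas `least_connections` scans the whole pool. That is where the "very low cost" comes from once the fleet gets large.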
Health checks are everything
A load balancer is only as good as its health checks. The classic mistakes:
- Checking / instead of a real /health endpoint: your homepage might return 200 while the database is on fire.
- Health endpoint checks downstream dependencies: one slow database takes the whole fleet out of rotation simultaneously.
- Overly aggressive thresholds: a single failed check pulls a node, causing thundering herds and cascading failures.
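One common way to avoid the threshold mistake is to require several consecutive failures before pulling a node, and several consecutive successes before putting it back. The sketch below assumes that approach; the class name `HealthTracker` and the default values are hypothetical, and the `rise`/`fall` naming is borrowed from HAProxy's server options.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    fall: int = 3         # consecutive failures before marking the node down (assumed default)
    rise: int = 2         # consecutive successes before marking it up again (assumed default)
    healthy: bool = True
    _streak: int = 0      # current run of results that contradict `healthy`

    def record(self, check_passed: bool) -> bool:
        """Feed in one health check result; return the node's current state."""
        if check_passed == self.healthy:
            # Result agrees with the current state, so any opposing streak is broken.
            self._streak = 0
            return self.healthy
        self._streak += 1
        threshold = self.rise if check_passed else self.fall
        if self._streak >= threshold:
            self.healthy = check_passed
            self._streak = 0
        return self.healthy
```

With `fall=3`, a single timed-out check changes nothing. The node only leaves rotation after three failures in a row, which filters out blips without hiding a real outage for long.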
The sweet spot: a lightweight /health that checks only local liveness, plus a separate /ready for orchestrators.
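A minimal sketch of that split, using only the Python standard library. The function `dependencies_ready` is a hypothetical placeholder, and having /ready (rather than /health) be the endpoint that looks at dependencies is an assumption about how you would divide the work, not something every orchestrator requires.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Tuple


def dependencies_ready() -> bool:
    """Hypothetical placeholder: replace with real checks (DB pool open, caches warm)."""
    return True


def probe_response(path: str,
                   ready: Callable[[], bool] = dependencies_ready) -> Tuple[int, str]:
    """Map a probe path to (status code, body)."""
    if path == "/health":
        # Liveness only: the process is up and can answer. No downstream calls.
        return 200, "ok"
    if path == "/ready":
        # Readiness: may consult dependencies, intended for the orchestrator.
        return (200, "ready") if ready() else (503, "not ready")
    return 404, "not found"


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = probe_response(self.path)
        payload = body.encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Blocking; call this from your own entry point."""
    HTTPServer(("127.0.0.1", port), ProbeHandler).serve_forever()
```

Point the load balancer at /health and the orchestrator at /ready. A slow database then delays new pods from receiving traffic without ejecting every running node at once.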
Connection draining
When you remove a backend (deploy, scale-in), the load balancer should stop sending new connections but let existing ones finish. AWS calls this “deregistration delay,” NGINX calls it graceful shutdown. Always set this — without it, every deploy means dropped requests.
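The mechanics are simple enough to sketch: refuse new connections, then wait for the in-flight count to reach zero or for a timeout to expire. The class `DrainableBackend` and its method names are hypothetical, and real load balancers track this per target inside their own connection tables.

```python
import threading


class DrainableBackend:
    def __init__(self, name: str) -> None:
        self.name = name
        self._lock = threading.Lock()
        self._idle = threading.Condition(self._lock)  # shares the same lock
        self._in_flight = 0
        self._draining = False

    def try_acquire(self) -> bool:
        """Called for each new connection; refuses once draining has started."""
        with self._lock:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        """Called when a connection finishes."""
        with self._idle:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._idle.notify_all()

    def drain(self, timeout_s: float) -> bool:
        """Stop new connections, then wait up to timeout_s for existing ones.

        Returns True if the backend went idle, False if the timeout expired
        (at which point the caller would typically close what is left).
        """
        with self._idle:
            self._draining = True
            return self._idle.wait_for(lambda: self._in_flight == 0,
                                       timeout=timeout_s)
```

The timeout matters as much as the drain itself. Set it too short and long requests still get cut off; set it too long and a single stuck connection stalls every deploy.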
Global load balancing
One load balancer per region only takes you so far. For a global service you layer GeoDNS or Anycast on top — users hit the closest region, and inside that region a regional LB picks a server. Cloudflare, Fastly, and the major clouds all offer this as managed services.
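The two-tier decision can be sketched as "pick a region, then pick a server". The example below is a simplified stand-in: real GeoDNS relies on IP geolocation data and Anycast relies on network routing, whereas this uses great-circle distance from assumed client coordinates. The region names, coordinates, and server names are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees

# Hypothetical regions and their approximate locations.
REGIONS: Dict[str, Coord] = {
    "us-east": (39.0, -77.5),
    "eu-west": (53.3, -6.3),
    "ap-southeast": (1.35, 103.8),
}


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points on Earth, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_region(client: Coord, candidates: List[str]) -> str:
    """Tier 1: the closest region among those still serving traffic."""
    return min(candidates, key=lambda r: haversine_km(client, REGIONS[r]))


def route(client: Coord, fleet: Dict[str, Dict[str, int]]) -> Tuple[str, str]:
    """fleet maps region -> {server name: open connections}.

    A region with no servers is treated as down and skipped.
    Raises ValueError if every region is empty.
    """
    live = [region for region, servers in fleet.items() if servers]
    region = nearest_region(client, live)
    servers = fleet[region]
    # Tier 2: the regional LB picks a server, here by least connections.
    return region, min(servers, key=servers.get)
```

Because unhealthy regions are filtered out before the distance comparison, losing a whole region sends its users to the next closest one instead of failing them.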
What to learn next
Load balancers live in front of cloud VPCs and often pair with a CDN. Understanding TCP and TLS is essential to debug them.