Building resilient applications that can handle high-volume user requests without slowing down requires the right tools to manage demand across multiple systems. Load balancing is one of the most common ways to achieve this.
What Is Load Balancing?
Load balancing is the process of distributing client requests among a pool of application servers for optimal resource usage and system reliability. It involves the use of either a hardware device or software known as a load balancer, which serves as an intermediary between clients and servers.
For instance, Facebook and other popular social media platforms rely on this process to maintain low response times and minimal downtime while serving millions of people daily.
Even for apps and sites with a much smaller user base, brief slowdowns or interruptions can lead to churn, because customers know there is always an alternative that meets those expectations.
How Does Load Balancing Work?
Load balancers sit between incoming client traffic and a group of backend servers, receiving and routing requests to the right server based on a chosen algorithm and preconfigured rules.
In most implementations, the server handles the request and sends its response back to the client through the load balancer. In some setups, known as direct server return, the server responds to the client directly.
Server load balancing isn't restricted to specific protocols and can work at transport and application layers. Developers may use it for traffic over HTTP/HTTPS, TCP, UDP, WebSockets, and more.
Additionally, while the load balancer doesn't launch or terminate servers itself, it can add new servers to its pool or remove existing ones in response to scaling policies.
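To make the routing step concrete, here is a toy sketch of the core loop in Python. The server names, health flags, and random selection are all invented for illustration; real balancers use the algorithms covered below.

```python
import random

# Invented pool: server name -> healthy flag (kept current by health checks).
servers = {"app-1": True, "app-2": True, "app-3": False}

def route(request_id: int) -> str:
    """Pick any healthy server for this request; a stand-in for a real algorithm."""
    healthy = [name for name, ok in servers.items() if ok]
    if not healthy:
        raise RuntimeError("no healthy servers available")
    target = random.choice(healthy)
    return f"request {request_id} handled by {target}"

print(route(42))  # e.g. "request 42 handled by app-1"
```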
Load Balancing Process
Let's examine this process using a finance app as an example.
When a customer initiates a fund transfer, the banking client sends the corresponding request to the load balancer, which acts as a reverse proxy, receiving this and other inbound requests and directing each to an appropriate backend.
Depending on the setup, the load balancer regularly assesses the health of the servers using techniques like TCP connection tests, HTTP checks, or pinging. It will only allow healthy servers to receive new requests, skipping those that are unresponsive or that failed health checks.
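As a rough sketch, an HTTP health check could look like this in Python; the backend addresses and the /health endpoint are assumptions here, not a universal convention.

```python
import urllib.request

# Hypothetical backend pool; servers failing the check are skipped.
BACKENDS = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]

def healthy_backends(timeout: float = 2.0) -> list:
    healthy = []
    for base_url in BACKENDS:
        try:
            # A 2xx response within the timeout marks the server healthy.
            with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
                if 200 <= resp.status < 300:
                    healthy.append(base_url)
        except OSError:
            pass  # connection refused, timeout, or HTTP error: skip it
    return healthy
```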
It routes the request and other traffic to the appropriate server(s) based on its configurations and real-time load, using a developer-selected algorithm to determine which will receive and process the fund transfer.
If the customer started their transfer during peak hours, there would be a larger pool of potential servers to receive it than during a lower-demand period.
After processing the transfer, the server sends a response back to the load balancer, which then routes it to the client.
Despite multiple steps, the request-response cycle typically completes in milliseconds. From the point of view of the bank account owner, their transaction was practically instantaneous.
Load Balancing Algorithms
Developers have several algorithms to choose from that distribute network flows differently. Below are some of the most common:
Round Robin
This load balancing algorithm cycles through the list of available servers, assigning requests sequentially. While it distributes them evenly by count, it does not consider factors like server speed or the request duration.
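A round-robin picker fits in a few lines of Python; the server names are placeholders.

```python
import itertools

servers = ["app-1", "app-2", "app-3"]  # placeholder pool
rotation = itertools.cycle(servers)

def pick_round_robin() -> str:
    # Each call returns the next server in sequence, wrapping around.
    return next(rotation)
```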
IP-Based
Also known as IP hash, this algorithm directs requests from the same user to the same server by hashing the client's IP address and mapping the result to a specific server. This makes it well suited for session persistence.
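A minimal sketch of the idea in Python; the pool and hash choice are illustrative, and production balancers often use consistent hashing so that adding or removing a server reshuffles as few clients as possible.

```python
import hashlib

servers = ["app-1", "app-2", "app-3"]  # placeholder pool

def pick_by_ip(client_ip: str) -> str:
    # The same IP always hashes to the same server while the pool is stable.
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]
```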
Weighted Round Robin
This algorithm routes network traffic based on the capacity of the servers. Those with a higher capacity have more requests routed to them.
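One simple sketch repeats each server in the rotation proportionally to its weight; the weights here are invented.

```python
import itertools

# app-1 has twice the capacity, so it appears twice per rotation
# and receives twice the requests.
weighted_pool = ["app-1", "app-1", "app-2", "app-3"]
rotation = itertools.cycle(weighted_pool)

def pick_weighted_round_robin() -> str:
    return next(rotation)
```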
Least Connection Method
This method balances traffic by routing requests to servers with the fewest active connections. It's particularly effective when sessions are of variable length or when some servers might overload faster than others.
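A sketch, assuming the balancer tracks open connections per server; the counts below are invented.

```python
# Live connection counts, updated as requests start and finish.
active_connections = {"app-1": 12, "app-2": 4, "app-3": 9}

def pick_least_connections() -> str:
    # Route to whichever server currently has the fewest open connections.
    return min(active_connections, key=active_connections.get)
```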
Least Response Time Method
This algorithm compares response times and sends users to the servers that respond the fastest. It's best suited for latency-sensitive applications like real-time multiplayer gaming or interactive web conferencing, where milliseconds can impact the user experience.
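A sketch using invented metrics; one common formulation also factors in active connections so that a fast but already-busy server isn't overloaded.

```python
# Invented moving-average response times (ms) and live connection counts.
avg_response_ms = {"app-1": 38.0, "app-2": 21.5, "app-3": 57.2}
active_connections = {"app-1": 3, "app-2": 5, "app-3": 2}

def pick_least_response_time() -> str:
    # Weigh response time by open connections; the lowest score wins.
    return min(avg_response_ms,
               key=lambda s: avg_response_ms[s] * (active_connections[s] + 1))
```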
Hardware vs. Software Load Balancers
This traffic-distribution technology is available as either hardware or software, each providing tradeoffs in reliability, performance, and cost.
Hardware Load Balancers
These are dedicated physical devices, historically valued for their specialized chips, dependability, speed, and high throughput.
Hardware load balancers have higher upfront costs and require regular on-site maintenance, or replacement when faulty. They're also difficult to scale up or down; as a result, they've become less prevalent in cloud-based setups.
Software Load Balancers
These are software solutions that run on virtual machines or general-purpose servers. They're highly scalable and compatible with modern cloud and virtualized environments.
That said, hardware implementations are still well-suited for enterprise data centers where maximum efficiency, reliability, and security are mandatory.
In contrast, running software is best suited for organizations that require cost-effective application scalability and agility, such as with workloads that fluctuate or when rapid deployment is necessary.
Benefits of Load Balancing
This process provides numerous benefits for both infrastructure and end-user experiences. These include:
Improved Application Performance and Fault Tolerance
By intelligently distributing network traffic across multiple servers, a load balancer prevents resource bottlenecks, providing low latency and reliability while optimizing resource utilization.
It can also prevent downtime by redirecting workloads to other available servers when one fails.
High Scalability
This process makes it easy to add more servers to the pool to meet spikes in traffic and keep it flowing smoothly. When demand drops, teams can save money by removing excess servers.
Enhanced Security
Load balancers act as the first line of defense against distributed denial-of-service (DDoS) attacks and similar cyber threats. You can configure them to filter abnormal or suspicious requests before they reach the backend.
Predictable Planning
By providing consistent resource usage, load balancers simplify the forecasting of traffic patterns, which can help your organization better plan for capacity and growth with a lower risk of overprovisioning resources.
Load Balancer Types
There are several load balancer types that distribute network traffic in specialized ways. These include:
Application Load Balancers
Designed with cloud platforms in mind, application load balancers operate at Layer 7 of the Open Systems Interconnection (OSI) model, routing requests based on headers, URL paths, or cookies. They use configurable rules and policies to direct traffic, making them suitable for modern web applications, microservices, and APIs.
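Path-prefix rules are one common Layer 7 policy; this Python sketch uses hypothetical pools and prefixes.

```python
# Hypothetical Layer 7 rules: route by URL path prefix, with a default pool.
ROUTES = {
    "/api/":    ["api-1", "api-2"],
    "/static/": ["cdn-edge-1"],
}
DEFAULT_POOL = ["web-1", "web-2"]

def pick_pool(path: str) -> list:
    # First matching prefix wins; everything else goes to the default pool.
    for prefix, pool in ROUTES.items():
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL

print(pick_pool("/api/v1/users"))  # ['api-1', 'api-2']
```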
DNS Load Balancers
This type distributes traffic at the DNS level: in response to queries for a single domain, the DNS server returns different server IP addresses.
There are four methods:
- Round Robin DNS: The authoritative name server holds the A (IPv4) and AAAA (IPv6) records for a domain and cycles through them, returning a different address for each query to spread out requests (see the sketch after this list).
- Weighted DNS: Each server's IP is assigned a weight, with more powerful servers appearing more frequently in DNS responses and receiving more traffic.
- Geolocation: The DNS server infers the client's geographic location from their IP address and returns the IP address of the nearest server.
- Health Check-Based: The DNS server monitors server health and returns only the IP addresses of servers that are online and responsive.
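You can observe round-robin DNS from any client: resolving a domain that publishes multiple A records returns several addresses, and their order often rotates between queries. A quick check using Python's standard library (the domain is a placeholder):

```python
import socket

# Substitute a domain known to publish multiple A records.
infos = socket.getaddrinfo("example.com", 80, proto=socket.IPPROTO_TCP)
for family, socktype, proto, canonname, sockaddr in infos:
    print(sockaddr[0])  # each distinct IP is a candidate server
```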
Network Load Balancers
Network load balancers operate at Layer 4 (the transport layer) of the OSI model, receiving requests and routing them based on network data such as IP addresses and TCP/UDP ports.
They're fast and efficient for high-throughput scenarios like livestreaming and online gaming. More advanced solutions can perform packet inspection.
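For instance, a common Layer 4 technique hashes the connection's 4-tuple so every packet in a TCP or UDP flow reaches the same backend; the names in this sketch are placeholders.

```python
import hashlib

servers = ["edge-1", "edge-2", "edge-3"]  # placeholder pool

def pick_by_flow(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> str:
    # All packets of one flow share the 4-tuple, so they hash to the
    # same backend without the balancer tracking any per-flow state.
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]
```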
Global Server Load Balancers
This type distributes traffic to a geographically distributed network of interconnected servers. Traffic is usually directed to the optimal data center based on proximity, latency, and load.
This method works well for large enterprise setups with global data centers and high-volume throughput.
Virtual Load Balancers
Virtual types use software to distribute traffic within virtualized or cloud environments without using dedicated hardware. Depending on the product, they can operate at Layer 4, Layer 7, or both. Examples include AWS Elastic Load Balancer, Azure Load Balancer, and VMware Avi Load Balancer.
Load Balancing Best Practices
Below are some best practices for optimal results:
Use Monitoring Tools
Actively monitor both server and load balancer performance. Use tools like Prometheus, Grafana, or Datadog to track health, network speeds, and error rates. It's also crucial to configure alerts to detect anomalies that may need quick responses.
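As one possibility, here is a sketch of exposing balancer metrics with the third-party prometheus_client Python package; the metric names and port are invented.

```python
from prometheus_client import Counter, Gauge, start_http_server

REQUESTS = Counter("lb_requests_total", "Requests routed, by backend", ["backend"])
HEALTHY = Gauge("lb_healthy_backends", "Backends currently passing health checks")

start_http_server(9100)  # exposes /metrics for Prometheus to scrape

def record_route(backend: str, healthy_count: int) -> None:
    REQUESTS.labels(backend=backend).inc()
    HEALTHY.set(healthy_count)
```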
Choose the Right Type
Always select the load balancer type based on the application's specific needs, such as scaling, latency sensitivity, and throughput.
For instance, you should pick the:
- Network type for fast, high-throughput workloads.
- Application type for HTTP-based apps with complex routing.
- Global server type when you have a worldwide user base.
Leverage Autoscaling
To handle traffic fluctuations more efficiently, implement autoscaling to automatically add or remove servers. Configure the load balancer to detect and route traffic to the new servers, maintaining optimal performance during spikes or drops.
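A threshold-based policy is one simple approach; the thresholds and pool limits below are invented for illustration.

```python
# Scale out above 75% average CPU, scale in below 25%.
SCALE_OUT_CPU, SCALE_IN_CPU = 0.75, 0.25
MIN_SERVERS, MAX_SERVERS = 2, 10

def desired_pool_size(current: int, avg_cpu: float) -> int:
    if avg_cpu > SCALE_OUT_CPU and current < MAX_SERVERS:
        return current + 1  # add a server; the balancer starts routing to it
    if avg_cpu < SCALE_IN_CPU and current > MIN_SERVERS:
        return current - 1  # drain and remove an idle server
    return current
```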
Implement Caching Where Appropriate
Use caching mechanisms like in-memory data stores, content delivery networks (CDNs), and reverse proxies. Caching static or frequently accessed content reduces the number of requests reaching backend servers, improves response times, and reduces load.
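A minimal time-to-live (TTL) cache in front of the backend might look like this sketch; the TTL and fetch hook are placeholders.

```python
import time
from typing import Callable

_cache: dict = {}      # url -> (stored_at, body)
TTL_SECONDS = 30.0     # invented expiry window

def cached_fetch(url: str, fetch: Callable[[str], bytes]) -> bytes:
    now = time.monotonic()
    hit = _cache.get(url)
    if hit and now - hit[0] < TTL_SECONDS:
        return hit[1]          # served from cache; the backend never sees it
    body = fetch(url)          # cache miss: reach the backend once
    _cache[url] = (now, body)
    return body
```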
Encrypt Traffic
Set your load balancer up to handle SSL/TLS termination, which decrypts requests before they are forwarded to the backend. You can use passthrough mode for end-to-end encryption.
You must also regularly update and manage SSL certificates to prevent spoofing, compliance violations, and other potential security issues.
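For illustration, the terminating side might look like this Python sketch; certificate paths and the port are placeholders, and a real deployment would loop over connections and forward the decrypted traffic to a backend.

```python
import socket
import ssl

# Decrypt client TLS here; traffic to the backend can stay plaintext
# on a trusted network or be re-encrypted.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain(certfile="lb.crt", keyfile="lb.key")  # placeholder certs

listener = socket.create_server(("0.0.0.0", 8443))
with ctx.wrap_socket(listener, server_side=True) as tls_listener:
    conn, addr = tls_listener.accept()  # TLS handshake completes here
    request = conn.recv(4096)           # already-decrypted bytes
    # ...forward `request` to a backend over the internal network...
    conn.close()
```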
Frequently Asked Questions
What Are the Two Types of Load Balancing?
The two main load balancing algorithms are static and dynamic.
Static load balancing algorithms distribute network traffic according to predefined rules and do not factor in server status or workload. In contrast, dynamic load balancing algorithms rely on server status, taking into account response times, current load, and more.
What Happens if a Load Balancer Goes Down?
When one of these systems fails, the service becomes unavailable, as requests cannot be routed to the backend servers.
To prevent this, you should implement a redundant or highly available setup, such as active-passive failover or clustering. With this setup, whenever one goes down, another can take over immediately.
What Are the Disadvantages of Load Balancing?
This process has challenges, including added configuration complexity and the risk of the load balancer itself becoming a single point of failure.
Load balancers have several configuration options that all play a role in performance, like timeouts, health checks, algorithms, and failover settings. When improperly set up, your application may experience bottlenecks, downtime, uneven traffic distribution, or total system failure.
Additionally, deploying and maintaining the necessary tools can increase infrastructure costs.
What’s the Difference Between Load Balancing and Caching?
Load balancing is the process of distributing traffic across backend servers to prevent one from being overloaded. Specialized hardware or software directs client requests to the most appropriate servers.
Caching is the process of temporarily storing data for quick access and to minimize server requests.
What Is Load Balancing in APIs?
Load balancing API traffic is essentially the same process.
Consider a popular dating app that uses a chat API to facilitate communication between potential matches.
When a user sends a message, the app delivers its request to the API provider’s load balancer, which directs it to the right server based on the dev team’s rules, chosen algorithm, and more.
The server’s response usually returns through the same load balancer to the app, which then shows the user a sent indicator or an error message once the request is processed.
Along with API throttling and rate limiting, this is one of the many methods developers can use to scale to meet customer needs as their product becomes more popular.