# Hachyderm Load Balancing
### The "professional" Solution
If you want to avoid any single point of failure, you would probably end up with an ingress architecture like this:

The infrastructure is connected to the internet via two routers, which have independent uplinks. Both routers announce the same IP range to the internet via BGP. If one router goes down, the BGP session to the ISP times out and the route via the failed router is removed from the ISP's routers. All traffic from the internet to our infrastructure then flows via the remaining router, as it still has an established BGP session with the ISP.
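To make the failover mechanics a bit more concrete, here is a minimal sketch of the BGP session towards the ISP on one of the two edge routers, written as an FRR configuration. The AS numbers, neighbor addresses, timers and the announced /24 are made-up placeholders, not the actual Hachyderm setup.

```
! Hypothetical frr.conf sketch for one of the two edge routers.
! AS numbers, neighbor addresses and the announced prefix are placeholders.
router bgp 65001
 bgp router-id 198.51.100.1
 ! eBGP session to this router's ISP uplink
 neighbor 198.51.100.254 remote-as 64900
 ! keepalive 10s / hold time 30s: the ISP withdraws our routes
 ! roughly 30 seconds after this router stops responding
 neighbor 198.51.100.254 timers 10 30
 address-family ipv4 unicast
  ! the public range both routers announce
  ! (the prefix must be present in the local routing table)
  network 63.228.108.0/24
 exit-address-family
```

The second router runs the same configuration with its own router ID and its own ISP neighbor.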
Connected to the routers is a pair of HTTP load balancers. Every load balancer announces the IP address `63.228.108.114/32` to the routers. Routers do not implement special health checks to detect whether a peer is still online; instead, they rely on BGP sessions. Network participants announce prefixes via BGP, and as soon as a BGP session breaks down, a router removes all routes it has received from that peer. If a router receives routes for the same destination from multiple peers, it distributes traffic across those peers, typically by hashing each connection onto one of the equal-cost paths. This is called Equal Cost Multi Path (ECMP). In the drawing above, both load balancers would receive an equal amount of traffic for `63.228.108.114`.
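The internal side can be sketched the same way: each load balancer announces the service address as a /32, and the router keeps both received paths installed. FRR is just one possible routing daemon for this (BIRD or ExaBGP would do as well), and all AS numbers and addresses below are again placeholders.

```
! Hypothetical frr.conf on each load balancer. The service address also
! has to be configured locally, e.g.: ip addr add 63.228.108.114/32 dev lo
router bgp 65010
 neighbor 10.0.0.1 remote-as 65001
 address-family ipv4 unicast
  network 63.228.108.114/32
 exit-address-family
```

```
! Hypothetical frr.conf on the router: peer with both load balancers and
! install both equal-cost paths for the /32 (ECMP).
router bgp 65001
 neighbor 10.0.0.11 remote-as 65010
 neighbor 10.0.0.12 remote-as 65010
 address-family ipv4 unicast
  maximum-paths 2
 exit-address-family
```

If one load balancer dies, its BGP session drops and the router is left with a single path for `63.228.108.114/32`, so all traffic goes to the surviving instance.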
You can use nginx, haproxy etc. to build the load balancers. The load balancers can offload SSL, and you can implement different traffic distribution mechanisms. The simplest solution would be to distribute every connection round-robin to the servers. If you want *sticky sessions*, the HTTP requests need to carry a session identifier (e.g. a session cookie). The load balancer can use this identifier to determine which server the request should be forwarded to. The load balancer itself can inject a cookie into the HTTP responses which holds information about the target server, so every following request hits the same server.

Note that session cookies are only necessary if you want stickiness across different TCP connections! A TCP connection is identified by its 5-tuple: source IP, destination IP, protocol (TCP), source port and destination port. HTTP/1.1 keeps TCP connections open to allow multiple requests to be sent to the server without requiring a new handshake for every request. HTTP/2 allows multiple requests in parallel over the same TCP connection. You only need sticky sessions if you want to hit the same server after establishing a *new* TCP connection, which has a different 5-tuple.
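As an illustration of such a load balancer, here is a minimal HAProxy sketch with SSL offloading and cookie-based stickiness. The certificate path, server names, IPs and ports are placeholders; an equivalent setup could be built with nginx.

```
# Hypothetical haproxy.cfg excerpt; certificate path, server names,
# IPs and ports are placeholders.
frontend https_in
    mode http
    bind *:443 ssl crt /etc/haproxy/certs/hachyderm.pem
    default_backend mastodon_web

backend mastodon_web
    mode http
    balance roundrobin
    # Inject a cookie into the response so that follow-up requests,
    # even over new TCP connections, land on the same server.
    cookie SRV insert indirect nocache
    server web1 10.0.1.11:3000 check cookie web1
    server web2 10.0.1.12:3000 check cookie web2
    server web3 10.0.1.13:3000 check cookie web3
```

Without the `cookie` lines this is a plain round-robin load balancer; with them, HAProxy sets `SRV=web1` (or `web2`, `web3`) in the response and routes subsequent requests carrying that cookie back to the same server.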
### Single Load Balancer instance
The above concept avoids single points of failure. But this comes at a price: You need a pair of uplinks, a pair of routers and a pair of load balancers. When you cut this in half, you would end up with this solution:

You introduced three single points of failure: the uplink, the router and the load balancer.
### Layer 4 Load Balancing

If you do not require session stickiness, you could remove the load balancer and let the router (or in our case the pfSense) send the packets directly to the servers. There are two options for forwarding the requests to the servers:
1. Routing - Every server would listen to the public IP `63.228.108.114` and serve traffic directly.
2. NAT Port Forwarding with load balancing - The pfSense would perform DNAT and load balance the requests across the servers. pfSense also supports health checks on the endpoints, so if one server goes down, pfSense removes it from the target pool (a rough sketch of the corresponding `pf` rule follows below).
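pfSense is configured through its GUI (a port forward plus its relayd-based server load balancer), but conceptually the second option boils down to a `pf` redirect rule roughly like this sketch. The interface name and backend addresses are placeholders.

```
# Conceptual pf rule for option 2; interface name and backend IPs are
# placeholders. pf itself does not health-check the servers; pfSense's
# relayd-based monitor removes dead servers from the pool.
wan_if      = "igb0"
web_servers = "{ 10.0.1.11, 10.0.1.12, 10.0.1.13 }"
rdr on $wan_if proto tcp from any to 63.228.108.114 port { 80, 443 } -> $web_servers round-robin
```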
### Sticky Sessions Load Balancing
If you like the last architecture but you really want sticky sessions, you could put a small load balancer instance on every server. The pfSense would distribute the requests evenly over all servers, but the requests would not hit the web app directly; they would first hit the local load balancer instance. The load balancer then decides, based on the session cookie, which server the request will be forwarded to.
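A rough sketch of such a per-server instance, again as an HAProxy configuration with made-up addresses and ports: the pfSense forwards the public traffic to a local port on every server, and the local HAProxy either hands the request to the web app on the same host or to the peer that owns the session cookie.

```
# Hypothetical haproxy.cfg on server 1; addresses, ports and the cookie
# name are placeholders. TLS could also be terminated here.
frontend local_in
    mode http
    bind *:8080
    default_backend mastodon_web

backend mastodon_web
    mode http
    balance roundrobin
    cookie SRV insert indirect nocache
    # the web app on this host plus the web apps on the other servers
    server web1 127.0.0.1:3000 check cookie web1
    server web2 10.0.1.12:3000 check cookie web2
    server web3 10.0.1.13:3000 check cookie web3
```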

This solution comes with CPU and networking overhead on every server. But the forwarded requests stay in RAM and never hit the disk, so I don't expect a lot of additional load on the system with this architecture.