Tonight at our LUG we had a great presentation about building a large-scale LTSP network. As part of this, the presenters needed to spread the load across a number of different machines, but they didn't have the ability to deploy a traditional stand-alone load-balancer.
I mentioned options of using CLUSTERIP or unifying the load-balancer with the application machines, so that a dedicated load-balancer isn't needed. I wanted to give some more information about these options because they aren't as well known as the more traditional methods. Read on for more information.
CLUSTERIP is an iptables module which allows the same IP address to be set up on multiple machines. It uses a multicast MAC address, so all machines receive the incoming requests, but based on a hashing algorithm all but one of the machines ignore each request. You can specify hashing to be based on the source IP, source IP/port, or source IP/port and destination port.
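To make the "everyone receives it, one node answers" idea concrete, here is a minimal Python sketch of the decision each node makes when hashing on the source IP. It is only an illustration: the function names are made up for this post, and CRC32 is a stand-in for the hash the kernel module actually uses.

```python
import ipaddress
import zlib


def responsible_node(source_ip: str, total_nodes: int) -> int:
    """Return the 1-based node number that should answer this client."""
    packed = ipaddress.ip_address(source_ip).packed
    # Stand-in hash (assumption): the real module uses its own hash, not CRC32.
    return zlib.crc32(packed) % total_nodes + 1


def should_handle(source_ip: str, total_nodes: int, local_node: int) -> bool:
    """Every node sees the packet; only the node whose number matches keeps it."""
    return responsible_node(source_ip, total_nodes) == local_node
```

Because every node computes the same hash over the same packet fields, exactly one of them decides the packet is its own, and a given client keeps landing on the same node without the nodes having to talk to each other.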
Up front you have to specify how many nodes are in the CLUSTERIP cluster, and a node number for each node that it runs on. So, adding or removing nodes requires either a complete restart of the CLUSTERIP rules, or setting up enough CLUSTERIP node numbers in advance to handle expansion. In the latter case some machines will hold multiple node numbers, and perhaps an uneven number of them.
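The over-provisioning approach can be sketched in a few lines of Python. This is a hypothetical helper, not part of CLUSTERIP or linux-ha; it just deals a fixed set of node numbers out to whatever machines currently exist, which shows where the unevenness comes from.

```python
from typing import Dict, List


def assign_node_numbers(machines: List[str], total_nodes: int) -> Dict[str, List[int]]:
    """Deal node numbers 1..total_nodes out to the machines round-robin."""
    if not machines:
        raise ValueError("need at least one machine")
    assignment: Dict[str, List[int]] = {name: [] for name in machines}
    for node_number in range(1, total_nodes + 1):
        # Machine names are placeholders; any stable ordering would do.
        owner = machines[(node_number - 1) % len(machines)]
        assignment[owner].append(node_number)
    return assignment
```

With six node numbers and four machines, `assign_node_numbers(["a", "b", "c", "d"], 6)` gives `a` and `b` two node numbers each and `c` and `d` one each, so the first two machines would see roughly twice the share of clients.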
Distributing the CLUSTERIP rules is probably best done by linux-ha.
However, you can also run a traditional load-balancer to distribute the load. The Ultramonkey site has a recipe for this topology, in which the load-director runs on the same nodes as the services.
The recipe is described for two-node clusters in which both nodes serve as load-balancers and handle requests. However, it should be possible to have more than just the two service nodes. The benefit of this mechanism is that you can use the normal load-balancing algorithms, spread the load unevenly, add and remove nodes, etc…
The method that they selected was to set up a DHCP server on each service node, and give each one 1/Nth of the leases. Once one server fills up, leases from the next will start being used. This is a fine solution, and is quite simple (which is good), but it may also limit some of the flexibility and performance that you might otherwise see if the load were split more evenly based on actual usage.
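As a rough illustration of the 1/Nth split, the following Python sketch carves one address pool into N contiguous ranges, one per DHCP server. The function name and the addresses are my own examples, not the configuration from the presentation.

```python
import ipaddress
from typing import List, Tuple


def split_pool(first: str, last: str, servers: int) -> List[Tuple[str, str]]:
    """Split the inclusive range first..last into `servers` contiguous ranges."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    total = end - start + 1
    if servers < 1 or total < servers:
        raise ValueError("need at least one address per server")
    base, extra = divmod(total, servers)
    ranges = []
    cursor = start
    for index in range(servers):
        # The first `extra` servers absorb the remainder, one address each.
        size = base + (1 if index < extra else 0)
        high = cursor + size - 1
        ranges.append((str(ipaddress.IPv4Address(cursor)),
                       str(ipaddress.IPv4Address(high))))
        cursor = high + 1
    return ranges
```

For example, `split_pool("192.168.0.10", "192.168.0.249", 4)` yields four ranges of 60 addresses each, starting at `192.168.0.10`, `.70`, `.130` and `.190`. Each range would then go into the DHCP configuration on one service node.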
Considering the tight timeline that they were under for the deployment they presented, the DHCP mechanism is probably the best solution. However, I did want to mention that there were other alternatives.