
IPVS load-balancers work great, but they are typically deployed on one or, more usually, two additional machines. For people wanting to deploy a 2 machine cluster and grow it, or even start with one machine but configure it to grow, this additional set of machines is a large overhead.

A much less frequently used option, but one that has been around forever, is the CLUSTERIP iptables module. This module works by allowing multiple machines to cooperatively have the same IP address, but only one will handle a specific connection or client machine.

Read on for more details about this little-used, but very useful, clustering mechanism.

Overview

CLUSTERIP is available in many stock kernels, including CentOS 5 and Ubuntu Hardy. It is an iptables module that computes a hash of each incoming packet. If the hash matches the local node identifier, the packet is passed; otherwise it is dropped.

This is accomplished by using a multicast MAC address so that multiple machines all receive the packets for a single IP address.

So, multiple machines have the same IP, and get all the incoming packets, but only one of the systems allows those packets through the firewall.
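To make the idea concrete, here is a rough sketch of the decision each node makes for a client address. It is not the kernel's algorithm: the module uses its own hash function, and cksum is only standing in for it here. The function names owner_node and should_accept are made up for illustration, and the client addresses are documentation examples.

```shell
#!/bin/sh
# Illustration only: cksum stands in for the module's real hash function.

# owner_node SOURCE_IP TOTAL_NODES
# Prints the node number (1..TOTAL_NODES) that would own this client.
owner_node() {
    src_ip=$1
    total_nodes=$2
    # cksum prints "<crc> <byte count>"; keep only the CRC.
    crc=$(printf '%s' "$src_ip" | cksum | cut -d' ' -f1)
    echo $(( crc % total_nodes + 1 ))
}

# should_accept SOURCE_IP TOTAL_NODES LOCAL_NODE
# Succeeds (exit 0) only on the node that owns the client.
should_accept() {
    [ "$(owner_node "$1" "$2")" -eq "$3" ]
}

# Every node runs the same calculation on the same packet,
# so exactly one of them lets it through.
for ip in 192.0.2.10 192.0.2.11 198.51.100.7; do
    echo "$ip -> node $(owner_node "$ip" 2)"
done
```

Because the result depends only on the source address and the node count, a given client keeps landing on the same node without the nodes ever talking to each other.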

Getting Started

First of all, you will need to have the IP address up on all machines of the cluster, for example by editing the /etc/sysconfig/network-scripts or /etc/network/interfaces configuration files. In these examples we'll use the IP address 10.31.3.236.

Now set up iptables rules for this IP, for example:

iptables -I INPUT -d 10.31.3.236 -j CLUSTERIP --new \
   --hashmode sourceip --clustermac 01:aa:7b:47:f7:d7 \
   --total-nodes 2 --local-node 1

That's on the first node. On the second node, use “--local-node 2”.
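Since the rule is identical on every node apart from the node number, a small helper can print it for you. This is just a convenience sketch: clusterip_rule is a name I made up, it only prints the command rather than running it, and the IP and MAC are the example values from above.

```shell
#!/bin/sh
CLUSTER_IP=10.31.3.236
CLUSTER_MAC=01:aa:7b:47:f7:d7

# clusterip_rule LOCAL_NODE [TOTAL_NODES]
# Prints (does not execute) the iptables command for the given node.
# TOTAL_NODES defaults to 2, matching the example cluster.
clusterip_rule() {
    local_node=$1
    total_nodes=${2:-2}
    printf 'iptables -I INPUT -d %s -j CLUSTERIP --new --hashmode sourceip --clustermac %s --total-nodes %s --local-node %s\n' \
        "$CLUSTER_IP" "$CLUSTER_MAC" "$total_nodes" "$local_node"
}

# Show the command for each node; review it, then run it as root on that node.
clusterip_rule 1
clusterip_rule 2
```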

Now you just need to bring up the services on the machines so they can respond on that IP.

You can check the status of the CLUSTERIP by reviewing the contents of “/proc/net/ipt_CLUSTERIP/10.31.3.236”. It'll say “1” on the primary node, or be blank if it's not up.

You can also modify the CLUSTERIP while it's up by echoing “+1” or “-1” into that same file, which adds node 1 to the local machine or removes it.
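Here is a minimal sketch of wrapping those reads and writes in shell functions. The function names and the PROC_FILE override are my own additions for illustration; the override just lets you point the helpers at a scratch file to try them without the module loaded.

```shell
#!/bin/sh
CLUSTER_IP=10.31.3.236
# Defaults to the real proc entry; override PROC_FILE to experiment safely.
PROC_FILE=${PROC_FILE:-/proc/net/ipt_CLUSTERIP/$CLUSTER_IP}

# Show which node numbers this machine currently answers for.
clusterip_status() {
    cat "$PROC_FILE"
}

# clusterip_claim NODE: start answering for NODE on this machine.
clusterip_claim() {
    echo "+$1" > "$PROC_FILE"
}

# clusterip_release NODE: stop answering for NODE on this machine.
clusterip_release() {
    echo "-$1" > "$PROC_FILE"
}

# Example (as root, on a machine with the rule loaded):
#   clusterip_claim 1
#   clusterip_status
```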

Steady IP

One nice thing about CLUSTERIP is that the IP is always up on both machines, so you don't have to restart or reconfigure services as the IP address comes and goes, as you would with a traditional HA setup where the IP address moves around.

This can also be achieved with IPVS by using the infrequently used “direct routing” configuration.

Very fast fail over/back

If you want to move a service from one node to another, it is accomplished simply by echoing the node number out to the ipt_CLUSTERIP file as mentioned above. No ifconfig, no sending of gratuitous ARPs, no restarting of named or other services.
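As a sketch of how that could be scripted, the function below takes over the other machine's node number when it stops answering and hands it back when it returns. This goes a bit beyond what we run: the peer address 10.31.3.2, the ping-based check, and all of the names are assumptions for illustration, and a real deployment would want a better health check than a single ping.

```shell
#!/bin/sh
CLUSTER_IP=10.31.3.236
PEER_ADDR=${PEER_ADDR:-10.31.3.2}   # hypothetical per-machine address of the other node
PEER_NODE=${PEER_NODE:-2}           # node number normally handled by the peer
PROC_FILE=${PROC_FILE:-/proc/net/ipt_CLUSTERIP/$CLUSTER_IP}

# Crude health check: one ping with a one second timeout (Linux ping flags).
peer_alive() {
    ping -c 1 -W 1 "$PEER_ADDR" >/dev/null 2>&1
}

# failover_step STATE
# STATE is "held" if we currently answer for the peer's node, else "free".
# Writes to the proc file only when the state changes, and prints the new state.
failover_step() {
    state=$1
    if peer_alive; then
        if [ "$state" = held ]; then
            echo "-$PEER_NODE" > "$PROC_FILE"   # peer is back: hand its node back
        fi
        echo free
    else
        if [ "$state" = free ]; then
            echo "+$PEER_NODE" > "$PROC_FILE"   # peer is gone: answer for its node too
        fi
        echo held
    fi
}

# Example loop (as root):
#   state=free
#   while sleep 5; do state=$(failover_step "$state"); done
```

Note that nothing in this touches the interface configuration or the services themselves; only the node number moves.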

No return-packet processing

Usually an IPVS cluster would handle all outbound response packets, leading to additional loading of your central load-balancer. The CLUSTERIP setup results in all response packets going out directly to the destination.

This can also be accomplished by using IPVS direct routing.

All nodes receive all packets

This is the weakness of CLUSTERIP. Usually it's not a problem, but if you are under a DDoS, all of the machines are going to be processing all of these packets, even if most of them are rejected.

Of course, it's important to realize that an IPVS cluster also suffers from this. It's just that the issue is seen only on the load-balancer, not on the end application nodes.

This also means that if your service is highly inbound oriented, such as file uploads, the bandwidth will be consumed on all of the cluster members' links, meaning that 10 machines with gigabit will still only have an aggregate of 1 gigabit of incoming bandwidth.

So, CLUSTERIP will not scale indefinitely because of these issues. But IPVS is going to suffer from these issues to some extent as well.

Our test setup

I've been wanting to try this out for years. It's been around a very long time, but we've just never had any client demand for it.

Recently we've been adjusting some of our hosting infrastructure, including moving around some of our DNS servers. I decided to use CLUSTERIP on a test cluster for our primary DNS cache machines, and configure them as both load-balanced and highly available.

DNS does have quite a lot of ability to fail over to alternate servers built in, but if your primary servers aren't responding it just causes all sorts of other services and processes to slow down as they have to wait for the secondary servers to be tried. So having a primary resolver that is never down is good.

Not that our previous server was down much, I think the outage time was counted in minutes over the last 5 years. But this was still a nice opportunity to test out CLUSTERIP in the lab to see if we could roll it out into production.

For this use it worked out extremely nicely. We ended up also putting our tertiary DNS server, which answers external queries, on it. After testing and internal review, we rolled it out and it's all gone very smoothly.

The only real issue I ran into in testing was with our tertiary DNS server. Its IP address is in the same IP block as mirrors.tummy.com, and these servers were configured to pull updates from mirrors. So, the DNS cluster would try connecting to get updates via the CLUSTERIP address, which would only work on one of the machines. My solution to that was simply to use external mirror servers for the updates of these two machines.

In Conclusion

CLUSTERIP is a very nice system for implementing small to moderate size clusters, say of 2 to 20 machines handling maybe a quarter of a million incoming packets-per-second. Well worth the time to investigate and test.
