Are the Cores Worth It?

By Sean Reifschneider. Date: March 17, 2007

Introduction

Recently, a hosting client of ours needed to upgrade their system to handle spikes in traffic. We offered several choices, including options using a pair of quad-core Xeon CPUs. Our client did some googling around and found a benchmark showing that Apache couldn't use all the cores in a dual quad-core system.

I was a bit surprised by these results, until I noticed two things: the domain the review was at included the word "game", and a quick look at the review showed it was running Apache on Windows. I figured a gamer site would be much more oriented towards desktop applications than web servers (indeed, all the other applications tested were desktop apps or transcoders). Since we only host Linux machines, I figured the Windows results wouldn't be very useful either.

Worse, the review included precious little detail about what sort of content was being served, where the benchmark was being run (on the server or on a dedicated test machine), etc...

However, there's been a lot of press over the last 6 months crying "How can anyone use 4 or 8 cores?" Sure, on a desktop system that may very well be true. However, a web server handling 200 simultaneous remote users is a very different thing.

Given my knowledge about using Apache on Linux, and all these unknowns, and hoping to be able to provide some guidance to this client, I set aside a day to get to the bottom of this.

The Short Answer

The basic question is: Can Apache make use of all 8 cores on a system with a pair of quad-core CPUs? Here's what "top" thinks:

In short, all 8 cores have 0% CPU time spent idle, with 8 instances of Apache being the top CPU user. More detailed benchmark results follow.

The Hardware

These tests are entirely funded by tummy.com, and I don't have access to a huge set of hardware. I'm mostly operating from what I have on hand. I'm comparing:

Notice the price similarity between the CPUs. While these may sound slow to many of you used to 3.2+GHz CPUs, remember that these are the latest Intel CPUs which do more work in fewer clocks. 2.0GHz in these CPUs is more like a 3.2GHz Xeon EM64T or 5000-series CPU.

This is running on a SuperMicro X7DB3 motherboard with 4GB of 667MHz FB-DIMM ECC DDR2 SDRAM (2x2GB).

The client machine is a Celeron 3.2GHz connected over 100Mbps Ethernet (peak traffic during the tests was around 4Mbps, so I didn't bother to move it over to my gigabit switch).

The Software

The server is running Fedora Core 6 as the operating system. This is for many reasons: it's got recent kernel and other software, and it's roughly the same bits that are released in Red Hat Enterprise 5. CentOS 5, the community rebuild of Red Hat Enterprise, is not yet available. CentOS is normally what I'd put on a server.

The system was running the stock Apache for Fedora, version 2.2.3. Apache was configured with the prefork MPM, set to start and always run 200 server processes, so forking time would not be involved in the test.

I ran my early tests with Apache serving up static pages, but Apache ended up using just 25% of one of the 4 cores, while saturating my test client node's CPU.

What I was really interested in, however, was cases where Apache was serving up dynamic content, which would likely be limited by CPU rather than by disk or network.

I set up Apache with mod_python (since I'm familiar with Python) and configured it to use the Pylons web application framework. I created a simple application in Pylons with a route to the index page, calling a controller that ran "time.localtime(time.time())" and stored it in the context. It then rendered a view which displayed a short message with the current time, wrapped inside a simple "autohandler" template which displayed the message "Autohandler called".
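For readers who want to picture the workload, here is a rough stand-in written as a plain WSGI application rather than real Pylons code. The names `AUTOHANDLER`, `INDEX_VIEW`, `render_index`, and `application` are made up for illustration, and it uses Python's `string.Template` instead of the Pylons template engine. It only mirrors the shape of the test page: grab the local time, render a short message, and wrap it in an outer template.

```python
import time
from string import Template

# Hypothetical outer wrapper, standing in for the "autohandler" template.
AUTOHANDLER = Template("<html><body><p>Autohandler called</p>$body</body></html>")
# Hypothetical inner view that shows the current time.
INDEX_VIEW = Template("<p>The current time is $now</p>")


def render_index(now=None):
    """Controller plus view: fetch the local time and render the page."""
    if now is None:
        now = time.localtime(time.time())
    body = INDEX_VIEW.substitute(now=time.strftime("%Y-%m-%d %H:%M:%S", now))
    return AUTOHANDLER.substitute(body=body)


def application(environ, start_response):
    """Minimal WSGI entry point that serves the rendered page for any request."""
    page = render_index().encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(page)))])
    return [page]
```

The real test went through the full Pylons stack (routing, controller, templates), so each request cost far more CPU than this sketch would.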

In other words, it's a simple application, but one relying on a lot of moving parts provided by Pylons. It's indicative of a dynamic application written in mod_perl or PHP, though using a fairly heavy-weight framework. The time.time() call involves a system call back into the kernel, just as any I/O normally would in a web application.

The Results

For every configuration I would reboot from scratch, and then on the test client run a small test run to make sure everything was started up. Then I would run the "ab" (Apache Benchmark) program with 50,000 requests at a concurrency of 25, 75, and 150.
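The three runs could be scripted along the following lines. This is only a sketch: the target URL is a placeholder and the helper names `ab_command` and `run_all` are invented here. The `-n` (total requests) and `-c` (concurrency) options are the standard "ab" flags.

```python
import subprocess

AB_REQUESTS = 50000
CONCURRENCY_LEVELS = (25, 75, 150)
# Hypothetical address of the server under test.
TARGET_URL = "http://server-under-test.example/"


def ab_command(concurrency, url=TARGET_URL, requests=AB_REQUESTS):
    """Build the ab command line: -n is total requests, -c is concurrency."""
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]


def run_all(url=TARGET_URL):
    """Run ab once per concurrency level and return its text output per level."""
    results = {}
    for level in CONCURRENCY_LEVELS:
        completed = subprocess.run(ab_command(level, url), capture_output=True,
                                   text=True, check=True)
        results[level] = completed.stdout
    return results
```

Calling `run_all()` requires "ab" to be installed on the test client. `ab_command(75)` on its own just shows the command that would be run.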

Graph: Requests Per Second

In this case, I just averaged the requests per second over the 3 runs (the different concurrency values resulted in similar requests/second values).

Configuration                         Requests per second
Partial E5310 (1.6GHz) 1-core         332
Partial 5130 (2.0GHz) 1-core          404
Single E5310 (1.6GHz) 4-cores         1128
Dual 5130 (2.0GHz) 4-cores            1356
Dual E5310 (1.6GHz) 8-cores           2104

Values are requests handled per second, larger values are better.

For the "Partial" (single core) results, I rebooted the system with the kernel option "maxcpus=1" to cause the OS to see only a single core. All other tests were run without "maxcpus" set, with just a single CPU or a pair of CPUs in the system.
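One way to double-check what the kernel actually sees after booting with an option like "maxcpus=1" is to count the processor entries in /proc/cpuinfo. The sketch below assumes the x86 Linux layout, where each logical CPU has a line starting with "processor". The function names are invented for this example.

```python
def count_processors(cpuinfo_text):
    """Count the 'processor : N' entries in /proc/cpuinfo-style text."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key, _, _ = line.partition(":")
        if key.strip() == "processor":
            count += 1
    return count


def visible_cores(path="/proc/cpuinfo"):
    """Return how many logical CPUs the running Linux kernel exposes."""
    with open(path) as handle:
        return count_processors(handle.read())
```

On the test system, `visible_cores()` would be expected to report 1 for the "Partial" runs and 4 or 8 for the others.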

These results indicate that Apache on Linux is clearly able to take advantage of 8 cores for this workload. The 2xQuad CPUs are about 55% faster than the 2xDual CPUs (2104 versus 1356 requests per second), even though the dual-core CPUs are running at a higher clock rate.

Also note that the efficiency per core goes down as the number of cores goes up. The 8 core results are only around 6.3 times faster than the benchmark results for a single one of those cores. 100% scalability would put that number at 8 times. Note that the single 4 core CPU does only marginally better, at 3.4 times a single core where perfect scaling would be 4 times. This is the subject of my next graph.
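The scaling figures can be reproduced from the requests-per-second results with a few lines of arithmetic. The function names `speedup` and `scaling_efficiency` are just labels chosen for this sketch.

```python
# Requests per second for the E5310 configurations, copied from the results.
RESULTS = {
    "E5310 1-core": 332,
    "E5310 4-cores": 1128,
    "E5310 8-cores": 2104,
}


def speedup(multi_core_rps, single_core_rps):
    """How many times faster the multi-core run is than the one-core run."""
    return multi_core_rps / single_core_rps


def scaling_efficiency(multi_core_rps, single_core_rps, cores):
    """Fraction of perfect linear scaling achieved (1.0 would be 100%)."""
    return speedup(multi_core_rps, single_core_rps) / cores


base = RESULTS["E5310 1-core"]
for label, cores in (("E5310 4-cores", 4), ("E5310 8-cores", 8)):
    s = speedup(RESULTS[label], base)
    e = scaling_efficiency(RESULTS[label], base, cores)
    print(f"{label}: {s:.1f}x one core, {e:.0%} of linear scaling")
```

This works out to roughly 85% of linear scaling on 4 cores and 79% on 8 cores.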

Graph: Efficiency

For this graph, I took the above numbers and divided by the core speed in gigahertz totaled among the cores. So the dual quad 1.6GHz would result in a division by 12.8.

Configuration                         Requests per second per GHz
Partial E5310 (1.6GHz) 1-core         208
Partial 5130 (2.0GHz) 1-core          202
Single E5310 (1.6GHz) 4-cores         176
Dual 5130 (2.0GHz) 4-cores            170
Dual E5310 (1.6GHz) 8-cores           164

Values are in requests per second per gigahertz. Larger values are better.
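As a check on the arithmetic, the per-gigahertz figures can be recomputed from the raw results. This is a sketch only: the `CONFIGS` dictionary and the `rps_per_ghz` name are invented here, while the numbers are the ones reported in this article.

```python
# (requests per second, clock in GHz, total cores) for each configuration.
CONFIGS = {
    "Partial E5310 1-core": (332, 1.6, 1),
    "Partial 5130 1-core": (404, 2.0, 1),
    "Single E5310 4-cores": (1128, 1.6, 4),
    "Dual 5130 4-cores": (1356, 2.0, 4),
    "Dual E5310 8-cores": (2104, 1.6, 8),
}


def rps_per_ghz(rps, clock_ghz, cores):
    """Requests per second divided by the summed clock rate of all cores."""
    return rps / (clock_ghz * cores)


for name, (rps, clock_ghz, cores) in CONFIGS.items():
    print(f"{name}: {rps_per_ghz(rps, clock_ghz, cores):.1f}")
```

The printed values match the reported figures once rounded to whole numbers. For example, the dual quad-core result is 2104 divided by 12.8.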

As you can see, the work done per gigahertz goes down noticeably as the number of cores goes up: from 208 on a single 1.6GHz core to 164 across all 8, a drop of about 21%. However, the pair of quad core CPUs is only a few percent less efficient than the pair of dual core CPUs at a higher clock speed (164 versus 170).

Compared with Clusters

Most of our clients who are running high traffic sites are using a number of smaller machines, behind a load-balancer. There are many benefits to this, including that once it's set up you can easily scale it to additional machines (unless you run into some bottleneck, like the database). Simply adding more machines increases the capacity.

Also, if one of the web serving machines fails, you only lose a bit of capacity, not all access. In our environment, we mitigate this by having spare hardware on hand so that a hardware replacement doesn't rely on a vendor sending out a tech, etc...

The cluster solution also increases complexity. Typical configurations require a load-balancer up front (another machine to pay for, configure, and maintain) and, if you are worried about a service outage, that would usually be a pair of machines running high availability (more cost and complexity). Data also needs to be distributed among the cluster, log and session data need to be consolidated, etc...

In the past, deploying large machines has been amazingly expensive. A 4 CPU system could easily run 5 times as expensive as a 2 CPU system, and an 8 CPU system could quickly run 10 or 20 times more.

Conclusions

Apache under Linux can absolutely take advantage of 8 CPU cores when running processor-intensive tasks, such as dynamic page generation. The quad core CPUs provide 55% more performance for generating these dynamic pages than a similarly-priced pair of dual core CPUs.

Clearly, if your task requires completing a single simple job as quickly as possible, fewer cores with higher clock rates will be best. However, for multi-user server loads, the dual quad core solution is very attractive.

The quad core CPUs provide a cost-effective alternative to clusters of smaller machines.

Shameless Plug

tummy.com, ltd. provides expert-level Linux managed hosting services out of our world-class facility located in Denver, Colorado. See the menu on the upper left side of this page for more information on our hosting services.