Are the Cores Worth It?
Introduction
Recently, a hosting client of ours needed to upgrade their system to handle spikes in traffic. We offered several choices, including options using a pair of quad-core Xeon CPUs. Our client did some googling and found a benchmark showing that Apache couldn't use all the cores in a dual quad-core system.
I was a bit surprised by these results, until I noticed two things: the domain hosting the review included the word "game", and a quick look at the review showed it was running Apache on Windows. A gamer site, I figured, would be much more oriented towards desktop applications than web servers (indeed, all the other applications tested were desktop apps or transcoders). Since we only host Linux machines, I figured the Windows results wouldn't be very useful either. Worse, the review included precious little detail about what sort of content was being served, where the benchmark was being run (on the server or on a dedicated test machine), and so on.
There's been a lot of press over the last 6 months crying "How can anyone use 4 or 8 cores?" On a desktop system that may very well be a fair question. A web server handling 200 simultaneous remote users, however, is a very different thing. Given what I know about running Apache on Linux, all these unknowns, and the hope of providing some guidance to this client, I set aside a day to get to the bottom of this.
The Short Answer
The basic question is: Can Apache make use of all 8 cores on a system with a pair of quad-core CPUs? Here's what "top" thinks:
In short, all 8 cores have 0% CPU time spent idle, with 8 instances of Apache being the top CPU users. More detailed benchmark results follow.
The Hardware
These tests are entirely funded by tummy.com, and I don't have access to a huge set of hardware, so I'm mostly working with what I have on hand. I'm comparing:
Two Dual-Core Intel 5130 2.0GHz "Woodcrest" CPUs, street price $350 each.
Two Quad-Core Intel E5310 1.6GHz "Clovertown" CPUs, street price $360 each.
The Software
The server is running Fedora Core 6 as the operating system, for several reasons: it has a recent kernel and other software, and it's roughly the same bits that are released in Red Hat Enterprise 5. CentOS 5, the community rebuild of Red Hat Enterprise and what I'd normally put on a server, is not yet available.
The system was running the stock Apache for Fedora, version 2.2.3. Apache was configured with the prefork MPM, set to start and always run 200 server processes, so forking time would not be involved in the test.
In early tests I had Apache serve static pages, but Apache ended up using just 25% of one of the 4 cores while saturating the CPU on my test client node. What I was really interested in, however, was the case where Apache serves dynamic content, which is more likely to be CPU bound than disk or network limited.
I set up Apache with mod_python (since I'm familiar with Python) and configured it to use the Pylons web application framework. I created a simple application in Pylons with a route to the index page, calling a controller that ran "time.localtime(time.time())" and stored the result in the context. It then rendered a view which displayed a short message with the current time, wrapped inside a simple "autohandler" template which displayed the message "Autohandler called".
In other words, it's a simple application, but one relying on a lot of moving parts provided by Pylons. It should be indicative of a dynamic application written in mod_perl or PHP, though using a fairly heavy-weight framework. The time.time() call involves a system call into the kernel, just as any I/O normally would in a web application.
The Results
For every configuration I would reboot from scratch, then run a small warm-up test from the test client to make sure everything was started up. Then I would run the "ab" (Apache Benchmark) program with 50,000 requests at concurrency levels of 25, 75, and 150.
Graph: Requests Per Second
In this case, I just averaged the requests per second over the 3 runs (the different concurrency values resulted in similar requests/second values).
Partial E5310 (1.6GHz) 1-core: 332
Partial 5130 (2.0GHz) 1-core: 404
Single E5310 (1.6GHz) 4-cores: 1128
Dual 5130 (2.0GHz) 4-cores: 1356
Dual E5310 (1.6GHz) 8-cores: 2104
Values are requests handled per second; larger values are better.
For the "Partial" (single core) results, I rebooted the system with the kernel option "maxcpus=1" so that the OS would see only one of the cores. All other tests were run without "maxcpus" set, with either a single CPU or a pair of CPUs in the system.
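The test procedure described above (50,000 requests at three concurrency levels, averaged) could be scripted roughly as follows. This is only a sketch: the exact "ab" command line used for these tests isn't recorded here, so the target URL and the helper names are assumptions made for illustration. The "-n" and "-c" options are ab's standard request-count and concurrency flags.

```python
import re
import subprocess

# Assumption: placeholder URL, not the address used in the original tests.
TARGET_URL = "http://server.example/"
REQUESTS = 50000
CONCURRENCY_LEVELS = (25, 75, 150)


def build_ab_command(url, requests, concurrency):
    """Return the argument list for one ab run (-n total requests, -c concurrency)."""
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]


def parse_requests_per_second(ab_output):
    """Extract the 'Requests per second' figure from ab's text report."""
    match = re.search(r"Requests per second:\s+([\d.]+)", ab_output)
    if match is None:
        raise ValueError("no 'Requests per second' line found in ab output")
    return float(match.group(1))


def run_benchmark(url=TARGET_URL):
    """Run ab once per concurrency level and return the mean requests/second."""
    rates = []
    for concurrency in CONCURRENCY_LEVELS:
        cmd = build_ab_command(url, REQUESTS, concurrency)
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        rates.append(parse_requests_per_second(result.stdout))
    return sum(rates) / len(rates)
```

Calling run_benchmark() from the test client after each reboot and warm-up run would give one averaged figure per hardware configuration.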
These results indicate that Apache on Linux is clearly able to take advantage of 8 cores for this workload. The 2xQuad CPUs are quite a lot faster than the 2xDual CPUs, even though the dual-core CPUs are running at a higher clock rate.
Also note that the efficiency per core goes down as the number of cores goes up. The 8-core results are only around 6.3 times faster than the benchmark results for a single one of those cores, where 100% scalability would put that number at 8 times. The single 4-core CPU does only marginally better in relative terms, at 3.4 times a single core where perfect scaling would give 4. This is the subject of my next graph.
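The scaling figures quoted above follow directly from the requests-per-second table. A minimal sketch of the arithmetic, using only the E5310 rows (the function and variable names are mine, chosen for illustration):

```python
# Requests per second for the E5310 configurations, keyed by active core count.
E5310_RPS = {1: 332, 4: 1128, 8: 2104}


def scaling(rps_by_cores):
    """Map core count -> (speedup over one core, fraction of linear scaling)."""
    base = rps_by_cores[1]
    report = {}
    for cores, rps in sorted(rps_by_cores.items()):
        speedup = rps / base
        report[cores] = (speedup, speedup / cores)
    return report


for cores, (speedup, efficiency) in scaling(E5310_RPS).items():
    print(f"{cores} cores: {speedup:.1f}x one core, {efficiency:.0%} of linear scaling")
```

For these numbers, 4 cores come out at about 85% of linear scaling and 8 cores at about 79%.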
Graph: Efficiency
For this graph, I took the above numbers and divided each by the clock speed in gigahertz summed across all active cores. For the dual quad-core 1.6GHz system, that means dividing by 8 x 1.6 = 12.8.
Partial E5310 (1.6GHz) 1-core: 208
Partial 5130 (2.0GHz) 1-core: 202
Single E5310 (1.6GHz) 4-cores: 176
Dual 5130 (2.0GHz) 4-cores: 170
Dual E5310 (1.6GHz) 8-cores: 164
Values are in requests per second per gigahertz. Larger values are
better.
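As a cross-check, the per-gigahertz figures can be recomputed from the requests-per-second results. The sketch below assumes the divisor is simply clock speed times the number of active cores, as described for the 12.8 example. The names are illustrative only.

```python
# (label, clock in GHz, active cores, requests per second) from the first graph.
CONFIGS = [
    ("Partial E5310 1-core", 1.6, 1, 332),
    ("Partial 5130 1-core", 2.0, 1, 404),
    ("Single E5310 4-cores", 1.6, 4, 1128),
    ("Dual 5130 4-cores", 2.0, 4, 1356),
    ("Dual E5310 8-cores", 1.6, 8, 2104),
]


def rps_per_ghz(clock_ghz, cores, rps):
    """Requests per second divided by total gigahertz across all active cores."""
    return rps / (clock_ghz * cores)


for label, clock_ghz, cores, rps in CONFIGS:
    print(f"{label}: {rps_per_ghz(clock_ghz, cores, rps):.1f} requests/second/GHz")
```

The results match the table to within rounding. On the unrounded values, the dual E5310 system delivers about 3 percent less work per gigahertz than the dual 5130 system.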
As you can see, the work done per gigahertz goes down pretty
dramatically as the number of cores goes up. However, the pair of quad
core CPUs is only a few percent less efficient than the pair of dual core
CPUs at a higher clock speed.