Recently, a hosting client of ours needed to upgrade their
system to handle spikes in traffic. We offered several choices, including
options using a pair of quad-core Xeon CPUs. Our client did some googling
around and found a benchmark showing that Apache couldn't use all the cores
in a dual quad-core system.
I was a bit surprised by these results, until I noticed two things:
the domain hosting the review included the word "game", and a quick look at
the review showed it was running Apache on Windows. I figured a gamer site
would be oriented much more towards desktop applications than web servers
(indeed, all the other applications benchmarked were desktop apps or transcoders).
Since we only host Linux machines, I figured the Windows results wouldn't
be very useful either.
Worse, the review included precious little detail about what sort of
content was being served, where the benchmark was being run (on the server
or on a dedicated test machine), etc...
However, there's been a lot of press over the last 6 months crying
"How can anyone use 4 or 8 cores?" Sure, on a desktop system that may very
well be true. However, a web server handling 200 simultaneous remote users
is a very different thing.
Given my knowledge about using Apache on Linux, and all these
unknowns, and hoping to be able to provide some guidance to this client, I
set aside a day to get to the bottom of this.
The Short Answer
The basic question is: Can Apache make use of all 8 cores on a
system with a pair of quad-core CPUs? Here's what "top" thinks:
In short, all 8 cores have 0% CPU time spent idle, with 8 instances of
Apache being the top CPU user. More detailed benchmark results follow.
These tests are entirely funded by tummy.com, and I don't have access
to a huge set of hardware. I'm mostly operating from what I have on hand.
Two Dual-Core Intel 5130 2.0GHz "Woodcrest" CPUs, street price
Two Quad-Core Intel E5310 1.6GHz "Clovertown" CPUs, street price
Notice the price similarity between the CPUs. While these may sound
slow to many of you used to 3.2+GHz CPUs, remember that these are the
latest Intel CPUs which do more work in fewer clocks. 2.0GHz in these CPUs
is more like a 3.2GHz Xeon EM64T or 5000-series CPU.
This is running on a SuperMicro X7DB3 motherboard with 4GB of
667MHz FB-DIMM ECC DDR2 SDRAM (2x2GB).
The client machine is a Celeron 3.2GHz connected over 100mbps ethernet
(peak traffic during the tests was around 4mbps, so I didn't bother to move
it over to my gigabit switch).
The server is running Fedora Core 6 as the operating system. This is
for several reasons: it has a recent kernel and other software, and it's
roughly the same bits that are released in Red Hat Enterprise Linux 5. CentOS 5,
the community rebuild of Red Hat Enterprise Linux, is not yet available; CentOS
is normally what I'd put on a server.
The system was running the stock Apache for Fedora, version 2.2.3.
Apache was configured with the prefork MPM, set to start and always run
200 server processes, so forking time would not factor into the test.
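The article doesn't show the exact Apache configuration, but a prefork setup that starts and always runs 200 servers would look roughly like this sketch (directive values are my assumption, based on the description above):

```apache
# Sketch of a prefork configuration pinned at 200 servers.
# Setting min/max spares equal to the limits keeps Apache from
# forking or reaping children during the benchmark.
<IfModule prefork.c>
StartServers        200
MinSpareServers     200
MaxSpareServers     200
ServerLimit         200
MaxClients          200
MaxRequestsPerChild   0
</IfModule>
```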
In early tests I ran Apache serving up static pages, but Apache
ended up using just 25% of one of the cores, while saturating my test
client node's CPU.
What I was really interested in, however, was the case where Apache
serves dynamic content, which is more likely to be CPU-bound than
disk- or network-limited.
I set up Apache with mod_python (since I'm familiar with Python) and
configured it to use the Pylons web application framework. I created a
simple application in Pylons with a route to the index page, calling a
controller that ran "time.localtime(time.time())" and stored it in the
context. It then rendered a view which displayed a short message with the
current time, wrapped inside a simple "autohandler" template which
displayed the message "Autohandler called".
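The Pylons pieces aside, the controller's work amounts to this minimal stand-alone sketch (function name and message formatting are hypothetical; only the time.localtime(time.time()) call and the "Autohandler called" message come from the description above):

```python
import time

def render_index():
    # Stand-in for the Pylons index controller: the real app stored
    # time.localtime(time.time()) in the template context, then rendered
    # a view wrapped in an "autohandler" template.
    now = time.localtime(time.time())
    message = "The current time is: " + time.strftime("%H:%M:%S", now)
    # The autohandler wrapper displayed this message around the view:
    return "Autohandler called\n" + message

print(render_index())
```

The point is that even this trivial page exercises a system call (fetching the time) plus template rendering, which is what makes the benchmark CPU-bound rather than I/O-bound.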
In other words, a simple application, but one relying on a lot of moving
parts provided by Pylons. It is indicative of a dynamic application written
in mod_perl or PHP, though using a fairly heavy-weight framework. The
time.time() call involves a system call back into the kernel, just as any
I/O normally would in a web application.
For every configuration I rebooted from scratch, then ran a small
warm-up test from the client to make sure everything was started up.
Then I ran the "ab" (ApacheBench) program with 50,000 requests
at concurrencies of 25, 75, and 150.
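The test procedure can be scripted roughly as below (the server URL is hypothetical, and the ab commands are echoed here rather than executed, as a sketch of the loop):

```shell
# Sketch of the per-configuration benchmark run.
echo "ab -n 100 -c 1 http://testserver/"         # small warm-up run
for c in 25 75 150; do
    echo "ab -n 50000 -c $c http://testserver/"  # the three timed runs
done
```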
Graph: Requests Per Second
In this case, I averaged the requests per second over the 3 runs
(the different concurrency values resulted in similar requests-per-second
rates). Values are requests handled per second; larger values are better.
For the "Partial" (single core) results, I rebooted the system with
the kernel option "maxcpus=1" so the OS would see only one of the
cores. All other tests were run without "maxcpus" set, with either a single
CPU or a pair of CPUs in the system.
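On a Fedora Core 6 system this means appending the option to the kernel line in the grub configuration, along these lines (the kernel path and title are hypothetical; only maxcpus=1 comes from the text):

```
title Fedora Core (single-core test)
        kernel /boot/vmlinuz ro root=LABEL=/ maxcpus=1
```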
These results indicate that Apache on Linux is clearly able to take
advantage of 8 CPUs for this workload. The 2xQuad CPUs are quite a lot
faster than the 2xDual CPUs, even though the dual CPUs are running at a
higher clock rate.
Also note that the efficiency per core goes down as the number of
cores goes up. The 8-core results are only around 6.3 times faster than
the benchmark results for a single one of those cores; 100% scalability
would put that number at 8 times. Note that the 4-core system does only
marginally better, at 3.4 times a single core. This is the subject of the
next graph.
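Those scaling figures can be turned into a per-core efficiency number, where 1.0 would be perfect linear scaling (the 100 requests/second single-core baseline below is illustrative only; the 6.3x and 3.4x ratios come from the text):

```python
def efficiency(multi_core_rps, single_core_rps, cores):
    # Speedup over one core, divided by the core count;
    # 1.0 means perfectly linear scaling.
    speedup = multi_core_rps / single_core_rps
    return speedup / cores

# Illustrative baseline of 100 requests/second on one core:
print(efficiency(630.0, 100.0, 8))   # 6.3x on 8 cores -> 0.7875
print(efficiency(340.0, 100.0, 4))   # 3.4x on 4 cores -> 0.85
```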
For this graph, I took the above numbers and divided by the core
speed in gigahertz totaled among the cores. So the dual quad 1.6GHz would
result in a division by 12.8.
Values are in requests per second per gigahertz; larger values are better.
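The normalization is simple division, shown here with the dual quad-core example from the text (the throughput number is made up for illustration; the 8 * 1.6 = 12.8GHz divisor comes from the article):

```python
def per_ghz(requests_per_sec, cores, clock_ghz):
    """Normalize throughput by the total clock speed across all cores."""
    return requests_per_sec / (cores * clock_ghz)

# Dual quad-core 1.6GHz: divide by 8 * 1.6 = 12.8 total gigahertz.
print(per_ghz(1280.0, 8, 1.6))  # 1280 / 12.8 = 100.0
```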
As you can see, the work done per gigahertz goes down pretty
dramatically as the number of cores goes up. However, the pair of quad
core CPUs is only a few percent less efficient than the pair of dual core
CPUs at a higher clock speed.
Compared with Clusters
Most of our clients who are running high traffic sites are using a
number of smaller machines, behind a load-balancer. There are many
benefits to this, including that once it's set up you can easily scale it
to additional machines (unless you run into some bottleneck, like the
database). Simply adding more machines increases the capacity.
Also, if one of the web serving machines fails, you only lose a bit of
capacity, not all access. In our environment, we mitigate this by having
spare hardware on hand so that a hardware replacement doesn't rely on a
vendor sending out a tech, etc...
The cluster solution also increases complexity. Typical
configurations require a load-balancer up front (another machine to pay
for, configure, and maintain) and if you are worried about a service outage
that would usually be a pair of machines running high availability (more
cost and complexity). Data also needs to be distributed among the cluster,
log and session data need to be consolidated, etc...
In the past, deploying large machines has been amazingly expensive. A
4-CPU system could easily cost 5 times as much as a 2-CPU system, and
an 8-CPU system could quickly run 10 or 20 times more.
Apache under Linux can absolutely take advantage of 8 CPU cores when
running processor-intensive tasks, such as dynamic page generation. The
quad core CPUs provide 55% more performance for generating these dynamic
pages than a similarly-priced pair of dual core CPUs.
Clearly, if your task requires completing a single simple job as
quickly as possible, fewer cores with higher clock rates will be best.
However, for multi-user server loads, the dual quad-core solution is very
attractive. The quad-core CPUs provide a cost-effective alternative to
clusters of smaller machines.
tummy.com, ltd. provides expert-level Linux managed hosting services
out of our world-class facility located in Denver, Colorado. See the menu
on the upper left side of this page for more information on our hosting
services.