Saturday April 21, 2007 at 17:54
Subject: Lazyweb: Streaming distributed disc sectors.
Keywords: I/O, Lazyweb, Technical
Posted by: Sean Reifschneider
Dear Lazyweb, I would like a devicemapper module kind of like the
device snapshot module, which would memorize disc access patterns and then
store these sectors sequentially. In the future, this disc access pattern
could be replayed, streaming from the disc instead of having to seek all
over the place.
The reasoning for this is that if a disc sector is, say, 1KB, and a
disc can seek 120 times per second (8.5ms average seek time), worst case
performance when booting is around 120KB/sec on the disc. This same disc,
when streaming, can run around 55MB/sec.
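The arithmetic above can be checked with a short sketch. The function and constant names are made up for illustration, and it assumes exactly one seek per 1KB read (taking 1KB as 1024 bytes), so it is a worst-case estimate rather than a measurement.

```python
def seek_bound_throughput(seeks_per_sec, bytes_per_seek):
    """Bytes per second when every read costs one full seek."""
    return seeks_per_sec * bytes_per_seek


SEEKS_PER_SEC = 120                       # from the post: about 8.5ms per seek
SECTOR_BYTES = 1024                       # from the post: 1KB per read
STREAM_BYTES_PER_SEC = 55 * 1024 * 1024   # from the post: 55MB/sec streaming

random_rate = seek_bound_throughput(SEEKS_PER_SEC, SECTOR_BYTES)
speedup = STREAM_BYTES_PER_SEC / random_rate

print(f"seek-bound: {random_rate / 1024:.0f} KB/sec")
print(f"streaming is roughly {speedup:.0f}x faster")
```

With these numbers the gap between seeking and streaming is a factor of several hundred, which is why replaying a memorized pattern looks attractive.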
If you memorize a common disc pattern, like the pattern that happens
when you are booting, and later replay those blocks into the buffer cache
by streaming instead of seeking, it could be a huge win. Thanks for
getting right on this, Lazyweb. More discussion follows.
One problem with disc storage is that data throughput goes way down
if you have to seek. Say the average seek time on a hard drive is 8.5ms
(roughly one 120th of a second), and the rotational speed is 7200RPM
(120RPS). If the file-system has a block size of 1KB, then accessing a
bunch of small pieces of data would be dominated by seek time, not by
transfer speed.
Now, the file-system will try to cluster common data close to each
other, but even small reads where the disc arm doesn't have to move very
far can take a lot of time. Depending on where the data is on a track,
even a short seek may need to wait up to a 120th of a second for the disc to
rotate around to where the required data is. This is one reason that 15K
RPM discs tend to have much lower (roughly half) average seek times than
7200RPM discs.
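A minimal sketch of the rotation numbers, assuming the usual rule of thumb that the average rotational wait is half a revolution. The function names are illustrative only.

```python
def rotation_time_ms(rpm):
    """Time for one full revolution of the platter, in milliseconds."""
    return 60_000.0 / rpm


def average_rotational_latency_ms(rpm):
    """Average wait for the data to come around: half a revolution."""
    return rotation_time_ms(rpm) / 2


for rpm in (7200, 15000):
    print(f"{rpm} RPM: full rotation {rotation_time_ms(rpm):.2f}ms, "
          f"average wait {average_rotational_latency_ms(rpm):.2f}ms")
```

A 7200RPM disc takes about 8.3ms per revolution (the 120th of a second mentioned above), while a 15K RPM disc takes 4ms, so the rotational wait is roughly halved.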
The Linux disc I/O system takes advantage of this fact by over-reading
data. When it has a request for one sector of the disc, it will actually
read a bit more, called "read ahead". See the "-a" option of "hdparm"
for more information.
This is what I refer to as a data locality issue. It's particularly
noticeable in databases. For example, I once had a database that I loaded
from data sorted by one key, and then was trying to access it based on
another key. The data was about twice the size of RAM, so I couldn't cache
it. Even with an index, each lookup needed a seek, and the record size was
only around 20 bytes, so I was limited to 120-ish seeks per second (or
120*20 bytes, about 2.4KB/sec of real data throughput).
It was actually much faster for me to have two databases, one
populated based on one key and one populated based on the other key.
I imagine it wouldn't be too hard to implement a device mapper similar
to how the snapshot mapper works. At boot time, the mapper could replay its
history into the buffer cache. Some sort of control mechanism could be
used to tell the mapper that recording is done, and cause the
memorized read pattern to be streamed onto the history device.
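To make the record/replay idea concrete, here is a toy in-memory model. It is not a real devicemapper target and uses no kernel API. The class name, method names and the dict standing in for the disc are all invented for illustration. It also ignores writes, which a real mapper would have to handle so the history does not go stale.

```python
class ReplayMapper:
    """Toy model of the proposed mapper (hypothetical, user-space only)."""

    def __init__(self, disc):
        self.disc = disc        # dict: sector number -> bytes (the slow disc)
        self.recording = True
        self.order = []         # sector numbers in first-access order
        self.seen = set()
        self.history = []       # (sector, data) pairs, stored sequentially
        self.cache = {}         # stands in for the buffer cache

    def read(self, sector):
        if sector in self.cache:
            return self.cache[sector]      # already streamed in, no seek
        data = self.disc[sector]           # would cost a seek on a real disc
        if self.recording and sector not in self.seen:
            self.seen.add(sector)
            self.order.append(sector)
        return data

    def stop_recording(self):
        """Control hook: end recording, write sectors out in access order."""
        self.recording = False
        self.history = [(s, self.disc[s]) for s in self.order]

    def replay(self):
        """At boot: stream the history sequentially into the cache."""
        for sector, data in self.history:
            self.cache[sector] = data


# Record a scattered "boot" pattern once, then replay it.
disc = {n: f"sector-{n}".encode() for n in range(1000)}
mapper = ReplayMapper(disc)
for s in (812, 3, 440, 3, 97):
    mapper.read(s)
mapper.stop_recording()
mapper.replay()
```

After the replay, reads of the recorded sectors are served from the cache, and the history holds each sector once, in the order it was first touched.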
The history device could be another partition on the same device,
probably smaller in size than physical RAM, or it could instead be another
device: say, a solid state disc, possibly even something like an SD or CF
card on a laptop (efm's laptop has an SD slot built in).
This would provide a general-purpose way of making use of the new
discs that include a small solid-state component and a larger traditional
spinning disc.
I don't expect that, as a mapper, it would be particularly hard to
implement. I've had this idea kicking around for 6 months or so, but I just
won't have the time to implement it. So, I'm posting it for Lazyweb to
implement. :-)
Comment
Posted by: Chris
Subject: iRAM
If you're not trying to use much more than physical memory, you could just go with something like Gigabyte's iRAM, and set it up as an extra fast swap/cache.
http://techreport.com/reviews/2006q1/gigabyte-iram/index.x?pg=1
- Chris.
Comment
Posted by: Sean Reifschneider
Subject: The iRAM.
As I said previously, this scheme would only really work when trying to use less than physical RAM worth of commonly-accessed data on the disc, not more. The iRAM and other solid state discs (including possibly the newer, cheaper, laptop-size drives in the 4-32GB range) may help with this as well, if used as primary storage. Just enabling them as a swap device won't help. As far as I know, you can't make Linux use an SSD as an extra "cache" device; the cache has to be main memory, and even if you increase its size you still have issues with the initial load of the cache, for example during the initial boot. Part of why I proposed this solution was to speed up boot times.
Sean