Lately, Evelyn has come up with a very quotable, well, quote:
"Data is heavy." -- Evelyn Mitchell, 2011
We are frequently dealing with the storage, organization, and access of all the data we are accumulating. Part of the problem is that we can store so much of it: 3TB on a single 3.5" drive is plenty. But we get fewer than 100 opportunities per second to access that data (10+ms average access time).
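To put rough numbers on that mismatch, here is a back-of-the-envelope sketch. The 3TB capacity and 10ms access time come from the paragraph above; the 4 KiB chunk size and the function names are my own assumptions for illustration, and real drives will do better on sequential reads.

```python
# Rough estimate only: how long would it take to touch every chunk of a
# drive using purely random accesses?

def random_ops_per_second(access_time_ms):
    """Random accesses per second for a given average access time."""
    return 1000.0 / access_time_ms

def days_to_touch_drive(capacity_bytes, chunk_bytes, access_time_ms):
    """Days needed to read the whole drive one random chunk at a time."""
    ops = capacity_bytes / chunk_bytes
    seconds = ops / random_ops_per_second(access_time_ms)
    return seconds / 86400.0

if __name__ == "__main__":
    capacity = 3e12   # 3TB drive, as above
    chunk = 4096      # assumed 4 KiB per access (hypothetical)
    print("ops/sec:", random_ops_per_second(10))
    print("days:", round(days_to_touch_drive(capacity, chunk, 10), 1))
```

Under those assumptions, randomly touching the whole drive takes on the order of 85 days, which is the "heavy" part in a nutshell.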
Then we have to back it up, preferably off-site, and regularly re-read it to verify that it's good. And if the need to recover from backups arises, are you prepared for how long it takes to recover billions of files?
We're currently working with one client who wants the ability to roll back the entire system to a previous backup within an hour. However, one directory they have takes several hours just to run a "du" on...
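A quick sketch of why a one-hour rollback is such a tall order. The file count here is hypothetical (I'm not giving the client's real numbers), and the model assumes one random access per restored file at the roughly 100 accesses per second mentioned earlier, which if anything is optimistic.

```python
import math

# Rough estimate only: what restore rate does a one-hour window demand,
# and how many seek-bound drives would it take to deliver it?

def required_files_per_second(file_count, window_seconds):
    """Files that must be restored each second to finish in the window."""
    return file_count / window_seconds

def drives_needed(file_count, window_seconds, access_time_ms=10.0):
    """Drives required if each file costs one random access (assumption)."""
    per_drive = 1000.0 / access_time_ms
    rate = required_files_per_second(file_count, window_seconds)
    return math.ceil(rate / per_drive)

if __name__ == "__main__":
    files = 1_000_000_000   # hypothetical: one billion files
    window = 3600           # one hour, in seconds
    print("files/sec needed:", round(required_files_per_second(files, window)))
    print("drives needed:", drives_needed(files, window))
```

With a billion files, that works out to a few hundred thousand files per second, or thousands of spindles working in parallel, before you even count the data transfer itself.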
I really wish Linux had some sort of hierarchical storage sub-system, but that just doesn't seem to be on anyone's radar. Something that could manage a pool of huge but slow hard drives, smaller but much faster hard drives, SSDs and non-volatile DRAM, automatically migrating chunks of data to the most appropriate storage technology, would be great.
Adaptec is taking a step in that direction with "MaxCache" and "Hybrid" RAID controllers, but both of these are strictly read-caching mechanisms. We also have the "flashcache" kernel driver, which can offer write-back caching to an SSD, but it cannot survive a power failure or kernel crash without data corruption...
Meanwhile we just keep generating and storing more and more data.
So, yeah, data is heavy.