I've been running some tests with real, live data on the ZFS-FUSE devel branch that I mentioned I was testing in a previous post.

My initial tests performed rather poorly, in the area of 1MB/sec across 250GB of data. In other words, it took 3 days to rsync 250GB. However, a conveniently-timed thread on the ZFS-FUSE mailing list saved the day. The rest of this post covers some tools and techniques for determining how big your ARC needs to be to get good performance.

First of all, you can run “zdb -DD poolname” to get a histogram of deduplication effectiveness. For example:

root@zfsbackup1:~# zdb -DD backups
DDT-sha256-zap-duplicate: 597405 entries, size 346 on disk, 147 in core
DDT-sha256-zap-unique: 1829460 entries, size 355 on disk, 151 in core

DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    1.74M    172G    155G    155G    1.74M    172G    155G    155G
     2     511K   30.5G   27.6G   27.7G    1.03M   61.7G   55.7G   55.8G
     4    52.5K   1.22G    901M    917M     248K   5.46G   3.95G   4.02G
     8    8.67K    174M    108M    111M    85.1K   1.74G   1.09G   1.12G
    16    4.76K   64.9M   36.8M   38.4M     110K   1.41G    823M    861M
    32    6.05K   42.4M   22.5M   24.6M     272K   1.86G   1011M   1.08G
    64      427   18.8M   16.6M   16.7M    37.0K   1.52G   1.35G   1.35G
   128       89    751K    674K    701K    15.2K    118M    105M    109M
   256       48   1.80M   1.65M   1.66M    17.8K    692M    632M    637M
   512       54   3.85M   3.60M   3.60M    33.8K   2.25G   2.08G   2.08G
    1K       10    133K    133K    136K    14.5K    172M    172M    176M
    2K        6    130K    130K    132K    16.4K    361M    361M    364M
    4K        6      3K      3K   4.79K    31.2K   15.6M   15.6M   24.9M
    8K        2     12K      5K   5.59K    16.2K   96.9M   40.4M   45.2M
   16K        3   1.50K   1.50K   2.40K    66.4K   33.2M   33.2M   53.0M
 Total    2.31M    204G    183G    183G    3.71M    250G    222G    222G

dedup = 1.21, compress = 1.13, copies = 1.00, dedup * compress / copies = 1.36

root@zfsbackup1:~#

This tells us that we have 597,405 + 1,829,460 = 2,426,865 deduplication entries. That's the 2.31M in the “Total” line – the total there seems to be reported in mebientries (2^20 entries) rather than millions, for some unfathomable reason. 1.74MiE (mebientries) are not duplicated at all, and 3 entries are duplicated 16K or more times (66.4K references in total for those 3 blocks, per the “referenced” blocks column).

Each entry takes around 250 bytes of RAM, so 2,426,865 * 250 works out to roughly 578MiB of RAM to hold the whole table.

My problem was that my testing used the default ARC size of 100MiB, which is only enough for around 400K entries at most. But by default the ARC also caches data pages, so my cache was running out of steam pretty quickly.
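
To make that arithmetic concrete, here's a quick shell check using the entry counts from the zdb output above (plain bash arithmetic, nothing ZFS-specific):

root@zfsbackup1:~# echo $(( (597405 + 1829460) * 250 / 1048576 ))MiB
578MiB
root@zfsbackup1:~# echo $(( 100 * 1048576 / 250 )) entries
419430 entries

The first line is the RAM needed to hold the whole DDT; the second is how many entries fit in the default 100MiB ARC.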

You can run “zfs set primarycache=metadata poolname” to make the ARC cache only metadata, such as the dedup hashes.
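
For example, on my pool (the second command just confirms the setting; the exact “zfs get” output formatting may differ slightly between releases):

root@zfsbackup1:~# zfs set primarycache=metadata backups
root@zfsbackup1:~# zfs get primarycache backups
NAME     PROPERTY      VALUE     SOURCE
backups  primarycache  metadata  local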

Now, I was also running with “dedup=verify”, which means that in addition to reading the dedup hashes, ZFS also has to read the data blocks to confirm that apparent duplicates really match. So I probably did need to cache data blocks as well as metadata. Though I'd expect it only needs to read blocks whose hash already matches something in the DDT – in which case the block is probably a duplicate, and we're trading the write we would have done for a read rather than just doing the write. So I wouldn't expect THAT much of a performance hit from verification.
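
For reference, verify-mode dedup is just another pool property, set something like this (the confirmation output here is approximate):

root@zfsbackup1:~# zfs set dedup=verify backups
root@zfsbackup1:~# zfs get dedup backups
NAME     PROPERTY  VALUE   SOURCE
backups  dedup     verify  local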

In the end, I increased my ARC, via an /etc/zfs/zfsrc file entry, from the default 100MB to 4500MB (the test system has 6GB of RAM). With that change, my deduplication performance has gone up to acceptable levels. That's great, because deduplication (1.21:1) is slightly more effective on this test data-set than compression (1.19:1), for a combined ratio of 1.44:1.
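
The zfsrc entry was along these lines; I'm assuming the option is named “max-arc-size” (specified in MB), which is what the zfs-fuse builds I've looked at use, so check the comments in your own zfsrc for the exact spelling:

# /etc/zfs/zfsrc: raise the ARC from the 100MB default
max-arc-size = 4500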

Much of this information is thanks to watashki on the ZFS-FUSE mailing list; I just wanted to write it down in a slightly more tuning-friendly form than the mailing-list thread.
