I've been running some tests with real, live data on the ZFS-FUSE devel branch that I mentioned previously.
My initial tests were performing rather poorly, in the area of 1MB/sec. In other words, it took 3 days to rsync 250GB (250GB at 1MB/sec is roughly 250,000 seconds, or about 3 days). However, a conveniently-timed thread on the ZFS-FUSE mailing list saved the day. The remainder of this post covers some tools and techniques for determining how big your ARC needs to be to get good deduplication performance.
First of all, you can run the following command to get a histogram of the deduplication effectiveness: "zdb -DD poolname". For example:
root@zfsbackup1:~# zdb -DD backups
DDT-sha256-zap-duplicate: 597405 entries, size 346 on disk, 147 in core
DDT-sha256-zap-unique: 1829460 entries, size 355 on disk, 151 in core

DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    1.74M    172G    155G    155G    1.74M    172G    155G    155G
     2     511K   30.5G   27.6G   27.7G    1.03M   61.7G   55.7G   55.8G
     4    52.5K   1.22G    901M    917M     248K   5.46G   3.95G   4.02G
     8    8.67K    174M    108M    111M    85.1K   1.74G   1.09G   1.12G
    16    4.76K   64.9M   36.8M   38.4M     110K   1.41G    823M    861M
    32    6.05K   42.4M   22.5M   24.6M     272K   1.86G   1011M   1.08G
    64      427   18.8M   16.6M   16.7M    37.0K   1.52G   1.35G   1.35G
   128       89    751K    674K    701K    15.2K    118M    105M    109M
   256       48   1.80M   1.65M   1.66M    17.8K    692M    632M    637M
   512       54   3.85M   3.60M   3.60M    33.8K   2.25G   2.08G   2.08G
    1K       10    133K    133K    136K    14.5K    172M    172M    176M
    2K        6    130K    130K    132K    16.4K    361M    361M    364M
    4K        6      3K      3K   4.79K    31.2K   15.6M   15.6M   24.9M
    8K        2     12K      5K   5.59K    16.2K   96.9M   40.4M   45.2M
   16K        3   1.50K   1.50K   2.40K    66.4K   33.2M   33.2M   53.0M
 Total    2.31M    204G    183G    183G    3.71M    250G    222G    222G

dedup = 1.21, compress = 1.13, copies = 1.00, dedup * compress / copies = 1.36
root@zfsbackup1:~#
This tells us that we have 597405 + 1829460 = 2426865 deduplication entries, which is the 2.31M in the "Total" line -- the total there seems to be reported in mebi-entries rather than millions, for some unfathomable reason. 1.74MiE (mebi-entries) are not duplicated at all, and 3 entries are duplicated 16K or more times (66.4K references in total for those 3 blocks, per the "blocks" column under "referenced").
Each entry takes 250 bytes, so 2,426,865 * 250 bytes is about 578MiB of RAM for the whole DDT.
The problem in my testing was that I was using the default ARC size of 100MiB -- enough for at most 400K entries. And by default the ARC also caches data pages, so my cache was running out of steam pretty quickly.
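If you want to sanity-check those numbers yourself, the arithmetic is simple enough to do in the shell (the 250-bytes-per-entry figure is the estimate used above, not something zdb reports directly):

root@zfsbackup1:~# echo $(( (597405 + 1829460) * 250 / 1024 / 1024 ))MiB
578MiB
root@zfsbackup1:~# echo $(( 100 * 1024 * 1024 / 250 )) entries
419430 entries

The first line is the whole DDT; the second is how many entries fit in a 100MiB ARC -- hence "at most 400K" once the ARC is holding anything else at all.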
You can run "zfs set primarycache=metadata poolname" to make the ARC cache only metadata, like the dedup hashes.
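Against the example pool above, that looks like this (the "zfs get" is just to confirm the change took):

root@zfsbackup1:~# zfs set primarycache=metadata backups
root@zfsbackup1:~# zfs get primarycache backups
NAME     PROPERTY      VALUE     SOURCE
backups  primarycache  metadata  local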
Now, I was also running with "dedup=verify", which means that in addition to reading the dedup hashes, ZFS has to read the data blocks and compare them, to guard against hash collisions. So I probably did need to cache data as well as metadata. Then again, I'd expect those extra reads to happen only for blocks whose hash is already in the DDT -- in which case the block is probably a duplicate, and we end up reading the existing block and skipping the write rather than just writing. So I wouldn't expect THAT much of a performance hit from it.
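For completeness, verify mode is just another value of the dedup property, set the same way as primarycache above:

root@zfsbackup1:~# zfs set dedup=verify backups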
In the end, I increased my ARC, via an /etc/zfs/zfsrc file entry, from the default 100MB to 4500MB (I have 6GB of RAM on this test system). With that change, my deduplication performance has gone up to acceptable levels. Which is great, because deduplication (at 1.21:1) seems to be slightly more effective on this test data-set than compression (1.19:1, for a combined rate of 1.44:1).
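For reference, the zfsrc entry looks something like the following. The option name here is the one I remember from the zfs-fuse zfsrc, so double-check it against the comments in your own /etc/zfs/zfsrc:

# in /etc/zfs/zfsrc; the value is in MB
max-arc-size = 4500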
Much of this information is thanks to watashki on the ZFS-FUSE mailing list; I just wanted to write it down in a slightly more tuning-friendly manner than what is on the list.