
rsync, already one of my favorite tools, just keeps getting better and better. Recently, a client had me upgrade the rsync on some of their systems because they needed features in the absolute latest version. This prompted me to go through and look at some of the newer features in rsync. Here are some of the more notable ones (to me anyway).

(2.6.9) The --remove-source-files option was added. It removes files from the source side once they are known to exist on the remote end, whether they were copied during this run or were already there. This is slightly different from the --remove-sent-files option, which would not remove a source file that already existed in both locations (since the file wasn't literally sent). In the past I built similar functionality myself (back in 1998, when there weren't many options): it would scp a file to the destination, run a checksum, and finally delete the source if the checksum matched. I used this on many systems where I wanted data gathered on one system to be processed on another.
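For illustration, here is a minimal sketch of that copy, checksum, then delete idea. It is not my original 1998 script: a local shutil.copyfile stands in for scp, SHA-256 is an assumed choice of checksum, and the names move_verified and demo are made up for the example.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_verified(source, destination):
    """Copy source to destination; delete source only if checksums match.

    Returns True when the source was removed, False when it was kept.
    """
    shutil.copyfile(source, destination)  # stand-in for the scp step
    if sha256_of(source) != sha256_of(destination):
        return False  # keep the source so the copy can be retried
    os.remove(source)
    return True


def demo():
    """Exercise move_verified in a throwaway directory."""
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "gathered.dat")
        destination = os.path.join(workdir, "processed.dat")
        with open(source, "wb") as handle:
            handle.write(b"sample data\n")
        removed = move_verified(source, destination)
        return removed, os.path.exists(source), os.path.exists(destination)
```

The important property is the ordering: the source is only deleted after the destination copy has been verified, so an interrupted run never loses data.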

Version 2.6.8 is available on FC6.

(2.6.7) The --append option is ideal for log files that are only ever appended to. However, it will only transfer files that are larger on the source than on the destination, or that do not exist on the destination. Existing files of the same or larger size are (if I read the documentation correctly) not checked. Note that this option also checks the beginning of the file for changes, so if the existing data changed as well, the right thing is done. --inplace is probably better for this sort of optimization though.
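The decision logic described above can be modelled on in-memory bytes. This is a simplified sketch of my reading of the documentation, not rsync's implementation; the function name append_update and the action labels are invented for the example, and a changed prefix is handled here by simply rewriting the whole file.

```python
def append_update(source_bytes, dest_bytes):
    """Model the append behaviour on in-memory data.

    dest_bytes is None when the destination file does not exist.
    Returns a tuple of (new destination contents, action taken).
    """
    if dest_bytes is None:
        return source_bytes, "full copy"
    if len(dest_bytes) >= len(source_bytes):
        # Same size or larger on the destination: not checked at all.
        return dest_bytes, "skipped"
    if source_bytes[:len(dest_bytes)] != dest_bytes:
        # The existing data no longer matches the start of the source,
        # so appending would produce a corrupt file. Rewrite instead.
        return source_bytes, "full copy"
    return dest_bytes + source_bytes[len(dest_bytes):], "appended"
```

For a log that grew from b"abc" to b"abcdef", only the trailing b"def" needs to be written.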

(2.6.7) --min-size option to specify the minimum size of files to transfer.

(2.6.7) --human-readable option (-h) reports statistics in a more useful “human readable” format.

(2.6.7) --chmod=MODE option allows overriding the mode of files on the destination.

(2.6.4) New on-the-wire protocol version introduced.

(2.6.4) --delete-during option, which causes files to be deleted from a directory only when rsync is working on that directory. Previously, the “--delete” option would delete files before doing any transfer work. Deleting up front, of course, requires that the whole file-system information be transferred and processed first.

(2.6.4) --delete-WHEN options use less memory. This is great, because rsync on systems with many files can use a pile of memory.

(2.6.4) --max-size option to specify the maximum size of files to transfer.
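As a rough sketch, the size limits (this option together with the minimum-size option listed under 2.6.7) can be thought of as a simple range check, and passed to rsync as plain byte counts. The helper names, the "-a" flag, and the example paths are assumptions made for illustration.

```python
def within_limits(size, min_size=None, max_size=None):
    """Return True if a file of this size (in bytes) would be transferred."""
    if min_size is not None and size < min_size:
        return False
    if max_size is not None and size > max_size:
        return False
    return True


def rsync_size_args(source, destination, min_size=None, max_size=None):
    """Build an rsync argument list with optional size limits in bytes."""
    args = ["rsync", "-a"]
    if min_size is not None:
        args.append("--min-size=%d" % min_size)
    if max_size is not None:
        args.append("--max-size=%d" % max_size)
    args.extend([source, destination])
    return args
```

The resulting list can be handed to subprocess.run on a machine where rsync is installed.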

(2.6.4) --omit-dir-times option avoids a final pass through all directories to update the times on them. On an rsync with many directories, this can improve performance if you don't care about directory times.

(2.6.4) --filter option that allows for very rich tuning of which files to transfer or avoid. 8 pages of documentation were added to discuss how to use the filter rules.

(2.6.4) --delay-updates option which causes all updates to be written to a temporary directory, and then moved from the temporary directory at the very end of the transfer. This reduces the window where files are in flux. If a transfer takes hours, the original file set will be left alone for those hours, and then changed in a much shorter time at the end of the transfer. Debian now recommends using this option for mirrors.
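The stage-then-rename pattern behind this option can be sketched in a few lines. This is only an illustration of the idea, not rsync's code; the ".staging" directory name and the apply_delayed helper are hypothetical.

```python
import os
import tempfile


def apply_delayed(updates, target_dir):
    """Write every update to a staging directory, then rename into place.

    updates maps file names to their new contents (bytes).
    """
    staging = os.path.join(target_dir, ".staging")  # hypothetical name
    os.makedirs(staging, exist_ok=True)
    # Slow phase: the live files are untouched while data arrives.
    for name, data in updates.items():
        with open(os.path.join(staging, name), "wb") as handle:
            handle.write(data)
    # Fast phase: a quick series of renames swaps in the new files.
    for name in updates:
        os.replace(os.path.join(staging, name), os.path.join(target_dir, name))
    os.rmdir(staging)


def demo():
    """Update one existing file and add one new file."""
    with tempfile.TemporaryDirectory() as target:
        with open(os.path.join(target, "index.html"), "wb") as handle:
            handle.write(b"old")
        apply_delayed({"index.html": b"new", "extra.txt": b"added"}, target)
        names = sorted(os.listdir(target))
        with open(os.path.join(target, "index.html"), "rb") as handle:
            return names, handle.read()
```

Because the staging directory lives on the same file system as the target, each rename is cheap, which is what keeps the window of inconsistency short.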

(2.6.4) --progress stops reporting updates if rsync is put in the background.

(2.6.4) --fuzzy option will look in the destination directory for a similar file to use as the basis for a new file. I'm really excited about this option because it should make rsyncing rotating log directories much smarter, where “maillog.3” gets renamed to “maillog.4” and so on. What I really want to know is whether it copies the data from the old file or links the two. The latter, in the case of rotating log files, could be a huge savings if you are doing the “link to snapshot directory” backup mechanism or using file-system snapshots. Currently, the rotated files are treated as duplicates in the snapshots.
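To make the idea concrete, here is a toy sketch of picking a basis file by name similarity. It is not rsync's matching code; it uses Python's difflib as a stand-in heuristic, and the function name pick_basis and the 0.6 cutoff are assumptions.

```python
import difflib


def pick_basis(new_name, existing_names, cutoff=0.6):
    """Return the existing file name most similar to new_name, or None."""
    matches = difflib.get_close_matches(new_name, existing_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With a destination directory holding "maillog.3", "messages.3", and "secure.3", a newly arriving "maillog.4" would be matched against "maillog.3", so only the differences between the two would need to cross the wire.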

(2.6.4) --remove-sent-files will remove files that are sent from one system to the other. However, race conditions can cause interrupted transfers to leave files on both systems. Hence the addition of --remove-source-files in 2.6.9.

Version 2.6.3 is available on CentOS and RHEL 4.4.

(2.6.3) --partial-dir can specify a directory for partially transferred files to be stored in, instead of overwriting the destination file. I've wanted this in the past when rsyncing partially transferred bit-torrent ISO files or the like. I'd like to keep a partial transfer, but if I interrupt it I'd like it to not truncate the file where it stopped.

(2.6.3) --keep-dirlinks option will keep destination directory links, so if you run out of space on one partition you can push parts of the data out to another mount-point. I imagine mirror admins will love this, though we currently have just one huge volume for mirrors.tummy.com.

(2.6.3) --inplace argument, which overwrites a file directly instead of using the normal writing mechanism, where rsync opens a new file to write to and then renames it to the final name when done. We use this when doing file-system snapshots to reduce incremental transfer sizes. If you have a 2GB log-file that has 100MB appended to it, --inplace will use 2.1GB for 2 snapshots; without it you will use 4.1GB. You cannot use this with the “link to snapshot directory” backup trick though.
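The arithmetic behind those numbers is simple enough to write down. This sketch assumes block-level snapshots where unchanged blocks are shared between snapshots; the function name and the use of decimal megabytes are choices made for the example.

```python
def snapshot_usage_mb(original_mb, appended_mb, inplace):
    """Disk space used by two snapshots of one growing file, in MB."""
    grown = original_mb + appended_mb
    if inplace:
        # Only the appended blocks are new; the second snapshot shares
        # the original blocks with the first.
        return grown
    # A freshly written file shares nothing with the first snapshot,
    # so the old copy and the new copy are both stored in full.
    return original_mb + grown
```

For the 2GB log with 100MB appended, that gives 2100MB with in-place updates and 4100MB without.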

(2.6.3) The batch options (--write-batch and --read-batch) have a new implementation which fixes some bugs in the older implementation. I had tried the batch options in the past but never got them to work. The idea is that if you know the source and destination, you can create a batch file of the differences, and then distribute that. Users can then use that file to re-play the changes. For example, Fedora could distribute a batch file that would contain the differences between FC5 and FC6 DVD ISOs, which may be much smaller than a full ISO download.
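As a sketch of that two-step workflow, the helpers below build the argument lists for recording a batch and for replaying it elsewhere. The helper names, the "-a" flag, and any paths you pass in are assumptions for illustration; I have not tested these against 2.6.3 specifically.

```python
def write_batch_args(batch_file, source, destination):
    """Arguments for a run that updates destination and records the changes."""
    return ["rsync", "-a", "--write-batch=" + batch_file, source, destination]


def read_batch_args(batch_file, destination):
    """Arguments for replaying a recorded batch onto another copy."""
    return ["rsync", "-a", "--read-batch=" + batch_file, destination]
```

The replay only makes sense if the other destination starts out identical to the one the batch was recorded against.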

(2.6.2) Reduced memory consumption and CPU utilization.

(2.6.0) SSH is now the default remote shell. No more need for “-e ssh”.

(2.6.0) Added --files-from and --from0 options to read a list of files to transfer.
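A small sketch of how the two options fit together: one helper writes a NUL-separated list of paths, and another builds the matching rsync arguments. The helper names, file names, and the "-a" flag are assumptions for the example; paths in the list are taken relative to the source directory.

```python
import os
import tempfile


def write_file_list(paths, list_path):
    """Write paths terminated by NUL bytes, safe for names with newlines."""
    with open(list_path, "wb") as handle:
        for path in paths:
            handle.write(os.fsencode(path) + b"\0")


def rsync_files_from_args(list_path, source_root, destination):
    """Build rsync arguments that read a NUL-separated file list."""
    return ["rsync", "-a", "--from0", "--files-from=" + list_path,
            source_root, destination]


def demo():
    """Write a two-entry list and return its raw bytes."""
    with tempfile.TemporaryDirectory() as workdir:
        list_path = os.path.join(workdir, "files.list")
        write_file_list(["logs/maillog", "etc/hosts"], list_path)
        with open(list_path, "rb") as handle:
            return handle.read()
```

NUL termination matters mostly when the list comes from something like find with -print0, where file names may contain spaces or newlines.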

(2.5.3) --batch options “actually work”.

(2.5.2) --ignore-existing will ignore files that exist on the destination end.

That takes us back to the end of 2000. I found a lot of new options in there that were good to know about, particularly the --fuzzy and --append options, which I may find useful.
