Tuesday December 26, 2006 at 14:30
Subject: New features in rsync.
Keywords: rsync, Technical
Posted by: Sean Reifschneider
rsync, already one of my favorite tools, just keeps getting better
and better. Recently, a client had me upgrade the rsync on some of their
systems because they needed features in the absolute latest version. This
prompted me to go through and look at some of the newer features in rsync.
Here are some of the more notable ones (to me anyway).
(2.6.9) --remove-source-files option was added, which removes files from
the source side if they already exist on, or are copied to, the remote end.
This is slightly different from the --remove-sent-files option, which would
not remove files on the source side if they existed in both locations
(since the file wasn't literally sent). In the past I built my own
functionality similar to this (back in 1998, when there weren't many
options), which would scp a file to the destination, run a checksum, and
finally delete the source if the checksum succeeded. I used this on many
systems where I wanted data gathered on one system to be processed on
another.
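For the curious, the home-grown copy, checksum, delete sequence I describe
can be sketched roughly like this. This is only an illustration, not my
original 1998 script: the function names are made up, and a local
shutil.copy2 stands in for scp.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_verified(source: Path, dest_dir: Path) -> bool:
    """Copy source into dest_dir; delete source only if the checksums match."""
    dest = dest_dir / source.name
    shutil.copy2(source, dest)  # assumption: local copy standing in for scp
    if sha256_of(source) != sha256_of(dest):
        return False  # leave the source alone so the copy can be retried
    source.unlink()
    return True
```

The important part is the ordering: the source is only removed after the
copy has been verified, so an interrupted run never loses data.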
Version 2.6.8 is available on FC6.
(2.6.7) --append option is ideal for log files that are only ever appended
to. However, it will only transfer files that are larger on the source than
on the destination, or that do not exist on the destination. Existing files
of the same or larger size are (if I read the documentation correctly) not
checked. Note that this option also checks the beginning of the file for
changes, so if that changed as well, the right thing is still done.
--inplace is probably better for this sort of optimization though.
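To make the --append behavior concrete, here is a minimal sketch of the
decision logic as I understand it from the documentation. It is not rsync's
actual code: append_update is a hypothetical name, and it reads whole files
into memory, which real log files would not tolerate.

```python
import hashlib
from pathlib import Path


def append_update(source: Path, dest: Path) -> str:
    """Bring dest up to date with source; report which path was taken."""
    data = source.read_bytes()  # fine for a sketch; real logs would be streamed
    if not dest.exists():
        dest.write_bytes(data)
        return "copied"
    have = dest.read_bytes()
    if len(data) <= len(have):
        return "skipped"  # same size or larger on the destination: not checked
    # Confirm the destination still matches the start of the source.
    if hashlib.sha256(data[: len(have)]).digest() != hashlib.sha256(have).digest():
        dest.write_bytes(data)  # the beginning changed too, so rewrite it all
        return "rewritten"
    with dest.open("ab") as handle:
        handle.write(data[len(have):])  # send only the new tail
    return "appended"
```

The "skipped" branch is the one to watch out for: a destination file that
is the same size but has different contents is never touched.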
(2.6.7) --min-size to specify the minimum size of files to transfer.
(2.6.7) --human-readable option (-h) reports statistics in more useful
"human readable" format.
(2.6.7) --chmod=MODE option allows overriding the mode of files on the
destination.
(2.6.4) New on-the-wire protocol version introduced.
(2.6.4) --delete-during option, which causes files to be deleted from a
directory only when rsync is working on it. Previously, the "--delete"
option would delete files before doing any transfer work. Deleting up
front, of course, requires that the whole file-system information be
transferred and processed first.
(2.6.4) --delete-WHEN options use less memory. This is great, because
rsync on systems with many files can use a pile of memory.
(2.6.4) --max-size option to specify the maximum size of files to
transfer.
(2.6.4) --omit-dir-times option avoids a final pass through all
directories to update the times on them. On an rsync with many
directories, this can improve performance if you don't care about directory
times.
(2.6.4) --filter option that allows for very rich tuning of which
files to transfer or avoid. 8 pages of documentation were added to discuss
how to use the filter rules.
(2.6.4) --delay-updates option which causes all updates to be written
to a temporary directory, and then moved from the temporary directory at
the very end of the transfer. This reduces the window where files are in
flux. If a transfer takes hours, the original file set will be left alone
for those hours, and then changed in a much shorter time at the end of the
transfer. Debian now recommends using this option for mirrors.
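The stage-then-rename idea behind --delay-updates can be sketched as
follows. This is an illustration of the general technique only, not how
rsync implements it: delayed_update and the ".staging-" prefix are made-up
names, and the sketch handles only flat file names in a single directory.

```python
import os
import tempfile
from pathlib import Path


def delayed_update(dest_dir: Path, new_files: dict) -> None:
    """Stage every file first, then move them all into place at the end."""
    # Stage inside dest_dir so the final rename stays on one file system.
    with tempfile.TemporaryDirectory(dir=dest_dir, prefix=".staging-") as staging:
        staged = []
        for name, content in new_files.items():
            tmp_path = Path(staging) / name
            tmp_path.write_bytes(content)  # the slow part: dest files untouched
            staged.append((tmp_path, dest_dir / name))
        for tmp_path, final_path in staged:
            os.replace(tmp_path, final_path)  # the fast part: one rename each
```

All of the slow writing happens in the first loop, so the window in which
the destination is a mix of old and new files shrinks to the second loop.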
(2.6.4) --progress stops reporting updates if rsync is put in the
background.
(2.6.4) --fuzzy option will look in the destination directory for a
similar file to base a new file on. I'm really excited about this option
because it should make rsyncing rotating log directories much smarter, where
"maillog.3" gets renamed to "maillog.4" and so on. What I really want to
know is whether it copies the data from the old file or links to it. The
latter, in the case of rotating log files, could be a huge savings if you
are doing the "link to snapshot directory" backup mechanism or using
file-system snapshots. Currently, the rotated files are treated as
duplicates in the snapshots.
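As a rough illustration of what "look for a similar file" could mean, here
is a sketch that picks the closest name using Python's difflib. To be
clear, this is my own stand-in and not rsync's matching algorithm;
pick_basis_file is a hypothetical helper.

```python
import difflib
from typing import List, Optional


def pick_basis_file(missing_name: str, dest_names: List[str]) -> Optional[str]:
    """Return the destination file whose name is closest to missing_name."""
    # get_close_matches ranks candidates by difflib's similarity ratio.
    matches = difflib.get_close_matches(missing_name, dest_names, n=1)
    return matches[0] if matches else None
```

Asked for a basis for "maillog.4" among "maillog.3", "messages.1" and
"secure", this picks "maillog.3", which is exactly the file that holds the
data in the log rotation case.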
(2.6.4) --remove-sent-files will remove files that are sent from one
system to the other. However, race conditions can cause interrupted
transfers to leave files on both systems. Hence the addition of
--remove-source-files in 2.6.9.
Version 2.6.3 is available on CentOS and RHEL 4.4.
(2.6.3) --partial-dir can specify a directory for partially
transferred files to be stored in, instead of overwriting the destination
file. I've wanted this in the past when rsyncing partially downloaded
BitTorrent ISO files or the like. I'd like to keep a partial transfer, but
if I interrupt it I'd like rsync not to truncate the destination file at
the point where it stopped.
(2.6.3) --keep-dirlinks option will keep destination directory links,
so if you run out of space on one partition you can push parts of the data
out to another mount-point. I imagine mirror admins will love this, though
we currently have just one huge volume for mirrors.tummy.com.
(2.6.3) --inplace argument, which overwrites a file directly instead of
using the normal writing mechanism, where rsync opens a new file to write
to and then renames it to the final name when done. We use this when doing
file-system snapshots to reduce incremental transfer sizes. If you have a
2GB log file that has 100MB appended to it, --inplace will use 2.1GB for 2
snapshots; without it you will use 4.1GB. You cannot use this with the
"link to snapshot directory" backup trick though.
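The snapshot arithmetic works out as in the sketch below. It assumes
block-level file-system snapshots that share unchanged blocks, treats 2GB
as 2000MB, and uses a made-up function name.

```python
def snapshot_usage_mb(original_mb: int, appended_mb: int, inplace: bool) -> int:
    """Space used by two snapshots: before and after the append."""
    first_snapshot = original_mb
    if inplace:
        # Only the appended blocks are new; the old blocks stay shared.
        second_snapshot = appended_mb
    else:
        # A freshly written copy shares no blocks with the first snapshot.
        second_snapshot = original_mb + appended_mb
    return first_snapshot + second_snapshot
```

With 2000MB and 100MB this gives 2100MB with --inplace and 4100MB without
it, matching the 2.1GB and 4.1GB figures.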
(2.6.3) --batch options have a new implementation that fixes some bugs
in the older implementation. I had tried the --batch option in the past
but never got it to work. The idea is that if you know the source and
destination, you can create a batch file of the differences and then
distribute that. Users can then use it to replay the changes. For
example, Fedora could distribute a batch file containing the differences
between the FC5 and FC6 DVD ISOs, which may be much smaller than a full ISO
download.
(2.6.2) Reduced memory consumption and CPU utilization.
(2.6.0) SSH is now the default remote shell. No more need for "-e
ssh".
(2.6.0) Added --files-from and --from0 options to read a list of files
to transfer.
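A small sketch of how a script might feed a file list to these two options.
The helper name, the paths and the host are hypothetical, and the command
is only built here, not run.

```python
from pathlib import Path
from typing import List


def write_file_list(names: List[str], list_path: Path) -> List[str]:
    """Write names NUL-terminated and return a matching rsync command line."""
    payload = b"".join(name.encode() + b"\0" for name in names)
    list_path.write_bytes(payload)
    # Hypothetical source and destination; the command is built, not run.
    return [
        "rsync", "-a", "--from0", f"--files-from={list_path}",
        "/srv/data/", "backup.example.com:/srv/data/",
    ]
```

NUL termination is what makes --from0 worthwhile: file names containing
newlines or other odd characters survive the trip intact.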
(2.5.3) --batch options "actually work".
(2.5.2) --ignore-existing will ignore files that exist on the
destination end.
That takes us back to the end of 2000. I found a lot of new options
in there that were good to know about, particularly the --fuzzy and
--append options which I may find useful.