mycroes

There's always time to play

Thursday, August 26, 2010

rsync with --delete-excluded

While setting up daily (offsite) automated backups I ran into a few issues. First of all, the backups didn't complete before people got to work again, so I had to stop them manually and restart them at a lower transfer rate. This is easily done by passing rsync the --bwlimit=<KBps> option (the value is in kilobytes per second, not kilobits).
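As a rough sketch of the throttling, the following runs a local sync in a scratch directory with a limit of 500. The directory names, the file and the limit are made up for illustration; a real offsite run would use a remote destination instead of a local folder.

```shell
#!/bin/sh
# Hypothetical scratch directories standing in for the real source and backup.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dst"
echo "payload" > "$work/src/report.txt"

# Throttle the transfer; 500 is an arbitrary example value
# (rsync reads a bare number as kilobytes per second).
rsync -a --bwlimit=500 "$work/src/" "$work/dst/"

# The scratch directory is left in place so the result can be inspected.
ls "$work/dst"
```

Picking the limit is a trade-off: too low and the backup runs into office hours again, too high and it saturates the line.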

Next, I often want to sync just part of the tree, so I add --exclude=/<folder> for every folder I don't want. However, I also exclude some files and I use --delete. That combination has a nasty side effect: excluded files are not deleted on the receiving end, even if they were deleted on the sender. This leaves non-empty folders on the receiver and generates errors, because rsync cannot delete a folder that still contains excluded files.

There's an option that 'fixes' this, and that's --delete-excluded. This option will delete excluded files on the receiving end. You can guess that combined with my --exclude=/<folder> this would result in deleting an entire branch of the tree that should not be removed...

The solution is to specify that the exclude is a receiving-side exclude, because excludes become sender-side only by default when --delete-excluded is also provided. This can be done by using a filter rule instead of an exclude rule, resulting in the following option: --filter=-r_/<folder>. The - specifies that it's an exclude, the r specifies that it applies to the receiving side, and the _ separates the modifiers from the path (a space is also allowed, but an underscore avoids the need for quoting or even double quoting). Now there's one nasty issue remaining: with only the r modifier the sender no longer excludes the folder, so it still scans it and sends its contents. So let's make it an exclude for both sender and receiver: --filter=-rs_/<folder>.
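To see the difference for yourself, here is a small sandbox sketch. The names (Videos, clip.avi, notes.txt, dst_plain, dst_filter) are hypothetical; the two rsync calls differ only in how the folder is excluded.

```shell
#!/bin/sh
# Scratch tree with made-up names; both destinations start with a Videos folder.
work=$(mktemp -d)
mkdir -p "$work/src/Videos" "$work/dst_plain/Videos" "$work/dst_filter/Videos"
echo "doc"   > "$work/src/notes.txt"
echo "movie" > "$work/src/Videos/clip.avi"
echo "movie" > "$work/dst_plain/Videos/clip.avi"
echo "movie" > "$work/dst_filter/Videos/clip.avi"

# Plain exclude: with --delete-excluded it only applies to the sender,
# so the receiver's copy of /Videos gets deleted.
rsync -a --delete --delete-excluded --exclude=/Videos \
    "$work/src/" "$work/dst_plain/"

# Sender- and receiver-side exclude: /Videos is skipped on the sender
# and left untouched on the receiver.
rsync -a --delete --delete-excluded --filter='-rs_/Videos' \
    "$work/src/" "$work/dst_filter/"

# Compare the two results (scratch directory is left in place).
ls "$work/dst_plain" "$work/dst_filter"
```

After the run, dst_plain should contain only notes.txt, while dst_filter should still have its Videos folder.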

Using the above, it's now possible to exclude files from an rsync transfer without removing them on the receiving side, while still deleting other excluded files on the receiving end. In short: rsync --exclude='*.tmp' --filter='-rs_/important/' --delete --delete-excluded <source> <dest> will leave the important folder alone on the destination, but will remove all .tmp files in the destination.
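The combined command can be tried safely in a throwaway directory. This is only a sketch: the file names are invented, and -a is added here so that rsync recurses into the tree.

```shell
#!/bin/sh
# Throwaway source and destination; all file names are hypothetical.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dst/important"
echo "data"    > "$work/src/data.txt"
echo "scratch" > "$work/src/build.tmp"
echo "stale"   > "$work/dst/old.tmp"
echo "keep"    > "$work/dst/important/archive.txt"

# *.tmp is a plain exclude (removed on the receiver by --delete-excluded);
# /important/ is excluded on both sides and therefore left alone.
rsync -a --exclude='*.tmp' --filter='-rs_/important/' \
    --delete --delete-excluded "$work/src/" "$work/dst/"

# Show what is left on the destination (scratch directory is kept).
find "$work/dst" | sort
```

The destination should end up with data.txt and the untouched important folder, and without any .tmp files.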

10 comments:

Frank Groeneveld said...

So you're saying that if I have this tree:

A
A/B/
A/B/C
A/file.txt
A/B/C/excluded.txt

using --delete-excluded will result in the removal of A (and file.txt) while only excluded.txt should be removed?

Michael Croes said...

The context in which you're using 'should' in your comment is a bit vague due to a missing rsync command and missing source tree...

Frank Groeneveld said...

Ok, suppose we have the same tree on source and destination, and we sync with this command:

rsync --exclude=excluded.txt --delete-excluded

You're saying C is removed?

Michael Croes said...

No, 'excluded.txt' from the destination will be deleted, but even when excluding 'excluded.txt', the directory 'C' is being synced. Unless -m/--prune-empty-dirs is added, in which case both B and C will be deleted because they're 'empty'.
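A quick way to check the case without -m is to rebuild the tree from the comment above in a scratch directory. This is a sketch only: -a and the explicit --delete are additions to the command in the question, and the file contents are made up.

```shell
#!/bin/sh
# Build the same tree on both sides, as in the question.
work=$(mktemp -d)
for side in src dst; do
    mkdir -p "$work/$side/A/B/C"
    echo "text"   > "$work/$side/A/file.txt"
    echo "secret" > "$work/$side/A/B/C/excluded.txt"
done

# Exclude the file and let rsync delete excluded files on the receiver.
rsync -a --delete --exclude=excluded.txt --delete-excluded \
    "$work/src/" "$work/dst/"

# List what survived on the destination (scratch directory is kept).
find "$work/dst" | sort
```

Only excluded.txt should disappear from the destination; A, B, C and file.txt stay.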

Frank Groeneveld said...

What do you mean by "You can guess that combined with my --exclude=/<folder> this would result in deleting an entire branch of the tree that should not be removed..." in the original post then?
I don't see why I would need to use filter rules with the weird syntax etc., when I can do exactly the same with just --exclude and --delete-excluded.

Michael Croes said...

The issue is I use --exclude=/Videos because videos are large and I don't always want to sync them. But as soon as you combine that with --delete-excluded, the Videos folder on the remote end will be deleted, which is not what I wanted. This example is merely for saving bandwidth, not for saving storage on the receiving side.

orange80 said...

This is awesome!!! Huge help for my backups! Thanks for posting :)

s1nglemalted said...

Thanks Michael for the write-up! This and the 'FILTER RULES' section in the official rsync docs have helped me out.

I have the same requirements for my rsync operation as you do: I need particular file patterns to be excluded, but I also need a destructive method for cloning the source system.

There's two points I'll make, just in case it helps anyone else out.

1) The --exclude switch actually works just like --filter using the "-" rule.

For example, in my case I needed to exclude any "config.inc.php" files within the tree structure, so I initially was using :

--exclude='config.inc.php'

Now, with the --filter switch I can replace this with :

--filter='-rs_config.inc.php'

Which allows me to include the 'r' and 's' rules to ensure this file is excluded on both the sender and receiver sides.

Using the --filter switch rather than the --exclude switch has freed me up from having to also use the --delete-excluded switch (where a lot of my confusion was).

2) I also stumbled upon another useful rule when using --filter; the 'p' rule ('perishable').

What this means is that the filter I use 'p' on will be ignored if the containing folder is being deleted.

So my above example becomes :

--filter='-rsp_config.inc.php'

In a normal rsync operation this excludes the "config.inc.php" file. The exception is when the folder containing the file has been deleted from the source: in that case the folder is also deleted from the destination, including that file.

Without the 'p' rule if you deleted the folder on the source, the rsync operation would not be able to delete the folder on the destination because of the filter. Which is not what I wanted.
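The effect of 'p' can be compared side by side in a scratch directory. This is a sketch with made-up names (app, keep.txt, dst_p, dst_nop), and it assumes an rsync version that supports the perishable modifier (3.0.0 or later, as far as I know).

```shell
#!/bin/sh
# The source no longer has the "app" folder; both destinations still do,
# and each holds an excluded config.inc.php.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dst_p/app" "$work/dst_nop/app"
echo "still here" > "$work/src/keep.txt"
echo "settings"   > "$work/dst_p/app/config.inc.php"
echo "settings"   > "$work/dst_nop/app/config.inc.php"

# Perishable rule: ignored inside a folder that is being deleted,
# so "app" and its config file are removed.
rsync -a --delete --filter='-rsp_config.inc.php' "$work/src/" "$work/dst_p/"

# Same rule without 'p': the protected file keeps "app" non-empty,
# so rsync cannot delete the folder.
rsync -a --delete --filter='-rs_config.inc.php' "$work/src/" "$work/dst_nop/"

# Compare the two destinations (scratch directory is kept).
ls "$work/dst_p" "$work/dst_nop"
```

With 'p' the destination ends up as a true clone of the source; without it the stale folder lingers.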

So in summary my rsync switches look like so :

rsync --delete -avz --filter="-rsp_config.inc.php" ...

And multiple filter switches can be added for as many as you need.

cheers

Michael Croes said...

Thanks for the detailed explanation!

Bellocarico said...

--bwlimit=KBps and not --bwlimit=kbps
Apart from that, great article!