I'm working on replacing our BackupPC backup infrastructure (because BackupPC just takes too long), and one of the things I needed to do was schedule backup jobs. In BackupPC you can tell it to run 4 jobs in parallel, and whenever it wakes up, if there are free slots and backups waiting, it starts more.

I wanted similar capabilities, but without writing my own scheduler; it's not rocket science, but it's still a complicated bit of code. Ideally, to improve on BackupPC, I'd like to have one job start as soon as another ends, rather than waiting for the next scheduler wake-up.

As I've mentioned before, xargs can manage running multiple jobs. You can specify how many to run in parallel, and it gets the list of arguments to run from stdin. So, what I came up with is a crontab which looks like this:

00 22 * * * echo 1.example.com 2.example.com [...] \
      15.example.com | xargs --max-args=1 --max-procs=4 /path/to/harness
00 09 * * * echo a.example.org b.example.org c.example.org \
      | xargs --max-args=1 --max-procs=1 /path/to/harness

The first entry fires at 10pm and runs the harness once per host, with the system name to back up as the argument: 15 hosts, 4 in parallel. The second entry fires at 9am and runs the 3 example.org backups one at a time (they're hosted off-site, and there's no need to hit their network or ours harder than necessary).
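The same xargs pattern is easy to try outside cron. A minimal sketch, assuming GNU xargs (the long --max-args/--max-procs options are GNU-specific); the host names are made up and the echo stands in for the real harness:

```shell
#!/bin/sh
# Feed a list of "hosts" to xargs: one argument per invocation
# (--max-args=1), at most two invocations in flight at once
# (--max-procs=2).  The echo stands in for the backup harness.
echo host1 host2 host3 host4 host5 \
    | xargs --max-args=1 --max-procs=2 sh -c 'echo "backing up $0"'
```

With --max-procs greater than 1 the output order isn't guaranteed, and as soon as one child exits xargs starts the next, which is exactly the "start one job as soon as another ends" behavior.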

In the past I would manually add a cron entry for each host at a specific time, but sometimes jobs would run long and load would go way up, and sometimes there were idle periods where nothing ran at all. This is definitely an improvement over that, with minimal additional coding.
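The harness itself isn't shown in the post. A minimal sketch of what such a wrapper might look like, written as a function so it's easy to exercise; the name backup_host is hypothetical, and the rsync call is left as an illustrative comment since the real backup command isn't given:

```shell
#!/bin/sh
# Hypothetical harness sketch -- not the post's actual harness.
# xargs calls it once per host; it logs start/end times so
# long-running jobs are easy to spot in the cron mail.
backup_host() {
    host="$1"
    echo "$(date '+%F %T') starting backup of $host"
    # The real backup command would go here, e.g. (illustrative):
    #   rsync -a --delete "root@$host:/" "/backups/$host/"
    echo "$(date '+%F %T') finished backup of $host"
}

backup_host demo.example.com
```

In practice this would be a standalone script so that xargs can exec it directly as /path/to/harness with the hostname as its single argument.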

Wherever possible: Avoid writing code.
