Wednesday June 02, 2010 at 00:20
Subject: cron+xargs: The Scheduler of the Stars
Keywords: Command-line, cron, NCLUG, Technical, Tricks, xargs
Posted by: Sean Reifschneider
Related entries: Tricks: Using xargs to feed multiple CPUs, by Sean Reifschneider, Monday April 19, 2010 at 13:17
I'm working on replacing our BackupPC backup infrastructure (because
BackupPC just takes too long), and one of the things I needed to do was
schedule backup jobs. In BackupPC you can tell it to run 4 jobs in
parallel, and whenever it wakes up, if there are free slots and backups to
run, it will start some more.
I wanted similar capabilities, but without writing my own scheduler;
it's not rocket science, but it's still a complicated bit of code.
Ideally, to improve on BackupPC, I'd like to have one job start as soon as
another ends, rather than waiting for the next scheduler wake-up.
As I've mentioned before, xargs can manage running multiple jobs. You
can specify how many to run in parallel, and it reads the list of arguments
from stdin. So what I came up with is a crontab that looks like
this:
00 22 * * * echo 1.example.com 2.example.com [...] 15.example.com | xargs --max-args=1 --max-procs=4 /path/to/harness
00 09 * * * echo a.example.org b.example.org c.example.org | xargs --max-args=1 --max-procs=1 /path/to/harness
The first entry starts at 10pm and runs the harness with the name of the
system to back up as its argument. It does this for 15 hosts, running 4 in
parallel. The second entry starts at 9am and runs the 3 example.org
backups one at a time (they are hosted off-site, and there's no need to hit
their network or ours harder than necessary).
In the past I would manually add the cron entries for each host at
specific times, but sometimes jobs would run long and load would go way up,
or sometimes there were idle periods where nothing happened... This is
definitely an improvement over that, with minimal additional coding.
Wherever possible: Avoid writing code.
Author: Ole Tange
Subject: Using GNU Parallel for readability
With GNU Parallel (http://www.gnu.org/software/parallel/), your crontab may be slightly easier to read:
00 22 * * * parallel -j+0 /path/to/harness ::: 1.example.com 2.example.com ... 15.example.com
00 09 * * * parallel -j1 /path/to/harness ::: a.example.org b.example.org c.example.org
Watch the intro video for GNU Parallel: http://www.youtube.com/watch?v=OpaiGYxkSuQ
Author: Sean Reifschneider
Subject: I've since changed the starting process...
Some of my cron entries were getting extremely unwieldy, so I made a
program that looks at the database and prints out the backups that should
be run. So my crontab entries have changed to:
00 20 * * * listbackupstorun | xargs --max-args=1 --max-procs=8 /path/to/zfsharness
00 22 * * * listbackupstorun | xargs --max-args=1 --max-procs=8 /path/to/zfsharness
[...]
In my case, I know that the backup names don't contain spaces or other odd characters, but I could easily null-separate them if necessary.
The thing I think GNU Parallel would really help with is its "semaphore" mode. Right now I have my backups grouped so that the ones that run at 8pm are done by 10pm, when another 25 backups run. Those are done by midnight, when another 40 backups run, and those are done by the time the 3am backups start. It would be easier to have just one window, say starting at 10pm, and run all 100+ backups then. But most of our backups need to run at 11pm or midnight, while some I can run a bit earlier to get them out of the way...
The problem with xargs is that separate instances don't communicate with each other, so if one group were to overrun its window, I could end up with a lot of backups running at once. The semaphore option could really help there, so I'll probably look at converting those entries over to parallel. Thanks.
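One way to get a limit that is shared across separate xargs runs, without switching tools, is to have each job grab one of a fixed number of lock-file "slots" with util-linux `flock` before it starts. This is a swapped-in technique, not GNU Parallel's semaphore mode and not what the post uses. The slot directory, `SLOTS`, `run_with_slot`, `backup_one` (standing in for /path/to/zfsharness) and `list_backups` (standing in for listbackupstorun, emitting NUL-separated names read with `xargs -0`) are all hypothetical. Because every cron group points at the same slot directory, a group that overruns its window makes the next group's jobs wait instead of piling on.

```bash
#!/usr/bin/env bash
# Sketch: a crude cross-instance semaphore built from N lock files.
# Hypothetical names: SLOT_DIR, SLOTS, run_with_slot, backup_one, list_backups.
# Needs bash >= 4.1 (for {fd} redirections) and util-linux flock(1).
set -euo pipefail

SLOT_DIR=${SLOT_DIR:-/tmp/backup-slots}
SLOTS=${SLOTS:-8}
mkdir -p "$SLOT_DIR"

# Hypothetical stand-in for /path/to/zfsharness.
backup_one() {
    echo "backing up $1"
    sleep 0.2
}

# Block until one of $SLOTS lock files can be locked, run the job, release it.
run_with_slot() {
    local i fd rc
    while true; do
        for (( i = 1; i <= SLOTS; i++ )); do
            exec {fd}>"$SLOT_DIR/slot.$i"
            if flock -n "$fd"; then       # got this slot
                rc=0
                backup_one "$@" || rc=$?
                exec {fd}>&-              # closing the descriptor drops the lock
                return "$rc"
            fi
            exec {fd}>&-                  # slot busy: close it, try the next one
        done
        sleep 1                           # every slot busy: wait, then rescan
    done
}
export -f run_with_slot backup_one
export SLOT_DIR SLOTS

# Hypothetical stand-in for listbackupstorun: NUL-separated output,
# so names containing spaces survive the trip through xargs -0.
list_backups() {
    printf '%s\0' "host one" "host-two" "host three"
}

list_backups | xargs -0 -n 1 -P "$SLOTS" bash -c 'run_with_slot "$1"' _
```

Since the kernel drops a `flock` lock once the last descriptor referring to it is closed, a job that crashes gives its slot back automatically. One caveat: anything a job leaves running in the background inherits the descriptor and keeps the slot busy until it exits.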