
xargs is a great command-line tool for splitting huge lists of files into batches that stay within the command line's limits on length and number of arguments. It also has options that let it run multiple jobs in parallel. Read on for how I used this to cut one job's execution time by 75%.

xargs is frequently used with the “find” command, such as:

find . -type f | xargs egrep '^#!.*python'

The above builds a list of the files under the current directory and looks for “python” in each file's “#!” line. If there are more files than a single command line can handle, egrep is run multiple times. Note that in newer “find” implementations you can get the same batching by ending “-exec” with “+” (though on SuSE this seems to produce “too many arguments” errors):

find . -type f -exec egrep '^#!.*python' '{}' +
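One caveat with the pipeline form: plain xargs splits its input on whitespace, so a file name containing a space is passed along as two arguments. A minimal sketch of the usual workaround follows, assuming a find and xargs that support “-print0” and “-0” (GNU and BSD versions do). The scratch directory and file names are made up for the demonstration.

```shell
# Build a scratch tree holding a file whose name contains a space.
demo_dir=$(mktemp -d)
printf '#!/usr/bin/env python\n' > "$demo_dir/my script.py"
printf 'plain text\n' > "$demo_dir/notes.txt"

# -print0 and -0 separate names with NUL bytes, so the space survives.
# grep -l prints only the names of the files that match.
matches=$(find "$demo_dir" -type f -print0 | xargs -0 grep -lE '^#!.*python')
echo "$matches"

# Remove the scratch tree again.
rm -rf "$demo_dir"
```

The “-exec ... +” form does not have this problem, since find hands the names to the command directly without going through a pipe.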

One little-known feature of xargs is that it can run multiple jobs in parallel via the “-P” or “--max-procs” argument. I used this some time ago to transcode a bunch of music:

find flac -type f -name \*.flac -print0 | xargs -0 -n1 -P4 ./convert

Here I have a quad-core CPU, hence “-P4”, and “-n1” hands each invocation a single file. “convert” is a script that first checks whether the destination file already exists (so I can rerun it to transcode only new music), then pulls out the tags via “metaflac”, decompresses the flac (“flac --stdout --decode”), and re-encodes it as Ogg/Vorbis with “oggenc”.
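The script itself isn't shown here, so what follows is only a minimal sketch of what a script along those lines could look like, not the original. The function name, the choice to write the .ogg next to the source file, and the temporary “.part” file are all assumptions for illustration. It also assumes metaflac, flac and oggenc are installed.

```shell
#!/bin/sh
# Hypothetical sketch of a "convert" script; not the original one.
# Assumes metaflac, flac and oggenc are on PATH.

convert_one() {
    src=$1
    dst="${src%.flac}.ogg"    # assumption: output sits beside the source

    # Skip files that were transcoded on an earlier run.
    if [ -e "$dst" ]; then
        return 0
    fi

    # Turn each NAME=VALUE tag line into a "-c NAME=VALUE" option for oggenc.
    # (Simplification: a tag value containing a newline would be split.)
    tags=$(metaflac --export-tags-to=- "$src")
    set --
    while IFS= read -r tag; do
        if [ -n "$tag" ]; then
            set -- "$@" -c "$tag"
        fi
    done <<EOF
$tags
EOF

    # Decode to WAV on stdout and encode from stdin ("-"). Write under a
    # temporary name first so an interrupted run does not leave a partial
    # file under the final name, which a later run would then skip.
    flac --silent --decode --stdout "$src" |
        oggenc --quiet "$@" -o "$dst.part" - &&
        mv "$dst.part" "$dst"
}

# xargs -n1 passes exactly one file name per invocation.
if [ "$#" -gt 0 ]; then
    convert_one "$1"
fi
```

Because the existence check comes first, rerunning the find/xargs pipeline over the whole collection only does real work for files that have no .ogg yet.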

So, two thumbs up for using xargs to feed multiple CPUs with CPU-intensive jobs.
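If you want to see what “-P” does without any audio tools installed, here is a small sketch that uses “sleep” as a stand-in job. It assumes an xargs that supports “-P” (GNU and BSD versions do). Since sleep uses no CPU, this only shows how xargs schedules the jobs; it says nothing about how much a CPU-bound job would gain on your hardware.

```shell
# Four one-second jobs, run one at a time.
start=$(date +%s)
printf '1\n1\n1\n1\n' | xargs -n1 -P1 sleep
serial=$(( $(date +%s) - start ))

# The same four jobs, up to four at a time.
start=$(date +%s)
printf '1\n1\n1\n1\n' | xargs -n1 -P4 sleep
parallel=$(( $(date +%s) - start ))

echo "serial: ${serial}s, parallel: ${parallel}s"
```

The serial run should take about four seconds and the parallel run about one.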

