
At a recent NCLUG meeting, Roger Parmenter asked how to split a tar file into smaller tar files. I pointed him at some backup software I wrote years ago called “pyntar”, which I thought did that. After some re-learning, I realized that pyntar doesn't split on tar entry boundaries. It does index tar files for fast recovery, and it splits them at a particular size.

However, I realized that, since I understand the tar file format and already had a Python module I wrote for it, I could easily build a program that did just that. I put together “pytarsplit” and made it available on our FTP site at ftp://ftp.tummy.com/pub/tummy/pytarsplit/.

You pipe tar data to it and call it with two arguments: the split size in bytes and a file name pattern containing a “%d”-style placeholder for the part number. For example:

tar c . | pytarsplit 5000000 /tmp/splittarfile.%05d.tar

This will create a tar of the current directory, and write it to files named “splittarfile.00001.tar”, “splittarfile.00002.tar” and so on. Unlike using the Unix “split” command, each of the resulting tar files will be a self-contained tar file that you can run “tar” on, or you can cat them all together to return to the original file.

Of course, because it only splits between tar entries, some of the resulting files can be larger than the split size. pytarsplit will try to keep each part under the specified size, but if a single entry in the tar file is larger than the split size, the part holding it will be at least as large as that entry. In Roger's test backup, he had a couple of chunks that were 700MB, and I said “Oh, have a few ISOs in your home directory, eh?”
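To make the splitting rule concrete, here is a rough Python sketch of the idea. It is not the pytarsplit source; the names read_exact, iter_entries and split_tar are made up for illustration. It assumes plain octal size fields in the 512-byte tar headers (no base-256 sizes for huge files), and it holds one whole entry in memory at a time, which a real tool would want to avoid for things like ISOs.

```python
BLOCK = 512
# Header types that describe the *next* entry (GNU long name/link, pax
# headers), so they must stay in the same part as the entry that follows.
PREFIX_TYPES = b"LKxg"


def read_exact(stream, n):
    """Read exactly n bytes, or fewer only at end of stream."""
    chunks = []
    while n > 0:
        chunk = stream.read(n)
        if not chunk:
            break
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def iter_entries(stream):
    """Yield (raw_bytes, is_prefix) for each entry in a tar stream."""
    while True:
        header = read_exact(stream, BLOCK)
        if len(header) < BLOCK or header == b"\0" * BLOCK:
            return  # end-of-archive marker, or truncated input
        # Size is stored as octal text at offset 124 (assumption: no
        # base-256 encoding, which some tars use for very large files).
        size_field = header[124:136].strip(b"\0 ")
        size = int(size_field, 8) if size_field else 0
        padded = (size + BLOCK - 1) // BLOCK * BLOCK  # data is block-padded
        data = read_exact(stream, padded)
        yield header + data, header[156:157] in PREFIX_TYPES


def split_tar(stream, split_size, name_pattern):
    """Copy entries into numbered files, switching files only between entries."""
    part, out, written, names = 0, None, 0, []
    pending = b""  # prefix headers waiting for the entry they describe
    for raw, is_prefix in iter_entries(stream):
        pending += raw
        if is_prefix:
            continue
        # Start a new part if this entry would push a non-empty part past
        # the limit. An oversized entry still gets written, alone.
        if out is None or (written and written + len(pending) > split_size):
            if out:
                out.close()
            part += 1
            names.append(name_pattern % part)
            out = open(names[-1], "wb")
            written = 0
        out.write(pending)
        written += len(pending)
        pending = b""
    if out:
        out.close()
    return names
```

Calling split_tar(sys.stdin.buffer, 5000000, "/tmp/splittarfile.%05d.tar") would roughly mirror the command line above. Note that this sketch drops the end-of-archive zero blocks rather than writing them to each part, which is one way to keep the concatenated parts matching the original entry data; whether pytarsplit does the same is not something I'm claiming here.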

Roger left the Hacking Society meeting fairly happy with the results. I'll probably refine it more as I have time.
