Earlier this month we switched our bandwidth accounting system to new software. Previously we used a kernel module, which was conceptually very cool. But with CentOS 4 going end-of-life, that kernel module doesn't work on CentOS 6, and I wanted to go a different route rather than fix the module.
So earlier this month, after extensive testing in parallel with our normal traffic stats collection software, I flipped the switch and started using the new software. Except that I forgot to re-enable the database update code, so it was only updating the graphs. I had disabled those updates during testing to prevent double-accounting while the old and new code ran in parallel.
Fortunately, I log all those updates to syslog, so I was able to pull them out of the logs and replay them into the database. Logging updates is a great safety net.
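The replay idea can be sketched like this. The log line format, table schema, and use of SQLite are all my illustrative assumptions, not the actual system; the point is just parsing each logged update and applying it to the database:

```python
import sqlite3

def replay_updates(log_lines, conn):
    """Replay logged bandwidth updates into the database.

    Assumes a hypothetical log format like:
        traffic-update host=web1 bytes=12345
    """
    cur = conn.cursor()
    for line in log_lines:
        # Parse key=value fields out of the log line.
        fields = dict(f.split("=", 1) for f in line.split() if "=" in f)
        # Upsert: add the logged byte count to the host's running total.
        cur.execute(
            "INSERT INTO traffic (host, bytes) VALUES (?, ?) "
            "ON CONFLICT(host) DO UPDATE SET bytes = bytes + excluded.bytes",
            (fields["host"], int(fields["bytes"])),
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (host TEXT PRIMARY KEY, bytes INTEGER)")
replay_updates(
    ["traffic-update host=web1 bytes=100",
     "traffic-update host=web1 bytes=50"],
    conn,
)
total = conn.execute("SELECT bytes FROM traffic WHERE host = 'web1'").fetchone()[0]
```

Because the replay is driven entirely by the log, you can rerun it against any database snapshot, which is exactly what makes logging updates such a good safety net.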
Which leads me to my next point. The simple replay code I had written did one big commit at the end. I hadn't thought about how many updates we were talking about: over 2.5 million. I also wasn't writing out status, so I wasn't sure how far along it was. After an hour of running I killed it, and added code to commit every 1,000 updates and write out a progress message.
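The batched-commit version looks roughly like this (again a sketch using SQLite and a made-up schema, not the real code):

```python
import sqlite3

BATCH_SIZE = 1000  # commit every 1,000 updates, as described above

def replay_with_batches(updates, conn, batch_size=BATCH_SIZE):
    """Apply updates, committing and reporting progress every batch_size rows."""
    cur = conn.cursor()
    commits = 0
    for i, (host, nbytes) in enumerate(updates, start=1):
        cur.execute("INSERT INTO traffic (host, bytes) VALUES (?, ?)",
                    (host, nbytes))
        if i % batch_size == 0:
            conn.commit()
            commits += 1
            print(f"{i} updates replayed...")
    conn.commit()  # flush the final partial batch
    return commits

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (host TEXT, bytes INTEGER)")
updates = [("web1", 10)] * 2500
commits = replay_with_batches(updates, conn)
rows = conn.execute("SELECT COUNT(*) FROM traffic").fetchone()[0]
```

Periodic commits keep each transaction's pending work small, and the progress line means you never again wonder whether a long-running job is actually getting anywhere.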
With these changes, it finished in 18 minutes, reporting progress all the way along. Of course, I could have consolidated the data in the program and done only a few updates, but 18 minutes was totally acceptable.