Saturday January 21, at 16:43
Subject: A word on logging and database commits.
Keywords: Logging, PostgreSQL
Posted by: Sean Reifschneider
Earlier this month we switched our bandwidth accounting system to new
software. Previously we used a kernel module on a system, which was
conceptually very cool. But CentOS 4 is going end-of-life, the kernel
module doesn't work on CentOS 6, and rather than fix the module I wanted
to go a different route.
So, after extensive testing in parallel with our normal traffic stats
collection software, I flipped the switch and started using the new
software. Unfortunately, I forgot to re-enable the database update code,
so it was only updating the graphs. I had disabled that code during
testing to prevent double-accounting while the old and new systems ran
in parallel.
Fortunately, I log all of those updates to syslog, so I was able to pull
them out of the logs and replay them into the database. Logging updates
is a great safety net.
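The post doesn't show how the updates are logged or what the log lines look like. As a minimal sketch, assuming a hypothetical message format of "bwupdate ip=<address> bytes=<count>" (the "bwupdate" tag, the field names, and all three helper functions are illustrative, not the real system's), recording and recovering updates could look like this:

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical log message format; the real system's format is not known.
UPDATE_RE = re.compile(r"bwupdate ip=(?P<ip>\S+) bytes=(?P<bytes>\d+)")


def format_update(ip: str, nbytes: int) -> str:
    """Build the message that gets logged for each accounting update."""
    return f"bwupdate ip={ip} bytes={nbytes}"


def parse_update(line: str) -> Optional[Tuple[str, int]]:
    """Extract (ip, bytes) from a syslog line, or None if it isn't an update.

    search() is used rather than match() so the syslog prefix
    (timestamp, host, program name) is skipped automatically.
    """
    m = UPDATE_RE.search(line)
    if m is None:
        return None
    return m.group("ip"), int(m.group("bytes"))


def updates_from_log(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield every logged update in order, ignoring unrelated syslog traffic."""
    for line in lines:
        parsed = parse_update(line)
        if parsed is not None:
            yield parsed
```

On Unix, the accounting code could emit each message with the standard library's syslog.syslog(format_update(ip, nbytes)); replaying is then a matter of feeding the syslog file's lines to updates_from_log().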
Which leads me to my next point. The simple code I had written to
replay the logs was going to do one big commit at the end. I hadn't
thought about how many updates we were talking about: over 2.5 million.
It also wasn't writing out any status, so I couldn't tell how far along
it was. I eventually killed it after an hour of running and added code
to commit every 1,000 updates and write out a progress message.
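Here is a minimal sketch of that batched replay, assuming a hypothetical "bandwidth" table with "ip" and "bytes" columns (the original schema isn't given). It uses Python's built-in sqlite3 module so it runs anywhere; with PostgreSQL through psycopg2 the commit() calls are the same, but the query placeholders would be %s instead of ?.

```python
import sqlite3
import sys
from typing import Iterable, Tuple

BATCH_SIZE = 1000  # commit interval used in the post


def replay(conn: sqlite3.Connection,
           updates: Iterable[Tuple[str, int]],
           batch_size: int = BATCH_SIZE) -> int:
    """Apply (ip, bytes) updates, committing and reporting every batch_size rows.

    Returns the total number of updates applied.
    """
    cur = conn.cursor()
    count = 0
    for ip, nbytes in updates:
        # Hypothetical table and columns; adjust to the real schema.
        cur.execute("UPDATE bandwidth SET bytes = bytes + ? WHERE ip = ?",
                    (nbytes, ip))
        count += 1
        if count % batch_size == 0:
            conn.commit()  # keep each transaction small; work so far is durable
            print(f"{count} updates committed", file=sys.stderr, flush=True)
    conn.commit()  # commit the final partial batch
    print(f"done: {count} updates", file=sys.stderr, flush=True)
    return count
```

A side benefit of committing per batch: if the job is killed, everything up to the last progress message is already in the database, though a restart would need to skip those log lines to avoid double-counting.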
With these changes, the replay finished in 18 minutes, reporting
progress all the way along. Of course, I could have consolidated the
data in the program and done only a few updates, but 18 minutes was
totally acceptable.
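The consolidation alternative could be sketched as follows: sum the logged byte counts per key in memory, then issue one UPDATE per key instead of one per log line. This assumes the same hypothetical "bandwidth" table and keys only on IP address; real accounting data would probably also be keyed by something like a time period, which would just widen the key.

```python
import sqlite3
from collections import Counter
from typing import Iterable, List, Tuple


def consolidate(updates: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Sum byte counts per IP so each row needs only a single UPDATE."""
    totals = Counter()
    for ip, nbytes in updates:
        totals[ip] += nbytes
    return sorted(totals.items())


def apply_totals(conn: sqlite3.Connection,
                 totals: List[Tuple[str, int]]) -> None:
    """Write the consolidated totals in one transaction (hypothetical schema)."""
    conn.executemany("UPDATE bandwidth SET bytes = bytes + ? WHERE ip = ?",
                     [(nbytes, ip) for ip, nbytes in totals])
    conn.commit()
```

Millions of log lines collapse to one statement per distinct key, at the cost of holding the running totals in memory.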