I've often had a love/hate relationship with the OOM killer, only without the love. The OOM killer always seems to look at the processes and decide that "hey, this one process is big, but it's really active, so I'll kill off some other processes that aren't being used right now."
Of course, that leads to the OOM killer terminating things like the SSH daemon to save the multi-gigabyte process that's leaking memory and showing no signs of slowing... In most of those cases, I'd prefer that its decision were reversed.
Through my playing around with zfs-fuse (a process that can get to be quite big, but that you almost never want to OOM kill), I've found that there's a way to immunize processes against the OOM killer.
More recent kernels have an "oom_adj" file under /proc/$PID that accepts values from -17 to +15. If you run (as root):
echo -17 > /proc/$(cat /var/run/sshd.pid)/oom_adj
the SSH daemon's oom_adj gets set to "-17", which takes it out of the running as a candidate for the OOM killer.
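If you want something a bit more reusable than the one-liner, here's a rough sketch of a helper. It assumes a couple of things that aren't covered above: kernels from 2.6.36 on also expose an "oom_score_adj" file (range -1000 to +1000, where -1000 plays the same role as -17 does for oom_adj), and oom_adj is treated as the older, deprecated interface. The function name oom_protect and the PROC_ROOT override are my own made-up names, the latter only there so you can try the function against a fake directory tree without being root.

```shell
# Sketch only: exempt one PID from the OOM killer.
# oom_protect and PROC_ROOT are hypothetical names, not part of any tool.
oom_protect() {
    pid="$1"
    # PROC_ROOT lets you point at a fake tree for testing; defaults to /proc.
    proc_root="${PROC_ROOT:-/proc}"

    if [ -e "$proc_root/$pid/oom_score_adj" ]; then
        # Newer interface (assumed kernel 2.6.36+): -1000 disables OOM killing.
        echo -1000 > "$proc_root/$pid/oom_score_adj"
    elif [ -e "$proc_root/$pid/oom_adj" ]; then
        # Older interface described above: -17 disables OOM killing.
        echo -17 > "$proc_root/$pid/oom_adj"
    else
        echo "no OOM tunable found for PID $pid" >&2
        return 1
    fi
}
```

For the SSH daemon you'd call it as root with something like: oom_protect "$(cat /var/run/sshd.pid)". The setting belongs to the process, so it has to be applied again whenever the daemon gets a new PID.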
You can read more details about this and other memory-handling issues in the excellent 2009 LWN article "Taming the OOM killer."