Wednesday January 10, 2007 at 16:10
Subject: Python+Linux time.sleep() returning IOError 514
Keywords: Bug, Linux, Python
Posted by: Sean Reifschneider
A client reported that they were doing "time.sleep(1)" and sometimes
it was raising an IOError exception with errno set to 514. There isn't
much discussion of this on Google, and many of the hits go off in the wrong
direction, so I wanted to blog on it so others can find information
about it more easily. The short answer is that it's probably a bug in the
Linux kernel...
The system in question has two dual-core 2GHz Xeon 5100 series
processors. This code is working fine on other systems, including one with
four several-year-old Xeon CPUs (with hyperthreading enabled), so it seems to
be at least somewhat related to this particular system.
Python's "time.sleep()" is implemented by calling the select
system-call, to allow for sub-second sleeps. In looking at recent Linux
kernel source (2.6.19.1), I see that errno 514 is ERESTARTNOHAND (restart
if no handler), and is in a section marked as "should never be seen by user
programs".
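As a rough illustration of the two points above, here is a minimal sketch of a select()-based sleep and a lookup that names errno 514. The helper names (describe_errno, select_sleep) and the small table of kernel-internal codes are my own, copied by hand from include/linux/errno.h, so check them against your kernel tree. This is not how CPython itself is written, only the same idea in Python.

```python
import errno
import select

# Kernel-internal codes from include/linux/errno.h (hand-copied assumption).
# They are not meant to reach user programs, so userspace does not name them.
KERNEL_INTERNAL_ERRNOS = {
    512: "ERESTARTSYS",
    513: "ERESTARTNOINTR",
    514: "ERESTARTNOHAND",
    515: "ENOIOCTLCMD",
    516: "ERESTART_RESTARTBLOCK",
}


def describe_errno(code):
    """Return a readable name for an errno value (hypothetical helper)."""
    # Check the kernel-internal table first so the answer does not depend
    # on which names the local errno module happens to define.
    internal = KERNEL_INTERNAL_ERRNOS.get(code)
    if internal is not None:
        return internal + " (kernel-internal)"
    name = errno.errorcode.get(code)
    if name is not None:
        return name
    return "unknown errno %d" % code


def select_sleep(seconds):
    """Sleep by calling select() with no descriptors, only a timeout.

    Works on Unix-like systems; Windows rejects three empty sets.
    """
    select.select([], [], [], seconds)
```

Calling describe_errno(514) on the value from the exception makes it obvious that the error is a kernel restart code rather than an ordinary I/O failure.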
So, it would seem that the kernel is leaking this information where it
shouldn't be. I dug into fs/select.c a bit, and I see two possibilities for
leaking. The first, I think, is likely the problem:
- In sys_select(), if STICKY_TIMEOUTS is not set, and the
  copy_to_user() call (fs/select.c:418 for 2.6.19.1) returns non-zero,
  ERESTARTNOHAND could be propagated to user-space. Moving the "if (ret
  == -ERESTARTNOHAND)" block outside one or both of the "if" blocks it's
  currently in could reduce this. However, I don't fully understand the
  implications of this move.
- In sys_pselect7(), there is similar code, but after the block
  mentioned above it has an "if (ret == -ERESTARTNOHAND)" block
  (fs/select.c:500 for 2.6.19.1) that never changes
  -ERESTARTNOHAND into -EINTR. So, it looks like ERESTARTNOHAND can
  definitely propagate back to userspace here.
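A workaround is to catch the exception and ignore it when errno is 514:

    import time

    try:
        time.sleep(1)
    except IOError as e:
        # 514 is ERESTARTNOHAND; re-raise anything else
        if e.errno != 514:
            raise

It would be nice to know more about whether this really should be making it back to user-space, or not.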
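Simply swallowing the exception means the sleep can end early. A slightly more defensive variant of the workaround is sketched below: it retries for whatever time remains. The function name sleep_ignoring_restart and its sleep/clock parameters are hypothetical, added only so the behaviour can be tested without a kernel that shows the bug. It assumes Python 3.3 or later for time.monotonic, where IOError is an alias of OSError.

```python
import time

# Kernel-internal value; Python's errno module has no name for it.
ERESTARTNOHAND = 514


def sleep_ignoring_restart(seconds, sleep=time.sleep, clock=time.monotonic):
    """Sleep for `seconds`, retrying if the sleep fails with errno 514.

    `sleep` and `clock` are injectable so the retry logic can be tested.
    """
    deadline = clock() + seconds
    remaining = seconds
    while remaining > 0:
        try:
            sleep(remaining)
            return
        except IOError as e:
            # Only the leaked ERESTARTNOHAND is ignored; re-raise the rest.
            if e.errno != ERESTARTNOHAND:
                raise
        # Sleep was cut short: work out how much time is left and retry.
        remaining = deadline - clock()
```

Whether retrying is the right choice depends on the caller; if an early wakeup is harmless, the plain try/except is enough.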
| Comment |
Hiro Sugawara Subject: ERESTARTNOHAND in userspace |
I've seen the same thing here. The platform is an x86_64 with SMP running 2.6.12. The server is a multi-threaded process. So far, only one incident has been reported.
There is an interesting posting by IBM for their S390 Linux at http://www-128.ibm.com/developerworks/linux/linux390/linux-2.6.5-s390-25-april2004.html that refers to the exact same symptom, but it patches the assembly code in entry.S, in which I found little similarity to my case.
| Comment |
Zac Conn Subject: ERESTARTNOHAND |
I have a similar problem here. Did you reach any final conclusion about this?
| Comment |
Author:
Sean Reifschneider Subject: Don't know about the solution. |
I never worked this through to a complete solution. I believe what happened was that the client worked around this in their code.