I'm currently in Iceland for the Python Need for Speed Sprint. The goals of the sprint are to improve the performance of Python. Read on for details of the first day.

Things started slowly on Sunday, with a few of us hacking and catching up on rest and then going to CCP Games for a kick-off par-tay. Lots of chatting related to Python and the sprint, but also not. Shortly after I arrived at 7am local time, I had started putting together a list of the performance-related patches from the tracker. I got about halfway done before I realized I was so tired that I could no longer read, so I went to bed.

John Benediktsson from EWT (who are currently hiring; I can put you in touch with them if you want more information) and Hilmar Petursson from CCP, the organizations sponsoring the sprint, spoke about what they are using Python for, and then there was much mingling. A few folks went to a bar afterwards; I was with the crew that went back to the hotel. Incidentally, that crew was largely the same people who actually showed up at 9am Monday morning. A coincidence? I think not! :-)

On Monday we spent a long time on the initial planning, getting priorities on the tasks and organizing the items in the list. Many of us spent a long time on the problems with the current set of benchmarks, “pybench”, whose results have a high level of variability. That holds even when running many iterations, though running more makes it a bit better.

Part of it is that benchmarks are just hard to get right. Part of it seems to be that it's not running for very long on modern processors (when it was built, the target was to have it run for 20 seconds; it's under 4 now). Another part is the very serious discussion about whether it tests the things we're interested in. It's an interesting baseline, but you mostly can't trust any difference under 50%.
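To make the variability problem concrete, here is a minimal sketch that times one statement several times and reports how far the runs disagree. It uses the standard `timeit` module, not pybench; the `measure` helper and the "spread" figure are illustrative assumptions, not anything used at the sprint.

```python
import statistics
import timeit


def measure(stmt, setup="pass", number=100_000, repeat=7):
    """Time `stmt` `repeat` times and summarise how much the runs disagree.

    Hypothetical helper for illustration; it is not part of pybench.
    """
    # Each entry is the total time in seconds for `number` executions.
    runs = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    best = min(runs)
    worst = max(runs)
    return {
        "best": best,
        "worst": worst,
        "median": statistics.median(runs),
        # Relative gap between the slowest and fastest run.
        "spread": (worst - best) / best if best else 0.0,
    }


if __name__ == "__main__":
    result = measure("sum(range(100))")
    print("best %.4fs, spread %.1f%%" % (result["best"], result["spread"] * 100))
```

If the spread between identical runs is already several percent, a difference of that size between two interpreter builds tells you very little. Raising `number` so each run lasts longer is the cheap way to see whether the noise shrinks.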

Richard Jones updated a patch for “zombie frames”, in which frame objects are not returned to the free list; instead they are kept associated with the code object, and require less initializing when the code object is next called. The result is a small but measurable performance improvement in function calls.
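The idea can be pictured with a toy model in pure Python. This is only a sketch of the concept as described above, not the actual patch, which lives in the C core; the `Frame`, `Code`, and `call` names are made up for illustration.

```python
class Frame:
    allocations = 0  # counts how many frames were built from scratch

    def __init__(self, code):
        Frame.allocations += 1
        self.code = code
        self.locals = [None] * code.nlocals


class Code:
    def __init__(self, name, nlocals):
        self.name = name
        self.nlocals = nlocals
        self.zombie = None  # frame parked here after the previous call


def call(code):
    """Toy 'function call': reuse the parked frame if there is one."""
    frame = code.zombie
    if frame is not None:
        # Take the parked frame. While it is in use, a nested call
        # finds no zombie and allocates a fresh frame instead.
        code.zombie = None
        for i in range(code.nlocals):
            frame.locals[i] = None  # only the per-call state is reset
    else:
        frame = Frame(code)  # full allocation and initialization
    # ... the function body would run here ...
    code.zombie = frame  # park the frame on the code object
    return frame
```

Calling `call(code)` twice on the same `Code` object builds one frame and reuses it the second time, which is where the saving in this model comes from.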

Steve Holden had done some testing on Saturday and Sunday running pybench comparing 2.5a2 to 2.4.3, and found that the 2.5 alpha was around 10% slower. Further looking showed that the slowdown seems to be largely in the try/except handling, and it's probably related to some new code added for new-style exception objects. Nobody really looked into that any further, though.
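A quick way to isolate try/except cost on whatever interpreter you have handy is a microbenchmark along these lines. It is a rough sketch using `timeit`, not the pybench tests Steve ran; the three functions and the `time_cases` helper are hypothetical names chosen for the example.

```python
import timeit


def no_handler():
    return 1 + 1


def handler_not_triggered():
    # Pays only for setting up the try block.
    try:
        return 1 + 1
    except ValueError:
        return 0


def handler_triggered():
    # Pays for raising, matching, and handling the exception.
    try:
        raise ValueError("boom")
    except ValueError:
        return 0


def time_cases(number=200_000, repeat=5):
    """Return the best-of-`repeat` time in seconds for each case."""
    cases = [no_handler, handler_not_triggered, handler_triggered]
    return {
        func.__name__: min(timeit.repeat(func, number=number, repeat=repeat))
        for func in cases
    }


if __name__ == "__main__":
    for name, seconds in time_cases().items():
        print("%-24s %.4fs" % (name, seconds))
```

Running the same script under two interpreter versions and comparing the three rows shows whether a slowdown sits in the exception path or in plain function calls.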

We really spent a lot of time and energy dealing with the benchmarking issue. It can be incredibly hard to get good benchmarks. I remember a file-system performance BoF I went to at Usenix in which we spent 3 hours of the 2 hour BoF talking about problems with the current benchmarks and how to make a better one. Little if any time was spent on talking about performance directly.

I spent some time converting the standard Python integer to the C “long long” type and then timing the result. I got far enough along that I could run some benchmarks, but a full implementation would require many, many changes in the Python core, and probably in many or most external C extensions. It's just a huge change. But Tim Peters thought it might have only a minimal negative performance impact “because Python is so stinking slow”. Of course, he means in relation to C code.

The thing is that Python automatically checks for overflow of the native type, and up-converts to a Python long integer, which is arbitrary precision. Of course, arbitrary precision is much, much slower than native integers. So, if you have integers that are between the “long” and “long long” size, then having Python integer objects be “long long” is a huge win. I measured between 25 and 34% improvement. However, math that is entirely restricted to “long” gets 11% slower. So, for normal math it's a relatively nasty change, and even less likely to be useful on 64-bit platforms.

Further discussion will need to be done before that goes in.
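To see which values would benefit, here is a small model of the overflow check described above. It is a sketch only: Python 3 integers are always arbitrary precision, so the code computes the exact sum and then classifies it. It also assumes a 32-bit “long” and a 64-bit “long long”, which is typical of 32-bit platforms. The `fits` and `add` helpers are hypothetical names.

```python
def fits(value, bits):
    """True if `value` fits a signed two's-complement integer of `bits` bits."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def add(a, b, native_bits):
    """Add two integers and report which representation the result needs."""
    result = a + b  # exact, because Python 3 ints never overflow
    kind = "native" if fits(result, native_bits) else "arbitrary-precision"
    return result, kind


if __name__ == "__main__":
    big = 2**31 - 1  # largest value of a 32-bit signed integer
    print(add(big, 1, 32))  # needs promotion with a 32-bit native type
    print(add(big, 1, 64))  # stays native with a 64-bit native type
    print(add(2, 3, 32))    # small values are native either way
```

Only results that land between the two ranges change category, which matches the measurement above: those cases win, while everything that already fit in “long” just pays for the wider arithmetic. Where “long” is already 64 bits, the two ranges coincide and there is nothing left to gain.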

A few quickies, because I wasn't involved much in them. Georg Brandl worked on a C implementation of the decimal module. Bob Ippolito got some gzip performance improvements included, and Andrew Dalke and Fredrik Lundh got some pretty good speed improvements out of unicode and regular strings.

So, there were some definite gains made today. Still need to look at the try/except issues that are slowing down 2.5a2, but progress is being made in other areas.

If you're interested in helping out, check the Need for Speed sprint page, and coordinate work on the irc.freenode.net #nfs channel. Yes, #nfs. It was Steve Holden's fault. :-) We've only had a few people come into the channel and not read the topic before asking Network File-system questions. I've been pleasantly surprised.

Tune in tomorrow for more exciting news from the world of the “import future”.
