Last week, May 21 through 28, 2006, EWT LLC (a security trading company) flew a bunch of Python hackers to Iceland and put us up at a hotel to have a Python sprint. CCP Games (makers of the EVE Online MMO game) provided excellent local support, making it easy for us foreigners to feel more at home. EWT and CCP both use Python, and both would benefit from improved performance, so they sponsored the sprint with the theme “Need for Speed”.
This event was an experiment for EWT and CCP, trying a novel way to work with the Python community to the benefit of their businesses. Typically, sprints have been arranged around the various Python conferences, where Python hackers were gathered anyway, with 3 or 4 days around the conference spent working specifically on Python tasks. EWT organized this sprint, and provided the majority of the funds to make it happen, between the Python conferences.
It boils down to this: EWT and CCP gain business benefits from Python and other Open Source technologies. So what happens if they provide resources to the Python community to help things along? While they were primarily looking for performance improvements, it was made very clear that the participants could work on whatever they wished.
In fact, I had to prod them to get specific information about their priorities for things they thought needed speeding up. Their desire to not dictate the direction of the sprint was admirable. However, they also have valuable feedback for the community and I worked to get them to share this information while also feeling comfortable that they weren't forcing their agenda on the community.
So, what were the results?
I'll go into details shortly, but first I need to talk about benchmarking. A lot of time was spent this week on just this topic. One of the standard benchmarks, “pybench”, was producing extremely variable results. For example, comparing one run of pybench to another on the same interpreter could easily show 10 to 20% variability. A lot of work went into trying to get better results from pybench with some improvements made and some groundwork laid for potentially improving pybench.
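To give a feel for the run-to-run variability described above, here is a minimal sketch using the standard `timeit` module rather than pybench itself. The `workload` function is a made-up stand-in for a benchmark body, and `measure` is a hypothetical helper, not anything from the sprint. Taking the minimum of several repeats is one common way to damp this kind of noise; I am not claiming it is what was done to pybench.

```python
import timeit


def workload():
    # Hypothetical stand-in for a benchmark body (not a pybench test).
    return "-".join(str(i) for i in range(100))


def measure(func, repeat=5, number=1000):
    """Run `func` in `repeat` batches of `number` calls each.

    Returns (best, worst) batch times in seconds.
    """
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings), max(timings)


best, worst = measure(workload)
# Spread between the fastest and slowest batch, relative to the fastest.
spread = (worst - best) / best * 100
print("best %.6fs, worst %.6fs, spread %.1f%%" % (best, worst, spread))
```

On a busy machine the spread printed here can easily reach double digits, which is the same sort of problem we kept running into.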
Huge improvements were made to strings (550% for Unicode, 600% for regular) and integer conversion (12 to 35% for one change, 4% to 500% for another, depending on length). There was also a 50% improvement in exception handling, a 30% improvement in gzip “readline” performance, and “struct” speedups of 20%. Speaking of huge, math changes to psyco resulted in 2,000% to 50,000% improvements. The decimal module was re-written in C, and while it isn't complete, it's a good start for the Google Summer of Code work on this.
There were many significant speed improvements made for specific areas. Overall there was a small but measurable speed improvement. Strings and function/frame work are probably the biggest improvements that the average Python user will benefit from, with some other changes being more (string to int/long conversion) or less (gzip readline, psyco math) general use speed-ups.
I believe that the sprint was quite obviously a huge success for the Python community.
It's important to note that in only one case were changes made specifically targeting pybench results. That one case was that exceptions were much slower because of the move to having exceptions be new-style classes. pybench at the beginning of the week was showing a 60% slowdown in exceptions. By the end of the week, this was changed to be a 30% improvement over 2.4.3. However, this specific part of pybench is one of the shortest running benchmarks, and therefore this 90% improvement results in fairly little overall change.
Sadly, many of the changes made aren't very well represented by tests within pybench, so many performance changes aren't evident here.
* pybench results between Python 2.5a2 (development test release, before the Sprint started) to current development: 0.16% faster.
Largely this is an artifact of how pybench computes the results: comparing 2.4.3 to 2.5a2 shows a 0.15% slowdown, and 2.5a2 to trunk shows a 0.16% speedup, yet 2.4.3 to trunk shows a 7% speedup.
While exceptions were sped up by 57%, for loops seem to have been slowed down by 45%. Because of struggles throughout the week with pybench, the for loop slowdown was never addressed during the sprint, but probably will be before final 2.5 release. Also, the for loop test takes 3x as long as the exception test, so the 45% slowdown counts 3x more than the 57% speedup. That's just the way pybench works…
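The weighting effect is easy to see with a little arithmetic. This is only a sketch under my own assumptions: the baseline times (1 unit for the exception test, 3 units for the for loop test) just encode the “3x as long” ratio, the percentages are treated as changes in run time, and the total is a plain sum of test times, which may not match exactly how pybench aggregates.

```python
def overall_change(tests):
    """Fractional change in total run time.

    `tests` is a list of (baseline_seconds, fractional_time_change) pairs,
    where a negative change means the test got faster.
    """
    before = sum(base for base, _ in tests)
    after = sum(base * (1.0 + change) for base, change in tests)
    return (after - before) / before


# Hypothetical baselines: only the 1:3 ratio comes from the text above.
tests = [
    (1.0, -0.57),  # exception test: 57% less time
    (3.0, +0.45),  # for loop test: 45% more time, on a 3x longer test
]

result = overall_change(tests)
print("total run time change: %+.1f%%" % (result * 100))
```

Under those assumptions the two tests together come out noticeably slower, even though the exception speedup is the larger percentage.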
* Pystone results: 2.4.3 = 41322.3, 2.5a2 = 39840.6, trunk = 40983.6.
Pystone is a pretty limited benchmark, but it is one of the ones we tested. Other people saw much more significant results on their systems. On my test system I was seeing an overall slowdown from 2.4.3, but a reasonable speedup during the week.
* String performance using “stringbench”: 550% improvement.
Many changes were made to the string routines, which provided huge improvements. Apparently, Python's string handling was fairly slow in comparison to other similar languages, and these changes should really help. CCP particularly appreciated these changes because they're using Unicode strings for everything and Unicode strings used to be nearly half as fast as regular strings. They're now about 30% slower, but vastly faster than in current releases.
* Many integer/long conversion changes: up to 500% faster.
One patch was applied that improved string to integer or long conversion performance by 12% to 35%. Another change improved speed by 4% (2 byte string) to 507% (900 bytes), but decreased performance by 4% for 1-byte strings.
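If you want to see how conversion time depends on string length on your own interpreter, something like the following sketch works. It is not the benchmark used at the sprint; `time_int_parse` is a hypothetical helper, and the lengths are just the ones mentioned above.

```python
import timeit


def time_int_parse(length, number=2000):
    """Best-of-3 time, in seconds, for `number` int() calls
    on a decimal string of `length` digits."""
    digits = "7" * length
    return min(timeit.repeat(lambda: int(digits), repeat=3, number=number))


for length in (1, 2, 900):
    seconds = time_int_parse(length)
    print("%4d digits: %.6fs" % (length, seconds))
```

Run it against two interpreter builds and compare the rows to get per-length speedups.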
* Exception handling sped up by 60%.
In 2.5, exceptions are now all “new style” objects. However, the original code for this was significantly slower than in 2.4.3. This was re-designed so that it is now significantly faster than 2.4.3 was.
* The struct module gained the ability to compile conversions, which resulted in a 20% improvement in performance.
* Frame optimizations improved pystone benchmark performance by 10%.
* The gzip readline() method was made 30 to 40% faster.
* 30% speed improvement in function calls when all arguments are Python objects.
* Sped up interpreter startup time by 9% by reducing the number of file open calls made.
* psyco changes producing multi-thousand percent gains.
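For the struct item in the list above: as far as I know, the “compile conversions” ability is what shows up in 2.5 as the `struct.Struct` class, which parses a format string once and reuses it for every call. A minimal sketch, with a made-up record layout:

```python
import struct

# Hypothetical record layout: little-endian unsigned int,
# unsigned short, and double. The format is parsed once here.
record = struct.Struct("<IHd")

packed = record.pack(1, 2, 3.5)   # reuse the precompiled format
values = record.unpack(packed)    # round-trip back to Python values

print(record.size, values)
```

The benefit comes from calling `pack` and `unpack` on the same object many times, for example in a loop over fixed-size records, instead of passing the format string to `struct.pack` each time.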
Goals of the Need for Speed Sprint. Some accomplished, but many we did not have time to work on.
Successes of the sprint. More details on many of the above items.
Steve Holden's Sprint wrap-up of the sprint. Includes many links to photos and other blog entries.