Tuesday May 23, 2006 at 09:16
Subject: Python Need for Speed Sprint Report, first day.
Keywords: Python, Sprint
Posted by: Sean Reifschneider
Related entries:
Python Need for Speed Sprint Report, Mid-week Progress Report by Sean Reifschneider, Thursday May 25, 2006 at 12:18
Wrap-up of the Python Need for Speed Sprint by Sean Reifschneider, Monday May 29, 2006 at 16:24
I'm currently in Iceland for the Python Need for Speed Sprint.
The goal of the sprint is to improve the performance of Python. Read on
for details of the first day.
Things started slowly on Sunday, with a few of us hacking and catching
up on rest, and then going to CCP Games for a kick-off par-tay. Lots of
chatting related to Python and the sprint, but also not. Shortly after I
arrived at 7am local time, I had started working on a list of the
performance-related patches from the tracker. I got about halfway done
before I realized I was so tired that I could no longer read, so I went
to bed.
John Benediktsson from EWT (who are currently hiring; I can put
you in touch with them if you want more information) and Hilmar Petursson
from CCP, the organizations sponsoring the sprint, spoke about what they
are using Python for, and then there was much mingling. A few folks went
to a bar afterwards; I was with the crew that went back to the hotel.
Incidentally, that crew was mostly the same people who actually showed up
at 9am Monday morning. A coincidence? I think not! :-)
On Monday we spent a long time on the initial planning, setting
priorities on the tasks and organizing the items in the list. Many of us
spent a long time on the problems with the current benchmark suite,
"pybench", whose results vary a lot from run to run. That holds even when
running many iterations, though running more makes it a bit better.
Part of it is that benchmarks are just hard to get right. Part of it
seems to be that it's not running for very long on modern processors (when
it was built, the target was to have it run 20 seconds; it's under 4 now).
Another part is the very serious discussion about whether it tests the
things we're interested in. It's an interesting baseline, but you mostly
can't trust any difference under 50%.
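One common way to get a feel for this kind of run-to-run noise is to repeat
a measurement several times and compare the best and worst results. The
sketch below uses the standard library "timeit" module, not pybench itself;
the "workload" function and the repeat counts are made up for illustration.

```python
import timeit

def workload():
    # Hypothetical stand-in for a benchmark body; not taken from pybench.
    total = 0
    for i in range(1000):
        total += i * i
    return total

def measure(func, repeat=5, number=200):
    """Return (best, worst) seconds per call across several repeated runs."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [t / number for t in timings]
    return min(per_call), max(per_call)

if __name__ == "__main__":
    best, worst = measure(workload)
    # A large spread means small differences between runs are not trustworthy.
    spread = (worst - best) / best * 100
    print("best %.2e s, worst %.2e s, spread %.0f%%" % (best, worst, spread))
```

The minimum is usually the most stable number to compare, since other
activity on the machine can only make a run slower, not faster.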
Richard Jones updated a patch for "zombie frames", in which frame
objects are not returned to the free list. Instead they are kept
associated with the code object, and require less initializing when the
code object is called next. It gives a small but measurable performance
improvement in function calls.
Steve Holden had done some testing on Saturday and Sunday running
pybench comparing 2.5a2 to 2.4.3, and found that the 2.5 alpha was around
10% slower. A closer look showed that the slowdown seems to be largely in
the try/except handling, and it's probably related to the new code that
makes exceptions new-style objects. Nobody really looked into it any
further, though.
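A suspicion like this can be checked in isolation with a tiny timing script
run under both interpreters. The following is only a sketch using "timeit";
the two functions are hypothetical and are not the pybench tests Steve ran.

```python
import timeit

def no_exception():
    # A try block whose body never raises.
    try:
        return 1
    except ValueError:
        return 0

def with_exception():
    # Raise and catch an exception on every call.
    try:
        raise ValueError("boom")
    except ValueError:
        return 0

def time_both(number=100000):
    """Time both variants and return the total seconds for each."""
    return {
        "no_exception": timeit.timeit(no_exception, number=number),
        "with_exception": timeit.timeit(with_exception, number=number),
    }

if __name__ == "__main__":
    for name, seconds in sorted(time_both().items()):
        print("%-15s %.4f s" % (name, seconds))
```

Comparing the two numbers across interpreter versions would show whether
the cost is in setting up the try block or in raising and catching.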
We really spent a lot of time and energy dealing with the benchmarking
issue. It can be incredibly hard to get good benchmarks. I remember a
file-system performance BoF I went to at Usenix in which we spent 3 hours
of the 2 hour BoF talking about problems with the current
benchmarks and how to make a better one. Little if any time was spent on
talking about performance directly.
I spent some time converting the standard Python integer to the C
"long long" type and then timing the result. I got far enough along
that I could run some benchmarks, but a full implementation would require
many, many changes in the Python core, and probably in many or most
external C extensions. It's just a huge change. But Tim Peters thought it
might have only a minimal negative performance impact "because Python is so
stinking slow". Of course, he means in relation to C code.
The thing is that Python automatically checks for overflow of the
native type, and up-converts to a Python long integer, which is
arbitrary precision. Of course, arbitrary precision is much, much slower
than native integers. So, if you have integers that are between the "long"
and "long long" size, then having Python integer objects be "long long" is
a huge win: I measured between 25 and 34% improvement. However, for math
that is entirely restricted to "long", the change reduces performance by
11%. So, for normal math it's a relatively nasty change, and even less
likely to be useful on 64-bit platforms.
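The overflow check can be modelled in a few lines of pure Python. This is
only an illustration of the idea, not the interpreter's C code: the "fits"
and "add" helpers are hypothetical, and the bit widths assume a 32-bit
"long" and a 64-bit "long long".

```python
LONG_BITS = 32       # assumption: a 32-bit C "long"
LONG_LONG_BITS = 64  # assumption: a 64-bit C "long long"

def fits(value, bits):
    """True if value fits in a signed two's-complement integer of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

def add(a, b, bits):
    """Add two integers and report whether the result stays in the native type.

    Returns (result, kind), where kind is "native" when both operands and
    the sum fit in `bits` bits, and "arbitrary precision" otherwise.
    """
    result = a + b
    if fits(a, bits) and fits(b, bits) and fits(result, bits):
        return result, "native"
    return result, "arbitrary precision"

print(add(2**31 - 1, 1, LONG_BITS))       # (2147483648, 'arbitrary precision')
print(add(2**31 - 1, 1, LONG_LONG_BITS))  # (2147483648, 'native')
```

The same sum that spills into the slow path with a 32-bit native type
stays on the fast path with a 64-bit one, which is where the win for
mid-sized integers comes from.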
More discussion is needed before that goes in.
A few quickies, because I wasn't involved much in them. Georg Brandl
worked on a C implementation of the decimal module. Bob Ippolito got some
gzip performance improvements included, and Andrew Dalke and Fredrik Lundh
got some pretty good speed improvements out of unicode and regular strings.
So, there were some definite gains made today. We still need to look at
the try/except issues that are slowing down 2.5a2, but progress is being
made in other areas.
If you're interested in helping out, check the Need for Speed sprint
page, and coordinate work on the irc.freenode.net #nfs channel. Yes, #nfs.
It was Steve Holden's fault. :-) We've only had a few people come into
the channel and not read the topic before asking Network File-system
questions. I've been pleasantly surprised.
Tune in tomorrow for more exciting news from the world of the
"import future".