Also see my related articles on networking
I set up for PyCon 2008, 2010, and
2012.
How do you make 600 Python geeks happy? Well, wireless network access
is a good start...
Last year at PyCon 2006, the hotel ran the wireless network. Despite our
repeatedly telling them that we were going to be heavily using the wireless
network ("No, really, we're going to be heavily using the wireless
network."), they really weren't prepared for our level of use. We also
had problems with their technical support, things like the DHCP server
giving out leases with a gateway in a different network from the lease, and
their support people rebooting APs to "fix" it (which, not surprisingly, it
didn't).
It was so bad last year that we decided to run our own wireless
network this year. The wired network last year worked reasonably well,
though there were some issues with DHCP there as well. So, I volunteered
to run the network for 2007.
Last Year
The wireless networking for PyCon 2006 was amazingly bad. The survey results
broke satisfaction down as 44% "very low", 38% "low", 15%
"high", and 3% "very high". I'd guess the people in the latter categories
were using the wired network.
I took charge last year of helping people with the network, and
the most common problem we had was that people couldn't associate. Some of
these people were probably running into issues on their client, but I
suspect some of them were not able to get associated because the APs
weren't able to handle any more associations.
The largest issue was that people who were associated were seeing huge
amounts of loss pinging the gateway. I don't think I ever saw less than
10% loss pinging the gateway, and usually more like 50%. This was probably
due to having only 4 APs, all sharing the same 802.11b channel, so there
was huge contention for the RF spectrum.
This Year
This year things were almost entirely reversed from last year.
In this year's survey, satisfaction with the network was 22% "very high" and
48% "high", with 20% "low" and 1% "very low". Despite having 41% more
attendees than last year (and probably an even larger increase in people
using wireless, just because more people have 802.11a/b/g devices),
well over half the people were happy with the networking.
Were they just happy because in comparison with last year it was so
much better? Maybe, but a full third of attendees weren't at last year's
conference. Maybe that's the third that wasn't happy with it. :-)
Another side effect of running our own networking was that we were
able to collect all sorts of stats. Last year the hotel only gave us MRTG
graphs, which included not only our network utilization but that of the
whole hotel, including guest rooms. They had the huevos to claim, based on
the usage on these graphs, that there was no problem with our networking.
We didn't know last year that these stats included guest rooms.
Here are some statistics:
Total attendees: 593
Total unique DHCP clients: 623
Total DHCP requests: 24,400
Max DHCP requests from a single client: 4,537
Peak number of clients connected: 340 (Saturday at 4:44pm)
95th percentile number of clients connected: 263
Peak number of 802.11a clients connected: 92 (Friday at noon)
Peak number of 802.11b clients connected: 47 (Friday at 9:10am)
Peak number of 802.11g clients connected: 198 (Saturday at 4:44pm)
Max number of clients on a single AP: 85 (rear of ballroom, Saturday at 9:55am)
So, at peak, the number of connected clients was 57% of the attendee
count. The 95th percentile figure is the number of associations we had to
handle to cover 95% of the time, which was 44% of attendees.
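The percentages above are plain ratios against the 593 attendees. The sketch below reproduces them and shows one common way (nearest-rank) to get a 95th percentile from a series of client-count samples. The article doesn't say which percentile method was used, and the sample series here is made up for illustration.

```python
import math

ATTENDEES = 593
PEAK_CLIENTS = 340
P95_CLIENTS = 263


def share_of_attendees(clients: int, attendees: int = ATTENDEES) -> int:
    """Clients as a percentage of attendees, rounded to a whole percent."""
    return round(100 * clients / attendees)


def nearest_rank_percentile(samples: list[int], pct: float) -> int:
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of the samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


print(share_of_attendees(PEAK_CLIENTS))  # 57
print(share_of_attendees(P95_CLIENTS))   # 44

# Hypothetical association counts, e.g. one sample every few minutes.
samples = list(range(10, 210, 10))  # 10, 20, ..., 200
print(nearest_rank_percentile(samples, 95))  # 190
```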
Access Point Map
The wireless network access point locations and the peak number of
clients.
802.11a?
One thing that surprised many people is that we had fairly high
numbers of 802.11a users. Part of this is that we had better coverage for
802.11a than for b/g: with more channels available, we could run the
802.11a APs at higher power, so a user who saw both 802.11a and 802.11b+g
APs would probably see the 802.11a AP as having the better signal.
Mostly, I did this so that the 802.11a users would just get out of the
way of the 802.11b users, where the spectrum is very scarce.
Note: I never had an 802.11a user that asked me for
help. Every person who asked me for help was running 802.11b or g. I'm
not sure if that's because the 802.11a users were more sophisticated, or
that the 802.11a service was that much better.
WEP
I also set up WEP with a trivial hex key. At the Python Need for
Speed sprint, we basically spent the week with a really bad network
connection. We only had around 256kbps of bandwidth there, and other
people in the hotel were using the network. None of us seemed to be the
one hammering it, so it must have been someone nearby or on another
floor. We even had people come right into the room we were having the
sprint in, sit down, and start computing away...
We used a hex key because there are two different algorithms for
converting a text key into a hex key, so the same text key could produce
different keys on different clients. We just used a couple of hex digits
repeated 5 times. This was mostly to keep random other people off the
network, which probably mattered most on the sprint days, when there were
other events going on and we didn't have the WEP key posted anywhere.
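Typing the key directly in hex sidesteps the text-to-key conversion entirely. "64-bit" WEP is a 40-bit key plus a 24-bit IV, so the key itself is 10 hex digits, which is exactly two hex digits repeated 5 times; "128-bit" WEP is a 104-bit key, or 26 hex digits. The sketch below builds such a key; the pattern "a5" is a placeholder, not the key we actually used.

```python
import string

# Key length in hex digits for each nominal WEP size (key bits + 24-bit IV).
WEP_HEX_DIGITS = {64: 10, 128: 26}


def make_wep_key(pattern: str, bits: int = 64) -> str:
    """Repeat a short hex pattern until it exactly fills a WEP key."""
    length = WEP_HEX_DIGITS[bits]
    if not pattern or any(c not in string.hexdigits for c in pattern):
        raise ValueError("pattern must be non-empty hex")
    if length % len(pattern):
        raise ValueError("pattern does not evenly fill the key")
    return (pattern * (length // len(pattern))).lower()


print(make_wep_key("a5"))        # a5a5a5a5a5
print(len(make_wep_key("AB", 128)))  # 26
```

The trade-off is deliberate: a repeated pattern is trivially guessable, which was fine here because the goal was keeping casual passers-by off the network, not security.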
Though the shaping probably would have prevented problems with the
random users abusing our network, it was nice to make sure our scarce
bandwidth wasn't spread any more thinly.
The Problems...
It wasn't completely rosy though. We did have some problems.
The first problem we deliberately created. For the tutorials we put
out too few APs, to see how they would work under the weight of many
connections. The answer: not super well. At lunch I doubled
the number of APs, and that solved those issues.
The bigger problem became apparent on the tutorial day though... We
had based our network usage predictions on two false
assumptions: that the number of attendees was going to be roughly the same
as the previous year, and that the bandwidth usage would be similar.
Tuesday morning I realized that basing our bandwidth speculation on
the previous year's usage was just wrong. The previous year, something
around 82% of the attendees had severe problems connecting, and therefore
they weren't sucking up the bandwidth. Last year we had 3 Mbps of bandwidth
(again, shared with the hotel); this year, 4.5 Mbps. Next year I'm
recommending we get 10 or preferably 20 Mbps.
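To put those uplink numbers in per-user terms, here is a rough division by the 95th-percentile client count (263). It assumes every connected client is active at once and shares evenly, which is a worst-case simplification rather than a measurement.

```python
P95_CLIENTS = 263  # 95th-percentile concurrent clients this year

# Uplink sizes from the text, in kbps.
UPLINKS_KBPS = {
    "this year": 4_500,
    "recommended (low)": 10_000,
    "recommended (high)": 20_000,
}


def per_client_kbps(uplink_kbps: float, clients: int = P95_CLIENTS) -> float:
    """Even share of the uplink if every client is active at once."""
    return uplink_kbps / clients


for name, kbps in UPLINKS_KBPS.items():
    print(f"{name}: {per_client_kbps(kbps):.1f} kbps per client")
# this year: 17.1 kbps per client
# recommended (low): 38.0 kbps per client
# recommended (high): 76.0 kbps per client
```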
Another problem we had was that someone was wandering around with
their laptop set up on our ESSID, with our WEP key, running in Ad-Hoc mode,
which meant that users close to that person would associate with that
laptop instead of our network. This was reported to me on Friday, but it wasn't
until late Saturday afternoon that he found me and asked for help getting
connected.
The problem here was a faulty network configuration program.
The system was running Windows, with this busticated NetGear program
for configuring the wireless. You'd tell it you wanted to connect
to an existing network, and it would set itself up in Ad-Hoc mode.
It wouldn't tell you that it was doing this; you had to dig
through the advanced information to find out. You also couldn't change this
in the normal settings; you had to use the "expert" configuration to tell
it not to do that.
The number one problem we had with users connecting to the network?
It's called a "hardware radio switch". That's right, our most serious
problem was with users whose laptops had the wireless radio disabled by
that switch or by a firmware setting. This is like "airplane mode" on cell phones. From the
software, it looks like the WiFi card is working, but it can never
associate.
A Mac user, Stephan Deibel, reported that setting the "Use
interference robustness" option helped. It was impossible, even searching
on the Internet, to find details on what exactly this option configures.
I speculated that it might reduce the "RTS" setting, but it was impossible
to tell for sure.
Another problem was APs getting unplugged. The worst problem was
with them getting unplugged from the Ethernet, because then users would
still try to associate with them, but it wouldn't work. That only
happened once. The APs had the ability to watch the Ethernet link
and disable the wireless if they got disconnected. I didn't set that
up because I didn't have a chance to test it before the show. I had
initially expected to run a number of the APs not connected to the
Ethernet, so I couldn't have used that anyway.
3 or 4 times an AP was unplugged from power, probably people banging
into them or the like. In most cases we didn't have safe places to mount
the APs; I just set up a chair for them to sit on.
The last problem was with our shaping. I had set up fancy shaping
using the HTB shaping rules. This should have allowed users to burst up to
the full line speed if capacity was available, but push heavy users down to
128kbps as others used bandwidth. In my testing at home, it worked exactly
as I had hoped. However, at the conference it eventually became clear that
it was just restricting all users to 128kbps.
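As a rough model of what the HTB setup was meant to do (not the actual tc rules, which aren't shown here): in HTB terms, each user's class has a guaranteed `rate` and may borrow spare parent bandwidth up to its `ceil`. The sketch uses the 4,500 kbps uplink and 128 kbps per-user rate from the text. Treating the ceiling as the full link, and scaling guarantees down proportionally when they oversubscribe the link, are assumptions of this model.

```python
def htb_allocate(capacity_kbps: float, rate_kbps: float, ceil_kbps: float,
                 demands_kbps: list[float]) -> list[float]:
    """Model of HTB-style sharing among sibling classes under one parent.

    Each user first gets min(demand, rate); leftover capacity is then lent
    out evenly to users who still want more, never beyond ceil.
    """
    wants = [min(d, ceil_kbps) for d in demands_kbps]
    alloc = [min(w, rate_kbps) for w in wants]
    guaranteed = sum(alloc)
    if guaranteed > capacity_kbps:
        # Assumption: oversubscribed guarantees are scaled down proportionally.
        scale = capacity_kbps / guaranteed
        return [a * scale for a in alloc]
    spare = capacity_kbps - guaranteed
    while spare > 1e-9:
        hungry = [i for i, w in enumerate(wants) if w - alloc[i] > 1e-9]
        if not hungry:
            break
        share = spare / len(hungry)
        for i in hungry:
            extra = min(share, wants[i] - alloc[i])
            alloc[i] += extra
            spare -= extra
    return alloc


# One user alone can burst to the full line.
print(htb_allocate(4500, 128, 4500, [4500]))        # [4500.0]
# Two heavy users split it.
print(htb_allocate(4500, 128, 4500, [4500, 4500]))  # [2250.0, 2250.0]
# A light user keeps what it needs; the heavy user borrows the rest.
print(htb_allocate(4500, 128, 4500, [4500, 100]))   # [4400.0, 100]
```

The failure we saw at the conference corresponds to the borrowing step never happening, so every user stayed at their 128 kbps rate.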
I've been over it several times, and as far as I can tell, this was a
bug in the Linux traffic shaping.
Considering the overall scarcity of the bandwidth, it was probably for
the best that users were limited, providing fair sharing even when users
were hitting the network. For example, during Guido's keynote I saw one
person streaming a Google video of another talk Guido had given, while
ignoring the stream.
I was mostly concerned about users with a virus, worm, or doing file
sharing swamping the bandwidth. This is based on my experiences at coffee
shops with users doing this and just killing the network because they were
swamping the fairly limited outbound bandwidth. It's easy to bring a
coffee shop network to its knees by just sending 50 to 90KB/sec outbound.
We never had an instance during the conference where the network was
bad because someone or a few people were hammering it. So, in general I'd
call it a success.
DHCP
We had our own router, running NAT to the hotel network. On this
server we ran our own DHCP. Our private network was a /22 network of 1024
addresses. I set up DHCP to give out 760-ish of these in a pool of dynamic
addresses, with an 8-hour lease time. I set aside another 250-ish of these
addresses to be outside the pool.
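The address split can be sketched with Python's standard `ipaddress` module. The article doesn't say which private /22 was used, so 10.20.0.0/22 below is hypothetical, as is putting the gateway on the first host address. A /22 has 1024 addresses, 1022 of them usable hosts, which leaves room for a 250-address static block and a 760-address dynamic pool.

```python
import ipaddress

# Hypothetical network; the real private range isn't given in the text.
NETWORK = ipaddress.ip_network("10.20.0.0/22")


def split_network(net, static_count=250, pool_count=760):
    """Split a network into a gateway, static (slip) addresses, and a DHCP pool.

    Assumes the gateway takes the first usable host address.
    """
    hosts = list(net.hosts())  # excludes the network and broadcast addresses
    gateway = hosts[0]
    static = hosts[1:1 + static_count]
    pool = hosts[1 + static_count:1 + static_count + pool_count]
    if len(pool) < pool_count:
        raise ValueError("network too small for requested split")
    return gateway, static, pool


gateway, static, pool = split_network(NETWORK)
print(NETWORK.num_addresses, len(static), len(pool))  # 1024 250 760
print(pool[0], "-", pool[-1])  # 10.20.0.252 - 10.20.3.243
```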
I used the "glabel" program to print out a bunch of slips of paper,
one for each of these 250 IPs, including DNS, netmask, and gateway
information. People who reported problems getting an address were given
one of these slips of paper, effectively giving them a lease on an address.
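The data for those slips is just one row per static address. A minimal sketch that writes it as CSV, the kind of file a mail-merge style label template can consume, is below. The network, gateway, and DNS addresses are hypothetical; DNS is set to the gateway on the assumption that the NAT router (which also ran dnscache) is the gateway.

```python
import csv
import io
import ipaddress

# Hypothetical values; the real addresses aren't given in the text.
NETWORK = ipaddress.ip_network("10.20.0.0/22")
GATEWAY = "10.20.0.1"
DNS = "10.20.0.1"  # assumed: dnscache on the NAT router, which is the gateway

FIELDS = ["ip", "netmask", "gateway", "dns"]


def slip_rows(net, count=250):
    """One row per static address, skipping the first host (the gateway)."""
    for ip in list(net.hosts())[1:1 + count]:
        yield {"ip": str(ip), "netmask": str(net.netmask),
               "gateway": GATEWAY, "dns": DNS}


def slips_csv(net, count=250):
    """Render all slip rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(slip_rows(net, count))
    return buf.getvalue()


text = slips_csv(NETWORK)
print(text.splitlines()[1])  # 10.20.0.2,255.255.252.0,10.20.0.1,10.20.0.1
```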
One person suggested that 8 hours was way too long a lease and that
we'd probably run into problems because of this long lease time. I
explained that we had way more IPs available in the DHCP pool than we had
attendees, so the long lease time shouldn't be a problem. As far as I
know, it never was. Based on a review of the logs, we never allocated all
the IPs available in the pool.
I'm not surprised that we had more DHCP leases than we had
attendees. I figured there would be some attendees with more than one
wireless device, since cell phones and PDAs have WiFi now.
DNS
I set up dnscache on the NAT router and published this machine as the
DNS server. I've had incredibly good luck with running dnscache in the
past, and in particular have found it to work well with little memory
usage. This was running on a small machine with only 256MB of RAM, and was
easy to set up, so I threw it into the mix.
Transparent HTTP Proxy
When it became apparent that we didn't have nearly enough network
bandwidth, I tried setting up a Squid transparent proxy. The iptables
REDIRECT target was just never matching, despite my double-checking the
rules 4 or 5 times. I finally gave up on it. I believe this may have been
related to the bug causing the shaping not to work properly, because I've
successfully set up transparent proxy before, and several other references
I checked showed that I was doing it right.
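For reference, the usual form of a transparent-proxy redirect is a `nat` table `PREROUTING` rule using the `REDIRECT` target. This is the standard shape of such a rule, not the exact rule from the conference. The sketch only builds the argument list and doesn't run iptables; the LAN interface name `eth1` is hypothetical, and 3128 is Squid's default port. Squid itself also has to be configured to accept intercepted requests, and the directive for that differs between Squid versions.

```python
def redirect_rule(lan_iface: str = "eth1", proxy_port: int = 3128) -> list[str]:
    """iptables arguments that send LAN port-80 traffic to a local proxy.

    The interface name is a hypothetical placeholder.
    """
    return [
        "iptables", "-t", "nat", "-A", "PREROUTING",
        "-i", lan_iface, "-p", "tcp", "--dport", "80",
        "-j", "REDIRECT", "--to-ports", str(proxy_port),
    ]


print(" ".join(redirect_rule()))
```

To actually apply it, the list could be passed to `subprocess.run(rule, check=True)` as root.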
I would consider setting up a proxy if I did this again in the future,
but I'd probably use Apache to do it. I've found Squid to be very
complicated to configure; Apache is much easier to deal with.
The AP Setup
Last year the hotel provided around 4 APs. This year we had 24 APs
with 12 more in reserve. These were actually 12 "dual channel" APs, each
with both an 802.11a and an 802.11b+g AP built in. The 12 in reserve were
there because I ordered the wrong model initially, and since I had 30 days to
return them I decided to just ship them to the conference in case we ended
up needing them.
I didn't know initially if we'd get access to the hotel wiring
infrastructure, so one of the options was that we could set up the
dual-channel APs to run meshing on 802.11a as our backbone to distribute
the other APs around for 802.11b+g as primary access.
We did end up getting access to the wiring infrastructure of the
hotel, so we ran all the APs in dual-AP mode, acting as both an 802.11a and
802.11b+g AP.
I set up 802.11b+g to run at the second-lowest power setting, using
the 3 available non-overlapping channels. I also mounted the APs
around 2.5 feet above the ground, so that people's bodies would absorb the
signal and help reduce interference between adjacent APs on the same
channel. I tried to organize the APs such that APs on the same channel
were not close together.
802.11a has something like 9 non-overlapping channels, so I set up APs on
different channels as much as possible, and ran 802.11a at its highest
power setting.
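Keeping same-channel APs apart is a graph-coloring problem. One simple greedy heuristic, sketched below, visits the APs in order and gives each one the channel whose closest already-assigned AP is farthest away. This is an illustration of the idea, not the method used to lay out the conference; the AP names and coordinates are made up. The b/g channel list is 1, 6, and 11, the three non-overlapping 2.4 GHz channels in the US.

```python
import math

BG_CHANNELS = [1, 6, 11]  # non-overlapping 2.4 GHz channels in the US


def assign_channels(positions, channels):
    """Greedily pick, for each AP in order, the channel whose closest
    already-assigned AP is farthest away (ties go to the earlier channel)."""
    assigned = {}
    for name, here in positions.items():
        best_channel, best_gap = None, -1.0
        for ch in channels:
            gaps = [math.dist(here, positions[other])
                    for other, used in assigned.items() if used == ch]
            gap = min(gaps) if gaps else math.inf
            if gap > best_gap:
                best_channel, best_gap = ch, gap
        assigned[name] = best_channel
    return assigned


# Hypothetical APs along one wall of a ballroom, 10 m apart.
aps = {"ap0": (0, 0), "ap1": (10, 0), "ap2": (20, 0), "ap3": (30, 0)}
print(assign_channels(aps, BG_CHANNELS))
# {'ap0': 1, 'ap1': 6, 'ap2': 11, 'ap3': 1}
```

With only three b/g channels, reuse is unavoidable in a room this size, which is also why low power and body-height mounting helped.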
After the main conference, I set up two of the APs to use the 802.11a
radios in WDS mode, and put the WDS client AP out by the lobby (where we
didn't have a wired port), to provide repeater service for users in the
lobby, bar, and restaurant. "Mission-critical bar coverage has been set
up" I joked.
So, in the end we had a dozen 802.11a APs, and another dozen 802.11b+g
APs, to cover the conference.
RTS?
I had asked Jamie Gansead of ThinAirNet, who had just sold off his
802.11b-based terrestrial wireless ISP, to review my plan for the wireless
network. His biggest suggestion was to set the "RTS" threshold low.
However, this is a client-side setting, not something I can push from
the server side.
802.11a/b/g wireless works by listening on the channel to see if
anyone else is sending: if the packet is smaller than the RTS threshold
and the radio doesn't hear anyone else sending, it will send the packet.
However, if two clients can both see the AP but can't see each other (the
"hidden node" problem), this mechanism doesn't work, and their
transmissions can collide at the AP.
If the packet is larger than the RTS threshold, the client will first ask
the AP to reserve a time for it to send the packet, the AP will announce to
everyone within hearing that the radio is reserved, and then the client will send. This
extra overhead hurts in small networks where all the clients can see
each other most of the time, but really helps when you have users who can't
see other stations.
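The decision each station makes can be sketched in a few lines. This is a simplified model (no backoff timers, fragmentation, or retries). The point it illustrates is that the AP's CTS reply is heard by every station in range of the AP, including hidden nodes that never heard the client's own transmission. The value 2347 used below as a "maximum" threshold is a common driver default and is an assumption here.

```python
def plan_transmission(frame_bytes: int, rts_threshold: int,
                      medium_idle: bool) -> list[str]:
    """Simplified steps a station takes to send one unicast frame."""
    if not medium_idle:
        # Carrier sense: someone is audible on the channel, so back off.
        return ["defer"]
    if frame_bytes > rts_threshold:
        # Reserve the medium first; the AP's CTS also silences hidden nodes.
        return ["send RTS", "receive CTS", "send DATA", "receive ACK"]
    return ["send DATA", "receive ACK"]


# Threshold at (assumed) maximum: a 1500-byte frame goes out unreserved.
print(plan_transmission(1500, 2347, medium_idle=True))
# Threshold at minimum: every frame is preceded by RTS/CTS.
print(plan_transmission(1500, 0, medium_idle=True))
```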
The default RTS setting is the maximum value, which means RTS/CTS is
effectively never used. I noticed some high latency on my laptop, then set
RTS down to the minimum, and the latency dropped way off.
Again, this wasn't something I could centrally dictate though. I set
up a wireless networking page in the wiki that included this hint,
but I don't know how many users actually used this setting.
We used the D-Link DWL-7200 AP, which is a less than $200 "enterprise"
access point. It had some problems: for example, enabling the "Load
Balancing" feature seemed to cause it to break, even if the number of
associations was below the specified limit. Also, every page of
configuration changes required rebooting the AP, meaning every page of
changes took 30 seconds to save.
It included the ability to save off and load a config file, so I just
made a base config that I saved, then uploaded that config to a new AP and
changed a couple of values, including IP address and channel, which made it
go much faster. The config file was in text format, so in theory I could
have changed it in a text editor, but there was a checksum value at the end
that I figured would break if I did that. Dang.
I had originally brought in a couple of Proxim "enterprise AP"s, but
the sheer price of these wouldn't have allowed me to get my target number
of APs (15) within my budget. The Proxim APs seem to have a few more
features, including a nice "mesh" auto-configuration that would have come
in handy if we had ended up meshing the backhaul. However, they also took a
good 20 to 30 seconds per page of changes. The web interface was amazingly
slow.
The Proxims cost $450 each new (compared to $180-ish for the DLink).
I had originally looked at eBay and found they could be had for under $200,
but there just weren't enough of these auctions available in the weeks
leading up to the conference for me to have gotten them at this price. At
full price, the Proxims alone would have been over twice my budget for the
number of APs I was hoping to get.
Costs
I spent around $800 getting hardware in to evaluate. This included 4
Proxim APs (of which one was just broken, and one was somehow a dual
802.11a AP without the b+g radio) and would have covered getting 2 of the
DLinks. In the end I just ordered a dozen of the DLinks because time was
getting short; a 1.5-week business trip a week before PyCon blew my
schedule.
Total cost spent on APs was around $2200 for the APs we used for the
conference. Time required to architect the network, evaluate and select
the hardware, and set up, test, and deploy the network was around 70 hours.
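A quick check of the AP arithmetic, using the rounded per-unit prices from the text ($180-ish for a DLink, $450 for a new Proxim):

```python
DLINK_USD = 180   # approximate per-unit price from the text
PROXIM_USD = 450  # new price from the text


def fleet_cost(unit_price: int, count: int) -> int:
    """Total hardware cost for a number of identical APs."""
    return unit_price * count


print(fleet_cost(DLINK_USD, 12))   # 2160, in line with "around $2200"
print(fleet_cost(PROXIM_USD, 12))  # 5400
print(fleet_cost(PROXIM_USD, 15))  # 6750 for the original target of 15 APs
```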
Next Year
The hotel in Chicago where we are holding PyCon next year has already
said, flat out, that they will not allow us to hook our network gear to
their network. We are trying to make sure they understand the magnitude of
the job they are taking on. They are also offering an SLA.
As a contingency, we are keeping the wireless gear from this year, so
if the poo hits the fan next year we will have the hardware on hand to fix
it, even if the hotel can't. We are also getting competitive quotes from
other providers for what it'll cost to bring in our own network line for
the event. Hotels charge tens of thousands of dollars for providing a
network, and a full DS-3 (45 Mbps) can cost $2500/month on a one-year term,
so we may be in the ballpark, especially if we are going to use the
connection two years in a row (paying off the first year's build-out over
two events).
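The DS-3 reasoning is simple amortization. The sketch uses the $2,500/month figure from the text; the one-time build-out fee of $10,000 is purely hypothetical, since the real number wasn't known, and treating two events as a 24-month term is also an assumption. It shows why reuse helps: the monthly cost per event stays the same, but the build-out is split in half.

```python
def cost_per_event(monthly_usd: float, term_months: int,
                   buildout_usd: float, events: int) -> float:
    """Total circuit cost over the term, spread evenly across events."""
    return (buildout_usd + monthly_usd * term_months) / events


DS3_MONTHLY = 2_500  # from the text: full DS-3 on a one-year term
BUILDOUT = 10_000    # hypothetical one-time build-out fee

print(cost_per_event(DS3_MONTHLY, 12, BUILDOUT, events=1))  # 40000.0
print(cost_per_event(DS3_MONTHLY, 24, BUILDOUT, events=2))  # 35000.0
```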
I suspect the hotel isn't up to dealing with the networking
requirements of PyCon because they have told us they are using T1 lines.
10 years ago, a T1 line was a mighty Internet connection. Today, it's
pretty sad. I have over 6x that bandwidth into my house. My
recommendation of 10 to 20 Mbps for next year would require 7 to 14 T1s,
which is just silly.
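The T1 count comes from dividing the target bandwidth by what one T1 carries and rounding up. A T1's line rate is 1.544 Mbps (1.536 Mbps of payload); rounding that to 1.5 Mbps reproduces the 7-to-14 range.

```python
import math

T1_MBPS = 1.5  # T1 payload (1.536 Mbps), rounded


def t1s_needed(target_mbps: float, per_line_mbps: float = T1_MBPS) -> int:
    """Number of T1 lines needed to reach a target bandwidth."""
    return math.ceil(target_mbps / per_line_mbps)


print(t1s_needed(10), t1s_needed(20))  # 7 14
```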
I also hope that the hotel networking folks at least read this to get
an idea of what they're committing to. I'm also going to ask them to run
their network design past me, just so we can make sure their plans are mighty
enough to handle the reality of PyCon.
In Conclusion
I went into this saying "It'd be hard to make it as bad as last year."
I was hoping for a perfect network experience for all, but admit I didn't
quite meet that. The biggest issues were the upstream network and the
shaping, which didn't work as I'd hoped.
However, the "better than last year" target was undeniably achieved.
If I were to do it again, I'd add even more APs, particularly in the
center of the ballroom (I had them around the outside). My initial
estimate of 15 APs was probably spot on, but I reduced it to 12 to try to
stay in budget, despite AMK saying that I could go over budget. Last-minute
FedExing cost nearly 20% of my budget and was responsible for nearly my
entire overrun.
The big applause when AMK thanked me for the networking made all the
time and effort worth it. But, to be honest, the reduction in problems
over last year, which meant that I could spend more time enjoying the
conference, was the big payoff. I'll admit I was kind of floored by AMK
saying in the closing address "If Sean can do this for the conference,
imagine what tummy.com can do for your business." That was very
kind of him. People stopping me in the hall and saying "Yay" was also
quite uplifting.
Most of the time I spent on the network went to helping people whose
problems were with their own machines rather than with the network.
So, overall I'd call it a total success, though with a few things I'd
do differently.
Shameless Plug
tummy.com has smart people who can bring a diverse set of knowledge to
augment your Linux system administration and managed hosting needs.