A contemporaneous post from Jim Roskind, who wrote the Python profile module among other things, talks about a bad input to quicksort.
We were typically sorting an array of length 1000. If I remember the numbers correctly, the profiler revealed that the hot comparison function was being called over 400,000 times per sort. This was a whole lot closer to what I’d expect from an n-squared sort algorithm than from an n * log(n) quick sort. Remembering that algorithms 101 taught that quick sort *can* be an n-squared sort if the pivots are “chosen poorly,” and this could commonly happen (with a simple quick sort implementation) when the list was already mostly sorted… I took a wild guess that this was the problem. To take a shot at working around this quick sort implementation, I needed to randomize the list prior to the sort. With Python, this was a snap (to try). I just put the list into a hash table, and then pulled it out (in its pseudo random order) before doing the sort. Sure enough, a factor of 20 performance improvement resulted (for our application), and a simple bug report went off to Guido. I always liked this story because it was a case where adding wasteful code was VERY helpful.Note to Grail users: You can visual quicksort (probably not for billion-element inputs) using my visualization applet.
Note: Jim's post about quicksort stuck in my mind, somehow, and I read Josh's post as if it were about quicksort, which doesn't make any sense. One benefit of that gaffe is that Josh referred me to Engineering a Sort Function by Bentley and McIlroy.
No comments:
Post a Comment