Wednesday, June 18, 2008

Python 3000 urllib package

I made an svn commit on the py3k branch today--my first commit in a long while.
Make a new urllib package.

It consists of code from urllib, urllib2, urlparse, and robotparser.
The old modules have all been removed. The new package has five
submodules: urllib.parse, urllib.request, urllib.response,
urllib.error, and urllib.robotparser. The urllib.request.urlopen()
function uses the url opener from urllib2.
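
As a rough sketch of how the five submodules fit together, here is a small example that touches four of them directly without going over the network. The example.com URLs, the robots.txt rules, and the variable names are made up for illustration; only the module and function names come from the package itself.

```python
import urllib.error
import urllib.parse
import urllib.request
import urllib.response
import urllib.robotparser

# urllib.parse: URL splitting and quoting (formerly urlparse, plus helpers from urllib)
parts = urllib.parse.urlparse("http://www.example.com/docs/index.html?lang=en")
query = urllib.parse.urlencode({"q": "py3k urllib"})

# urllib.request: Request objects and openers (formerly urllib2)
request = urllib.request.Request("http://www.example.com/search?" + query)
opener = urllib.request.build_opener()  # opener.open(request) would do the fetch

# urllib.error: the exception classes raised by urllib.request
http_error_is_url_error = issubclass(urllib.error.HTTPError, urllib.error.URLError)

# urllib.robotparser: formerly the top-level robotparser module
robots = urllib.robotparser.RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])  # hypothetical rules
private_ok = robots.can_fetch("*", "http://www.example.com/private/page.html")
public_ok = robots.can_fetch("*", "http://www.example.com/docs/index.html")
```

Nothing here opens a connection; passing the request to opener.open() or to urllib.request.urlopen() is what actually performs the fetch.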

Note that the unit tests have not been renamed for the
beta, but they will be renamed in the future.

Joint work with Senthil Kumaran.

I started urllib2 at CNRI sometime around 1999, based on some experience working on Grail (a Python+Tk web browser). I had intended to polish it much more and replace urllib, but I ran out of time. When the PythonLabs team moved from CNRI to BeOpen, we needed to put that code in a Python release or negotiate with CNRI to take the code anyway. So it went into Python 1.6 in its rough form, and I moved on to other projects.

A mere ten years later, we've made some progress. urllib and urllib2 have been merged into a single urllib package. The urllib2 code has become the default implementation in the new package. If you call urllib.request.urlopen(), you get urllib2 code.

Senthil is working on a urllib Summer of Code project to improve the code and documentation further. I'm looking forward to it.