Tuesday, February 27, 2007
Flight delayed, bugs fixed
My original flight from Dallas back to Newark was cancelled because of mechanical problems with the plane. I got bumped from a 10:30 flight to a 2:45 flight, which left me a few hours for hacking and socializing at the PyCon sprints this morning. I had breakfast with Jeff Elkner and caught up on his software and teaching projects, and I got to finish some bug fixes in typeobject.c that I had worked on yesterday. I find it very satisfying to fix an Armin Rigo crash bug. It's usually an accomplishment just to understand it.
nonlocal implemented
I'm going to love Python 3000! Thomas Wouters and I implemented PEP 3104 tonight. It fixes a wart in the original nested scopes implementation that I did in 2001. In that version of nested scopes, a nested function could not rebind a name defined in an enclosing function. The compiler had no way to distinguish between an assignment that creates a local variable and an assignment that rebinds a variable in an enclosing scope. Python 3000 will fix this with the nonlocal statement. I hate the name nonlocal, but no one has thought of a better one. Ka-Ping Yee wrote PEP 3104 and provided an exhaustive list of alternative names.
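A minimal sketch of the difference, written for Python 3, where the nonlocal statement from PEP 3104 exists. The function names are made up for illustration. Without the declaration, the augmented assignment makes count a local of increment, so the first call fails; with it, the assignment rebinds the variable owned by the enclosing function.

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count  # rebind make_counter's count instead of creating a local
        count += 1
        return count

    return increment


def make_broken_counter():
    count = 0

    def increment():
        # The assignment makes count local to increment, so reading it
        # before it is bound raises UnboundLocalError.
        count += 1
        return count

    return increment


counter = make_counter()
print(counter(), counter())  # 1 2
```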
The code itself was quite simple. The changes in compile.c were trivial. The symbol table needed more work, because it had to recognize a new kind of declaration and propagate that information to the compiler. The symbol table keeps a bit-field for each name, and a handful of those bits represent the name's scope. I spent a lot of time scratching my head until I remembered that I needed to widen the mask used to extract the scope-related bits. I'm not happy with the bit-field representation.
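A toy model of this kind of packing, in Python. All flag names, scope codes, the offset, and both mask widths below are hypothetical choices for illustration, not the actual values in CPython's symbol table. The point is how a mask that is one bit too narrow silently turns one scope into another instead of failing loudly.

```python
# Hypothetical declaration flags in the low bits.
DEF_GLOBAL = 1 << 0
DEF_LOCAL = 1 << 1
DEF_PARAM = 1 << 2
DEF_NONLOCAL = 1 << 3  # the new kind of declaration

# Hypothetical scope codes, stored above the flag bits.
LOCAL, GLOBAL_EXPLICIT, GLOBAL_IMPLICIT, FREE, CELL = 1, 2, 3, 4, 5
SCOPE_OFFSET = 11
NARROW_MASK = 0b11   # 2 bits: only codes 0..3 survive
WIDE_MASK = 0b111    # 3 bits: codes 0..7 survive


def pack(flags: int, scope: int) -> int:
    """Store the scope code above the declaration flags."""
    return flags | (scope << SCOPE_OFFSET)


def scope_of(packed: int, mask: int) -> int:
    """Shift the scope bits down and mask off everything else."""
    return (packed >> SCOPE_OFFSET) & mask


def flags_of(packed: int) -> int:
    """Recover the declaration flags below the scope bits."""
    return packed & ((1 << SCOPE_OFFSET) - 1)


entry = pack(DEF_NONLOCAL, CELL)
print(scope_of(entry, NARROW_MASK))  # 1: CELL (5) misread as LOCAL
print(scope_of(entry, WIDE_MASK))    # 5: CELL
```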
Pete Shinners and Neal Norwitz reviewed the code and helped think of test cases.
That wraps up a fun day of sprinting for me. I also closed several bugs and spent a few hours poring over typeobject.c to fix some crasher bugs that Armin Rigo reported. I have a fix for one of them, but I want to do a little refactoring before I check it in. Next week, perhaps. I'm flying home first thing in the morning.
Monday, February 26, 2007
Bug of the day
Mike Verdone found a great Python bug today. It's an interaction between two features.
- Name mangling. If you use an identifier with two leading underscores in a class, the name is mangled to include the name of the class as a prefix, unless the name also ends with two underscores. In the class Spam, the identifier __eggs becomes _Spam__eggs. Note that the language reference says that mangling applies to any identifier.
- Packages. You can import a name like A.B.C. This name describes a module C in package B inside package A. Note that A, B, and C are identifiers, so if one of them has two leading underscores, it should be mangled. import __A.B.C should technically be turned into import _Spam__A.B.C if it occurs inside class Spam.
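The first rule is easy to see in a live interpreter. This is ordinary Python 3 with the class name Spam borrowed from the text; the attribute values and the get_value method are illustrative.

```python
class Spam:
    __eggs = "mangled"        # stored as _Spam__eggs
    __ham__ = "not mangled"   # trailing double underscores: left alone

    def get_value(self):
        return self.__eggs    # compiled as self._Spam__eggs


names = sorted(n for n in vars(Spam) if "eggs" in n or "ham" in n)
print(names)  # ['_Spam__eggs', '__ham__']
```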
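The compiler applies the mangling rule to the whole dotted name. Inside class Spam, import __A__.B.C becomes import _Spam__A__.B.C, because the check for trailing underscores looks at the end of the entire string (C) rather than at the end of __A__.
How old is this bug? It is present in Python 1.5.2 and was probably present in the original release of Python 1.5. The bug is about nine years old.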
We're going to do a quick fix: change name mangling so that it does not mangle names with dots in them. This will fix the __A__.B.C case, break the __A.B.C case, and leave the A.__B.C case broken. We will likely change the language spec to say that identifiers in import statements are not mangled.
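To make the three cases concrete, here is a hypothetical mangle() function in Python. It is a simplified model of the rule, not CPython's compiler code: it treats the dotted import name as one string, and the skip_dotted flag stands in for the quick fix.

```python
def mangle(class_name: str, name: str, skip_dotted: bool = False) -> str:
    """Hypothetical model of private-name mangling inside class class_name."""
    if not name.startswith("__"):
        return name
    # The trailing-underscore test looks at the end of the whole string,
    # so for "__A__.B.C" it sees "C", not the end of "__A__".
    if name.endswith("__"):
        return name
    if skip_dotted and "." in name:  # the quick fix: leave dotted names alone
        return name
    prefix = class_name.lstrip("_")
    if not prefix:  # a class named only with underscores mangles nothing
        return name
    return "_" + prefix + name


for name in ["__A__.B.C", "__A.B.C", "A.__B.C"]:
    before = mangle("Spam", name)
    after = mangle("Spam", name, skip_dotted=True)
    print(f"{name}: before fix {before}, after fix {after}")
```

Under this model, __A__.B.C goes from the wrong _Spam__A__.B.C to the unmangled name, __A.B.C loses the spec-correct _Spam__A.B.C, and A.__B.C is never mangled because the check only looks at the start of the whole string.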
locals() and free variables
I've been struggling with some odd corner cases of free variables and eval/exec in Python.
There is a long-standing bug report that you can't access free variables when you create a nested code block with eval or exec, for example, by putting a lambda in an expression passed to eval. It works if the variable happens to be free in the text of the function containing the eval, but only then. The bug report claimed that Scheme worked differently, but I don't think Scheme provides a way to capture an arbitrary environment to pass to eval; it only provides access to top-level environments.
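A small Python 3 sketch of the "only if it is free in the text" behavior, using eval("x") directly rather than a lambda. The function names are made up. The only difference between the two functions is whether inner mentions x outside the string.

```python
def outer_without_reference():
    x = 10

    def inner():
        # x appears only inside the string, so the compiler does not make it
        # a free variable of inner, and eval() cannot find it.
        return eval("x")

    return inner()


def outer_with_reference():
    x = 10

    def inner():
        _ = x  # mentioning x in inner's own text makes it a free variable
        return eval("x")

    return inner()


print(outer_with_reference())  # 10
```

Calling outer_without_reference() raises NameError (assuming no global named x), even though x is bound in the enclosing function.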
The name locals() is misleading. It returns free variables as well as local variables. The name suggests that this behavior is wrong, but it is necessary to make exec and eval work with free variables. The function returns the names visible in the current scope.
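A quick check in Python 3 that locals() in a function includes free variables alongside true locals; the names are illustrative. The call happens before names is bound, so only y (free) and z (local) appear.

```python
def outer():
    y = 2  # local to outer, free in inner

    def inner():
        z = 3  # local to inner
        names = sorted(locals())  # snapshot taken before names is bound
        return y + z, names

    return inner()


print(outer())  # (5, ['y', 'z'])
```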
The discussion here is actually about much more than locals(). The same basic issues arise when you use exec or import * in a block or when you use the debugger or some other trace function installed via sys.settrace(). In all these cases, we extract variables into a dictionary to support introspection.
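As a sketch of the trace-function case, this Python 3 snippet installs a tracer with sys.settrace() and copies frame.f_locals when a nested function returns. The tracer and function names are hypothetical; the point is that the dictionary a tracer sees includes the free variable y.

```python
import sys

captured = {}


def tracer(frame, event, arg):
    # Copy the frame's variables when inner() is about to return.
    if event == "return" and frame.f_code.co_name == "inner":
        captured.update(frame.f_locals)
    return tracer  # keep receiving line and return events for this frame


def outer():
    y = 2

    def inner():
        z = 3
        return y + z

    return inner()


sys.settrace(tracer)
try:
    outer()
finally:
    sys.settrace(None)  # always remove the tracer

print(sorted(captured.items()))  # [('y', 2), ('z', 3)]
```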
There is still a problem with locals(), which results from the use of locals() in debugging. The implementation makes sure that changes made to the locals dict are reflected in the running program in many cases. In old versions of Python, local variables were stored in a dictionary and locals() returned the actual dictionary. This feature is useful in the debugger. When the implementation changed to use a simple C array instead of a dict, the interpreter arranged to copy variables back and forth between the dictionary and the array when the debugger was used. (This behavior is also needed to make features like exec and import * work.)
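The copy-in, copy-out scheme can be modeled in a few lines of Python. ToyFrame and its method names are hypothetical; they loosely echo the C functions PyFrame_FastToLocals and PyFrame_LocalsToFast that CPython used for this job, but this is a sketch of the idea, not the interpreter's code.

```python
class ToyFrame:
    """Hypothetical model of a frame whose locals live in a C-style array."""

    def __init__(self, names, values):
        self.names = list(names)   # slot index -> variable name
        self.fast = list(values)   # the "C array" of local values
        self.locals_dict = {}      # what locals() or a debugger would see

    def fast_to_locals(self):
        """Copy array slots into the dictionary before introspection."""
        for name, value in zip(self.names, self.fast):
            self.locals_dict[name] = value
        return self.locals_dict

    def locals_to_fast(self):
        """Copy dictionary edits back into the array afterwards."""
        for slot, name in enumerate(self.names):
            if name in self.locals_dict:
                self.fast[slot] = self.locals_dict[name]


frame = ToyFrame(["a", "b"], [1, 2])
view = frame.fast_to_locals()
view["a"] = 10          # e.g. a debugger user changes a variable
frame.locals_to_fast()
print(frame.fast)       # [10, 2]
```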
In many cases, it is fine to copy free variables into the dictionary returned by locals(). If changes are made to those variables, they can be written back to the appropriate part of the closure rather than creating local variables that shadow the free variables.
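In current CPython you can see the "write back to the closure" idea directly: a function's free variables live in cell objects, and since Python 3.7 a cell's cell_contents attribute is writable. This is a sketch of the concept, not of how the interpreter copies values back from locals().

```python
def outer():
    y = 1

    def inner():
        return y  # y is a free variable stored in a cell

    return inner


f = outer()
cell = f.__closure__[0]    # the cell holding y
print(cell.cell_contents)  # 1
cell.cell_contents = 42    # write back to the binding itself
print(f())                 # 42: no shadowing local was created
```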
Class namespaces pose a serious problem in the current CPython implementation. The class stores local variables in a dictionary. When the body of the class finishes execution, this dictionary is passed to the class constructor. Keys in the dictionary become attributes of the class. If you call locals(), free variables could be copied into the dictionary for access in the debugger or introspection. But if they are copied, they will become attributes of the class, which was not intended. It's a messy problem, because it is possible, though inscrutable, for the same name to be used for a free variable in a method and a class attribute; they have the same name, but they refer to different bindings.
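Here is the inscrutable case in Python 3, with illustrative names: x is both a class attribute and, inside method, a free variable that refers to the enclosing function's binding.

```python
def make_class():
    x = "free variable"

    class C:
        x = "class attribute"

        def method(self):
            # Class bodies are not enclosing scopes for methods, so this
            # x is make_class's local, not C.x.
            return x

    return C


C = make_class()
print(C.x)           # class attribute
print(C().method())  # free variable
```

If locals() in the class body had copied the free x into the namespace, it would have clobbered the class attribute of the same name.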
For now, we are fixing this in Python 2.x by omitting free variables from the dictionary returned by locals() when locals() is called in a class block. This makes introspection more difficult but prevents locals() from polluting the class namespace.
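Current CPython still behaves this way. In this small example (names are illustrative), locals() inside the class body does not include the free variable y, even though the body reads it, so y never becomes a class attribute.

```python
def make_class():
    y = 1

    class C:
        z = y                     # y is a free variable used in the class body
        snapshot = set(locals())  # the class namespace as seen here

    return C


C = make_class()
print("y" in C.snapshot, "z" in C.snapshot)  # False True
print(hasattr(C, "y"))                       # False
```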
What are some other solutions for the class problem? It might be possible to return a copy of the class dictionary with the free variables added. When you run in the debugger, changes to this copy would be written back to the real dictionary or to the free variables. It wouldn't be possible to reflect every change to the dictionary; this would only work in contexts like debugging or using exec. I'm not sure if that would be too confusing.
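A minimal sketch of this proposal in Python, assuming a hypothetical Cell stand-in for closure cells and hypothetical class_view() and write_back() helpers. When a name is both a class attribute and a free variable, the sketch lets the class attribute win in both directions.

```python
class Cell:
    """Hypothetical stand-in for a closure cell."""

    def __init__(self, value):
        self.cell_contents = value


def class_view(namespace, free_cells):
    """Return a copy of the class namespace with free variables added."""
    view = dict(namespace)
    for name, cell in free_cells.items():
        view.setdefault(name, cell.cell_contents)  # class binding wins
    return view


def write_back(view, namespace, free_cells):
    """Copy edits from the view to the real namespace or the closure cells."""
    for name, value in view.items():
        if name in free_cells and name not in namespace:
            free_cells[name].cell_contents = value
        else:
            namespace[name] = value


ns = {"z": 1}
cells = {"y": Cell(2)}
view = class_view(ns, cells)
view["y"] = 20   # a debugger edits the free variable
view["z"] = 10   # and a class attribute
write_back(view, ns, cells)
print(ns, cells["y"].cell_contents)  # {'z': 10} 20
```

Deleting a key from the view is not reflected anywhere, which illustrates why this could only work in controlled contexts like a debugger.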
We could also provide two different functions; locals() and vars() seem like reasonable names. locals() could return the actual dictionary object for classes, without adding the free variables. vars() could return a dictionary with all variables, but it would be a copy. Then client code could get whatever it wanted.
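The two-function split can be sketched with hypothetical helpers named class_locals() and class_vars(), chosen so they don't shadow Python's real built-ins. One returns the live namespace; the other returns a merged copy.

```python
def class_locals(namespace):
    """Hypothetical locals(): the live class namespace, no free variables."""
    return namespace


def class_vars(namespace, free_values):
    """Hypothetical vars(): a copy holding class names plus free variables."""
    merged = dict(free_values)
    merged.update(namespace)  # class bindings win when a name is in both
    return merged


namespace = {"z": 1}
free = {"y": 2}
class_locals(namespace)["w"] = 3  # writes go straight to the class
snapshot = class_vars(namespace, free)
snapshot["z"] = 99                # edits to the copy do not
print(namespace)                  # {'z': 1, 'w': 3}
print(sorted(snapshot))           # ['w', 'y', 'z']
```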
PyCon Pictures
My first Python conference was almost ten years ago, at a hotel in downtown San Jose. Guido introduced Python 1.5 that year, and Jim Hugunin introduced JPython. I ran the very first lightning talks session. Ten years later, there are still a lot of familiar faces, but the conference is much bigger. The community is larger and its interests more diverse. The keynote speakers were variously engaging and entertaining, though probably none equaled Randy Pausch's talk in 2000.
I attended a few talks, fixed a few bugs, and got a chance to re-engage with Python development. I have had little time for Python since 2.5 was released and the compiler rewrite finished. I'm looking forward to a full sprint day to get some code written. I had hoped to fix some obscure bugs, but Armin Rigo's bug reports were too obscure to get to the bottom of with just an hour's study.
A small group of python-devers and Googlers had an over-the-top dinner at the Mansion on Turtle Creek.
I've collected a few PyCon pictures over the week, snapshots of friends, speakers, and other Pythonistas. Update: I added a few pictures from the sprints on Monday (2/26/07).