Monday, February 26, 2007

Bug of the day

Mike Verdone found a great Python bug today. It's an interaction between two features.
  • Name mangling. If you use an identifier with two leading underscores in a class, the name is mangled to include the name of the class as a prefix, except if the name also ends with two trailing underscores. In the class Spam, the identifier __eggs becomes _Spam__eggs. Note that the language reference says that mangling applies to any identified
  • Packages. You can import a name like A.B.C. This name describes a module C in package B inside package A. Note that A, B, and C are identifiers, so if one of them has two leading underscores they should be mangled. import __A.B.C should technically be turned into import _Spam__A.B.C if it occurs inside class Spam.
One detail of the implementation is that the compiler represents the sequence of identifiers in a package name as a single string. The compiled code represents the dotted name as a single string argument to a bytecode instruction. The implementation of name mangling checks for strings that start and end with double underscores. Therein lies the problem. The string "__A__.B.C" starts with two underscores, but does not end with two underscores. It is mangled to "_Spam__A__.B.C."

How old is this bug? It is present in Python 1.5.2 and was probably present in the original release of Python 1.5. The bug is about nine years old.

We're going to do a quick fix: Change name mangling so that it does not mangle names with dots in them. This will fix the __A__.B.C case, break the __A.B.C case, and leave broken the A.__B.C case. It is likely that we will change the language spec and say identifiers in import statements are not mangled.

No comments: