Sunday, August 27, 2006

An example code transformation

I wrote a simple transformer that works for a trivial case. It demonstrates the basic structure of a transformer and highlights some hard issues still to resolve. It is much too hard to generate the AST nodes (and corresponding tokens) for the newly generated code.

Let's start with the easy part. The HasKeyTransformer looks has all calls of a has_key attribute. (There should be better type analysis to try to cope with cases where there is a non-dict with a has_key() method.) It starts with an in-order walk of the tree looking for has_key() calls. The search() method performs the search, defering to the is_has_key_call() predicate to test each node. The predicate looks for a Call node, where the function being called is an attribute of another object; i.e. the call is of the form x.y(...). If the y attribute is has_key, then the predicate returns true.
import concrete

class HasKeyTransformer:

def search(self, tree):
for node in concrete.TreeIter(tree):
if self.is_has_key_call(node):
yield node

def is_has_key_call(self, node):
if not isinstance(node, _ast.Call):
return False
callee = node.func
if not isinstance(callee, _ast.Attribute):
return False
return callee.attr == "has_key"
The next step is to replace the Call node with a Compare node, using the object with the has_key method as the right-hand side of an "in" comparison. The new Compare node can re-use some of the old tokens and AST nodes but needs to generate its own "in" token. In some cases, it may need to add parentheses, too. The code turns ugly here.
    def replace(self, node):
# Replace the Call node with a Compare node. The
# base of the callee becomes the RHS and one argument
# in the call becomes the LHS.

the_dict = node.func.value
the_key = node.args[0]
strip_positions(the_key)

new = _ast.Compare()
new.left = the_key
the_in = _ast.In()

# We need to synthesize a full token for "in". Yuck!
the_in.concrete = [(3, "in")]
new.ops = [the_in]
new.comparators = [the_dict]
new.concrete = [new.left] + new.ops + new.comparators
return new
The strip_positions() function removes the old position information from all the tokens representing the key. We are moving the key to the left of "in," so none of those positions are correct. We'll have to count on untokenize to do something reasonable. We ought to be able to preserve relative spacing, but that's a detail. It's hard to hook up the new node, because we need to fill in the concrete list and all the other node-specific attributes. It should be possible to generate helper functions that take the concrete tokens and nodes and hook up all the attributes correctly. The code will look better then. We could also have token helper functions so that you can avoid writing tuples.
    def replace(self, node):
# Replace the Call node with a Compare node. The
# base of the callee becomes the RHS and one argument
# in the call becomes the LHS.

the_dict = node.func.value
the_key = node.args[0]
strip_positions(the_key)

return concrete.Compare(the_key, [token.STRING("in")], [the_dict])

No comments: