
Magic symbol table (2020-12-16)

Discussion between Enrique and Hannes.

Problems discussed:

  • When is the symbol table available?
  • Can we construct it in a transformation pass for the new tree?
  • Related: When can we access SymbolDecl information in a SymbolRef?

Current approach is simple:
When a node with SymbolTableTrait is constructed, we run a pass that collects all symbols and stores them in a field of the node.
Consequences:

  • In a NodeTranslator (which creates a new tree from an old tree) we don't have access to the new symbol table during the visit.
  • Example: we want to know the dtype of a FieldAccess (which is a SymbolRef to a FieldDecl, which holds that information).
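A minimal sketch of the current approach, assuming plain dataclasses instead of the real node classes (`FieldDecl`, `FieldAccess`, `Stencil` and `collect_symbols` are hypothetical stand-ins). The symbol table is filled in `__post_init__`, so it only exists once the parent has been constructed, i.e. after all of its children.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FieldDecl:
    # A Symbol: declares a name together with its dtype.
    name: str
    dtype: str


@dataclass
class FieldAccess:
    # A SymbolRef: only stores the name of the FieldDecl it refers to.
    name: str


def collect_symbols(decls: List[FieldDecl]) -> Dict[str, FieldDecl]:
    # The pass run at construction time: map each symbol name to its declaration.
    return {decl.name: decl for decl in decls}


@dataclass
class Stencil:
    # Plays the role of a node with SymbolTableTrait.
    params: List[FieldDecl]
    body: List[FieldAccess]
    symtable_: Dict[str, FieldDecl] = field(init=False)

    def __post_init__(self) -> None:
        self.symtable_ = collect_symbols(self.params)


# Bottom-up construction: the FieldAccess exists before any symbol table does,
# so at this point its dtype cannot be looked up.
access = FieldAccess(name="in_field")
stencil = Stencil(params=[FieldDecl("in_field", "float64")], body=[access])
# Only after the parent is built can the reference be resolved.
dtype = stencil.symtable_[access.name].dtype
```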

Post visit construction

The main problem is that we construct nodes bottom-up, i.e. the node with the symbol table is only constructed after the nodes with Symbols and SymbolRefs.

Handmade symbol table

  • In each visit that will create a node with SymbolTableTrait, manually create a SymbolTable dict and pass it around via kwargs.
  • Add each Symbol declaration to this object.
  • For each SymbolRef that is created, look up its Symbol in this object.
  • Attach and validate the SymbolTable when constructing the node with SymbolTableTrait.
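The handmade variant can be sketched as follows, assuming a toy old tree and a hypothetical `Translator` with `visit_*` methods (none of these names are the real API). The visit that will create the scope node creates the dict, the children register declarations or look up references through the `symtable` keyword argument, and the dict is validated and attached at the end.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


# Old tree (input of the translation).
@dataclass
class OldDecl:
    name: str
    dtype: str


@dataclass
class OldRef:
    name: str


@dataclass
class OldScope:
    body: List[Union[OldDecl, OldRef]]


# New tree (output); NewRef keeps a direct link to its declaration.
@dataclass
class NewDecl:
    name: str
    dtype: str


@dataclass
class NewRef:
    name: str
    decl: NewDecl


@dataclass
class NewScope:
    body: list
    symtable_: Dict[str, NewDecl]


class Translator:
    def visit(self, node, **kwargs):
        return getattr(self, "visit_" + type(node).__name__)(node, **kwargs)

    def visit_OldScope(self, node: OldScope, **kwargs) -> NewScope:
        symtable: Dict[str, NewDecl] = {}  # created by hand for the new scope
        body = [self.visit(child, symtable=symtable) for child in node.body]
        # Validate before attaching: every registered symbol must be in the body.
        assert all(any(s is c for c in body) for s in symtable.values())
        return NewScope(body=body, symtable_=symtable)

    def visit_OldDecl(self, node: OldDecl, *, symtable, **kwargs) -> NewDecl:
        decl = NewDecl(node.name, node.dtype)
        symtable[decl.name] = decl  # register the declaration
        return decl

    def visit_OldRef(self, node: OldRef, *, symtable, **kwargs) -> NewRef:
        return NewRef(node.name, decl=symtable[node.name])  # look it up


new = Translator().visit(OldScope([OldDecl("a", "float32"), OldRef("a")]))
print(new.body[1].decl.dtype)  # float32
```

Note that this only works if declarations are visited before the references that use them.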

Automate this approach

  • In the dispatcher, before the call to the user visit, create a SymbolTable when the user-defined visit is going to create a node with SymbolTableTrait. This could be indicated by a user-supplied decorator or derived from a required return type annotation.
  • In the dispatcher, after the call to the user visit:
    • check if the created node (or elements of a collection) contains a Symbol and add it to the SymbolTable.
    • check if the created node contains a SymbolRef, validate that a Symbol exists in the SymbolTable and link from SymbolRef to Symbol.
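One way the dispatcher could do this, as a sketch: a hypothetical `creates_scope` decorator marks the visit methods that return a node with SymbolTableTrait. The dispatcher creates the table before calling such a visit and, after every user visit, registers returned Symbols and links returned SymbolRefs. `Symbol`, `SymbolRef`, `Scope`, `creates_scope` and `AutoTranslator` are assumptions, not an existing API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Symbol:
    name: str


@dataclass(eq=False)
class SymbolRef:
    name: str
    target: Optional[Symbol] = None  # linked by the dispatcher


@dataclass(eq=False)
class Scope:
    children: list
    symtable_: Dict[str, Symbol] = field(default_factory=dict)


def creates_scope(visit_method):
    # Marks a visit that returns a node with SymbolTableTrait.
    visit_method.creates_scope = True
    return visit_method


class AutoTranslator:
    def __init__(self) -> None:
        self._tables: List[Dict[str, Symbol]] = []

    def visit(self, node):
        method = getattr(self, "visit_" + type(node).__name__)
        opens_scope = getattr(method, "creates_scope", False)
        if opens_scope:  # before the user visit
            self._tables.append({})
        result = method(node)
        # After the user visit: handle single nodes and collections alike.
        for item in result if isinstance(result, list) else [result]:
            if isinstance(item, Symbol):
                self._tables[-1][item.name] = item
            elif isinstance(item, SymbolRef):
                item.target = self._lookup(item.name)
        if opens_scope:
            result.symtable_ = self._tables.pop()
        return result

    def _lookup(self, name: str) -> Symbol:
        for table in reversed(self._tables):  # innermost scope first
            if name in table:
                return table[name]
        raise KeyError(f"SymbolRef to undeclared symbol {name!r}")


# User code: only the decorator hints at the scope, no manual table handling.
@dataclass
class OldDef:
    name: str


@dataclass
class OldUse:
    name: str


@dataclass
class OldScope:
    items: list


class MyTranslator(AutoTranslator):
    @creates_scope
    def visit_OldScope(self, node: OldScope) -> Scope:
        return Scope(children=[self.visit(item) for item in node.items])

    def visit_OldDef(self, node: OldDef) -> Symbol:
        return Symbol(node.name)

    def visit_OldUse(self, node: OldUse) -> SymbolRef:
        return SymbolRef(node.name)


scope = MyTranslator().visit(OldScope([OldDef("x"), OldUse("x")]))
```

A `creates_scope` visit that returns a list of scope nodes is not handled here (the `result.symtable_` assignment would fail), which is the open problem listed under Problems.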

Problems

  • What if a visit creates more than one SymbolTable (i.e. returns a collection of nodes with SymbolTableTrait)?

Next day comment from Enrique

I've been thinking about our conversation from yesterday, and it feels to me like we're trying to use ideas from different approaches that do not mix well, so we should decide which approach to follow. For example, if we want symbol tables embedded in the tree, then I guess we should either have immutable trees that are always visited in the right order, or have lazy resolution of the SymbolRefs on access by traversing the tree upwards to look for a symbol table (like MLIR seems to do, which means we would need a link to the parent in each node).
Also, maybe we should try to use SSA for all temporaries from the beginning, since we probably need to do it anyway later for dataflow analysis and it would reduce the number of symbol names.
A more concrete idea we could try is to implement field validators which do not run automatically at initialization (e.g. @validator('dtype', auto=False)) and add an explicit validate() method that can be called at any time. In this way we could keep the natural post-order visitor to create nodes and still have validators for fields which require parent information, by calling the children's validate() from the parent.
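A rough sketch of deferred validation using plain dataclasses (the proposed `auto=False` validator flag is not modelled; here `validate()` is simply a hypothetical method that `__init__` never calls). Nodes are still created post-order, and the parent calls its children's `validate()` with the context they need.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FieldDecl:
    name: str
    dtype: str


@dataclass
class FieldAccess:
    name: str
    dtype: str

    def validate(self, *, symtable: Dict[str, FieldDecl]) -> None:
        # Needs parent information, so it is not run at construction time.
        decl = symtable.get(self.name)
        if decl is None:
            raise ValueError(f"unknown field {self.name!r}")
        if decl.dtype != self.dtype:
            raise ValueError(f"dtype mismatch for {self.name!r}: {self.dtype} != {decl.dtype}")


@dataclass
class Stencil:
    params: List[FieldDecl]
    body: List[FieldAccess]

    def validate(self) -> None:
        symtable = {p.name: p for p in self.params}
        for child in self.body:  # the parent validates its children
            child.validate(symtable=symtable)


# Post-order construction succeeds, even with an inconsistent child...
bad = Stencil(params=[FieldDecl("a", "float64")], body=[FieldAccess("a", "int32")])
# ...and the error only shows up when validate() is called explicitly.
try:
    bad.validate()
except ValueError as err:
    print(err)  # dtype mismatch for 'a': int32 != float64
```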

Isolated from above (2021-01-06)

C++ lambda example: create a new "root" scope, but inject some symbols from another scope. In the case of a C++ lambda, the global scope and the captures are injected.
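A small sketch of such an isolated scope, assuming symbol tables are plain dicts and `lambda_scope` is a hypothetical helper. The lambda body gets a fresh root `ChainMap` in which only the captures and the global scope are visible, while the other locals of the enclosing function are not.

```python
from collections import ChainMap
from typing import Dict, Iterable


def lambda_scope(enclosing: ChainMap, global_scope: Dict[str, str], captures: Iterable[str]) -> ChainMap:
    # New "root" scope for the lambda body: not a child of `enclosing`.
    captured = {name: enclosing[name] for name in captures}
    # Lookup order: lambda locals, then captures, then globals.
    return ChainMap({}, captured, global_scope)


global_scope = {"std::sqrt": "function"}
function_scope = ChainMap({"x": "double", "y": "double"}, global_scope)

body = lambda_scope(function_scope, global_scope, captures=["x"])
body["tmp"] = "double"  # new symbols go into the lambda's own locals
print("x" in body, "y" in body, "std::sqrt" in body)  # True False True
```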

Automatic temporary context creation (2021-01-06)

from collections import ChainMap

def visit_OuterScope(self, ...) -> NodeWithSymbolTable:
    context = ChainMap({})
    self.visit(self.child, context=context)
    node_with_symbol_table = NodeWithSymbolTable(...)
    assert context.maps[0].keys() <= node_with_symbol_table.symbol_table_.keys()
    return node_with_symbol_table

def visit_InnerScope(self, context) -> InnerNodeWithSymbolTable:
    # automatically create Context object
    inner_ctx = Context()
    self.visit(self.child_def, context=[*context, inner_ctx])
    self.visit(self.child_use, context=[*context, inner_ctx])
    inner_node_with_symbol_table = InnerNodeWithSymbolTable(...)
    assert inner_ctx.all_symbols in inner_node_with_symbol_table.symbol_table_
    return inner_node_with_symbol_table

def visit_ChildDef(self, ..., context) -> NodeWithSymbol:
    node_with_symbol = NodeWithSymbol(name="symbol1")
    context[-1].register(node_with_symbol.name, node_with_symbol)
    return node_with_symbol

def visit_ChildUse(self, ..., context) -> NodeWithRef:
    node_with_ref = NodeWithRef(name="symbol1")
    context.chained_check(node_with_ref.name)
    return node_with_ref
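A runnable reduction of the context handling above, assuming symbols are plain strings and using `ChainMap.new_child()` for the inner scope instead of the list `[*context, inner_ctx]` (a substitution, not the original design). Definitions are registered in the innermost map, uses are checked through the whole chain, and the scope compares what was registered with the symbols its node ends up holding.

```python
from collections import ChainMap

outer_ctx = ChainMap({})
outer_ctx["field_a"] = "FieldDecl(field_a)"  # registered in the outer scope

inner_ctx = outer_ctx.new_child()  # created automatically for the inner scope
inner_ctx["tmp"] = "Temporary(tmp)"  # visit_ChildDef: register in maps[0]


def chained_check(ctx: ChainMap, name: str) -> None:
    # visit_ChildUse: the reference must resolve in this scope or an enclosing one.
    if name not in ctx:
        raise KeyError(f"undefined symbol {name!r}")


chained_check(inner_ctx, "field_a")  # found in the enclosing scope
chained_check(inner_ctx, "tmp")

# The symbols registered in a scope must appear in that scope node's table.
inner_symbol_table = {"tmp": "Temporary(tmp)"}  # hypothetical stand-in for symbol_table_
assert inner_ctx.maps[0].keys() <= inner_symbol_table.keys()
assert "tmp" not in outer_ctx  # inner symbols do not leak outwards
```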

Modification

Here I try to fix the following problem in the above design: SymbolRefs should point to the correct Symbol at all times.

@create_scope
def visit_OuterScope(self, ..., *, symbol_table) -> NodeWithSymbolTable:
    # automatically create Context object
    # symbol_table.new_scope()
    outer_tbl = SymbolTable()
    # inner_symbol_table = symbol_table.create_scope()
    # b = InnerNodeWithSymbolTable.builder()
    # b.inner_symbols = NodeWithSymbol()
    # b.inner_refs = NodeWithRefs(b.symtable_)
    # b.inner_refs = NodeWithRefs.builder().name = ""
    # AnotherNode(refs=b.inner_refs)
    # inner_node = b.build()
    # return NodeWithSymbolTable(InnerNodeWithSymbolTable(symbols=inner_symbols, refs=inner_refs()))
    # return NodeWithSymbolTable(InnerNodeWithSymbolTable(symbols=NodeWithSymbol(), refs=NodeWithRefs()))
    self.visit(self.child, symtable=[outer_tbl])
    # symbol_table.pop()
    node_with_symbol_table = NodeWithSymbolTable(..., outer_tbl)
    # validate consistency inside (root validator of SymbolTable):
    # assert outer_ctx.all_symbols in node_with_symbol_table.symbol_table_
    return node_with_symbol_table

def visit_InnerScope(self, *, symtable, **kwargs) -> InnerNodeWithSymbolTable:
    # automatically create Context object
    inner_tbl = SymbolTable()
    self.visit(self.child_def, symtable=[*symtable, inner_tbl])
    self.visit(self.child_use, symtable=[*symtable, inner_tbl])
    inner_node_with_symbol_table = InnerNodeWithSymbolTable(..., inner_tbl)
    # validate consistency inside (root validator of SymbolTable):
    # assert inner_ctx.all_symbols in inner_node_with_symbol_table.symbol_table_
    return inner_node_with_symbol_table

def visit_ChildDef(self, ..., symtable, builder_of_parent) -> NodeWithSymbol:
    # register in the innermost scope
    node_with_symbol = symtable[-1].register(NodeWithSymbol(name="symbol1"))
    return node_with_symbol

def visit_ChildUse(self, ..., symtable) -> NodeWithRef:
    node_with_ref = NodeWithRef(name="symbol1", tbl=symtable)  # or symtable[1]
    # creates the SymbolRef to the correct Symbol (or SymbolTable)
    return node_with_ref

Features:

  • With this approach you can at any time access the Symbol from a SymbolRef.
  • The validator in a SymbolTable node will check if the attached SymbolTable is actually consistent with the subtree. In case it is not, the error would be:
    • "Somewhere you registered the following Symbol Symbolname which is not in the tree."
  • We could also check if a SymbolRef was created but not attached (probably less useful).
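A minimal sketch of the features listed above, assuming a hypothetical `SymbolTable` that stores the Symbol nodes themselves, so a `SymbolRef` can resolve its Symbol at any time, and a `check_consistency` function playing the role of the root validator. Every registered Symbol must be found in the subtree, otherwise the error from the list is raised.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass(eq=False)
class Symbol:
    name: str


@dataclass(eq=False)
class SymbolRef:
    name: str
    tbl: "SymbolTable"

    @property
    def symbol(self) -> Symbol:
        # Resolvable at any time, even before the tree is complete.
        return self.tbl.symbols[self.name]


@dataclass
class SymbolTable:
    symbols: Dict[str, Symbol] = field(default_factory=dict)

    def register(self, symbol: Symbol) -> Symbol:
        self.symbols[symbol.name] = symbol
        return symbol


@dataclass(eq=False)
class ScopeNode:
    children: List[object]
    symtable: SymbolTable


def walk(node) -> Iterator[object]:
    yield node
    for child in getattr(node, "children", []):
        yield from walk(child)


def check_consistency(scope: ScopeNode) -> None:
    # Root validator: the attached table must match the subtree.
    in_tree = {id(n) for n in walk(scope)}
    for name, symbol in scope.symtable.symbols.items():
        if id(symbol) not in in_tree:
            raise ValueError(
                f"Somewhere you registered the following Symbol {name} which is not in the tree."
            )


tbl = SymbolTable()
a = tbl.register(Symbol("a"))
ref = SymbolRef("a", tbl)
print(ref.symbol is a)  # True, before any ScopeNode exists
tbl.register(Symbol("orphan"))  # registered but never attached
try:
    check_consistency(ScopeNode(children=[a, ref], symtable=tbl))
except ValueError as err:
    print(err)
```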

Addendum (Hannes)

It seems we don't need to store a SymbolTable object anywhere in the tree. We only need to keep it around during tree construction (to create SymbolRefs), because a SymbolRef can reference the SymbolNode directly. I don't see an advantage of going via a SymbolTable. It might solve the problem of reconstructing the tree (but then we might need to scan for removed Symbols, so I'm not sure).
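A sketch of this variant, under the assumption that a SymbolRef may hold a direct reference to its Symbol node (`Decl`, `Ref` and `build_body` are hypothetical). The table is a local dict that only lives while the tree is being built; nothing table-like is stored in the result.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(eq=False)
class Decl:
    name: str
    dtype: str


@dataclass(eq=False)
class Ref:
    decl: Decl  # direct link to the Symbol node, no table in between

    @property
    def name(self) -> str:
        return self.decl.name


def build_body(spec: List[Tuple[str, str]]) -> List[Union[Decl, Ref]]:
    # `spec` is a toy input: ("decl", "name:dtype") or ("ref", "name").
    symtable: Dict[str, Decl] = {}  # only needed during construction
    body: List[Union[Decl, Ref]] = []
    for kind, text in spec:
        if kind == "decl":
            name, dtype = text.split(":")
            symtable[name] = Decl(name, dtype)
            body.append(symtable[name])
        else:
            body.append(Ref(symtable[text]))
    return body  # the dict is dropped here


body = build_body([("decl", "tmp:float32"), ("ref", "tmp")])
print(body[1].decl.dtype)  # float32
```

The flip side is reconstruction: if a translator replaces a `Decl`, every `Ref` still points to the old object.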

Questions

Can we automatically build the new SymbolTable and SymbolRefs in a Translator if we don't visit the nodes manually?

Here one category of Symbols is changed (in this example removed), but the others should remain.

class SomeNode(SymbolTableTrait):
    symbols: List[Union[SymbolNode1,SymbolNode2]]
    refs: List[SymRefNode]
    
class SymRefNode:
    name: SymRef
    
class SymbolNode1:
    name: SymbolName
    
class SymbolNode2:
    name: SymbolName
    
class MyVisitor(NodeTranslator):
    def visit_SymbolNode1(self, node):
        return Nothing
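What "automatically" could mean for this example, sketched under the assumption that the translator hands back the new `symbols` and `refs` lists and a hypothetical `rebuild_symtable` helper runs afterwards. The table is recomputed from the surviving symbols, SymbolNode2 entries stay, and refs that pointed to a removed SymbolNode1 are reported instead of silently dangling.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class SymbolNode1:
    name: str


@dataclass
class SymbolNode2:
    name: str


@dataclass
class SymRefNode:
    name: str


def rebuild_symtable(
    symbols: List[Union[SymbolNode1, SymbolNode2]], refs: List[SymRefNode]
) -> Dict[str, Union[SymbolNode1, SymbolNode2]]:
    # Recompute the table from whatever symbols survived the translation.
    symtable = {s.name: s for s in symbols}
    dangling = [r.name for r in refs if r.name not in symtable]
    if dangling:
        raise ValueError(f"SymRefs to removed symbols: {dangling}")
    return symtable


old_symbols = [SymbolNode1("a"), SymbolNode2("b")]
# Effect of MyVisitor: every SymbolNode1 is dropped, SymbolNode2 is kept.
new_symbols = [s for s in old_symbols if not isinstance(s, SymbolNode1)]

print(sorted(rebuild_symtable(new_symbols, [SymRefNode("b")])))  # ['b']
try:
    rebuild_symtable(new_symbols, [SymRefNode("a")])
except ValueError as err:
    print(err)  # SymRefs to removed symbols: ['a']
```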

Here the SymbolTable (i.e. the symbols inside it) is not changed, but we don't know that, so we probably have to reconstruct it. Then we have to automatically set the link to the new entry in SymRef2, which is not touched.

class SomeNode:
    symbols: List[SymbolNode]
    children1: List[SymRef1]
    children2: List[SymRef2]
    
class SymRef1:
    name: SymRef
    
class SymRef2:
    name: SymRef
    
class MyVisitor(NodeTranslator):
    def visit_SymRef1(self, node):
        return DontCare()
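One possible automatic step for this second example, sketched under the assumption that refs hold a direct `target` link and that a hypothetical `relink` pass runs after the translator. Every ref, including the untouched `SymRef2`s, is re-pointed by name to the symbol object in the reconstructed table.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(eq=False)
class SymbolNode:
    name: str


@dataclass(eq=False)
class SymRef2:
    name: str
    target: Optional[SymbolNode] = None


def relink(symbols: List[SymbolNode], refs: List[SymRef2]) -> Dict[str, SymbolNode]:
    # Rebuild the table from the new symbols and fix up every ref by name.
    symtable = {s.name: s for s in symbols}
    for ref in refs:
        ref.target = symtable[ref.name]
    return symtable


old_sym = SymbolNode("x")
ref = SymRef2("x", target=old_sym)  # not touched by MyVisitor

# The translator copies the symbols, so the old link would be stale.
new_symbols = [SymbolNode(s.name) for s in [old_sym]]
relink(new_symbols, [ref])
print(ref.target is new_symbols[0], ref.target is old_sym)  # True False
```

This mutates the refs in place; with immutable nodes the untouched refs would have to be copied as well.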

2021-01-21 Tried the above approach on oir2gtcpp

Here is a branch: havogt/symbol_table_experiments

It illustrates some problems. The main problem is that while the new symbol table is being constructed, there is no tree yet.
This means a node is registered in the symbol table but only added to the tree later, so the two could become inconsistent.

We discussed some ways to construct the tree structure beforehand, like:

binop = BinaryOpBuilder(op=node.op) # empty node
# pass the current node to the to-be-constructed children:
# since you have the list of parents when constructing the child,
# you know where this node will be and in which symbol table the child should be
binop.left = self.visit(node.left, parents=[*parents, binop])
binop.right = self.visit(node.right, parents=...)
return binop.build()
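A runnable version of the builder idea, with hypothetical `BinaryOpBuilder`, `BinaryOp` and `Literal` classes and a toy old tree of ints and `(op, left, right)` tuples. The empty builder exists before its children are visited, so each child sees the chain of (not yet finished) parents; `build()` then freezes the node. It also makes visible that children are attached to a parent that is only partly constructed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Literal:
    value: int
    depth: int  # how many parents existed when this node was created


@dataclass(frozen=True)
class BinaryOp:
    op: str
    left: object
    right: object


@dataclass
class BinaryOpBuilder:
    # Mutable, partly constructed node; `left`/`right` are filled in later.
    op: str
    left: Optional[object] = None
    right: Optional[object] = None

    def build(self) -> BinaryOp:
        assert self.left is not None and self.right is not None, "incomplete node"
        return BinaryOp(self.op, self.left, self.right)


def visit(node, parents: List[object]):
    # `node` is a toy old tree: an int or a tuple (op, left, right).
    if isinstance(node, int):
        return Literal(node, depth=len(parents))
    op, left, right = node
    binop = BinaryOpBuilder(op=op)  # empty node, already known to the children
    binop.left = visit(left, parents=[*parents, binop])
    binop.right = visit(right, parents=[*parents, binop])
    return binop.build()


tree = visit(("+", 1, ("*", 2, 3)), parents=[])
print(tree.right.right.depth)  # 2
```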

Another approach for the same thing:

def visit_OldBinOp(self, node, parent, field_in_parent):
    binop = register(gtcpp.BinaryOp, parent, field_in_parent)
    self.visit(node.left, parent=binop, field_in_parent="left")
    self.visit(node.right, parent=binop, field_in_parent="right")
    commit(binop)
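The register/commit variant can be sketched in the same spirit; `register`, `commit` and the `Pending` placeholder class are assumptions made for the example, and the old tree is a toy of ints and `(left, right)` tuples. `register` places a placeholder into the parent field right away, the children fill the placeholder's fields, and `commit` marks it as finished.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Pending:
    # Placeholder that is already attached to its parent but not yet complete.
    kind: str
    fields: Dict[str, Any] = field(default_factory=dict)
    committed: bool = False


def register(kind: str, parent: Pending, field_in_parent: str) -> Pending:
    node = Pending(kind)
    parent.fields[field_in_parent] = node  # attached before it is finished
    return node


def commit(node: Pending) -> None:
    # A BinaryOp is complete once both operands are set.
    missing = [f for f in ("left", "right") if f not in node.fields]
    if missing:
        raise ValueError(f"cannot commit {node.kind}: missing {missing}")
    node.committed = True


def visit(node, parent: Pending, field_in_parent: str) -> None:
    if isinstance(node, int):
        parent.fields[field_in_parent] = node
        return
    binop = register("BinaryOp", parent, field_in_parent)
    visit(node[0], parent=binop, field_in_parent="left")
    visit(node[1], parent=binop, field_in_parent="right")
    commit(binop)


root = Pending("Root")
visit((1, (2, 3)), parent=root, field_in_parent="expr")
print(root.fields["expr"].committed, root.fields["expr"].fields["right"].fields)
# True {'left': 2, 'right': 3}
```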

Both have the same problem: partly constructed nodes are attached in an extra step.
