Leo Tree Model - new approach

What would Leo’s data model look like if I were to build it now? Leo has a long history and its current data model is the result of the evolution process. Many different ideas were combined and applied, and some of them were later abandoned. All these ideas left their trails on the model and even though we can say that it is a solid piece of code with a good performance, it isn’t very developer friendly. Using it makes your code more complex than is necessary. At least that is what I believe to be true. That is why I decided to try to make my own version of data model suitable for representing Leo’s trees.

What I like about Leo’s current tree model is how easy it is to make changes in tree. However, it makes catching the information about changes almost impossible, or at least it requires lots of processing. That makes updating view very hard. That is why I intend to make my model more centralized. Data model will provide all necessary operations for tree modification, and those operations should be the only way to modify tree.

The other thing I would try to achieve is some stability of positions in tree. If a tree is modified some positions may become invalid, but those positions that point to the nodes that are still part of the tree should be valid even after the tree has been changed. For example if we have a position that points to one specific node in the tree, and if this node, or any of its ancestors is deleted, then model should tell us that this position doesn’t exist any more. But if the node or any of its ancestors, is moved to some other place in tree, this position should still point to the same node and remain valid.

Theory of operation

Suppose that model keeps a list of all nodes in outline order. Positions would be represented by indexes in this list. But, if order of nodes is changed, those indexes would point to the wrong node. But if we add one more layer of indirection we can have immutable and persistent positions. Data model need to keep list of position names. Now, the outside code would know only about position names. Those names will be internaly in data model translated in integer indexes that would point to the specific node. If position name is removed from this translation list, then it becomes invalid which would mean position does not exist any more. If node is moved, position name should be moved accordingly and it will remain valid even after node has been moved.

... little bit later

Update

I wished to write about this mini-project simultaneously as I develop it. But once, I have started to code, I just couldn’t stop until almost all intended features were completed. Now, I can only describe what I’ve got in the end.

The code in its current version can be found here: leo-tree-model.leo.

First some helpers

To have some tree to work on it was natural to use existing Leo tree that I could see and examine easily. So, my coding started with a few helper functions for converting Leo trees to new model.

Here is the start of LeoTreeModel class definition:

class LeoTreeModel(object):

    def __init__(self):
        self.positions = []
        self.nodes = []
        self.attrs = {}
        self.levels = []
        self.gnx2pos = defaultdict(list)
        self.parPos = []

positions is a list of position names. Each element in this list corresponds to one unique position in outline. As an implementation detail values are simple floats.
nodes is a list of node identificators. Implementation detail elements are just p.gnx values.
attrs is a dict containing values attached to each unique node, (its headline, body, list of parent and children identificators and size of subtree).
levels is a list containing level for each node in outline. This is a value attached to a position rather than a node. For example one node may be several places in outline each on different level.
gnx2pos - is a dictionary of lists. Keys are gnx node identificators and values are lists of position identificators, places that one particular node can be found in outline. Nodes that are not clones have a list with the single position, while clones have more than one element in this lists.
parPos is a list of parent position identificators. For each node in outline, corresponding element in this list is a position identifier of node’s parent.

def vnode2treemodel(vnode):
    '''Utility convertor: converts VNode instance into
       LeoTreeModel instance'''
    def viter(v, lev0):
        s = [1]
        mnode = (v.gnx, v.h, v.b, lev0, s,
                [x.gnx for x in v.parents],
                [x.gnx for x in v.children])
        yield mnode
        for ch in v.children:
            for x in viter(ch, lev0 + 1):
                s[0] += 1
                yield x
    return nodes2treemodel(tuple(viter(vnode, 0)))

def nodes2treemodel(nodes):
    '''Creates LeoTreeModel from the sequence of tuples
        (gnx, h, b, level, size, parentGnxes, childrenGnxes)'''
    ltm = LeoTreeModel()
    ltm.positions = [random.random() for x in nodes]
    ltm.levels = [x[3] for x in nodes]
    ltm.parPos = list(parPosIter(ltm.positions, ltm.levels))
    ltm.nodes = [x[0] for x in nodes]
    gnx2pos = defaultdict(list)
    for pos, x in zip(ltm.positions, nodes):
        gnx2pos[x[0]].append(pos)
    ltm.gnx2pos = gnx2pos
    for gnx, h, b, lev, sz, ps, chn in nodes:
        ltm.attrs[gnx] = [h, b, ps, chn, sz[0]]
    # root node must not have parents
    rgnx = nodes[0][0]
    ltm.attrs[rgnx][2] = []
    return ltm

Here is a utility iterator that yields elements of parPos list:

def parPosIter(ps, levs):
    '''Helper iterator for parent positions. Given sequence of
       positions and corresponding levels it generates sequence
       of parent positions'''
    rootpos = ps[0]
    levPars = [rootpos for i in range(256)] # max depth 255 levels
    it = zip(ps, levs)
    next(it) # skip first root node which has no parent
    yield rootpos
    for p, l in it:
        levPars[l] = p
        yield levPars[l - 1]

Accessing nodes data

Now that all necessary data is acquired and organized let’s see how we can access information.

To get a list of children identificators and parents identificators for any given node:

def parents(self, gnx):
    '''Returns list of gnxes of parents of node with given gnx'''
    a = self.attrs.get(gnx)
    return a[2] if a else []

def children(self, gnx):
    '''Returns list of gnxes of children of the node with given gnx'''
    a = self.attrs.get(gnx)
    return a[3] if a else []

The elements of these lists are not v-nodes like in Leo, but rather just identificators of nodes (gnx).

To access h, b of any given node, similar access functions can be written.

Traversing outline

To visit every position in outline one can simply use nodes iterator.

If we have a position and want to visit parent and all ancestors:

def parents_iterator(self, p):
    i = self.positions.index(p)
    while i:
        p = ltm.parPos[i]
        yield p
        i = self.positions.index(p)

Subtree iterator:

def subtree_iterator(self, p):
    i = self.positions.index(p)
    gnx = self.nodes[i]
    sz = self.attrs[gnx][4]
    for j in range(i+1, i+sz):
        yield self.positions[j]
        # or yield self.nodes[j]
        # or yield j
        # depending on what we need

Following siblings iterator:

def following_siblings_iterator(self, p):
    i = self.positions.index(p)
    lev0 = self.levels[i]
    N = len(self.nodes)
    while self.levels[i] == lev0:
        gnx = self.nodes[i]
        sz = self.attrs[gnx][4]
        i += sz
        if sz >= N:
            break
        yield self.nodes[i]
        # or yield self.positions[i]
        # or yield i
        # depending on what we need

As you can see traversals involve only plain element access and integer arithmetic. This allows much faster traversals than Leo can achieve using its position class. Position class for traversals relies on list manipulation methods: building new lists, or appending elements to it. Accessing ivars of vnode directly may seem to be faster but in essence it is implemented in Python in the same way as accessing values from dict.

When order of traversal is not important, but we want to visit every node just once we can traverse attrs keys. As an implementation detail in Python3.6 and newer dict keeps keys in order in which they were added to dict. This gives traversal of unique nodes a nice outline order.

I haven’t implemented any of these traversal methods in LeoTreeModel class, because they are easy to implement, and custom iterators may be better suited for the job we need to do. Sometimes it is more suitable to iterate gnx values, sometimes it is positions that we are interested in or just indexes in lists. Index is fastest to use but it is valid only as long as tree remains unchanged. Positions are immune to tree changes to some degree. And using gnxes allows fast access to children, parents, h, b…

In the following part I will be discussing loading outlines both from xml and from external files.