In the previous part I wrote about the implementation of the LeoTreeModel class and its fields. The first instances of LeoTreeModel were made from Leo’s VNode instances. Now let’s enable building a LeoTreeModel directly from .leo xml files.
Loading from Leo xml file
We already have a function that builds a LeoTreeModel from a sequence of tuples defining each node in the outline (see nodes2treemodel in the previous part). This function expects as its only argument an array of tuples of the following form:
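(gnx, headline, body, level, size, parents, children)

where, judging from the iterators below: gnx is the unique node id, headline and body are strings, level is the depth of the node in the outline (the hidden root has level 0), size is a one-element list [n] with n being the number of nodes in the node’s subtree (itself included), and parents and children are lists of gnxes.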
These values can be retrieved from the Leo xml file. Iterating over the child elements of the <vnodes> element we can build the required array of tuples. The value of each tuple element can be deduced from the <v> elements themselves, all except the b value, which is the text of the <t> element with the same gnx. All <t> elements are children of one unique <tnodes> element.
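For reference, here is a rough sketch of the relevant parts of a .leo file (the gnx values ab.123 and ab.124 are invented for this illustration):

<leo_file>
  <vnodes>
    <v t="ab.123"><vh>top node</vh>
      <v t="ab.124"><vh>child node</vh></v>
    </v>
    <v t="ab.124"></v> <!-- a clone: same gnx, but no vh and no children -->
  </vnodes>
  <tnodes>
    <t tx="ab.123">body text of top node</t>
    <t tx="ab.124">body text of child node</t>
  </tnodes>
</leo_file>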
For parsing the xml I have used the xml.etree.ElementTree module, which is part of the standard Python library.
from collections import defaultdict
import xml.etree.ElementTree as ET

def loadLeo(fname):
    '''Loads given xml Leo file and returns LeoTreeModel instance'''
    with open(fname, 'rt') as inp:
        s = inp.read()
    xroot = ET.fromstring(s)
    vnodesEl = xroot.find('vnodes')
    tnodesEl = xroot.find('tnodes')
    return xml2treemodel(vnodesEl, tnodesEl)
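Hypothetical usage (the file name is invented):

ltm = loadLeo('workbook.leo')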
The Leo xml file format is not fully symmetric, which requires two different iterators: one for every <v> element, and a second one for iterating over the top-level nodes, which are children of the <vnodes> element.
def xml2treemodel(xvroot, troot):
    '''Returns LeoTreeModel instance from vnodes and tnodes elements of xml Leo file'''
    parDict = defaultdict(list)  # accumulates parent gnxes for each node
    hDict = {}  # accumulates headlines for each node
    # contains body for each node
    bDict = dict((ch.attrib['tx'], ch.text or '') for ch in troot)
    xDict = {}  # will keep references to Element instances for iterating clones
    # here come two utility iterators
    @others
    nodes = tuple(riter())
    return nodes2treemodel(nodes)
And as child nodes (expanded in place of the @others line above) we have the two iterators:
def viter(xv, lev0):
    s = [1]  # the only element of this list counts how many nodes
             # there are in the subtree of this node, i.e. its size
    gnx = xv.attrib['t']
    if len(xv) == 0:
        # clone nodes contain neither a vh element nor any children,
        # so we have to reiterate the first of all the clones, the one
        # that has both vh and children
        for ch in viter(xDict[gnx], lev0):
            yield ch
        return
    # not a clone, we have encountered a new node
    xDict[gnx] = xv
    hDict[gnx] = xv[0].text
    chs = [ch.attrib['t'] for ch in xv if ch.tag == 'v']
    for ch in chs:
        parDict[ch].append(gnx)
    mnode = [gnx, hDict[gnx], bDict.get(gnx, ''), lev0, s, parDict[gnx], chs]
    yield mnode
    for ch in xv:
        if ch.tag != 'v':
            continue
        for x in viter(ch, lev0 + 1):
            s[0] += 1
            yield x
This iterator will be used for every <v> element. However, for the top-level elements, which are children of the <vnodes> element and not of any <v> element, we have to make a different iterator: riter, the r(oot) iterator.
def riter():
    s = [1]
    chs = []
    yield 'hidden-root-vnode-gnx', '<hidden root vnode>', '', 0, s, [], chs
    for xv in xvroot:
        gnx = xv.attrib['t']
        chs.append(gnx)
        parDict[gnx].append('hidden-root-vnode-gnx')
        for ch in viter(xv, 1):
            s[0] += 1
            yield ch
This iterator invokes the first one for each top-level vnode and finally gives us a tuple of node tuples that we can pass to the nodes2treemodel function.
Reading external files
After this first pass we will have the outline only. All children of @file nodes are still missing. In July 2017 I wrote two functions for reading and writing external files in Leo. They relied on VNode and Position methods, so they need to be adjusted for use with the new LeoTreeModel. However, we can keep their overall structure.
load_derived_file(lines) takes as input the lines of text from a derived file and returns a generator of tuples (gnx, h, b, level). It has five distinct phases:
- handling first lines and header of derived file
- creating necessary regexes
- init topnode
- iterate input lines
- yield collected nodes
Phase 1: handling first lines and header
first_lines = []
header_pattern = re.compile(r'''
    ^(.+)@\+leo
    (-ver=(\d+))?
    (-thin)?
    (-encoding=(.*)(\.))?
    (.*)$''', re.VERBOSE)
for i, line in flines:
    m = header_pattern.match(line)
    if m:
        break
    first_lines.append(line)
else:
    raise ValueError('wrong format, not derived file')
delim_st = m.group(1)
delim_en = m.group(8)
Nothing too special about this phase. We simply read lines and collect them in the first_lines list until we encounter the header line. Once we have the header line, we deduce the start and end delimiters from it.
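For example, the header line of a Python external file typically reads #@+leo-ver=5-thin. Matching it gives '#' as the start delimiter and an empty end delimiter:

m = header_pattern.match('#@+leo-ver=5-thin\n')
m.group(1)  # '#' -> delim_st
m.group(8)  # ''  -> delim_en (line comments have no closing delimiter)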
Phase 2: creating regexes
Once we know the start and end delimiters, we can build the patterns used for parsing the remaining lines.
def get_patterns(delim_st, delim_en):
    if delim_en:
        dlms = re.escape(delim_st), re.escape(delim_en)
        ns_src = r'^(\s*)%s@\+node:([^:]+): \*(\d+)?(\*?) (.*?)%s$' % dlms
        sec_src = r'^(\s*)%s@(\+|-)<{2}[^>]+>>(.*?)%s$' % dlms
        oth_src = r'^(\s*)%s@(\+|-)others%s\s*$' % dlms
        all_src = r'^(\s*)%s@(\+|-)all%s\s*$' % dlms
        code_src = r'^%s@@c(ode)?%s$' % dlms
        doc_src = r'^%s@\+(at|doc)?(\s.*?)?%s$' % dlms
    else:
        dlms = re.escape(delim_st)
        ns_src = r'^(\s*)%s@\+node:([^:]+): \*(\d+)?(\*?) (.*)$' % dlms
        sec_src = r'^(\s*)%s@(\+|-)<{2}[^>]+>>(.*)$' % dlms
        oth_src = r'^(\s*)%s@(\+|-)others\s*$' % dlms
        all_src = r'^(\s*)%s@(\+|-)all\s*$' % dlms
        code_src = r'^%s@@c(ode)?$' % dlms
        doc_src = r'^%s@\+(at|doc)?(\s.*?)?' % dlms + '\n'
    return bunch(
        node_start=re.compile(ns_src),
        section=re.compile(sec_src),
        others=re.compile(oth_src, re.DOTALL),
        all=re.compile(all_src, re.DOTALL),
        code=re.compile(code_src),
        doc=re.compile(doc_src),
    )

patterns = get_patterns(delim_st, delim_en)
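Just to illustrate the kind of lines these patterns are meant to recognize, assuming Python-style sentinels ('#' as the start delimiter, no end delimiter); the gnx below is invented:

pats = get_patterns('#', '')
assert pats.others.match('    #@+others\n')
assert pats.section.match('  #@+<< imports >>\n')
assert pats.node_start.match('#@+node:ab.123: ** a headline\n')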
Phase 3: start top node
First we need a place to collect all data.
nodes = bunch(
    # level contains a list of levels for each node, in the order the node
    # appears in the input; this supports the at-all directive, which
    # writes clones several times
    level=defaultdict(list),
    # contains headline for each node
    head={},
    # contains lines of body text for each node
    body=defaultdict(list),
    # this list stores the order of nodes in the derived file, that is the
    # order in which we will dump nodes once we have consumed all input lines
    gnxes=[],
)
topnodeline = flines[len(first_lines) + 1][1]  # line after header line
m = patterns.node_start.match(topnodeline)
topgnx = set_node(m)
# put first lines (if we have some) at the top of the root body
nodes.body[topgnx] = ['@first ' + x for x in first_lines]
assert topgnx, 'top node line [%s] %d first lines' % (topnodeline, len(first_lines))
# this will keep track of current gnx and indent whenever we encounter
# at+others or at+<section> or at+all
stack = []
in_all = False
in_doc = False
# spelling of the at-verbatim sentinel
verbline = delim_st + '@verbatim' + delim_en + '\n'
verbatim = False  # keeps track of whether the next line must be taken verbatim
where set_node is like this:
@ utility function that sets data from the regex match object of a sentinel
line; see the node_start pattern. Groups 1-5 are:
(indent, gnx, level-number, second star, headline).
Returns gnx.
@c
def set_node(m):
    gnx = m.group(2)
    lev = int(m.group(3)) if m.group(3) else 1 + len(m.group(4))
    nodes.level[gnx].append(lev)
    nodes.head[gnx] = m.group(5)
    nodes.gnxes.append(gnx)
    return gnx
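To illustrate the level encoding (again assuming '#' sentinels and an invented gnx): one star means level 1, two stars level 2, and deeper levels are written as a number between stars, e.g. *5*.

m = patterns.node_start.match('#@+node:ab.123: ** some headline\n')
set_node(m)                # returns 'ab.123'
nodes.level['ab.123'][-1]  # 2, computed as 1 + len('*')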
Phase 4: iterating lines
# we need to skip twice the number of first_lines, plus one header line
# and one top node line
start = 2 * len(first_lines) + 2
# keeps track of current indentation
indent = 0
# keeps track of the current node that we are reading
gnx = topgnx
# list of lines for the current node
body = nodes.body[gnx]
for i, line in flines[start:]:
    # child nodes may, if necessary, shortcut this loop using continue,
    # or let the line fall through to the end of the loop
    ... handle verbatim lines
    ... handle indentation
    ... handle at-all
    ... handle at-others
    ... handle at-doc
    ... handle at-code
    ... handle sections
    ... handle node start
    ... handle at-leo line
    ... handle directives
    ... handle in-doc parts
    # nothing special about this line, let's append it to the current body
    body.append(line)
if i + 1 < len(flines):
    # x is an (index, line) pair, so the line itself is x[1]
    nodes.body[topgnx].extend('@last %s' % x[1] for x in flines[i+1:])
All those handle ... parts are subnodes of this for loop. There is nothing special about them: each one starts with a check of whether it applies to the current line. If it does, it does its work and ends with a continue statement. If the current line is not handled by any of the handle ... nodes, it is simply appended to the current body.
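For example, a minimal sketch of what the handle node start subnode might look like (the real subnodes are not listed in this part):

m = patterns.node_start.match(line)
if m:
    # a new node sentinel: register the node and redirect
    # the following lines into its body
    gnx = set_node(m)
    body = nodes.body[gnx]
    continue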
When we encounter the line with the closing leo sentinel (“@-leo”), we break out of the loop, and the remaining lines (if any) are appended to the top-level node as @last lines.
Phase 5: yielding results
Finally we can just dump all the collected data in outline order. The level is decreased by one so that the top node of the derived file ends up at level 0.
for gnx in nodes.gnxes:
    b = ''.join(nodes.body[gnx])
    h = nodes.head[gnx]
    lev = nodes.level[gnx].pop(0)
    yield gnx, h, b, lev - 1
Extending load_derived_file to produce the sequence of node tuples suitable for building a LeoTreeModel is straightforward. We just need to collect data about parent/child relations and calculate the subtree size for each node.
def ltm_from_derived_file(fname):
    '''Reads external file and returns tree model.'''
    with open(fname, 'rt') as inp:
        lines = inp.read().splitlines(True)
    parents = defaultdict(list)
    def viter():
        stack = [None for i in range(256)]
        lev0 = 0
        for gnx, h, b, lev in load_derived_file(lines):
            ps = parents[gnx]
            cn = []
            s = [1]
            stack[lev] = [gnx, h, b, lev, s, ps, cn]
            if lev:
                # add parent gnx to the list of parents
                ps.append(stack[lev - 1][0])
                if lev > lev0:
                    # parent level is lev0
                    # add this gnx to the list of children in parent
                    stack[lev0][6].append(gnx)
                else:
                    # parent level is one above
                    # add this gnx to the list of children in parent
                    stack[lev - 1][6].append(gnx)
            lev0 = lev
            # increase size of every ancestor node in the current stack
            for x in stack[:lev]:
                x[4][0] += 1
            # finally yield this node
            yield stack[lev]
    nodes = tuple(viter())
    return nodes2treemodel(nodes)
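Hypothetical usage (the file name is invented; any external file written by Leo will do):

ltm = ltm_from_derived_file('src/mymodule.py')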
To be continued
In the next part I will write about adding to the data model some methods that implement outline commands.