Functional programming for prototyping

Recent discussions on leo-editor-group have inspired me to write in more detail and try to better explain my point of view on efficient prototyping. The author of Leo, Edward K. Ream, is a man whose energy and dedication to this project have attracted many people from everywhere and formed a very pleasant, active and healthy web community. This community, nourished and cherished by its founder, is a very interesting place to be, especially if you are in search of inspiration.

Although there are times when I can’t take an active part in the discussions, I read them very often. A few times I have tried to share some ideas about programming on this forum, and some of them were accepted. But in some cases I could not make my point clear. In general, I find it very difficult to explain coding ideas clearly in plain English. More than once it has turned out that just showing code does a much better job of explaining an idea than writing a thousand words.

In a recent discussion I wrote:

Above all, I would suggest for any prototyping purposes to avoid classes as much as possible.

And Edward replied:

I could not possibly agree less with this statement.

At the same time, the code snippet editor in Google Groups started to act strangely. It was impossible (or very hard) to copy and paste code examples. So I have decided to use the Pelican tool to write and publish some articles about coding and software engineering, with hopefully prettily formatted code examples and maybe a few diagrams.

Example problem

I didn’t want to insist on the same problem we were discussing recently. It would be easier, I thought, to show my ideas on a problem that is not currently in focus. I remember reading a dozen times how Edward was blaming his own code. Googling for “complex” in leo-editor-group gave me some ideas about what to choose as an example problem.

Among several other threads was this one, about c.importCommands, which will hopefully provide a good example. Without looking at the actual code, and without any preparation, I am guessing that even now (after the cleaning and simplifying performed about nine months ago) this part of the code can be made substantially simpler. It could, however, turn out that the code is already in its simplest possible form. In that case, I will have to look at some other domain to find an example.

Let’s begin

First, let us check how these import commands work now. A few years ago I tried to import a bunch of JavaScript files using at-auto, but the results of that import were poor and I had to import those files manually. Those were the files of an RPG game exported by RPG Maker MV. The largest one is about 300k and the total size is about 1M.

Here is a script to import the largest file, rpg_objects.js, and measure the time spent in the createOutline method of c.importCommands.

import timeit
fileName = '/tmp/rpg_objects.js'
# create a new @auto node after the current position p
p1 = p.insertAfter()
p1.h = '@auto ' + fileName
def doit():
    c.importCommands.createOutline(fileName, parent=p1.copy())
# time a single import
t = timeit.timeit(doit, number=1)
# delete the imported node again
p1.doDelete(p)
g.es('ok', t, 's')
c.redraw()

It takes about 30s for my computer to import this file, which has 8549 lines. Geany opens the same file almost instantaneously. It is not only the slow import that bothers me: after importing this file, Leo became unresponsive. For each mouse click or key press it was blocked for more than 10s. I could barely delete the imported node.

After this initial experiment, I am almost sure that there is a bug somewhere in Leo. It is really hard to believe that any bug-free code would be so slow. But even if there is a bug that causes 25s of the total processing time, I believe that there must be a way to have a better import done in less than 2s. After all, executing this file using the nodejs interpreter is blazingly fast. Nodejs has to parse the file before execution, and even if the nodejs parser is written in C/C++, Python should not be too much slower than that. Hopefully, I will prove it.

First impressions

Even before I started prototyping, while I was looking for the createOutline method to test it, I found that the code is like a spider’s web. When the user executes a command to read at-auto file(s), Leo starts to collect pieces of information from several objects, jumping back and forth before it finally gets to the importing code. The Leo application object g.app contains atAutoDict, which is populated in LoadManager, and atFileCommands.readOneAtAutoNode calls createOutline in importCommands. On the other hand, importCommands in some places calls readOneAtAutoNode from atFileCommands. Finally, inside createOutline, the parsing of the file is dispatched to an unknown class defined in some leo.plugins.importers.* submodule. The only way to find out which class is responsible for parsing your file is to open and read those modules, searching for javascript or js. Luckily, the modules are mostly named after the language they import. But what if one of those modules contains a parser/scanner for two or more languages? Or if you are looking for the less importer, which is a subtype of css?

Terrifying! But enough complaining, let’s concentrate on solving the problem! There must be a better way!

Initial idea

My initial idea (which may be totally abandoned in future work) is to develop a function that takes as arguments a position and perhaps some configuration dictionary. The body of the given position should contain the source code. This function should return a generator of tuples. Each tuple will hold a line of source code, the headline of a new node when this line is the first line of a new node, and the level of the new node relative to the current node (+1, 0 or -1).

The other part of the solution would be a function which iterates over the generator returned by the previous function and builds the outline, populating bodies with lines from the generator. This function can be universal for all types of import.

def outline_generator(p, conf):
    '''Yield (line, head, level) tuples for the source in p.b.

    head is False, or the str headline of a new node that starts
    at this line.
    level is:
       +1 when the new node should be a child of the current node,
       -1 when the new node should be added as a sibling of the
          current node's parent,
        0 when the new node should be added as a sibling of the
          current node.
    '''
    # process_lines is the language-specific part (not shown here)
    lines = tuple(enumerate(p.b.splitlines(False)))
    for line, head, level in process_lines(lines):
        yield line, head, level

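def build_outline(p, it):
    '''Build an outline under p from (line, head, level) tuples.'''
    stack = [p]      # stack[-1] is the parent for new sibling nodes
    p1 = p.copy()    # the node currently being filled
    buf = []         # body lines collected for p1
    for line, head, level in it:
        if not head:
            buf.append(line)
        else:
            # a new node starts here: flush the body of the current one
            p1.b = '\n'.join(buf)
            buf = [line]
            if level == 0:
                p1 = stack[-1].insertAsLastChild()
            elif level == 1:
                stack.append(p1)
                p1 = p1.insertAsLastChild()
            elif level == -1:
                stack.pop()
                p1 = stack[-1].insertAsLastChild()
            else:
                # this should never happen
                raise ValueError('unsupported level value %r' % level)
            p1.h = head
    # flush the body of the last node
    p1.b = '\n'.join(buf)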

In production, build_outline should probably operate on vnodes instead of positions. Most likely it should build tuples like the ones read from an sqlite3 table and use a function from fileCommands to build the subtree.
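The language-specific part, process_lines, is left undefined above. As a minimal sketch only, here is one way it could look for JavaScript, assuming (hypothetically) that a new node should start at every line beginning in column 0 with either "function Name" or "Some.name = function". The HEAD pattern and the decision never to nest (level is always 0) are assumptions made for illustration; this is not how Leo's importers work.

```python
import re

# Hypothetical rule: a top-level function definition opens a new node.
# Group 1 matches "function Name", group 2 matches "Some.name = function".
HEAD = re.compile(r'^(?:function\s+(\w+)|([\w.]+)\s*=\s*function\b)')

def process_lines(lines):
    """Yield (line, head, level) for each (index, text) pair in lines.

    head is False for an ordinary line, or the headline of the node
    that starts at this line.  This sketch never nests nodes, so
    level is always 0 (the new node is a sibling of the current one).
    """
    for _index, text in lines:
        m = HEAD.match(text)
        head = (m.group(1) or m.group(2)) if m else False
        yield text, head, 0
```

Because it yields the same (line, head, level) tuples, a function like this can be consumed by build_outline unchanged; only the rule for recognizing the start of a node is specific to the language.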

Processing budget

A better import means more processing, which requires more time. If we want it to be fast enough, there is an upper limit on the processing we can afford. There may be several strategies that we can use for dividing source code into blocks. Ideally, the import would try them all and compare the results using some score function defined by the user’s preferences, searching for the one with the highest score. However, it may be that such a clever import would require too much time. In that case, the solution may be to make every strategy as fast as possible, let the user see a preview of the results of each strategy, and choose one manually. Let’s try to find out how much processing we can afford.

import timeit
import re
def get_source():
    '''Read and return contents of largest js file'''
    with open('/tmp/rpg_objects.js', 'rt') as f:
        return f.read()
# arbitrary patterns; they only simulate the cost of real parsing
patterns = [
    re.compile(r'\s*((\w+)(\.(\w+))*)\s*\('),
    re.compile(r'\{([^}]+)\}'),
    re.compile(r'(\d+)(\s*,\s*\d+)*'),
]
def dummy_processing():
    src = get_source()
    lines = []
    findex = 0  # offset of the current line in src
    for i, line in enumerate(src.splitlines(False)):
        ms = []
        for pat in patterns:
            ms.append(tuple(pat.finditer(line)))
        lines.append((i, findex, line, tuple(ms)))
        findex += len(line) + 1

def test(f, num):
    # average time of one call, in milliseconds
    t = timeit.timeit(f, number=num) * 1000 / num
    g.es('average %5.1f ms' % t)

c.frame.log.selectTab('Log')
c.frame.log.clearLog()
test(dummy_processing, 100)

To run this script, the JavaScript file rpg_objects.js must be in the /tmp/ folder (all the JavaScript files can be found here). Windows users should choose a similar path suitable for their OS. On my machine this script runs for about 40s.

average 408.6 ms

Conclusion: it is most likely possible to import this file in less than half a second. In its current version, leoImport.ImportCommands needs about 30s to import this file, and Leo becomes unresponsive afterwards.
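With a budget of this size, trying several strategies and keeping the one with the highest score may be affordable. A minimal sketch of that selection step follows. Everything in it is hypothetical: the name choose_strategy, the two toy strategies, and the score function, which simply assumes the user prefers nodes of about 25 lines each.

```python
def choose_strategy(lines, strategies, score):
    """Run every strategy on lines; return (name, result) of the best one."""
    results = {name: list(func(lines)) for name, func in strategies.items()}
    best = max(results, key=lambda name: score(results[name]))
    return best, results[best]

# Two toy strategies (hypothetical); each yields (line, head, level).
def one_big_node(lines):
    for text in lines:
        yield text, False, 0

def node_per_function(lines):
    prefix = 'function '
    for text in lines:
        if text.startswith(prefix):
            # the headline is the name between 'function ' and '('
            yield text, text[len(prefix):].split('(')[0], 0
        else:
            yield text, False, 0

def score(result, target=25):
    """Hypothetical preference: nodes of about `target` lines each."""
    nodes = 1 + sum(1 for _line, head, _level in result if head)
    return -abs(len(result) / nodes - target)

strategies = {'one_node': one_big_node, 'per_function': node_per_function}
```

Calling choose_strategy(lines, strategies, score) returns the name of the winning strategy together with its tuples, which can then be passed to build_outline. If running all strategies turns out to be too slow, the same dictionary of strategies could instead feed a preview from which the user picks manually.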

To be continued

Art of computing