This is the second part of story about using functional programming for prototyping (first part is Here). Originally, my intention was to document whole process of prototyping. I wished not to finish prototype and then describe it after, but to show every step in my work-flow. However, soon after I started coding, I forgot about documenting each step, and after few hours my prototype was complete.

All javascript files are successfully imported in less than a second (average 626.3ms in 10 imports). After importing ten times all files Leo is still responsive. It still remains to clean and document my prototype. All that I can now say about prototyping process is that it was a pleasant and easy activity. While developing it I needed to look few times in the Python documentation about re, bisect, timeit, …


Indexing source file

After initial idea to split source in lines and operate on lines one by one, I ended up with manual splitting, function named so as opposed to using splitlines method. However, now I think a better name for this function would be create_source_index or something similar.

and in Log pane result:

    average  13.2 ms

This function makes an index of source file chunks of three different types:

  • code
  • string
  • comment

it returns an array of tuples (chunk_type, start_index, end_index). At first I considered chunks to be single line, but later it become clear that it is better to allow for multiline chunks.

Helper functions defined inside this function are very simple:

As you can see they are very similar, and I really just copy/pasted them adjusting each a bit. That may be considered violation of DRY principle, but it is very localized (all of them reside inside one function). Main argument of DRY principle is that later if you change one part of duplicated code, there is a possibility that you will forget to change the other parts of code. But, since all this repeated code parts are very close to each other, DRY principle violation becomes less danger.

Source file is searched with regex pattern that catches beginning of:

  • single line comment
  • multi line comment
  • single quote string
  • double quote string

Searching starts in code mode, and for each beginning of another chunk type, helper function of that type is called with the starting index as first argument. Every helper function returns the end of chunk and appends chunk to chunks array.

Searching for chunks

After indexing source file we need a function that finds chunk for any given position in source file. I was thinking to use bisect from standard library, but it turned out that it was faster to re-implement it than to figure out how to adjust chunks array to comply to bisect rules. Now, that I write this, I realize that it could be achieved very easily, the only thing that is needed is to change the order of tuple items from (type, start, end) to (start, end, type). And to use bisect functions not on integer position, but on tuple created from the integer argument of this function. But, that may be left for the polishing phase.

Searching for functions

Now, that we have functions to locate chunks, we can pursue our next target: finding structural blocks of source code. In javascript syntax parenthesis '()', '[]', '{}' must be properly nested. There may exists some of this characters in comment and string chunks, but they don’t count. The only ones that matter are those in code blocks. That was the reason for chunking source code in the first place.

First we are searching for the opening character (one of '(', '[', '{'). If there is no block openings, that means we are finished. Last block will be in that case form given position to the end of file.

If we find the opening character, then we continue searching from the character following opening one, counting levels as we encounter them, skiping all non code blocks. When we find closing character that resets lev to zero, we have the end index and so we return starting and ending index of the code block.

There are several ways to define function in javascript. There are so called named functions in the form:

But there are also anonymous functions and function expressions, which doesn’t have a name, but are assigned to a variable or passed as an argument to some other function. What is common to both cases is that they contain keyword 'function', followed by a single parenthesized block of arguments, followed by a function body which is block that starts with ‘{’ and ends with ‘}’. To find all function definitions we need to search all code blocks for a 'function' word. Whenever we find one, skip next block of arguments. The very next block contains body of function definition. End of this block is the end of function definition.

Begining can be either start of keyword function that we already have found, but may also be somewhere before. Especially if the definition is following after comment chunk. Also, if it is function expression then it is often followed by semicolon. Finally, it would be useful to have not only start and end index yielded, but headline for the node as well.

It would be natural to continue search right after the end of found block. Here is a get_all_functions definition:

There is more in this function. I have discovered two more use cases of the code blocks that should naturally be in separate nodes. One is a call to Object.defineProperty, and the other is Object.defineProperties. I have realized that they would be best put in separate nodes after trying to import files with the previous version of the function. In its final version displayed above, two different regex patterns are used. When we search source with one and the other, we can have no match at all, a single match, or two matches. Patterns are designed to return possible function name in first group. Here is a helper function that takes both match results and return tuple of three matches. The third one is the nearest match that is next to be processed.

When search has been passed after any of matches, m.end() < i, we search further for the corresponding pattern. If any of matches is None, this function would return the other match. If both of patterns did match, function returns nearest one.

Pay attention to this pattern. Helper function that are defined inside a function can access variables in outer functon scope, but they can’t change them. As a consequence, local variables in the toplevel function need not to be passed to helper functions, because they are directly visible in inner scope. However, if helper function needs to change outer variable the only way is to return its new value, and let the outer function to change variables. This is very powerful pattern!

Here are few more helper functions:

Building outline

Now, that we have a generator that extracts function definitions from source code, we can build outline tree from extracted regions.

And as always there are few helper functions:

And final profiling test of the very same file rpg_objects.js that Leo with its current implementation of c.importCommands, was struggling for more than 30s, gave the following results:

10 calls, average time 233.1 ms

Isn’t it success? Let alone fact that files were imported in far better form than the original Leo import gives.

To be concluded