A Multi-Phase Interpreter
To showcase the sophistication around the handling of processes, this documentation presents the implementation of an intepreter. The input language is small - a simple expression calculator supporting arithmetic operators and parentheses. The final product will accept expressions as input and will output the result of the calculation, as a floating-point value.
The engineering of this type of software is well understood. It involves phases such as lexical analysis, parsing, code generation and a virtual machine. This project will implement those phases as distinct platform processes.
Breaking the design into separate processes has a good side to it. Each process is smaller and any risk associated with a phase is contained within the associated executable, e.g. problems with parsing will not leak into the operation of the virtual machine. However, difficulties include the correct initiation of the sequence of processes, detection of termination, and the passing of complex datasets from one phase to the next. On top of all this there is the ongoing need to respond to interruptions.
Note
Source files appearing in this section can be downloaded from here.
The repo Makefile contains the setup needed for this guide. Related background information can be
found here.
A Quick Sketch
There will be 4 processes, one superviser process and 3 phase processes. The phases are listed below along with a statement of what that phase achieves;
|
expression text ⟹ abstract syntax tree |
|
abstract syntax tree ⟹ machine code |
|
machine code ⟹ floating-point |
The supervisor process calc.py accepts the first input and returns the final output,
i.e. expression text ⟹ floating-point.
Lexical Analysis, Parsing And Interruptions
All the hard work is done by PLY. Thanks to the author for yet another lex and yacc combo and the level of usability achieved. The minimal size and clarity of this implementation is a reflection of the underlying toolset. The documentation is also an easy read.
The main code for the parser looks like this;
def reduce(self, text):
global self_service
self_service = self
t = parser.parse(text)
return AbstractSyntaxTree(t)
ar.bind(reduce)
#
#
def parse(self, settings, input):
a = self.create(reduce, input)
m = self.select(ar.Completed, ar.Stop)
if isinstance(m, ar.Stop):
self.halt(a)
self.select(ar.Completed)
return ar.Aborted()
return m.value
ar.bind(parse)
#
#
default_input = '(18 - 6) * (5 + 7)'
if __name__ == '__main__':
ar.create_object(parse, factory_input=default_input)
The call to create_object() is the starting point and the call to parser.parse(text) is where
this process passes the real work onto PLY. In between there are the two async objects parse()
and reduce() (i.e. from parsing vocabulary shift-reduce-goto). The reason for the pair of objects
is that parsing is a busy process not normally interested in interruptions. Yet, to meet the requirements
of async software in general, each phase must honour any Stop that it receives.
There are 2 parts to resolving this issue. Firstly there is the call to the halt() function
inside the parse() object. This sets a flag in the object at the specified address, a. Secondly
there is a check inserted into every “action” taken by the PLY parser. The parser “calls out” to
user defined code such as the t_NUMBER() function below, to “reduce” the grammatical fragment
recognized by the parser into a desired application form. In this particular case it converts the
text of a number into a Python float and stores it in a terminal leaf - the Operand() - of an
abstract syntax tree;
def if_halted():
if self_service.halted:
self_service.complete(ar.Aborted())
def t_NUMBER(t):
r'\d+(\.\d+)?'
if_halted()
self_service.console('NUMBER "{token}"'.format(token=t.value))
try:
value = float(t.value)
except ValueError:
value = 0.0
t.value = Operand(value)
return t
The self_service global is assigned inside the reduce object. All method calls on the global
are effectively calls on that async object, including the test of the flag - if self_service.halted -
and the potential termination - self_service.complete(ar.Aborted()). So when the parse()
function makes the halt() call, that eventually leads to the desired termination.
The complete() method is a standard mechanism for terminating any async object. It uses an exception
to jump out of the current call sequence, before cleaning up the async environment and passing the
specified value back to the parent object, i.e. the Aborted object.
To pass an abstract syntax tree back to the calc.py process is fairly easy. A small collection of
classes and registrations captures all the detail needed by ansar.encode (see parser_if.py);
OP = ar.Enumeration(ADD=0, SUB=1, MUL=2, DIV=3, NEGATE=4)
class BinaryOperator(object):
def __init__(self, left_operand=None, right_operand=None, operator=None):
self.left_operand = left_operand
self.right_operand = right_operand
self.operator = operator
BINARY_OPERATOR_SCHEMA = {
'left_operand': ar.Any(),
'right_operand': ar.Any(),
'operator': OP,
}
Operators such as + and / are represented in the AST as a BinaryOperator with the
appropriate OP enumeration. There is another similar class called UnaryOperator for
representing the - negation operator. PLY calls out to the application at relevant
points in the parsing process and the application responds with instances of BinaryOperator
and UnaryOperator. A hierarchical parse tree is constructed courtesy of the order of calls
made by PLY and the management of the nodes and leaves created by the application.
An initial fragment of the output for (10 - 3) * (4 + 5) + 37 appears below;
[
"lib.parser_if.BinaryOperator",
{
"left_operand": [
"lib.parser_if.BinaryOperator",
{
"left_operand": [
"lib.parser_if.BinaryOperator",
{
"left_operand": [
"lib.parser_if.Operand",
{
"value": 10.0
},
[]
],
"operator": "SUB",
"right_operand": [
"lib.parser_if.Operand",
{
"value": 3.0
},
[]
]
},
[]
],
"operator": "MUL",
"right_operand": [
"lib.parser_if.BinaryOperator",
{
There are BinaryOperators that include recursive instances of BinaryOperators
and instances of polymorphism where the left_operand can be a BinaryOperator
or an Operand (or instances of other classes). All of this material is
automatically and discreetly decoded in the calc.py process before an immediate
encoding occurs for the passing of the tree on to the code generator.
Code Generation
Generating a sequence of machine instructions from a syntax tree requires a depth-first traversal of the tree. Defining the traversal function as a coroutine allows for the convenient construction of a list;
def walk(ast):
if isinstance(ast, BinaryOperator):
yield from walk(ast.left_operand)
yield from walk(ast.right_operand)
if ast.operator == OP.ADD:
yield Add()
elif ast.operator == OP.SUB:
yield Sub()
elif ast.operator == OP.MUL:
yield Mul()
elif ast.operator == OP.DIV:
yield Div()
else:
pass
elif isinstance(ast, UnaryOperator):
yield from walk(ast.operand)
yield Negate()
elif isinstance(ast, Operand):
yield Push(ast.value)
else:
pass
The result of calling the walk coroutine is a sequence of objects
such as Push() and Add(). These are stored in a list and
passed on to the virtual machine.
A Stack-Based Virtual Machine
Each instruction is callable. The microcode for an instruction can retrieve values from a stack, and push new values on the stack.
class Add(object):
def __call__(self, stack):
b = stack.pop()
a = stack.pop()
v = a + b
stack.append(v)
Iterating the sequence of instructions and passing a machine stack on each instruction
call results in a final floating-point value on the stack. This is returned as the
output of vm.py.
def execute(self, code):
stack = []
self.console('Virtual machine start')
for c in code:
if self.halted:
return ar.Aborted()
c(stack)
t = ', '.join(['{value}'.format(value=v) for v in stack])
self.console('{code:8} [{values}]'.format(code=c.__art__.name, values=t))
self.console('Virtual machine end ({value})'.format(value=stack[-1]))
return MachineValue(stack[-1])
ar.bind(execute)
Without the potential for interruption this function would be trivial. Logging has also been added in the spirit of demonstration rather than production.
A Supervisor Process
Bringing the separate phases together is a matter of mechanical async coding;
def calc(self, settings, input):
# Phase #1 - compile to an abstract syntax tree.
a = self.create(ar.Process, 'parser', input=input)
m = self.select(ar.Completed, ar.Stop)
if isinstance(m, ar.Stop):
self.send(m, a)
self.select(ar.Completed)
return ar.Aborted()
r = m.value
if not isinstance(r, AbstractSyntaxTree):
return r
# Phase #2 - encode abstract syntax tree to machine code.
self.create(ar.Process, 'codegen', input=r)
There is a sequence of self.create(ar.Process, ...) calls and the passing
of the output (i.e. r) from one call, as the input for the next. There
is a check for the interruption message and propagation as required. Otherwise
there is a check for an expected type (e.g. AbstractSyntaxTree) and if that
fails the message is returned immediately as the output of calc.py, potentially
short-circuiting the execution of all phases. The most likely candidates
would be Aborted and Faulted.
A Complete Run
Logging for a complete run of calc.py for the expression (10 - 3) * (4 + 5) + 37
is shown below. The result is a useful breakdown of how intepreters - and
this multi-process implementation - go about their work:
$ cat calc-expression | PATH=dist:${PATH} dist/calc --debug-level=OBJECT
[00255086] 2023-04-23T22:25:30.667 + <00000007>start_vector - Created by <00000001>
[00255086] 2023-04-23T22:25:30.667 + <00000008>calc - Created by <00000007>
[00255086] 2023-04-23T22:25:30.667 + <00000009>Process[INITIAL] - Created by <00000008>
[00255086] 2023-04-23T22:25:30.667 < <00000009>Process[INITIAL] - Received Start from <00000008>
[00255086] 2023-04-23T22:25:30.667 ( <00000009>Process[INITIAL] - Started process (255094)
[00255086] 2023-04-23T22:25:30.667 + <0000000a>wait - Created by <00000009>
[00255096] 2023-04-23T22:25:30.769 + <00000007>start_vector - Created by <00000001>
[00255096] 2023-04-23T22:25:30.770 + <00000008>parse - Created by <00000007>
[00255096] 2023-04-23T22:25:30.770 + <00000009>reduce - Created by <00000008>
[00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - NUMBER "10"
[00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: NUMBER
[00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - NUMBER "3"
[00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: NUMBER
[00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: expression - expression
[00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: ( expression )
[00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - NUMBER "4"
[00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: NUMBER
[00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - NUMBER "5"
[00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: NUMBER
[00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: expression + expression
[00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: ( expression )
[00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: expression * expression
[00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - NUMBER "37"
[00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: NUMBER
[00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: expression + expression
[00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - statement: expression
[00255096] 2023-04-23T22:25:30.770 X <00000009>reduce - Destroyed
[00255096] 2023-04-23T22:25:30.770 < <00000008>parse - Received Completed from <00000009>
[00255096] 2023-04-23T22:25:30.770 X <00000008>parse - Destroyed
[00255096] 2023-04-23T22:25:30.770 < <00000007>start_vector - Received Completed from <00000008>
[00255096] 2023-04-23T22:25:30.770 X <00000007>start_vector - Destroyed
[00255086] 2023-04-23T22:25:31.048 X <0000000a>wait - Destroyed
[00255086] 2023-04-23T22:25:31.048 < <00000009>Process[EXECUTING] - Received Completed from <0000000a>
[00255086] 2023-04-23T22:25:31.048 ) <00000009>Process[EXECUTING] - Process (255094) ended with 0
[00255086] 2023-04-23T22:25:31.048 X <00000009>Process[EXECUTING] - Destroyed
[00255086] 2023-04-23T22:25:31.048 < <00000008>calc - Received Completed from <00000009>
[00255086] 2023-04-23T22:25:31.048 + <0000000b>Process[INITIAL] - Created by <00000008>
[00255086] 2023-04-23T22:25:31.049 < <0000000b>Process[INITIAL] - Received Start from <00000008>
[00255086] 2023-04-23T22:25:31.049 ( <0000000b>Process[INITIAL] - Started process (255105)
[00255086] 2023-04-23T22:25:31.049 + <0000000c>wait - Created by <0000000b>
[00255107] 2023-04-23T22:25:31.154 + <00000007>start_vector - Created by <00000001>
[00255107] 2023-04-23T22:25:31.154 + <00000008>codegen - Created by <00000007>
[00255107] 2023-04-23T22:25:31.154 + <00000009>generate - Created by <00000008>
[00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - Instruction block begin
[00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - [0000] Push
[00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - [0001] Push
[00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - [0002] Sub
[00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - [0003] Push
[00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - [0004] Push
[00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - [0005] Add
[00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - [0006] Mul
[00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - [0007] Push
[00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - [0008] Add
[00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - Instruction block end
[00255107] 2023-04-23T22:25:31.154 X <00000009>generate - Destroyed
[00255107] 2023-04-23T22:25:31.154 < <00000008>codegen - Received Completed from <00000009>
[00255107] 2023-04-23T22:25:31.154 X <00000008>codegen - Destroyed
[00255107] 2023-04-23T22:25:31.155 < <00000007>start_vector - Received Completed from <00000008>
[00255107] 2023-04-23T22:25:31.155 X <00000007>start_vector - Destroyed
[00255086] 2023-04-23T22:25:31.432 X <0000000c>wait - Destroyed
[00255086] 2023-04-23T22:25:31.432 < <0000000b>Process[EXECUTING] - Received Completed from <0000000c>
[00255086] 2023-04-23T22:25:31.432 ) <0000000b>Process[EXECUTING] - Process (255105) ended with 0
[00255086] 2023-04-23T22:25:31.432 X <0000000b>Process[EXECUTING] - Destroyed
[00255086] 2023-04-23T22:25:31.432 < <00000008>calc - Received Completed from <0000000b>
[00255086] 2023-04-23T22:25:31.432 + <0000000d>Process[INITIAL] - Created by <00000008>
[00255086] 2023-04-23T22:25:31.432 < <0000000d>Process[INITIAL] - Received Start from <00000008>
[00255086] 2023-04-23T22:25:31.433 ( <0000000d>Process[INITIAL] - Started process (255116)
[00255086] 2023-04-23T22:25:31.433 + <0000000e>wait - Created by <0000000d>
[00255118] 2023-04-23T22:25:31.541 + <00000007>start_vector - Created by <00000001>
[00255118] 2023-04-23T22:25:31.541 + <00000008>vm - Created by <00000007>
[00255118] 2023-04-23T22:25:31.541 + <00000009>execute - Created by <00000008>
[00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Virtual machine start
[00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Push [10.0]
[00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Push [10.0, 3.0]
[00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Sub [7.0]
[00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Push [7.0, 4.0]
[00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Push [7.0, 4.0, 5.0]
[00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Add [7.0, 9.0]
[00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Mul [63.0]
[00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Push [63.0, 37.0]
[00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Add [100.0]
[00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Virtual machine end (100.0)
[00255118] 2023-04-23T22:25:31.541 X <00000009>execute - Destroyed
[00255118] 2023-04-23T22:25:31.542 < <00000008>vm - Received Completed from <00000009>
[00255118] 2023-04-23T22:25:31.542 X <00000008>vm - Destroyed
[00255118] 2023-04-23T22:25:31.542 < <00000007>start_vector - Received Completed from <00000008>
[00255118] 2023-04-23T22:25:31.542 X <00000007>start_vector - Destroyed
[00255086] 2023-04-23T22:25:31.817 X <0000000e>wait - Destroyed
[00255086] 2023-04-23T22:25:31.817 < <0000000d>Process[EXECUTING] - Received Completed from <0000000e>
[00255086] 2023-04-23T22:25:31.817 ) <0000000d>Process[EXECUTING] - Process (255116) ended with 0
[00255086] 2023-04-23T22:25:31.817 X <0000000d>Process[EXECUTING] - Destroyed
[00255086] 2023-04-23T22:25:31.818 < <00000008>calc - Received Completed from <0000000d>
[00255086] 2023-04-23T22:25:31.818 X <00000008>calc - Destroyed
[00255086] 2023-04-23T22:25:31.818 < <00000007>start_vector - Received Completed from <00000008>
[00255086] 2023-04-23T22:25:31.818 X <00000007>start_vector - Destroyed
{
"value": [
"lib.vm_if.MachineValue",
{
"value": 100.0
},
[]
]
}