.. _a-multi-phase-intepreter: A Multi-Phase Interpreter ************************* To showcase the sophistication around the handling of processes, this documentation presents the implementation of an intepreter. The input language is small - a simple expression calculator supporting arithmetic operators and parentheses. The final product will accept expressions as input and will output the result of the calculation, as a floating-point value. The engineering of this type of software is well understood. It involves phases such as lexical analysis, parsing, code generation and a virtual machine. This project will implement those phases as distinct platform processes. Breaking the design into separate processes has a good side to it. Each process is smaller and any risk associated with a phase is contained within the associated executable, e.g. problems with parsing will not leak into the operation of the virtual machine. However, difficulties include the correct initiation of the sequence of processes, detection of termination, and the passing of complex datasets from one phase to the next. On top of all this there is the ongoing need to respond to interruptions. .. note:: Source files appearing in this section can be downloaded from `here `_. The repo ``Makefile`` contains the setup needed for this guide. Related background information can be found :ref:`here`. A Quick Sketch ============== There will be 4 processes, one superviser process and 3 phase processes. The phases are listed below along with a statement of what that phase achieves; .. list-table:: :widths: 25 75 * - ``parser.py`` - *expression text ⟹ abstract syntax tree* * - ``codegen.py`` - *abstract syntax tree ⟹ machine code* * - ``vm.py`` - *machine code ⟹ floating-point* The supervisor process ``calc.py`` accepts the first input and returns the final output, i.e. *expression text ⟹ floating-point*. Lexical Analysis, Parsing And Interruptions =========================================== All the hard work is done by `PLY `_. Thanks to the author for yet another lex and yacc combo and the level of usability achieved. The minimal size and clarity of this implementation is a reflection of the underlying toolset. The `documentation `_ is also an easy read. The main code for the parser looks like this; .. code-block:: python def reduce(self, text): global self_service self_service = self t = parser.parse(text) return AbstractSyntaxTree(t) ar.bind(reduce) # # def parse(self, settings, input): a = self.create(reduce, input) m = self.select(ar.Completed, ar.Stop) if isinstance(m, ar.Stop): self.halt(a) self.select(ar.Completed) return ar.Aborted() return m.value ar.bind(parse) # # default_input = '(18 - 6) * (5 + 7)' if __name__ == '__main__': ar.create_object(parse, factory_input=default_input) The call to :func:`~.framework.create_object` is the starting point and the call to ``parser.parse(text)`` is where this process passes the real work onto ``PLY``. In between there are the two async objects ``parse()`` and ``reduce()`` (i.e. from parsing vocabulary shift-reduce-goto). The reason for the pair of objects is that parsing is a busy process not normally interested in interruptions. Yet, to meet the requirements of async software in general, each phase must honour any :class:`~.lifecycle.Stop` that it receives. There are 2 parts to resolving this issue. Firstly there is the call to the :func:`~.point.halt` function inside the ``parse()`` object. This sets a flag in the object at the specified address, ``a``. Secondly there is a check inserted into every "action" taken by the ``PLY`` parser. The parser "calls out" to user defined code such as the ``t_NUMBER()`` function below, to "reduce" the grammatical fragment recognized by the parser into a desired application form. In this particular case it converts the text of a number into a Python float and stores it in a terminal leaf - the ``Operand()`` - of an abstract syntax tree; .. code-block:: python def if_halted(): if self_service.halted: self_service.complete(ar.Aborted()) def t_NUMBER(t): r'\d+(\.\d+)?' if_halted() self_service.console('NUMBER "{token}"'.format(token=t.value)) try: value = float(t.value) except ValueError: value = 0.0 t.value = Operand(value) return t The ``self_service`` global is assigned inside the ``reduce`` object. All method calls on the global are effectively calls on that async object, including the test of the flag - ``if self_service.halted`` - and the potential termination - ``self_service.complete(ar.Aborted())``. So when the ``parse()`` function makes the :func:`~.point.halt` call, that eventually leads to the desired termination. The :meth:`~.point.Point.complete` method is a standard mechanism for terminating any async object. It uses an exception to jump out of the current call sequence, before cleaning up the async environment and passing the specified value back to the parent object, i.e. the :class:`~.lifecycle.Aborted` object. To pass an abstract syntax tree back to the ``calc.py`` process is fairly easy. A small collection of classes and registrations captures all the detail needed by **ansar.encode** (see ``parser_if.py``); .. code-block:: python OP = ar.Enumeration(ADD=0, SUB=1, MUL=2, DIV=3, NEGATE=4) class BinaryOperator(object): def __init__(self, left_operand=None, right_operand=None, operator=None): self.left_operand = left_operand self.right_operand = right_operand self.operator = operator BINARY_OPERATOR_SCHEMA = { 'left_operand': ar.Any(), 'right_operand': ar.Any(), 'operator': OP, } Operators such as `+` and `/` are represented in the AST as a ``BinaryOperator`` with the appropriate ``OP`` enumeration. There is another similar class called ``UnaryOperator`` for representing the `-` negation operator. ``PLY`` calls out to the application at relevant points in the parsing process and the application responds with instances of ``BinaryOperator`` and ``UnaryOperator``. A hierarchical parse tree is constructed courtesy of the order of calls made by ``PLY`` and the management of the nodes and leaves created by the application. An initial fragment of the output for ``(10 - 3) * (4 + 5) + 37`` appears below; .. code-block:: [ "lib.parser_if.BinaryOperator", { "left_operand": [ "lib.parser_if.BinaryOperator", { "left_operand": [ "lib.parser_if.BinaryOperator", { "left_operand": [ "lib.parser_if.Operand", { "value": 10.0 }, [] ], "operator": "SUB", "right_operand": [ "lib.parser_if.Operand", { "value": 3.0 }, [] ] }, [] ], "operator": "MUL", "right_operand": [ "lib.parser_if.BinaryOperator", { There are ``BinaryOperators`` that include recursive instances of ``BinaryOperators`` and instances of polymorphism where the ``left_operand`` can be a ``BinaryOperator`` or an ``Operand`` (or instances of other classes). All of this material is automatically and discreetly decoded in the ``calc.py`` process before an immediate encoding occurs for the passing of the tree on to the code generator. Code Generation =============== Generating a sequence of machine instructions from a syntax tree requires a depth-first traversal of the tree. Defining the traversal function as a coroutine allows for the convenient construction of a list; .. code-block:: python def walk(ast): if isinstance(ast, BinaryOperator): yield from walk(ast.left_operand) yield from walk(ast.right_operand) if ast.operator == OP.ADD: yield Add() elif ast.operator == OP.SUB: yield Sub() elif ast.operator == OP.MUL: yield Mul() elif ast.operator == OP.DIV: yield Div() else: pass elif isinstance(ast, UnaryOperator): yield from walk(ast.operand) yield Negate() elif isinstance(ast, Operand): yield Push(ast.value) else: pass The result of calling the ``walk`` coroutine is a sequence of objects such as ``Push()`` and ``Add()``. These are stored in a list and passed on to the virtual machine. A Stack-Based Virtual Machine ============================= Each instruction is callable. The microcode for an instruction can retrieve values from a stack, and push new values on the stack. .. code-block:: python class Add(object): def __call__(self, stack): b = stack.pop() a = stack.pop() v = a + b stack.append(v) Iterating the sequence of instructions and passing a machine stack on each instruction call results in a final floating-point value on the stack. This is returned as the output of ``vm.py``. .. code-block:: python def execute(self, code): stack = [] self.console('Virtual machine start') for c in code: if self.halted: return ar.Aborted() c(stack) t = ', '.join(['{value}'.format(value=v) for v in stack]) self.console('{code:8} [{values}]'.format(code=c.__art__.name, values=t)) self.console('Virtual machine end ({value})'.format(value=stack[-1])) return MachineValue(stack[-1]) ar.bind(execute) Without the potential for interruption this function would be trivial. Logging has also been added in the spirit of demonstration rather than production. A Supervisor Process ==================== Bringing the separate phases together is a matter of mechanical async coding; .. code-block:: python def calc(self, settings, input): # Phase #1 - compile to an abstract syntax tree. a = self.create(ar.Process, 'parser', input=input) m = self.select(ar.Completed, ar.Stop) if isinstance(m, ar.Stop): self.send(m, a) self.select(ar.Completed) return ar.Aborted() r = m.value if not isinstance(r, AbstractSyntaxTree): return r # Phase #2 - encode abstract syntax tree to machine code. self.create(ar.Process, 'codegen', input=r) There is a sequence of ``self.create(ar.Process, ...)`` calls and the passing of the output (i.e. ``r``) from one call, as the ``input`` for the next. There is a check for the interruption message and propagation as required. Otherwise there is a check for an expected type (e.g. ``AbstractSyntaxTree``) and if that fails the message is returned immediately as the output of ``calc.py``, potentially short-circuiting the execution of all phases. The most likely candidates would be :class:`~.lifecycle.Aborted` and :class:`~.lifecycle.Faulted`. A Complete Run ============== Logging for a complete run of ``calc.py`` for the expression ``(10 - 3) * (4 + 5) + 37`` is shown below. The result is a useful breakdown of how intepreters - and this multi-process implementation - go about their work:: $ cat calc-expression | PATH=dist:${PATH} dist/calc --debug-level=OBJECT [00255086] 2023-04-23T22:25:30.667 + <00000007>start_vector - Created by <00000001> [00255086] 2023-04-23T22:25:30.667 + <00000008>calc - Created by <00000007> [00255086] 2023-04-23T22:25:30.667 + <00000009>Process[INITIAL] - Created by <00000008> [00255086] 2023-04-23T22:25:30.667 < <00000009>Process[INITIAL] - Received Start from <00000008> [00255086] 2023-04-23T22:25:30.667 ( <00000009>Process[INITIAL] - Started process (255094) [00255086] 2023-04-23T22:25:30.667 + <0000000a>wait - Created by <00000009> [00255096] 2023-04-23T22:25:30.769 + <00000007>start_vector - Created by <00000001> [00255096] 2023-04-23T22:25:30.770 + <00000008>parse - Created by <00000007> [00255096] 2023-04-23T22:25:30.770 + <00000009>reduce - Created by <00000008> [00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - NUMBER "10" [00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: NUMBER [00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - NUMBER "3" [00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: NUMBER [00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: expression - expression [00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: ( expression ) [00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - NUMBER "4" [00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: NUMBER [00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - NUMBER "5" [00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: NUMBER [00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: expression + expression [00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: ( expression ) [00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: expression * expression [00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - NUMBER "37" [00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: NUMBER [00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - expression: expression + expression [00255096] 2023-04-23T22:25:30.770 ^ <00000009>reduce - statement: expression [00255096] 2023-04-23T22:25:30.770 X <00000009>reduce - Destroyed [00255096] 2023-04-23T22:25:30.770 < <00000008>parse - Received Completed from <00000009> [00255096] 2023-04-23T22:25:30.770 X <00000008>parse - Destroyed [00255096] 2023-04-23T22:25:30.770 < <00000007>start_vector - Received Completed from <00000008> [00255096] 2023-04-23T22:25:30.770 X <00000007>start_vector - Destroyed [00255086] 2023-04-23T22:25:31.048 X <0000000a>wait - Destroyed [00255086] 2023-04-23T22:25:31.048 < <00000009>Process[EXECUTING] - Received Completed from <0000000a> [00255086] 2023-04-23T22:25:31.048 ) <00000009>Process[EXECUTING] - Process (255094) ended with 0 [00255086] 2023-04-23T22:25:31.048 X <00000009>Process[EXECUTING] - Destroyed [00255086] 2023-04-23T22:25:31.048 < <00000008>calc - Received Completed from <00000009> [00255086] 2023-04-23T22:25:31.048 + <0000000b>Process[INITIAL] - Created by <00000008> [00255086] 2023-04-23T22:25:31.049 < <0000000b>Process[INITIAL] - Received Start from <00000008> [00255086] 2023-04-23T22:25:31.049 ( <0000000b>Process[INITIAL] - Started process (255105) [00255086] 2023-04-23T22:25:31.049 + <0000000c>wait - Created by <0000000b> [00255107] 2023-04-23T22:25:31.154 + <00000007>start_vector - Created by <00000001> [00255107] 2023-04-23T22:25:31.154 + <00000008>codegen - Created by <00000007> [00255107] 2023-04-23T22:25:31.154 + <00000009>generate - Created by <00000008> [00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - Instruction block begin [00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - [0000] Push [00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - [0001] Push [00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - [0002] Sub [00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - [0003] Push [00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - [0004] Push [00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - [0005] Add [00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - [0006] Mul [00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - [0007] Push [00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - [0008] Add [00255107] 2023-04-23T22:25:31.154 ^ <00000009>generate - Instruction block end [00255107] 2023-04-23T22:25:31.154 X <00000009>generate - Destroyed [00255107] 2023-04-23T22:25:31.154 < <00000008>codegen - Received Completed from <00000009> [00255107] 2023-04-23T22:25:31.154 X <00000008>codegen - Destroyed [00255107] 2023-04-23T22:25:31.155 < <00000007>start_vector - Received Completed from <00000008> [00255107] 2023-04-23T22:25:31.155 X <00000007>start_vector - Destroyed [00255086] 2023-04-23T22:25:31.432 X <0000000c>wait - Destroyed [00255086] 2023-04-23T22:25:31.432 < <0000000b>Process[EXECUTING] - Received Completed from <0000000c> [00255086] 2023-04-23T22:25:31.432 ) <0000000b>Process[EXECUTING] - Process (255105) ended with 0 [00255086] 2023-04-23T22:25:31.432 X <0000000b>Process[EXECUTING] - Destroyed [00255086] 2023-04-23T22:25:31.432 < <00000008>calc - Received Completed from <0000000b> [00255086] 2023-04-23T22:25:31.432 + <0000000d>Process[INITIAL] - Created by <00000008> [00255086] 2023-04-23T22:25:31.432 < <0000000d>Process[INITIAL] - Received Start from <00000008> [00255086] 2023-04-23T22:25:31.433 ( <0000000d>Process[INITIAL] - Started process (255116) [00255086] 2023-04-23T22:25:31.433 + <0000000e>wait - Created by <0000000d> [00255118] 2023-04-23T22:25:31.541 + <00000007>start_vector - Created by <00000001> [00255118] 2023-04-23T22:25:31.541 + <00000008>vm - Created by <00000007> [00255118] 2023-04-23T22:25:31.541 + <00000009>execute - Created by <00000008> [00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Virtual machine start [00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Push [10.0] [00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Push [10.0, 3.0] [00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Sub [7.0] [00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Push [7.0, 4.0] [00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Push [7.0, 4.0, 5.0] [00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Add [7.0, 9.0] [00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Mul [63.0] [00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Push [63.0, 37.0] [00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Add [100.0] [00255118] 2023-04-23T22:25:31.541 ^ <00000009>execute - Virtual machine end (100.0) [00255118] 2023-04-23T22:25:31.541 X <00000009>execute - Destroyed [00255118] 2023-04-23T22:25:31.542 < <00000008>vm - Received Completed from <00000009> [00255118] 2023-04-23T22:25:31.542 X <00000008>vm - Destroyed [00255118] 2023-04-23T22:25:31.542 < <00000007>start_vector - Received Completed from <00000008> [00255118] 2023-04-23T22:25:31.542 X <00000007>start_vector - Destroyed [00255086] 2023-04-23T22:25:31.817 X <0000000e>wait - Destroyed [00255086] 2023-04-23T22:25:31.817 < <0000000d>Process[EXECUTING] - Received Completed from <0000000e> [00255086] 2023-04-23T22:25:31.817 ) <0000000d>Process[EXECUTING] - Process (255116) ended with 0 [00255086] 2023-04-23T22:25:31.817 X <0000000d>Process[EXECUTING] - Destroyed [00255086] 2023-04-23T22:25:31.818 < <00000008>calc - Received Completed from <0000000d> [00255086] 2023-04-23T22:25:31.818 X <00000008>calc - Destroyed [00255086] 2023-04-23T22:25:31.818 < <00000007>start_vector - Received Completed from <00000008> [00255086] 2023-04-23T22:25:31.818 X <00000007>start_vector - Destroyed { "value": [ "lib.vm_if.MachineValue", { "value": 100.0 }, [] ] }