.. _multi-processing-is-more-than-forking:

Multi-Processing Is More Than Forking
#####################################

Multi-threading and multi-processing are two techniques that can be applied to challenges such as concurrency. Within the Python ecosystem there is extra motivation to consider multi-processing due to the internal interpreter architecture (i.e. the GIL).

However, multi-processing comes with obvious and less obvious difficulties. Creation and management of processes (e.g. detection of termination) is certainly different to the creation and management of threads, and any transfer of information now has to cross an inter-process boundary. But even after solving these non-trivial problems there is a much larger one to consider.

.. note::

   Source files appearing in this section can be downloaded from `here `_. The repo ``Makefile`` contains the setup needed for this guide. Related background information can be found :ref:`here`.

Beyond Forking
**************

Adoption of async programming techniques and the :class:`~.processing.Process` object type provide the means to develop sophisticated, multi-processing software. This is programmatic, *parent* process control over a set of *child* processes. A custom piece of software creates and manages a specific, logically associated set of processes.

There are also a significant number of scenarios where a more generic tool would be good enough. A tool that can load a description of the processes needed would solve a lot of development - and possibly operational - requirements. Not having to write that custom supervisor process each time is a compelling thought. A single command would start all the processes in the description and a single command would stop them.

A difficulty hidden within this generally good idea is that most substantial applications require runtime resources such as disk space for configuration files, logs and perhaps a database. Network ports are also an issue.
These resource requirements are problematic because groups of processes will often include multiple instances of a common executable. Imagine an executable that controls a robotic arm. It would be entirely plausible to include two instances of the executable with different configurations such that the 2 processes behave like left and right arms. The 2 instances need distinct runtime environments. By default, copies of the same executable are not good at sharing. This is what a *container* does for these situations. Products like `docker `_ use facilities in the underlying operating system (e.g. `cgroups `_) to tackle the same essential problem. It can also be said that containers are a light version of `virtualization `_, a technology delivered by products such as `VirtualBox `_. These products solve a common problem - how to run multiple copies of software that is not otherwise capable of being in the same space. Processes With Dual Modes ========================= An executable based on :func:`~.framework.create_object` has 2 distinct runtime modes. By default it runs as a process within the host operating system, on behalf of the current user. In this mode it is said to be running as a *tool* or *utility*. It typically loads its configuration from a file under the current user's $HOME folder. Multiple instances of the same executable running for the same user, will load the same configuration. Passing a few special arguments to that same executable at start time enables the second runtime mode. Those special arguments include a location and a name, enough information for :func:`~.framework.create_object` to recover a disk management *context*. In this mode it is said to be running as a *component*. That disk management context contains private disk resources such as configuration files, database areas and space for logging. The effect is that processes running as components can happily co-exist in a group. 
Configuration can be modified on a per-process basis and each process has its own logs.

Creation of the contexts and passing the correct, associated arguments to executables is not something that should be attempted manually. This is a space intended for programmatic automation. Manual initiation of processes is intended for "tool" mode.

A nice by-product of this disk management is that there can be many groups of processes on the same host - just at different locations (i.e. folders). It's even possible to have multiple copies of the same group running on the same host, but in different folders.

It should be noted that this does not address the wider issue of global resources. Management of resources such as network ports is outside the scope of this document.

A Tool For Practical Multi-Processing
=====================================

The generic tool that brings everything together is the ``ansar`` command-line utility that comes with the **ansar-create** library. This single tool allows for the persistent description of a group of processes. Entries can be added, the contents of the group can be listed, members of the group can be updated and entries can be deleted. Technically those CRUD operations are manipulating descriptive information - there are no platform processes being created or terminated during those operations.

Quite separately the ``ansar`` tool also provides the ability to "start the group" or even a subset of the group. The current status of the group can be listed (i.e. print a table of currently running processes) and there is a "stop the group" operation.

All processes started by the ``ansar`` tool are running in *component* mode within a disk management *context*. Lock files are used to prevent multiple copies of group processes.

A Quick Tour
************

A small collection of processes is used to demonstrate the use of the ansar tool. These are;

.. list-table::
   :widths: 25 75

   * - ``noop.py``
     - *does nothing and exits immediately*
   * - ``snooze.py``
     - *waits for a configured amount of time*
   * - ``zombie.py``
     - *does nothing until interrupted*
   * - ``factorial.py``
     - *calculates factorial(n) using recursive processes*
   * - ``busy.py``
     - *starts a tree of sub-processes*
   * - ``server.py``
     - *a very basic, sockets-based network server*
   * - ``client.py``
     - *a very basic, sockets-based network client*
   * - ``analyzer.py``
     - *a custom test analysis*

An Ansar Command Line
=====================

The general layout of an ansar command appears below;

    $ ansar \[--<*ansar-setting*\ >=<*value*\ > ..\] <*sub-command*\ > \[-<*sub-setting*\ >=<*value*\ > ..\] \[*word* ..\]

Each command involves a *sub-command*, optional *settings* and *words*. Settings appearing before the sub-command are more general and associated with ansar, while those appearing after are associated with the specific sub-command. The optional list of words is also associated with the sub-command.

A Sub-Command Example
=====================

Consider the following description of the ``create`` sub-command;

    $ ansar create \[<*home-path*\ >\] \[--redirect-<*name*\ >=<*path*\ > …]

The sub-command creates a *home*. It accepts an optional *home-path* as the location of the new home and an optional list of redirection settings. The home is subsequently configured with a list of processes to execute, and provides areas where each of those processes can store operational materials, e.g. logs.

.. code-block::

    $ ansar create --redirect-bin=dist
    $ ls -la
    total 220
    drwxrwxr-x 10 dennis dennis 4096 Mar 25 04:02 .
    drwxrwxr-x 12 dennis dennis 4096 Mar 21 14:32 ..
    drwxrwxr-x 10 dennis dennis 4096 Mar 25 04:02 .ansar-home
    drwxrwxr-x 2 dennis dennis 4096 Mar 24 16:34 dist
    -rw-rw-r-- 1 dennis dennis 1420 Mar 15 13:52 noop.py
    ..

In the absence of an explicit *home-path*, the path for this home has defaulted to ``.ansar-home`` in the current folder.
A redirection of bin to the ``./dist`` folder (i.e. the default output folder for ``pyinstaller`` executables) has been recorded within the new home. All executables named in subsequent ansar commands will be expected to exist in ``./dist``. .. note:: Without the redirection of the ``bin`` folder, executables are expected to exist in a dedicated home sub-folder. In these scenarios executables are transferred to that dedicated folder using the ``deploy`` sub-command. Refer to later sections for further details. Basic Behaviour =============== The commands used to populate a home with process definitions and create an operational set of those processes, are covered in the following sections. Add A Process And Run It ------------------------ .. code-block:: $ ansar add noop $ ansar list noop-0 $ ansar run --debug-level=CONSOLE 19:40:25.847 ^ <00000009>noop - Log this and exit { "value": [ "ansar.command.ansar_command.Run", { "completed": [ [ "noop-0", [ "ansar.create.lifecycle.Ack", {}, [] ] ] ], "home": ".ansar-home" }, [] ] } $ The ``add`` sub-command is used to add an instance of an executable to the default home. Technically, it adds a description of an instance, i.e. there are no new platform processes created by this command. The current set of descriptions are listed to confirm the new entry and then the entire list (i.e. the single instance of noop) is executed, using the ``run`` sub-command. The requested logging (``--debug-level=CONSOLE``) is placed on ``stderr`` and the output from the ``run`` command is placed on ``stdout``. The ``noop`` process logged its efforts and returned an :class:`~.lifecycle.Ack` object to the run command. A full transcript of the console-level logging is included (minus the process id and full timestamps) for this first example; subsequent examples will omit logging that is not relevant to the demonstration. 
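
Because the result of ``run`` arrives on ``stdout`` as JSON, a script can pick out the value returned by each role. The following is a minimal sketch using only the Python standard library. It assumes the layout seen in the transcript above (each value encoded as a triple of type name, body and a trailing list); the helper name ``completed_roles`` is hypothetical and is not part of ansar.

```python
import json

# Output of "ansar run", copied from the transcript above.
RUN_OUTPUT = """
{
    "value": [
        "ansar.command.ansar_command.Run",
        {
            "completed": [
                ["noop-0", ["ansar.create.lifecycle.Ack", {}, []]]
            ],
            "home": ".ansar-home"
        },
        []
    ]
}
"""


def completed_roles(text):
    """Map each role name to the (type name, body) pair it returned.

    Assumption: the JSON follows the layout shown in the transcript.
    This is an illustrative helper, not an ansar API.
    """
    _type_name, body, _ = json.loads(text)["value"]
    roles = {}
    for role, (returned_type, returned_body, _) in body["completed"]:
        roles[role] = (returned_type, returned_body)
    return roles


if __name__ == "__main__":
    for role, (returned_type, returned_body) in completed_roles(RUN_OUTPUT).items():
        print(role, returned_type, returned_body)
```

In practice the text would come from the captured ``stdout`` of the command rather than a literal string, with ``stderr`` kept separate so that logging does not mix with the JSON.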
Each instance of an executable is known by a *role* - a short name that describes the part the instance plays within the collection. In the above command both the *role* and the *home* have assumed default values. The role defaults to the name of the executable with a small suffix appended to it; the reasons for this behaviour will become clear in later sections. A more explicit use of the add command looks like this:: $ ansar add robot-arm left-arm toy-robot --rotation=-90.0 This command says to add the ``left-arm`` instance of the ``robot-arm`` executable to the ``toy-robot`` home. The rotation setting for the new instance is initialized to -90.0. Add A Process With Persistent Settings -------------------------------------- .. code-block:: $ ansar add snooze $ ansar list noop-0 snooze-0 $ ansar run --debug-level=DEBUG 01:33:10.063 + <00000007>lock_and_hold - Created by <00000001> 01:33:10.064 > <00000007>lock_and_hold - Sent Ready to <00000001> 01:33:10.064 + <00000007>lock_and_hold - Created by <00000001> 01:33:10.064 + <00000008>start_vector - Created by <00000001> 01:33:10.064 > <00000007>lock_and_hold - Sent Ready to <00000001> 01:33:10.064 ~ <00000008>start_vector - .. "../dist/snooze" .. 01:33:10.064 ~ <00000008>start_vector - Working folder .. 01:33:10.064 ~ <00000008>start_vector - .. "__main__.snooze" 01:33:10.064 ~ <00000008>start_vector - Class threads .. 01:33:10.064 + <00000009>snooze - Created by <00000008> 01:33:10.065 ^ <00000009>snooze - Do nothing for 2.0 seconds 01:33:10.065 > <00000009>snooze - Sent StartTimer to <00000003> 01:33:10.065 + <00000008>start_vector - Created by <00000001> 01:33:10.065 ~ <00000008>start_vector - .. "../dist/noop" .. 01:33:10.065 ~ <00000008>start_vector - Working folder .. 01:33:10.065 ~ <00000008>start_vector - .. "__main__.noop" 01:33:10.065 ~ <00000008>start_vector - Class threads .. 
01:33:10.065 + <00000009>noop - Created by <00000008> 01:33:10.065 ^ <00000009>noop - Log this and exit 01:33:10.065 X <00000009>noop - Destroyed 01:33:10.065 < <00000008>start_vector - Received Completed .. 01:33:10.065 X <00000008>start_vector - Destroyed 01:33:10.165 < <00000007>lock_and_hold - Received Stop .. 01:33:10.170 X <00000007>lock_and_hold - Destroyed 01:33:12.067 < <00000009>snooze - Received T1 from <00000003> 01:33:12.067 X <00000009>snooze - Destroyed 01:33:12.067 < <00000008>start_vector - Received Completed .. 01:33:12.067 X <00000008>start_vector - Destroyed 01:33:12.068 < <00000007>lock_and_hold - Received Stop .. 01:33:12.071 X <00000007>lock_and_hold - Destroyed $ A second role is created with the ``snooze`` executable. The home now has 2 entries. The ``run`` command - by default - starts all the processes described in the collection and waits for them all to complete. With the addition of a ``snooze``, that completion now takes a few seconds. Logging still shows the immediate exit of the ``noop`` command. Both start at around the 10.064 mark and the noop object terminates about a thousandth of a second later, at 10.065. The timer for ``snooze`` is not received until 12.067, about 2 seconds after the object was created. .. code-block:: $ ansar update snooze-0 --seconds=5.0 $ ansar run --debug-level=CONSOLE 19:49:35.903 ^ <00000009>snooze - Do nothing for 5.0 seconds 19:49:35.903 ^ <00000009>noop - Log this and exit .. The ``seconds`` setting for the ``snooze-0`` instance is assigned a longer value. Another run shows the ``snooze`` command behaving accordingly. Add A Process That Never Wants To Terminate ------------------------------------------- .. code-block:: $ ansar add zombie $ ansar list noop-0 snooze-0 zombie-0 $ ansar run --debug-level=CONSOLE 19:53:30.562 ^ <00000009>snooze - Do nothing for 5.0 seconds 19:53:30.563 ^ <00000009>noop - Log this and exit 19:53:30.563 ^ <00000009>zombie - Do nothing until interrupted ^C{ ..
$ Adding an instance of ``zombie`` changes an essential behaviour of the collection; it no longer self-terminates. Both ``noop`` and ``snooze`` eventually terminate if given enough time, but user intervention is required to terminate ``zombie`` and the run inherits that requirement. .. note:: Control-c is the standard command-line mechanism for terminating long-running processes. In a standard async process, a control-c is caught by :func:`~.framework.create_object` and converted to an :class:`~.lifecycle.Stop` message. In response, every async process is expected to terminate gracefully. A control-c is caught by the ansar command and propagated to all its children, resulting in the shutdown of the run. The action also injects a circumflex-cee (^C) into the terminal output, disrupting the logging. Logs redirected to a file will not include that disruption. .. code-block:: $ ansar delete zombie-0 $ ansar list noop-0 snooze-0 $ A ``delete`` command is used to remove the ``zombie-0`` role from the home. This both demonstrates the command and restores a more convenient behaviour for the purposes of this tour. Add A Process That Expects Input -------------------------------- .. code-block:: $ ansar add factorial $ ansar run --debug-level=CONSOLE [00438217] 2023-04-06T01:52:17.317 ^ <00000009>snooze - Do nothing .. [00438216] 2023-04-06T01:52:17.317 ^ <00000009>fact - factorial(5) [00438218] 2023-04-06T01:52:17.317 ^ <00000009>noop - Log this and exit [00438245] 2023-04-06T01:52:17.418 ^ <00000008>fact - factorial(4) [00438255] 2023-04-06T01:52:17.520 ^ <00000008>fact - factorial(3) [00438265] 2023-04-06T01:52:17.621 ^ <00000008>fact - factorial(2) [00438275] 2023-04-06T01:52:17.722 ^ <00000008>fact - factorial(1) [00438285] 2023-04-06T01:52:17.823 ^ <00000008>fact - factorial(0) { "value": [ "ansar.command.ansar_command.Run", { "completed": [ .. [ "factorial-0", [ "lib.factorial_if.FactorialReturned", { "value": 120 }, [] ] ], ..
], "home": ".ansar-home" }, [] ] } $ An instance of ``factorial`` is added. This is a different executable in that it’s the first demonstration executable to create sub-processes. It uses the ansar ability to “call” a process as if it were a function, as a basis for a recursive implementation of the factorial function. The chain of processes can be seen in the logs - note the [00438216] process id on the ``<00000009>fact - factorial(5)`` log and how that id and log change as the chain extends. .. code-block:: $ ansar input factorial-0 { "value": 5 } $ The ``input`` command can be used to view the initial input for the named role. This is a default encoding created during the ``add factorial`` command. .. code-block:: $ cat factorial-7 { "value": 7 } $ ansar input factorial-0 --set-file=factorial-7 $ ansar run { "value": [ "ansar.command.ansar_command.Run", { "completed": [ .. [ "factorial-0", [ "lib.factorial_if.FactorialReturned", { "value": 5040 }, [] ] ], .. ], "home": ".ansar-home" }, [] ] } $ The ``input`` command can also be used to modify the initial input for the named role. Use the ``--set-file`` parameter to store new initial input for a role. Redefining The Settings For A Process ------------------------------------- .. code-block:: $ cat short-snooze { "value": { "seconds": 1.0 } } $ ansar settings snooze-0 --set-file=short-snooze $ ansar run --debug-level=CONSOLE 14:45:55.859 ^ <00000009>noop - Log this and exit 14:45:55.859 ^ <00000009>factorial - factorial(5) 14:45:55.859 ^ <00000009>snooze - Do nothing for 1.0 seconds 14:45:55.961 ^ <00000008>factorial - factorial(4) 14:45:56.062 ^ <00000008>factorial - factorial(3) 14:45:56.162 ^ <00000008>factorial - factorial(2) 14:45:56.263 ^ <00000008>factorial - factorial(1) 14:45:56.363 ^ <00000008>factorial - factorial(0) { "value": [ "ansar.command.ansar_command.Run", { "completed": [ .. [ "snooze-0", [ "ansar.create.lifecycle.Ack", {}, [] ] ], .. 
], "home": ".ansar-home" }, [] ] } $ Persistent settings associated with processes can be modified using the ``update`` command or the ``settings`` command. By accepting complete encodings the settings command provides for the full expression of ansar encodings, specifically including graphs. Adding Some Workload -------------------- .. code-block:: $ ansar add busy $ cat busy-input { "value": { "duties": [ "noop", "snooze", "factorial" ], "management_levels": 5, "managers": 3 } } $ ansar input busy-0 --set-file=busy-input $ ansar run --debug-level=CONSOLE 22:46:31.177 ^ <00000009>snooze - Do nothing for 2.0 seconds 22:46:31.177 ^ <00000009>factorial - factorial(5) 22:46:31.178 ^ <00000009>noop - Log this and exit 22:46:31.295 ^ <00000008>noop - Log this and exit 22:46:31.295 ^ <00000008>factorial - factorial(4) 22:46:31.295 ^ <00000008>snooze - Do nothing for 2.0 seconds 22:46:31.296 ^ <00000008>factorial - factorial(5) 22:46:31.490 ^ <00000008>snooze - Do nothing for 2.0 seconds 22:46:31.499 ^ <00000008>factorial - factorial(4) .. 22:46:46.327 ^ <00000008>factorial - factorial(0) 22:46:46.339 ^ <00000008>factorial - factorial(0) 22:46:46.423 ^ <00000008>factorial - factorial(0) { "value": [ "ansar.command.ansar_command.Run", { "completed": [ .. [ "factorial-0", [ "lib.factorial_if.FactorialReturned", { "value": 120 }, [] ] ], [ "busy-0", [ "lib.job_if.JobReturned", { "processes": 484 }, [] ] ] ], "home": ".ansar-home" }, [] ] } $ Recursion is again used to create a tree of ``busy`` processes that has a defined number of levels (i.e. ``management_levels``) and a defined number of branches (i.e. ``managers``). The run command results in the creation of 484 processes plus those entries previously added alongside the ``busy-0`` entry. .. note:: Interruption of complex and dynamic collections of processes is likely to catch some processes in the early stage of their lives. 
A control-c can interrupt the Python interpreter as it is performing ``import`` operations, long before the ``__name__ == "__main__"`` check has even been reached. The consequence is that signals may be processed by the default handlers inside the interpreter; tracebacks will appear on stderr. The :class:`~.processing.Process` machine catches this event and converts it into an :class:`~.lifecycle.Aborted` message, which is returned to the parent async object, preserving operational integrity. More Advanced Use ================= Use of the ``run`` command results in immediate feedback. Logging from all the related sub-processes is placed on ``stderr`` for viewing, or saving in a file for off-line analysis. These are valuable development procedures. The ``start`` command is similar to ``run`` except that after starting the related set of processes, control is immediately returned to the command-line, leaving the processes to continue in the background. Logging no longer appears on ``stderr`` but is instead appended to a per-process storage area. Further ``ansar`` commands provide access to those logs, as well as administration of the background processes. Starting Processes In The Background ------------------------------------ .. code-block:: $ ansar update snooze-0 --seconds=10.0 $ ansar add zombie $ ansar list factorial-0 noop-0 snooze-0 zombie-0 $ ansar start $ ansar status snooze-0 zombie-0 Logging no longer appears on the terminal. The ``status`` command shows which roles within a collection are currently running. In this example, the use of ``status`` must have followed the ``start`` quickly enough to catch ``snooze-0`` before it self-terminated. ..
code-block:: $ ansar start ansar: cannot perform "start", "(all)" currently running as - 589299 $ ansar status -l zombie-0 <589299> 2m25.6s $ ansar stop ansar: cannot perform "stop", "(all)" not currently running - busy-0, factorial-0, noop-0, snooze-0 $ ansar stop zombie-0 $ ansar start $ Attempts to run multiple instances of a role are detected and reported. In this case it’s the ``zombie-0`` role, verified by the matching process IDs in the error message and the (long) status output. Commands that involve a role - e.g. ``run``, ``start``, ``stop`` and others - accept a role-search as a parameter. Omitting the parameter is assumed to mean “match everything”. Where the command encounters any form of mismatch between the intentions of the command and the current set of processes, it terminates with an error message. In the above case, the intentions of the command were to run a new set of all the processes in the group. Instead, it detected an operational member of the group and terminated. The ``--force`` ansar flag can be used to override that cautionary behaviour. This would cause the command to kill the operational instance of ``zombie-0`` before creating the new set, including the new instance of ``zombie-0``. Reviewing Background Activity ----------------------------- .. 
code-block:: $ ansar log snooze-0 $ ansar log snooze-0 --last=WEEK 00:41:03.329 + <00000007>lock_and_hold - Created by <00000001> 00:41:03.329 > <00000007>lock_and_hold - Sent Ready to <00000001> 00:41:03.330 + <00000008>start_vector - Created by <00000001> 00:41:03.330 ~ <00000008>start_vector - Executable "/home/brad/somewhere/dist/snooze" as process (369676) 00:41:03.330 ~ <00000008>start_vector - Working folder "/" 00:41:03.330 ~ <00000008>start_vector - Running object "__main__.snooze" 00:41:03.330 ~ <00000008>start_vector - Class threads (1) "retries" (1) 00:41:03.330 + <00000009>snooze - Created by <00000008> 00:41:03.330 ^ <00000009>snooze - Do nothing for 2.0 seconds 00:41:03.330 > <00000009>snooze - Sent StartTimer to <00000003> 00:41:05.332 < <00000009>snooze - Received T1 from <00000003> 00:41:05.332 X <00000009>snooze - Destroyed 00:41:05.332 < <00000008>start_vector - Received Completed from <00000009> 00:41:05.332 X <00000008>start_vector - Destroyed 00:41:05.333 < <00000007>lock_and_hold - Received Stop from <00000001> 00:41:05.336 X <00000007>lock_and_hold - Destroyed Logs produced by foreground processes (i.e. using ``run``) are presented on stderr and then - without deliberate action - lost. Logs produced by background processes are directed into persistent storage and subsequently recovered with the ansar ``log`` command. The first ``log`` command above fails as the default behaviour is to query for logs generated within the last 5 minutes; ``snooze-0`` has been idle since it terminated at 00:41:05.332. The second command uses one of the log parameters to extend the query to the start of the current week - 12:00am on Monday. This matches everything from that moment onward and the first entry happens to be at 00:41:03.329. Note that the time value on logs is the full ISO 8601 format but they appear truncated here for brevity. 
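
The difference between the two queries above comes down to where the search window starts. The sketch below illustrates that arithmetic with the standard ``datetime`` module. It is not taken from ansar: the function names and the sample timestamps are hypothetical, and the default is modelled as a plain 5-minute look-back, as described in the text.

```python
from datetime import datetime, timedelta


def start_of_week(now):
    """Return 12:00am on the Monday of the week containing `now`."""
    monday = now - timedelta(days=now.weekday())  # Monday has weekday() == 0
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


def five_minutes_back(now):
    """Return the start of a simple 5-minute look-back window."""
    return now - timedelta(minutes=5)


if __name__ == "__main__":
    # Hypothetical values: a Thursday morning query and an earlier log entry.
    now = datetime(2023, 4, 6, 9, 30)
    logged = datetime(2023, 4, 6, 0, 41, 3)

    print(logged >= five_minutes_back(now))  # outside the default window
    print(logged >= start_of_week(now))      # inside the current week
```

A log written hours ago falls outside the short default window but is still matched once the start is moved back to the beginning of the week.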
As well as ``WEEK`` there are also ``MONTH``, ``DAY``, ``HOUR``, ``MINUTE``, ``HALF``, ``QUARTER``, ``TEN`` and ``FIVE`` enumerations where ``HALF`` and ``QUARTER`` refer to portions of an hour and ``TEN`` and ``FIVE`` refer to numbers of minutes. In all cases, the enumeration describes the start of a fixed time period rather than a time span, e.g. ``--last=HOUR`` will list logs starting at the most recent hourly mark. To look back 60 minutes use ``--back=1h``. The ansar ``log`` command also accepts the following parameters; .. list-table:: :widths: 25 75 * - ``clock`` - *use local time for both input and output* * - ``from`` - *start in ISO time format* * - ``start`` - *start as index into start-stop history* * - ``back`` - *start as negative offset from current time* * - ``to`` - *end in ISO time format* * - ``span`` - *positive offset from evaluated start* * - ``count`` - *end as a number of logs* .. code-block:: $ ansar log snooze-0 --last=MONTH --count=20 This command will list the first 20 logs generated by ``snooze-0``, since the first of the current month. Log storage is self-maintaining. A FIFO approach ensures that when the total storage of logs reaches a configured maximum, the arrival of further logs causes the deletion of the oldest. .. code-block:: $ ansar status -l zombie-0 <377524> 12.7s For a detailed view of operational processes use the ``--long-listing`` parameter (or the ``-l`` shorthand flag). This view includes the process ID and the time since the process was created. Behaviour Of Background Processes Over Time ------------------------------------------- .. code-block:: $ ansar history zombie-0 [0] 7h30m ago ... 1m5.4s (Aborted) [1] 4m44.6s ago ... 4m34.3s (Aborted) A role is a name for an instance of an executable which may be started and stopped many times within the lifetime of a home. Logs are seamless with respect to these ups and downs though it is fairly easy to infer the boundaries from the contents of individual logs. 
Ansar also keeps an explicit record of when processes are started, when they are stopped and the value returned by the main object. Use the ``history`` command to print a table of start times and run durations. The printed indexes can be used in the ``start`` parameter in the ``log`` command to select logs from the start of a particular execution. .. code-block:: $ ansar returned zombie-0 --start=0 { "value": [ "ansar.create.lifecycle.Aborted", {}, [] ] } Those same indexes can also be used in the ``returned`` command to select which return value to print. If the command is used to access the results of the latest execution (e.g. no ``--start`` is specified) and that execution has not yet completed, the command will wait until the information is available. Development Automation ********************** Combining the nature of :func:`~.framework.create_object`-based applications with the ``ansar`` command line tool goes some way towards engineering of multi-processing solutions. This section considers the potential to streamline the standard edit-build-test-debug loop, to relieve developers of as many repetitive, error-prone responsibilities as possible, within that multi-processing context. A huge array of tools are available in this space, especially if cloud deployment is the end goal. The arrangements of tools and procedures suggested here are deliberately simple in the hope that any potential to integrate with your own development toolsets is as clear as possible. The Standard Loop With Multi-Processing ======================================= Multi-processing complicates the standard development loop. Source code changes may be occurring over one executable or many. Test runs require the presence of multiple distinct processes that properly represent the current codebase, i.e. 
which source files have changed, which executables need building, do executables need to be copied from the build areas, are there running processes that need to be replaced, and how to execute and collate unit tests happening across multiple processes? What about those processes that need supporting data files? Defining The Set Of Processes ----------------------------- This part of the tour involves a new home; .. code-block:: $ ansar -f destroy $ ansar create --redirect-bin=dist $ ansar add server $ ansar add client $ ansar list client-0 server-0 The ``destroy`` command is used to delete the default home. Passing the ``-f`` flag ensures that any process associated with the old home is properly terminated. The ``create`` command prepares the new home folder for the subsequent ``add`` commands. A *redirect* is again used to link the new home with a build folder. Instances of the ``server`` and ``client`` executables are added and assume the default names, ``server-0`` and ``client-0``. In this composition of processes the ``server`` is the component under development. The ``client`` is a test client - it connects to the server, submits requests and expects responses. A standard ansar method - available to every async object - is used to verify the details of each request-response pair. This creates a sequence of pass/fail records that go on to form the basis of a test report. The ``server`` implements a very basic word mapping service. Words are sent across a connection and mapped inside the server to a stored alternative. The mapped alternative is sent back across the connection as a response. If no entry is found for a submitted word, the same word is echoed back to the client. Such a service might form the basis of a "hint" facility for a spelling checker. .. note:: Implementation of networking within the ``server`` and ``client`` components is for demonstration purposes only.
There are several reasons why the approach should not be used for production quality software including scalability, the use of blocking sockets and the lack of message encoding/decoding. When Clients Are Started Before Servers --------------------------------------- An initial run of the new set of processes hits a bump. All the processes are effectively started at the same time and starting a client before the server has had a chance to establish itself will inevitably lead to problems; .. code-block:: :emphasize-lines: 14 $ ansar run --debug-level=DEBUG 07:29:26.188 + <00000007>start_vector - Created by <00000001> 07:29:26.188 ~ <00000007>start_vector - Executable "/home/dennis/gh/multi-processing-is-more-than-forking/.dev/bin/ansar" as process (698548) 07:29:26.188 ~ <00000007>start_vector - Working folder "/home/dennis/gh/multi-processing-is-more-than-forking" 07:29:26.188 ~ <00000007>start_vector - Running object "ansar.command.ansar_command.ansar" 07:29:26.188 ~ <00000007>start_vector - Class threads (1) "retries" (1) 07:29:26.189 + <00000008>ansar - Created by <00000007> 07:29:26.189 ~ <00000008>ansar - Call the sub-command function 07:29:26.189 ^ <00000008>ansar - Detect status of associated roles (server-0, client-0, zombie-0) 07:29:26.189 + <00000009>lock_and_hold - Created by <00000008> .. 07:29:26.304 ~ <00000008>start_vector - Running object "__main__.zombie" 07:29:26.304 ~ <00000008>start_vector - Class threads (1) "retries" (1) 07:29:26.304 ? <00000009>client - Session error - "[Errno 111] Connection refused" 07:29:26.304 X <00000009>client - Destroyed 07:29:26.304 < <00000008>start_vector - Received Completed from <00000009> A connection has been attempted by the ``client`` and "refused". Without some kind of orchestration of activity at the networking level there is no way to advise the client of the appropriate moment to initiate connection. Ansar provides an alternative solution. 
Any process that is part of a home can be configured with a *retry* strategy. The effect is that ``client-0`` is performed repeatedly until a goal is reached or the strategy is exhausted. In this case the goal is to establish a valid connection. All that the ``client`` needs to do is return certain values that either keep the retries active or cause termination. Please Repeat That ------------------ A single ansar command sets up the handling of connection failures; .. code-block:: :emphasize-lines: 25, 32 $ cat client-retry { "value": { "first_steps": [1.0, 2.0, 4.0] } } $ ansar set retry client-0 --encoding-file=client-retry $ ansar run --debug-level=DEBUG 05:32:15.384 + <00000007>lock_and_hold - Created by <00000001> .. 05:32:15.384 + <00000008>start_vector - Created by <00000001> 05:32:15.385 ~ <00000008>start_vector - Executable "/home/dennis/gh/multi-processing-is-more-than-forking/dist/server" as process (729010) 05:32:15.385 ~ <00000008>start_vector - Executable "/home/dennis/gh/multi-processing-is-more-than-forking/dist/client" as process (729009) 05:32:15.385 ~ <00000008>start_vector - Working folder "/home/dennis/gh/multi-processing-is-more-than-forking" 05:32:15.385 ~ <00000008>start_vector - Working folder "/home/dennis/gh/multi-processing-is-more-than-forking" 05:32:15.385 ~ <00000008>start_vector - Running object "__main__.client" 05:32:15.385 ~ <00000008>start_vector - Class threads (1) "retries" (1) 05:32:15.385 ~ <00000008>start_vector - Running object "__main__.server" 05:32:15.385 + <00000009>Retry[INITIAL] - Created by <00000008> 05:32:15.385 ~ <00000008>start_vector - Class threads (1) "retries" (1) 05:32:15.385 < <00000009>Retry[INITIAL] - Received Start from <00000008> 05:32:15.385 + <00000009>server - Created by <00000008> 05:32:15.385 + <0000000a>client - Created by <00000009> 05:32:15.385 + <0000000a>listen - Created by <00000009> 05:32:15.385 ? 
<0000000a>client - Session error - "[Errno 111] Connection refused" 05:32:15.385 X <0000000a>client - Destroyed 05:32:15.385 < <00000009>Retry[ATTEMPTING] - Received Completed from <0000000a> 05:32:15.385 ^ <00000009>Retry[ATTEMPTING] - Pausing for 1.000000 seconds 05:32:15.385 > <00000009>Retry[ATTEMPTING] - Sent StartTimer to <00000003> 05:32:16.385 < <00000009>Retry[PAUSING] - Received T1 from <00000003> 05:32:16.386 + <0000000b>client - Created by <00000009> 05:32:16.387 ^ <0000000b>client - Connected to ('127.0.0.1', 65432) 05:32:16.387 + <0000000b>accepted - Created by <0000000a> 05:32:16.387 ^ <0000000b>accepted - Accepted on ('127.0.0.1', 52278) 05:32:16.394 = <0000000b>client - Expected b'fervent' for b'eager', got b'eager' (client.py:51) 05:32:16.394 = <0000000b>client - Expected b'define' for b'explain', got b'explain' (client.py:56) 05:32:16.395 = <0000000b>client - Expected b'droll' for b'fly', got b'fly' (client.py:61) 05:32:16.395 X <0000000b>accepted - Destroyed 05:32:16.395 X <0000000b>client - Destroyed 05:32:16.395 < <00000009>Retry[ATTEMPTING] - Received Completed from <0000000b> 05:32:16.395 X <00000009>Retry[ATTEMPTING] - Destroyed 05:32:16.395 < <00000008>start_vector - Received Completed from <00000009> 05:32:16.395 X <00000008>start_vector - Destroyed .. The ``set`` command is used to update a small set of properties associated with each home entry and one such property is ``retry``. Setting this value activates *retries* inside the :func:`~.framework.create_object` function. Running the application object (e.g. ``client()``) is subsequently considered to be an *attempt* and the value returned by each attempt influences what happens next; .. 
list-table::
    :widths: 25 75

    * - :class:`~.lifecycle.Maybe`
      - *not successful, try again later*
    * - :class:`~.lifecycle.Cannot`
      - *abandon, bad request or environmental problem*
    * - \*
      - *any other message indicates success*

Where a repeat attempt is indicated (:class:`~.lifecycle.Maybe`) the retry machinery consults the retry property for a time delay and uses the ``T1`` timer to impose the "down time". In the given example the delay is 1.0s - the first value from the ``first_steps`` list. The retry property provides the following values;

.. list-table::
    :widths: 25 75

    * - ``first_steps[]``
      - *list of float, the initial time delays*
    * - ``regular_steps``
      - *float, repeating delay*
    * - ``step_limit``
      - *int, maximum number of delays*
    * - ``randomized``
      - *float, time slices for backoff*
    * - ``truncated``
      - *float, reduce the scale of backoff*

The first 3 values can be used to describe a series of float values while the latter 2 enable a secondary adjustment of those values with the goal of avoiding the "everyone retrying at the same moment" phenomenon.

.. list-table::
    :widths: 25 25 20 20 20 60
    :header-rows: 1

    * - ``first_steps``
      - ``regular_steps``
      - ``step_limit``
      - ``randomized``
      - ``truncated``
      - *sequence*
    * - [1.0, 2.0, 4.0]
      - None
      - None
      - None
      - None
      - [1.0, 2.0, 4.0]
    * - []
      - 1.0
      - None
      - None
      - None
      - [1.0, 1.0, 1.0, 1.0 ...]
    * - []
      - 1.0
      - 4
      - None
      - None
      - [1.0, 1.0, 1.0, 1.0]
    * - [1.0, 2.0, 4.0]
      - 8.0
      - 6
      - None
      - None
      - [1.0, 2.0, 4.0, 8.0, 8.0, 8.0]
    * - [1.0, 2.0, 4.0]
      - 8.0
      - 6
      - 0.25
      - 0.5
      - [1.25, 2.5, 6.0, 11.0, 9.5, 10.5]

Any set of values involving a non-None ``regular_steps`` and a None ``step_limit`` describes an endless sequence. Combining ``first_steps`` with a value for ``randomized`` can produce a form of *exponential backoff*. The latter value is used to slice up the latest time delay into available slots and one of those slots is selected randomly.
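The deterministic rows of the table above can be reproduced in a few lines of Python. This is a minimal sketch rather than ansar's implementation; ``delay_sequence`` is a hypothetical name, and the ``randomized`` and ``truncated`` adjustments are deliberately left out.

```python
from itertools import chain, islice, repeat
from typing import Iterator, Optional, Sequence


def delay_sequence(
    first_steps: Sequence[float] = (),
    regular_steps: Optional[float] = None,
    step_limit: Optional[int] = None,
) -> Iterator[float]:
    """Yield retry delays: the initial steps, then the repeating step.

    Hypothetical helper that mirrors the table of retry values; it is not
    part of the ansar API.
    """
    # Without a regular step the sequence ends after the initial steps.
    tail = repeat(regular_steps) if regular_steps is not None else ()
    steps = chain(first_steps, tail)
    # islice() with a stop of None imposes no cap, giving the endless case.
    return islice(steps, step_limit)


if __name__ == "__main__":
    print(list(delay_sequence([1.0, 2.0, 4.0], 8.0, 6)))
```

A sequence with a ``regular_steps`` value and no ``step_limit`` never ends, so a caller of this sketch would need to bound it, e.g. with ``itertools.islice``.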
The ``truncated`` value reduces the portion of the time delay that is available for slicing, e.g. a value of 0.25 limits the potential adjustment to a quarter. The adjustment is additive.

It is the combination of the retry property and the conditions met by each attempt that determines the final behaviour of the process.

Unit Tests In Multi-Process Solutions
-------------------------------------

Tests are implemented using the :meth:`~.point.Point.test` method (see ``client.py``);

.. code-block:: python

    word, expect = b'eager', b'fervent'
    s.sendall(word)
    reply = s.recv(1024)
    self.test(reply == expect, f"Expected {expect} for {word}, got {reply}")

A word is sent over a socket and the response is compared against an expected value. This fragment of code meets all the requirements of an ansar test.

Output occurs in two ways. Firstly, all failed tests (i.e. where the conditional evaluates to false) generate a log at the ``WARNING`` level. Secondly, all test results are collected in a background async object. Test applications such as the ``client`` request that information at the end of an execution and return the results in the form of a :class:`~.test.TestReport`.

.. code-block:: python

    ar.test_enquiry(self)
    report = self.select(ar.TestReport)
    return report

A standalone execution of the ``client`` demonstrates this activity;

..
code-block:: $ ansar list client-0 server-0 $ ansar status $ ansar start server-0 $ dist/client --debug-level=OBJECT 19:35:35.299 + <00000008>start_vector - Created by <00000001> 19:35:35.300 + <00000009>client - Created by <00000008> 19:35:35.300 ^ <00000009>client - Connected to ('127.0.0.1', 65432) 19:35:35.303 = <00000009>client - Expected b'fervent' for b'eager', got b'eager' (client.py:51) 19:35:35.303 = <00000009>client - Expected b'define' for b'explain', got b'explain' (client.py:56) 19:35:35.303 = <00000009>client - Expected b'droll' for b'fly', got b'fly' (client.py:61) 19:35:35.303 > <00000009>client - Sent Enquiry to <00000004> 19:35:35.304 < <00000009>client - Received TestReport from <00000004> 19:35:35.304 X <00000009>client - Destroyed 19:35:35.304 < <00000008>start_vector - Received Completed from <00000009> 19:35:35.304 X <00000008>start_vector - Destroyed { "value": [ "ansar.create.test.TestReport", { "failed": 3, "passed": 0, "tested": [ { "condition": false, "line": 51, "name": "client", "source": "client.py", "stamp": "2023-05-30T19:35:35.302931", "text": "Expected b'fervent' for b'eager', got b'eager'" }, { "condition": false, "line": 56, "name": "client", "source": "client.py", "stamp": "2023-05-30T19:35:35.303186", "text": "Expected b'define' for b'explain', got b'explain'" }, { "condition": false, "line": 61, "name": "client", "source": "client.py", "stamp": "2023-05-30T19:35:35.3034", "text": "Expected b'droll' for b'fly', got b'fly'" } ] }, [] ] } Failed tests appear in the logging stream and all tests appear in the final JSON output. The details retained for each test can be seen in the ``tested`` list. As well as the more obvious inclusion of the ``condition`` and ``text`` values, ansar augments the results with the name of the module that performed the test and also the line number within that module. These values are critical to a good edit-run-debug loop, as demonstrated in the following sections. .. 
note:: It is worth mentioning that :meth:`~.point.Point.test` has value in an application, independent of whether that application decides to return a :class:`~.test.TestReport`. It is effectively a shorthand for "if condition is false log this warning". Background collection of test results will eventually reach a maximum and at that point, will begin discarding test results. The maximum number retained is kept fairly small (a few hundred) for practical reasons. The collection cannot have infinite size and in a high velocity development loop, larger and larger numbers of failed tests have decreasing value. There can be any number of processes like ``client-0`` within a home, performing tests and generating ``TestReports``. The results from these individual test processes can be inspected with commands such as :ref:`ansar log `, ``ansar history`` and ``ansar returned``. Quick Navigation Of Failed Tests -------------------------------- The next step is to collect the information from the ``TestReports`` and present it in a manner that facilitates quick navigation of the relevant source. Happily, this is exactly what :ref:`ansar run ` can do. By adding a few arguments to the command, a useful listing is produced; .. code-block:: $ ansar stop server-0 $ ansar run --code-path=. --test-run ^Crole "client-0" (pass/fail): 0/3 /home/dennis/gh/multi-processing-is-more-than-forking/client.py:51 - Expected b'fervent' for b'eager', got b'eager' /home/dennis/gh/multi-processing-is-more-than-forking/client.py:56 - Expected b'define' for b'explain', got b'explain' /home/dennis/gh/multi-processing-is-more-than-forking/client.py:61 - Expected b'droll' for b'fly', got b'fly' $ The :ref:`ansar run ` command looks through the list of results from all the finished processes, checking for instances of :class:`~.test.TestReport`. It collects these into a single :class:`~.test.TestSuite` object. If it detects certain command-line information (i.e.
``--code-path=``) it uses that information to resolve the final names of the source files mentioned in individual tests. By default the improved report information appears in the normal JSON output of the ansar command. Specifying a ``--test-run`` causes ansar to set aside its normal behaviour and instead provide a summary of the gathered information. It prints a table of the processes (i.e. roles) that supplied the test information, and then a second table consisting of *source file*, *line number* and *warning text* - one line for each failed test. This is the final piece of the development loop. Running this command inside a VS Code ``bash`` window gives the IDE enough information for quick navigation to the offending lines of source code. Hovering over each line results in the underlining of the source address and a control-click takes the cursor to the exact location. Most modern IDEs support similar behaviour. Custom Handling Of Test Results ------------------------------- If the "test run" information is not sophisticated enough or there is potential for better local integration, the :ref:`ansar run ` command also supports the passing of the :class:`~.test.TestSuite` to a designated executable. .. code-block:: :emphasize-lines: 20-22 $ ansar run --code-path=. --test-analyzer=analyzer --debug-level=OBJECT 08:40:16.100 + <00000008>lock_and_hold - Created by <00000001> 08:40:16.101 > <00000008>lock_and_hold - Sent Ready to <00000001> .. 08:40:16.102 ^ <0000000b>client - Connected to ('127.0.0.1', 65432) 08:40:16.102 + <0000000c>accepted - Created by <0000000b> 08:40:16.102 ^ <0000000c>accepted - Accepted on ('127.0.0.1', 37696) 08:40:16.105 = <0000000b>client - Expected b'fervent' for b'eager', got b'eager' (client.py:51) 08:40:16.106 = <0000000b>client - Expected b'define' for b'explain', got b'explain' (client.py:56) 08:40:16.106 = <0000000b>client - Expected b'droll' for b'fly', got b'fly' (client.py:61) .. 
08:40:16.106 X <00000009>start_vector - Destroyed 08:40:16.203 < <00000008>lock_and_hold - Received Stop from <00000001> 08:40:16.212 X <00000008>lock_and_hold - Destroyed ^C08:40:18.506 < <00000009>start_vector - Received Stop from <00000001> 08:40:18.506 > <00000009>start_vector - Sent Stop to <0000000a> 08:40:18.506 < <0000000a>server - Received Stop from <00000009> .. 08:40:18.806 + <00000008>start_vector - Created by <00000001> 08:40:18.807 + <00000009>analyzer - Created by <00000008> 08:40:18.807 ^ <00000009>analyzer - Analyzed client-0 08:40:18.807 X <00000009>analyzer - Destroyed 08:40:18.807 < <00000008>start_vector - Received Completed from <00000009> 08:40:18.807 X <00000008>start_vector - Destroyed { "value": [ "ansar.create.lifecycle.Ack", {}, [] ] } Test results are now passed to the ``analyzer`` executable. The analyzer logs the names of roles that supplied the results and terminates with an :class:`~.lifecycle.Ack`. The ansar command assumes that the return value from the analyzer should be returned as the result for the run itself. Deployment Of Supporting File Materials --------------------------------------- Getting back to the development loop, there are still the 3 failed tests to consider; .. code-block:: 08:40:16.105 = <0000000b>client - Expected b'fervent' for b'eager', got b'eager' (client.py:51) 08:40:16.106 = <0000000b>client - Expected b'define' for b'explain', got b'explain' (client.py:56) 08:40:16.106 = <0000000b>client - Expected b'droll' for b'fly', got b'fly' (client.py:61) There is now a means to quickly navigate through the source code relating to the failed tests. To actually fix those failures there are two possibilities - either the test needs to change or the ``server`` needs to change. Changing the former is a simple matter of editing the offending line of test code and running the loop again; .. code-block:: $ vi +51 client.py $ cat client.py .. 
word, expect = b'eager', b'eager' s.sendall(word) reply = s.recv(1024) self.test(reply == expect, f"Expected {expect} for {word}, got {reply}") .. $ pyinstaller --onefile --log-level ERROR -p . client.py $ ansar run --code-path=. --test-run role "client-0" (pass/fail): 1/2 /home/dennis/gh/multi-processing-is-more-than-forking/client.py:56 - Expected b'define' for b'explain', got b'explain' /home/dennis/gh/multi-processing-is-more-than-forking/client.py:61 - Expected b'droll' for b'fly', got b'fly' .. note:: Use of the ``vi`` command is a means to demonstrate the workflow. As mentioned previously, the editing process would normally occur within the local IDE. A test run now reports that there is no longer an issue with the test of ``eager``. In those cases where the problem lies in the ``server`` it is similarly easy, though a small phase of setup is required; .. code-block:: $ ansar -f snapshot testing $ find testing testing testing/settings-by-role testing/settings-by-role/server-0.json testing/settings-by-role/client-0.json testing/resource-by-executable testing/resource-by-executable/server testing/resource-by-executable/client testing/model-by-role testing/model-by-role/server-0 testing/model-by-role/client-0 The ``snapshot`` command takes a *snapshot* of the current disk storage areas of the default home and arranges them under the folder name provided. Adding the ``-f`` flag ensures that any active roles are shutdown for the duration of the snapshot. With a safe and up-to-date image of all the operational files required by the home, it is now possible to modify that image and then *deploy* it to the home. The first modification is to provide a database of word mappings. The ``server`` is nice enough to operate in the absence of that information, but full operation requires the data to be in place. Loading the ``server-0`` role with an initial set of mappings looks like this; .. 
code-block:: :emphasize-lines: 8, 9, 17, 20, 32 $ ansar start server-0 $ cat word-map.json { "value": [ ["explain", "define"] ] } $ cp word-map.json testing/model-by-role/server-0 $ ansar --debug-level=OBJECT -f deploy --storage-path=testing 23:29:19.507 + <00000008>start_vector - Created by <00000001> 23:29:19.507 + <00000009>ansar - Created by <00000008> 23:29:19.508 ^ <00000009>ansar - Detected 1 model changes for "server-0" 23:29:19.509 ^ <00000009>ansar - Detect status of associated roles (server-0) 23:29:19.509 + <0000000a>lock_and_hold - Created by <00000009> 23:29:19.509 X <0000000a>lock_and_hold - Destroyed 23:29:19.509 < <00000009>ansar - Received Completed from <0000000a> 23:29:19.509 ^ <00000009>ansar - Stop roles (server-0) .. 23:29:20.768 < <00000009>ansar - Received Completed from <0000000b> 23:29:20.768 ^ <00000009>ansar - Starting transfer of materials 23:29:20.768 + <0000000c>FolderTransfer[INITIAL] - Created by <00000009> 23:29:20.768 < <0000000c>FolderTransfer[INITIAL] - Received Start from <00000009> 23:29:20.768 + <0000000d>folder_transfer - Created by <0000000c> 23:29:20.769 ^ <0000000d>folder_transfer - File transfer (1 deltas) to /home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/model/server-0 23:29:20.769 ^ <0000000d>folder_transfer - Move 1 aliases to targets 23:29:20.770 ^ <0000000d>folder_transfer - Clear 0 aliases 23:29:20.770 X <0000000d>folder_transfer - Destroyed 23:29:20.770 < <0000000c>FolderTransfer[RUNNING] - Received Completed from <0000000d> 23:29:20.770 X <0000000c>FolderTransfer[RUNNING] - Destroyed 23:29:20.770 < <00000009>ansar - Received Completed from <0000000c> 23:29:20.770 ^ <00000009>ansar - Completed transfer to "/home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/model/server-0" 23:29:20.770 ^ <00000009>ansar - Restoring 1 stopped roles 23:29:20.773 + <0000000e>Process[INITIAL] - Created by <00000009> 23:29:20.773 < <0000000e>Process[INITIAL] - Received Start from <00000009> .. 
23:29:20.884 < <00000008>start_vector - Received Completed from <00000009> 23:29:20.884 X <00000008>start_vector - Destroyed $ ansar status server-0 $ ansar run client-0 --code-path=. --test-run role "client-0" (pass/fail): 2/1 /home/dennis/gh/multi-processing-is-more-than-forking/client.py:61 - Expected b'droll' for b'fly', got b'fly' There is now a single failed test - "explain" is successfully being mapped to "define". Copying a JSON encoding to the ``testing/model-by-role/server-0`` folder and running the :ref:`ansar deploy ` command installs the ``word-map.json`` database within the operational ``server-0``. For this demonstration the ``server-0`` role was started before the deploy and test run commands. This was to highlight the awareness of the current operational state and the degree of automation. Internal phases include; * the detection of changes within the snapshot, * evaluation of affected roles, * detection of those affected roles that are also operational, * terminations as necessary, * copying of changes, * and restoring any terminated roles. Deploying of materials from a snapshot such as ``testing`` into an active home is optimized to the least I/O possible. Source and destination areas are compared, resulting in a sequence of delta operations. Once any active roles are terminated the deltas are executed, bringing the home in-sync with the external snapshot. With all the pieces in their correct places, fixing the last remaining failed test is simple; .. code-block:: $ vi +61 testing/model-by-role/server-0/word-map.json $ cat testing/model-by-role/server-0/word-map.json { "value": [ ["explain", "define"], ["fly", "droll"] ] } $ ansar -f deploy --storage-path=testing $ ansar run client-0 --code-path=. --test-run role "client-0" (pass/fail): 3/0 $ A Streamlined, Multi-Process Development Loop ============================================= The previous section was a tour through the commands used for the development of a multi-process solution. 
It also introduced a variation in that basic workflow, by starting ``server-0`` and then running the ``test-client`` separately. This section elaborates on that variation to maximize the benefits of :ref:`ansar deploy ` and packages the related commands into a standard makefile.

Start With Nothing
------------------

.. code-block::
    :emphasize-lines: 1

    $ make clean
    rm -f dist/analyzer dist/busy dist/client dist/factorial dist/noop dist/server dist/snooze dist/zombie
    rm -rf testing
    ansar -f destroy

The ``clean`` target deletes build artefacts, the extracted snapshot and, lastly, the composition of processes from the filesystem. Included in that last operation is a termination of any lingering processes, i.e. the non-testing, operational roles.

Create The Multi-Process Configuration
--------------------------------------

.. code-block::
    :emphasize-lines: 1

    $ make home
    pyinstaller --onefile --log-level ERROR -p . analyzer.py
    pyinstaller --onefile --log-level ERROR -p . busy.py
    ..
    ansar create
    ansar deploy dist
    ansar add server
    ansar add client test-client
    ansar add zombie
    ansar set retry test-client --encoding-file=client-retry
    ansar snapshot testing

The ``home`` target arranges everything such that roles are ready to be executed. This time around creation *does not* involve the redirect of the bin folder. Instead, executables are deployed to the home with the ``ansar deploy dist`` command.

Copying executables from one folder to another might seem like a burden. In practice the runtime overhead does not intrude heavily on the workflow. Copying can be significantly minimized. Given two folders of executables (a source and a destination) it is possible to calculate a delta and perform optimal updates.

The arrangement also separates the build chain from the operational home. Full discussion of build chains, software pipelines and repo management (mono-repo vs poly-repo) is beyond the scope of this document. Having the 2 approaches available (i.e.
``--redirect-bin`` and ``ansar deploy``) improves the possibility of integration. Establish An Operational State ------------------------------ .. code-block:: :emphasize-lines: 1 $ make start ansar -f start "test-.*" --invert-search The ``start`` target initiates those roles that are being tested, rather than those roles that perform the testing. The home is now ready for test runs. Begin Development - What Needs Doing ------------------------------------ .. code-block:: :emphasize-lines: 1 $ make ansar --force --debug-level=CONSOLE deploy dist testing 00:05:01.532 ^ <00000009>ansar - Nothing to deploy ansar run "test-.*" --code-path=. --test-run role "test-client" (pass/fail): 0/3 /home/dennis/gh/multi-processing-is-more-than-forking/client.py:51 - Expected b'eager' for b'fervent', got b'fervent' /home/dennis/gh/multi-processing-is-more-than-forking/client.py:56 - Expected b'define' for b'explain', got b'explain' /home/dennis/gh/multi-processing-is-more-than-forking/client.py:61 - Expected b'droll' for b'fly', got b'fly' make: *** [Makefile:64: test] Error 1 Omitting the target is a synonym for ``make test``. An ``ansar deploy`` command checks build artefacts and the file materials under testing, for any necessary updates - the answer in this case is no. It then performs a test run including only those roles that generate test reports. There are 3 familiar failed tests and the ``make`` itself terminates with an error, i.e. ``--test-run`` affects the exit code of the ``ansar`` command. Deploying An Operational File ----------------------------- .. 
code-block:: :emphasize-lines: 1 $ cp word-map.json testing/model-by-role/server-0 $ make ansar --force --debug-level=CONSOLE deploy dist testing 00:05:23.002 ^ <00000009>ansar - Detected 1 model changes for "server-0" 00:05:23.003 ^ <00000009>ansar - Detect status of associated roles (server-0) 00:05:23.003 ^ <00000009>ansar - Stop roles (server-0) 00:05:23.003 ^ <00000009>ansar - Poll for termination 00:05:24.253 ^ <00000009>ansar - Detect status of associated roles (server-0) 00:05:24.261 ^ <00000009>ansar - Starting transfer of materials 00:05:24.262 ^ <0000000d>folder_transfer - File transfer (1 deltas) to /home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/model/server-0 00:05:24.263 ^ <0000000d>folder_transfer - Move 1 aliases to targets 00:05:24.264 ^ <0000000d>folder_transfer - Clear 0 aliases 00:05:24.264 ^ <00000009>ansar - Completed transfer to "/home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/model/server-0" 00:05:24.264 ^ <00000009>ansar - Restoring 1 stopped roles ansar run "test-.*" --code-path=. --test-run role "test-client" (pass/fail): 1/2 /home/dennis/gh/multi-processing-is-more-than-forking/client.py:51 - Expected b'eager' for b'fervent', got b'fervent' /home/dennis/gh/multi-processing-is-more-than-forking/client.py:61 - Expected b'droll' for b'fly', got b'fly' make: *** [Makefile:64: test] Error 1 Installing the ``word-map.json`` clears one of the failing tests. Editing Source Code ------------------- .. code-block:: :emphasize-lines: 1 $ vi +51 client.py $ make pyinstaller --onefile --log-level ERROR -p . 
client.py ansar --force --debug-level=CONSOLE deploy dist testing 00:07:21.597 ^ <00000009>ansar - Detected 1 changing executables (added 1 associated roles) 00:07:21.597 ^ <00000009>ansar - Detect status of associated roles (test-client) 00:07:21.606 ^ <00000009>ansar - Starting transfer of materials 00:07:21.606 ^ <0000000c>folder_transfer - File transfer (1 deltas) to /home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/bin 00:07:21.613 ^ <0000000c>folder_transfer - Move 1 aliases to targets 00:07:21.614 ^ <0000000c>folder_transfer - Clear 0 aliases 00:07:21.615 ^ <00000009>ansar - Completed transfer to "/home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/bin" ansar run "test-.*" --code-path=. --test-run role "test-client" (pass/fail): 2/1 /home/dennis/gh/multi-processing-is-more-than-forking/client.py:61 - Expected b'droll' for b'fly', got b'fly' make: *** [Makefile:64: test] Error 1 Editing the test client clears the second failing test. Modifying An Operational File ----------------------------- .. 
code-block:: :emphasize-lines: 1 $ vi testing/model-by-role/server-0/word-map.json $ make ansar --force --debug-level=CONSOLE deploy dist testing 00:08:04.145 ^ <00000009>ansar - Detected 1 model changes for "server-0" 00:08:04.146 ^ <00000009>ansar - Detect status of associated roles (server-0) 00:08:04.146 ^ <00000009>ansar - Stop roles (server-0) 00:08:04.146 ^ <00000009>ansar - Poll for termination 00:08:05.396 ^ <00000009>ansar - Detect status of associated roles (server-0) 00:08:05.427 ^ <00000009>ansar - Starting transfer of materials 00:08:05.428 ^ <0000000d>folder_transfer - File transfer (1 deltas) to /home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/model/server-0 00:08:05.429 ^ <0000000d>folder_transfer - Move 1 aliases to targets 00:08:05.430 ^ <0000000d>folder_transfer - Clear 0 aliases 00:08:05.430 ^ <00000009>ansar - Completed transfer to "/home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/model/server-0" 00:08:05.430 ^ <00000009>ansar - Restoring 1 stopped roles ansar run "test-.*" --code-path=. --test-run $ make ansar --force --debug-level=CONSOLE deploy dist testing 00:08:26.570 ^ <00000009>ansar - Nothing to deploy ansar run "test-.*" --code-path=. --test-run $ ansar status -l server-0 <960962> 17.2s zombie-0 <960776> 1m46.3s Adding another word mapping to ``word-map.json`` clears the last failing test. The ``make`` command is completing without error, i.e. the ``test-run`` is now producing a zero exit code. Repeating the ``make`` command shows the ``ansar deploy`` command detecting that nothing significant has changed. Lastly, the :ref:`ansar status ` is used to show that ``zombie-0`` has been left undisturbed - its runtime is significantly longer than the ``server-0`` runtime - through all the changes to code and application files. 
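The "Nothing to deploy" outcome depends on comparing the snapshot against the home before any files are moved. The sketch below shows one way such a comparison could be made, using content hashes from the standard library. It is an assumption for illustration only; the method ansar actually uses is not described here, and ``folder_digest`` and ``folder_delta`` are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def folder_digest(root: Path) -> Dict[str, str]:
    """Map each file (path relative to root) to a SHA-256 digest of its content."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def folder_delta(source: Path, target: Path) -> List[Tuple[str, str]]:
    """List the operations that would bring target in line with source."""
    src, dst = folder_digest(source), folder_digest(target)
    delta = []
    for name in sorted(src):
        if name not in dst:
            delta.append(("add", name))        # present in source only
        elif src[name] != dst[name]:
            delta.append(("update", name))     # content differs
    for name in sorted(dst):
        if name not in src:
            delta.append(("remove", name))     # no longer in source
    return delta
```

An empty list corresponds to the "Nothing to deploy" case; a non-empty list identifies the files, and therefore the roles, that a deploy would need to touch.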
Summary Of The Multi-Process Development Loop
---------------------------------------------

Development activity is reduced to making the necessary changes and running the ``make`` command. This inherently allows the developer to focus on the problems and potential fixes, without the distraction of correctly propagating changes on every iteration of the loop. That propagation can involve multiple source files, the building of executables, the deployment of application files and the management of operational processes.

Adding Some Workload
====================

Simulation of operational workloads can be difficult. In the case of network servers there is the immediate difficulty that multiple client processes will be needed. Ansar supports the creation of multiple processes with a single command. More accurately, this is the creation of multiple roles for the one common executable.

.. code-block::
    :emphasize-lines: 1

    $ ansar add client 'test-client-{number}' --start=10 --count=40
    $ ansar set retry 'test-client-\d\d' --encoding-file=client-retry
    $ ansar list
    server-0
    test-client
    test-client-10
    test-client-11
    test-client-12
    test-client-13
    test-client-14
    test-client-15
    ..
    test-client-49
    zombie-0
    $ ansar get retry test-client-49
    {
        "value": {
            "first_steps": [
                1.0,
                2.0,
                4.0
            ]
        }
    }

When adding a role, the role name is optional and often omitted, i.e. adding a role for the ``client`` executable results in the default name ``client-0``. This is because the role name is assumed to be a template to be expanded to the final name. Templates are permitted to reference the variables ``executable`` and ``number``, where the former expands to the executable being added and the latter expands to a runtime integer. Internally, the add command performs a loop based on ``start`` and ``count`` variables, which default to 0 and 1 respectively. For each iteration it expands the template passing the current loop index as ``number`` before populating the home with the new definition.
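The expansion loop described above can be sketched in Python. The sketch assumes the template placeholders behave like ``str.format`` fields, which the brace syntax suggests but the text does not confirm; ``expand_roles`` is a hypothetical name, not an ansar function.

```python
from typing import List


def expand_roles(
    executable: str,
    template: str = "{executable}-{number}",
    start: int = 0,
    count: int = 1,
) -> List[str]:
    """Expand a role-name template once per loop index.

    Hypothetical helper; the defaults follow the values given in the text
    (start of 0, count of 1, template of "{executable}-{number}").
    """
    return [
        template.format(executable=executable, number=number)
        for number in range(start, start + count)
    ]


if __name__ == "__main__":
    print(expand_roles("client"))
    print(expand_roles("client", "test-client-{number}", start=10, count=40)[-1])
```

With the defaults the single result is ``client-0``. With a start of 10 and a count of 40 the names run from ``test-client-10`` to ``test-client-49``, matching the listing above.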
By setting the ``start`` and ``count`` values on the command line, a single add command can create multiple roles. The default role name is ``{executable}-{number}``. The :ref:`ansar add ` command above adds 40 new roles with names like ``test-client-49``. The ``ansar set`` command caters to these situations by accepting a search pattern and applying the change to each match. Setting the retry property is probably more important in this scenario - multiple clients will be clamouring for the attention of the lone server. .. code-block:: :emphasize-lines: 1 $ make ansar --force --debug-level=CONSOLE deploy dist testing 02:28:17.017 ^ <00000009>ansar - Nothing to deploy ansar --debug-level=DEBUG run "test-.*" --code-path=. --test-run 02:28:17.317 + <00000008>start_vector - Created by <00000001> 02:28:17.317 ~ <00000008>start_vector - Executable "/home/dennis/gh/multi-processing-is-more-than-forking/.dev/bin/ansar" as process (964752) 02:28:17.317 ~ <00000008>start_vector - Working folder "/home/dennis/gh/multi-processing-is-more-than-forking" 02:28:17.317 ~ <00000008>start_vector - Running object "ansar.command.ansar_command.ansar" 02:28:17.317 ~ <00000008>start_vector - Class threads (1) "retries" (1) 02:28:17.317 + <00000009>ansar - Created by <00000008> 02:28:17.317 ~ <00000009>ansar - Call the sub-command function 02:28:17.318 ^ <00000009>ansar - Detect status of associated roles (test-client-27, test-client-15, test-client-38, test-client-24, test-client-23, test-client-36, test-client-18, test-client-31, test-client-32, test-client-43, test-client-35, test-client-19, test-client-41, test-client-49, test-client-10, test-client-28, test-client-11, test-client-48, test-client-39, test-client-44, test-client-40, test-client-22, test-client-45, test-client-21, test-client-30, test-client-46, test-client-26, test-client-20, test-client-37, test-client-47, test-client-25, test-client-17, test-client-33, test-client-13, test-client-34, test-client, test-client-12,
test-client-16, test-client-42, test-client-14, test-client-29) 02:28:17.318 + <0000000a>lock_and_hold - Created by <00000009> 02:28:17.318 > <0000000a>lock_and_hold - Sent Ready to <00000009> 02:28:17.318 + <0000000b>lock_and_hold - Created by <00000009> 02:28:17.319 + <0000000c>lock_and_hold - Created by <00000009> .. 02:28:17.372 X <00000014>lock_and_hold - Destroyed 02:28:17.373 < <00000009>ansar - Received Completed from <00000014> 02:28:17.373 X <0000000f>lock_and_hold - Destroyed 02:28:17.373 < <00000009>ansar - Received Completed from <0000000f> 02:28:17.373 ^ <00000009>ansar - Running "test-.*" (.ansar-home) 02:28:17.373 + <00000033>Process[INITIAL] - Created by <00000009> 02:28:17.373 < <00000033>Process[INITIAL] - Received Start from <00000009> 02:28:17.373 ~ <00000033>Process[INITIAL] - Execute /home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/bin/client --call-signature=o --point-of-origin=1 --home-path=/home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home --role-name=test-client-27 02:28:17.373 ( <00000033>Process[INITIAL] - Started process (964802) 02:28:17.373 + <00000034>wait - Created by <00000033> 02:28:17.374 + <00000035>Process[INITIAL] - Created by <00000009> 02:28:17.374 < <00000035>Process[INITIAL] - Received Start from <00000009> 02:28:17.374 ~ <00000035>Process[INITIAL] - Execute /home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/bin/client --call-signature=o --point-of-origin=1 --home-path=/home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home --role-name=test-client-15 02:28:17.374 ( <00000035>Process[INITIAL] - Started process (964804) 02:28:17.374 + <00000036>wait - Created by <00000035> 02:28:17.374 + <00000037>Process[INITIAL] - Created by <00000009> 02:28:17.374 < <00000037>Process[INITIAL] - Received Start from <00000009> 02:28:17.374 ~ <00000037>Process[INITIAL] - Execute /home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/bin/client --call-signature=o 
--point-of-origin=1 --home-path=/home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home --role-name=test-client-38 02:28:17.374 ( <00000037>Process[INITIAL] - Started process (964806) .. 02:28:18.208 < <00000069>Process[EXECUTING] - Received Completed from <00000083> 02:28:18.208 ) <00000069>Process[EXECUTING] - Process (965116) ended with 0 02:28:18.208 X <00000069>Process[EXECUTING] - Destroyed 02:28:18.208 < <00000009>ansar - Received Completed from <00000069> 02:28:18.208 ^ <00000009>ansar - Completion for "test-client-14" () 02:28:18.210 X <00000084>wait - Destroyed 02:28:18.211 < <0000006a>Process[EXECUTING] - Received Completed from <00000084> 02:28:18.211 ) <0000006a>Process[EXECUTING] - Process (965123) ended with 0 02:28:18.211 X <0000006a>Process[EXECUTING] - Destroyed 02:28:18.211 < <00000009>ansar - Received Completed from <0000006a> 02:28:18.211 ^ <00000009>ansar - Completion for "test-client-29" () role "test-client-15" (pass/fail): 3/0 role "test-client-35" (pass/fail): 3/0 role "test-client-19" (pass/fail): 3/0 role "test-client-24" (pass/fail): 3/0 role "test-client-43" (pass/fail): 3/0 role "test-client-31" (pass/fail): 3/0 role "test-client-36" (pass/fail): 3/0 role "test-client-27" (pass/fail): 3/0 role "test-client-23" (pass/fail): 3/0 role "test-client-41" (pass/fail): 3/0 role "test-client-10" (pass/fail): 3/0 role "test-client-38" (pass/fail): 3/0 .. role "test-client" (pass/fail): 3/0 role "test-client-13" (pass/fail): 3/0 role "test-client-12" (pass/fail): 3/0 role "test-client-16" (pass/fail): 3/0 role "test-client-42" (pass/fail): 3/0 role "test-client-14" (pass/fail): 3/0 role "test-client-29" (pass/fail): 3/0 02:28:18.272 X <00000009>ansar - Destroyed 02:28:18.272 < <00000008>start_vector - Received Completed from <00000009> 02:28:18.272 X <00000008>start_vector - Destroyed The same ``make`` command now runs a much more ambitious test workload. 
All the test clients completed successfully, passing their respective ``TestReports`` back to the run command. Those reports show that the new test run encountered zero failures. This is the expected result, given that all test failures had already been cleared and every instance of ``test-client-xx`` is a replica of a test process that was already passing.
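
The numbered role names and the ``"test-.*"`` search pattern seen in the log can be illustrated with a small standalone sketch. This is not the ansar implementation. The helper names ``role_names`` and ``select_roles`` are hypothetical, the base name ``test-client`` and the numbers 10 to 49 are read off the log, and whole-name regular expression matching is an assumption about how a search pattern might be applied.

```python
import re


def role_names(base, start, count):
    """Return names of the form base-number for count consecutive numbers."""
    return [f"{base}-{number}" for number in range(start, start + count)]


def select_roles(pattern, roles):
    """Return the roles whose whole name matches the regular expression.

    Assumption: the pattern must match the entire role name.
    """
    compiled = re.compile(pattern)
    return [role for role in roles if compiled.fullmatch(role)]


# The original role plus 40 numbered roles, mirroring the names in the log.
roles = ["test-client"] + role_names("test-client", start=10, count=40)

# One pattern selects every test role, numbered or not.
selected = select_roles("test-.*", roles)
print(len(selected), selected[0], selected[-1])
```

A single pattern covering all 41 names is what lets one command operate on the whole group, rather than one command per role.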