.. _multi-processing-is-more-than-forking:

Multi-Processing Is More Than Forking
#####################################

Multi-threading and multi-processing are two techniques that can be applied to challenges
such as concurrency. Within the Python ecosystem there is extra motivation to consider multi-processing
due to the internal interpreter architecture (i.e. the GIL).

However, multi-processing comes with obvious and less obvious difficulties. Creation and management
of processes (e.g. detection of termination) is certainly different to the creation and management of
threads, and any transfer of information now has to cross an inter-process boundary. But even after
solving these non-trivial problems there is a much larger one to consider.

.. note::

   Source files appearing in this section can be downloaded from `here <https://github.com/mr-ansar/multi-processing-is-more-than-forking>`_.
   The repo ``Makefile`` contains the setup needed for this guide. Related background information can be
   found :ref:`here<python-scripts-as-executables>`.

Beyond Forking
**************

Adoption of async programming techniques and the :class:`~.processing.Process` object type provide
the means to develop sophisticated, multi-processing software. This is programmatic, *parent* process
control over a set of *child* processes. A custom piece of software creates and manages a specific,
logically associated set of processes.

There are also a significant number of scenarios where a more generic tool would be good enough. A tool
that can load a description of the processes needed would solve a lot of development - and possibly
operational - requirements. Not having to write that custom supervisor process each time is a compelling
thought. A single command would start all the processes in the description and a single command would
stop them.

A difficulty hidden within this generally good idea is that most substantial applications require runtime
resources such as disk space for configuration files, logs and perhaps a database. Network ports are also
an issue. These resource requirements are problematic because groups of processes will often include
multiple instances of a common executable. Imagine an executable that controls a robotic arm. It would
be entirely plausible to include two instances of the executable with different configurations such that
the 2 processes behave like left and right arms. The 2 instances need distinct runtime environments. By
default, copies of the same executable are not good at sharing.

This is what a *container* does for these situations. Products like `docker <https://www.docker.com/>`_
use facilities in the underlying operating system (e.g. `cgroups <https://en.wikipedia.org/wiki/Cgroups>`_)
to tackle the same essential problem. It can also be said that containers are a light version
of `virtualization <https://en.wikipedia.org/wiki/Virtualization>`_, a technology delivered by
products such as `VirtualBox <https://www.virtualbox.org/>`_. These products solve a common problem - how
to run multiple copies of software that is not otherwise capable of being in the same space.

Processes With Dual Modes
=========================

An executable based on :func:`~.framework.create_object` has 2 distinct runtime modes. By default it
runs as a process within the host operating system, on behalf of the current user. In this mode it is
said to be running as a *tool* or *utility*. It typically loads its configuration from a file under
the current user's ``$HOME`` folder. Multiple instances of the same executable running for the same user
will load the same configuration.

Passing a few special arguments to that same executable at start time enables the second
runtime mode. Those special arguments include a location and a name, enough information
for :func:`~.framework.create_object` to recover a disk management *context*. In this mode it is said
to be running as a *component*. That disk management context contains private disk resources
such as configuration files, database areas and space for logging. The effect is that processes
running as components can happily co-exist in a group. Configuration can be modified on a per-process
basis and each process has its own logs.
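
The idea of a private disk area per component can be sketched with the standard library alone. The
folder names (``settings``, ``logs``, ``data``) and the ``make_context`` helper are assumptions made
for illustration; they are not the layout that :func:`~.framework.create_object` actually uses.

.. code-block:: python

   import tempfile
   from pathlib import Path

   def make_context(home: Path, role: str) -> dict:
       """Create private folders for one role and return their paths."""
       base = home / role
       # Hypothetical areas; the real context layout is not shown in this guide.
       areas = {name: base / name for name in ("settings", "logs", "data")}
       for path in areas.values():
           path.mkdir(parents=True, exist_ok=True)
       return areas

   with tempfile.TemporaryDirectory() as tmp:
       home = Path(tmp) / "toy-robot"
       left = make_context(home, "left-arm")
       right = make_context(home, "right-arm")
       # Same executable, two roles, no shared folders.
       print(left["logs"] != right["logs"])

Because every path is derived from the pair of location and name, two instances of one executable
never write to the same place.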

Creation of the contexts and passing the correct, associated arguments to executables is
not something that should be attempted manually. This is a space intended for programmatic
automation. Manual initiation of processes is intended for "tool" mode.

A nice by-product of this disk management is that there can be many groups of processes on
the same host - just at different locations (i.e. folders). It's even possible to have multiple
copies of the same group running on the same host, but in different folders. It should be noted
that this does not address the wider issue of global resources. Management of resources such as
network ports is outside the scope of this document.

A Tool For Practical Multi-Processing
=====================================

The generic tool that brings everything together is the ``ansar`` command-line utility that
comes with the **ansar-create** library. This single tool allows for the persistent description
of a group of processes. Entries can be added, the contents of the group can be listed, members
of the group can be updated and entries can be deleted.

Technically those CRUD operations are manipulating descriptive information - there are no
platform processes being created or terminated during those operations. Quite separately the
``ansar`` tool also provides the ability to "start the group" or even a subset of the group.
The current status of the group can be listed (i.e. print a table of currently running processes)
and there is a "stop the group" operation. All processes started by the ``ansar`` tool are
running in *component* mode within a disk management *context*. Lock files are used to prevent
multiple copies of group processes.
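
The lock file technique mentioned above can be sketched as follows. This is a minimal illustration
of exclusive file creation, not the locking code inside ``ansar``; the file name and the helper
functions are hypothetical, and stale locks left behind by a crash are not handled.

.. code-block:: python

   import os
   import tempfile

   def acquire_lock(path):
       """Return True if the lock file was created, False if it already exists."""
       try:
           # O_EXCL makes creation fail atomically when the file is already present.
           fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
       except FileExistsError:
           return False
       with os.fdopen(fd, "w") as f:
           f.write(str(os.getpid()))   # record the holder for diagnostics
       return True

   def release_lock(path):
       """Remove the lock file so that another process may take the role."""
       os.remove(path)

   with tempfile.TemporaryDirectory() as tmp:
       lock = os.path.join(tmp, "zombie-0.lock")   # hypothetical name
       print(acquire_lock(lock))   # first holder succeeds
       print(acquire_lock(lock))   # second attempt is refused
       release_lock(lock)

Storing the process id in the file is one way to support an error message that names the process
currently holding the lock.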

A Quick Tour
************

A small collection of processes is used to demonstrate the use of the ``ansar`` tool. These are:

.. list-table::
   :widths: 25 75

   * - ``noop.py``
     - *does nothing and exits immediately*
   * - ``snooze.py``
     - *waits for a configured amount of time*
   * - ``zombie.py``
     - *does nothing until interrupted*
   * - ``factorial.py``
     - *calculates factorial(n) using recursive processes*
   * - ``busy.py``
     - *starts a tree of sub-processes*
   * - ``server.py``
     - *a very basic, sockets-based network server*
   * - ``client.py``
     - *a very basic, sockets-based network client*
   * - ``analyzer.py``
     - *a custom test analysis*

An Ansar Command Line
=====================

The general layout of an ansar command appears below:

   $ ansar \[--<*ansar-setting*\ >=<*value*\ > ..\] <*sub-command*\ > \[-<*sub-setting*\ >=<*value*\ > ..\] \[*word* ..\]

Each command involves a *sub-command*, optional *settings* and *words*. Settings appearing before the sub-command are more
general and associated with ansar, while those appearing after are associated with the specific sub-command. The optional list
of words is also associated with the sub-command.
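
The layout can be made concrete with a small splitter. This is a sketch of the positional rule
described above (settings before the sub-command belong to ansar, those after belong to the
sub-command); ``split_command`` is a hypothetical helper and says nothing about how ``ansar``
parses its own arguments.

.. code-block:: python

   def split_command(argv):
       """Split arguments into (ansar_settings, sub_command, sub_settings, words)."""
       ansar_settings, sub_settings, words = {}, {}, []
       sub_command = None
       for arg in argv:
           if arg.startswith("-"):
               name, _, value = arg.lstrip("-").partition("=")
               target = ansar_settings if sub_command is None else sub_settings
               # A bare flag such as -f carries no value; record it as True.
               target[name] = value if value else True
           elif sub_command is None:
               sub_command = arg
           else:
               words.append(arg)
       return ansar_settings, sub_command, sub_settings, words

   print(split_command(["-f", "destroy"]))
   print(split_command(["add", "robot-arm", "left-arm", "toy-robot", "--rotation=-90.0"]))

The first call places ``f`` among the ansar settings; the second yields ``add`` as the sub-command,
a ``rotation`` sub-setting and three words.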

A Sub-Command Example
=====================

Consider the following description of the ``create`` sub-command:

   $ ansar create \[<*home-path*\ >\] \[--redirect-<*name*\ >=<*path*\ > …]

The sub-command creates a *home*. It accepts an optional *home-path* as the location of the new home and an optional list of
redirection settings. The home is subsequently configured with a list of processes to execute, and provides areas where each
of those processes can store operational materials, e.g. logs.

.. code-block::

   $ ansar create --redirect-bin=dist
   $ ls -la
   total 220
   drwxrwxr-x 10 dennis dennis  4096 Mar 25 04:02 .
   drwxrwxr-x 12 dennis dennis  4096 Mar 21 14:32 ..
   drwxrwxr-x 10 dennis dennis  4096 Mar 25 04:02 .ansar-home
   drwxrwxr-x  2 dennis dennis  4096 Mar 24 16:34 dist
   -rw-rw-r--  1 dennis dennis  1420 Mar 15 13:52 noop.py
   ..

In the absence of an explicit *home-path*, the path for this home has defaulted to ``.ansar-home`` in the current folder. A
redirection of bin to the ``./dist`` folder (i.e. the default output folder for ``pyinstaller`` executables) has been recorded
within the new home. All executables named in subsequent ansar commands will be expected to exist in ``./dist``.

.. note::

   Without the redirection of the ``bin`` folder, executables are expected to exist in a dedicated home sub-folder. In
   these scenarios executables are transferred to that dedicated folder using the ``deploy`` sub-command. Refer to
   later sections for further details.

Basic Behaviour
===============

The commands used to populate a home with process definitions and create an operational set of those processes are
covered in the following sections.

Add A Process And Run It
------------------------

.. code-block::

   $ ansar add noop
   $ ansar list
   noop-0
   $ ansar run --debug-level=CONSOLE
   19:40:25.847 ^ <00000009>noop - Log this and exit
   {
       "value": [
           "ansar.command.ansar_command.Run",
           {
               "completed": [
                   [
                       "noop-0",
                       [
                           "ansar.create.lifecycle.Ack",
                           {},
                           []
                       ]
                   ]
               ],
               "home": ".ansar-home"
           },
           []
       ]
   }
   $

The ``add`` sub-command is used to add an instance of an executable to the default home. Technically, it adds a
description of an instance, i.e. there are no new platform processes created by this command. The current set of
descriptions are listed to confirm the new entry and then the entire list (i.e. the single instance of noop) is
executed, using the ``run`` sub-command.

The requested logging (``--debug-level=CONSOLE``) is placed on ``stderr`` and the output from the ``run``
command is placed on ``stdout``. The ``noop`` process logged its efforts and returned an :class:`~.lifecycle.Ack`
object to the run command. A full transcript of the console-level logging is included (minus the process id and
full timestamps) for this first example; subsequent examples will omit logging that is not relevant to the
demonstration.

Each instance of an executable is known by a *role* - a short name that describes the part the instance plays
within the collection. In the above command both the *role* and the *home* have assumed default values. The role
defaults to the name of the executable with a small suffix appended to it; the reasons for this behaviour will
become clear in later sections.
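
One way to picture the default role is a counter appended to the executable name. The sketch below
assumes that the suffix is the lowest unused number; the transcripts in this guide only ever show a
``-0`` suffix, so the behaviour for a second instance is a guess.

.. code-block:: python

   def default_role(executable, existing):
       """Return executable-N for the lowest N not already in use (assumed rule)."""
       n = 0
       while f"{executable}-{n}" in existing:
           n += 1
       return f"{executable}-{n}"

   roles = []
   for name in ("noop", "snooze", "noop"):
       roles.append(default_role(name, roles))
   print(roles)

Under that assumption a second ``noop`` would become ``noop-1``, keeping every role in the home
unique.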

A more explicit use of the add command looks like this::

   $ ansar add robot-arm left-arm toy-robot --rotation=-90.0

This command says to add the ``left-arm`` instance of the ``robot-arm`` executable to the ``toy-robot`` home. The
rotation setting for the new instance is initialized to -90.0.

Add A Process With Persistent Settings
--------------------------------------

.. code-block::

   $ ansar add snooze
   $ ansar list
   noop-0
   snooze-0
   $ ansar run --debug-level=DEBUG
   01:33:10.063 + <00000007>lock_and_hold - Created by <00000001>
   01:33:10.064 > <00000007>lock_and_hold - Sent Ready to <00000001>
   01:33:10.064 + <00000007>lock_and_hold - Created by <00000001>
   01:33:10.064 + <00000008>start_vector - Created by <00000001>
   01:33:10.064 > <00000007>lock_and_hold - Sent Ready to <00000001>
   01:33:10.064 ~ <00000008>start_vector - .. "../dist/snooze" ..
   01:33:10.064 ~ <00000008>start_vector - Working folder ..
   01:33:10.064 ~ <00000008>start_vector - .. "__main__.snooze"
   01:33:10.064 ~ <00000008>start_vector - Class threads ..
   01:33:10.064 + <00000009>snooze - Created by <00000008>
   01:33:10.065 ^ <00000009>snooze - Do nothing for 2.0 seconds
   01:33:10.065 > <00000009>snooze - Sent StartTimer to <00000003>
   01:33:10.065 + <00000008>start_vector - Created by <00000001>
   01:33:10.065 ~ <00000008>start_vector - .. "../dist/noop" ..
   01:33:10.065 ~ <00000008>start_vector - Working folder ..
   01:33:10.065 ~ <00000008>start_vector - .. "__main__.noop"
   01:33:10.065 ~ <00000008>start_vector - Class threads ..
   01:33:10.065 + <00000009>noop - Created by <00000008>
   01:33:10.065 ^ <00000009>noop - Log this and exit
   01:33:10.065 X <00000009>noop - Destroyed
   01:33:10.065 < <00000008>start_vector - Received Completed ..
   01:33:10.065 X <00000008>start_vector - Destroyed
   01:33:10.165 < <00000007>lock_and_hold - Received Stop ..
   01:33:10.170 X <00000007>lock_and_hold - Destroyed
   01:33:12.067 < <00000009>snooze - Received T1 from <00000003>
   01:33:12.067 X <00000009>snooze - Destroyed
   01:33:12.067 < <00000008>start_vector - Received Completed ..
   01:33:12.067 X <00000008>start_vector - Destroyed
   01:33:12.068 < <00000007>lock_and_hold - Received Stop ..
   01:33:12.071 X <00000007>lock_and_hold - Destroyed
   $

A second role is created with the ``snooze`` executable. The home now has 2 entries. The ``run`` command - by
default - starts all the processes described in the collection and waits for them all to complete. With the
addition of a ``snooze``, that completion now takes a few seconds. Logging still shows the immediate exit of
the ``noop`` command. Both start at around the 10.064 mark and the ``noop`` object is destroyed within a
millisecond or so. The timer for ``snooze`` is not received until 12.067, about 2 seconds after the object was created.
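
The "start everything, then wait for everything" behaviour can be imitated with ``subprocess``.
This is a sketch of the general pattern only; the two child commands are stand-ins for ``noop`` and
``snooze`` and ``run_all`` is a hypothetical helper.

.. code-block:: python

   import subprocess
   import sys
   import time

   def run_all(commands):
       """Start every command, then wait for all of them; return exit codes."""
       children = [subprocess.Popen(cmd) for cmd in commands]
       return [child.wait() for child in children]

   quick = [sys.executable, "-c", "pass"]                         # exits at once
   slow = [sys.executable, "-c", "import time; time.sleep(0.5)"]  # pauses first

   began = time.monotonic()
   codes = run_all([quick, slow])
   elapsed = time.monotonic() - began
   print(codes, elapsed >= 0.5)

The run as a whole takes as long as its slowest member, which is why adding ``snooze`` stretches
the completion time even though ``noop`` still exits immediately.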

.. code-block::

   $ ansar update snooze-0 --seconds=5.0
   $ ansar run --debug-level=CONSOLE
   19:49:35.903 ^ <00000009>snooze - Do nothing for 5.0 seconds
   19:49:35.903 ^ <00000009>noop - Log this and exit
   ..

The ``seconds`` setting for the ``snooze-0`` instance is assigned a longer value. Another run shows the ``snooze``
command behaving accordingly.

Add A Process That Never Wants To Terminate
-------------------------------------------

.. code-block::

   $ ansar add zombie
   $ ansar list
   noop-0
   snooze-0
   zombie-0
   $ ansar run --debug-level=CONSOLE
   19:53:30.562 ^ <00000009>snooze - Do nothing for 5.0 seconds
   19:53:30.563 ^ <00000009>noop - Log this and exit
   19:53:30.563 ^ <00000009>zombie - Do nothing until interrupted
   ^C{
   ..
   $

Adding an instance of ``zombie`` changes an essential behaviour of the collection; it no longer self-terminates.
Both ``noop`` and ``snooze`` eventually terminate if given enough time, but user intervention is required to
terminate ``zombie`` and the run inherits that requirement.

.. note::

   Control-c is the standard command-line mechanism for terminating long running processes.
   In a standard async process, a control-c is caught by :func:`~.framework.create_object` and converted
   to an :class:`~.lifecycle.Stop` message. In response, every async process is expected to terminate
   gracefully. A control-c is caught by the ansar command and propagated to all its children,
   resulting in the shutdown of the run. The action also injects a circumflex-cee (^C) into the
   terminal output, disrupting the logging. Logs redirected to a file will not include
   that disruption.
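
The conversion of a control-c into a request to stop can be sketched with the standard ``signal``
module. This illustrates the general idea only; the event object stands in for
the :class:`~.lifecycle.Stop` message and ``install_stop_handler`` is a hypothetical name.

.. code-block:: python

   import signal
   import threading

   def install_stop_handler():
       """Route control-c to an event instead of a KeyboardInterrupt."""
       stop = threading.Event()

       def on_interrupt(signum, frame):
           stop.set()

       # Handlers can only be installed from the main thread.
       signal.signal(signal.SIGINT, on_interrupt)
       return stop

   stop = install_stop_handler()
   signal.raise_signal(signal.SIGINT)   # simulate a control-c (Python 3.8+)
   print(stop.is_set())

A long-running process would wait on the event and then release its resources in an orderly way,
rather than being torn down by an exception at an arbitrary point.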

.. code-block::

   $ ansar delete zombie-0
   $ ansar list
   noop-0
   snooze-0
   $

A ``delete`` command is used to remove the ``zombie-0`` role from the home. This both
demonstrates the command and restores a more convenient behaviour for the purposes of
this tour.

Add A Process That Expects Input
--------------------------------

.. code-block::

   $ ansar add factorial
   $ ansar run --debug-level=CONSOLE
   [00438217] 2023-04-06T01:52:17.317 ^ <00000009>snooze - Do nothing ..
   [00438216] 2023-04-06T01:52:17.317 ^ <00000009>fact - factorial(5)
   [00438218] 2023-04-06T01:52:17.317 ^ <00000009>noop - Log this and exit
   [00438245] 2023-04-06T01:52:17.418 ^ <00000008>fact - factorial(4)
   [00438255] 2023-04-06T01:52:17.520 ^ <00000008>fact - factorial(3)
   [00438265] 2023-04-06T01:52:17.621 ^ <00000008>fact - factorial(2)
   [00438275] 2023-04-06T01:52:17.722 ^ <00000008>fact - factorial(1)
   [00438285] 2023-04-06T01:52:17.823 ^ <00000008>fact - factorial(0)
   {
       "value": [
           "ansar.command.ansar_command.Run",
           {
               "completed": [
               ..
                   [
                       "factorial-0",
                       [
                           "lib.factorial_if.FactorialReturned",
                           {
                               "value": 120
                           },
                           []
                       ]
                   ],
               ..
               ],
               "home": ".ansar-home"
           },
           []
       ]
   }
   $

An instance of ``factorial`` is added. This is a different executable in that it’s the first demonstration
executable to create sub-processes. It uses the ansar ability to “call” a process as if it were a function,
as a basis for a recursive implementation of the factorial function. The chain of processes can be seen in
the logs - note the [00438216] process id on the ``<00000009>fact - factorial(5)`` log and how that id and log
change as the chain extends.
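
A chain of processes of this kind can be imitated without ansar. In the sketch below every level of
the recursion is a fresh Python interpreter that starts the next level and multiplies the value it
gets back. Standard output replaces the ansar call mechanism, so this shows the shape of the
technique rather than the contents of ``factorial.py``.

.. code-block:: python

   import subprocess
   import sys

   # Source for one link in the chain: argv[1] is n, argv[2] is this source text.
   LINK = """
   import subprocess, sys
   n, source = int(sys.argv[1]), sys.argv[2]
   if n == 0:
       print(1)
   else:
       child = subprocess.run(
           [sys.executable, "-c", source, str(n - 1), source],
           capture_output=True, text=True, check=True)
       print(n * int(child.stdout))
   """.replace("\n   ", "\n")   # strip indentation if this listing was copied indented

   def factorial_by_processes(n):
       """Compute n! where every level of the recursion is a separate process."""
       top = subprocess.run(
           [sys.executable, "-c", LINK, str(n), LINK],
           capture_output=True, text=True, check=True)
       return int(top.stdout)

   print(factorial_by_processes(5))

Calling the function with 5 starts six interpreters, one for each value from 5 down to 0, which
mirrors the six ``factorial(..)`` lines in the log.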

.. code-block::

   $ ansar input factorial-0
   {
       "value": 5
   }
   $

The ``input`` command can be used to view the initial input for the named role. This is a default encoding
created during the ``add factorial`` command.

.. code-block::

   $ cat factorial-7
   {
       "value": 7
   }
   $ ansar input factorial-0 --set-file=factorial-7
   $ ansar run
   {
       "value": [
           "ansar.command.ansar_command.Run",
           {
               "completed": [
               ..
                   [
                       "factorial-0",
                       [
                           "lib.factorial_if.FactorialReturned",
                           {
                               "value": 5040
                           },
                           []
                       ]
                   ],
               ..
               ],
               "home": ".ansar-home"
           },
           []
       ]
   }
   $

The ``input`` command can also be used to modify the initial input for the named role. Use the ``--set-file``
parameter to store new initial input for a role.

Redefining The Settings For A Process
-------------------------------------

.. code-block::

   $ cat short-snooze
   {
       "value": {
           "seconds": 1.0
       }
   }
   $ ansar settings snooze-0 --set-file=short-snooze
   $ ansar run --debug-level=CONSOLE
   14:45:55.859 ^ <00000009>noop - Log this and exit
   14:45:55.859 ^ <00000009>factorial - factorial(5)
   14:45:55.859 ^ <00000009>snooze - Do nothing for 1.0 seconds
   14:45:55.961 ^ <00000008>factorial - factorial(4)
   14:45:56.062 ^ <00000008>factorial - factorial(3)
   14:45:56.162 ^ <00000008>factorial - factorial(2)
   14:45:56.263 ^ <00000008>factorial - factorial(1)
   14:45:56.363 ^ <00000008>factorial - factorial(0)
   {
       "value": [
           "ansar.command.ansar_command.Run",
           {
               "completed": [
               ..
                   [
                       "snooze-0",
                       [
                           "ansar.create.lifecycle.Ack",
                           {},
                           []
                       ]
                   ],
               ..
               ],
               "home": ".ansar-home"
           },
           []
       ]
   }
   $

Persistent settings associated with processes can be modified using the ``update`` command or
the ``settings`` command. By accepting complete encodings the settings command provides for the
full expression of ansar encodings, specifically including graphs.
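
Files such as ``short-snooze`` can be produced by a script rather than by hand. The sketch assumes
that the ``{"value": ...}`` wrapper seen in the transcripts is all that a simple encoding needs;
richer ansar encodings may carry more than this and ``encode_settings`` is a hypothetical helper.

.. code-block:: python

   import json

   def encode_settings(**settings):
       """Wrap keyword settings in the {"value": ...} layout seen in the transcripts."""
       return json.dumps({"value": settings}, indent=4)

   text = encode_settings(seconds=1.0)
   print(text)

The printed text has the same shape as the ``short-snooze`` file and could be saved and passed
using ``--set-file``.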

Adding Some Workload
--------------------

.. code-block::

   $ ansar add busy
   $ cat busy-input
   {
       "value": {
           "duties": [
               "noop",
               "snooze",
               "factorial"
           ],
           "management_levels": 5,
           "managers": 3
       }
   }
   $ ansar input busy-0 --set-file=busy-input
   $ ansar run --debug-level=CONSOLE
   22:46:31.177 ^ <00000009>snooze - Do nothing for 2.0 seconds
   22:46:31.177 ^ <00000009>factorial - factorial(5)
   22:46:31.178 ^ <00000009>noop - Log this and exit
   22:46:31.295 ^ <00000008>noop - Log this and exit
   22:46:31.295 ^ <00000008>factorial - factorial(4)
   22:46:31.295 ^ <00000008>snooze - Do nothing for 2.0 seconds
   22:46:31.296 ^ <00000008>factorial - factorial(5)
   22:46:31.490 ^ <00000008>snooze - Do nothing for 2.0 seconds
   22:46:31.499 ^ <00000008>factorial - factorial(4)
   ..
   22:46:46.327 ^ <00000008>factorial - factorial(0)
   22:46:46.339 ^ <00000008>factorial - factorial(0)
   22:46:46.423 ^ <00000008>factorial - factorial(0)
   {
       "value": [
           "ansar.command.ansar_command.Run",
           {
               "completed": [
                   ..
                   [
                       "factorial-0",
                       [
                           "lib.factorial_if.FactorialReturned",
                           {
                               "value": 120
                           },
                           []
                       ]
                   ],
                   [
                       "busy-0",
                       [
                           "lib.job_if.JobReturned",
                           {
                               "processes": 484
                           },
                           []
                       ]
                   ]
               ],
               "home": ".ansar-home"
           },
           []
       ]
   }
   $

Recursion is again used to create a tree of ``busy`` processes that has a defined number of
levels (i.e. ``management_levels``) and a defined number of branches (i.e. ``managers``).
The run command results in the creation of 484 processes plus those entries previously added
alongside the ``busy-0`` entry.
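
The figure of 484 can be reproduced with a little arithmetic, assuming a full tree in which every
``busy`` node also starts one process per duty. That reading is an inference from the input file
and the reported count, not a description of how ``busy.py`` does its accounting.

.. code-block:: python

   def tree_nodes(levels, branches):
       """Number of nodes in a full tree with the given depth and fan-out."""
       return sum(branches ** depth for depth in range(levels))

   managers = tree_nodes(levels=5, branches=3)   # 1 + 3 + 9 + 27 + 81
   duties = 3                                    # noop, snooze, factorial
   print(managers, managers * (1 + duties))

Under that assumption there are 121 ``busy`` nodes, each accounting for itself and three duties.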

.. note::

   Interruption of complex and dynamic collections of processes is likely to catch some
   processes in the early stage of their lives. A control-c can interrupt the Python interpreter
   as it is performing ``import`` operations, long before the ``__name__ == "__main__"`` check has
   even been reached. The consequence is that signals may be processed by the default handlers
   inside the interpreter; tracebacks will appear on stderr. The :class:`~.processing.Process` machine
   catches this event and converts it into an :class:`~.lifecycle.Aborted` message, which is returned
   to the parent async object, preserving operational integrity.

More Advanced Use
=================

Use of the ``run`` command results in immediate feedback. Logging from all the related sub-processes
is placed on ``stderr`` for viewing, or saving in a file for off-line analysis. These are valuable
development procedures.

The ``start`` command is similar to ``run`` except that after starting the related set of processes,
control is immediately returned to the command-line, leaving the processes to continue in the background.
Logging no longer appears on ``stderr`` but is instead appended to a per-process storage area. Further
``ansar`` commands provide access to those logs, as well as administration of the background processes.

Starting Processes In The Background
------------------------------------

.. code-block::

   $ ansar update snooze-0 --seconds=10.0
   $ ansar add zombie
   $ ansar list
   factorial-0
   noop-0
   snooze-0
   zombie-0
   $ ansar start
   $ ansar status
   snooze-0
   zombie-0

Logging no longer appears on the terminal. The ``status`` command shows which roles within a collection are
currently running. In this example, the use of ``status`` must have followed the ``start`` quickly enough
to catch ``snooze-0`` before it self-terminated.

.. code-block::

   $ ansar start
   ansar: cannot perform "start", "(all)" currently running as - 589299
   $ ansar status -l
   zombie-0                 <589299> 2m25.6s
   $ ansar stop
   ansar: cannot perform "stop", "(all)" not currently running - busy-0, factorial-0, noop-0, snooze-0
   $ ansar stop zombie-0
   $ ansar start
   $

Attempts to run multiple instances of a role are detected and reported. In this case it’s the ``zombie-0`` role,
verified by the matching process IDs in the error message and the (long) status output.

Commands that involve a role - e.g. ``run``, ``start``, ``stop`` and others - accept a role-search as a
parameter. Omitting the parameter is assumed to mean “match everything”. Where the command encounters any
form of mismatch between the intentions of the command and the current set of processes, it terminates with
an error message. In the above case, the intentions of the command were to run a new set of all the processes
in the group. Instead, it detected an operational member of the group and terminated. The ``--force`` ansar flag
can be used to override that cautionary behaviour. This would cause the command to kill the operational instance
of ``zombie-0`` before creating the new set, including the new instance of ``zombie-0``.

Reviewing Background Activity
-----------------------------

.. code-block::

   $ ansar log snooze-0
   $ ansar log snooze-0 --last=WEEK
   00:41:03.329 + <00000007>lock_and_hold - Created by <00000001>
   00:41:03.329 > <00000007>lock_and_hold - Sent Ready to <00000001>
   00:41:03.330 + <00000008>start_vector - Created by <00000001>
   00:41:03.330 ~ <00000008>start_vector - Executable "/home/brad/somewhere/dist/snooze" as process (369676)
   00:41:03.330 ~ <00000008>start_vector - Working folder "/"
   00:41:03.330 ~ <00000008>start_vector - Running object "__main__.snooze"
   00:41:03.330 ~ <00000008>start_vector - Class threads (1) "retries" (1)
   00:41:03.330 + <00000009>snooze - Created by <00000008>
   00:41:03.330 ^ <00000009>snooze - Do nothing for 2.0 seconds
   00:41:03.330 > <00000009>snooze - Sent StartTimer to <00000003>
   00:41:05.332 < <00000009>snooze - Received T1 from <00000003>
   00:41:05.332 X <00000009>snooze - Destroyed
   00:41:05.332 < <00000008>start_vector - Received Completed from <00000009>
   00:41:05.332 X <00000008>start_vector - Destroyed
   00:41:05.333 < <00000007>lock_and_hold - Received Stop from <00000001>
   00:41:05.336 X <00000007>lock_and_hold - Destroyed

Logs produced by foreground processes (i.e. using ``run``) are presented on stderr and then - without
deliberate action - lost. Logs produced by background processes are directed into persistent storage and
subsequently recovered with the ansar ``log`` command.

The first ``log`` command above produces no output because the default behaviour is to query for logs
generated within the last 5 minutes; ``snooze-0`` has been idle since it terminated at 00:41:05.332. The second command uses one
of the log parameters to extend the query to the start of the current week - 12:00am on Monday. This matches
everything from that moment onward and the first entry happens to be at 00:41:03.329. Note that the time
value on logs is the full ISO 8601 format but they appear truncated here for brevity.

As well as ``WEEK`` there are also ``MONTH``, ``DAY``, ``HOUR``, ``MINUTE``, ``HALF``, ``QUARTER``, ``TEN``
and ``FIVE`` enumerations where ``HALF`` and ``QUARTER`` refer to portions of an hour and ``TEN`` and ``FIVE``
refer to numbers of minutes. In all cases, the enumeration describes the start of a fixed time period rather
than a time span, e.g. ``--last=HOUR`` will list logs starting at the most recent hourly mark. To look
back 60 minutes use ``--back=1h``.
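
The difference between a fixed mark and a look-back can be worked through with ``datetime``. The
sketch follows the description above; ``last_mark`` is a hypothetical helper and the handling of
each enumeration is an interpretation, not code taken from ``ansar``.

.. code-block:: python

   from datetime import datetime, timedelta

   def last_mark(period, now):
       """Return the most recent boundary of the named period at or before now."""
       midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
       if period == "MONTH":
           return midnight.replace(day=1)
       if period == "WEEK":
           return midnight - timedelta(days=now.weekday())   # Monday is weekday 0
       if period == "DAY":
           return midnight
       if period == "HOUR":
           return now.replace(minute=0, second=0, microsecond=0)
       minutes = {"HALF": 30, "QUARTER": 15, "TEN": 10, "FIVE": 5, "MINUTE": 1}[period]
       return now.replace(minute=now.minute - now.minute % minutes,
                          second=0, microsecond=0)

   now = datetime(2023, 4, 6, 1, 52, 17)   # a Thursday
   print(last_mark("HOUR", now))           # 2023-04-06 01:00:00
   print(now - timedelta(hours=1))         # 2023-04-06 00:52:17, a 1h look-back
   print(last_mark("WEEK", now))           # 2023-04-03 00:00:00

At 01:52 the hourly mark is 52 minutes in the past while a look-back of one hour reaches a further
8 minutes, which is the distinction between ``--last`` and ``--back``.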

The ansar ``log`` command also accepts the following parameters:

.. list-table::
   :widths: 25 75

   * - ``clock``
     - *use local time for both input and output*
   * - ``from``
     - *start in ISO time format*
   * - ``start``
     - *start as index into start-stop history*
   * - ``back``
     - *start as negative offset from current time*
   * - ``to``
     - *end in ISO time format*
   * - ``span``
     - *positive offset from evaluated start*
   * - ``count``
     - *end as a number of logs*

.. code-block::

   $ ansar log snooze-0 --last=MONTH --count=20

This command will list the first 20 logs generated by ``snooze-0``, since the first of the current month.
Log storage is self-maintaining. A FIFO approach ensures that when the total storage of logs reaches a
configured maximum, the arrival of further logs causes the deletion of the oldest.
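
The FIFO policy can be sketched in memory with a ``deque``. The ``LogStore`` class and its byte
accounting are assumptions for illustration; the storage behind ``ansar log`` is on disk and its
internal organisation is not described here.

.. code-block:: python

   from collections import deque

   class LogStore:
       """Keep the newest entries whose combined size fits within max_bytes."""

       def __init__(self, max_bytes):
           self.max_bytes = max_bytes
           self.entries = deque()
           self.total = 0

       def append(self, line):
           self.entries.append(line)
           self.total += len(line)
           # Evict from the old end until the store is back under its limit.
           while self.total > self.max_bytes and len(self.entries) > 1:
               self.total -= len(self.entries.popleft())

   store = LogStore(max_bytes=20)
   for line in ("first log", "second log", "third log"):
       store.append(line)
   print(list(store.entries))

The arrival of the third line pushes the total past the limit, so the oldest line is discarded and
the two most recent remain.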

.. code-block::

   $ ansar status -l
   zombie-0                 <377524> 12.7s

For a detailed view of operational processes use the ``--long-listing`` parameter (or the ``-l`` shorthand
flag). This view includes the process ID and the time since the process was created.

Behaviour Of Background Processes Over Time
-------------------------------------------

.. code-block::

   $ ansar history zombie-0
   [0] 7h30m ago ... 1m5.4s (Aborted)
   [1] 4m44.6s ago ... 4m34.3s (Aborted)

A role is a name for an instance of an executable which may be started and stopped many times within the
lifetime of a home. Logs are seamless with respect to these ups and downs though it is fairly easy to
infer the boundaries from the contents of individual logs. Ansar also keeps an explicit record of when
processes are started, when they are stopped and the value returned by the main object. Use the ``history``
command to print a table of start times and run durations. The printed indexes can be used in the ``start``
parameter in the ``log`` command to select logs from the start of a particular execution.

.. code-block::

   $ ansar returned zombie-0 --start=0
   {
       "value": [
           "ansar.create.lifecycle.Aborted",
           {},
           []
       ]
   }

Those same indexes can also be used in the ``returned`` command to select which return value to
print. If the command is used to access the results of the latest execution (e.g. no ``--start``
is specified) and that execution has not yet completed, the command will wait until the information
is available.

Development Automation
**********************

Combining the nature of :func:`~.framework.create_object`-based applications with the ``ansar`` command line tool goes some way
towards engineering of multi-processing solutions. This section considers the potential to streamline the standard
edit-build-test-debug loop, to relieve developers of as many repetitive, error-prone responsibilities as possible,
within that multi-processing context.

A huge array of tools is available in this space, especially if cloud deployment is the end goal. The arrangements of
tools and procedures suggested here are deliberately simple in the hope that any potential to integrate with your
own development toolsets is as clear as possible.

The Standard Loop With Multi-Processing
=======================================

Multi-processing complicates the standard development loop. Source code changes may be spread over one
executable or many, and test runs require the presence of multiple distinct processes that properly represent
the current codebase. That raises a series of questions: which source files have changed, which executables
need building, do executables need to be copied from the build areas, are there running processes that need
to be replaced, and how are unit tests that span multiple processes executed and collated? What about those
processes that need supporting data files?

Defining The Set Of Processes
-----------------------------

This part of the tour involves a new home:

.. code-block::

   $ ansar -f destroy
   $ ansar create --redirect-bin=dist
   $ ansar add server
   $ ansar add client
   $ ansar list
   client-0
   server-0

The ``destroy`` command is used to delete the default home. Passing the ``-f`` flag ensures that any process
associated with the old home is properly terminated. The ``create`` command prepares the new home folder for the
subsequent ``add`` commands. A *redirect* is again used to link the new home with a build folder. Instances
of the ``server`` and ``client`` executables are added and assume the default names, ``server-0`` and ``client-0``.

In this composition of processes the ``server`` is the component under development. The ``client`` is a test
client - it connects to the server, submits requests and expects responses. A standard ansar method - available
to every async object - is used to verify the details of each request-response pair. This creates a sequence
of pass/fail records that go on to form the basis of a test report.

The ``server`` implements a very basic word mapping service. Words are sent across a connection and mapped inside
the server to a stored alternative. The mapped alternative is sent back across the connection as a response. If
no entry is found for a submitted word, the same word is echoed back to the client. Such a service might form
the basis of a "hint" facility for a spelling checker.
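
The lookup rule described above can be sketched in a few lines of plain Python. This is an illustration only, not
the ``server`` source. The ``map_word`` function and the in-memory ``word_map`` dictionary are hypothetical names,
and it is an assumption that the stored mappings can be held as a dictionary keyed by the submitted word.

```python
def map_word(word_map, word):
    """Return the stored alternative for word, or echo the word back."""
    # dict.get() with a default covers the "no entry found" case.
    return word_map.get(word, word)


# Hypothetical table with a single entry.
word_map = {b"explain": b"define"}

mapped = map_word(word_map, b"explain")    # an entry exists
echoed = map_word(word_map, b"eager")      # no entry, the word is echoed
```

With an empty table every word is echoed, which matches the behaviour of a server running without its data.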

.. note::

   Implementation of networking within the ``server`` and ``client`` components is for demonstration purposes
   only. There are several reasons why the approach should not be used for production quality software
   including scalability, the use of blocking sockets and the lack of message encoding/decoding.

When Clients Are Started Before Servers
---------------------------------------

An initial run of the new set of processes hits a bump. All the processes are effectively started at the same
time and starting a client before the server has had a chance to establish itself will inevitably lead to
problems;

.. code-block::
   :emphasize-lines: 14

   $ ansar run --debug-level=DEBUG
   07:29:26.188 + <00000007>start_vector - Created by <00000001>
   07:29:26.188 ~ <00000007>start_vector - Executable "/home/dennis/gh/multi-processing-is-more-than-forking/.dev/bin/ansar" as process (698548)
   07:29:26.188 ~ <00000007>start_vector - Working folder "/home/dennis/gh/multi-processing-is-more-than-forking"
   07:29:26.188 ~ <00000007>start_vector - Running object "ansar.command.ansar_command.ansar"
   07:29:26.188 ~ <00000007>start_vector - Class threads (1) "retries" (1)
   07:29:26.189 + <00000008>ansar - Created by <00000007>
   07:29:26.189 ~ <00000008>ansar - Call the sub-command function
   07:29:26.189 ^ <00000008>ansar - Detect status of associated roles (server-0, client-0, zombie-0)
   07:29:26.189 + <00000009>lock_and_hold - Created by <00000008>
   ..
   07:29:26.304 ~ <00000008>start_vector - Running object "__main__.zombie"
   07:29:26.304 ~ <00000008>start_vector - Class threads (1) "retries" (1)
   07:29:26.304 ? <00000009>client - Session error - "[Errno 111] Connection refused"
   07:29:26.304 X <00000009>client - Destroyed
   07:29:26.304 < <00000008>start_vector - Received Completed from <00000009>

A connection has been attempted by the ``client`` and "refused".

Without some kind of orchestration of activity at the networking level there is no way to advise the client
of the appropriate moment to initiate connection. Ansar provides an alternative solution. Any process that is
part of a home can be configured with a *retry* strategy.

The effect is that ``client-0`` is executed repeatedly until a goal is reached or the strategy is exhausted.
In this case the goal is to establish a valid connection. All that the ``client`` needs to do is return certain
values that either keep the retries active or cause termination.
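
A minimal sketch of a single connection attempt, written with the standard ``socket`` module rather than ansar.
The ``attempt_connection`` function and its string results are hypothetical. They stand in for the values that
a real client would return to keep the retries active or to finish.

```python
import socket


def attempt_connection(host, port, timeout=2.0):
    """Try to connect once and classify the outcome for a retry strategy."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            # A real client would exchange requests and responses here.
            return "connected"
    except ConnectionRefusedError:
        # Nothing is listening yet, e.g. the server is still starting.
        return "retry"
```

A call such as ``attempt_connection('127.0.0.1', 65432)`` made before the server is listening would produce
``"retry"``, the condition shown as ``[Errno 111] Connection refused`` in the log above.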

Please Repeat That
------------------

A single ansar command sets up the handling of connection failures;

.. code-block::
   :emphasize-lines: 25, 32

   $ cat client-retry
   {
       "value": {
           "first_steps": [1.0, 2.0, 4.0]
       }
   }
   $ ansar set retry client-0 --encoding-file=client-retry
   $ ansar run --debug-level=DEBUG
   05:32:15.384 + <00000007>lock_and_hold - Created by <00000001>
   ..
   05:32:15.384 + <00000008>start_vector - Created by <00000001>
   05:32:15.385 ~ <00000008>start_vector - Executable "/home/dennis/gh/multi-processing-is-more-than-forking/dist/server" as process (729010)
   05:32:15.385 ~ <00000008>start_vector - Executable "/home/dennis/gh/multi-processing-is-more-than-forking/dist/client" as process (729009)
   05:32:15.385 ~ <00000008>start_vector - Working folder "/home/dennis/gh/multi-processing-is-more-than-forking"
   05:32:15.385 ~ <00000008>start_vector - Working folder "/home/dennis/gh/multi-processing-is-more-than-forking"
   05:32:15.385 ~ <00000008>start_vector - Running object "__main__.client"
   05:32:15.385 ~ <00000008>start_vector - Class threads (1) "retries" (1)
   05:32:15.385 ~ <00000008>start_vector - Running object "__main__.server"
   05:32:15.385 + <00000009>Retry[INITIAL] - Created by <00000008>
   05:32:15.385 ~ <00000008>start_vector - Class threads (1) "retries" (1)
   05:32:15.385 < <00000009>Retry[INITIAL] - Received Start from <00000008>
   05:32:15.385 + <00000009>server - Created by <00000008>
   05:32:15.385 + <0000000a>client - Created by <00000009>
   05:32:15.385 + <0000000a>listen - Created by <00000009>
   05:32:15.385 ? <0000000a>client - Session error - "[Errno 111] Connection refused"
   05:32:15.385 X <0000000a>client - Destroyed
   05:32:15.385 < <00000009>Retry[ATTEMPTING] - Received Completed from <0000000a>
   05:32:15.385 ^ <00000009>Retry[ATTEMPTING] - Pausing for 1.000000 seconds
   05:32:15.385 > <00000009>Retry[ATTEMPTING] - Sent StartTimer to <00000003>
   05:32:16.385 < <00000009>Retry[PAUSING] - Received T1 from <00000003>
   05:32:16.386 + <0000000b>client - Created by <00000009>
   05:32:16.387 ^ <0000000b>client - Connected to ('127.0.0.1', 65432)
   05:32:16.387 + <0000000b>accepted - Created by <0000000a>
   05:32:16.387 ^ <0000000b>accepted - Accepted on ('127.0.0.1', 52278)
   05:32:16.394 = <0000000b>client - Expected b'fervent' for b'eager', got b'eager' (client.py:51)
   05:32:16.394 = <0000000b>client - Expected b'define' for b'explain', got b'explain' (client.py:56)
   05:32:16.395 = <0000000b>client - Expected b'droll' for b'fly', got b'fly' (client.py:61)
   05:32:16.395 X <0000000b>accepted - Destroyed
   05:32:16.395 X <0000000b>client - Destroyed
   05:32:16.395 < <00000009>Retry[ATTEMPTING] - Received Completed from <0000000b>
   05:32:16.395 X <00000009>Retry[ATTEMPTING] - Destroyed
   05:32:16.395 < <00000008>start_vector - Received Completed from <00000009>
   05:32:16.395 X <00000008>start_vector - Destroyed
   ..

The ``set`` command is used to update a small set of properties associated with each home entry and one such
property is ``retry``. Setting this value activates *retries* inside the :func:`~.framework.create_object` function. Running
the application object (e.g. ``client()``) is subsequently considered to be an *attempt* and the value returned by
each attempt influences what happens next;

.. list-table::
   :widths: 25 75

   * - :class:`~.lifecycle.Maybe`
     - *not successful, try again later*
   * - :class:`~.lifecycle.Cannot`
     - *abandon, bad request or environmental problem*
   * - \*
     - *any other message indicates success*

Where a repeat attempt is indicated (:class:`~.lifecycle.Maybe`) the retry machinery consults the retry property for a time
delay and uses the ``T1`` timer to impose the "down time". In the given example the delay is 1.0s - the first
value from the ``first_steps`` list. The retry property provides the following values;

.. list-table::
   :widths: 25 75

   * - ``first_steps[]``
     - *list of float, the initial time delays*
   * - ``regular_steps``
     - *float, repeating delay*
   * - ``step_limit``
     - *int, maximum number of delays*
   * - ``randomized``
     - *float, time slices for backoff*
   * - ``truncated``
     - *float, reduce the scale of backoff*

The first 3 values can be used to describe a series of float values while the latter 2 enable a secondary adjustment
of those values with the goal of avoiding the "everyone retrying at the same moment" phenomenon.

.. list-table::
   :widths: 25 25 20 20 20 60
   :header-rows: 1

   * - ``first_steps``
     - ``regular_steps``
     - ``step_limit``
     - ``randomized``
     - ``truncated``
     - *sequence*
   * - [1.0, 2.0, 4.0]
     - None
     - None
     - None
     - None
     - [1.0, 2.0, 4.0]
   * - []
     - 1.0
     - None
     - None
     - None
     - [1.0, 1.0, 1.0, 1.0 ...]
   * - []
     - 1.0
     - 4
     - None
     - None
     - [1.0, 1.0, 1.0, 1.0]
   * - [1.0, 2.0, 4.0]
     - 8.0
     - 6
     - None
     - None
     - [1.0, 2.0, 4.0, 8.0, 8.0, 8.0]
   * - [1.0, 2.0, 4.0]
     - 8.0
     - 6
     - 0.25
     - 0.5
     - [1.25, 2.5, 6.0, 11.0, 9.5, 10.5]

Any set of values involving a non-None ``regular_steps`` and a None ``step_limit`` describes an endless sequence.
Combining ``first_steps`` with a value for ``randomized`` can produce a form of *exponential backoff*. The ``randomized`` value
is used to slice up the latest time delay into available slots and one of those slots is selected randomly. The ``truncated``
value reduces the portion of the time delay that is available for slicing, e.g. a value of 0.25 limits the
potential adjustment to a quarter. The adjustment is additive.
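
The sequences in the table can be reproduced with a short generator. This is a sketch in plain Python, not the ansar
implementation. The names ``base_delays`` and ``jittered`` are hypothetical. The jitter rule is an assumed reading
of the text: the adjustment is a random multiple of ``randomized``, capped at ``truncated`` times the delay. That
reading is consistent with the last row of the table but is not confirmed by it.

```python
import itertools
import random


def base_delays(first_steps=(), regular_steps=None, step_limit=None):
    """Return an iterator over the un-randomized delay sequence."""
    repeating = itertools.repeat(regular_steps) if regular_steps is not None else ()
    steps = itertools.chain(first_steps, repeating)
    # islice() with a stop of None imposes no limit (an endless sequence).
    return itertools.islice(steps, step_limit)


def jittered(delay, randomized, truncated, rng=random):
    """Assumed rule: add a random number of `randomized`-sized slots."""
    slots = int((truncated * delay) / randomized)
    return delay + rng.randint(0, slots) * randomized


# Fourth row of the table: three initial steps, then a repeating 8.0, six in total.
fourth_row = list(base_delays([1.0, 2.0, 4.0], regular_steps=8.0, step_limit=6))
```

Under this reading a delay of 8.0 with ``randomized=0.25`` and ``truncated=0.5`` lands somewhere between 8.0 and 12.0.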

It is the combination of the retry property and the conditions met by each attempt that determines the final
behaviour of the process.
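
The interaction between attempts and delays can be sketched as a simple loop. The ``Maybe`` and ``Cannot`` classes
below are local stand-ins, not the ansar classes, and ``run_with_retries`` is a hypothetical name. Returning the
last ``Maybe`` when the delays are exhausted is an assumption made for the sketch.

```python
import time


class Maybe:
    """Stand-in: not successful, try again later."""


class Cannot:
    """Stand-in: abandon, bad request or environmental problem."""


def run_with_retries(attempt, delays, sleep=time.sleep):
    """Repeat attempt() until it stops returning Maybe or the delays run out."""
    remaining = iter(delays)
    while True:
        result = attempt()
        if not isinstance(result, Maybe):
            return result      # Cannot, or any other value, ends the retries
        delay = next(remaining, None)
        if delay is None:
            return result      # strategy exhausted (assumed behaviour)
        sleep(delay)           # the "down time" before the next attempt
```

Passing a replacement for ``sleep`` keeps the loop testable without real pauses.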

Unit Tests In Multi-Process Solutions
-------------------------------------

Tests are implemented using the :meth:`~.point.Point.test` method (see ``client.py``);

.. code-block:: python

   word, expect = b'eager', b'fervent'
   s.sendall(word)
   reply = s.recv(1024)
   self.test(reply == expect, f"Expected {expect} for {word}, got {reply}")

A word is sent over a socket and the response is compared against an expected value. This fragment of code meets all
the requirements of an ansar test.

Output occurs in two ways. Firstly, all failed tests (i.e. where the conditional evaluates to false) generate a log at
the ``WARNING`` level. Secondly, all test results are collected in a background async object. Test applications such
as the ``client`` request that information at the end of an execution and return the results in the form of a :class:`~.test.TestReport`.

.. code-block:: python

   ar.test_enquiry(self)
   report = self.select(ar.TestReport)
   return report

A standalone execution of the ``client`` demonstrates this activity;

.. code-block::

   $ ansar list
   client-0
   server-0
   $ ansar status
   $ ansar start server-0
   $ dist/client --debug-level=OBJECT
   19:35:35.299 + <00000008>start_vector - Created by <00000001>
   19:35:35.300 + <00000009>client - Created by <00000008>
   19:35:35.300 ^ <00000009>client - Connected to ('127.0.0.1', 65432)
   19:35:35.303 = <00000009>client - Expected b'fervent' for b'eager', got b'eager' (client.py:51)
   19:35:35.303 = <00000009>client - Expected b'define' for b'explain', got b'explain' (client.py:56)
   19:35:35.303 = <00000009>client - Expected b'droll' for b'fly', got b'fly' (client.py:61)
   19:35:35.303 > <00000009>client - Sent Enquiry to <00000004>
   19:35:35.304 < <00000009>client - Received TestReport from <00000004>
   19:35:35.304 X <00000009>client - Destroyed
   19:35:35.304 < <00000008>start_vector - Received Completed from <00000009>
   19:35:35.304 X <00000008>start_vector - Destroyed
   {
       "value": [
           "ansar.create.test.TestReport",
           {
               "failed": 3,
               "passed": 0,
               "tested": [
                   {
                       "condition": false,
                       "line": 51,
                       "name": "client",
                       "source": "client.py",
                       "stamp": "2023-05-30T19:35:35.302931",
                       "text": "Expected b'fervent' for b'eager', got b'eager'"
                   },
                   {
                       "condition": false,
                       "line": 56,
                       "name": "client",
                       "source": "client.py",
                       "stamp": "2023-05-30T19:35:35.303186",
                       "text": "Expected b'define' for b'explain', got b'explain'"
                   },
                   {
                       "condition": false,
                       "line": 61,
                       "name": "client",
                       "source": "client.py",
                       "stamp": "2023-05-30T19:35:35.3034",
                       "text": "Expected b'droll' for b'fly', got b'fly'"
                   }
               ]
           },
           []
       ]
   }

Failed tests appear in the logging stream and all tests appear in the final JSON output. The details retained
for each test can be seen in the ``tested`` list. As well as the more obvious inclusion of the ``condition``
and ``text`` values, ansar augments the results with the name of the module that performed the test and also
the line number within that module. These values are critical to a good edit-run-debug loop, as
demonstrated in the following sections.
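
How a test helper might capture the calling module and line number can be sketched with the standard ``inspect``
module. The ``ResultRecorder`` class is hypothetical and is not how ansar is implemented. It only mirrors the
``condition``, ``source``, ``line`` and ``text`` fields visible in the report above.

```python
import inspect
import os


class ResultRecorder:
    """Hypothetical stand-in for the background collection of test results."""

    def __init__(self):
        self.tested = []

    def test(self, condition, text):
        # Look one frame up, to where test() was called from.
        caller = inspect.stack()[1]
        self.tested.append({
            "condition": bool(condition),
            "source": os.path.basename(caller.filename),
            "line": caller.lineno,
            "text": text,
        })
        return bool(condition)

    def report(self):
        passed = sum(1 for t in self.tested if t["condition"])
        failed = len(self.tested) - passed
        return {"passed": passed, "failed": failed, "tested": self.tested}
```

Recording the call site rather than the helper's own location is what makes each record useful for navigation.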

.. note::

   It is worth mentioning that :meth:`~.point.Point.test` has value in an application, independent of whether that
   application decides to return a :class:`~.test.TestReport`. It is effectively a shorthand for "if condition is
   false log this warning". Background collection of test results will eventually reach a maximum and
   at that point, will begin discarding test results. The maximum number retained is kept fairly small (a few
   hundred) for practical reasons. The collection cannot have infinite size and in a high velocity development
   loop, larger and larger numbers of failed tests have decreasing value.

There can be any number of processes like ``client-0`` within a home, performing tests and generating
``TestReports``. The results from these individual test processes can be inspected with commands such
as :ref:`ansar log <ansar-command-reference-log>`, ``ansar history`` and ``ansar returned``.

Quick Navigation Of Failed Tests
--------------------------------

The next step is to collect the information from the ``TestReports`` and present it in a manner that
facilitates quick navigation of the relevant source. Happily, this is exactly what :ref:`ansar run <ansar-command-reference-run>` can do.
By adding a few arguments to the command, a useful listing is produced;

.. code-block::

   $ ansar stop server-0
   $ ansar run --code-path=. --test-run
   ^Crole "client-0" (pass/fail): 0/3
   /home/dennis/gh/multi-processing-is-more-than-forking/client.py:51 - Expected b'fervent' for b'eager', got b'eager'
   /home/dennis/gh/multi-processing-is-more-than-forking/client.py:56 - Expected b'define' for b'explain', got b'explain'
   /home/dennis/gh/multi-processing-is-more-than-forking/client.py:61 - Expected b'droll' for b'fly', got b'fly'
   $ 

The :ref:`ansar run <ansar-command-reference-run>` command looks through the list of results from all the finished processes, checking
for instances of :class:`~.test.TestReport`. It collects these into a single :class:`~.test.TestSuite` object. If it detects
certain command-line information (i.e. ``--code-path=<path>``) it uses that information to resolve the
final names of the source files mentioned in individual tests.

By default the improved report information appears in the normal JSON output of the ansar command.
Specifying a ``--test-run`` causes ansar to set aside its normal behaviour and instead provide a summary
of the gathered information. It prints a table of the processes (i.e. roles) that supplied the test
information, and then a second table consisting of *source file*, *line number* and *warning text* - one
line for each failed test.
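
The shape of that summary can be sketched as follows. The ``format_test_run`` function is hypothetical and the
report is represented as a plain dictionary with the fields seen in the JSON output earlier. Joining the code path
and the source name is an assumption about how the final file names are resolved.

```python
import os


def format_test_run(role, report, code_path):
    """Build a summary line for a role, then one line per failed test."""
    lines = [f'role "{role}" (pass/fail): {report["passed"]}/{report["failed"]}']
    for t in report["tested"]:
        if not t["condition"]:
            # Resolve the bare module name against the supplied code path.
            path = os.path.abspath(os.path.join(code_path, t["source"]))
            lines.append(f'{path}:{t["line"]} - {t["text"]}')
    return lines
```

The ``path:line`` prefix is the part that an IDE terminal can recognise as a source address.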

This is the final piece of the development loop. Running this command inside a VS Code ``bash`` window
gives the IDE enough information for quick navigation to the offending lines of source code. Hovering over
each line results in the underlining of the source address and a control-click takes the cursor to the exact
location. Most modern IDEs support similar behaviour.

Custom Handling Of Test Results
-------------------------------

If the "test run" information is not sophisticated enough or there is potential for better local integration,
the :ref:`ansar run <ansar-command-reference-run>` command also supports the passing of the :class:`~.test.TestSuite` to a designated executable.

.. code-block::
   :emphasize-lines: 20-22

   $ ansar run  --code-path=. --test-analyzer=analyzer --debug-level=OBJECT
   08:40:16.100 + <00000008>lock_and_hold - Created by <00000001>
   08:40:16.101 > <00000008>lock_and_hold - Sent Ready to <00000001>
   ..
   08:40:16.102 ^ <0000000b>client - Connected to ('127.0.0.1', 65432)
   08:40:16.102 + <0000000c>accepted - Created by <0000000b>
   08:40:16.102 ^ <0000000c>accepted - Accepted on ('127.0.0.1', 37696)
   08:40:16.105 = <0000000b>client - Expected b'fervent' for b'eager', got b'eager' (client.py:51)
   08:40:16.106 = <0000000b>client - Expected b'define' for b'explain', got b'explain' (client.py:56)
   08:40:16.106 = <0000000b>client - Expected b'droll' for b'fly', got b'fly' (client.py:61)
   ..
   08:40:16.106 X <00000009>start_vector - Destroyed
   08:40:16.203 < <00000008>lock_and_hold - Received Stop from <00000001>
   08:40:16.212 X <00000008>lock_and_hold - Destroyed
   ^C08:40:18.506 < <00000009>start_vector - Received Stop from <00000001>
   08:40:18.506 > <00000009>start_vector - Sent Stop to <0000000a>
   08:40:18.506 < <0000000a>server - Received Stop from <00000009>
   ..
   08:40:18.806 + <00000008>start_vector - Created by <00000001>
   08:40:18.807 + <00000009>analyzer - Created by <00000008>
   08:40:18.807 ^ <00000009>analyzer - Analyzed client-0
   08:40:18.807 X <00000009>analyzer - Destroyed
   08:40:18.807 < <00000008>start_vector - Received Completed from <00000009>
   08:40:18.807 X <00000008>start_vector - Destroyed
   {
       "value": [
           "ansar.create.lifecycle.Ack",
           {},
           []
       ]
   }

Test results are now passed to the ``analyzer`` executable. The analyzer logs the names of roles that
supplied the results and terminates with an :class:`~.lifecycle.Ack`. The ansar command assumes that the return
value from the analyzer should be returned as the result for the run itself.

Deployment Of Supporting File Materials
---------------------------------------

Getting back to the development loop, there are still the 3 failed tests to consider;

.. code-block::

   08:40:16.105 = <0000000b>client - Expected b'fervent' for b'eager', got b'eager' (client.py:51)
   08:40:16.106 = <0000000b>client - Expected b'define' for b'explain', got b'explain' (client.py:56)
   08:40:16.106 = <0000000b>client - Expected b'droll' for b'fly', got b'fly' (client.py:61)

There is now a means to quickly navigate through the source code relating to the failed tests. To actually fix
those failures there are two possibilities - either the test needs to change or the ``server`` needs to change.
Changing the former is a simple matter of editing the offending line of test code and running the loop again;

.. code-block::

   $ vi +51 client.py
   $ cat client.py
       ..
       word, expect = b'eager', b'eager'
       s.sendall(word)
       reply = s.recv(1024)
       self.test(reply == expect, f"Expected {expect} for {word}, got {reply}")
       ..
   $ pyinstaller --onefile --log-level ERROR -p . client.py
   $ ansar run --code-path=. --test-run
   role "client-0" (pass/fail): 1/2
   /home/dennis/gh/multi-processing-is-more-than-forking/client.py:56 - Expected b'define' for b'explain', got b'explain'
   /home/dennis/gh/multi-processing-is-more-than-forking/client.py:61 - Expected b'droll' for b'fly', got b'fly'

.. note::

   Use of the ``vi`` command is a means to demonstrate the workflow. As mentioned previously, the editing
   process would normally occur within the local IDE.

A test run now reports that there is no longer an issue with the test of ``eager``. In those cases where
the problem lies in the ``server`` it is similarly easy, though a small phase of setup is required;

.. code-block::

   $ ansar -f snapshot testing
   $ find testing
   testing
   testing/settings-by-role
   testing/settings-by-role/server-0.json
   testing/settings-by-role/client-0.json
   testing/resource-by-executable
   testing/resource-by-executable/server
   testing/resource-by-executable/client
   testing/model-by-role
   testing/model-by-role/server-0
   testing/model-by-role/client-0

The ``snapshot`` command takes a *snapshot* of the current disk storage areas of the default home and arranges them under
the folder name provided. Adding the ``-f`` flag ensures that any active roles are shut down for the duration of the
snapshot. With a safe and up-to-date image of all the operational files required by the home, it is now possible to modify
that image and then *deploy* it to the home.

The first modification is to provide a database of word mappings. The ``server`` is nice enough to operate in the absence
of that information, but full operation requires the data to be in place. Loading the ``server-0`` role with an initial set
of mappings looks like this;

.. code-block::
   :emphasize-lines: 8, 9, 17, 20, 32

   $ ansar start server-0
   $ cat word-map.json
   {
       "value": [
           ["explain", "define"]
       ]
   }
   $ cp word-map.json testing/model-by-role/server-0
   $ ansar --debug-level=OBJECT -f deploy --storage-path=testing
   23:29:19.507 + <00000008>start_vector - Created by <00000001>
   23:29:19.507 + <00000009>ansar - Created by <00000008>
   23:29:19.508 ^ <00000009>ansar - Detected 1 model changes for "server-0"
   23:29:19.509 ^ <00000009>ansar - Detect status of associated roles (server-0)
   23:29:19.509 + <0000000a>lock_and_hold - Created by <00000009>
   23:29:19.509 X <0000000a>lock_and_hold - Destroyed
   23:29:19.509 < <00000009>ansar - Received Completed from <0000000a>
   23:29:19.509 ^ <00000009>ansar - Stop roles (server-0)
   ..
   23:29:20.768 < <00000009>ansar - Received Completed from <0000000b>
   23:29:20.768 ^ <00000009>ansar - Starting transfer of materials
   23:29:20.768 + <0000000c>FolderTransfer[INITIAL] - Created by <00000009>
   23:29:20.768 < <0000000c>FolderTransfer[INITIAL] - Received Start from <00000009>
   23:29:20.768 + <0000000d>folder_transfer - Created by <0000000c>
   23:29:20.769 ^ <0000000d>folder_transfer - File transfer (1 deltas) to /home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/model/server-0
   23:29:20.769 ^ <0000000d>folder_transfer - Move 1 aliases to targets
   23:29:20.770 ^ <0000000d>folder_transfer - Clear 0 aliases
   23:29:20.770 X <0000000d>folder_transfer - Destroyed
   23:29:20.770 < <0000000c>FolderTransfer[RUNNING] - Received Completed from <0000000d>
   23:29:20.770 X <0000000c>FolderTransfer[RUNNING] - Destroyed
   23:29:20.770 < <00000009>ansar - Received Completed from <0000000c>
   23:29:20.770 ^ <00000009>ansar - Completed transfer to "/home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/model/server-0"
   23:29:20.770 ^ <00000009>ansar - Restoring 1 stopped roles
   23:29:20.773 + <0000000e>Process[INITIAL] - Created by <00000009>
   23:29:20.773 < <0000000e>Process[INITIAL] - Received Start from <00000009>
   ..
   23:29:20.884 < <00000008>start_vector - Received Completed from <00000009>
   23:29:20.884 X <00000008>start_vector - Destroyed
   $ ansar status
   server-0
   $ ansar run client-0 --code-path=. --test-run
   role "client-0" (pass/fail): 2/1
   /home/dennis/gh/multi-processing-is-more-than-forking/client.py:61 - Expected b'droll' for b'fly', got b'fly'

There is now a single failed test - "explain" is successfully being mapped to "define". Copying a JSON encoding to
the ``testing/model-by-role/server-0`` folder and running the :ref:`ansar deploy <ansar-command-reference-deploy>` command installs the ``word-map.json``
database within the operational ``server-0``.

For this demonstration the ``server-0`` role was started before the deploy and test run commands. This was to highlight
the deploy command's awareness of the current operational state and its degree of automation. Internal phases include;

* the detection of changes within the snapshot,
* evaluation of affected roles,
* detection of those affected roles that are also operational,
* terminations as necessary,
* copying of changes,
* and restoring any terminated roles.

Deployment of materials from a snapshot such as ``testing`` into an active home is optimized to perform the least I/O possible.
Source and destination areas are compared, resulting in a sequence of delta operations. Once any active roles are
terminated the deltas are executed, bringing the home in-sync with the external snapshot.
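
The comparison of source and destination areas can be sketched with the standard ``os`` and ``filecmp`` modules.
This is an illustration of the general technique, not the ansar transfer code. The ``folder_deltas`` function and
its ``add``, ``update`` and ``remove`` categories are hypothetical. Whether a deploy removes files that are missing
from the snapshot is an assumption of the sketch.

```python
import filecmp
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for folder, _, names in os.walk(root):
        for name in names:
            found.add(os.path.relpath(os.path.join(folder, name), root))
    return found


def folder_deltas(source, target):
    """List the operations needed to bring target in sync with source."""
    src, dst = relative_files(source), relative_files(target)
    changed = sorted(
        f for f in src & dst
        # shallow=False compares file contents, not just size and mtime.
        if not filecmp.cmp(os.path.join(source, f), os.path.join(target, f), shallow=False)
    )
    return {"add": sorted(src - dst), "update": changed, "remove": sorted(dst - src)}
```

An empty result in all three lists corresponds to the case where there is nothing to transfer.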

With all the pieces in their correct places, fixing the last remaining failed test is simple;

.. code-block::

   $ vi +61 testing/model-by-role/server-0/word-map.json
   $ cat testing/model-by-role/server-0/word-map.json
   {
       "value": [
           ["explain", "define"],
           ["fly", "droll"]
       ]
   }
   $ ansar -f deploy --storage-path=testing
   $ ansar run client-0 --code-path=. --test-run
   role "client-0" (pass/fail): 3/0
   $

A Streamlined, Multi-Process Development Loop
=============================================

The previous section was a tour through the commands used for the development of a multi-process solution. It also
introduced a variation in that basic workflow, by starting ``server-0`` and then running the test client, ``client-0``, separately.
This section elaborates on that variation to maximize the benefits of :ref:`ansar deploy <ansar-command-reference-deploy>` and packages the related
commands into a standard makefile.

Start With Nothing
------------------

.. code-block::
   :emphasize-lines: 1

   $ make clean
   rm -f dist/analyzer dist/busy dist/client dist/factorial dist/noop dist/server dist/snooze dist/zombie
   rm -rf testing
   ansar -f destroy

The ``clean`` target deletes build artefacts, the extracted snapshot and lastly, it deletes
the composition of processes from the filesystem. Included in that last operation is a
termination of any lingering processes, i.e. the non-testing, operational roles.

Create The Multi-Process Configuration
--------------------------------------

.. code-block::
   :emphasize-lines: 1

   $ make home
   pyinstaller --onefile --log-level ERROR -p . analyzer.py
   pyinstaller --onefile --log-level ERROR -p . busy.py
   ..
   ansar create
   ansar deploy dist
   ansar add server
   ansar add client test-client
   ansar add zombie
   ansar set retry test-client --encoding-file=client-retry
   ansar snapshot testing

The ``home`` target arranges everything such that roles are ready to be executed. This time around
creation *does not* involve the redirect of the bin folder. Instead, executables are deployed to the home
with the ``ansar deploy dist`` command. Copying executables from one folder to another might seem
like a burden. In practice the runtime overhead does not intrude heavily on the workflow. Copying can
be significantly minimized. Given two folders of executables (a source and a destination) it is possible
to calculate a delta and perform optimal updates. The arrangement also separates the build chain from
the operational home.

Full discussion of build chains, software pipelines and repo management (mono-repo vs poly-repo) is beyond
the scope of this document. Having the 2 approaches available (i.e. ``--redirect-bin`` and ``ansar deploy``)
improves the possibility of integration.

Establish An Operational State
------------------------------

.. code-block::
   :emphasize-lines: 1

   $ make start
   ansar -f start "test-.*" --invert-search

The ``start`` target initiates those roles that are being tested, rather than those roles that
perform the testing. The home is now ready for test runs.
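
The effect of an inverted search over role names can be sketched with the standard ``re`` module. The
``select_roles`` function is hypothetical. Treating the pattern as a match against the whole role name is an
assumption, since the matching rule used by the ``ansar`` command is not stated here.

```python
import re


def select_roles(roles, pattern, invert=False):
    """Return the roles that match pattern, or those that do not when inverted."""
    rx = re.compile(pattern)
    return [r for r in roles if bool(rx.fullmatch(r)) != invert]


roles = ["server-0", "test-client", "zombie-0"]
under_test = select_roles(roles, "test-.*", invert=True)   # roles being tested
testers = select_roles(roles, "test-.*")                   # roles doing the testing
```

Naming every test role with a common prefix is what allows one pattern to separate the two groups.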

Begin Development - What Needs Doing
------------------------------------

.. code-block::
   :emphasize-lines: 1

   $ make
   ansar --force --debug-level=CONSOLE deploy dist testing
   00:05:01.532 ^ <00000009>ansar - Nothing to deploy
   ansar run "test-.*" --code-path=. --test-run
   role "test-client" (pass/fail): 0/3
   /home/dennis/gh/multi-processing-is-more-than-forking/client.py:51 - Expected b'eager' for b'fervent', got b'fervent'
   /home/dennis/gh/multi-processing-is-more-than-forking/client.py:56 - Expected b'define' for b'explain', got b'explain'
   /home/dennis/gh/multi-processing-is-more-than-forking/client.py:61 - Expected b'droll' for b'fly', got b'fly'
   make: *** [Makefile:64: test] Error 1

Omitting the target is equivalent to ``make test``. An ``ansar deploy`` command checks the build artefacts
and the file materials under ``testing`` for any necessary updates. In this case there are none. It then performs
a test run including only those roles that generate test reports. There are 3 familiar failed tests and
the ``make`` itself terminates with an error, i.e. ``--test-run`` affects the exit code of the ``ansar`` command.

Deploying An Operational File
-----------------------------

.. code-block::
   :emphasize-lines: 1

   $ cp word-map.json testing/model-by-role/server-0
   $ make
   ansar --force --debug-level=CONSOLE deploy dist testing
   00:05:23.002 ^ <00000009>ansar - Detected 1 model changes for "server-0"
   00:05:23.003 ^ <00000009>ansar - Detect status of associated roles (server-0)
   00:05:23.003 ^ <00000009>ansar - Stop roles (server-0)
   00:05:23.003 ^ <00000009>ansar - Poll for termination
   00:05:24.253 ^ <00000009>ansar - Detect status of associated roles (server-0)
   00:05:24.261 ^ <00000009>ansar - Starting transfer of materials
   00:05:24.262 ^ <0000000d>folder_transfer - File transfer (1 deltas) to /home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/model/server-0
   00:05:24.263 ^ <0000000d>folder_transfer - Move 1 aliases to targets
   00:05:24.264 ^ <0000000d>folder_transfer - Clear 0 aliases
   00:05:24.264 ^ <00000009>ansar - Completed transfer to "/home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/model/server-0"
   00:05:24.264 ^ <00000009>ansar - Restoring 1 stopped roles
   ansar run "test-.*" --code-path=. --test-run
   role "test-client" (pass/fail): 1/2
   /home/dennis/gh/multi-processing-is-more-than-forking/client.py:51 - Expected b'eager' for b'fervent', got b'fervent'
   /home/dennis/gh/multi-processing-is-more-than-forking/client.py:61 - Expected b'droll' for b'fly', got b'fly'
   make: *** [Makefile:64: test] Error 1

Installing the ``word-map.json`` clears one of the failing tests.

Editing Source Code
-------------------

.. code-block::
   :emphasize-lines: 1

   $ vi +51 client.py
   $ make
   pyinstaller --onefile --log-level ERROR -p . client.py
   ansar --force --debug-level=CONSOLE deploy dist testing
   00:07:21.597 ^ <00000009>ansar - Detected 1 changing executables (added 1 associated roles)
   00:07:21.597 ^ <00000009>ansar - Detect status of associated roles (test-client)
   00:07:21.606 ^ <00000009>ansar - Starting transfer of materials
   00:07:21.606 ^ <0000000c>folder_transfer - File transfer (1 deltas) to /home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/bin
   00:07:21.613 ^ <0000000c>folder_transfer - Move 1 aliases to targets
   00:07:21.614 ^ <0000000c>folder_transfer - Clear 0 aliases
   00:07:21.615 ^ <00000009>ansar - Completed transfer to "/home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/bin"
   ansar run "test-.*" --code-path=. --test-run
   role "test-client" (pass/fail): 2/1
   /home/dennis/gh/multi-processing-is-more-than-forking/client.py:61 - Expected b'droll' for b'fly', got b'fly'
   make: *** [Makefile:64: test] Error 1

Editing the test client clears the second failing test.

Modifying An Operational File
-----------------------------

.. code-block::
   :emphasize-lines: 1

   $ vi testing/model-by-role/server-0/word-map.json
   $ make
   ansar --force --debug-level=CONSOLE deploy dist testing
   00:08:04.145 ^ <00000009>ansar - Detected 1 model changes for "server-0"
   00:08:04.146 ^ <00000009>ansar - Detect status of associated roles (server-0)
   00:08:04.146 ^ <00000009>ansar - Stop roles (server-0)
   00:08:04.146 ^ <00000009>ansar - Poll for termination
   00:08:05.396 ^ <00000009>ansar - Detect status of associated roles (server-0)
   00:08:05.427 ^ <00000009>ansar - Starting transfer of materials
   00:08:05.428 ^ <0000000d>folder_transfer - File transfer (1 deltas) to /home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/model/server-0
   00:08:05.429 ^ <0000000d>folder_transfer - Move 1 aliases to targets
   00:08:05.430 ^ <0000000d>folder_transfer - Clear 0 aliases
   00:08:05.430 ^ <00000009>ansar - Completed transfer to "/home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/model/server-0"
   00:08:05.430 ^ <00000009>ansar - Restoring 1 stopped roles
   ansar run "test-.*" --code-path=. --test-run
   $ make
   ansar --force --debug-level=CONSOLE deploy dist testing
   00:08:26.570 ^ <00000009>ansar - Nothing to deploy
   ansar run "test-.*" --code-path=. --test-run
   $ ansar status -l
   server-0                 <960962> 17.2s
   zombie-0                 <960776> 1m46.3s

Adding another word mapping to ``word-map.json`` clears the last failing test. The ``make`` command now completes without
error, i.e. the ``test-run`` produces a zero exit code. Repeating the ``make`` command shows ``ansar deploy``
detecting that nothing significant has changed. Lastly, the :ref:`ansar status <ansar-command-reference-status>` command shows that ``zombie-0`` has been
left undisturbed through all the changes to code and application files: its runtime is significantly longer than
the ``server-0`` runtime.

Summary Of The Multi-Process Development Loop
---------------------------------------------

Development activity is reduced to making the necessary changes and running the ``make`` command. This allows
the developer to focus on the problems and potential fixes, without the distraction of correctly propagating changes
on every iteration of the loop. That propagation can involve multiple source files, building of executables, deployment of
application files and management of operational processes.

Adding Some Workload
====================

Simulation of operational workloads can be difficult. In the case of network servers there is the immediate difficulty
that multiple client processes will be needed.

Ansar supports the creation of multiple processes with a single command. More accurately, this is the creation of
multiple roles for one common executable.

.. code-block::
   :emphasize-lines: 1

   $ ansar add client 'test-client-{number}' --start=10 --count=40
   $ ansar set retry 'test-client-\d\d' --encoding-file=client-retry
   $ ansar list
   server-0
   test-client
   test-client-10
   test-client-11
   test-client-12
   test-client-13
   test-client-14
   test-client-15
   ..
   test-client-49
   zombie-0
   $ ansar get retry test-client-49
   {
       "value": {
           "first_steps": [
               1.0,
               2.0,
               4.0
           ]
       }
   }

When adding a role, the role name is optional and often omitted; adding a role for the ``client`` executable without a name
results in the default name ``client-0``. This works because the role name is treated as a template that is expanded to produce
the final name. Templates may reference the variables ``executable`` and ``number``; the former expands to the executable
being added and the latter to a runtime integer. Internally, the add command loops according to the ``start`` and
``count`` values, which default to 0 and 1 respectively. On each iteration it expands the template, passing the current
loop index as ``number``, before populating the home with the new definition. Setting the ``start`` and ``count`` values
on the command line allows a single add command to create multiple roles. The default role name template is ``{executable}-{number}``.
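
The expansion described above can be sketched in a few lines of Python. This is an illustration only, not the
Ansar implementation; the function name ``expand_roles`` and its parameters are hypothetical, chosen to mirror the
``start`` and ``count`` values and the two template variables.

.. code-block:: python

   def expand_roles(executable, template="{executable}-{number}", start=0, count=1):
       """Return the role names that one add command would create (hypothetical helper)."""
       return [
           # Each iteration passes the loop index as ``number``; a template
           # that does not mention ``executable`` simply ignores it.
           template.format(executable=executable, number=number)
           for number in range(start, start + count)
       ]

   # Omitting the role name falls back to the default template.
   default_names = expand_roles("client")

   # Equivalent of: ansar add client 'test-client-{number}' --start=10 --count=40
   names = expand_roles("client", "test-client-{number}", start=10, count=40)
   print(default_names, names[0], names[-1], len(names))

With the defaults the sketch yields the single name ``client-0``; with ``start=10`` and ``count=40`` it yields the
forty names ``test-client-10`` through ``test-client-49``, matching the listing above.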

The :ref:`ansar add <ansar-command-reference-add>` command above adds 40 new roles with names like ``test-client-49``. The ``ansar set`` command caters to
these situations by accepting a search pattern and applying the change to each match. Setting the retry property
matters more in this scenario, as multiple clients will be clamoring for the attention of the lone server.
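
The effect of a search pattern on the roles in the home can be approximated with Python's ``re`` module. This is a
sketch under the assumption that the pattern must match the whole role name; how Ansar actually applies the pattern
is not shown in this guide, and ``matching_roles`` is a hypothetical helper.

.. code-block:: python

   import re

   def matching_roles(pattern, roles):
       """Select the role names matched in full by the pattern (assumed behaviour)."""
       matcher = re.compile(pattern)
       return [role for role in roles if matcher.fullmatch(role)]

   roles = ["server-0", "test-client", "test-client-10", "test-client-49", "zombie-0"]

   # Pattern used with ``ansar set``: only the two-digit clients are selected.
   numbered = matching_roles(r"test-client-\d\d", roles)

   # Pattern used with ``ansar run``: the original test-client is included as well.
   all_tests = matching_roles(r"test-.*", roles)
   print(numbered, all_tests)

Under that assumption the pattern ``test-client-\d\d`` leaves the original ``test-client`` role untouched, while the
broader ``test-.*`` pattern used by the ``make`` target covers it along with the new roles.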

.. code-block::
   :emphasize-lines: 1

   $ make
   ansar --force --debug-level=CONSOLE deploy dist testing
   02:28:17.017 ^ <00000009>ansar - Nothing to deploy
   ansar --debug-level=DEBUG run "test-.*" --code-path=. --test-run
   02:28:17.317 + <00000008>start_vector - Created by <00000001>
   02:28:17.317 ~ <00000008>start_vector - Executable "/home/dennis/gh/multi-processing-is-more-than-forking/.dev/bin/ansar" as process (964752)
   02:28:17.317 ~ <00000008>start_vector - Working folder "/home/dennis/gh/multi-processing-is-more-than-forking"
   02:28:17.317 ~ <00000008>start_vector - Running object "ansar.command.ansar_command.ansar"
   02:28:17.317 ~ <00000008>start_vector - Class threads (1) "retries" (1)
   02:28:17.317 + <00000009>ansar - Created by <00000008>
   02:28:17.317 ~ <00000009>ansar - Call the sub-command function
   02:28:17.318 ^ <00000009>ansar - Detect status of associated roles (test-client-27, test-client-15, test-client-38, test-client-24, test-client-23, test-client-36, test-client-18, test-client-31, test-client-32, test-client-43, test-client-35, test-client-19, test-client-41, test-client-49, test-client-10, test-client-28, test-client-11, test-client-48, test-client-39, test-client-44, test-client-40, test-client-22, test-client-45, test-client-21, test-client-30, test-client-46, test-client-26, test-client-20, test-client-37, test-client-47, test-client-25, test-client-17, test-client-33, test-client-13, test-client-34, test-client, test-client-12, test-client-16, test-client-42, test-client-14, test-client-29)
   02:28:17.318 + <0000000a>lock_and_hold - Created by <00000009>
   02:28:17.318 > <0000000a>lock_and_hold - Sent Ready to <00000009>
   02:28:17.318 + <0000000b>lock_and_hold - Created by <00000009>
   02:28:17.319 + <0000000c>lock_and_hold - Created by <00000009>
   ..
   02:28:17.372 X <00000014>lock_and_hold - Destroyed
   02:28:17.373 < <00000009>ansar - Received Completed from <00000014>
   02:28:17.373 X <0000000f>lock_and_hold - Destroyed
   02:28:17.373 < <00000009>ansar - Received Completed from <0000000f>
   02:28:17.373 ^ <00000009>ansar - Running "test-.*" (.ansar-home)
   02:28:17.373 + <00000033>Process[INITIAL] - Created by <00000009>
   02:28:17.373 < <00000033>Process[INITIAL] - Received Start from <00000009>
   02:28:17.373 ~ <00000033>Process[INITIAL] - Execute /home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/bin/client --call-signature=o --point-of-origin=1 --home-path=/home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home --role-name=test-client-27
   02:28:17.373 ( <00000033>Process[INITIAL] - Started process (964802)
   02:28:17.373 + <00000034>wait - Created by <00000033>
   02:28:17.374 + <00000035>Process[INITIAL] - Created by <00000009>
   02:28:17.374 < <00000035>Process[INITIAL] - Received Start from <00000009>
   02:28:17.374 ~ <00000035>Process[INITIAL] - Execute /home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/bin/client --call-signature=o --point-of-origin=1 --home-path=/home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home --role-name=test-client-15
   02:28:17.374 ( <00000035>Process[INITIAL] - Started process (964804)
   02:28:17.374 + <00000036>wait - Created by <00000035>
   02:28:17.374 + <00000037>Process[INITIAL] - Created by <00000009>
   02:28:17.374 < <00000037>Process[INITIAL] - Received Start from <00000009>
   02:28:17.374 ~ <00000037>Process[INITIAL] - Execute /home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home/bin/client --call-signature=o --point-of-origin=1 --home-path=/home/dennis/gh/multi-processing-is-more-than-forking/.ansar-home --role-name=test-client-38
   02:28:17.374 ( <00000037>Process[INITIAL] - Started process (964806)
   ..
   02:28:18.208 < <00000069>Process[EXECUTING] - Received Completed from <00000083>
   02:28:18.208 ) <00000069>Process[EXECUTING] - Process (965116) ended with 0
   02:28:18.208 X <00000069>Process[EXECUTING] - Destroyed
   02:28:18.208 < <00000009>ansar - Received Completed from <00000069>
   02:28:18.208 ^ <00000009>ansar - Completion for "test-client-14" (<ansar.create.test.TestReport object at 0x7fce7c430d00>)
   02:28:18.210 X <00000084>wait - Destroyed
   02:28:18.211 < <0000006a>Process[EXECUTING] - Received Completed from <00000084>
   02:28:18.211 ) <0000006a>Process[EXECUTING] - Process (965123) ended with 0
   02:28:18.211 X <0000006a>Process[EXECUTING] - Destroyed
   02:28:18.211 < <00000009>ansar - Received Completed from <0000006a>
   02:28:18.211 ^ <00000009>ansar - Completion for "test-client-29" (<ansar.create.test.TestReport object at 0x7fce7c433580>)
   role "test-client-15" (pass/fail): 3/0
   role "test-client-35" (pass/fail): 3/0
   role "test-client-19" (pass/fail): 3/0
   role "test-client-24" (pass/fail): 3/0
   role "test-client-43" (pass/fail): 3/0
   role "test-client-31" (pass/fail): 3/0
   role "test-client-36" (pass/fail): 3/0
   role "test-client-27" (pass/fail): 3/0
   role "test-client-23" (pass/fail): 3/0
   role "test-client-41" (pass/fail): 3/0
   role "test-client-10" (pass/fail): 3/0
   role "test-client-38" (pass/fail): 3/0
   ..
   role "test-client" (pass/fail): 3/0
   role "test-client-13" (pass/fail): 3/0
   role "test-client-12" (pass/fail): 3/0
   role "test-client-16" (pass/fail): 3/0
   role "test-client-42" (pass/fail): 3/0
   role "test-client-14" (pass/fail): 3/0
   role "test-client-29" (pass/fail): 3/0
   02:28:18.272 X <00000009>ansar - Destroyed
   02:28:18.272 < <00000008>start_vector - Received Completed from <00000009>
   02:28:18.272 X <00000008>start_vector - Destroyed

The same ``make`` command now runs a much more ambitious test workload. All the test clients completed successfully, passing
their respective ``TestReport`` objects back to the run command. Those reports indicate that the new tests encountered no failures.
This is the expected result, given that all test failures had been cleared and every instance of ``test-client-xx`` is
a replica of a test process that was already passing.