===============
Task statistics
===============

We want to store metadata about previous runs of various tasks. Runtime
data from previous runs can be used to:

* select a powerful worker when the task is resource-hungry (or the
  opposite)
* decide whether a failure is fatal based on the existence of
  previous successful runs
* etc.

Store runtime statistics in ``WorkRequest``
===========================================

The ``WorkRequest`` model gets a new JSON field named ``output_data``. That
field is set upon completion of the work request. The values are provided
by the worker through the view ``UpdateWorkRequestAsCompletedView``.

Among the keys that can be present in that new field, we have the following
standardized values:

* ``runtime_statistics``: see RuntimeStatistics model below.
* ``errors``: a list of errors. Each error is a dictionary with the
  following keys:

  * ``message``: user-friendly error message
  * ``code``: computer-friendly error code

  .. note::

    Typically used to return validation/configuration errors to the user
    that resulted in the task not being run at all. Other additional keys
    might be set depending on the error code.

    This ``errors`` key is not required for the design that we are doing here,
    but it explains why I opted to create an ``output_data`` field instead
    of a ``runtime_statistics`` field. See `#432
    <https://salsa.debian.org/freexian-team/debusine/-/issues/432>`__ or
    `#227 <https://salsa.debian.org/freexian-team/debusine/-/issues/227>`__
    for related issues that we could fix with this new possibility.
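For illustration, here is a hypothetical ``output_data`` value for a work
request whose task could not be run at all (the error message and code
are made up)::

    errors:
    - message: "Invalid value for the 'distribution' parameter"
      code: invalid-parameter

A successfully completed work request would instead typically contain
only the ``runtime_statistics`` key.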

.. _runtime-statistics:

RuntimeStatistics model
-----------------------

The model combines runtime data about the task itself:

* ``duration`` (optional, integer): the runtime duration of the task in seconds
* ``cpu_time`` (optional, integer): the amount of CPU time used in seconds
  (combining user and system CPU time)
* ``disk_space`` (optional, integer): the maximum disk space used during
  the task's execution (in bytes)
* ``memory`` (optional, integer): the maximum amount of RAM used during
  the task's execution (in bytes)

It also includes some data about the worker, to help analyze the values
and/or to provide reference data when runtime data is missing:

* ``available_disk_space`` (optional, integer): the available disk space
  when the task started (in bytes, may be rounded)
* ``available_memory`` (optional, integer): the amount of RAM that was
  available when the task started (in bytes, may be rounded)
* ``cpu_count`` (optional, integer): the number of CPU cores on the worker that ran
  the task
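
As a sketch, the model above could be written as a plain dataclass
(debusine's actual data models may be implemented differently, e.g. with
pydantic; the class below is only illustrative):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuntimeStatistics:
    """Runtime statistics of a task run; every field is optional."""

    # Data about the task itself
    duration: Optional[int] = None  # runtime duration in seconds
    cpu_time: Optional[int] = None  # user + system CPU time in seconds
    disk_space: Optional[int] = None  # maximum disk space used, in bytes
    memory: Optional[int] = None  # maximum RAM used, in bytes
    # Reference data about the worker
    available_disk_space: Optional[int] = None  # in bytes, may be rounded
    available_memory: Optional[int] = None  # in bytes, may be rounded
    cpu_count: Optional[int] = None  # number of CPU cores
```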

.. _collection-task-history:

Collection ``debusine:task-history``
====================================

This :ref:`singleton collection <collection-singleton>` helps to find
previous runs of a given task that used similar input parameters and are
thus expected to have similar behaviour.

To correctly represent the history of a large number of task runs,
the bare data item always has the following fields:

* ``task_type`` (required): the ``task_type`` of the work request for
  which we want to keep statistics
* ``task_name`` (required): the ``task_name`` of the work request for
  which we want to keep statistics
* ``subject`` (optional): an abstract string value representing the
  *subject* of the task. It is meant to partition the possible inputs
  into groups that we expect to behave similarly.
* ``context`` (optional): an abstract string value representing the
  *runtime context* in which the task is executed. It is meant to represent
  some of the task parameters that can significantly alter the runtime
  behaviour of the task.
* ``work_request_id`` (required): the ID of the WorkRequest corresponding to
  the monitored task
* ``result`` (required): duplicates the string value of the result field of
  the associated WorkRequest

For example, for the ``sbuild`` task, ``subject`` would typically be
the source package name while ``context`` would be the name of the target
suite and the target architecture.

The subject and runtime context are computed by the task's
``compute_dynamic_data()`` method and stored in the work request's
dynamic data field.
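
For instance, a hypothetical helper deriving the two values for an
``sbuild``-like task could look as follows (function and parameter names
are made up; the real task computes them in ``compute_dynamic_data()``):

```python
def compute_subject_and_context(task_data: dict) -> tuple[str, str]:
    """Derive history classification values from sbuild-like task data.

    Illustrative only: the parameter names and structure are made up.
    """
    # Group runs by source package: similar inputs, similar behaviour.
    subject = task_data["source_package"]
    # The target suite and architecture significantly alter the runtime.
    context = f"{task_data['suite']}:{task_data['architecture']}"
    return subject, context


subject, context = compute_subject_and_context(
    {"source_package": "hello", "suite": "trixie", "architecture": "amd64"}
)
```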

The name of each item is
``TASK_TYPE:TASK_NAME:SUBJECT:CONTEXT:WORK_REQUEST_ID``.
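
Building an item name from its parts is then straightforward (a sketch):

```python
def item_name(
    task_type: str,
    task_name: str,
    subject: str,
    context: str,
    work_request_id: int,
) -> str:
    """Build a ``debusine:task-history`` item name from its parts."""
    return f"{task_type}:{task_name}:{subject}:{context}:{work_request_id}"


name = item_name("Worker", "sbuild", "hello", "trixie_amd64", 12)
```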

Other collection-specific characteristics:

* Data:

  * ``old_items_to_keep``: number of old items to keep. Defaults to 5.
    For each subject/context combination, the collection always keeps the
    last success, the last failure, and a given number of most recent
    entries. The cleanup is automatically done when adding new items.

    .. note::

        At some point, we may need more advanced logic than this, for
        instance to clean up statistics about packages that are gone
        from the corresponding suite.

* Valid items:

  * ``debusine:historical-task-run`` bare data

* Lookup names:

  * ``last-entry:TASK_TYPE:TASK_NAME:SUBJECT:CONTEXT`` returns the most
    recently added entry for the specific combination of
    task/subject/context.
  * ``last-success:TASK_TYPE:TASK_NAME:SUBJECT:CONTEXT`` returns the most
    recently added entry where ``result`` is ``success`` for the specific
    combination of task/subject/context.
  * ``last-failure:TASK_TYPE:TASK_NAME:SUBJECT:CONTEXT`` returns the most
    recently added entry where ``result`` is ``failure`` or ``error`` for
    the specific combination of task/subject/context.

* Multiple lookup filters:

  * ``same_work_request``: given a :ref:`lookup-multiple`, return conditions
    matching task runs for the same work request as any of the resulting
    artifacts
  * ``same_workflow``: given a :ref:`lookup-multiple`, return conditions
    matching task runs for work requests from the same workflow as any of
    the resulting artifacts

* Constraints:

  * None.
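
The retention rule described under ``old_items_to_keep`` can be sketched
as follows (a simplified in-memory version; the real cleanup would
operate on collection items in the database):

```python
def items_to_keep(entries, old_items_to_keep=5):
    """Select history entries to keep for one subject/context combination.

    ``entries`` is a list of (timestamp, result) tuples.  The most recent
    ``old_items_to_keep`` entries are kept, plus the last success and the
    last failure even if they fall outside that window.
    """
    ordered = sorted(entries, key=lambda entry: entry[0], reverse=True)
    keep = set(ordered[:old_items_to_keep])
    # Always keep the most recent success and the most recent failure.
    for wanted in ({"success"}, {"failure", "error"}):
        for entry in ordered:
            if entry[1] in wanted:
                keep.add(entry)
                break
    return keep
```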

Bare data item: ``debusine:historical-task-run``
------------------------------------------------

On top of the mandatory classification fields documented above, the
following fields are defined in the data item:

* ``timestamp`` (required): the date and time of the task run, as a Unix
  timestamp (cf. ``date +%s``)
* ``runtime_statistics`` (required): duplicates the value of the
  ``runtime_statistics`` key in the ``output_data`` dictionary of the
  associated WorkRequest

Example data::

    timestamp: 1722692645
    work_request_id: 12
    result: success
    runtime_statistics:
        duration: 6230
        cpu_time: 4300
        disk_space: 14780131
        memory: 344891034
        available_disk_space: 12208271360
        available_memory: 32839598080
        cpu_count: 4

New action ``record-in-task-history``
=====================================

This action is meant to be used as an event reaction to store the current
task run in a ``debusine:task-history`` collection. The following fields
are supported:

* ``collection`` (:ref:`lookup-single`, required):
  ``debusine:task-history`` collection to update
* ``subject`` (optional, defaults to value stored in dynamic_data): the
  subject string used to record the statistics
* ``context`` (optional, defaults to value stored in dynamic_data): the
  *runtime context* string used to record the statistics

When the action is executed, it simply adds a new entry to the collection.
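
For illustration, an explicitly configured instance of the action could
look like this (the surrounding ``event_reactions`` structure is shown
with made-up values; recording on both success and failure is needed so
that failed runs end up in the history too)::

    event_reactions:
        on_success:
        - action: record-in-task-history
          collection: "debusine:task-history"
        on_failure:
        - action: record-in-task-history
          collection: "debusine:task-history"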

.. note::

   This action is not meant to be manually added on each work request.
   Instead it should be automatically executed upon completion of each
   work request provided that the target collection has been set in the
   new ``task_history`` task_data field.

Open question: how and where to use the statistics
==================================================

In theory, the statistics might only be available once the task becomes
pending, since that is when we have the final result of
``compute_dynamic_data()`` and the guarantee of having values for
subject/context.

If we want to use those statistics to tweak the configuration of the work
request (i.e. adding new worker requirements), then it needs some careful
coordination between the scheduler and the workflow.

In practice, many workflows will know the subject/context values in
advance and can configure the work request accordingly at creation time.

Implementation plan
===================

* Add a new optional ``task_history`` :ref:`lookup-single` field in
  BaseTaskData.

* Add a new ``get_event_reactions(event_name)`` method on the ``BaseTask``
  class that returns a list of actions. By default that list should
  contain the new ``record-in-task-history`` action configured with the
  collection passed in the ``task_data.task_history`` field. If the
  collection is not set, then the returned list is empty.

* Tweak ``WorkRequest.get_triggered_actions()`` to combine the event
  reactions provided by the task implementation with the event reactions
  explicitly configured when creating the task.
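
The second step could be sketched roughly as follows (a minimal
stand-in; the real ``BaseTask`` API and the representation of actions may
differ):

```python
class BaseTask:
    """Minimal illustrative stand-in for debusine's task base class."""

    def __init__(self, task_data: dict):
        self.task_data = task_data

    def get_event_reactions(self, event_name: str) -> list[dict]:
        """Return the default event reactions for ``event_name``.

        If ``task_history`` is set in the task data, record every run in
        that collection; otherwise there is nothing to react to.
        """
        collection = self.task_data.get("task_history")
        if collection is None:
            return []
        return [
            {"action": "record-in-task-history", "collection": collection}
        ]
```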
