.. _collections:

===========
Collections
===========

Collections are abstract aggregates of artifacts. For the system to make
meaningful use of them, each collection is assigned a category, which
determines the structure of the additional key-value data attached to the
collection.

Each item in a collection also carries its own key-value data. The
structure of that per-item data likewise depends on the category of the
collection.

Items in collections may be looked up using various names, depending on the
category. These names are analogous to URL routing in web applications (and
indeed could be used by debusine's URL routing, as well as when inspecting
the collection directly): a name resolves to at most one item at a time, and
an item may be accessible via more than one name.  The existence of multiple
"lookup names" that resolve to an item does not imply duplicates of that
item or any associated artifacts.
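
The relationship between lookup names and items can be pictured as a
many-to-one mapping. The following is a minimal sketch only: the in-memory
``lookup_index`` dictionary, the integer item IDs, and the ``resolve`` helper
are hypothetical and do not describe how debusine stores or resolves names.
The example names use the ``debian:suite`` forms described later in this
document.

.. code-block:: python

   from typing import Optional

   # Hypothetical index: each lookup name maps to exactly one item ID, so a
   # name can never resolve to more than one item at a time.
   lookup_index: dict[str, int] = {
       "source:hello": 1,
       "source-version:hello_2.10-3": 1,  # a second name for the same item
       "binary:hello_amd64": 2,
   }


   def resolve(name: str) -> Optional[int]:
       """Return the ID of the item that ``name`` resolves to, or None."""
       return lookup_index.get(name)


   # Two names resolve to the same single item; nothing is duplicated.
   assert resolve("source:hello") == resolve("source-version:hello_2.10-3")
   # A name with no matching item resolves to nothing.
   assert resolve("source:missing") is None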

Key names in both collection data and per-item data are used as field names
in ``pydantic`` models, and must therefore be valid Python identifiers.
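
A candidate key name can be checked against Python's identifier rules with
the standard library alone. This is a sketch rather than debusine's own
validation: the helper name ``is_valid_data_key`` is hypothetical, and
rejecting reserved words is an added assumption, since ``str.isidentifier()``
accepts them.

.. code-block:: python

   import keyword


   def is_valid_data_key(key: str) -> bool:
       """Return True if ``key`` is usable as a Python attribute name."""
       # str.isidentifier() returns True for reserved words such as "class",
       # so keywords are rejected separately (an assumption of this sketch).
       return key.isidentifier() and not keyword.iskeyword(key)


   print(is_valid_data_key("may_reuse_versions"))  # True
   print(is_valid_data_key("release-fields"))      # False: contains a hyphen
   print(is_valid_data_key("class"))               # False: reserved word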

.. _collection-archive:

Category ``debian:archive``
===========================

This collection represents a `Debian archive (a.k.a. repository)
<https://wiki.debian.org/DebianRepository/Format>`_.

* Data:

  * ``may_reuse_versions``: if true, versions of packages in this archive
    may be reused provided that the previous packages with that version have
    been removed; this should be false for typical user-facing archives to
    avoid confusing behaviour from apt, but it may be useful to set it to
    true for experimental archives

* Valid items:

  * ``debian:suite`` collections

* Per-item data: none

* Lookup names:

  * ``name:NAME``: the suite whose ``name`` property is ``NAME``.
  * ``source-version:NAME_VERSION``: the source package named ``NAME`` at
    ``VERSION``.
  * ``binary-version:NAME_VERSION_ARCH``: the set of binary packages on
    ``ARCH`` whose ``srcpkg_name`` property is ``NAME`` and whose
    ``version`` property is ``VERSION``.

* Constraints:

  * there may be at most one package with a given name and version (and
    architecture, in the case of binary packages) active in the collection
    at a given time, although the same package may be in multiple suites
  * each poolified file name resulting from an active artifact may only
    refer to at most one concrete file in the collection at a given time
    (this differs from the above constraint in the case of source packages,
    which contain multiple files that may overlap with other source
    packages)
  * if ``may_reuse_versions`` is false, then each poolified file name in the
    collection may only refer to at most one concrete file, regardless of
    whether conflicting files are active or removed
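
The interaction between the last two constraints and ``may_reuse_versions``
can be sketched as a check performed before a file is added. Everything in
this example is an assumption made for illustration: the ``PoolEntry``
record, the ``may_add_file`` helper, and the use of a SHA-256 digest to
identify a concrete file are hypothetical and not taken from debusine.

.. code-block:: python

   from dataclasses import dataclass


   @dataclass(frozen=True)
   class PoolEntry:
       """Hypothetical record of one file that is or was in the collection."""

       path: str     # poolified file name
       sha256: str   # assumed identity of the concrete file
       active: bool  # False once the owning item has been removed


   def may_add_file(
       entries: list[PoolEntry],
       path: str,
       sha256: str,
       may_reuse_versions: bool,
   ) -> bool:
       """Return True if adding (path, sha256) keeps the constraints intact."""
       for entry in entries:
           if entry.path != path or entry.sha256 == sha256:
               continue  # different name, or the same concrete file
           # Same name, different contents: always a conflict while the
           # existing file is active, and also a conflict for removed files
           # unless versions may be reused.
           if entry.active or not may_reuse_versions:
               return False
       return True


   path = "pool/main/h/hello/hello_2.10.orig.tar.gz"
   removed = [PoolEntry(path, "aaa", active=False)]
   print(may_add_file(removed, path, "bbb", may_reuse_versions=False))  # False
   print(may_add_file(removed, path, "bbb", may_reuse_versions=True))   # True

Re-adding identical contents under the same name is accepted in either mode,
because the name still refers to a single concrete file.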

.. _collection-suite:

Category ``debian:suite``
=========================

This collection represents a single `suite
<https://wiki.debian.org/DebianRepository/Format#Suite>`_ in a Debian
archive. Its ``name`` is the name of the suite.

* Data:

  * ``release_fields``: dictionary of static fields to set in this suite's
    ``Release`` file
  * ``may_reuse_versions``: if true, versions of packages in this suite may
    be reused provided that the previous packages with that version have
    been removed; this should be false for typical user-facing suites to
    avoid confusing behaviour from apt, but it may be useful to set it to
    true for experimental suites

* Valid items:

  * ``debian:source-package`` artifacts
  * ``debian:binary-package`` artifacts

* Per-item data:

  * ``srcpkg_name``: for binary packages, the name of the corresponding
    source package (copied from underlying artifact for ease of lookup and
    to preserve history)
  * ``srcpkg_version``: for binary packages, the version of the
    corresponding source package (copied from underlying artifact for ease
    of lookup and to preserve history)
  * ``package``: the name from the package's ``Package:`` field (copied from
    underlying artifact for ease of lookup and to preserve history)
  * ``version``: the version of the package (copied from underlying artifact
    for ease of lookup and to preserve history)
  * ``architecture``: for binary packages, the architecture of the package
    (copied from underlying artifact for ease of lookup and to preserve
    history)
  * ``component``: the component (e.g. ``main`` or ``non-free``) in which
    this package is published
  * ``section``: the section (e.g. ``python``) for this package
  * ``priority``: for binary packages, the priority (e.g. ``optional``) for
    this package

* Lookup names:

  * ``source:NAME``: the current version of the source package named
    ``NAME``.
  * ``source-version:NAME_VERSION``: the source package named ``NAME`` at
    ``VERSION``.
  * ``binary:NAME_ARCH``: the current version of the binary package named
    ``NAME`` on ``ARCH``.
  * ``binary-version:NAME_VERSION_ARCH``: the binary package named ``NAME``
    at ``VERSION`` on ``ARCH``.

* Constraints:

  * there may be at most one package with a given name and version (and
    architecture, in the case of binary packages) active in the collection
    at a given time
  * each poolified file name resulting from an active artifact may only
    refer to at most one concrete file in the collection at a given time
    (this differs from the above constraint in the case of source packages,
    which contain multiple files that may overlap with other source
    packages)
  * if ``may_reuse_versions`` is false, then each poolified file name in the
    collection may only refer to at most one concrete file, regardless of
    whether conflicting files are active or removed
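
The lookup name forms listed above can be derived from an item's per-item
data. The sketch below assumes that an item without an ``architecture`` value
is a source package; the function ``suite_lookup_names`` is hypothetical and
not part of debusine. The unversioned ``source:`` and ``binary:`` names only
apply while the item is the current version in the suite.

.. code-block:: python

   from typing import Optional


   def suite_lookup_names(
       package: str, version: str, architecture: Optional[str] = None
   ) -> list[str]:
       """Build the lookup names under which a suite item may be reachable."""
       if architecture is None:
           # Assumption of this sketch: no architecture means a source package.
           return [
               f"source:{package}",
               f"source-version:{package}_{version}",
           ]
       return [
           f"binary:{package}_{architecture}",
           f"binary-version:{package}_{version}_{architecture}",
       ]


   print(suite_lookup_names("hello", "2.10-3"))
   # ['source:hello', 'source-version:hello_2.10-3']
   print(suite_lookup_names("hello", "2.10-3", "amd64"))
   # ['binary:hello_amd64', 'binary-version:hello_2.10-3_amd64']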

.. _collection-environments:

Category ``debian:environments``
================================

.. todo::

   The definition of this category is not yet fully agreed.  We'll revisit
   it when we're closer to being able to try out an implementation so that
   we can see how the lookup mechanisms will work.

This collection represents a group of :ref:`debian:system-tarball
<artifact-system-tarball>` and/or :ref:`debian:system-image
<artifact-system-image>` artifacts, such as the tarballs used by build
daemons across each suite and architecture.

In the short term, there will be one ``debian:environments`` collection per
distribution vendor with the collection name set to the name of the vendor
(e.g. "debian"), so that it can be looked up by the vendor's name.  This is
subject to change.

* Data: none

* Valid items:

  * ``debian:system-tarball`` artifacts
  * ``debian:system-image`` artifacts

* Per-item data:

  * ``codename``: codename of the distribution version (copied from
    underlying artifact for ease of lookup and to preserve history, but may
    be overridden to reuse the same tarball for another distribution
    version)
  * ``arch``: architecture name (copied from underlying artifact for ease of
    lookup and to preserve history)
  * ``variant``: an optional identifier indicating what kind of tarball or
    image this is; for example, an image optimized for use with autopkgtest
    might have its variant set to "autopkgtest"

* Lookup names:

  * ``tarball:CODENAME:ARCH``: the current system tarball for codename
    ``CODENAME`` and architecture ``ARCH``
  * ``tarball:CODENAME:ARCH:VARIANT``: the current system tarball for
    codename ``CODENAME``, architecture ``ARCH``, and variant ``VARIANT``
  * ``image:CODENAME:ARCH``: the current system image for codename
    ``CODENAME`` and architecture ``ARCH``
  * ``image:CODENAME:ARCH:VARIANT``: the current system image for codename
    ``CODENAME``, architecture ``ARCH``, and variant ``VARIANT``

* Constraints:

  * there may be at most one active tarball or image respectively with a
    given vendor, codename, variant and architecture at a given time
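
Since the definition of this category is not yet settled, the following is
only a sketch of how the lookup name forms above could be split into their
components. The ``EnvironmentLookup`` type and ``parse_environment_lookup``
function are hypothetical names; the sketch makes no claim about how a
lookup without a variant is matched against items that have one.

.. code-block:: python

   from typing import NamedTuple, Optional


   class EnvironmentLookup(NamedTuple):
       kind: str               # "tarball" or "image"
       codename: str
       arch: str
       variant: Optional[str]  # None when the name has no VARIANT part


   def parse_environment_lookup(name: str) -> EnvironmentLookup:
       """Split ``KIND:CODENAME:ARCH[:VARIANT]`` into its components."""
       parts = name.split(":")
       if parts[0] not in ("tarball", "image") or len(parts) not in (3, 4):
           raise ValueError(f"not an environment lookup name: {name!r}")
       variant = parts[3] if len(parts) == 4 else None
       return EnvironmentLookup(parts[0], parts[1], parts[2], variant)


   print(parse_environment_lookup("image:bookworm:amd64:autopkgtest"))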
