.. Command line tool documentation

The Command Interpreter Tool
****************************

This chapter describes the part of ``VDAT`` responsible for the reduction of the
data.

Introduction
============

The main purpose of the ``VDAT`` GUI is to allow users to select, visualize and
reduce ``VIRUS`` data. ``VDAT`` relies mostly on ``cure`` to execute the
reduction steps. ``cure`` is a C++ library that provides a number of
executables that operate on single fits files or on groups of them.

For each of the reduction steps, ``VDAT`` must collect the input files and
command line options according to the directories and IFUs selected by the user
and run the appropriate ``cure`` tool.

Although ``cure`` is the main tool to use, some of the steps of the reduction
are not implemented there. We also want to allow users to execute generic
commands without any prior knowledge of the signature and name of the files.

We have met these requirements by designing a command line tool based on two
building blocks:

1) an interpreter that parses the command string, which contains placeholders,
   and executes the command in a loop, replacing the placeholders with the
   correct values; we use the standard `python string Template
   <https://docs.python.org/2/library/string.html#template-strings>`_ to define
   placeholders;
2) one or more `yaml <http://yaml.org/>`_ configuration files that instruct the
   interpreter on how to expand the placeholders for any provided command.

.. _interpreter:

The interpreter
===============

The public interface of the interpreter is defined by the constructor of the class
:class:`~vdat.command_interpreter.CommandInterpreter` and its method
:meth:`~vdat.command_interpreter.CommandInterpreter.run`.

Constructor
-----------

The constructor has the following signature:

.. autoclass:: vdat.command_interpreter.CommandInterpreter
    :noindex:

1) ``command`` is a string with the command to execute. E.g.::

    subtractfits $args -o $biassec $fits

2) ``command_config``: the relevant part of the parsed ``yaml`` configuration
   file containing the instructions on how to expand
   placeholders like ``args``, ``biassec`` and ``fits`` while running the
   command ``subtractfits``. The part of the configuration file necessary to run
   the above command is

   .. code-block:: yaml

    subtractfits:
        # mandatory keys
        mandatory: [fits, ]

        # primary key: the interpreter collects files according to
        # the instructions in the `fits` key, loops over them,
        # replaces all the placeholders and executes the command
        primary: fits

        # look for all the files matching the pattern in the `target_dir`
        fits: '[0-9]*.fits'

        # Get the `BIASSEC` value from the header of every file
        # and from it extract the part within square brackets
        biassec:
            type: header
            keyword: BIASSEC
            extract:
                - \[(.*)\]
                - \1

        args: '-s -a -k 2.8 -t -z'

   Both the GUI and the command line interface inject the following keys into
   the ``command_config``:

    * ``target_dir``: the directory selected by the user; in the above
      example, the ``fits`` files are searched in this directory
    * ``cal_dir``: the reference calibration directory
    * ``zro_dir``: the reference bias directory

   If no ``cal`` or ``zro`` directory has been explicitly selected in the GUI,
   the default ones are added.

   .. warning::

    If any of these entries is already in the configuration file, it will be
    overwritten.

3) ``selected``: list of selected items, or ``None`` to select all. It
   tells the interpreter for which of the ``primary`` elements the command must
   be run. E.g. the ``VDAT`` GUI passes as ``selected`` the list of IFUs
   selected by the user. The instructions on how to extract, from the files,
   the information to match against ``selected`` while running the command are
   defined in the :ref:`command configuration file <command_conf>`.

   .. note::
    
      ``VDAT`` passes the IFU head mount plate IDs (ihmpid) to the command
      interpreter. This ID is a three-digit number stored in the file headers
      under the ``IFUSLOT`` key.

----

In the constructor the following steps are performed:

1) the configuration object is copied and saved in local variables: this
   allows multiple commands to be enqueued;
2) validations:

   a) the command executable, e.g. ``subtractfits``, is searched in the path;
   b) check that all the mandatory fields are present in the command;
   c) check that all the required keywords are present in the configuration;
   d) check that all the required keywords are of a known type;
   e) map all the types to the functions implementing them.
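
The validation steps can be sketched as follows. This is an illustration under
stated assumptions, not the actual ``VDAT`` code: ``placeholders`` and
``validate`` are hypothetical helpers, problems are returned as strings instead
of raising the interpreter exceptions, and the type checks are left out.

.. code-block:: python

    import shutil
    from string import Template


    def placeholders(command):
        """Return the set of placeholder names found in ``command``."""
        names = set()
        for match in Template.pattern.finditer(command):
            # ``named`` matches $name, ``braced`` matches ${name}
            name = match.group("named") or match.group("braced")
            if name:
                names.add(name)
        return names


    def validate(command, config):
        """Return a list of problems; the list is empty if none is found."""
        problems = []
        # honour ``is_alias_of`` when looking for the executable
        executable = config.get("is_alias_of", command.split()[0])
        if shutil.which(executable) is None:
            problems.append("executable not found: " + executable)
        found = placeholders(command)
        for name in config.get("mandatory", []):
            if name not in found:
                problems.append("mandatory field missing: " + name)
        for name in sorted(found):
            if name not in config:
                problems.append("keyword not configured: " + name)
        return problems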

The ``run`` method
------------------

Invoking

.. automethod:: vdat.command_interpreter.CommandInterpreter.run
    :noindex:

will:

1) collect all the ``primary`` files 
2) filter them according to the list of selected items
3) loop over the ``primary`` files
4) check whether the step must be executed or not
5) for each step in the loop replace the placeholders in the input command
   according to the instructions from the configuration
6) execute the command
7) report execution progress
8) collect and send out execution results
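
A simplified sketch of the collect, filter, loop and replace steps is shown
below. It assumes that every keyword is a plain string and it only builds the
resolved command strings, without executing them or reporting progress.
``resolved_commands`` and its ``key`` argument are hypothetical names used for
illustration only.

.. code-block:: python

    import glob
    import os
    from string import Template


    def resolved_commands(command, config, target_dir, selected=None, key=None):
        """Yield one resolved command string per selected primary file.

        ``key`` is a hypothetical callable that extracts from a file name
        the value to compare with ``selected``.
        """
        primary = config["primary"]
        pattern = os.path.join(glob.escape(target_dir), config[primary])
        template = Template(command)
        for fname in sorted(glob.glob(pattern)):
            if selected is not None and key is not None:
                if key(fname) not in selected:
                    continue
            # only plain string keywords are handled in this sketch
            values = {k: v for k, v in config.items() if isinstance(v, str)}
            values[primary] = fname
            yield template.safe_substitute(values)

Each yielded string could then be split with :func:`shlex.split` and passed to
:func:`subprocess.run`.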

.. _command_conf:

The configuration file
======================

To allow for flexibility and extensibility, the instructions on how to expand
keywords come from one or more configuration files, written using the ``yaml``
standard.

When validating the ``command``, the keywords are extracted and searched in the
configuration. The value of a keyword can be either a string or a dictionary.
If it is a string, like ``'-a -b'``, it is converted into a keyword of type
``plain``: ``{'type': 'plain', 'value': '-a -b'}``. If it is a dictionary, it
must contain a key ``type``, whose value defines the type of the keyword.
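
The conversion described above could look like the following sketch.
``normalise_keyword`` is a hypothetical helper, and ``ValueError`` stands in
for whatever exception the interpreter actually raises.

.. code-block:: python

    def normalise_keyword(value):
        """Return the dictionary form of a keyword value."""
        if isinstance(value, str):
            # a bare string is a shortcut for the ``plain`` type
            return {"type": "plain", "value": value}
        if isinstance(value, dict) and "type" in value:
            return value
        raise ValueError("a keyword must be a string or a dictionary"
                         " with a 'type' key")


    print(normalise_keyword("-a -b"))
    # {'type': 'plain', 'value': '-a -b'}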

.. _special_keys:

Special keywords
----------------

These keywords are understood and used by the interpreter, but should not be
used as variables to expand.

``is_alias_of``
^^^^^^^^^^^^^^^

If it exists, its value is the real name of the executable. This allows the
creation of various commands using the same underlying executable. If e.g. the
command is::

    do_something $args -o $ofile $ifiles

and the configuration file contains

.. code-block:: yaml

    is_alias_of: an_executable
    args: "-a -b"
    ofile: outfile
    ifiles: file[1-9].txt
    primary: ifiles

then the interpreter will loop through all the files matching the ``ifiles``
pattern in ``target_dir``. For the first file, it will execute::

    an_executable -a -b -o outfile file1.txt
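
A possible way to resolve the alias before expanding the placeholders is
sketched here. ``resolve_alias`` is a hypothetical helper and the value of
``ifiles`` is written by hand instead of being collected from ``target_dir``.

.. code-block:: python

    from string import Template


    def resolve_alias(command, config):
        """Replace the command name with the value of ``is_alias_of``."""
        name, _, rest = command.partition(" ")
        real_name = config.get("is_alias_of", name)
        return (real_name + " " + rest) if rest else real_name


    config = {"is_alias_of": "an_executable", "args": "-a -b",
              "ofile": "outfile"}
    command = resolve_alias("do_something $args -o $ofile $ifiles", config)
    first = Template(command).substitute(args=config["args"],
                                         ofile=config["ofile"],
                                         ifiles="file1.txt")
    print(first)
    # an_executable -a -b -o outfile file1.txt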

``mandatory``
^^^^^^^^^^^^^

List of mandatory fields. The field names listed under ``mandatory`` must exist
in the provided command. If ``mandatory`` is not found or is empty, no check is
done.

.. code-block:: yaml

    mandatory: [field1, field2]
    # or equivalently
    mandatory:
        - field1
        - field2

``primary``
^^^^^^^^^^^

Name of the keyword to use as primary. A primary keyword has a special status:
files are collected from the ``target_dir`` according to the type of the
underlying keyword, then they are looped over and for each step the command
string is created and executed. If the value of any other keyword needs to be
built at run time, the ``primary`` files are used to do it. ``VDAT`` is
shipped with a few :ref:`primary types <primary_types>`.

``filter_selected``
^^^^^^^^^^^^^^^^^^^

Tells the interpreter how to filter the list of primary files. If this option
is not found in the configuration or the ``selected`` argument of
:class:`~vdat.command_interpreter.core.CommandInterpreter` is ``None``, no
filtering is performed. Otherwise, for each element in the primary list, the
interpreter:

* uses the instructions from the value of ``filter_selected`` to extract a
  string;
* checks whether the string is in ``selected``.
  
The value of ``filter_selected`` can be any of the :ref:`keyword types
<keyword_types>` described below.

With the following settings:

.. code-block:: yaml

    # Use the value of the header keyword ``IFUSLOT`` to decide whether to
    # keep the primary field or not
    filter_selected:
        type: header
        keyword: IFUSLOT

the content of the fits header keyword ``IFUSLOT`` is extracted and compared
with the list provided with the ``selected`` option of
:class:`~vdat.command_interpreter.core.CommandInterpreter`.
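
The filtering logic can be sketched as below. To keep the example
self-contained, a plain dictionary stands in for the fits headers and
``filter_primaries`` and ``extract`` are hypothetical names.

.. code-block:: python

    def filter_primaries(primaries, selected, extract):
        """Keep the primaries whose extracted value is in ``selected``."""
        if selected is None or extract is None:
            # nothing to filter against: keep everything
            return list(primaries)
        return [p for p in primaries if extract(p) in selected]


    # stand-in for the headers of two fits files
    headers = {
        "a.fits": {"IFUSLOT": "073"},
        "b.fits": {"IFUSLOT": "084"},
    }
    kept = filter_primaries(headers, ["073"],
                            lambda fname: headers[fname]["IFUSLOT"])
    print(kept)
    # ['a.fits']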

``execute``
^^^^^^^^^^^

For each iteration over the ``primary``, tells the interpreter whether or not
to run the command. If the option is not found, the command is always run.
``VDAT`` is shipped with a few :ref:`execute types <execute_types>`.

The following configuration:

.. code-block:: yaml

    execute:
        type: new_file
        sub_type: format
        value: masterbias_{ica}.fits
        keys:
            ica:
                type: regex
                match: .*\d*?T\d*?_(\d{3}[LR][LU])_.*\.fits
                replace: \1
  
creates the ``value`` by extracting the ``ica`` key from the primary file name
and returns ``False`` if the file already exists, so that the command is not
run.

If the handling of the keyword raises an exception, it is logged and the
command is executed.
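
The behaviour of this configuration can be approximated with the following
sketch. ``should_execute`` is a hypothetical function, and looking for the new
file in ``target_dir`` is an assumption made for the example.

.. code-block:: python

    import os
    import re


    def should_execute(primary, target_dir):
        """Return False if the master bias for ``primary`` already exists."""
        match = r".*\d*?T\d*?_(\d{3}[LR][LU])_.*\.fits"
        # the ``regex`` key: extract e.g. ``001LL`` from the file name
        ica = re.sub(match, r"\1", os.path.basename(primary))
        # the ``format`` sub type: build the name of the new file
        new_file = "masterbias_{ica}.fits".format(ica=ica)
        return not os.path.exists(os.path.join(target_dir, new_file))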

.. _primary_types:

Built-in primary keyword types
------------------------------

``plain``
^^^^^^^^^

It looks for all the files matching the given pattern in the target directory.
If the value of a keyword is a string, it is interpreted as of ``plain`` type.
These three definitions are equivalent:

.. code-block:: yaml

    keyword: 20*.fits
    ---
    keyword:
        type: plain
        value: 20*.fits
    ---
    keyword: {type: plain, value: 20*.fits}

``loop``
^^^^^^^^

1) collects the ``keys``;
2) cycles through all the possible combinations of the keys;
3) for each combination replaces the corresponding entries in ``value`` using
   the standard python `format string syntax
   <https://docs.python.org/3/library/string.html#format-string-syntax>`_;
4) looks for all the files matching the resulting string;
5) if any file is found, constructs a string with space separated file names
   and yields it.

The value of ``keys`` is a map between the names of the keys, e.g. ``ifu``,
and the values that they can have. Each value can be either a list or three
comma separated numbers: ``start, stop, step``. The latter case is converted
into a list of numbers from ``start`` to ``stop`` (excluded) in steps of
``step``.

The following configuration:

.. code-block:: yaml

    keyword:
        type: loop
        value: 's[0-9]*{ifu:03d}{channel}{amp}_*.fits'
        keys:   # dictionary of keys to expand in ``value``
            ifu: 1, 100, 1     # start, stop, step values of a slice
            channel: [L, R]    # a list of possible values
            amp:               # alternative syntax for the list
                - L
                - U

cycles through all the possible combinations of the three lists: ``[1, 2, ..,
99]``, ``['L', 'R']`` and ``['L', 'U']``. For the first combination we get:
``ifu``: 1, ``channel``: L, ``amp``: L and ``value`` becomes
``s[0-9]*001LL_*.fits``. Then all the files matching this pattern are
collected.
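
The expansion of a ``loop`` keyword can be sketched with
:func:`itertools.product`. The functions ``expand_keys`` and ``loop`` are
hypothetical, and the sketch assumes that the ``start, stop, step`` form
reaches the code as a single comma separated string.

.. code-block:: python

    import glob
    import itertools
    import os


    def expand_keys(keys):
        """Convert ``start, stop, step`` strings into lists of integers."""
        expanded = {}
        for name, value in keys.items():
            if isinstance(value, str):
                start, stop, step = (int(v) for v in value.split(","))
                value = list(range(start, stop, step))
            expanded[name] = value
        return expanded


    def loop(value, keys, target_dir):
        """Yield space separated file names for each matching combination."""
        keys = expand_keys(keys)
        names = list(keys)
        for combination in itertools.product(*(keys[n] for n in names)):
            pattern = value.format(**dict(zip(names, combination)))
            path = os.path.join(glob.escape(target_dir), pattern)
            files = sorted(glob.glob(path))
            if files:
                yield " ".join(files)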

``groupby``
^^^^^^^^^^^
  
1) collects all the files matching ``value`` and loops through them
2) for each of the files replaces ``match`` with all the values in ``replace``
   using the `python regex syntax <https://docs.python.org/3/library/re.html>`_
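
The two steps above can be sketched with :func:`re.sub`. The function
``groupby`` is a hypothetical stand-in that receives the already collected
files and yields lists of names; the form in which the real implementation
returns each group may differ.

.. code-block:: python

    import re


    def groupby(files, match, replace):
        """Yield each file name followed by the names derived from it."""
        for fname in files:
            yield [fname] + [re.sub(match, r, fname) for r in replace]


    groups = list(groupby(["p2LL_sci.fits"],
                          r"(.*p\d[LR])L(_.*\.fits)",
                          [r"\1U\2"]))
    print(groups)
    # [['p2LL_sci.fits', 'p2LU_sci.fits']]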

The following configuration:

.. code-block:: yaml

    keyword:
        type: groupby
        value: 'p[0-9][LR]L_*.fits'
        match: (.*p\d[LR])L(_.*\.fits)
        replace:
            - \1U\2
  
cycles through all the files matching ``value`` in the ``target_dir``, e.g.
"p2LL_sci.fits", and for each of them creates a new file name by replacing the
last "L" with "U", e.g. "p2LU_sci.fits". The two files are then returned.

To create multiple files out of the first one, it's enough to provide other
entries to ``replace``. E.g.:

.. code-block:: yaml

    replace: [\1U\2, \1A\2, \2_\1]

will create three new file names: "p2LU_sci.fits", "p2LA_sci.fits" and
"_sci.fits_p2L".

.. _keyword_types:

Built-in keyword types
----------------------

``plain``
^^^^^^^^^

A static string. These three definitions are equivalent:

.. code-block:: yaml

    keyword: '-a -b --long option'
    ---
    keyword:
        type: plain
        value: '-a -b --long option'
    ---
    keyword: {type: plain, value: '-a -b --long option'}


``header``
^^^^^^^^^^

Extract and manipulate a fits header keyword from the primary files. If the
primary is a space-separated list of file names, it uses the first one.
If ``extract`` is present, it uses :func:`re.sub` to replace ``extract[0]`` in
the header value with ``extract[1]``. Assuming that the primary files have a
header keyword ``BIASSEC = [1:32,1:1032]``

.. code-block:: yaml

    keyword:
        type: header
        value: BIASSEC

will extract ``[1:32,1:1032]``, while

.. code-block:: yaml

    keyword:
        type: header
        value: BIASSEC
        extract:
            - \[(.*)\]
            - \1

will extract ``1:32,1:1032``.
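
The effect of ``extract`` can be reproduced with :func:`re.sub` alone. In this
sketch a plain dictionary stands in for the fits header and ``header_value``
is a hypothetical helper.

.. code-block:: python

    import re


    def header_value(header, keyword, extract=None):
        """Return the header value, optionally transformed by ``extract``."""
        value = str(header[keyword])
        if extract:
            # extract[0] is the pattern, extract[1] the replacement
            value = re.sub(extract[0], extract[1], value)
        return value


    header = {"BIASSEC": "[1:32,1:1032]"}
    print(header_value(header, "BIASSEC"))
    # [1:32,1:1032]
    print(header_value(header, "BIASSEC", [r"\[(.*)\]", r"\1"]))
    # 1:32,1:1032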

``format``
^^^^^^^^^^

Creates a new string by `formatting
<https://docs.python.org/3/library/string.html#format-string-syntax>`_
``value`` using the ``keys``. The keys can be of any type defined in this
section, except ``format``, to avoid circular recursion. If one of the keys has
a string as value, it is interpreted as of type ``plain``. If the type of a key
does not exist, a ``CIKeywordTypeError`` is raised at run time.

Given a fits file called ``file_001_LL.fits`` with a header keyword
``DATE-OBS = 2013-01-01``, the following configuration instructs the
interpreter to extract the ``id`` key, a three digit number, from the file name
and the ``date`` key from the ``DATE-OBS`` fits header value. The resulting
value is the string ``file_001_2013-01-01.fits``.

.. code-block:: yaml

    keyword:
        type: format
        value: file_{id}_{date}.fits
        keys:
            id:
                type: regex
                match: .*_(\d{3}).*\.fits
                replace: \1
            date:
                type: header
                value: DATE-OBS

``regex``
^^^^^^^^^

Returns a string obtained from the primary by replacing ``match`` with
``replace``. It uses :func:`re.sub` to do the substitution. If e.g. the primary
is called ``file_001_LL.fits``, the following entry returns ``L001``:

.. code-block:: yaml

    keyword:
        type: regex
        match: .*_(\d{3})_([LR]).*\.fits
        replace: \2\1
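
A ``regex`` keyword boils down to a single :func:`re.sub` call, as in this
sketch. ``regex_keyword`` is a hypothetical name, and the pattern used here
assumes an underscore between the three digits and the channel letters of the
file name. Note that :func:`re.sub` returns the input unchanged when the
pattern does not match.

.. code-block:: python

    import re


    def regex_keyword(primary, match, replace):
        """Return ``primary`` with ``match`` replaced by ``replace``."""
        return re.sub(match, replace, primary)


    pattern = r".*_(\d{3})_([LR]).*\.fits"
    print(regex_keyword("file_001_LL.fits", pattern, r"\2\1"))
    # L001
    print(regex_keyword("notes.txt", pattern, r"\2\1"))
    # notes.txt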


.. _execute_types:

Built-in execute types
----------------------

``new_file``
^^^^^^^^^^^^

For each primary entry, it constructs a string using the keyword
type defined by ``subtype``. If that string corresponds to something
existing in the file system, it returns ``False``.

Besides ``type``, ``subtype`` is the only mandatory keyword and its value must
be one of the available keyword types. All the relevant keywords for that type
must of course exist.

.. _plugin_types:

Add new types
=============

Every type, be it primary or not, has a corresponding function that
implements how to handle it.

All the types are implemented as plugins, `discovered
<https://pythonhosted.org/setuptools/pkg_resources.html#entry-points>`_ and
`dynamically loaded
<https://pythonhosted.org/setuptools/setuptools.html#dynamic-discovery-of-services-and-plugins>`_
at run time.

The command interpreter looks for three entry points:

* ``vdat.cit.primary``: for the definition of primary types
* ``vdat.cit.keyword``: for the definition of other types
* ``vdat.cit.execute``: for the definition of types to decide whether to
  execute or not the command

Each entry point is defined as a string like::

    type = package.module:func

where ``type`` is the name of the type and ``func`` is the function handling
the keyword of ``type``; ``func`` is implemented in the ``module`` module of the
package ``package``.
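
As an illustration, a third party package could declare a new keyword type in
its ``setup.py`` through the ``entry_points`` argument of
``setuptools.setup``. The package, module and function names below are
hypothetical, and ``parse_entry_point`` only shows how the string is
structured; it is not part of ``VDAT``.

.. code-block:: python

    # Value for the ``entry_points`` argument of ``setuptools.setup``;
    # ``my_package.my_types:upper_keyword`` is a hypothetical function
    ENTRY_POINTS = {
        "vdat.cit.keyword": [
            "upper = my_package.my_types:upper_keyword",
        ],
    }


    def parse_entry_point(spec):
        """Split ``type = package.module:func`` into its components."""
        name, _, target = spec.partition("=")
        module, _, func = target.strip().partition(":")
        return name.strip(), (module, func)


    print(parse_entry_point(ENTRY_POINTS["vdat.cit.keyword"][0]))
    # ('upper', ('my_package.my_types', 'upper_keyword'))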

The functions implementing primary, keyword and execute types have the
following signatures:

.. autofunction:: vdat.command_interpreter.types.primary_template
    :noindex:

.. autofunction:: vdat.command_interpreter.types.keyword_template
    :noindex:

.. autofunction:: vdat.command_interpreter.types.execute_template
    :noindex:


Communication
==============

The command interpreter communicates with the rest of the world through
different channels.

* Upon errors directly handled by the interpreter, one of the errors defined in
  :mod:`vdat.command_interpreter.exceptions` is raised. Please check the
  documentation of :class:`~vdat.command_interpreter.CommandInterpreter` for
  more details.

* During normal execution of the command, the resolved command string, the
  standard output and error, and any exception raised while executing the code
  are logged to a logger with the name of the executable. In ``VDAT``, these
  loggers are set to write to files located in the directory defined in the
  ``VDAT`` configuration file; the name of each file is the executable name
  with a ``.log`` extension. These loggers are set in the main ``VDAT`` code,
  not in the command interpreter sub-package.

* .. automodule:: vdat.command_interpreter.relay
      :noindex:

  The relay's ``emit`` method can be replaced by other applications using

  .. autofunction:: vdat.command_interpreter.override_emit
      :noindex: