The Command Interpreter Tool

This chapter describes the part of VDAT responsible for the reduction of the data.

Introduction

The main purpose of the VDAT GUI is to allow users to select, visualize and reduce VIRUS data. VDAT relies mostly on cure to execute the reduction steps. cure is a C++ library that provides a number of executables that operate on a single fits file or on a group of them.

For each of the reduction steps, VDAT must collect the input files and command line options according to the directories and IFUs selected by the user and run the appropriate cure tool.

Although cure is the main tool to use, some of the steps of the reduction are not implemented there. We also want to allow users to execute generic commands without any prior knowledge of the signature and name of the files.

We have met these requirements by designing a command line tool based on two building blocks:

  1. an interpreter that parses the command string, containing placeholders, and executes the command in a loop, replacing the placeholders with the correct values; we use the standard python string Template to define placeholders;
  2. one or more yaml configuration files that instruct the interpreter on how to expand the placeholders for any provided command.
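
As a rough illustration of the first building block, the snippet below expands a command string with the standard library string.Template. The values in the dictionary are made up for the example; in VDAT they would come from the configuration and from the files themselves.

```python
from string import Template

# Command string with placeholders, as it would be passed to the interpreter.
command = Template("subtractfits $args -o $biassec $fits")

# Hypothetical values standing in for what the configuration would provide.
values = {
    "args": "-s -a -k 2.8 -t -z",
    "biassec": "2065:2128,1:1032",
    "fits": "20130101T000000_001LL_zro.fits",
}

# substitute() raises KeyError if a placeholder has no value.
expanded = command.substitute(values)
print(expanded)
```

The interpreter repeats this substitution once per primary file, which is why every placeholder needs instructions in the configuration.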

The interpreter

The public interface of the interpreter is defined by the constructor of the class CommandInterpreter and its method run().

Constructor

The constructor has the following signature

class vdat.command_interpreter.CommandInterpreter(command, command_config, selected=None)

Interpret and execute the command.

See The interpreter section in the documentation for more details

All the custom errors are defined in vdat.command_interpreter.exceptions. The ones raised in the constructor are derived from CIValidationError.

Parameters:

command : string

command to parse

command_config : dict

dictionary containing the instructions to execute the command. A deep copy is made

selected : list-like, optional

None or a list of the selected items; if None no filtering of the primary files is done; otherwise must be an object supporting the membership test operator in.

Raises:

CINoExeError

if the executable does not exist

CIParseError

if there is some error when extracting the keywords

CIKeywordError

for missing keywords or for keywords of wrong type

CIKeywordTypeError

if the type of the keywords is not among the known ones

  1. command is a string with the command to execute. E.g.:

    subtractfits $args -o $biassec $fits
    
  2. command_config: the relevant part of the parsed yaml configuration file containing the instructions on how to expand placeholders like args, biassec and fits while running the command subtractfits. The part of the configuration file necessary to run the above command is

    subtractfits:
        # mandatory keys
        mandatory: [fits, ]
    
        # primary key: the interpreter collects files according to
        # the instructions in the `fits` key, loop over them,
        # replace all the placeholders and execute the command
        primary: fits
    
        # looks for all the files matching the pattern in the `target_dir`
        fits: '[0-9]*.fits'
    
        # Get the `BIASSEC` value from the header of every file
        # and from it extract the part within square brackets
        biassec:
            type: header
            keyword: BIASSEC
            extract:
                - \[(.*)\]
                - \1
    
        args: '-s -a -k 2.8 -t -z'
    

    Both the GUI and the command line interface inject into the command_config the following keys:

    • target_dir: the directory selected by the user; in the above example, the fits files are searched in this directory
    • cal_dir: the reference calibration directory
    • zro_dir: the reference bias directory

    If no cal or zro directory has been explicitly selected in the GUI, the default ones are added.

    Warning

    If any of these entries is already in the configuration file, it will be overwritten.

  3. selected: list of selected items or None, for selecting all. It tells the interpreter for which of the primary elements the command must be run. E.g. the VDAT GUI passes as selected the list of IFUs selected by the user. The instructions on how to extract, from the files, the information to match against selected while running the command are defined in the command configuration file.

    Note

    VDAT passes the IFU head mount plate IDs (ihmpid) to the command interpreter. This id is a three digit number stored in the file headers under the IFUSLOT key.


In the constructor the following steps are performed:

  1. the configuration object is copied and saved in local variables: this allows multiple commands to be enqueued;

  2. validations:
    1. the command executable, e.g. subtractfits, is searched in the path
    2. check that all the mandatory fields are present in the command
    3. check that all the required keywords are present in the configuration
    4. check that all the required keywords are of known type
    5. map all the types to the functions implementing them
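
The validation steps can be sketched as follows. This is not the VDAT implementation: the function names, the KNOWN_TYPES set and the choice to return a list of problems instead of raising the CI* exceptions are assumptions made to keep the example self-contained.

```python
import shutil
from string import Template

# Illustrative subset of keyword types; the real set is discovered via plugins.
KNOWN_TYPES = {"plain", "header", "format", "regex"}


def placeholders(command):
    """Return the placeholder names used in ``command``, in order of appearance."""
    names = []
    for match in Template.pattern.finditer(command):
        name = match.group("named") or match.group("braced")
        if name and name not in names:
            names.append(name)
    return names


def validate(command, config):
    """Return a list of problems; an empty list means the command looks valid."""
    problems = []
    # 1. the executable (or the one it is an alias of) must be in the path
    exe = config.get("is_alias_of", command.split()[0])
    if shutil.which(exe) is None:
        problems.append(f"executable not found: {exe}")
    found = placeholders(command)
    # 2. mandatory fields must appear in the command
    for field in config.get("mandatory") or []:
        if field not in found:
            problems.append(f"mandatory field missing from command: {field}")
    # 3. and 4. every placeholder needs a configuration entry of known type
    for name in found:
        if name not in config:
            problems.append(f"keyword missing from configuration: {name}")
            continue
        value = config[name]
        ktype = "plain" if isinstance(value, str) else value.get("type")
        if ktype not in KNOWN_TYPES:
            problems.append(f"unknown keyword type for {name}: {ktype}")
    return problems


config = {"mandatory": ["fits"], "fits": "[0-9]*.fits", "args": "-s",
          "biassec": {"type": "nosuch"}}
problems = validate("subtractfits $args -o $biassec $fits", config)
```

In this sketch every problem is collected before reporting, whereas the constructor described above raises the first matching exception.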

The run method

Invoking

CommandInterpreter.run()

Collect the files, expand and run the required command

All the custom errors raised here derive from CIRunError.

Raises:

CICommandFmtError

if the substitution of the command doesn’t work

will:

  1. collect all the primary files
  2. filter them according to the list of selected items
  3. loop over the primary files
  4. check whether the step must be executed or not
  5. for each step in the loop replace the placeholders in the input command according to the instructions from the configuration
  6. execute the command
  7. report execution progress
  8. collect and send out execution results
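
A compact sketch of the loop performed by run() is given below. The helper run_all and its arguments are hypothetical: expand stands for the keyword handling, should_run for the execute check and runner for whatever actually launches the command (for example a wrapper around subprocess).

```python
from string import Template


def run_all(command, primaries, expand, should_run, runner):
    """Expand and run ``command`` once per primary.

    Returns the number of successful, failed and skipped steps.
    """
    template = Template(command)
    done = failed = skipped = 0
    for primary in primaries:
        # decide whether this step must be executed
        if not should_run(primary):
            skipped += 1
            continue
        # replace the placeholders for this primary and run the command
        cmd = template.substitute(expand(primary))
        if runner(cmd) == 0:
            done += 1
        else:
            failed += 1
        # report progress after every executed step
        print(f"{done + failed + skipped}/{len(primaries)} processed")
    return done, failed, skipped


calls = []
result = run_all(
    "cp $ifile $ofile",
    ["a.txt", "b.txt", "c.txt"],
    expand=lambda p: {"ifile": p, "ofile": p + ".bak"},
    should_run=lambda p: p != "b.txt",
    # fake runner: record the command and pretend that c.txt fails
    runner=lambda cmd: calls.append(cmd) or (1 if "c.txt" in cmd else 0),
)
```

Passing the runner as an argument keeps the example free of side effects; it says nothing about how VDAT launches the processes.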

The configuration file

To allow for flexibility and extensibility, the instructions on how to expand keywords come from one or more configuration files, written using the yaml standard.

When validating the command, the keywords are extracted and searched in the configuration. The value of a keyword can be either a string or a dictionary. If it’s a string, like '-a -b', it is converted into a keyword of type plain: {'type': 'plain', 'value': '-a -b'}. If it is a dictionary, it must contain a key type, whose value defines the type of the keyword.
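
The conversion of string values can be pictured with a small helper. The name normalise_keyword and the exceptions it raises are assumptions for illustration only.

```python
def normalise_keyword(value):
    """Return the dictionary form of a keyword value."""
    if isinstance(value, str):
        # a bare string is shorthand for a keyword of type plain
        return {"type": "plain", "value": value}
    if isinstance(value, dict):
        if "type" not in value:
            raise KeyError("a dictionary keyword must contain a 'type' key")
        return dict(value)
    raise TypeError(f"unsupported keyword value: {value!r}")


plain = normalise_keyword("-a -b")
header = normalise_keyword({"type": "header", "keyword": "IFUSLOT"})
```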

Special keywords

These keywords are understood and used by the interpreter, but should not be used as variables to expand

is_alias_of

If it exists, its value is the real name of the executable. This allows various commands to be created using the same underlying executable. If e.g. the command is:

do_something $args -o $ofile $ifiles

and the configuration file contains

is_alias_of: an_executable
args: "-a -b"
ofile: outfile
ifiles: file[1-9].txt
primary: ifiles

then the interpreter will loop through all the files matching the ifiles pattern in target_dir. For the first file, it will execute:

an_executable -a -b -o outfile file1.txt

mandatory

List of mandatory fields; field names defined under mandatory must exist in the provided command; if not found, or empty, no check is done

mandatory: [field1, field2]
# or equivalently
mandatory:
    - field1
    - field2

primary

Name of the keyword to use as primary. A primary keyword has a special status: files are collected from the target_dir according to the type of the underlying keyword, then they are looped over and for each step the command string is created and executed. If the value of any other keyword needs to be built at run time, the primary files are used to do it. VDAT is shipped with a few primary types.

filter_selected

Tells the interpreter how to filter the list of primary files. If this option is not found in the configuration or the selected keyword in CommandInterpreter is None, no filtering is performed. Otherwise, for each element in the primary list:

  • use the instructions from the value of filter_selected to extract a string
  • check if the string is in selected.

The value of filter_selected can be any of the keyword types described below.

With the following settings:

# Use the value of the header keyword ``IFUSLOT`` to decide whether to
# keep the primary field or not
filter_selected:
    type: header
    keyword: IFUSLOT

the content of the fits header keyword IFUSLOT is extracted and compared with the list provided with the selected option in CommandInterpreter.
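
The filtering logic amounts to a membership test on a string extracted from each primary. In the sketch below the FITS headers are replaced by a plain dictionary so that the example runs without any astronomy library; filter_primaries and fake_headers are hypothetical names.

```python
def filter_primaries(primaries, selected, extract):
    """Keep the primaries whose extracted string is in ``selected``."""
    if selected is None:
        # no selection: every primary is kept
        return list(primaries)
    return [p for p in primaries if extract(p) in selected]


# Stand-in for the IFUSLOT header keyword of three files.
fake_headers = {
    "20130101T000000_001LL_sci.fits": {"IFUSLOT": "001"},
    "20130101T000000_002LL_sci.fits": {"IFUSLOT": "002"},
    "20130101T000000_003LL_sci.fits": {"IFUSLOT": "003"},
}

kept = filter_primaries(
    sorted(fake_headers),
    {"001", "003"},
    lambda name: fake_headers[name]["IFUSLOT"],
)
everything = filter_primaries(sorted(fake_headers), None, None)
```

Because only the in operator is used, selected can be a list, a set or any other container supporting membership tests.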

execute

For each iteration of the primary, tells the interpreter whether or not to run the command. If the option is not found, no filtering will be performed. VDAT is shipped with a few execute types.

The following configuration:

execute:
    type: new_file
    sub_type: format
    value: masterbias_{ica}.fits
    keys:
        ica:
            type: regex
            match: .*\d*?T\d*?_(\d{3}[LR][LU])_.*\.fits
            replace: \1

creates the value by extracting the ica keyword from the primary file name and returns False if the file already exists.

If the handling of the keyword raises an exception, it is logged and the command is executed.
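
The behaviour of this configuration can be approximated as follows. The file name used here is invented, and resolving the new file relative to a target directory is an assumption of the sketch, not something stated by the configuration.

```python
import os
import re
import tempfile

# Pattern taken from the ``ica`` key of the configuration above.
ICA_MATCH = r".*\d*?T\d*?_(\d{3}[LR][LU])_.*\.fits"


def should_execute(primary, target_dir):
    """Return False if the file built from ``primary`` already exists."""
    ica = re.sub(ICA_MATCH, r"\1", os.path.basename(primary))
    new_file = "masterbias_{ica}.fits".format(ica=ica)
    return not os.path.exists(os.path.join(target_dir, new_file))


primary = "20130101T000000_001LL_zro.fits"  # hypothetical file name
with tempfile.TemporaryDirectory() as tmp:
    before = should_execute(primary, tmp)
    # create the master bias: the command should now be skipped
    open(os.path.join(tmp, "masterbias_001LL.fits"), "w").close()
    after = should_execute(primary, tmp)
```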

Built-in primary keyword types

plain

It looks for all the files matching the given pattern in the target directory. If the value of a keyword is a string, it is interpreted as of plain type. These three definitions are equivalent:

keyword: 20*.fits
---
keyword:
    type: plain
    value: 20*.fits
---
keyword: {type: plain, value: 20*.fits}

loop

  1. collects the keys

  2. cycles through all the possible combinations of the keys

  3. for each combination replaces the corresponding entries in value using the standard python format string syntax

  4. looks for all the files matching the resulting strings

  5. if any file is found, constructs a string of space-separated file names and yields it.

The value of keys is a map between the names of the keys, e.g. ifu, and the values that they can have. Each value can be either a list or three comma separated numbers: start, stop, step. The latter case is converted into a list of numbers from start (included) to stop (excluded) in increments of step.

The following configuration:

keyword:
    type: loop
    value: 's[0-9]*{ifu:03d}{channel}{amp}_*.fits'
    keys:   # dictionary of keys to expand in ``value``
        ifu: 1, 100, 1     # start, stop, step values of a slice
        channel: [L, R]    # a list of possible values
        amp:               # alternative syntax for the list
            - L
            - U

cycles through all the possible combinations of the three lists: [1, 2, .., 99], ['L', 'R'] and ['L', 'U']. For the first combination we get: ifu: 1, channel: L, amp: L and value becomes s[0-9]*001LL_*.fits. Then all the files matching this pattern are collected.
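
The expansion of the keys can be reproduced with itertools.product and str.format. This is a sketch of the first three steps only (the file search is left out), and the helper names are hypothetical.

```python
import itertools


def expand_keys(keys):
    """Turn each key value into a list; 'start, stop, step' becomes a range."""
    expanded = {}
    for name, value in keys.items():
        if isinstance(value, str):
            start, stop, step = (int(v) for v in value.split(","))
            value = range(start, stop, step)
        expanded[name] = list(value)
    return expanded


def loop_patterns(value, keys):
    """Yield one file pattern per combination of the keys."""
    expanded = expand_keys(keys)
    names = list(expanded)
    for combo in itertools.product(*(expanded[n] for n in names)):
        yield value.format(**dict(zip(names, combo)))


patterns = list(loop_patterns(
    "s[0-9]*{ifu:03d}{channel}{amp}_*.fits",
    {"ifu": "1, 100, 1", "channel": ["L", "R"], "amp": ["L", "U"]},
))
print(patterns[0], len(patterns))
```

Each pattern would then be passed to a file search in the target directory, and the matching names joined with spaces.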

groupby

  1. collects all the files matching value and loops through them

  2. for each of the files, replaces match with each of the values in replace, using the python regex syntax

The following configuration:

keyword:
    type: groupby
    value: 'p[0-9][LR]L_*.fits'
    match: (.*p\d[LR])L(_.*\.fits)
    replace:
        - \1U\2

cycles through all the files matching value in the target_dir, e.g. “p2LL_sci.fits”, and for each of them creates a new file name by replacing the last “L” before the underscore with “U”, e.g. “p2LU_sci.fits”. The two files are then returned.

To create multiple files out of the first one, it’s enough to provide other entries to replace. E.g.:

replace: [\1U\2, \1A\2, \2_\1]

will create three new files: “p2LU_sci.fits”, “p2LA_sci.fits” and “_sci.fits_p2L”
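
The substitutions above can be checked directly with re.sub. The function group_files is a hypothetical helper that mimics what happens for a single file.

```python
import re


def group_files(filename, match, replace):
    """Return the original file name followed by the derived ones."""
    return [filename] + [re.sub(match, r, filename) for r in replace]


group = group_files(
    "p2LL_sci.fits",
    r"(.*p\d[LR])L(_.*\.fits)",
    [r"\1U\2", r"\1A\2", r"\2_\1"],
)
print(" ".join(group))
```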

Built-in keyword types

plain

A static string. These three definitions are equivalent:

keyword: '-a -b --long option'
---
keyword:
    type: plain
    value: '-a -b --long option'
---
keyword: {type: plain, value: '-a -b --long option'}

format

Creates a new string by formatting value using the keys. The keys can be of any type defined in this section, except format, to avoid circular recursion. Assuming a fits file called file_001_LL.fits, with a header keyword DATE-OBS = 2013-01-01, the following configuration instructs the interpreter to extract the id key, a three digit number, from the file name and the date key from the DATE-OBS fits header value. The resulting value is the string file_001_2013-01-01.fits. If the types for the keys do not exist, a CIKeywordTypeError will be raised at run time. If one of the keys has a string as value, it will be interpreted as of type plain.

keyword:
    type: format
    value: file_{id}_{date}.fits
    keys:
        id:
            type: regex
            match: .*_(\d{3}).*\.fits
            replace: \1
        date:
            type: header
            keyword: DATE-OBS
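
A minimal sketch of how such a keyword could be resolved is shown below. Each key is represented by a callable that receives the primary file name, and the FITS header is replaced by a dictionary; both are simplifications introduced for the example.

```python
import re

# Stand-in for the header of the example file.
fake_headers = {"file_001_LL.fits": {"DATE-OBS": "2013-01-01"}}


def format_keyword(primary, value, keys):
    """Resolve every key from ``primary`` and format ``value`` with them."""
    resolved = {name: func(primary) for name, func in keys.items()}
    return value.format(**resolved)


result = format_keyword(
    "file_001_LL.fits",
    "file_{id}_{date}.fits",
    {
        # regex key: three digits taken from the file name
        "id": lambda p: re.sub(r".*_(\d{3}).*\.fits", r"\1", p),
        # header key: value of DATE-OBS
        "date": lambda p: fake_headers[p]["DATE-OBS"],
    },
)
print(result)
```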

regex

Returns a string obtained from primary replacing match with replace. It uses re.sub() to do the substitution. If e.g. the primary is called file_001_LL.fits, the following entry returns L001

keyword:
    type: regex
    match: .*_(\d{3})_([LR]).*\.fits
    replace: \2\1

Built-in execute types

new_file

For each of the primary entries, it constructs a string using the keyword type defined by subtype. If that string corresponds to something existing in the file system, it returns False.

Besides type, subtype is the only mandatory keyword and its value must be one of the available keyword types. All the relevant keywords for that type must of course exist.

Add new types

Every type, be it primary or not, has a corresponding function that implements how to handle it.

All the types are implemented as plugins, discovered and dynamically loaded at run time.

The command interpreter looks for three entry points:

  • vdat.cit.primary: for the definition of primary types
  • vdat.cit.keyword: for the definition of other types
  • vdat.cit.execute: for the definition of types to decide whether to execute or not the command

Each entry point is defined as a string like:

type = package.module:func

where type is the name of the type and func is the function handling the keyword of type; func is implemented in the module module of the package package.
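
To make the format of the entry point string concrete, the sketch below parses one and imports the function it names. In practice the discovery is done by the packaging machinery; load_type is a hypothetical helper, and os.path:join is used only because it is always importable.

```python
import importlib


def load_type(spec):
    """Parse ``'type = package.module:func'`` and return ``(type, function)``."""
    name, _, target = (part.strip() for part in spec.partition("="))
    module_name, _, func_name = target.partition(":")
    module = importlib.import_module(module_name)
    return name, getattr(module, func_name)


type_name, func = load_type("join = os.path:join")
print(type_name, func("a", "b"))
```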

The functions implementing primary, keyword and execute types have the following signatures:

vdat.command_interpreter.types.primary_template(target_dir, key_val)

Template for a function that deals with a primary keyword.

It collects the files from the target_dir according to the instructions in key_val, if any, and either yields values or returns an iterable.

Parameters:

target_dir : string

directory in which the files must be collected

key_val : dictionary

configuration for the key to handle

Returns:

yield a string or iterable of strings

Raises:

CIPrimaryError

if something goes wrong when handling the primary key

vdat.command_interpreter.types.keyword_template(primary, key_val)

Template for a function that deals with a non-primary keyword.

The value of a keyword is either statically stored in key_val or needs to be extracted from the value of the primary file(s).

Parameters:

primary : string

the value of one of the items returned by primary_template()

key_val : dictionary

configuration for the key to handle

Returns:

string

value to associate to the keyword

Raises:

CIKeywordError

if something goes wrong when handling the key

vdat.command_interpreter.types.execute_template(primary, config)

For each of the primary entries, this function is called to decide whether to execute or skip the command.

Parameters:

primary : string

the value of one of the items returned by primary_template()

config : dictionary

configuration for the command

Returns:

bool

True: the command is executed; False: the command is skipped

Communication

The command interpreter communicates with the rest of the world through different channels.

  • Upon errors directly handled by the interpreter, one of the errors defined in vdat.command_interpreter.exceptions is raised. Please check the documentation of CommandInterpreter for more details.

  • During normal execution of the command, the resolved command string, the standard output and error, and any exception raised while executing the code are logged to a logger with the name of the executable. In VDAT, these loggers are set to write to files located in the directory defined in the VDAT configuration file; the names of those files are the executable names with a .log extension. These loggers are set in the main VDAT code, not in the command interpreter sub-package.

  • Except for the logging mechanism, the CommandInterpreter uses relay-like objects to communicate with the external world. This module defines a few classes with an emit method that mimic PyQt signals.

    The names of the emit method arguments are the type of the parameter followed by an underscore and optionally by an explanatory name.

    Available relays:

    • command_string: accepts an int and a string

      _CommandString.emit(int_, string_)

      Default implementation: print string_ when int_ == 0

    • progress: accepts three numbers: the total expected number, the number of successes and the number of failures

      _Progress.emit(int_tot, int_done, int_fail)

      Default implementation: Print the percentages of finished, successful and failed jobs, overwriting the line

    • logger: accepts an int and a string

      _Logger.emit(int_level, string_msg)

      Default implementation: print int_level: string_msg

      The int_level values are chosen to be the standard logging levels and are converted to strings accordingly

      The idea behind this relay is to bind it to the main logger if needed. We decided not to do this directly as we don’t know the name of the logger. We implement it in this way to have a more coherent interface.

    The emit method of a relay can be replaced by other applications using

    vdat.command_interpreter.override_emit(name, new_emit)

    Replace the emit method in the relay instance called name with new_emit. The original class is not modified.

    Parameters:

    name : string

    name of the relay

    new_emit : callable

    function to use to replace the default emit method. This function must have at least one argument, self, and its signature must match the original one to avoid errors at runtime.