wiki:HetProcedures/RA/virus

Version 193 (modified by stevenj, 4 years ago) (diff)

--

VIRUS Operating Procedures

The VIRUS instrument is controlled from mcs which issues commands to vdas.

Startup

Starting the Servers

The camra_server and data transfer (called pivot) should be running at all times on vdas, thus no action from the RA should be required.

If you want to check on these servers try:

<user>@vdas ~]$ chksys
hetdex    8817     1  3 19:27 ?        00:00:02 /home/hetdex/code/het/trunk/camra/camra_server -c /home/hetdex/code/het/trunk/camra/testing/vdas-het.conf
hetdex    8847     1  0 19:27 ?        00:00:00 python /home/hetdex/code/het/trunk/camra/testing/vdas_monitor_temperatures.py
hetdex    8867  8407  0 19:28 pts/2    00:00:00 grep -i camra 

If you need to restart the virus server or pivot server use the RAlauncher under Operations and Camera Servers.

Setting up monitoring

It is advisable to watch the virus_server.log located on the vdas machine in /var/log/tcs_logs/virus/virus_server.log. This can be done with

tail -f /var/log/tcs_logs/virus/virus_server.log

It is useful to pull some logging output to the RA console to be able to follow what is going on with the instruments. The easiest way is using jrf's monitor command:

On another terminal, start a monitor that shows tcs and pfip activity (note the different filter, we're excluding debug messages here):

monitor --verbose -T -P --key-filter 'log_[^d].*' --log-print

At the start of RA OPS the RA should turn off the V309_IONPumpGC1 0.15999 GC1 -0.00001 GC2 -0.15999 GC2 0.00000 WFS1 0.00002 WFS1 0.15997 WFS2 -0.00001 WFS2 -0.16000

on the APC gui.

Notes on File numbering and engineering

  • Science and science calibrations should use numbers below 1000.
  • Test exposures by the night staff should use 6xxx.
  • VIRUS Defoccused flats (VDF) engineering should use numbers from 7000 - 7099.
  • VIRUS Alignment Characterization data sets should use numbers from 7100 - 7199.
  • PJM test data should use 8000 - 8099.
  • Afternoon RA OPS work should use the range 9900 - 9999.

Calibrations

Calibration Script

The basic set of cals for VIRUS is:

  • Sometime during the night take Cd-A, Hg, LDLS, darks, and biases:

cal virus -l ldls_long cd-a hg -B - about 18 minutes

  • To get LRS2 and VIRUS biases in parallel use

cal virus -pb -b 11 - about 9 minutes

  • To get LRS2 and VIRUS darks in parallel use

cal virus -pd -de 360 -d 3 - about 22 minutes

NB

  • in case of any problem try to get hardware status of instruments.

Calling the script with --help produces more information:

cal virus --help

The list of lamp combination is spelled out in a table at the beginning of the script. That table also sets the exposure times and warmup times for the lamps.

Use the --dry-run option to see what the script would do given some options without actually executing any commands on the hardware.

Stopping Calibrations

If you wish to stop the calibrations elegantly you should use the command sc There is a help for sc with the command sc -h. If you want to stop cal.py in an exposure immediately use the command sc -i.

Special calibrations

  1. VIRUS line (Cd and Hg) lamps during night
    • during night can be acquired by gcals
    • at the end of the night by eon_tasks
  1. Special VIRUS cals for Karl

Twilight

See Overview of start of night procedures

Taking Science Exposures

NB. Take science exposures ONLY if 20 or more IFU units are online

NB. For HETDEX targets: in the seeing range of 2.0-2.5 arcseconds please increase exposure times but over 2.5 arcseconds please do not observe.

It is important to tell lrs2 that it is going to have control of the PFIP shutter. This is done by executing the following on lrs2 (logged in as yourself):

syscmd -V -v 'enable_shutter()' #!bash There are many ways to take an exposure, the simplest uses Python to issue the expose command.

vlexp -i virus -texp 20 -pobj Test -B

The command will not return until the shutter closes. at which point in the example above you'll hear sound.

or

hetdex-dither -e 360 17 GOODSN_west which would also be done through vlexp with vlexp -i virus -dither -np -texp 360 -pobj GOODSN_west -B

There are two positional arguments: the exposure number and the object name. If the -e exposure time flag is excluded it defaults to 360 seconds. The exposure command will return once the readout has started. The output FITS files will be available 40-50 seconds after that, depending on binning and disk I/O. The return string of the expose command provides the output directory for that exposure.

The exposure time is in the "seconds" parameter (minimum of 15).

In rare cases, mostly for engineering purposes, you may have to manually set the dither mechanism to a specific position. Here iare teh command to do that (the last one show how to query the position):

syscmd -v -P 'AdjustDither(pos=1)'
syscmd -v -P 'AdjustDither(pos=2)'
syscmd -v -P 'AdjustDither(pos=3)'
syscmd -v -P 'GetACQSystemInfo()'

Observation and exposure determine the directories under which the data are grouped. An example output directory structure might be

/hetdata/data/NWD/virus/virus0000023/exp01/
                               /exp02/

So here two exposures are grouped under the observation number 23 taken on 20160309.

The combination of observation/exposure must be unique for the UTC date. The agreed upon types are "sci", "cmp", "zro", "flt", "drk", "tst".

To stop and readout an exposure use the command syscmd -V 'abort_exposure()'

Doing Parallel Science with VIRUS as primary

To get help on this command try vlexp --help for an example with the virus as primary

vlexp -i virus -t sci -texp 360 -pobj HF35_W

NOTE: This is a little non-standand and we generally never have VIRUS with primary.

To stop and readout both the LRS2 and VIRUS do syscmd -l 'abort_exposure()' and syscmd -V 'abort_exposure()' quickly back to back.

Standard vlexp behavior is to run parallel exposures if exposure time more then 300 seconds To disable that use keyword "-np"

VIRUS Health Check

The VIRUS Health Check (vhc) can be run on vdas as astronomer on data located at /hetdata/data/. When run it performs a series of checks on the VIRUS frames and generates a summary html file. The latest html file is always located in the directory where vhc is run (usually /home/mcs/astronomer/vhcrun). The older html summary files are located observation directory, e.g. /hetdata/data/20170415/virus/virus0000001.

A readme.txt can be found in the /home/mcs/astronomer/vhcrun directory. The full set of help pages can be found at https://vhc.readthedocs.io/en/latest/

PIVOT currently runs vhc for each virus exposure that is executed and saved in the /hetdata/data/ directory. You can point your browser to any of these summary html files either browsing for it using cntrl-O in firefox or by typing in the file location directly e.g. file:///hetdata/data/20170513/virus/virus0000005/flat_recap_0.html.

To run vhc manually you can follow the following procedure for running this is:

  • Login in as astronomer to vdas: ssh -Y astronomer@vdas
  • cd vhcrun
  • Example call: vhc -c vhc_settings.cfg -f fplane.txt /hetdata/data/20170415/virus/virus0000001
  • To view the result, open vhc_current.html via: firefox /home/mcs/astronomer/vhcrun/vhc_current.html &
  • leave the browser open and simply reload the same vhc_current.html after each time you run the check.

To get a nice summary of the errors try viruserr

Example: viruserr This will show vhc failed tests for current utdate latest observation

For specific utdate and observation use command viruserr -utdate YYYYMMDD -obs xxxxxxx

VIRUS Quicklook with Remedy

The code Remedy is a python script that can be run on mcs or vdas to reduce a single visit and turn a single ifu into an image. This can be useful for looking to see if an object is centered properly on the IFU and it is used with the Gravity Wave experiment to look for the optical transient (OT).

The source code can be downloaded from: https://github.com/grzeimann/Remedy

and it is currently living in: /home/mcs/astronomer/bin/Remedy/

As the documentation explains, you need an fplane file and a *.h5 calibrations file (in Remedy/CALS/).

-- Fplane is sym-linked to /opt/het/hetdex/etc/Virus/vhc_config/fplane/fplane.txt (but later is called explicitly, so this doesn't matter)

-- The latest *h5 cal file is from TACC: /work/03730/gregz/maverick/test_cal_20190412.h5

When I run it as astronomer on mcs for a particular IFU, this is the command: (note that this will produce output files where ever you run it)

python ~/bin/Remedy/quick_reduction.py 20190412 18 47 ~/bin/Remedy/CALS/test_cal_20190412.h5 --sky_ifuslot 46 --fplane_file /opt/het/hetdex/etc/Virus/vhc_config/fplane/fplane.txt --rootdir /hetdata/data/

That example shows the first GW trigger on HET from the night of 20190412 UT, using observation #18, analyzing IFU 47, and using 46 as sky.

It produces a single image of that IFU called: 20190412_0000020_047.fits and also a full data cube 20190412_0000020_047_cube.fits , which is fun to scroll through. Best to run this in a clean directory and not in the data directories...

Compare to an archival DSS image with "ds9gw" code: /home/mcs/astronomer/bin/ds9gw as follows:

ds9gw 20190412_0000020_047.fits

This displays the IFU image in ds9 and then loads a DSS archival image (can do manually under: Analysis, Image Server, DSS changing the size to 2 arcmin in size and then doing a Frame - Match - Frame - WCS) and shows the SDSS DR9 sources which have g<21.5 and their g filter magnitudes.

If your observations are not within the SDSS footprint, use this command instead:

ds9gw_noSDSS 20190412_0000020_047.fits

End of Night Procedures

Ion Pump

The VIRUS unit 309 has its own ion pump. If this unit is not currently warm or warming out of control then the RA should turn on the V309_IONPump at the end of the night (after all cals are complete). This can be done through the APC gui under VEncl2Misc or with the command apcCmd on V309_IONPump

The Servers

Please leave the servers and monitors up at the end of the night so that they can check on the health of the instrument. You can see if they are running the following from vdas logged in as yourself:12637

hetdex@lrs2 ~]$ ps -ef | grep -i vdas
hetdex     9246      1  0 Apr10 ?        00:00:13 /home/hetdex/code/het/trunk/proxy/tcs_proxy --proxy-route ipc:///tmp/vdas_pivot.ipc --proxy-command /home/hetdex/code/het/trunk/camra/testing/proxy-pivot-het-data.sh
xymon      9275      1  0 Apr10 ?        00:00:00 /home/xymon/bin/xymonlaunch --config=/home/xymon/etc/clientlaunch.cfg --log=/home/xymon/logs/clientlaunch.log --pidfile=/home/xymon/logs/clientlaunch.vdas.pid
hetdex    10809      1  0 Apr10 ?        00:09:45 /home/hetdex/code/het/trunk/camra/camra_server -c /home/hetdex/code/het/trunk/camra/etc/vdas.conf
xymon     44106      1  0 01:18 ?        00:00:00 sh -c vmstat 300 2 1>/home/xymon/tmp/xymon_vmstat.vdas.44020 2>&1; mv /home/xymon/tmp/xymon_vmstat.vdas.44020 /home/xymon/tmp/xymon_vmstat.vdas
shetrone  44172  43977  0 01:23 pts/3    00:00:00 grep -i vdas

Transfer of data

At the present time all data transfers are automated and no interaction from the RA is required.

Troubleshooting

Checking Hardware Status

syscmd -v -l 'get_hardware_status( update_state="true" )'
syscmd -v -V 'get_hardware_status( update_state="true" )'

The status may be queried mid-exposure by omitting the update_state parameter.

Resetting just one controller

1) become user hetdex on computer vdas (e.g., ssh vdas then su - hetdex)
2) from the command line: /home/hetdex/bin/vreload <spectrograph> <-cv>

I have a new version of the program to be tested Monday, and when that's installed on vdas the digit 3 will be dropped from the command name (it'll be called vreload, where the name means 'load the VIRUS Digital to Analog Converters).

There are a number of flags and options we can cover at some other time. Two of them are:
1) -v makes the command verbose, reporting the voltage values that will be loaded into the controller
2) -n reports what will be loaded into the controller, but does not load the controller. This is basically a test to see if the configuration file (Jim, it's the spectro-nnn.conf file) can be found and read for the spectrograph specified on the command line. The values in the configuration file are what get loaded into the controller.

Here are the details for the argument <spectrograph> which I'll give as three example command lines for spectrograph V320:

/home/hetdex/bin/vreload V320 -cv -s 0.8

/home/hetdex/bin/vreload S053L -cv -s 0.8

home/hetdex/bin/vreload V320R -cv -s 0.8

You can reload both the left and right CCDs (LL, LU, RL, and RU as in the first example), the left CCD only (LL and LU as in the second example), and the right CCD only (RL and RU as in the third example).

After this has been run, take an LDLS exposure and a bias frame, with syntax something like this: cal virus -l ldls_long -L 1 -b -1 -B -o 9950

Some more comments:
1) note that this uses either the V number or the S number
2) VIRUS must be inactive when the command is executed. That is, it can't be doing anything except waiting for a command, including integrating and reading out
3) command load_vdacs cannot work on some types of controller problems, for example, a blown A-to-D converter. The command can be run without an issue, it just fix the problem in that case
4) should a VIRUS stop working 'during the night', I think you should try this command once (or twice if you feel really lucky)
5) you can't do harm with this command if used as specified (and probably even if used randomly)
6) no, it's not in svn, and won't be for a bit

Restarting just the Controllers and Muxes

Soemtimes we can fix the situation by restarting the mux/cntrl without a full reboot of the vdas computer.

  • shutdown virus server
  • power off mux/cntrl
  • cycle magma cards
  • power up mux (start Vencl1Mub0)
  • power up cntrl (start 1:1)
  • restart virus server
  • confirm the systme is back with a single bias using cal virus -b 1

Restarting all of the Controllers and Muxes

When CAMRA crashes, it is likely due to a controller going offline. That will not be seen in the log messages displayed from the monitor and, in the case described here, if it happens in the middle of an exposure it can leave a connected client script hanging waiting for the response to a command (i.e. expose, get_ccd_temp, etc). The *only* place you can see what is going on in this case is by tailing the /var/log/tcs_logs/virus/virus_server.log file on vdas. You will see that the log ends with a string that includes "CArc" and "exception" in the case of a hardware failure.

If you find the vdas services offline please ask around to see if there was a problem with the vdas machine or perhaps a restart of the machine. If so then try to start the services as soon as possible.

Here is a prescription for how to restart vdas from scratch if you can just get the services to come back up by starting the camra server from the RA launcher Operations menu under Servers.

Recommended procedures

Procedure 1

Try this twice, then try procedure two

  • shutdown [virus|lrs]Server
  • power off controllers
  • power off muxes
  • wait about a minute
  • power on muxes
  • power on controller
  • restart [virus|lrs2]Server
  • monitor logs
  • verify the virus|lrs2_server has restart properly
  • try procedure 1 again
  • try procedure two

Procedure 2

Try this directly if the error is "readout in progress". Try this twice then call Jim if you can not restart system.

  • save the file /tmp/camra_temperatures.json (as a precaution)
  • power off controllers
  • power off muxes
  • shutdown vdas and magma boxes
  • wait about a minute
  • power on muxes
  • power on controller
  • power on magma boxes and vdas
  • monitor logs
  • verify the virus|lrs2_server has restart properly
  • try procedure one if server does not come up the first time
  • try procedure two again
  • call Jim

  • From the RA launcher under Operations and Camera Servers and Virus Server select stop. This is equivalent to the old way "From your account on vdas: If running, stop vdas sudo service vdas stop "
  • power off all virus controllers and muxes from the APC gui PLEASE NOTE which muxes are not being used because you don't want to turn those on at the end.
  • log into the vdas computer
  • run the command sudo shutdown -h now using your own account (guider or astronomer can't do this)
  • go down to the UER and turn of the Magma boxes
  • wait one minute
  • turn on the Magma boxes
  • turn on the vdas computer with power button on upper left of machine. You can watch the system power up on VNC.
  • wait for the vdas computer to boot up, them log into vdas
  • run chksys and look for the pivot script and the virus_server
  • kill the virus_server process with the pull down RAlauncher menu or sudo kill -9 <pid>
  • kill the pivot process with the pull down RAlauncher menu or 'sudo kill -9 ,pid.'
  • power on muxes from the APC gui starting with mux 0 on VE1 through the last mux used on VE1 then mux 0 on VE2 and ending on the last mux used on VE2.
  • power on the controllers from the APC gui starting with 1 then 2

{start here for resetting service only}

  • From the RA launcher under Operations and Camera Servers and Virus PIVOT select restart. This is equivalent to the old way "From your account on vdas: If running, stop vdas sudo service virusPivot restart
  • From the RA launcher under Operations and Camera Servers and Virus Server select restart. This is equivalent to the old way "From your account on vdas: If running, stop vdas sudo service virusServer restart
  • monitor log file with tail -F /var/log/tcs_logs/virus/virus_server.log for problems and look for the Accepting commands.
  • Take a test bias frame with VIRUS to look for bias frames with 0 value (which are not warmed)

NOTE: Even if you see a frame with an all 0 value make sure that one is not currently warmed for repumping. If not then try just restarting the server ( vdas ) and see if it fixes that. If you see errors in the vdas.log of any kind but the server is running then try restarting the server only and see if that fixes the problem.

Pivot script problem

If pivot is not running and vdas_pivot.log have message like:

Failed to bind to tcp://127.0.0.1:39999: Address already in use

use following recipe (from RAlauncher)

  • stop VIRUS Server
  • wait 1 minute
  • start VIRUS Pivot and look at the PIVOT log to see if you get an error with the address or not.
  • start VIRUS Server
  • check that virusDataTransfer service are running (through RA launcher)
    It should report something like : virusDataTransfer (pid 36304) is running...
  • if not restart it through RA launcher

If the ramdisk fills up (probably because pivot is not running) then try the following:

  • login to vdas as hetdex
  • run dt_fix virus

Changing the CCD Temperatures

Usually you should make temperature changes through the TCS gui under VIRUS Environment tab look under the Warm-Up/Cool?-Down sub-tab. Enter the VIRUS spectrograph ("V") number under Spec. Hit update temps to see what the current set points are. Enter the new set points under the Target Temps section.

If you have to do this through the command line the command is.

syscmd -V -v 'set_ccd_temp( ccd=XXXXX, temp=-110 )'

To set the temperature for all CCDs in the array, specify ccd=0 in the command parameters

NOTE: The commanded setpoint is reset to the configured settings when vdas is restarted. These are located in ldas.conf and vdas.conf

To change an individual CCD away from the global configured settings look at /home/hetdex/code/het/trunk/camra/etc/spectro-NNN.conf

Hardware Changes

If one of the electronics day staff changes a controller a change must be made to the configuration files. For both LRS2 and VIRUS this must be done by either Phillip, Jason or Jim so call them at home at any hour needed to get the instrument running.

Manual fplane update

Automatic generation of fplane file runs daily at 22:30UT. In case if this did not work or there is a need of manually fplane update it can be done loging into vdas as hetdex and by running:
update_fplane -c -o /opt/het/hetdex/etc/Virus/vhc_config/fplane/fplane.txt

Recovering the dither position

If you are observing with the dither script and you have a crash you can check the dither position through the TCS GUI in the "Acquisition Camera Status" tab near the bottom. You can move the dither position with the following command:

syscmd -P -v 'AdjustDither(pos=1)'

Additional Help

Help for the complete set of CAMRA commands can be obtained $HET_SRC_ROOT/scripts/by issuing a 'help()' command to the CAMRA subsystem.

syscmd -v -l 'help()'

Attachments (2)

Download all attachments as: .zip