Casa Cook Arvind
GOEDHART
INTRODUCTION TO CASA:
A KAT-7 DATA REDUCTION
GUIDE
. . . YOU CANNOT ACQUIRE EXPERIENCE BY MAKING EXPERIMENTS. YOU CANNOT
CREATE EXPERIENCE. YOU MUST UNDERGO IT.
ALBERT CAMUS.
Contents
Bibliography
1 Why do you want to learn CASA?
For quite some decades, radio astronomy data reduction has been per-
formed using AIPS, Miriad and other packages. The coming online
of new and substantially enhanced radio interferometers like ALMA,
LOFAR and the JVLA has driven the development of a new software
package for interferometric data named CASA. CASA is better suited
to the complexity and data volume of these instruments than previously
existing packages. CASA was developed by NRAO, and was built
on the AIPS++ codebase. It seems to be becoming the software suite of
choice for radio interferometric data. KAT-7 is a small interferometer
array and this tutorial presents a guide to reducing the Karoo Array
Telescope (KAT-7) data using CASA.
Producing an image from interferometer data is not straightfor-
ward: an interferometer does not form an image in the way an optical
telescope does, but rather the observations are made in a spatial fre-
quency space, called the u-v or visibility plane, which is essentially a
Fourier transform of the image plane. So, in order to make an image,
the interferometer measurements must be Fourier transformed.
Due to the nature of the interferometer, the (u, v) plane is not fully
[Figure 1.1: One of the KAT-7 dishes.]
This document assumes that you are conversant with Linux and Python.
If not, then the book “SAMS teach yourself Linux in 10 minutes”
will be a good start even though you will take more than 10 min-
utes to read the whole book. Reading “Python for Dummies” will
also help. We will also not cover all the fundamentals of radio as-
tronomy, but from time to time we shall try to discuss some concepts
before proceeding so that the user is aware of what we are doing.
For a thorough introduction to the fundamentals of radio astronomy,
we recommend the NRAO Essential Radio Astronomy online course
(http://www.cv.nrao.edu/course/astr534/ERA.shtml).
The procedures consist of various steps, and each step generally depends
on the previous ones. So do not move on to the next step while warnings
or errors from the current one remain unresolved. We will note those
steps that are independent of the preceding ones.
CASA command:
prefix = 'CirX1'
msfile = prefix + '.ms'
1.3 Methodology
Note that for the data set we are using for the tutorial, this step has
already been done, so you can skip ahead to the next section. If you
need to do this step, go to the KAT-7 archive to search for your file.
Then check the eLog to see if the telescope operator entered any infor-
mation during the observations, and take note of any if they did. Then,
proceed to converting the archive file to a Measurement Set (ms). An
ms is the form in which CASA will work on your data. Also remem-
ber at the end of the data reduction to add an entry in the eLog of
the results. Convert the hdf5 archive file to an ms using the in-house
script h5toms as follows:
linux command:
you are running on the SKA Pinelands server), and then proceed with
the data reduction.
It’s probably a good idea to rename your ms to something a little
more descriptive at this stage.
linux command:
mv myfile.full_pol.ms School_data_av.ms
Of course, if you’re looking at real data it’s probably a good idea to
use a name for the ms which reflects what the observations are.
linux command:
mkdir my_working_directory
Let us get serious
The first main step is to move the ms into your chosen working
directory, and then also to make the working directory your current
directory. For the workshop, you may have downloaded the file
School_data_av.ms.tar.gz, which you would unpack with the linux
command:
linux command:
Locate the ms directory from the previous step and move it into
the working directory.
mv myfile.full_pol.ms my_working_directory
cd my_working_directory
Most of the data reduction will be done from within casapy, which
provides an interactive shell (essentially the equivalent of interactive
python or ipython). From the Linux prompt you will therefore first
run casapy. You will then give the commands to run various CASA
programs from within casapy. In casapy you can also create vari-
ables, assign values to them, and then subsequently use them at any
time during the processing. It is generally convenient to define some
variables to hold the names of ms’s and tables. The sequence of cas-
apy commands can then be re-used for a different data set by merely
changing the values assigned to these variables.
Never delete the ms file unless you have finished splitting your
relevant data and are sure you do not need the ms anymore.
CASA command:
prefix = 'Cirx1_school'
msfile = prefix + '.ms'
We now define variables that hold the names of various tables that
will be used during the reduction. The table names could be specified
directly, but if you do it by means of variables, the subsequent com-
mands we use can be copied and pasted and re-used for some other
reduction with the only modification required being the re-definition
of these variables.
CASA command:
reference_antenna = 'ant5'
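The calibration commands later in this guide refer to the variables gain_table0, bandpass_table0, gain_table1 and flux_table1. A minimal sketch of how they could be defined is given here. Only the variable names come from the commands in this guide; the file-name suffixes are assumptions chosen purely for illustration.

```python
# Hypothetical table names built from the same prefix as the ms.
# Only the variable names appear in this guide; the suffixes are illustrative.
prefix = 'Cirx1_school'
msfile = prefix + '.ms'

gain_table0 = prefix + '.G0'          # temporary phase-only gains (used for bandpass)
bandpass_table0 = prefix + '.B0'      # bandpass solutions
gain_table1 = prefix + '.G1'          # complex gains for the calibrator sources
flux_table1 = prefix + '.fluxscale1'  # gains after flux-density bootstrapping

reference_antenna = 'ant5'

# Print the names so they can be checked before any task is run.
for name in (msfile, gain_table0, bandpass_table0, gain_table1, flux_table1):
    print(name)
```

Re-using the commands for another data set then only requires changing prefix.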
Next, we use the task listobs to print out a summary of the observations
in the ms (similar to the AIPS task LISTR with OPTY='SCAN'):
CASA command:
listobs(vis=msfile)
CASA output:
(we have removed the leading part of the lines of the listobs output for clarity.)
================================================================================
MeasurementSet Name: /home/michael/casa-workshop2012/CirX1.ms MS Version 2
================================================================================
Observer: lindsay Project: 20120701-0006
Observation: KAT-7
Data records: 71904 Total integration time = 56296.5 seconds
Observed from 01-Jul-2012/13:57:43.4 to 02-Jul-2012/05:35:59.9 (UTC)
ObservationID = 0 ArrayID = 0
Date        Timerange (UTC)          Scan  FldId  FieldName      nRows  Int(s)  SpwIds
01-Jul-2012/13:57:43.4 - 13:59:27.9     1      0  PKS 1934-638     168    14.1  [0]
            14:00:13.4 - 14:04:58.4     2      1  Circinus X-1     420    14.8  [0]
            14:05:28.4 - 14:06:21.4     3      2  PKS 1613-586     105    11.6  [0]
            14:06:44.5 - 14:11:30.5     4      1  Circinus X-1     420    14.8  [0]
            14:12:00.5 - 14:12:45.5     5      2  PKS 1613-586      84    14.5  [0]
What can you deduce from this listing of the data set?
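As one possible way of answering this question, the short sketch below does some arithmetic on the numbers in the listing. It checks the elapsed time against the quoted total integration time, and looks at what the row counts per scan suggest. The interpretation of the rows (one row per baseline per integration, cross-correlations only, seven antennas) is an assumption, not something stated in the listobs output.

```python
from datetime import datetime
from functools import reduce
from math import gcd

# Start and end of the observation, copied from the listobs header (UTC).
start = datetime(2012, 7, 1, 13, 57, 43, 400000)
end = datetime(2012, 7, 2, 5, 35, 59, 900000)
elapsed_s = (end - start).total_seconds()
print(elapsed_s)  # compare with "Total integration time" in the header

# nRows for the five scans shown in the excerpt.
n_rows = [168, 420, 105, 84]
rows_gcd = reduce(gcd, n_rows)

# Assumption: one row per baseline per integration, cross-correlations only,
# for an array of seven antennas.
n_antennas = 7
n_baselines = n_antennas * (n_antennas - 1) // 2
integrations_per_scan = [n // n_baselines for n in n_rows]
print(rows_gcd, n_baselines, integrations_per_scan)
```

Under these assumptions the row counts are all multiples of the number of baselines, which is consistent with every baseline of a seven-antenna array having rows in the ms.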
CASA command:
plotants(vis=msfile)
From the above antenna plots, we can see that ant4 was not used in
this observation and also that the plots show the expected positions of
the KAT-7 antennas.
• viewer: can display (as a raster image) ms data, with some editing
capabilities
CASA command:
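# Flag visibilities that are exactly zero, for all fields.
flagdata(vis=msfile, mode='clip', field='',
         clipzeros=True, flagbackup=False)
# Flag data taken below 10 degrees elevation.
flagdata(vis=msfile, mode='elevation',
         lowerlimit=10.0, flagbackup=True)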
Now let’s plot the remaining visibilities. Phase stability, both in frequency
and time, is probably the most important thing to check first:
unless the visibility phases for the calibrator sources are reasonably
stable in time and across the different channels, calibration will likely
prove impossible.
Each visibility point is associated with one baseline, that is some
pair of antennas. However, instrumental failures are almost always
antenna-dependent, that is, due to a failure of some particular antenna,
and therefore all the baselines involving the bad antenna will have bad
visibility data. When identifying data to flag, it is best to try first to
establish whether it is some antenna that is bad, and if so flag all the
data for that antenna. Some bad data, however, in particular that due
to RFI, may not be antenna-dependent, and may have to be flagged
individually per baseline.
For starters, let us look at the phase as a function of frequency
on a per-baseline basis. We will start with just one example scan of
our strongest calibrator, PKS 1934-638. Pick one scan from the listing
above, somewhere near the middle of the run. In this case we will pick
scan ID = 57, occurring between times 17:01:16.3 and 17:03:01.3.
Before you carry on, do spend some time getting to know the GUI
behind plotms and how to use it:
http://casaguides.nrao.edu/index.php?title=Data_flagging_with_plotms.
For axis definitions in plotms, check this link:
http://casaguides.nrao.edu/index.php?title=What%27s_the_difference_between_Antenna1_and_Antenna2%3F_Axis_definitions_in_plotms.
Now actually run plotms to bring up the plot window and get a first
look at part of our visibility data. Note that iteraxis='baseline'
tells plotms to plot each baseline individually. Pressing the green
arrow at the bottom of the plotms interface will allow you to step
through the individual baseline plots interactively.
CASA command:
CASA command:
Again you will see a similar pattern. The visibility phases for XX are
quite stable in time, showing only small variations from scan to scan
and within each scan (mostly, the ∼10 or so visibility points within
a scan are so close together that the whole scan appears as a single
point in the plot). However, again, the visibility phases for the YY
polarization (tan colour) for ant2 and ant4 are quite unstable.
Since we have shown that the visibility phases for YY, ant2 and
ant4 are quite unstable both in time and across channels, we will
delete these points. Note, however, that so far, we have only examined
the data for one source, namely PKS 1934-638. Perhaps the data for
the other sources are fine! Before throwing out all the data, it’s worth
checking. In the plotms window, change “field” to PKS 1613-586 and
hit “plot”. There are more data for this source, but you will see the
exact same pattern. Since the data for YY, ant2 and ant4 are bad for
both our calibrator sources, we have no choice but to flag all these
data, including that for our target source Circinus X-1. Very likely
the target source data would also be bad, but in any case it cannot be
calibrated so we cannot use it. You could also plot the visibility data
for Circinus X-1; however, since it’s not known to be a compact source
we do not know what to expect from the visibility phases. They may
vary strongly because the source is complex. It is only for calibrator
sources, where the actual visibility phase should be constant, that we
can judge stability by plotting the visibility phases as we have done.
At present CASA has the limitation that it cannot calibrate one of
the XX, YY polarizations when the second one is missing, so we will ac-
tually have to flag all the polarizations for ant2 and ant4, even though
the XX polarization data does not have any problem. (There are four
total correlations, XX, YY, XY, YX. We will deal only with the first two,
whose sum represents the unpolarized intensity. The calibration of the
polarized parts, XY,YX involves extra steps, and is not yet implemented
in CASA for linearly polarized feeds like those of KAT-7). Having
identified the bad data, we once again turn to flagdata to actually
flag it. We do not specify either field or correlation so flagdata
flags the requested data for all fields and correlations.
CASA command:
flagdata(vis=msfile, antenna='ant2,ant4',
         flagbackup=True)
If you now repeat the above plotms commands, you will see that the
visibility phases are now nicely coherent for all the plotted baselines,
and the baselines we have flagged no longer show up.
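To see how much data an antenna-based flag removes, the following sketch counts the baselines that involve the flagged antennas. It assumes, purely for illustration, that the seven KAT-7 antennas are named ant1 to ant7 and that all of them have baselines in the ms.

```python
from itertools import combinations

# Assumption: the seven KAT-7 antennas are named ant1 ... ant7.
antennas = ['ant%d' % i for i in range(1, 8)]
bad_antennas = {'ant2', 'ant4'}

# Every unordered pair of antennas forms one baseline.
baselines = list(combinations(antennas, 2))

# A baseline is lost if either of its antennas is flagged.
flagged = [bl for bl in baselines if bad_antennas & set(bl)]
remaining = [bl for bl in baselines if not bad_antennas & set(bl)]

print(len(baselines), len(flagged), len(remaining))
```

Under these assumptions, flagging two of seven antennas removes more than half of the baselines, which is why it is worth checking carefully that an antenna really is bad before flagging it.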
Now let’s turn to the visibility amplitudes. Again we make use of
the fact that for continuum calibrator sources, the expected values are
known and should be the same on all baselines, and vary only slightly
across the channels. This time we can plot all the baselines together,
rather than iterating through them as we did before:
CASA command:
You will see that for most channels there is only a relatively small
variation of the visibility amplitude with channel for PKS 1934-638
(note the amplitude scale on the left). This small variation will be
corrected later with bandpass. However, channel ID=6 shows a much
larger scatter than any of the others, which is indicative that there may
be RFI in this channel. Switching to the other calibrator source shows
the same pattern. It is sometimes diagnostic to also examine the XY,
YX polarizations here. For calibrator sources, we expect the polarized
flux density to be a relatively small fraction of the unpolarized (i.e.,
XX, YY) one, whereas RFI is often strongly polarized. We conclude
that channel ID=6 is likely affected by RFI, and therefore we will pro-
ceed to flag it also, once again using flagdata. Once again, we will
flag that channel also for the target source data, since even if the tar-
get source data were not also affected by the RFI, the data would be
uncalibratable.
CASA command:
We have now flagged most of the bad data affecting this run, in
particular the bad data for the calibrator sources, and can now proceed
to the actual calibration.
real, V(u, v) is generally complex, with non-zero values for both real
and imaginary parts. V(u, v) is often described as an amplitude and a
phase, rather than by its real and imaginary parts. Note that V(u, v) is
Hermitian, with V(−u, −v) = V*(u, v): generally the Fourier transform
of a real function, like I, is complex but Hermitian-symmetric.
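The symmetry of the visibility function can be checked numerically. The sketch below uses a small, made-up real "image" and a direct discrete Fourier transform as a stand-in for V(u, v); the array values and the function name dft2 are hypothetical. It verifies that the value at (−u, −v) is the complex conjugate of the value at (u, v).

```python
import cmath

def dft2(image):
    """Direct 2-D discrete Fourier transform of a small real image."""
    n_rows, n_cols = len(image), len(image[0])
    vis = [[0j] * n_cols for _ in range(n_rows)]
    for u in range(n_rows):
        for v in range(n_cols):
            total = 0j
            for x in range(n_rows):
                for y in range(n_cols):
                    phase = -2j * cmath.pi * (u * x / n_rows + v * y / n_cols)
                    total += image[x][y] * cmath.exp(phase)
            vis[u][v] = total
    return vis

# Hypothetical real-valued brightness distribution (arbitrary units).
image = [[0.0, 1.0, 0.0, 0.0],
         [0.0, 3.0, 2.0, 0.0],
         [0.0, 0.0, 0.5, 0.0],
         [0.0, 0.0, 0.0, 0.0]]

vis = dft2(image)
n = len(image)

# Largest deviation from V(-u, -v) = conj(V(u, v)) over the whole grid.
max_diff = max(abs(vis[(-u) % n][(-v) % n] - vis[u][v].conjugate())
               for u in range(n) for v in range(n))
print(max_diff)
```

Because of this symmetry, each measured visibility also provides the value at the mirrored point of the u-v plane.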
The correlator output differs from the V measured by an ideal in-
terferometer for a variety of reasons, to do with both instrumental
effects as well as propagation effects in the earth’s atmosphere and
ionosphere [footnote 2: At low frequencies, such as those observed by KAT-7,
the effect of the ionosphere is usually the dominant one]. The relationship
between the observed and the true visibility on the baseline between two
telescopes, i and j, can be written in a very general way as:
thenceforth into a series of different factors, and each factor will be de-
termined in a different step. Our approach is similar to the so-called
“Measurement Equation”, which is a more formal approach to factor-
ing Jij described in [Sault and Cornwell, 1999], but we factor Jij into
slightly different components.
So we can write:
where
Ji = Ai Bi Gi . . . (2.4)
clearcal(vis=msfile, field='')
Time to take a break and have some coffee... The steps after will
need more attention.
We still have to enter the actual flux density for PKS 1934-638 for
the actual frequency of the observations. In general the flux density
varies with frequency, and setjy will calculate a slightly different
value for each channel from the known spectrum. (In the case that
a model image is used for the flux-density calibrator, setjy will also
scale the model image to the total flux density for each channel in the
ms, and then enter the Fourier transform of the scaled model image
into the MODEL_DATA column for that source and channel.) Setting
CASA command:
CASA output:
Once we are happy that the correct flux density has been entered
for the flux-density calibrator, we can proceed to the next step, which
is to solve for the bandpass response, in other words for the variation
of the complex antenna gains across the different channels. (Note that
setjy above did not yet solve for any part of Ji; it only entered the
flux density of our calibrator, essentially the value of Vtrue, into the
ms so that we can start solving for Ji in the next steps.) The variation
of the gains across the observing band is largely due to instrumental
effects and usually quite stable, so we only need to solve for it once.
Once we have solved for the bandpass response, we can combine all
the channels in a single solution for the subsequent steps.
However, we have the problem that the phase part of Ji can be quite
rapidly time-variable even though the amplitude part is not. In order
to properly calculate the bandpass response, we therefore want to do
some temporary phase-calibration for that scan which we intend to
CASA command:
# Phase-only ('p') solutions per integration, from the central channels only.
gaincal(vis=msfile, caltable=gain_table0,
        field='PKS 1934-638', refant=reference_antenna,
        spw='0:7~11', calmode='p', solint='int',
        minsnr=4, minblperant=4, solnorm=True, gaintable='')
CASA command:
plotcal(caltable=gain_table0, xaxis='time',
        yaxis='phase', field='PKS 1934-638',
        iteration='antenna', plotrange=[0,0,-180,180])
On the plotcal plot, you should see that the gain-table phases (antenna
phases, or the phase component of Ji) are fairly stable
during the minute or two of the scan length. Small variations are fine,
indeed the point of running gaincal was exactly to find these small
variations so that they can subsequently be calibrated out from the
visibilities. However, if the antenna phase appears random from one
point to the next then something has likely gone wrong.
Now we can run bandpass to solve for the complex gain as a func-
tion of channel across the observing band. Specifying gaintable=[gain_table0]
causes bandpass to apply the time-dependent antenna phase calibra-
tion we determined in the previous step before forming the bandpass
solutions.
CASA command:
bandpass(vis=msfile, caltable=bandpass_table0,
         field='PKS 1934-638', refant=reference_antenna,
         solnorm=True, combine='scan', solint='inf',
         bandtype='B', gaintable=[gain_table0])
CASA command:
plotcal(caltable=bandpass_table0, xaxis='chan',
        yaxis='amp', subplot=222, iteration='antenna')
First note that plots for the antennas for which the data have been
flagged (ant2, ant4) are blank, which is as expected. For the others
the gain amplitude varies only by 10% to 20% across the channels,
which is in the expected range. Since we asked bandpass to normalize
the solution, the average gain for each antenna is 1.0 — at this point
we’re only after the variation across the band. If any of the gains were
to differ from unity by more than a factor of 2 or so, then there may be more
bad data to flag. One would then have to flag that bad data, and then
re-start the calibrations from the clearcal() step.
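The check described above can be illustrated with a few lines of arithmetic. The amplitudes below are hypothetical, and the normalisation shown is a simple division by the mean amplitude; it is meant only to illustrate the idea and is not necessarily the exact normalisation that bandpass applies internally.

```python
# Hypothetical, un-normalised bandpass gain amplitudes for one antenna,
# one value per channel.
raw_amps = [0.095, 0.101, 0.104, 0.108, 0.106, 0.102, 0.098, 0.090]

# Normalise so that the average amplitude across the band is 1.0.
mean_amp = sum(raw_amps) / len(raw_amps)
normalised = [a / mean_amp for a in raw_amps]

# Channels whose gain differs from unity by more than a factor of 2
# are candidates for further flagging.
suspect = [chan for chan, a in enumerate(normalised) if a > 2.0 or a < 0.5]

print(round(sum(normalised) / len(normalised), 6))
print(suspect)
```

An empty list of suspect channels corresponds to the well-behaved case described in the text.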
The next step is to derive the main, time-variable, part of Ji, which we
term Gi. This is done for each source individually, by comparing the
observed visibilities V_ij^observed to the expected visibilities V_ij^true. For the
flux density calibrator, we have determined the latter values, and we
can get the correct values for Gi. For the phase calibrator sources, we
assume a point source model, so we know the phase part of all the V^true
is 0. We assume for now that the unknown flux density of the phase
calibrator sources is exactly 1.0 Jy (typically they are of this order).
If we then solve for Gi, we will obtain complex values whose amplitudes
differ from the correct ones by a factor depending only on the
true source flux density (by ([1 Jy]/[true source flux density])^0.5). We
can then compare the Gi obtained for the secondary calibrators, and
scale them so that they match as well as possible the correctly-scaled
amplitudes of Gi obtained for the primary flux-density calibrator. The
scaling required gives us the true flux density of the secondary cali-
brator source in terms of the originally assumed one, i.e., 1 Jy. This
process is known as “flux density bootstrapping”.
The reason for this complicated procedure is that there are only
a handful of flux-density calibrators, so it is generally impossible to
choose a flux density calibrator close to the target source on the sky.
There are many phase calibrator sources, however, so a phase cali-
brator can be chosen that is much closer on the sky (typically a few
degrees away). The flux density bootstrapping allows us to use the
phase and amplitude calibration (i.e. Gi ) from these much nearer cal-
ibrator sources, which will provide a much better calibration for the
target source, but to set the flux density scale accurately from obser-
vations of the flux density calibrator. The flux density scale is usually
stable for hours, so once we have transferred the flux density scale to
the phase calibrator sources by flux density bootstrapping, the solu-
tions we obtain for the phase calibrator sources are equivalent to those
which would have been obtained if we’d known the flux density of the
phase-calibrators to begin with.
First we determine the values of Gi , also called the complex gains,
for the flux-density calibrator:
CASA command:
gaincal(vis=msfile, caltable=gain_table1,
        field='PKS 1934-638', solint='inf',
        refant=reference_antenna, gaintype='G',
        calmode='ap', gaintable=[bandpass_table0])
Note that a new solution is started for each scan; the
solint='inf' only means that the solution interval can be arbitrarily
long up to the next scan boundary.
Next we determine the complex gains for the phase calibrator (so
far under the assumption that it has a flux density of 1 Jy).
CASA command:
gaincal(vis=msfile, caltable=gain_table1,
        field='PKS 1613-586', solint='inf',
        refant=reference_antenna, gaintype='G', calmode='ap',
        append=True, gaintable=[bandpass_table0])
CASA command:
plotcal(caltable=gain_table1, xaxis='time',
        yaxis='amp', iteration='antenna')
Notice that, for each antenna, there are two sets of points. The lower
set, in this case with gain amplitudes near ∼0.1 corresponds to our flux
density calibrator (PKS 1934-638). Since we were working with the
correct flux density for this source (∼13 Jy), these gains already have
the correct values. The other set of points, with values near ∼0.2, are
those for PKS 1613-586, which differ from the true values by a scale
factor. The next step is to derive this scale factor, so as to make the two
sets of points match as well as possible. The true flux density of PKS
1613-586 can be trivially determined from the derived scale factor.
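The arithmetic behind the scale factor can be sketched as follows. The two gain amplitudes are the rough values quoted above (read off the plot), so the result is only indicative and is not the value that fluxscale reports. The sketch also assumes the convention that the observed visibility is the product of the two antenna gains and the true visibility, so that the gain amplitude scales with the square root of the flux density.

```python
import math

# Flux density assumed for the phase calibrator when solving for the gains.
assumed_flux_jy = 1.0

# Rough gain amplitudes quoted in the text (approximate, read off the plot).
gain_amp_flux_cal = 0.1   # PKS 1934-638, solved with the correct flux density
gain_amp_phase_cal = 0.2  # PKS 1613-586, solved assuming 1 Jy

# Factor by which the phase-calibrator gains must be scaled to match
# those of the flux-density calibrator.
scale = gain_amp_flux_cal / gain_amp_phase_cal

# Assumption: gain amplitude is proportional to sqrt(flux density),
# so the flux density scales with the inverse square of the factor.
derived_flux_jy = assumed_flux_jy / scale ** 2

print(scale, derived_flux_jy)
```

With these rough numbers the phase calibrator comes out at a few Jy; fluxscale does the equivalent comparison per antenna and reports the result with an uncertainty.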
The task fluxscale calculates the scale factor, thus performing the
bootstrapping, and will output a correctly scaled version of the cali-
bration table, which we can then use to calibrate the data, as well as
printing out the derived value for PKS 1613-586’s flux density.
CASA command:
fluxscale(vis=msfile, caltable=gain_table1,
          fluxtable=flux_table1, reference=['PKS 1934-638'],
          transfer=['PKS 1613-586'])
CASA output:
CASA command:
plotcal(caltable=flux_table1, xaxis='time',
        yaxis='phase', iteration='antenna',
        plotrange=[0,0,-180,180])
And you should see that the gain phases are in fact very stable in
time, and relatively consistent between our two calibrator sources. For
ant5 the gain phases will be exactly 0° as this was our reference antenna,
so it has a gain phase of 0° by definition. In order to calibrate
the visibility data for our target source, we have to interpolate be-
tween the calibrator gain values. Since the gain phases for our phase
calibrator source PKS 1613-586 form a nice smooth line (in fact the
individual points may not be distinguishable on your plot), the inter-
polation should yield an accurate value. If, on the other hand, the gain
phases (or amplitudes) differ drastically from one point to the next,
then interpolation to the intervening target-source observations will
be dubious.
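The interpolation idea can be illustrated with a simple linear interpolation of a gain phase between two calibrator scans. The times and phases below are hypothetical, and the function is a sketch only: it ignores phase wraps at ±180° and is not meant to reproduce how CASA interpolates internally.

```python
def interpolate_phase(t, t0, phase0, t1, phase1):
    """Linearly interpolate a gain phase (degrees) to a time t between t0 and t1."""
    fraction = (t - t0) / (t1 - t0)
    return phase0 + fraction * (phase1 - phase0)

# Hypothetical gain phases from two consecutive phase-calibrator scans
# (times in seconds, phases in degrees).
t_cal_a, phase_cal_a = 0.0, 10.0
t_cal_b, phase_cal_b = 400.0, 14.0

# Time of a target-source visibility, halfway between the two scans.
t_target = 200.0
phase_target = interpolate_phase(t_target, t_cal_a, phase_cal_a,
                                 t_cal_b, phase_cal_b)
print(phase_target)
```

When the calibrator phases change by only a few degrees between scans, as here, the interpolated value is well constrained; if they jumped by tens of degrees, the interpolated value would be correspondingly uncertain.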
Our calibration therefore is looking good. The next step is to actually
use it to calibrate the visibility data and produce our estimates
of V_ij^true. Here is where we calculate V_ij^true = V_ij^observed J_ij^-1. This is
where we want to be careful about how we interpolate our gain solutions
(Ji). For example, on the plot of gain phase against time, you
could see the gain phases for PKS 1613-586 all lay on a fairly smooth
curve, while those for PKS 1934-638 did not quite lie on the curve.
Since PKS 1613-586 is much closer on the sky to our target source
(Circinus X-1) the gain values (i.e., values of Ji ) derived from PKS
1613-586 are better for interpolating to the target. However, we also
want to calibrate the visibility data for PKS 1934-638, and of course
here we’re much better off using the gain solutions derived from PKS
1934-638 itself. (Note that although at this point we’re largely done
with our calibrator sources, and will not have any immediate further
use for the calibrator-source visibilities, it’s probably a good idea to
apply the calibration properly to the calibrator sources as well as the
target source.)
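The correction applied to a single visibility can be sketched with ordinary complex arithmetic. The sketch assumes that the baseline-based term is the product of one antenna gain and the complex conjugate of the other, J_ij = G_i G_j*, which is the usual antenna-based form; the gain values and the function name apply_gains are hypothetical.

```python
import cmath

def apply_gains(v_observed, g_i, g_j):
    """Return the calibrated visibility, assuming J_ij = G_i * conj(G_j)."""
    return v_observed / (g_i * g_j.conjugate())

# Hypothetical true visibility of a 13 Jy point source at the phase centre.
v_true = 13.0 + 0j

# Hypothetical antenna gains (amplitude, phase in radians).
g_i = cmath.rect(0.10, cmath.pi / 6)
g_j = cmath.rect(0.12, -cmath.pi / 9)

# What the correlator would record, and what calibration recovers.
v_observed = g_i * g_j.conjugate() * v_true
v_corrected = apply_gains(v_observed, g_i, g_j)

print(abs(v_observed), abs(v_corrected))
```

In practice the gains at the time of each target visibility are not measured directly but interpolated from the calibrator solutions, which is why the choice of calibrator matters.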
The easiest way to keep it straight is to apply the calibration sep-
arately to each source. The task that applies the calibration is called
applycal. It combines the requested gain tables, interpolates or ex-
trapolates as required to determine the full antenna-based Ji , calculates
the baseline-based Jij values as needed for each visibility measure-
ment, and then writes out the properly calibrated visibility data (our
estimate of V_ij^true) into the CORRECTED_DATA column of the ms. You can
specify the field, which determines which source’s visibility data get
corrected, and gainfield, which specifies from which source’s gain
table entries the required correction is determined. The best calibra-
tion is usually when the field entry is the same as the gainfield
entry, but usually one does not have calibration solutions for the target
sources, so for the target source, gainfield must necessarily be
different from field. Note that as applycal can simultaneously apply
several calibration tables, gainfield can have multiple entries, one for
each calibration table to be applied. The two calibration tables we want
to apply are the bandpass table, bandpass_table0, and the gain table
with the correct flux density scaling for PKS 1613-586 or flux_table1.
Since there are two tables to apply, we need two entries in gainfield.
Let’s just start with the first source: PKS 1934-638. We want to
calibrate its visibility data and therefore set field to PKS 1934-638.
CASA command:
# Each calibrator is corrected with the gain solutions derived from itself.
applycal(vis=msfile,
         gaintable=[bandpass_table0, flux_table1],
         field='PKS 1934-638',
         gainfield=['', 'PKS 1934-638'],
         interp=['nearest', ''])
applycal(vis=msfile,
         gaintable=[bandpass_table0, flux_table1],
         field='PKS 1613-586',
         gainfield=['', 'PKS 1613-586'],
         interp=['nearest', ''])
CASA command:
# The target is corrected with the gains from the nearby phase calibrator.
applycal(vis=msfile,
         gaintable=[bandpass_table0, flux_table1],
         field='Circinus X-1',
         gainfield=['', 'PKS 1613-586'],
         interp=['nearest', ''])
We are now almost ready to split the calibrated visibility data for our
target source into a separate ms file. Note that this step is convenient,
rather than necessary. One could just as well make images (and even
proceed to self-calibration) using the original ms, but often it is con-
venient to split out the data of interest at this point. This particular
data set has already been averaged in frequency to reduce the size for
the purposes of the workshop. These days most raw data sets you
encounter will likely have far more than 19 channels, and data sets
from interferometers with more telescopes than KAT-7, such as JVLA,
ALMA, LOFAR and MeerKAT once it comes online, will have far more
visibility measurements at each time-stamp. In such cases one might
average down in frequency to reduce the size of the data set and make
for faster processing. It is not necessary in this case, so our split
output file will have the same number of channels as our original ms.
However, before you do any averaging, it is a good idea to briefly ex-
amine the calibrated visibility data for the target source in case there is
more bad data we need to flag. Recall that so far we have only exam-
ined the data for our calibrator sources, but not yet that for Circinus
X-1. Let’s turn again to plotms. We will now plot correlated flux density
against baseline length in wavelengths (uvwave). Such plots are
commonly made to diagnose the nature of the source structure. We
now have to specify that we want the CORRECTED_DATA column: we’ve
gone to all this trouble to determine how to calibrate our visibilities,
so we want to make sure we’re enjoying the fruits of our labours.
CASA command:
timerange.
CASA command:
flagdata(vis=msfile,
         timerange='2012/07/02/01:32:30~2012/07/02/01:34:00',
         field='Circinus X-1', spw='0:18', flagbackup=True)
So now we’ve removed those outliers, we can split the data for our
source of interest, Circinus X-1.
CASA command:
split(vis=msfile, outputvis='CirX1_split.ms',
      datacolumn='corrected', field='Circinus X-1')
peak is mainly due to a real signal and only a minor part comes
from the sidelobe responses to other real emission farther away in
the image. Note that one should specify one or more “clean win-
dows” in which to search for real emission.
5. Add the final residual image to the convolution from the previous
step. The result is what is called the CLEAN image. Note that for
purposes of calibration, it is usually the CLEAN components that
are used rather than the CLEAN image.
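The core loop of CLEAN can be illustrated with a toy one-dimensional example. The sketch below follows the standard Högbom scheme (find the peak of the residual, subtract a scaled and shifted copy of the dirty beam, record the component, repeat); it is not the implementation used by the clean task. The dirty beam, source positions and fluxes are made up, no clean windows are used, and the final convolution with a CLEAN beam is omitted.

```python
def hogbom_clean_1d(dirty, beam, gain=0.1, n_iter=1000, threshold=1e-6):
    """Toy 1-D CLEAN on a circular grid; beam[0] is the beam centre (value 1)."""
    n = len(dirty)
    residual = list(dirty)
    components = [0.0] * n
    for _ in range(n_iter):
        # Find the pixel with the largest absolute residual.
        peak_idx = max(range(n), key=lambda i: abs(residual[i]))
        peak = residual[peak_idx]
        if abs(peak) < threshold:
            break
        # Record a fraction of the peak as a CLEAN component ...
        components[peak_idx] += gain * peak
        # ... and subtract the dirty-beam response to that component.
        for i in range(n):
            residual[i] -= gain * peak * beam[(i - peak_idx) % n]
    return components, residual

# Hypothetical dirty beam: unit peak with symmetric positive and negative sidelobes.
n = 32
beam = [0.0] * n
beam[0] = 1.0
for offset, level in ((3, 0.3), (7, -0.2)):
    beam[offset] = level
    beam[-offset] = level

# Hypothetical sky: two point sources (pixel index -> flux density).
sources = {5: 2.0, 20: 1.0}
dirty = [sum(flux * beam[(i - pos) % n] for pos, flux in sources.items())
         for i in range(n)]

components, residual = hogbom_clean_1d(dirty, beam)
print(round(components[5], 4), round(components[20], 4))
```

In this idealised case the components converge to the two input sources and the residual becomes negligible; with real data, noise and extended emission make the choice of gain, threshold and clean windows much more important.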
The CLEAN algorithm was designed for the case where the true
brightness distribution contains only a few unresolved sources. It
works well for cases where the true emission occurs only within small,
well-separated, regions across the field of view, in other words if the
field of view is mostly empty. The following excerpt has been taken
from Robert Reid’s Thesis [?]. “Physics and filled aperture measure-
ments give us several properties that images should have. Non-negativity:
Astronomical objects, even black holes, cannot steal light away from
our receivers, but raw interferometric images abound with negative
patches. Requiring the image to be non-negative is a strong constraint
on what should be used to fill in the gaps. Locality: Usually the range
of directions from which light can be received is limited by the an-
tennas being used. The rough location and extent of the source may
also be known from observations at other wavelengths. Raw images
always have ripples going off to infinity that should be quenched. If
nowhere else, nulls in the antenna reception pattern are invalid places
to receive light from. For many frequencies and parts of the sky, bright
radio sources are also quite rare and confined (as known from filled
aperture measurements), so images are expected be mostly empty.
Smoothness: Sources are expected to be continuous, and usually to
have continuous derivatives. Also, it is best to minimize the effect of
the unmeasured part of the sampling plane on the final image. Per-
fectly sharp edges depend on the entire sampling plane, far beyond
where interferometers can make measurements. Smoothing the image
concentrates its region of dependence on the finite area that interfer-
ometers sample. Agreement with theory: At first this seems to throw
out the baby with the bathwater, since true discoveries must be novel.
True discoveries, however, must rest on true observations. Since decon-
volution ideally agrees with all the measurements, we should not let
non-measurements mislead us. Another way to see this is to think of
the model as assisting us in separating reasonable interpolations from
unreasonable ones for the image, but the parameters of the model it-
self are determined by the measurements. For example a circle has an
infinite number of points on its circumference, but three points on its
locus are enough to specify it, if we are sure that it is a circle. In any
case, images tend to be constructed before models, thus this principle
is not always applicable.”
Two main inputs to clean, the image size (imsize) and the cell size
(cell), which correspond to the size of the map in pixels and the
size of a pixel in arcseconds, respectively, must be chosen carefully.
Here we show how to calculate these two parameters. By knowing the
frequency of our observations we can calculate the primary beam (i.e.,
the area of the sky observed with the telescope).
First we choose the size of the pixels or cells. This choice is deter-
mined by the resolution of the interferometer, which is given by the
synthesized beam. The value that is usually given is the full width at
half-maximum (FWHM). Although the exact value of the FWHM of the
synthesized beam depends on the details of the (u, v) coverage and
weighting of each particular observation, an approximate value for
a given set of observations depends largely on the maximum baseline,
Bmax, measured in wavelengths (λ), and is given by:
$$\mathrm{FWHM}_{\text{synthesized beam}} \simeq \frac{1}{B_{\max}/\lambda}\ \text{radians, or} \qquad (2.5)$$
$$\mathrm{FWHM}_{\text{synthesized beam}} \simeq \frac{3438}{B_{\max}/\lambda}\ \text{arcminutes} \qquad (2.6)$$
You can determine Bmax in λ by using plotms to plot the visibility
amplitude as a function of the baseline length in λ or uvwave.
In order not to be limited by the pixel size, the central lobe of the
dirty beam needs to be well resolved, and therefore a value of approximately
1/4 of the above FWHM of the synthesized beam should
be chosen for the cell size. When CLEAN is run, one should check
that the smaller, or minor axis of the elliptical fitted CLEAN beam is
at least 3× the size of the cells. If not, CLEAN should be stopped and
restarted with better-suited values for cell.
In our case, we get an estimate of 3.7′ for the resolution and thus
0.92′ for the pixel size, which we will round down to 0.9′ (note that
since the requirement is to have pixels small enough to resolve the
beam, rounding down is okay but not rounding up).
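The arithmetic behind Equations 2.5 and 2.6 and the quarter-beam rule can be sketched in a few lines of Python. This is only an illustration: the function names are ours, and the value of Bmax/λ below is a hypothetical number chosen to give a beam close to the 3.7′ quoted above. Read the real value off the uvwave axis in plotms.

```python
import math

# Arcminutes per radian (about 3438, the constant in Eq. 2.6).
ARCMIN_PER_RADIAN = 180.0 * 60.0 / math.pi

def synthesized_beam_arcmin(bmax_wavelengths):
    """Approximate synthesized-beam FWHM in arcminutes (Eqs. 2.5 and 2.6)."""
    return ARCMIN_PER_RADIAN / bmax_wavelengths

def cell_size_arcmin(fwhm_arcmin, pixels_per_beam=4):
    """Cell size that puts about `pixels_per_beam` pixels across the FWHM."""
    return fwhm_arcmin / pixels_per_beam

# ASSUMPTION: hypothetical maximum baseline in wavelengths, not taken from the data.
bmax_wavelengths = 930.0

fwhm = synthesized_beam_arcmin(bmax_wavelengths)
cell = cell_size_arcmin(fwhm)
print("FWHM ~ %.2f arcmin, cell ~ %.2f arcmin" % (fwhm, cell))
```

Remember to round the resulting cell size down, not up, before passing it to clean.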
The other choice the user must make is the size of the region to
be imaged. The maximum size of the region that it is sensible to im-
age is determined by the field of view of the interferometer, which is
just determined by that of individual antennas, and is independent of
the array configuration. The FWHM of the primary beam of a uniformly
illuminated antenna is given by Napier [1999]:
$$\mathrm{FWHM}_{\text{primary beam}} = \frac{1.02 \times (c/\nu)}{\text{dish diameter}}\ \text{radians} \qquad (2.7)$$
In our case, the KAT-7 dishes have a diameter of 12 m, so at our
observing frequency of 1822 MHz, the FWHM of the primary beam
is ∼48′. Of course, sources outside this field can still be seen by the
interferometer, but at reduced amplitude. In particular, a source at a
distance of FWHM/2 (remember that the FWHM is a diameter, not a
radius) will be attenuated by 50% due to the primary beam response.
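To get a feeling for how quickly the response drops away from the pointing centre, one can approximate the primary beam by a Gaussian. This is an assumption made only for illustration, because the true beam of a dish has sidelobes and nulls. The function name is ours.

```python
import math

def gaussian_beam_response(offset, fwhm):
    """Relative power response of a Gaussian beam at `offset` from the
    pointing centre. `offset` and `fwhm` must be in the same units."""
    return math.exp(-4.0 * math.log(2.0) * (offset / fwhm) ** 2)

fwhm = 48.0  # arcmin, the primary beam FWHM at 1822 MHz
for offset in (0.0, fwhm / 2.0, fwhm):
    response = gaussian_beam_response(offset, fwhm)
    print("offset %5.1f arcmin -> response %.3f" % (offset, response))
```

By construction the response at an offset of FWHM/2 is one half, which is the 50% attenuation mentioned above.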
For the relatively coarse resolution offered by KAT-7, one can afford
to image a relatively large region, and a reasonable choice is to image
a region with 3× the diameter of the primary beam FWHM. In our case
this is equal to 3 × 48′ = 144′. Since we have determined our pixel
size to be 0.9′, we need to image a region of 144′/0.9′ = 160 pixels. It is common
to choose image sizes which are powers of 2, so we shall round this up
to 256, which is the value we use for imsize.
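The same calculation can be scripted, which is handy when you change observing frequency. The sketch below evaluates Equation 2.7 and then rounds the required number of pixels up to the next power of 2. The function names are ours and are not part of CASA.

```python
import math

C = 299792458.0  # speed of light in m/s

def primary_beam_fwhm_arcmin(freq_hz, dish_diameter_m):
    """Primary beam FWHM of a uniformly illuminated dish (Eq. 2.7), in arcmin."""
    wavelength = C / freq_hz
    return math.degrees(1.02 * wavelength / dish_diameter_m) * 60.0

def image_size_pixels(field_arcmin, cell_arcmin):
    """Smallest power of 2 that covers `field_arcmin` with the given cell size."""
    needed = int(math.ceil(field_arcmin / cell_arcmin))
    size = 1
    while size < needed:
        size *= 2
    return size

fwhm = primary_beam_fwhm_arcmin(1822e6, 12.0)   # KAT-7 dish at 1822 MHz
imsize = image_size_pixels(3.0 * fwhm, 0.9)     # 3 x FWHM field, 0.9 arcmin cells
print("primary beam ~ %.1f arcmin, imsize = %d" % (fwhm, imsize))
```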
the theoretical rms noise is appropriate. Note: to reach this limit the
data must be well calibrated/flagged and suffer from no serious arti-
facts (resolved out extended structure/negative bowls, poor PSF/(u, v)
coverage, limited dynamic range etc).
CLEANing deeply into the noise, that is, with threshold set to a
value below the image rms level, is dangerous if you have a large
CLEAN window. If the area of your CLEAN window is small, less
than say 10 times the area of the CLEAN beam, then it is fairly safe
to set threshold to some low value and just clean deeply. If the area
of your CLEAN window is larger, then overcleaning can be a problem.
Basically, once you reach rms (whether close to theoretical or
not), you are just picking noise up one place and putting it down in
another. With a small CLEAN window, there is not much freedom
here so nothing much can go wrong by moving a bit of noise around
within your small window. With a large window strange artefacts can
arise. Remember also, that rms in an otherwise blank region of your
image represents only a lower limit to the uncertainty of the parts of
your CLEAN image which have emission.
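As a rough aid for the rule of thumb above, the following sketch expresses the area of a rectangular CLEAN window in units of the beam area. We assume here that "beam area" means the area under an elliptical Gaussian with the fitted major and minor FWHM, which is one common convention. All numbers are hypothetical.

```python
import math

def gaussian_beam_area(bmaj, bmin):
    """Area under an elliptical Gaussian beam with FWHM axes bmaj and bmin."""
    return math.pi * bmaj * bmin / (4.0 * math.log(2.0))

def window_area_in_beams(width, height, bmaj, bmin):
    """Area of a rectangular CLEAN window in units of the beam area.
    All lengths must be in the same angular units."""
    return (width * height) / gaussian_beam_area(bmaj, bmin)

# ASSUMPTION: hypothetical 10' x 10' window and a 4.0' x 3.5' CLEAN beam.
n_beams = window_area_in_beams(10.0, 10.0, 4.0, 3.5)
deep_clean_is_fairly_safe = n_beams < 10.0
print("window covers %.1f beam areas" % n_beams)
```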
Starting the cleaning
Before you proceed to make an image, you will use the CASA
viewer to look at images and perform various tasks. Viewer is a GUI for
image manipulation. (Footnote 3: Follow the demo from
http://casa.nrao.edu/CasaViewerDemo/casaViewerDemo.html)
CASA command:
clean(vis='CirX1_split.ms', imagename='CirX1_split.im',
      niter=5000, threshold='5.0mJy', psfmode='hogbom',
      interactive=True, imsize=[256, 256],
      cell=['0.5arcmin', '0.5arcmin'], stokes='I',
      weighting='briggs', robust=0.0)
3
Spectral line calibration and imaging
prefix = 'G330_OH'
msfile = prefix + '.ms'
gtable0 = prefix + '.G0'
btable0 = prefix + '.B0'
gtable1 = prefix + '.G1'
ftable1 = prefix + '.fluxscale1'
listobs(msfile)
CASA output:
##########################################
##### Begin Task: listobs #####
listobs(vis="G330_OH.ms",selectdata=True,spw="",field="",
antenna="",uvrange="",timerange="",correlation="",scan="",
intent="",feed="",array="",observation="",verbose=True,
listfile="",listunfl=False,cachesize=50)
================================================================================
MeasurementSet Name: /home/sharmila/spec_line_tut_OH/G330_OH.ms MS Version 2
================================================================================
Observer: sharmila Project: 20130621-0006
Observation: KAT-7
Data records: 12327 Total integration time = 40778.8 seconds
Observed from 21-Jun-2013/14:08:01.0 to 22-Jun-2013/01:27:39.8 (UTC)
ObservationID = 0 ArrayID = 0
Date Timerange (UTC) Scan FldId FieldName nRows SpwIds AveInts
21-Jun-2013/14:07:45.9 - 14:12:17.7 1 0 1934-638 189 [0] [30.2]
14:25:02.7 - 14:25:32.9 4 1 1613-586 21 [0] [30.2]
14:26:13.2 - 14:30:45.0 5 2 G330.89-0.36 189 [0] [30.2]
14:36:37.3 - 14:37:07.5 7 1 1613-586 21 [0] [30.2]
14:37:47.8 - 14:42:19.6 8 2 G330.89-0.36 189 [0] [30.2]
14:48:11.9 - 14:48:42.1 10 1 1613-586 21 [0] [30.2]
14:49:22.3 - 14:53:54.1 11 2 G330.89-0.36 189 [0] [30.2]
...........
...........
...........
01:10:33.0 - 01:11:33.4 159 1 1613-586 42 [0] [30.2]
01:11:48.5 - 01:16:20.3 160 2 G330.89-0.36 189 [0] [30.2]
01:22:07.6 - 01:22:37.8 162 1 1613-586 21 [0] [30.2]
01:23:23.1 - 01:27:54.9 163 2 G330.89-0.36 189 [0] [30.2]
(nRows = Total number of rows per scan)
Fields: 3
ID Code Name RA Decl Epoch nRows
0 T 1934-638 19:39:25.017468 -63.42.45.60158 J2000 2079
1 T 1613-586 16:17:17.893107 -58.48.07.88902 J2000 1176
2 T G330.89-0.36 16:10:20.541228 -52.06.14.90063 J2000 9072
Spectral Windows: (1 unique spectral windows and 1 unique polarization setups)
SpwID Name #Chans Frame Ch1(MHz) ChanWid(kHz) TotBW(kHz) Corrs
0 none 1001 TOPO 1665.985 0.381 381.9 XX YY
The SOURCE table is empty: see the FIELD table
Antennas: 7:
ID Name Station Diam. Long. Lat.
f_cal = '1934-638'
b_cal = '1934-638'
g_cal = '1613-586'
ref_ant = 'ant6'
CASA command:
plotms(vis = msfile,
       field = f_cal,
       xaxis = 'channel',
       yaxis = 'amp',
       iteraxis = 'baseline',
       yselfscale = True)
Did you spot the RFI? Identify the channel and use flagdata to flag
the channel for the affected antennas.
CASA command:
flagdata(vis = msfile,
         spw = '0:548',
         antenna = 'ant5,ant6,ant7')
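Spotting the bad channel by eye in plotms is the approach used here. If you would like a second opinion, narrow-band RFI can also be picked out numerically as an outlier from the median of the spectrum. The sketch below runs on a synthetic amplitude spectrum with a spike injected at channel 548. It does not read the measurement set, and the helper names and the threshold are our own choices.

```python
def median(values):
    """Median of a list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2:
        return ordered[mid]
    return 0.5 * (ordered[mid - 1] + ordered[mid])

def find_outlier_channels(amplitudes, nsigma=6.0):
    """Channels deviating from the median by more than nsigma robust sigmas."""
    med = median(amplitudes)
    mad = median([abs(a - med) for a in amplitudes])
    sigma = 1.4826 * mad  # converts the MAD to a Gaussian-equivalent sigma
    if sigma == 0.0:
        return []
    return [chan for chan, a in enumerate(amplitudes)
            if abs(a - med) > nsigma * sigma]

# SYNTHETIC data: flat spectrum with small ripples and one RFI spike.
amplitudes = [1.0 + 0.01 * ((chan * 7) % 5 - 2) for chan in range(1001)]
amplitudes[548] = 5.0

print(find_outlier_channels(amplitudes))
```

On real data you would still confirm the result in plotms before flagging, since a strong spectral line is also an outlier.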
Now let's prepare the file for calibration. This step will clear all previous
calibrations or, if it is a freshly made ms file, will add in the
clearcal(msfile)
Now we are going to set up models for our calibrators. First, we set
up our flux calibrator:
CASA command:
setjy(vis = msfile,
      field = f_cal,
      fluxdensity = -1,
      standard = 'Perley-Taylor 99')
CASA command:
setjy(vis = msfile,
      field = '1613-586',
      fluxdensity = [0,0,0,0],
      modimage = '1613-586.model')
3.3 Calibration
CASA command:
plotms(vis = msfile,
       field = f_cal,
       xaxis = 'channel',
       yaxis = 'phase',
       avgtime = '1e8',
       correlation = 'XX',
       antenna = 'ant6',
       coloraxis = 'antenna2')
CASA command:
ref_chans = '0:1~600'
gaincal(vis = msfile,
        field = b_cal,
        caltable = gtable0,
        refant = ref_ant,
        spw = ref_chans,
        calmode = 'p',
        solint = 'inf',
        minsnr = 5,
        solnorm = True,
        interp = 'nearest')
CASA command:
plotcal(caltable = gtable0,
        xaxis = 'time',
        yaxis = 'phase',
        markersize = 3,
        plotsymbol = '.',
        iteration = 'antenna',
        subplot = 421,
        fontsize = 8)
CASA command:
bandpass(vis = msfile,
         caltable = btable0,
         field = b_cal,
         refant = ref_ant,
         solnorm = True,
         combine = 'scan',
         solint = 'inf',
         bandtype = 'B',
         minsnr = 5,
         gaintable = [gtable0],
         interp = ['nearest'])
If you are using CASA 4.1 and above, there is a new task to plot
bandpasses. This will enable you to see both the amplitude and phase,
and overplot polynomial solutions (Figure 3.1).
CASA command:
plotbandpass(caltable = btable0,
             xaxis = 'freq',
             yaxis = 'both',
             subplot = 42)
CASA command:
gaincal(vis = msfile,
        caltable = gtable1,
        field = f_cal,
        solint = 'inf',
        refant = ref_ant,
        gaintype = 'G',
        calmode = 'ap',
        solnorm = False,
        minsnr = 5,
        gaintable = [btable0],   # bandpass table from the bandpass step
        interp = ['nearest'])
Now check that there are no bad outliers in the gain calibration.
CASA command:
plotcal(caltable = gtable1,
        field = f_cal,
        xaxis = 'time',
        yaxis = 'amp',
        markersize = 3,
        plotsymbol = '.',
        iteration = 'antenna',
        subplot = 421,
        fontsize = 8)
plotcal(caltable = gtable1,
        field = f_cal,
        xaxis = 'time',
        yaxis = 'phase',
        markersize = 3,
        plotsymbol = '.',
        iteration = 'antenna',
        subplot = 421,
        fontsize = 8)
CASA command:
gaincal(vis = msfile,
        caltable = gtable1,
        field = g_cal,
        solint = 'inf',
        refant = ref_ant,
        gaintype = 'G',
        calmode = 'ap',
        append = True,
        solnorm = False,
        minsnr = 5,
        gaintable = [btable0],   # bandpass table from the bandpass step
        interp = ['nearest'])
As usual, check the gain solutions and make sure that there’s noth-
ing nasty going on. Next, we transfer the flux scale from our flux
calibrator to the gain calibrator. This will automatically transfer the
CASA command:
fluxscale(vis=msfile,
caltable = gtable1,
fluxtable = ftable1,
reference = f_cal,
transfer = g_cal)
CASA output:
##########################################
##### Begin Task: fluxscale #####
fluxscale(vis="G330_OH.ms",caltable="G330_OH.G1",
fluxtable="G330_OH.fluxscale1",reference="1934-638",
transfer="1613-586",listfile="",append=False,refspwmap=[-1],
incremental=False, fitorder=1)
Opening MS: G330_OH.ms for calibration.
Initializing nominal selection to the whole MS.
Beginning fluxscale--(MSSelection version)-------
Found reference field(s): 1934-638
Found transfer field(s): 1613-586
Flux density for 1613-586 in SpW=0 is: 1.01027 +/- 0.00741467 (SNR = 136.252, N = 14)
Storing result in G330_OH.fluxscale1
Writing solutions to table: G330_OH.fluxscale1
##### End Task: fluxscale #####
##########################################
CASA command:
target = 'G330.89-0.36'
applycal(vis = msfile,
         field = target,
         gaintable = [btable0, ftable1],   # bandpass, then flux-scaled gains
         gainfield = [b_cal, g_cal],
         interp = ['nearest', 'linear'])
CASA command:
split_outputvis = target + '.split.ms'
split(vis = msfile,
      outputvis = split_outputvis,
      field = target,
      datacolumn = 'corrected')
There are a few things that you need to do before you can dive in and
start cleaning your image. Since spectral line observations give us a
lot of information about velocities of the objects that we are observing,
it is important to be clear on what reference frames we are using. The
apparent frequency of a source, also known as the ’sky frequency’,
is influenced by Earth’s rotation and revolution around the Sun, the
motion of our Solar System around the Galaxy, the movement of our
Galaxy within the Local Group and there is also motion against the
cosmic microwave background. The various rest frames are listed in
Table 3.1. At L-band, the sky frequency can change by up to 0.15 MHz
during the course of a year.
The KAT-7 system does not do any Doppler corrections, so its rest
frame is topocentric. The actual sky frequency of your source will
vary from day to day and if you are observing with high frequency
resolution, you may even see your spectral line shifting during the
course of a single day’s observation. This is definitely the case for
these hydroxyl maser observations. We have a velocity resolution of
68 m/s so we do see the source Doppler-shifted by Earth’s rotation.
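These numbers are easy to check. The sketch below converts the channel width reported by listobs (0.381 kHz) into a velocity width at the OH rest frequency used for this dataset (1665.40184 MHz), and estimates the size of the Doppler shifts involved. The Earth speeds are standard textbook values (about 29.8 km/s orbital, about 465 m/s rotation at the equator) and are not taken from the observation. The function names are ours.

```python
C_M_PER_S = 299792458.0  # speed of light

def velocity_width(channel_width_hz, rest_freq_hz):
    """Velocity width of one channel (radio convention), in m/s."""
    return C_M_PER_S * channel_width_hz / rest_freq_hz

def doppler_shift_hz(speed_m_per_s, freq_hz):
    """Non-relativistic Doppler shift for a given line-of-sight speed, in Hz."""
    return freq_hz * speed_m_per_s / C_M_PER_S

rest_freq = 1665.40184e6   # Hz, OH line rest frequency
chan_width = 0.381e3       # Hz, from the listobs output

dv = velocity_width(chan_width, rest_freq)
annual_shift = doppler_shift_hz(29.8e3, rest_freq)   # Earth's orbital motion
max_rotation_channels = 465.0 / dv                   # upper bound, at the equator

print("channel width     ~ %.1f m/s" % dv)
print("annual shift      ~ %.3f MHz" % (annual_shift / 1e6))
print("rotation shift   <= %.1f channels" % max_rotation_channels)
```

The rotation term is an upper bound only. The actual shift depends on the latitude of the telescope and on where the source is on the sky, but it makes clear why the line can drift across several channels in one night.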
You will usually want to have your final data cubes output into
a different rest frame. The velocity frame to use depends on your
science. In the case of this dataset, we are looking at a Galactic star
formation region. It is meaningful in this case to work in the Local
Standard of Rest (LSR) frame. There are actually two definitions of
LSR, the kinematic LSR (LSRK) and dynamic LSR (LSRD). Most of the
time LSRK is used and is generally synonymous with LSR. The other
commonly used rest frame is barycentric, and has mostly replaced the
old heliocentric standard.
There are two ways to change rest frames in CASA. The CLEAN
task can change the reference frame of the spectral axis which is fine if
you don’t have to worry about Doppler shifts during the course of the
observation. CVEL is a more general spectral regridding tool, which
enables you to correct for Doppler shifts, or change the channelisation
You can use the viewer to look at visibilities. Let's try it now.
Figure 3.3 shows a pretty severe shift over the course of the obser-
vations. If we had to image this dataset without correcting for the
Doppler shift, we would end up blending our spectral features together.
We are going to be using CVEL to correct for the Doppler shift,
and at the same time, change the velocity reference frame to the Local
Standard of Rest ($V_{LSR}$). Before we do that, we need to set the line rest
frequency.
CASA command:
restfreq = 1665.40184e6
tb.open(split_outputvis + '/SPECTRAL_WINDOW',
        nomodify = False)
tb.putcell('REF_FREQUENCY', 0, restfreq)
tb.close()
If you run listobs now you should see that the reference frequency has
changed. Next we run cvel.
CASA command:
freq_string = str(restfreq/1e6) + 'MHz'
cvel_outputvis = target + '.cvel.ms'
cvel(vis = split_outputvis,
     outputvis = cvel_outputvis,
     mode = 'velocity',
     interpolation = 'linear',
     outframe = 'LSRK',
     restfreq = freq_string)
This can take some time for a large dataset. If you find yourself
running low on disk space, you can now delete the split file since
we will use the cvel file from now on. Have a look at the spectral
lines using the viewer and convince yourself that we have removed
the Doppler shift. In Figure 3.4 you will see that the spectral lines now
stay in the same channel. Note that some of the edge channels are now
missing data. Make sure that you have split off sufficient channels to
account for this before you apply cvel. We will lose about 10 channels
in this process. When you clean your datacube, you should make sure
that these channels are not included. I generally avoid this problem by
setting the start velocity and number of channels for the output from
cvel such that I cover only the channels that I want to image. This trick
works well if you have observations spread over more than a month,
because the channel in which a particular velocity will be present will
vary over the year. This brings your data into a common reference
frame with consistent channel numbers. If you don’t know what these
parameters should be, you can run cvel on the full channel range,
then examine the resulting file and rerun cvel with a smaller output
range once you know which velocity range is of interest. Remember
to leave sufficient line-free channels on either side of your dataset for
continuum subtraction.
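As a sketch of the trick described above, the following builds the keyword arguments for a cvel call restricted to a chosen velocity range. The start, width and nchan parameters are part of the cvel task, but the velocity range and channel width used here are purely hypothetical placeholders. Choose your own after inspecting the full-range cvel output. The dictionary is only a convenient way to show the parameters together; inside CASA you would pass it to cvel as indicated in the last comment.

```python
# ASSUMPTION: hypothetical LSRK velocity range and output channel width.
start_velocity_kms = -75.0
end_velocity_kms = -50.0
channel_width_kms = 0.07   # slightly coarser than the native ~0.068 km/s

# Number of output channels needed to span the range, end points included.
nchan = int(round((end_velocity_kms - start_velocity_kms) / channel_width_kms)) + 1

cvel_kwargs = dict(
    vis='G330.89-0.36.split.ms',
    outputvis='G330.89-0.36.cvel.ms',
    mode='velocity',
    start='%gkm/s' % start_velocity_kms,
    width='%gkm/s' % channel_width_kms,
    nchan=nchan,
    interpolation='linear',
    outframe='LSRK',
    restfreq='1665.40184MHz',
)

print(cvel_kwargs['start'], cvel_kwargs['width'], cvel_kwargs['nchan'])
# Inside a CASA session: cvel(**cvel_kwargs)
```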
CASA command:
plotms(vis = cvel_outputvis,
       ydatacolumn = 'data',
       xaxis = 'channel',
       yaxis = 'amp',
       avgtime = '1e8',
       avgscan = True)
There may be some emission around the last channels. It also turns
out that this plot does not show sufficient detail to figure out where
the line emission is. It will be much better to make a dirty cube first
and inspect the result to determine where we can place the continuum
windows. To make a dirty cube, we simply set the number of iterations
of CLEAN to zero.
CASA command:
clean(vis = cvel_outputvis,
      imagename = 'dirty',
      mode = 'channel',
      outframe = 'LSRK',
      restfreq = freq_string,
      imsize = 256,
      cell = '30arcsec',
      stokes = 'I',
      niter = 0,
      psfmode = 'hogbom',
      interactive = False,
      weighting = 'briggs')
viewer('dirty.image')
CASA command:
cont_window = '0:280~340'
uvcontsub(vis = cvel_outputvis,
          fitspw = cont_window,
          fitorder = 1,
          want_cont = True)
Now you have two new measurement sets, one containing the con-
tinuum data, which you can image as usual, and one containing the
spectral data. We do not expect to see any absorption features in this
field, so if you do see negative features, it may mean that the con-
tinuum subtraction has failed. You can check this before going on
to the laborious task of deconvolution by making a dirty cube of the
continuum-subtracted data.
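To see what a first-order continuum fit does, here is a deliberately simplified sketch on a single real-valued synthetic spectrum. The real uvcontsub task fits the complex visibilities, so treat this only as an illustration of the idea. The line-free window 280 to 340 mirrors the fitspw used above, and everything else (function names, spectrum values) is made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def subtract_continuum(spectrum, line_free_channels):
    """Fit a line to the line-free channels and subtract it everywhere."""
    xs = list(line_free_channels)
    ys = [spectrum[c] for c in xs]
    slope, intercept = fit_line(xs, ys)
    continuum = [slope * c + intercept for c in range(len(spectrum))]
    residual = [s - m for s, m in zip(spectrum, continuum)]
    return residual, continuum

# SYNTHETIC spectrum: sloping continuum plus one narrow line at channel 100.
spectrum = [2.0 + 0.001 * c for c in range(600)]
spectrum[100] += 10.0

residual, continuum = subtract_continuum(spectrum, range(280, 341))
print("line peak after subtraction: %.3f" % residual[100])
```

If the fitting window accidentally contains line emission, the fitted continuum is biased high and the subtracted cube shows negative features, which is the failure mode described above.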
Now we can start cleaning the spectral line cube. Masers typically
have angular sizes of the order of milli-arcseconds and therefore are
not resolved with KAT-7. However, there are five individual sources
of maser emission in this field of view. Let's start the cleaning process.
You will notice that we still specify the rest frequency even though we
have already set it in the measurement set. This is because of some
inconsistencies in the way CASA handles spectral velocities - for now
it is safer to always specify the rest frequencies and the required output
frame.
CASA command:
restfreq = 1665.40184e6
freq_string = str(restfreq/1e6) + 'MHz'
contsub_vis = target + '.cvel.ms.contsub'
cube_namebase = target + '.cube.clean'
clean(vis = contsub_vis,
      imagename = cube_namebase,
      spw = '0:1~590',
      mode = 'channel',
      outframe = 'LSRK',
      restfreq = freq_string,
      imsize = 256,
      cell = '30arcsec',
      stokes = 'I',
      threshold = '0.15Jy',
      niter = 20000,
      psfmode = 'hogbom',
      interactive = True,
      weighting = 'briggs')
If you find you are running out of time, try setting interactive to
False and let CASA find its own way.
Optional: Try cleaning the continuum. Below are the commands
to start the continuum cleaning. Note that I have dropped the last 11
channels to get rid of the empty channels that we were left with after
CVEL.
CASA command:
cont_vis = target + '.cvel.ms.cont'
cont_image = target + '.cont.clean'
clean(vis = cont_vis,
      imagename = cont_image,
      spw = '0:0~590',
      imsize = 512,
      cell = '30arcsec',
      stokes = 'I',
      threshold = '10mJy',
      niter = 5000,
      psfmode = 'hogbom',
      interactive = True,
      weighting = 'briggs')
You will notice that this field has a pretty complex structure, con-
sisting of point sources as well as more nebulous evolved HII regions.
It is far better to clean this image using multi-scale clean, which will
fit for several size-scales. To set up multi-scale, we need to specify the
scales, in number of pixels, where 0 indicates a point source, 5 pixels
is about one beam, and then we do a few multiples of the beam, going
up to 4 times the beam in this case. Note that CASA will complain if
the scales are too large and then drop them. If doing the cleaning non-
interactively, it is good to also tell CASA to stop when it starts creating
too many negative components, otherwise things can go wrong very
quickly.
CASA command:
cont_image = target + '.cont.multiscale.clean'
clean(vis = cont_vis,
      imagename = cont_image,
      multiscale = [0, 5, 10, 15, 20],
      negcomponent = 1,
      spw = '0:0~590',
      imsize = 512,
      cell = '30arcsec',
      stokes = 'I',
      threshold = '10mJy',
      niter = 5000,
      psfmode = 'hogbom',
      interactive = True,
      weighting = 'briggs')
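The list passed to multiscale follows directly from the beam size in pixels. A small helper (ours, not part of CASA) makes the rule explicit, so that the scales can be regenerated if you change the cell size:

```python
def multiscale_scales(beam_pixels, max_multiple):
    """Scale sizes in pixels: 0 for point sources, then 1..max_multiple beams."""
    return [0] + [k * beam_pixels for k in range(1, max_multiple + 1)]

# About 5 pixels per beam, going up to 4 times the beam, as in the text.
scales = multiscale_scales(beam_pixels=5, max_multiple=4)
print(scales)
```

If CASA complains that the largest scales are too big for the image, reduce max_multiple instead of editing the list by hand.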
4
Troubleshooting and Support