Astronomical Computing
Christoph Federrath
2023
Contents

1 Prerequisites
  1.1 Computing environment and shell
  1.2 Package manager
  1.3 Window manager
  1.4 Other required/useful programs
  1.5 Word of caution
2 Remote computing
  2.1 Setting up remote access to ANU services
  2.2 Customising the Bash shell environment
  2.3 Connecting to another computer using the shell
  2.4 Customising ssh connections
  2.5 Setting up an ssh tunnel
  2.6 Setting up private and public key pairs to connect to remote hosts without using a password
  2.7 Copying files/data across remote computers
  2.8 Mounting remote file systems
  2.9 Using rsync to create backups and to copy large data sets; use of tarballs
  2.10 Use of nohup and nice
6 Python
  6.1 Python startup
  6.2 Simple tasks in python
  6.3 Creating python modules
  6.4 Reading data in python and making map plots
  6.5 Re-binning and beam convolution
  6.6 Fourier analysis
  6.7 Statistical functions and PDFs
  6.8 Averaging data, making plots with error bars, and fitting
  6.9 Monte Carlo error propagation
  6.10 Gaussian line fitting
7 Version control
  7.1 Basics of version control
  7.2 Starting a git repository
  7.3 Uploading/Communicating repository to server
  7.4 Overleaf
1 Prerequisites
1.1 Computing environment and shell
1. Before we can get started with Astronomical Computing, we need a computer with certain
programs installed. The most important such program is the 'shell'. We will be using
the Bourne Again Shell (Bash) in this course. First of all, you will need a computer with
a shell program installed. This is always the case for Unix/Linux and Mac operating
systems. Windows also provides a command-line tool, but you will have to install a Unix-type
shell. For Windows users there is a program called 'Cygwin', which provides all the
advanced functionality of a Unix shell, including X11 forwarding (we'll get to what that
is). However, newer Windows versions also support a Linux subsystem, which may be
the best choice if you want to use Windows; it is a virtual machine that provides a
Unix/Linux environment. For more performance, however, you can also consider a dual-boot
setup, where you would have Windows on one of your hard drive partitions and
Unix/Linux on the other. That way, Windows and Linux are fully separated. For Mac
users, I recommend installing 'iTerm2'. The Mac operating system (OS) is a Unix-type
operating system, so you won't need to go through as much setup hassle as you would on
a Windows machine.
2. Now make sure you have Bash installed on your computer. Open a terminal program
(e.g., 'iTerm2' on the Mac, or 'terminal'/'Cygwin' on Windows) and type:
> echo $SHELL
This should give you an output like /bin/bash. The last part of the path is
the file bash, which is the Bash program that starts a Bash shell.
3. Bash is a shell program designed to listen to your commands and do what you tell it to,
such as browsing and modifying files and directories on your computer. Bash is a text-based
program, so it may seem limiting at first, but we will see that it is very powerful and
provides you with full control over advanced tasks, such as scripting (non-interactive Bash
mode) or connecting to other computers in the network.
4. Now make yourself familiar with the most basic functions of Bash. I recommend working
through the Bash guide (http://guide.bash.academy) or a similar online tutorial on
Bash.
1.2 Package manager
1. Bash comes with a set of basic commands and some important standard programs will
already be installed on your system. However, some other useful programs may not yet
be installed, so we have to find a way to add them to your system.
2. In order to install additional programs, it is best to use a so-called 'package manager'. For
Mac users, I recommend installing 'MacPorts' (https://www.macports.org), which is
a good package manager. Simply go to https://www.macports.org/install.php and
follow the instructions to install MacPorts. You will need to install Xcode and do a couple
of further steps explained there.
3. For Linux users, let's assume you already know what you are doing :) ...and for Windows
users the situation is basically hopeless :o ...oh well, you can install additional programs
with the 'Cygwin' package manager or the package manager of the Linux distribution
that runs in the Windows Subsystem for Linux (WSL). The default Linux distribution
for WSL is Ubuntu.
4. Now you can install additional programs with your package manager, such as ’gnuplot’.
Note for Windows 10 users: if you are a Windows 10 user, install Bash (https://itsfoss.com/install-bash-on-windows/) and also install the Xming X Server for Windows.
You might also have to set export DISPLAY=localhost:0.0 in your shell environment (e.g.,
add that to your .bashrc). Similar steps may apply to Windows 11. Some further information is
located here: https://wiki.iihe.ac.be/Use_X11_forwarding_with_WSL.
Also note that if you use Bash on Windows 10/11, there are some issues with file permissions,
i.e., new files are usually given read/write/exec permissions not only for the user, but also for
the group and everyone else. This can cause problems when configuring secure shell (in the
following lectures) to connect to other computers/servers. You can fix the file permissions by
hand (using chmod go-rwx [file]) in that case.
1.4 Other required/useful programs

...to test that some of the required programs are available, type:
> rsync
which should bring up a list of command line options for rsync and
> gnuplot
which should start gnuplot. You can exit gnuplot by typing exit at the gnuplot> prompt.
3. You can install more programs as required. For example, to search for a program by name
in MacPorts, you can type:
> sudo port search [name_of_program]
to see if that program is available for installation.
4. A note on X11 (so you can use windows and graphics on your own computer as well as on remote
computers): the simplest approach is to install gnuplot
> sudo port install gnuplot
which comes with the required xorg libraries.
2 Remote computing
2.1 Setting up remote access to ANU services
1. Most of the MSO servers and also some ANU services are only accessible from inside the
ANU network. This means that they normally cannot be accessed and used when you
are at home. However, there is a service called 'VPN', which allows you to set up a secure
connection, such that it looks to the ANU servers as if you were actually inside the Uni
network, although you might be at home or even overseas.
2. Download the VPN client for your computer from the ANU VPN website. This is
currently located here: https://services.anu.edu.au/information-technology/login-access/remote-access,
called 'GlobalProtect' (I feel so much safer already...).
3. Follow the instructions on that web page to install the client program. This will also be
useful later, because you can then connect to MSO servers and ANU services from
outside the University network. However, this may not give you immediate access to
MSO computing services, because ANU IT has to register access rights to MSO for your
specific Uni account. Therefore, please submit the completed RSAA account form asap.
2.2 Customising the Bash shell environment

1. First, let's change the Bash console prompt. Add the following line to your .bashrc in
your $HOME directory:
PS1='\u@\h:\w>'
2. Now make sure that the following line is also present in your .bashrc:
test -s ~/.alias && . ~/.alias
This ensures that the content of .alias is read and added to the Bash environment. Can
you explain the syntax and what’s going on here in this one-liner?
3. We can now modify (or create, if it does not exist yet) the .alias and add some useful
shell aliases that allow you to access commands more quickly. A common command
would be listing an extended, more detailed version of ls, containing the file sizes, last
modification date of the files, and file permissions and ownership. Please add the following
lines to .alias:
alias ls='ls -G $OPTIONS'
alias ll='ls -Glh $OPTIONS'
The first one should enable colourised output via the -G option, which applies to Mac
OS (on Linux, the option --color=auto does the same). Use man ls to see the manual
page for the specific version and implementation of ls installed on the system you are
running.
Note that $OPTIONS is there to take additional options. Basically, we have now overloaded
the ls command with a colourised version. The second line defines a new command alias
called ll, which essentially calls ls with the additional options -l and -h for long-listing
(more details) and human-readable file size output, respectively.
Try and understand the listing of files and directories via ll. When you run this from
your home directory, what do the different columns mean, in particular the first column,
which encodes the file/directory permissions? (See the illustration after this list.)
4. Now let's add a quick alias for your favourite editor, so you can quickly launch it from
the shell, for example emacs:
alias e='emacs $OPTIONS'
Thus, next time, you can very quickly open emacs by just typing e -nw [file]. Note
that the -nw option refers to 'no window', meaning that emacs will be started directly in
shell mode instead of graphical-user-interface (GUI) mode. This is very useful if we want
to modify a file on a remote computer, where we do not have a fast network connection for
graphical interactions. The text-based editor mode is much faster and is usually sufficient
for most tasks. If you don't already have an efficient editor that you know how to use
remotely, I recommend learning how to use emacs and common short-cuts by searching
the manual pages of emacs on the internet.
5. Make sure to restart Bash (or open a new Bash shell) so that your changes to .bashrc
and .alias will take effect. Another way of activating the changes is to call source
.bashrc.
6. If this does not work, you have to check your .profile or .bash_profile. There should
be a call to .bashrc from either of those, similar to:
if [ "$BASH" ]; then
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
fi
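As an illustration of the first column mentioned in item 3 (a hypothetical ll output line, not actual output from your system):

-rw-r--r--  1 user  staff  1.2K  Aug 10 09:00  notes.txt

The first character gives the file type (- for a regular file, d for a directory), followed by three triplets of read/write/execute (rwx) permissions for the user (owner), the group, and everyone else.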
2.3 Connecting to another computer using the shell

...in order to bring up a plot window that shows the plot, we have to ssh-connect in graphics
mode, i.e., we have to enable what's called 'X11 forwarding'.
4. To achieve X11 forwarding, we have to log out and reconnect using the ssh option
-X or -Y. Let's try:
> ssh -Y [your_mso_username]@motley.anu.edu.au
...and then try to plot the sine again with gnuplot. This should now bring up a new window
with the sin(x) plot.
2.5 Setting up an ssh tunnel

1. If you cannot use the VPN client for some reason, there is still another possibility to connect to the
MSO servers. This requires you to first connect to a specific server at MSO that is
accessible from the outside world. This server is called 'msossh1.anu.edu.au'. However,
as of 2021, this is not an option anymore; ANU wants us to use the VPN
:( ... Nevertheless, we will use this as a basic example for how to set up an ssh tunnel. For
instance, some supercomputing centres will only allow access from specific IP addresses,
in which case you might want to use your laptop computer (usually with dynamic IP address
assignment) to tunnel through a computer with a registered IP address onto the supercomputer
that sits behind a firewall.
2. Let’s pretend motley.anu.edu.au might be protected and can only be reached from
msossh1.anu.edu.au. In order to connect to motley.anu.edu.au, we simply connect to
msossh1.anu.edu.au first and then ssh from msossh1.anu.edu.au to motley.anu.edu.au.
However, in order to make our life easier, especially when copying files between remote
computers, we can set up an ssh tunnel (in order to avoid the two-step process of first
connecting/copying to msossh1.anu.edu.au and then to motley.anu.edu.au). Do this
by adding/modifying the following lines in your .ssh/config:
Host motley
Hostname motley.anu.edu.au
User [your_mso_username]
ProxyCommand ssh -q -a -Y [your_mso_username]@msossh1.anu.edu.au nc %h %p
Save and close .ssh/config. (A more modern alternative using the ProxyJump directive is sketched after this list.)
3. Now you should be able to connect to motley.anu.edu.au by using > ssh motley directly
from your local computer. It will probably ask for your password twice; the first time
when it connects to msossh1.anu.edu.au and the second time when it connects from
there to motley.anu.edu.au. So what this effectively does is tunnel us through
msossh1 to motley.
4. As earlier, add a 'motley' alias to your .alias file as a shortcut for ssh -Y motley.
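On newer OpenSSH versions (7.3 and later), the ProxyJump directive can replace the nc-based ProxyCommand above; a minimal sketch of an equivalent .ssh/config entry (same placeholder username as above):

Host motley
Hostname motley.anu.edu.au
User [your_mso_username]
ProxyJump [your_mso_username]@msossh1.anu.edu.au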
2.7 Copying files/data across remote computers

1. Download the example data file from http://www.mso.anu.edu.au/~chfeder/teaching/astr_4004_8004/material/mM4_10048_pdfs/EXTREME_hdf5_plt_cnt_0050_dens.pdf_ln_data
2. Now copy the data file to motley using scp ('secure copy'):
> scp EXTREME_hdf5_plt_cnt_0050_dens.pdf_ln_data motley:
Note that you can use the tab key to auto-complete the rest of the awkward filename. If
you just provide the initial few letters 'EX' and hit the tab key, the shell will automatically
complete the rest of the filename. Note also that when you type scp and then
hit the tab key twice, you will be given a list of possible options for files in this directory.
This is a very useful auto-completion function of the shell. Once you execute the scp
command, you should see the file being transferred to motley. Note that the colon behind
'motley' refers to your home directory on motley. If you want to copy the file somewhere
else, you just have to add the path to the destination folder on motley after the colon
(see the example after this item).
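For example, to copy the file into a subdirectory instead (assuming a directory astro_comp/ already exists in your home directory on motley):
> scp EXTREME_hdf5_plt_cnt_0050_dens.pdf_ln_data motley:astro_comp/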
2.9 Using rsync to create backups and to copy large data sets; use of tarballs

1. Download the tarball (a set of files bundled together in one compressed file) from http://www.mso.anu.edu.au/~chfeder/teaching/astr_4004_8004/material/mM4_10048_pdfs/EXTREME_pdfs.tar.gz
2. Create a new directory on your local computer called 'astro_comp' in your home directory:
> mkdir astro_comp
3. Now move EXTREME_pdfs.tar.gz into this new directory:
> mv EXTREME_pdfs.tar.gz astro_comp/
...and change into astro_comp/
4. Decompress and un-tar the tarball, using:
> tar -zxvf EXTREME_pdfs.tar.gz
This should generate 71 files. Do you remember how to check the total number of files in
astro_comp/?
5. Now let's make a complete backup of the directory astro_comp/ and all the files in it by
sending a copy to motley.anu.edu.au. First change back into your home directory on your
local machine. Then do:
> rsync -av --stats astro_comp/ motley:astro_comp/
This generates a complete 1:1 copy of your local directory astro_comp/ in your home
directory on motley, with the same directory name there. Please have a look at the status
summary output when rsync returns from doing its job.
6. We could have used scp (see earlier) for pushing a copy of all the files to the remote
server. However, rsync can be used to back up data without having to transfer/copy all the
files every time; instead, it only transfers changes (changed files). For instance, if you
now run the same rsync command from above again, you will see that no files are
transferred (because nothing changed in your local copy). If, however, you were to modify
one or more of the files in astro_comp/ and run the rsync again, only the modified
files would be transferred. This is also really useful if you have very many big files to
transfer to/from a computer or between different hosts: if some of the files can't
be transferred within a day or so, and/or the ssh connection closes for some reason, rsync
will continue from the point where it stopped synchronising the folders
and files, instead of starting the copy process from scratch. BTW: an important part of
astronomical computing is to make regular backups of your files :)
2.10 Use of nohup and nice

1. Make a Bash script called nohup_script.sh that prints the current time every second
for the next hour. Use a Bash loop and the date command to print the current date and time. Let the
script pause for 1 second in each loop iteration, by using sleep 1. (One possible solution
is sketched at the end of this subsection.)
2. Start the script to see if it works. It should print the date and time to the screen every
second. Note that you can stop the script by pressing Ctrl-C.
3. Now start the script, but redirect the stdout and stderr to a file shell.out. While the
script is running in one shell, open another shell and look into the file. You should see
that every second a new line with the current date and time is appended to the end of
the file.
4. Start the same script, but this time with nohup. Simply place nohup in front of the
command that starts the script and & after the command. nohup is a wrapper for any
command that you want to start such that it does not hang up (nohup means ’no hang
up’) when you log out. The final & at the end of a command line is used to start that
process in the background, which means that you get the shell back when you hit enter,
but the program keeps running in the background.
5. Use tail -f shell.out to show the end of the output file. It also follows any changes
to the file (the -f option). You should see how the file grows and how every second, the
date and time is appended as a new line to the file.
6. Now log out of the MSO server. Since we have started the script with nohup, it should
not hang up after we logged out, but instead will keep running on the server even though
we are logged out. So, let’s login again and see if the script kept on writing the time to
the file while we were away.
7. Caution: if you start something with nohup, it will keep running unless it's killed by the
user or an admin. We made the script so it would stop after one hour, but let's simply kill
the script by hand right now. To achieve this, we use the ps command, which shows all
running processes on the host. Let's first add a customised version of ps to our .alias
file to make it an easy task to show all our own processes (there is a whole bunch of other
processes also running simultaneously on the server, but we don't want to list
all of them; just our own). Add the following alias to your aliases:
alias myps='ps -elf | grep $USER'
and log out and log back in for this change to take effect, or simply source .alias in
your home directory. Do you remember what the |-symbol (pipe) does?
8. Now do > myps in the shell. This should list your current active processes (see the pipe
to grep $USER). Find the process ID that belongs to the running script that we started
with nohup and kill the job using:
> kill -9 [PID]
where you have to replace '[PID]' with the process ID of the job to be killed. Alternatively,
you can use
> killall 'bash'
which kills jobs based on their name. Careful: this will kill all your running bash processes
(possibly including your login shell).
9. Note: you'd normally want to use nohup for jobs/scripts that take very long to run, for
example overnight or even longer, when you don't want to keep an active shell open to the
remote host while the script is running. This is when nohup is really useful. However, we
should consider that other users might be running jobs, or that a user could be physically
sitting at that computer/host at the same time and trying to do some work. A nice
thing to do in this case is to use nice, which means that your job will only use CPU time (or
compute power) when it is available. So in case the machine is under heavy load,
running many processes at the same time, if you started your script with nohup nice
./script.sh &, then your job won't block the jobs of other users so much, because it
waits for times when the machine is not under heavy workload.
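To tie the above together, here is one possible sketch of nohup_script.sh (many variants work equally well):

#!/bin/bash
# nohup_script.sh: print the current date and time every second for one hour
for i in $(seq 1 3600); do
    date     # print current date and time
    sleep 1  # pause for 1 second
done

...and one way to run it with output redirection, nohup, nice, and backgrounding (items 3, 4, and 9):

> chmod +x nohup_script.sh
> nohup nice ./nohup_script.sh > shell.out 2>&1 &
> tail -f shell.out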
3 Plotting and fitting with gnuplot, and making movies

3.1 Simple plots of data from ASCII text files

Gnuplot is a powerful plotting tool that can create plots from analytic functions and from
tabulated data (e.g., from ASCII text files). Gnuplot is very useful for exploring data, plotting
them in different ways, etc., even if the data are on a remote server. It can also be used to
produce publication-quality figures; however, that requires a lot of tweaking and scripting,
some of which we'll do here. Ultimately, IDL or python produce the 'nicest' plots with somewhat
less effort, but making plots look 'perfect' always takes some fiddling with plot command
options, etc., whichever programming language or package you are using.
1. Let's make some simple plots first. Change into your $HOME/astro_comp/ on the MSO
server (e.g., motley). Now look into EXTREME_hdf5_plt_cnt_0050_dens.pdf_ln_data
with more or less or cat (the file should still be there from the previous lecture, or you can
download it again). You will see that there is a 10-line header in the file (showing
the mean, standard deviation, and other statistical moments of the distribution function),
followed by an empty line and then 4 columns of data.
2. Now go back to the shell and start gnuplot. First, plot column 1 against column 3 in the
data file. Use this command:
gnuplot> plot "EXTREME_hdf5_plt_cnt_0050_dens.pdf_ln_data" using 1:3
Note that you can also use auto-completion of the file name in gnuplot by hitting the tab
key, just the same as in the shell. Ok, so this should show the density PDF in that file,
which should look pretty close to a Gaussian.
3. Now replace "plot" with "p" and "using" with "u". This should do exactly the same as
before, but is a bit more compact; i.e., there is no need to write out each gnuplot command
in full: in most cases you can simply use the first letter and it will do the job.
4. Now plot the y-axis logarithmically. Do this with
gnuplot> set log y
and then enter the plot command from before again. Note that you can bring up the
most recent commands again simply by using the up-arrow key; this should bring up
the last few commands used. The same works in the Bash shell, which is very useful in
case you made a mistake when typing a command: you can bring up the previous one
instantly and correct only the bits that didn't work or that you want to
change, for example if you wanted to keep everything the same, but instead plot data
from one of the other files in the directory.
5. Now plot, on top of the previous plot, the respective data columns from the file with number
*0060*. Do this simply by adding to the end of the previous gnuplot command:
, "EXTREME_hdf5_plt_cnt_0060_dens.pdf_ln_data" using 1:3
This should now display two curves (or sets of points) plotted on top of one another. You
can add more lines/curves simply by appending more to the plot command. For example,
if we wanted to also plot a constant line at y = 10^-3, we'd simply add ', 1e-3' to
the end of the previous gnuplot command line, and we should see the horizontal line at
y = 10^-3.
6. You can change the x and y axis labels with:
gnuplot> set xlabel "log-density contrast"
gnuplot> set ylabel "PDF"
...then plot again and see if the axis labels have changed.
7. Ok, so this is all fine, but it usually doesn’t look very nice. Gnuplot is really good for mak-
ing a quick plot of some data, but making it pretty needs a bit more tweaking. Now let’s
see what can be done to make nicer figures, generate gnuplot scripts and automatically
write postscript figures to a file.
3.4 3D plots with gnuplot
1. Let’s make a simple 3D plot of the previous figure in gnuplot. Use:
splot "EXTREME_hdf5_plt_cnt_0050_dens.pdf_ln_data" u 1:2:3
This shows the same density PDF as before, but now as a 3D plot. Note that you can
turn the plot around by dragging it with your mouse.
2. Note that the 2nd column in the data file does not contain any useful data (it's all zeros),
so it carries no information along the 2nd axis of the plot. Let's replace the 2nd
column with the 4th column of the data file (which contains the cumulative distribution
function):
splot "EXTREME_hdf5_plt_cnt_0050_dens.pdf_ln_data" u 1:4:3
This will now show a true 3D plot; i.e., the data in the 4th column of the file is now
plotted along the 2nd axis of the 3D plot.
3. One can do more advanced things, for example changing the point types and adding a
colour bar, e.g., by appending to the end of the previous line:
splot ... u 1:4:3 with points palette pointsize 1 pointtype 6
7. You can play the movie file with ffplay. However, it won't play well over the network,
so I recommend copying it to your local computer (scp or rsync; see earlier) and playing it
directly on your computer rather than in a window over the network.
8. So this is ok, but it looks a bit pixelated. For a nicer movie, we have to increase the bit
rate by adding the option -b:v 5000k to ffmpeg. This will greatly increase the bit rate
and thus the quality of the movie output (see the example call after this list).
9. There are lots more advanced options in ffmpeg, for example for cropping or extracting
frames from movies and changing the encoder. It is one of the most powerful movie
conversion/processing tools.
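As an illustration of item 8 (the input frame pattern frames_%04d.png and the output name movie.mp4 are hypothetical; adjust them to the frames created earlier):
> ffmpeg -i frames_%04d.png -b:v 5000k movie.mp4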
5.1 IDL startup
1. Login to motley and make a new directory IDL/ in your home dir:
> mkdir IDL
2. Download the IDL startup package prepared for you: http://www.mso.anu.edu.au/
~chfeder/teaching/astr_4004_8004/material/IDL_startup_package.tar.gz
and copy it to motley into the new IDL/ directory.
3. Unpack the tarball. This will create subdirectories ASTROLIB/, MPFIT/, and textoidl/,
as well as three files: idlstartup, setcolors.pro, constants.pro.
4. ASTROLIB is a useful astronomy IDL library, MPFIT is an IDL non-linear fitting package,
and textoidl is a Latex-to-IDL string conversion library that lets you use Latex syntax
in IDL to make Greek letters, sub- and super-scripts and special symbols like you are
used to in Latex.
5. The idlstartup file is useful, because it controls the way IDL starts up (similar to how
.bashrc is run every time you start a new Bash session, idlstartup is run every time you
start IDL). In our case, it defines paths and automatically runs the script constants.pro,
which defines useful physical constants (the use of which we will see below).
6. In order for IDL to know where to look for idlstartup, add this line to your .bashrc:
export IDL_STARTUP=${HOME}/IDL/idlstartup
This will make sure that idlstartup is executed every time you start IDL.
6 Python
Python is now one of the most popular interpreted languages, and the wealth of packages
available in Python makes it very well suited for many programming tasks, including scientific
computing.
6.1 Python startup

5. For latex support in matplotlib, we also need texlive, which you will have to install on
your computer; or, if you are on an MSO server, you can add to your .bashrc:
# Environment Variables for texlive
MYTEXDIR=/usr/local/texlive/2022
if [ -d $MYTEXDIR ]
then
MANPATH=$MYTEXDIR/texmf-dist/doc/man:$MANPATH
INFOPATH=$MYTEXDIR/texmf-dist/doc/info:$INFOPATH
PATH=$MYTEXDIR/bin/x86_64-linux:$PATH
export MANPATH INFOPATH PATH
fi
6. You might also have to add the following line, which is where python3 is installed in the
MSO system: export PATH=$PATH:/pkg/linux/anaconda3/bin
7. There are currently a few different versions of python installed on MSO servers. The most
recent one is python3.8, so you will need to call your scripts explicitly with this one or
define an alias.
8. Finally, I wrote a package to do some common tasks in python, which is called cfpack.
You can install this package via pip install --user cfpack from PyPI (https://
pypi.org/project/cfpack).
6.2 Simple tasks in python
1. Try and do the same tasks as listed in §5.2 for IDL, but now in python. Make use of
what’s defined in the cfpack.constants. You can add to this file later, if you regularly
require certain constants or conversions, so you can make calculations very quick and
easy.
2. Clarify anything you do not understand about the python startup and basic structure
(for example, defs, scope, decorators, etc...).
3. This is a good opportunity to create a git version-controlled copy of your python directory.
Please go to §7 for more info.
6.5 Re-binning and beam convolution
1. Let’s add some beam smearing in order to simulate the effect of looking at these data
through a telescope with a finite beam resolution.
2. First we can use the re-binning function provided in cfpack. The functions rebin and
congrid (originally defined in IDL) can be used to make re-gridded versions of 2D arrays,
for example, to produce lower-resolution versions of the previous images. The congrid
function works for arbitrary grid interpolations. See if you can understand how the rebin
function works.
3. However, neither rebin nor congrid simulates Gaussian beam smoothing; they just re-grid
a 2D array. Let's implement a Gaussian beam convolution based on the python
function scipy.ndimage.gaussian_filter. Your own function, which ultimately calls
this scipy function for Gaussian smoothing, should take the un-convolved map and
the Gaussian beam full-width-half-maximum (FWHM) as inputs, and return
the beam-smeared image (a minimal sketch is given at the end of this subsection). Try
different FWHM values. What is the unit of the FWHM?
4. Show the beam-smeared image(s) as before and compare the non-smoothed with the
beam-smoothed images.
5. Add mytools.py to your python git repository, if you haven’t done that already.
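A minimal sketch of such a beam-smoothing function (my own illustration, not the cfpack implementation; it assumes the input map is a 2D numpy array and the FWHM is given in pixels):

import numpy as np
from scipy.ndimage import gaussian_filter

def beam_smooth(image, fwhm):
    # Convert the beam FWHM (in pixels) to the Gaussian standard deviation,
    # using FWHM = 2*sqrt(2*ln(2)) * sigma.
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    # Convolve the input map with the Gaussian kernel and return the result.
    return gaussian_filter(image, sigma=sigma)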
6.6 Fourier analysis

3. Now compute the discrete Fourier transform of the data, f̂(k), where k = 2π/x is the
wave number (you can use numpy.fft).
4. Plot the power in f̂. What does this power spectrum tell us? What is the data structure?
Have a look at numpy.fft.fftfreq for the wavenumber grid.
5. Now filter out the high-frequency component with a top-hat filter, compute the inverse Fourier
transform of the Fourier-filtered data, and plot it (a sketch of this workflow is given at the
end of this subsection).
6. Create a similar test in 2D, using the corresponding 2D version of the function,
again with 200 bins in x and y. You can use the cfpack.get_2d_coords function to get
2D grids for the x and y coordinates.
7. Display the 2D maps (I suggest making use of the earlier function we made to show
maps). Apply different filters to the data and investigate the inverse Fourier transform
of the filtered data.
8. Apply this to the column-density maps that we have analysed earlier, e.g.,
EXTREME_proj_xy_000300. See what can happen to the data when you apply different
cutoff wave numbers k = [(2π/x)² + (2π/y)²]^(1/2) for the top-hat filter. Does this remind
you of something?
9. In general: what happens to the mean (average) of the data when you apply a Fourier
filter?
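A minimal sketch of the 1D Fourier-filtering workflow in items 3-5 (the two-mode sine test signal and the cutoff of 20 are my own illustration choices):

import numpy as np

# Assumed 1D test signal: two sine modes sampled on 200 bins
x = np.linspace(0.0, 1.0, 200, endpoint=False)
f = np.sin(2*np.pi*5*x) + 0.5*np.sin(2*np.pi*40*x)

# Discrete Fourier transform, frequency (wave number) grid, and power
fhat = np.fft.fft(f)
freq = np.fft.fftfreq(x.size, d=x[1]-x[0])
power = np.abs(fhat)**2

# Top-hat filter: keep only |freq| < 20 (removes the 40-cycle mode),
# then transform back to real space
fhat_filtered = np.where(np.abs(freq) < 20.0, fhat, 0.0)
f_filtered = np.real(np.fft.ifft(fhat_filtered))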
6.8 Averaging data, making plots with error bars, and fitting
1. Average the three column density PDFs over time to produce a time-averaged PDF
with error bars.
2. Produce a plot of the PDF including error bars.
3. Fit the time-averaged PDF with a Gaussian function (for instance with
scipy.optimize.curve_fit, or better by using lmfit). Over-plot the fit (see the sketch
after this list).
4. Note that curve_fit and lmfit currently only support errors (uncertainties) in the data
direction (y axis), but not in the direction of the variable (x axis). How could you create
a fit method that takes errors in both axes into account?
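A minimal sketch of the averaging, error-bar plotting, and fitting steps (the three input PDFs here are synthetic stand-ins generated in the script, not the actual column-density PDFs):

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def gauss(x, amp, mu, sigma):
    # Gaussian model function for the fit
    return amp * np.exp(-(x - mu)**2 / (2.0 * sigma**2))

# Synthetic stand-in for 3 PDFs measured at different times, on common bins x
x = np.linspace(-5.0, 5.0, 50)
rng = np.random.default_rng(42)
pdfs = np.array([gauss(x, 1.0, 0.0, 1.0) + 0.02*rng.normal(size=x.size) for _ in range(3)])

# Time average and error bars (standard deviation over the 3 times)
pdf_mean = pdfs.mean(axis=0)
pdf_err = pdfs.std(axis=0)

# Weighted least-squares fit and over-plot
popt, pcov = curve_fit(gauss, x, pdf_mean, sigma=pdf_err, p0=[1.0, 0.0, 1.0])
plt.errorbar(x, pdf_mean, yerr=pdf_err, fmt='o', label='time-averaged PDF')
plt.plot(x, gauss(x, *popt), label='Gaussian fit')
plt.legend()
plt.show()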
6.9 Monte Carlo error propagation
Uncertainty analysis of measurements and derived quantities in astrophysics is extremely impor-
tant, and published measurement results should always have a proper error analysis associated
with them. Here we learn how to do Monte Carlo (MC) error propagation, which goes beyond
standard analytic error propagation and can handle non-Gaussian distributions for propagating
uncertainties.
1. Consider two measured quantities a and b, each with a mean and a Gaussian uncertainty
(standard deviation), and the derived quantity

c = a² / b² .    (5)
2. First, calculate the uncertainty of c using standard analytic methods of error propagation.
3. Now write a program that does the MC error propagation. First make Gaussian random
numbers (e.g., with the function random.gauss from the random module) and define a and b
based on these Gaussian random distributions. Then define c based on a and b (see the
sketch at the end of this subsection).
4. Now plot the PDFs of a, b, and c. What is special about the PDF of c? Try also a log-y
axis version.
5. Compute the mean and standard deviation of c based on the PDF of c and compare to the
analytic estimate. Make sure to implement bin-centered binning for the numerical inte-
gration of the PDF, in order to recover the mean and standard deviation more accurately
than from the staggered bins.
6. Get the mode (most probable value) of c and the 16th and 84th percentile values.
7. Explore what happens when you replace a with a = 1.0 ± 0.5. Think about the interpre-
tation of writing ± (standard deviation) when quoting errors/uncertainties, relative to
the mathematically correct range for values of c.
8. Try changing the sample size of the Gaussians and the number of bins used for the PDFs,
in order to study numerical convergence of the results.
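A minimal sketch of the MC error propagation (the values a = 1.0 ± 0.1 and b = 2.0 ± 0.2 are assumed for illustration):

import numpy as np

rng = np.random.default_rng(1)
n = 10**6  # MC sample size

# Draw Gaussian random samples for a and b (assumed means and uncertainties)
a = rng.normal(1.0, 0.1, n)
b = rng.normal(2.0, 0.2, n)

# Propagate to the derived quantity c = a^2 / b^2
c = a**2 / b**2

# Mean, standard deviation, and 16th/84th percentiles of c
print("mean(c) =", c.mean(), ", std(c) =", c.std())
p16, p84 = np.percentile(c, [16, 84])
print("16th/84th percentiles:", p16, p84)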
6.10 Gaussian line fitting

Observed spectral lines can be used to measure the line-of-sight (LOS) velocity of the gas, as well
as its dispersion (e.g., due to turbulence). The lines are further broadened due to thermal
broadening.
Here we want to learn how to fit such spectral line profiles, to obtain the position (velocity)
and width (dispersion) of the spectral line. Moreover, since there can be multiple line compo-
nents (from a mix of different gas dynamics occurring along the LOS), we want to be able to
fit multiple such lines. Most commonly, the individual line profiles are described by a Gaussian
function with mean and standard deviation, as well as a normalisation constant,
I(v) = I0 / (2πσv²)^(1/2) · exp[−(v − v0)² / (2σv²)] ,    (6)
where I(v) is the intensity of the line as a function of the velocity v (spectral position or velocity
channel), v0 is the mean (due to Doppler shifting), σv is the velocity dispersion (due to thermal
or turbulent broadening or both), and I0 is the normalisation constant of the Gaussian line
profile.
1. First, let's create a grid of velocities and define a Gaussian profile I(v) with v0 = 0, σv = 1,
and I0 = 1. Try different numbers of bins for the velocity grid; i.e., make the number of
bins a parameter of your script/function. Plot the profile.
2. Now fit this profile with the same Gaussian function, i.e., Eq. (6), and compare the input
parameters with the resulting fit parameters (a sketch of items 1-3 is given at the end of
this subsection). The scipy.optimize.curve_fit function
provides a simple interface for doing least-squares fitting, but I suggest using the lmfit
package, which provides the ability to build composite models (combinations of functions,
such as the sum of Gaussians), and some added functionality and outputs, such as the
reduced χ², etc.
3. Add Gaussian noise to the line (in a real observation, noise can come from various sources
such as the telescope itself, or the atmosphere through which a line is observed, or the
electronics). Re-plot, and re-fit the noisy line. Play with the amount of noise and see if
the fit still works.
4. Add multiple line profiles together, where each line profile is shifted by some amount
(different v0) and may also have a different σv and/or intensity I0. Re-plot this and play
with the parameters of the multiple components (I suggest starting with 2 components,
then adding a 3rd, etc.). Depending on the number of Gaussians, try to fit the combined line
profile and see if you can create a fitting code that automatically detects the number of
Gaussian components that were used to create the combined line profile, and that performs
a combined fit from which the individual components (their v0's, σv's, and I0's) can be
recovered. Try the same with noise added to the spectrum.
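A minimal sketch of items 1-3, i.e., creating, noising, and re-fitting a single Gaussian line (the velocity range and the noise amplitude of 0.02 are illustration choices):

import numpy as np
from scipy.optimize import curve_fit

def gauss_line(v, I0, v0, sigma_v):
    # Gaussian line profile, Eq. (6)
    return I0 / np.sqrt(2.0*np.pi*sigma_v**2) * np.exp(-(v - v0)**2 / (2.0*sigma_v**2))

nbins = 200  # number of velocity bins (a parameter; try different values)
v = np.linspace(-10.0, 10.0, nbins)
I_line = gauss_line(v, 1.0, 0.0, 1.0)  # input: I0 = 1, v0 = 0, sigma_v = 1

# Add Gaussian noise and re-fit the noisy line
rng = np.random.default_rng(0)
I_noisy = I_line + rng.normal(0.0, 0.02, v.size)
popt, pcov = curve_fit(gauss_line, v, I_noisy, p0=[1.0, 0.0, 1.0])
perr = np.sqrt(np.diag(pcov))
print("fitted [I0, v0, sigma_v]:", popt, "+/-", perr)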
7 Version control
7.1 Basics of version control
1. Imagine you work on a code development project or you write a paper and you’d like to
keep track of changes and earlier versions of your code/paper. A neat way to achieve this
is to use version control software/tools.
2. Popular version control frontends (partially free, or commercial if you want repositories
to be private or shared by a large number of developers) are provided by services such as
https://bitbucket.org or https://github.com. These are primarily webpages that
allow you to share your code with others, browse the source code, and keep track of
changes. For bigger projects, you can also establish teams that work on different
pieces/modules of the same code.
3. To get started, you need to create an account with https://bitbucket.org. I suggest
using your ANU email address when signing up, which gives you an unlimited number of
users on private repositories (otherwise, you'd have to pay for that).
4. A key element of these version control systems is that they keep their own files inside the
directory(ies) of your code, in order to store and update changes – basically to keep an
entire history of what’s been going on with the code; who made changes, what changes,
and when. They also allow you to revert to previous versions in case some bugs slipped
into the code or something broke at some stage.
5. In order to start a versioned code, you will need to install a version control system or
version control software. Some of the first bigger ones were Concurrent Versions System
(cvs) and Subversion (svn). Nowadays Mercurial (hg) and Git (git) are popular version
control systems. Here we will focus on git with some examples.
7.3 Uploading/Communicating repository to server
2. To do this, we can create an account on https://bitbucket.org and start a new repository,
or import the existing repository from the previous steps. Note that if you sign up
with your ANU email address, your bitbucket account will automatically be an academic
account, which means that you can add an unlimited number of users to private repositories
and won't have to pay for it – otherwise it costs something like USD 2 per user
when exceeding 5 users :-( ...so better take advantage of being part of the Uni!
3. Once configured correctly, i.e., providing the correct URL and paths, such that your
computer knows that it should upload changes in the local repository to the bitbucket or
github server (see the example after this list), you can simply type
> git push
to push all the changes in the local copy to the server. This will then allow you to browse
the code online, to share it, and to view changes to the version-controlled code in an
internet browser (basically, it's just a nicer view of what you can get with > git status
and > git diff on your local working copy). You can also upload your public key (see
§2.6) in your bitbucket account, so you don't have to type your bitbucket password to do
commits, pulls, pushes, etc.
4. Having the code on the server will also allow you to share it more easily with others.
There are different options to do this, for example, cloning a repository, or forking or
branching a repository. We won’t go into details of that, but essentially, this allows one
to get a copy of the repository and continue working on it within their own local copy,
such that the global source repository is not damaged. However, changes by another user
that are deemed useful for the global copy of the repository can be merged in by issuing
a so-called pull-request.
5. In summary, when you develop code, it is a good idea to use a version control system.
What seems awkward at first is actually extremely useful once you get into trouble with
keeping track of changes based on your own strategies (e.g., keeping many earlier copies
of the same file/code). Version control software provides a standardised way of keeping
different versions of a code or simply a bunch of files that undergo regular changes.
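As an example of the configuration step in item 3 (the username and repository name are placeholders), linking a local repository to a bitbucket remote and pushing it for the first time could look like:

> git remote add origin git@bitbucket.org:[your_username]/[your_repo].git
> git push -u origin master

...where the branch name (master or main) depends on how the repository was initialised.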
7.4 Overleaf

Overleaf is a useful collaborative latex writing environment. Since 2023, ANU has an enterprise
agreement with Overleaf, which means that you can use your ANU account (UID, SSO) to
access the advanced features of Overleaf at no cost (such as track changes). More info here:
https://www.overleaf.com/edu/anu.
Overleaf supports git. In case you want to write offline and with your own latex environment,
you can do so and then upload/sync to Overleaf. You can also edit on Overleaf and
then git pull to your local working copy. This is useful for backup and offline work on latex
documents managed by Overleaf.
8.2 Automation of operating system functions
One can more generally automate any mouse or keyboard interaction in Mac OS or Windows
(not sure about Linux, but probably yes) with the pyautogui module.
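A minimal sketch of what this can look like (the coordinates and text are arbitrary illustration values):

import pyautogui

pyautogui.moveTo(100, 100)  # move the mouse pointer to pixel (100, 100)
pyautogui.click()           # click at the current position
pyautogui.write('hello')    # type text via simulated key presses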