Weather Programming in Python
Climate Science
A Programmer’s Guide
Sebastian Engelstaedter
This book is for sale at
http://leanpub.com/data-analysis-and-visualisation-in-climate-sciences
This is a Leanpub book. Leanpub empowers authors and publishers with the Lean
Publishing process. Lean Publishing is the act of publishing an in-progress ebook
using lightweight tools and many iterations to get reader feedback, pivot until you
have the right book and build traction once you do.
Preface
Acknowledgements
1. Introduction
1.1 Overview and Objective
1.2 Concept of Local and Remote Machines
1.3 Software
2. Climate Data
2.1 Climate Data Overview
2.2 Data Use Licences
2.3 Data Quality
2.4 Accessing Climate Data
2.5 Types of Climate Data
2.5.1 Analyses and Reanalyses Products
2.5.2 Climate and NWP Model Output
2.5.3 Point Observations
2.6 Data File Formats
2.6.1 Plain Text and ASCII
2.6.2 Binary
2.6.3 GRIB
2.6.4 netCDF
2.6.5 PP
3. Unix
3.1 Introduction to Unix
3.1.1 Linux Distributions
3.1.2 Desktop versus Server
Appendix
References
List of Acronyms
Preface
I first ventured into the world of environmental data analysis in the early 2000s, at the end of my undergraduate degree, when I was working as a part-time research assistant at the Max Planck Institute for Biogeochemistry (MPI-BGC) in my then home town of Jena in Germany. I was fortunate enough to have attended an IDL programming course as part of my geophysics studies programme. While this gave me a head start, I still had to overcome some unforeseen challenges.
The data I was supposed to work with were saved on one of the MPI-BGC Unix servers, on which the number crunching was also supposed to be done, meaning I had to deal with the technical challenges of setting up and configuring the software needed to log on to a remote server. Once I managed to log on I was greeted by the next challenge: the Unix command line. Coming from a Microsoft Windows desktop background I was used to clicking my way through icons, menus and buttons to get things done. There were no buttons on the Unix command line; I had to learn Unix commands to move around within the directory tree, access files and start programs. While I first found this way of working quite time consuming and tedious, I later learned to appreciate the many advantages of the command line. In addition, I had to learn how to automate tasks using shell scripts (yet another programming language), how to use command line tools such as Climate Data Operators (CDO), and how to manipulate and analyse climate data saved in file formats unknown to me, which required learning a few more software tools.
Overcoming those obstacles was frustratingly time consuming. Often I felt lost and most certainly came close to giving up on many occasions. The only books available at the time were written on specialist subjects and were difficult to follow at a beginner level, as they went far beyond what I needed to know, often using terminology I was not familiar with. The only way to learn the skills I needed was either to find an unfortunate colleague who was willing to spend some of their own precious research time explaining to me how to do things, or to spend endless frustrating hours searching internet forums for solutions.
Some progress has been made in recent years. Many software packages have improved in terms of robustness and functionality, and they tend to be better documented these days. There has also been a significant shift in code-based data analysis software away from commercial products such as IDL or Matlab towards open source products such as Python or R. In addition, interest in the field of climate and environmental sciences has increased significantly in recent years as a result of climate change having become a permanent point on the political agenda, being passionately debated on social media and being featured in high-profile documentaries (e.g., An Inconvenient Truth) and Hollywood blockbusters (The Day After Tomorrow). From personal experience I can also say that the number of students who want to learn climate computing skills has been slowly but steadily increasing over the last few years.
The motivation for writing this book was to make it easier for the next generation of students and young researchers to get started with the analysis and visualisation of climate and environmental data. They often work on time-limited projects or short-term contracts which do not allow them to spend much time on learning new skills. The book will hopefully allow them to focus on the science questions they are trying to answer instead of spending weeks and months learning how to code and how to use the tools needed. The book will guide the student through all steps, from setting up a work environment and understanding the data they are working with, to doing the number crunching and creating graphical output in a high-quality, publishable format.
Acknowledgements
This book would not have seen the light of day without the help and support of many people and institutions. First, I would like to thank all the scientists, colleagues and friends from whom I learned so much over the years. There are too many to name but I would like to specifically say thanks to Ina Tegen, Arnaud Desitter, Gil Lizcano and Mark New who all contributed significantly to improvements in my coding abilities. I would also like to thank my long-term supporter, mentor, colleague and friend Richard Washington without whom I would not be where I am today. And last, I would like to thank everyone who helped get this book out by proof-reading chapters and assisting with the graphical design, including Thomas Caton Harrison and Esther-Miriam Wagner.
In addition, I would like to thank the staff of the IT Office of the School of Geography
and the Environment, University of Oxford, for sharing their knowledge and for
building and maintaining a high-performance computational research cluster as well
as Jesus College, Oxford, for their financial support. I would also like to thank The
Woolf Institute in Cambridge for their hospitality during challenging times.
Also, a special thanks to Emma Heyn¹ and Marcus Wachsmuth for their contributions to the cover art and design.
¹https://www.emmaheyn.com
1. Introduction
The field of climate and environmental sciences has been constantly growing in importance over the last few decades, mainly due to an increased need to understand how the climate system works and how it may change in the future as a result of man-made climate change. Similar to a craftsman who, as part of his apprenticeship, needs to learn what tools are available to work wood and hone his skills in using them, young researchers need to learn what software is available, what the software can do and how to use it.
Climate and environmental data come in a variety of formats depending on the
nature of the data and the preference of the scientists or organisations who compiled
or generated them. The analysis and subsequent visualisation of such data requires
a good understanding of the file formats and data structures as well as the tools that
can be used to manipulate these data, to calculate statistics and to visualise the
output.
The material presented in this book is based on more than 15 years of experience
working in the field of climate sciences and teaching climate data analysis and
visualisation courses to students at all levels at Oxford University. The aim of
this book is to introduce students to the technical background, set of tools and
programming skills required to successfully analyse climate datasets and produce
scientific output in publishable format.
Where large data storage and processing power are not essential (e.g., for exploratory research), all of the
software packages discussed in this book can also be installed on local computers or
laptops as long as they are running a Linux operating system. All software packages
introduced in this book are freely available for research purposes.
This book provides an introduction to different types of climate data and the main
data formats in which they are being made available with a specific focus on the
most commonly used formats such as comma-separated values (CSV) and Network
Common Data Form (netCDF) files. The nature of gridded data will be discussed as
well as ways to explore the content of netCDF files. Students will learn how to work
on the Unix command line, how to use Climate Data Operators (CDO) to manipulate
climate data saved in netCDF file format, how to calculate climate statistics and
how to visualise the output using the Python programming language. Many code
examples will help in the learning process. Additional tools and techniques will be
discussed which will help with the data analysis and visualisation tasks including
how to deal with long-running processing jobs and which graphical output formats
should be used (bitmap vs. vector graphics).
For many of the subjects and software packages covered in this book (e.g., Unix,
CDO, Python) detailed in-depth user guides, tutorials and books exist. The focus of this book is to integrate the different tools that have been shown to work well in climate computing into a single framework, creating a seamless workflow from understanding the data, through computational analysis, to the efficient publication of high-quality graphical output.
While the collection of tools and techniques presented here has been shown to work well for most climate computing tasks, it should be noted that there are many roads from A to B, and scientists around the world have no doubt created data analysis environments and programming solutions that differ from what is presented here. Where appropriate, references to additional tools or solutions are given.
1.2 Concept of Local and Remote Machines
It is assumed that most users will work from a desktop computer or laptop. These computers are normally owned by the user or provided by the workplace and are generally referred to as local computers or local machines.
It is also assumed that the climate data analysis and visualisation tasks will be
performed on a server or server cluster running Unix/Linux. A server can be thought
of as a more powerful computer with extended disk arrays attached for storage. A
server may also be referred to as a remote server or remote machine because they
tend to be located physically in a different place from your local machine such as
in a different building or in a research centre somewhere else in the world. When
multiple servers are combined to create a more powerful setup then this is referred to as a server cluster or a computational research cluster.
In general, the local machine is used to connect to the remote server. This means that
it is possible to work from anywhere in the world as long as a reasonably fast and
stable internet connection is available.
1.3 Software
The software required on the local machine to connect to the remote server differs between operating systems. The software will be introduced
in Section 3.2 and will be discussed separately for each operating system. Every
aspect of climate computing discussed in this book can be achieved using open source
software.
The administrator rights for the installation of software on local machines will very likely lie either with the user or with the IT office of the institution that provides the computer (e.g., department or research centre). In the latter case, it may be necessary to ask the IT administrator to install the required software.
With regards to software on the remote server, users will have no or only very
limited control over the software installed and have to rely on the remote server
system administrator. However, it is very likely that most, if not all, software is
already installed on the remote server if that server is frequently used for climate
data analysis.
In exceptional circumstances, where a remote server is not available or accessible
and the climate data to be analysed are small enough then a local machine (PC or
laptop) may be used for data analysis. While Python can be installed on any operating
system (Windows, OS X or Linux) some of the other software such as CDO, ncdump
or ncview works best on a Linux system.
2. Climate Data
2.1 Climate Data Overview
The term climate data as used throughout this book may refer to any observational
or numerically simulated dataset within the field of climate and environmental
sciences. Climate data may come from any realm of the Earth system including
the atmosphere, oceans, land ecosystems and cryosphere. In general, climate data
show how a variable changes in space or time or both. Space here may refer to
the spatial (horizontal) domain, the vertical domain (e.g., atmospheric or oceanic
vertical profiles) or the 3-dimensional (3D) space encompassing both the horizontal
and vertical domain.
Weather stations or fixed scientific instruments tend to provide measurements from
a single point location. Their general purpose is to see how a variable changes over
time at that specific location. In contrast, climate model output and many satellite-
derived data products are available on a time-varying spatial grid, allowing analysis
of both spatial and temporal variability. Some observations may also include both
space and time-varying dimensions, one example being aircraft measurements.
There are many reasons why one might prefer a gridded spatially consistent
dataset over point observations. One such reason may be the need to fill in gaps
between observational sites to better study spatial patterns. For instance, gridded data
products generated by the Climate Research Unit (CRU¹) at the University of East
Anglia are derived by filling in the gaps between point observations using statistical
methods.
Another reason for the need of gridded data products is that weather prediction
models require observations on a regular grid as input. The generation of such
gridded observational datasets is done routinely at some of the large data centres
and meteorological agencies in the world (e.g., ECMWF, NCEP, NASA) which use
data assimilation techniques, statistical methods and modelling as part of the gridded
data generation.
¹www.cru.uea.ac.uk
Never blindly trust data regardless of where they were sourced from. Errors
creep in quite easily for a multitude of reasons. Check for errors by visually
inspecting raw data and by creating simple test plots. The brain is quite
good at processing visual information and spotting things that are not quite
right. Use common sense.
However, more often than not, data files are too large to be sent by email which
means other methods for retrieving the data need to be provided. Nowadays, most
national and international data centres have web portals for browsing data products,
many of which come with the functionality to do temporal and spatial sub-setting
in order to give users an easier way to handle large datasets.
Some institutions make data available on a FTP (file transfer protocol) server. FTP
servers may be accessible through a web browser allowing manual file downloads or
via other computational resources allowing data to be downloaded programmatically
(e.g., Bash or Python scripts).
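As a brief illustration, the Python sketch below downloads a single file from an FTP server using the ftplib module from the standard library. The server name, directory and file name are hypothetical placeholders, not a real data centre.

from ftplib import FTP

# Hypothetical server and path, for illustration only
ftp = FTP("ftp.example-datacentre.org")
ftp.login()                      # anonymous login
ftp.cwd("/pub/climate/monthly")  # change into the data directory
with open("tas_monthly.nc", "wb") as fout:
    ftp.retrbinary("RETR tas_monthly.nc", fout.write)
ftp.quit()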
In some cases, especially when a large download would be impractical, data may
be sent physically on storage devices such as USB (universal serial bus) flash drives,
external hard drives, CD-ROM (compact disc read-only memory) or DVD (digital
versatile disc).
With rapid advances in the technology used to accommodate large datasets such as
these, it is likely that new data acquisition methods will be developed in future.
Some data centres rerun their data assimilation systems to generate atmospheric fields over extended periods (usually decades) using a ‘frozen’ (fixed) version of
the assimilation and model code. These products are called reanalyses. Reanalysis
products also contain model-generated fields that are not based on observations.
Because reanalyses are internally consistent over time they are often used to study
climate processes and variability from the recent past. Some of the latest reanalysis
products and their properties are listed in Table 2.5.1.1.
Table 2.5.1.1: Some of the reanalyses commonly used in climate science and modelling.
In plain text files, data values are typically arranged in columns separated by a specific delimiter. If the delimiter is a comma then the file is referred to as a comma-
separated values (CSV) file. ASCII files often have a header at the beginning of the
file that provides information about where the data files were generated, what the
data units are and how the data that follow are organised (e.g., column headers).
File extensions vary but include .txt for general text files, .asc for ASCII files and
.csv for comma-separated values files. In some cases no file extension is included
in the filename.
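A minimal Python sketch for reading such a file with the widely used pandas package is shown below. The file name and the number of header lines are hypothetical assumptions for illustration.

import pandas as pd

# Hypothetical station file: five lines of free-text header information,
# followed by a line of column headers and comma-separated data values
df = pd.read_csv("station_data.csv", skiprows=5)
print(df.head())  # inspect the first few rows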
2.6.2 Binary
Binary files often have the file extension .bin or .dat or do not have a file extension at
all. A binary file can contain any type of data encoded in binary form. When a binary
file is opened with a text editor a typically unintelligible mess of characters and
symbols is displayed. Binary files can be viewed using a hex editor. Many software
packages have libraries or packages that provide functionality to read binary files.
Pure binary files have become less common in climate sciences over the last two
decades.
The Met Office provides some of its model data in a specifically developed binary
format known as PP files. These are discussed in more detail in Section 2.6.5.
Note that the byte order of a binary file (big endian versus little endian) depends on the architecture of the machine that wrote the file and must be known in order to decode the data correctly.
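The following Python sketch reads a binary file using NumPy with an explicitly specified byte order; the file name and data type are hypothetical assumptions.

import numpy as np

# Hypothetical file of 32-bit floating point values written on a
# big-endian machine; ">f4" forces big-endian byte order while
# "<f4" would read little-endian data
data = np.fromfile("model_output.bin", dtype=">f4")
print(data[:10])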
2.6.3 GRIB
The Gridded Binary (GRIB) format is a self-describing data format that is widely used
to store model and satellite data in climate sciences. The GRIB format is standardised
by the World Meteorological Organization’s Commission for Basic Systems. There
are two GRIB Editions: GRIB1 and GRIB2. GRIB files usually have the file extension
.grb or .grb2. The main advantage of the GRIB data format is that file sizes are smaller
relative to normal binary files.
In some cases it is useful to convert GRIB files to netCDF for more convenient file
manipulation. Table 2.6.3.1 lists some of the tools that may be useful for converting
files in GRIB format to files to netCDF format. The output should be checked
carefully as the conversion process may have some unexpected results especially
when it comes to the internal organisation of the netCDF file.
Table 2.6.3.1: Tools for generic conversion from GRIB to netCDF file format.

Tool          Description
NCL           NCL script ncl_convert2nc
Xconv/Convsh  Can read GRIB1/2, can write netCDF
CDO           cdo -f nc copy ifile.grb ofile.nc
2.6.4 netCDF
The netCDF file format is a self-describing data format developed at Unidata³. The
netCDF file format is one of the most common file formats in climate science
and most analytical software packages for climate data analysis have functions for
reading files in netCDF format. Many climate and numerical weather prediction
models will make their data available in netCDF format. To work with netCDF files
it is essential that the netCDF libraries are installed on the system that is used to analyse the data. The file extension for netCDF files is .nc.
Different versions of the netCDF file format exist. Version 3 (netCDF-3) is known
as the classic format. Version 4 (netCDF-4) uses the HDF5⁴ format with some
restrictions.
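A minimal Python sketch for opening a netCDF file with the netCDF4 package is shown below; the file and variable names are hypothetical assumptions.

from netCDF4 import Dataset

# Hypothetical file name; the same code reads netCDF-3 and netCDF-4 files
nc = Dataset("temperature.nc", "r")
print(nc.variables.keys())    # list the variables contained in the file
tas = nc.variables["tas"][:]  # read the (hypothetical) variable "tas"
nc.close()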
2.6.5 PP
The PP format is a binary format developed by the Met Office for its model output (see Section 2.6.2). PP files are big endian. A detailed description of the format can be found in the Met Office documentation (https://artefacts.ceda.ac.uk/badc_datadocs/um/umdp_F3-UMDPF3.pdf).
³http://www.unidata.ucar.edu/software/netcdf
⁴https://en.wikipedia.org/wiki/Hierarchical_Data_Format
3. Unix
3.1 Introduction to Unix
Like Windows or Mac OS X, Unix/Linux is a computer operating system. The
difference between Unix and Linux can be a bit confusing as both terms are often
used interchangeably. Unix operating systems are mainly commercial in nature whereas Linux is an open source, freely available clone of Unix.
Linux operating systems are also found on some mobile devices such as mobile
phones (e.g., Android) and tablets.
Distribution    Based on
CentOS¹         Red Hat Enterprise Linux
Elementary OS²  Ubuntu
Fedora³         Red Hat
Linux Mint⁴     Ubuntu
openSUSE⁵       Slackware Linux
Ubuntu⁶         Debian
¹www.centos.org
²www.elementary.io
³www.getfedora.org
⁴www.linuxmint.com
⁵www.opensuse.org
⁶ubuntu.com
Figure 3.3.1: HPC cluster at the School of Geography and the Environment, Oxford University.
Figure 3.2.1: Example of a common networking scenario for remote server access. Commands are
sent from the laptop via SSH connection to the server through the VPN tunnel (green arrow). X11
Forwarding facilitates graphical information to be send back to the laptop display if required (purple
arrow).
The software required to facilitate the four steps outlined above depends on the
operating system installed on the local machine. The list of software shown in Table
3.2.1 is just a recommendation but includes software that has been shown to work
well.
Table 3.2.1: Software recommendations for accessing a remote server (¹Free, ²Commercial, ³Cisco-
compatible VPN client).
File transfer between the local and remote machine is covered in Section 3.3.5.
In order to connect to a remote server from a Windows OS the Secure Shell (SSH) client PuTTY¹³ needs to be installed. PuTTY is a freely available and widely used SSH client. Download the appropriate Windows installer or binary executable file from the webpage and install PuTTY on the local machine. Then configure PuTTY following the steps outlined below.
¹²http://www.xquartz.org
¹³https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html
1. Start PuTTY.
2. Go to Session and enter the server name under Host Name.
3. Select Connection Type SSH.
4. Go to Connection –> SSH –> X11 and check the box Enable X11 forwarding.
5. Go back to Session, give this configuration a name in the field Saved Sessions
and click the Save button.
6. The name under which this configuration was saved should now be listed on the right-hand side of the main Session window every time PuTTY is opened.
To connect to the remote server using a saved configuration, follow the steps below.
1. Open PuTTY.
2. Select the name the configuration was saved under on the right-hand side of
the main PuTTY Session window.
3. Click the Load button.
4. Click the Open button (as a short-cut just double-click on the configuration
name).
5. A terminal window will open. Enter username and password when prompted.
Upon first login to a remote server the following prompt may appear.
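A typical prompt, shown below, asks for the server's host key to be verified; the exact wording, server address and fingerprint will vary.

The authenticity of host 'linux.ouce.ox.ac.uk (...)' can't be established.
ECDSA key fingerprint is SHA256:....
Are you sure you want to continue connecting (yes/no)?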
Confirm this by typing yes. This message will not appear on subsequent logins.
Mac OS X is an operating system that is very similar to Linux under the hood.
This makes the communication between the local machine and the remote server
straightforward. Access the remote server from Mac OS X following the steps
outlined below.
Unix 23
1. Open a command line interface (CLI) by starting the program iTerm, iTerm2,
Terminal or xterm.
2. Use the command below to connect to the remote server. ssh starts a Secure
Shell connection. -Y enables X11 forwarding.
ssh -Y jsmith@linux.ouce.ox.ac.uk
Note that in some Mac OS X terminal windows the cursor does not change
while the password is entered. This often leads to students thinking that
they cannot enter their password.
Upon first login to a remote server the following prompt may appear.
Confirm this by typing yes. This message will not appear on subsequent logins.
The command prompt may look slightly different depending on which ter-
minal application was used to connect to the remote server. The appearance
of the command prompt may also be modified by the user.
In the remainder of the book a dollar sign at the beginning of code examples will be used to represent the Unix command prompt.
Once a command has been entered hitting the Enter key will execute the command.
Some commands will produce output in text form that is displayed in the terminal
window. Other commands may open graphical windows. Some commands will
complete very quickly, while others will take longer to complete. Once Unix has
finished processing the command the command prompt will be available again.
The up and down arrow keys can be used to scroll backwards and forwards through
the command history. This can be very useful when commands including long paths
need to be (re-)entered.
The tab key can be used in most Unix systems to auto-complete commands, file names and paths. Hitting the tab key twice in a row first auto-completes as far as possible and then lists the remaining options if the command, file name or path could not be fully completed because multiple options are available.
An SSH session can be terminated by executing the command ‘exit’ or by closing
the terminal window by using the mouse. In both cases the SSH connection to the
remote server will be terminated.
How copying and pasting text inside the terminal works depends on the terminal application used. If the standard way of using Ctrl + C (copy) and Ctrl + V (paste) does not work then try the following. Highlight
the text to copy with the mouse (double-click to highlight whole word or
line), then use a single click of the right mouse button (or sometimes the
mouse wheel) to paste the highlighted text onto the command line.
The echo command can be used to display which shell is currently in use.

echo $SHELL
Beginners are advised to stick with the Bash shell. However, to try a different
shell type csh for the C shell, tcsh for the TENEX C shell, or ksh for the Korn shell.
More information about shells can be found on Wikipedia’s Unix Shell¹⁴ page.
A home directory will have been created by the system administrator as part of the
Unix account setup. The home directory (also referred to as user area) is where files
can be saved and sub-directories can be created by the user. For example, in Figure
3.3.3.1 two home directories were created as /home/rjones and /home/jking. When
logging on to the server users will be directed automatically to the root of their home
directory.
The directory tree structure may vary slightly from the example shown in Figure
3.3.3.1 depending on the server setup. To show the full path to the root of the home
directory the pwd (print working directory) command can be used. For the user
rjones the pwd command would return the following (see Figure 3.3.3.1).
/home/rjones
3.3.4 Quota
There are limits as to how much data can be stored in the user area. The user’s initial
quota will have been set by the system administrator when their user account was
created. The size of this quota can be found by using the quota command or by asking
the system administrator. The quota command returns details about the user’s quota
in tabular form. Table 3.3.4.1 provides details about the information presented in each
column.
Headers 2 to 5 show the block quota. This refers to the actual amount of space used
on the system in blocks wherein one block equals 1 KB. Headers 6 to 9 show the file
quota. This refers to the number of files and directories on the system.
Table 3.3.4.1: Explanation of the columns returned by the quota command.

#  Header      Explanation
1  Filesystem  Name of the file system for which quota information is displayed.
2  blocks      The number of 1 KB blocks currently used.
3  quota       The soft quota for blocks.
4  limit       The hard quota for blocks; this limit cannot be exceeded.
5  grace       The amount of time left to get back below the block soft quota.
6  files       The number of files currently stored.
7  quota       The soft quota for files.
8  limit       The hard quota for files; this limit cannot be exceeded.
9  grace       The amount of time left to get back below the file soft quota.
A copy of the freely available WinSCP¹⁵ software can be obtained and installed on
the Windows machine. It is recommended to use the Norton Commander Interface
which provides a split window with the local drive on the left-hand side and the home
directory on the server on the right-hand side. Files and directories can be dragged
and dropped between your local machine and the server. If the Norton Commander
¹⁵http://sourceforge.net/projects/winscp
Interface is not the default setting then this can be changed through the main menu Options –> Preferences –> Environment –> Interface –> Commander.
The Linux account login details can be entered in the WinSCP start-up window.
Similar to PuTTY the login details can be saved for easier access using the Save
button. After successful login the Norton Commander Interface will show the local
drive on the left-hand side and the home directory on the remote server on the right-
hand side. Files and directories can be copied, moved or deleted on both sides. The mouse can be used to drag and drop files and directories from the local drive to the remote server and vice versa.
3.3.5.2 File Transfer on the Command Line for Mac OS X and Linux
If a client with a GUI such as FileZilla is not available on Mac OS X and Linux
operating systems then files can be copied between the local machine and the remote
server by using the scp (secure copy) command. No matter which direction a file or
directory is copied the general syntax of the scp command is always the same and
looks like the following.
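scp <source> <destination>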
The source and destination can be either the local drive or the remote drive
depending on which direction a file or directory is being transferred. The general
syntax for the remote server is as follows (note the colon at the end).
username@servername:
The following are working examples of copying files from the local machine to the
remote server (Example 1) as well as from the remote server to the local machine
(Example 2).
Example 1
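scp data.csv jking@linux.ouce.ox.ac.uk:data/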
Example 2
scp jking@linux.ouce.ox.ac.uk:data/data.csv ./
The construction and use of relative and full paths is discussed in more
detail in Section 3.4.4.
The following instructions for mapping a remote network drive are based on Windows 10.
During the process of mapping the remote drive the username and password are
likely to be requested. The mapping process may take a few seconds to complete.
Once the above steps have completed successfully the remote drive will be accessible
from the desktop (folder icon) or via the Finder pane on the left-hand side.
Unix 33
First, a mount point has to be created on the local machine. The mount point is an
empty directory usually placed in the /media directory. The mount point has to be
created only once. To create a mount point named ldrive (e.g., linux drive) in the
/media directory the following command can be used.
mkdir /media/ldrive
Once a mount point exists the sshfs (Secure SHell FileSystem) command can be used
to mount (synonymous here for to map) the home directory located on the remote
server on the local machine. Assuming the username is jsky, the server name is
linux.ox.ac.uk, the full path to the home directory on the remote server is /home/jsky/
and the local mount point is /media/ldrive then the following command can be used
to mount the remote home directory on the local machine.
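sshfs -o idmap=user -o nonempty jsky@linux.ox.ac.uk:/home/jsky/ /media/ldrive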
The -o idmap=user option is used here to set the user/group ID mapping to user. The jsky@linux.ox.ac.uk:/home/jsky/ part sets the full path to the home directory on the remote server and /media/ldrive is the local mount point. The -o nonempty option avoids a mountpoint is not empty error message.
Files and directories saved in the home directory on the remote server should now
be accessible via the local mount point directory.
To unmount the mounted remote server home directory the following command can
be used.
fusermount -u /media/ldrive
The general syntax of Unix commands is in the form of the main command (indicated
in the following code example as cmd) potentially followed by some options, a source
and a destination. The main command, option, source and destination need to be
separated by at least one single space.
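cmd -options source destination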
The most basic form of a Unix command is just the main command on its own as a
single word. For instance, the ls command on its own generates a simple list of the
content of the current directory.
ls
Most commands allow options to be added that modify how the command behaves.
The options start either with one hyphen (-) for the short version of the option (single
letter) or with two hyphens (--) for the long version of the option (single word).
Multiple options can be passed to the main command by combining the letters of the
short version options preceded by a single hyphen. For example, adding the options
-ltr to the ls command generates an extended (long) list (-l) of the directory content
sorted by time (-t) and the list being displayed in reversed order (-r) so that the
newest file is at the bottom of the list.
ls -ltr
Many commands expect a source to operate on, for instance a file name. The following command uses cat to print the content of the file text.txt to the screen.

cat text.txt
Some commands also expect an output file name or directory as a destination, for
instance, when copying or renaming files or directories. The following command
copies the file test.txt into a directory called my_output, whereby cp is the main command, test.txt is the source and my_output/ is the destination.
cp test.txt my_output/
Detailed documentation for a command can be displayed using the man (manual) command followed by the command name.

man ls
The output from the above command may look similar to the following.
Example output from the man ls command.
1 LS(1) User Commands \
2 LS(1)
3
4 NAME
5 ls - list directory contents
6
7 SYNOPSIS
8 ls [OPTION]... [FILE]...
9
10 DESCRIPTION
11 List information about the FILEs (the current directory by default). \
12 Sort entries alphabetically if none of
13 -cftuvSUX nor --sort is specified.
14
15 Mandatory arguments to long options are mandatory for short options too.
16
17 -a, --all
18 do not ignore entries starting with .
19
20 -A, --almost-all
21 do not list implied . and ..
22
23 --author
24 with -l, print the author of each file
25 ...
Use the up and down arrow keys to scroll through the man pages. Type q to terminate
the man pages and get back to the Unix command prompt.
Line 1 shows the name of the main command followed by the section number in
brackets. The section number identifies the section of the manual the pages come from.
Each section corresponds to a specific set of commands. For instance, Section 1 is for
User Commands (see the manual pages for the man command itself for more details).
The remainder of the manual pages provides information about the command in
several sub-sections. The NAME section (line 4) provides a short description of
the command. The SYNOPSIS section (line 7) provides the general syntax of the
command. The DESCRIPTION section (line 10) provides a more detailed description
of the command including a list of the available options. Additional information can be found at the end of the man pages in the AUTHOR, REPORTING BUGS and COPYRIGHT sections.
The manual pages can also be found in the form of webpages. A simple web search
for something like man pages ls will return many results including this page¹⁶.
A text editor is used for writing plain text files. Text editors should not
be confused with word processing software such as Microsoft Word or
LibreOffice which are used for writing rich text files whereby text is
saved in the software’s own specific binary format (e.g., .doc or .odt file
extensions).
For coding purposes a text editor should, as a minimum, feature syntax highlighting and be customisable in terms of background and font colour (light versus dark) as well as font type and size.
Some editors such as Atom, gedit or jEdit open a GUI whereas other editors such as
nano, vi or vim open inside the terminal window (also referred to as screen-based
or screen-orientated editors). Some editors such as Emacs feature both options. A
non-exhaustive list of frequently used text editors is shown in Table 3.4.3.1.
To start the Emacs editor from the Unix command line in screen-based
mode use the -nw (no window) option followed by the filename (emacs -nw
myfile.txt).
To edit a text file located in the home directory on the server either a text editor
installed on the local machine or one that is installed on the server can be used. A
text editor installed on the local machine can only be used if the home directory
located on the server is mapped on the local machine. If that is not the case then
a text editor installed on the server should be used. Either way, a GUI-based text
editor should only be used if a fast and stable internet connection is available. For
slow internet connections a server-side screen-based text editor allows for a more uninterrupted workflow.
Text editors vary in terms of their functionality and ease of use. While basic editors
such as gedit, jEdit or nano may be easy and intuitive to use, more feature-rich and customisable editors such as Atom, Emacs or vim tend to be preferred by
analytical programmers (see Editor War¹⁷ for Emacs/vim pros and cons and historical
background).
When using Emacs or vim it is advisable to take the time to go through one of the many online tutorials to learn how to use these screen-based text editors as they require the use of keyboard shortcuts. While this may sound complicated, it pays off in the long run.
To create or open a file from the server command line use the text editor’s name
followed by the filename as shown in the following example which shows the use of
jEdit to open a file named myfile.csv.
jedit myfile.csv
After opening a GUI-based editor from the command line, the command line will not be available because the command that opened the editor only terminates once the GUI is closed. One solution is to add a space and the ampersand symbol (&) at the end of the command, which instructs the system to run the command in the background. Another solution is to open a second terminal window.
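For example, the following command opens the file myfile.csv in jEdit while keeping the command line available.

jedit myfile.csv &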