Intro To Scripting With Bash
Charles Jahnke
Research Computing Services
Information Services & Technology
Topics for Today
● Introductions
● Basic Terminology
● How to get help
● Command-line vs. Scripting
● Variables
● Handling Arguments
● Standard I/O, Pipes, and Redirection
● Control Structures (loops and If statements)
● SCC Job Submission Example
Research Computing Services
Research Computing Services (RCS)
A group within Information Services & Technology at Boston University provides
computing, storage, and visualization resources and services to support research
that has specialized or highly intensive computation, storage, bandwidth, or
graphics requirements.
● Research Computation
● Research Visualization
● Research Consulting and Training
Breadth of Research on the Shared Computing Cluster (SCC)
Me
● Contact: [email protected]
You
● Using Linux?
[username@scc1 ~]$
● Provides:
○ Built-in commands
○ Programming control structures
○ Environment variables
Script
● A text file containing a series of commands that an interpreter (like a shell) can
read and run.
Interpreter
● A program that runs commands without compiling (directly from text)
Bash
The name of the most common shell interpreter, its language, and syntax.
From the Bash manual page:
SYNOPSIS
bash [options] [file]
COPYRIGHT
Bash is Copyright (C) 1989-2011 by the Free Software
Foundation, Inc.
DESCRIPTION
Bash is an sh-compatible command language interpreter
that executes commands read from the standard input or
from a file. Bash also incorporates useful features from
the Korn and C shells (ksh and csh).
Bash is intended to be a conformant implementation of the
Shell and Utilities portion of the IEEE POSIX specification
(IEEE Standard 1003.1). Bash can be configured to
be POSIX-conformant by default.

From the Bash info manual:
This text is a brief description of the features that are present in the
Bash shell (version 4.2, 28 December 2010).
This is Edition 4.2, last updated 28 December 2010, of 'The GNU Bash
Reference Manual', for 'Bash', Version 4.2.
Bash contains features that appear in other popular shells, and some
features that only appear in Bash. Some of the shells that Bash has
borrowed concepts from are the Bourne Shell ('sh'), the Korn Shell
('ksh'), and the C-shell ('csh' and its successor, 'tcsh'). The
following menu breaks the features up into categories based upon which
one of these other shells inspired the feature.
This manual is meant as a brief introduction to features found in
Bash. The Bash manual page should be used as the definitive reference
on shell behavior.
* Menu:
Bash “help”
● Bash comes with built in help functionality
○ Just type “help”
● Read deeper into help chapters by searching specific keywords
○ “help [keyword]”
● “Help help”
● “Help for”
scc1 $ help
GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu)
These shell commands are defined internally. Type `help' to see this list.
Type `help name' to find out more about the function `name'.
Use `info bash' to find out more about the shell in general.
Use `man -k' or `info' to find out more about commands not in this list.
A star (*) next to a name means that the command is disabled.
 job_spec [&]                        history [-c] [-d offset] [n] >
 (( expression ))                    if COMMANDS; then COMMANDS; [>
 . filename [arguments]              jobs [-lnprs] [jobspec ...] o>
 :                                   kill [-s sigspec | -n signum >
 [ arg... ]                          let arg [arg ...]
 [[ expression ]]                    local [option] name[=value] .>
 alias [-p] [name[=value] ... ]>     logout [n]
 bg [job_spec ...]                   mapfile [-n count] [-O origin>
 bind [-lpvsPVS] [-m keymap] [->     popd [-n] [+N | -N]
 ...
scc1 $ help help
help: help [-dms] [pattern ...]
    Display information about builtin commands.
    Displays brief summaries of builtin commands. If PATTERN is
    specified, gives detailed help on all commands matching PATTERN,
    otherwise the list of help topics is printed.
    Options:
      -d    output short description for each topic
      -m    display usage in pseudo-manpage format
      -s    output only a short usage synopsis for each topic matching
            PATTERN
    Arguments:
      PATTERN    Pattern specifiying a help topic
    Exit Status:
    Returns success unless PATTERN is not found or an invalid option is
    given.
scc1 $ help for
for: for NAME [in WORDS ... ] ; do COMMANDS; done
    Execute commands for each member in a list.
    The `for' loop executes a sequence of commands for each member in a
    list of items. If `in WORDS ...;' is not present, then `in "$@"' is
    assumed. For each element in WORDS, NAME is set to that element, and
    the COMMANDS are executed.
    Exit Status:
    Returns the status of the last command executed.
for ((: for (( exp1; exp2; exp3 )); do COMMANDS; done
    Arithmetic for loop.
    Equivalent to
        (( EXP1 ))
        while (( EXP2 )); do
            COMMANDS
            (( EXP3 ))
        done
    ...
https://www.gnu.org/software/bash
Command-line vs. Scripting
Recap of Command Line vs Script Definitions
Command-line
● Has a prompt
● Not saved
● One line at a time
● The text based way to interact with a computer
Script
● No prompt
● Is a file
● Still runs one line at a time
● Runs all the lines in file without interaction
Example CLI Task: Organize some downloaded data
[username@scc1 ~]$ cd /projectnb/scv/jpessin/introToBashScripting_sampleScripts/cli_script
[username@scc1 cli_script]$ ls data
LICENSE sample1.chr1.bam sample1.chr4.bam sample2.chr1.bam sample2.chr4.bam sample3.chr1.bam sample3.chr4.bam
README sample1.chr2.bam sample1.chr5.bam sample2.chr2.bam sample2.chr5.bam sample3.chr2.bam sample3.chr5.bam
report.html sample1.chr3.bam sample1.log sample2.chr3.bam sample2.log sample3.chr3.bam sample3.log
[username@scc1 cli_script]$ cd data
[username@scc1 data]$ mkdir sample1
[username@scc1 data]$ mv sample1.chr*.bam > sample1
-bash: sample1: Is a directory
[username@scc1 data]$ mv sample1.chr*.bam sample1/
[username@scc1 data]$ cd sample1/
[username@scc1 sample1]$ ls sample1.* > sample1.fileset.txt
[username@scc1 sample1]$ less sample1.fileset.txt
[username@scc1 sample1]$ mv sample1.fileset.txt ../
[username@scc1 sample1]$ cd ..
[username@scc1 data]$ ls
LICENSE sample1 sample2.chr1.bam sample2.chr4.bam sample3.chr1.bam sample3.chr4.bam
README sample1.fileset.txt sample2.chr2.bam sample2.chr5.bam sample3.chr2.bam sample3.chr5.bam
report.html sample1.log sample2.chr3.bam sample2.log sample3.chr3.bam sample3.log
Example CLI Task (cont.)
[username@scc1 data]$ ls
LICENSE sample1 sample2.chr1.bam sample2.chr4.bam sample3.chr1.bam sample3.chr4.bam
README sample1.fileset.txt sample2.chr2.bam sample2.chr5.bam sample3.chr2.bam sample3.chr5.bam
report.html sample1.log sample2.chr3.bam sample2.log sample3.chr3.bam sample3.log
[username@scc1 data]$ mkdir sample2
[username@scc1 data]$ mv sample2.chr*.bam sample2
[username@scc1 data]$ mkdir sample3
[username@scc1 data]$ mv sample3.chr*.bam sample3
[username@scc1 data]$ ls
LICENSE report.html sample1.fileset.txt sample2 sample2.log sample3.fileset.txt sample4 sample4.log
README sample1 sample1.log sample2.fileset.txt sample3 sample3.log sample4.fileset.txt
[username@scc1 data]$ mkdir logs
[username@scc1 data]$ mv sample*.log logs/
[username@scc1 data]$ rm LICENSE
rm: remove regular empty file 'LICENSE'? y
[username@scc1 data]$ rm README
rm: remove regular empty file 'README'? y
[username@scc1 data]$ ls
logs sample1 sample2 sample3 sample4
report.html sample1.fileset.txt sample2.fileset.txt sample3.fileset.txt sample4.fileset.txt
Command-line Interface
● Difficult to read
● One-directional / Non-reproducible
○ What did I do last time?
○ What should someone do next time?
● Manual
● Potentially error-prone
● Wasn’t really that fast
Write a Script Instead
reorgData.sh
#!/bin/bash
# Take datadir from input
datadir=$1
# Stop if the directory cannot be entered
cd "$datadir" || exit 1
# Detect number of samples
numSamples=$(ls sample*.bam | cut -d. -f1 | uniq | wc -l)
# Reorg sample files into sample dirs
for sampleNum in $(seq 1 $numSamples); do
    mkdir sample$sampleNum
    mv sample$sampleNum*.chr*.bam sample$sampleNum/
    ls sample$sampleNum > sample$sampleNum.filelist.txt
done
# Organize Logs
mkdir logs
mv sample*.log logs/
scc1 $ ls data
LICENSE           sample1.chr5.bam  sample2.log
README            sample1.log       sample3.chr1.bam
report.html       sample2.chr1.bam  sample3.chr2.bam
sample1.chr1.bam  sample2.chr2.bam  sample3.chr3.bam
sample1.chr2.bam  sample2.chr3.bam  sample3.chr4.bam
sample1.chr3.bam  sample2.chr4.bam  sample3.chr5.bam
sample1.chr4.bam  sample2.chr5.bam  sample3.log
scc1 $ bash reorgData.sh data/
scc1 $ ls data
logs         sample1        sample2        sample3
report.html  sample1.files  sample2.files  sample3.files
Files can be made “executable” on their own.
To do this, we need to add execute permission with “chmod +x”:
scc1 $ ls -l
drwxr-sr-x 6 cjahnke scv 32768 Jun 1 2:36 data
-rw-r--r-- 1 cjahnke scv 453 Jun 1 2:37 reorgData.sh
scc1 $ chmod +x reorgData.sh
scc1 $
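A minimal sketch of the same steps on a throwaway file. The name hello.sh and the temporary directory are assumptions made for illustration only. The first line of the file (the “#!” line) names the interpreter, and “chmod +x” adds execute permission so the file can be run by its path.

```shell
#!/bin/bash
# Work in a temporary directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# hello.sh is a hypothetical example script; its first line names the interpreter.
cat > "$workdir/hello.sh" <<'EOF'
#!/bin/bash
echo "Hello from a script"
EOF

# Add execute permission, then run the file by path instead of "bash hello.sh".
chmod +x "$workdir/hello.sh"
"$workdir/hello.sh"
```

Without the “chmod +x” step, running the file by path fails with a “Permission denied” error, while “bash hello.sh” still works.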
Variables
Environment Variables
● Contain environment configuration
○ Typically for the shell, but other programs can set their own.
● Created automatically when logged in.
● Scope is global
○ Other programs can read/use them to know how to behave.
● Type “env” to see the full list.
scc1 $ echo $USER
cjahnke
scc1 $ echo $PWD
/usr3/bustaff/cjahnke
scc1 $ echo $HOSTNAME
scc1
scc1 $ env
MODULE_VERSION_STACK=3.2.10
XDG_SESSION_ID=c8601
HOSTNAME=scc1
TERM=xterm
SHELL=/bin/bash
HISTSIZE=1000
TMPDIR=/scratch
SSH_CLIENT=128.197.161.56 55982 22
...
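A small sketch of what “global” scope means in practice. MY_SETTING is a hypothetical name chosen for this illustration. A variable only becomes visible to other programs once it has been exported into the environment.

```shell
#!/bin/bash
# MY_SETTING is a hypothetical name used only for this illustration.
MY_SETTING=demo

# A child shell does not see a variable that has not been exported.
before=$(bash -c 'echo "${MY_SETTING:-unset}"')

# export places the variable in the environment handed to child processes.
export MY_SETTING
after=$(bash -c 'echo "${MY_SETTING:-unset}"')

echo "before export: $before"
echo "after export:  $after"
```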
Shell Variables
● A character string to which a user assigns a value.
scc1 $ myvar=foo
scc1 $ echo $myvar
foo
scc1 $ myvar=bar
scc1 $ echo $myvar
bar
● lowercase
○ Effective for simple scripts, hard to read if names are complicated (e.g. $mynewvar).
● Under_scores
○ Common alternative to spaces (e.g. $my_new_var). Bash does not accept hyphens.
● camelCase
○ Capitalization patterns are concise and easy enough to read (e.g. $myNewVar).
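A short sketch of assignment and reassignment, using the same names as the slide. Note that there are no spaces around “=” when assigning, and the “$” is only used when reading the value back.

```shell
#!/bin/bash
# Assign with no spaces around "="; read the value back with "$".
myvar=foo
first=$myvar

# Assigning again replaces the old value.
myvar=bar
second=$myvar

# Underscores and camelCase are both valid in names; hyphens are not.
my_new_var="underscore style"
myNewVar="camelCase style"

echo "$first then $second"
echo "$my_new_var / $myNewVar"
```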
Using variables: The dollar sign and quotes
● No quote
○ Simple. Bash shell interprets variable
● Escape Special Character (“\”)
○ The “$” is special and indicates a variable in
Bash. The “\” escapes special behavior and
instructs bash to treat it as a character.
● Single Quote
○ Literal. Exactly the contents.
● Double Quote
○ Interpreted. Allows variable expansion.
scc1 $ hi=Hello
scc1 $ echo $hi
Hello
scc1 $ echo \$hi
$hi
scc1 $ echo '$hi'
$hi
scc1 $ echo "$hi"
Hello
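The same rules apply when a variable sits inside a longer string. A brief sketch, with the variable name and the message text chosen only for illustration:

```shell
#!/bin/bash
name=World

double="Hello $name"             # double quotes: $name is expanded
single='Hello $name'             # single quotes: kept exactly as typed
escaped="Cost: \$5 for $name"    # backslash: this "$" is an ordinary character

printf '%s\n' "$double" "$single" "$escaped"
```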
Using Variables: Strings, spaces, and quotes
Spaces are special too
● Without quoting, Bash treats “World” as a command to run, and hello0 is not set in the shell.
● We can escape (“\”) the special behavior
● Or we can quote the string.
○ Single or double quotes are effectively the
same if there is nothing to be interpreted.
scc1 $ hello0=Hello World
-bash: World: command not found
scc1 $ echo $hello0

scc1 $ hello1=Hello\ World
scc1 $ echo $hello1
Hello World
scc1 $ hello2='Hello World'
scc1 $ echo $hello2
Hello World
scc1 $
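Spaces also matter when a variable is used, not only when it is assigned. A minimal sketch, assuming the default word-splitting settings of a fresh shell:

```shell
#!/bin/bash
greeting="Hello    World"      # four spaces between the words

# Unquoted: the value is split into two words and echo rejoins them with one space.
collapsed=$(echo $greeting)

# Quoted: the value is passed as a single word, spacing intact.
preserved=$(echo "$greeting")

echo "[$collapsed]"
echo "[$preserved]"
```

Quoting variables by default (“$var”) avoids this kind of surprise with file names that contain spaces.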
Handling Arguments
Command-line Arguments in Bash
The command used to start a bash script passes the command information to the
script as variables when it runs. This information is accessed through numbered
variables of the form “$#”, where “#” is the index of the information: $0 is the
script name, $1 the first argument, and so on.
Note: only $1 through $9 can be written this way; from the tenth argument on, use
braces (e.g. ${10}) or “shift”.
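A sketch of the numbered variables in action. To keep it runnable on its own, “set --” stands in for arguments typed on the command line; that substitution is an assumption of this example, not part of the slide.

```shell
#!/bin/bash
# "set --" replaces the positional parameters, simulating:
#   bash script.sh a b c d e f g h i j k
set -- a b c d e f g h i j k

count=$#          # number of arguments
first=$1          # first argument
tenth=${10}       # braces are required from the tenth argument onward
wrong=$10         # read as "$1" followed by a literal 0

shift 9           # discard the first nine; the old tenth is now $1
after_shift=$1

echo "count=$count first=$first tenth=$tenth wrong=$wrong after_shift=$after_shift"
```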
Simple Command Line Argument Example
cli_arg.sh
#!/bin/bash
Terminal
scc1 $
[Figure: a process reads from the keyboard and writes STDOUT (1) and STDERR (2) to the display, the terminal, or written files.]
* What they are actually used for is entirely dependent on the program
Standard Out & Standard Error
scc1 $ man
What manual page do you want?
● Example:
[cjahnke@scc1 ~]$ cat sample.vcf | cut -f1,2,7 | sort -k3
Output of “cat sample.vcf”:
#CHROM POS ID REF ...
3 14370 rs6054257 G ...
2 17330 . T ...
1 1110696 rs6040355 A ...
3 1230237 . T ...
6 1234567 microsat1 GTCT ...
After “cut -f1,2,7”:
#CHROM POS FILTER
3 14370 PASS
2 17330 q10
1 1110696 PASS
3 1230237 PASS
6 1234567 PASS
After “sort -k3”:
#CHROM POS FILTER
1 1110696 PASS
3 1230237 PASS
3 14370 PASS
6 1234567 PASS
2 17330 q10
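A self-contained sketch of the same idea. The three rows printed by make_data are made-up stand-ins for a file such as sample.vcf, and the function name is hypothetical. Each “|” hands the standard output of one command to the next as its standard input.

```shell
#!/bin/bash
# Hypothetical tab-separated rows: chromosome, position, filter.
make_data() {
    printf 'chr3\t14370\tPASS\n'
    printf 'chr2\t17330\tq10\n'
    printf 'chr1\t1110696\tPASS\n'
}

# Keep columns 1 and 3, then sort on the second remaining column.
sorted=$(make_data | cut -f1,3 | sort -k2)

# Keep only column 3, then count the rows that contain PASS.
pass_count=$(make_data | cut -f3 | grep -c PASS)

echo "$sorted"
echo "PASS rows: $pass_count"
```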
Redirection
● The “>” symbol redirects the standard output (default) of a command to a file.
Redirection            Description
COMMAND < filename     Input - Directs a file
COMMAND << stream      Input - Directs a stream literal
COMMAND <<< string     Input - Directs a string
COMMAND > filename     Output - Writes output to file (will “clobber”)
COMMAND >> filename    Output - Appends output to file
● Example:
[cjahnke@scc1 ~]$ cat sample.vcf | cut -f1,2,7 | sort -k3 > sorted.txt
Many characters use or modify this behavior
● A < file Use the contents of file as input for A
● B > file Create a new file and write the standard out of B there (overwrites)
● C >> file If file exists, append the standard out of C to it; if file does not exist, create it
● D 2> file Create a new file and write the standard err of D there
● E &> file Combine the standard error and standard out and write to file
● F|G Use the standard out of F as the standard in of G
● H |& K Combine the standard out and err of H and use as the standard in of K
● M | tee file Write the standard out of M to both the terminal and to file
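A sketch that exercises several of these forms on a hypothetical helper named report, which writes one line to each output stream. The file names and the temporary directory are assumptions for illustration.

```shell
#!/bin/bash
workdir=$(mktemp -d)

# Hypothetical helper: one line to standard out, one to standard error.
report() {
    echo "to stdout"
    echo "to stderr" >&2
}

report >  "$workdir/out.txt" 2> "$workdir/err.txt"   # each stream to its own file
report >> "$workdir/out.txt" 2> /dev/null            # append stdout, discard stderr
report >  "$workdir/both.txt" 2>&1                   # both streams to one file (long form of &>)
report 2> /dev/null | tee "$workdir/tee.txt"         # stdout to the terminal and to a file
```

After this runs, out.txt holds two lines (one written, one appended), err.txt holds the error line, and both.txt holds one line from each stream.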
scc1 $
For Loop (In Practice)
Let’s iterate on something more interesting
● Input Items can be called with $@
#!/bin/bash
# This loop iterates over input items
for input in "$@"; do
    echo "$input"
done
scc1 $ bash forloop1.sh a b c
a
b
c
scc1 $ bash forloop1.sh a "b c" d
a
b c
d
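The quotes around "$@" are what keep “b c” together as one item. A small sketch comparing the quoted and unquoted forms; the two counting functions are hypothetical helpers written for this comparison.

```shell
#!/bin/bash
# Count the items seen by a loop over the quoted form.
count_quoted() {
    local n=0
    for item in "$@"; do n=$((n + 1)); done
    echo "$n"
}

# Count the items seen by a loop over the unquoted form.
count_unquoted() {
    local n=0
    for item in $@; do n=$((n + 1)); done
    echo "$n"
}

quoted=$(count_quoted a "b c" d)       # "b c" stays one item
unquoted=$(count_unquoted a "b c" d)   # "b c" is split into "b" and "c"

echo "quoted: $quoted, unquoted: $unquoted"
```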
For Loop (In Practice)
#!/bin/bash
# This script takes one argument, a
# directory, and prints the basename of
# contents.
echo $0
echo ""
echo $1
scc1 $ bash forloop2.sh ~/bash
forloop2.sh

/usr3/bustaff/cjahnke/bash
forloop1.sh
forloop2.sh
myscript.sh
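A minimal sketch of a loop that would print a listing like the one in the terminal output, assuming the script walks the directory with a glob and calls “basename” on each entry. The function name and the demo files are hypothetical.

```shell
#!/bin/bash
# Print the basename of every entry in the directory given as $1.
list_basenames() {
    local dir=$1
    for path in "$dir"/*; do
        basename "$path"
    done
}

# Demonstrate on a temporary directory holding two empty files.
demo=$(mktemp -d)
touch "$demo/forloop1.sh" "$demo/forloop2.sh"
listing=$(list_basenames "$demo")
echo "$listing"
```

Quoting "$dir" and "$path" keeps the loop working when names contain spaces.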
https://google.github.io/styleguide/shell.xml#Loops
Conditional Constructs
● test “[[ .. ]]”
○ Evaluates expression inside brackets and returns 0 (TRUE) or 1 (FALSE)
● if
○ Executes commands following conditional logic.
● case
○ Selectively execute commands corresponding to pattern matching.
○ Like if/then statements, but usually used for parsing inputs and determining flow.
● select
○ Used for creating user input/selectable menus, executes commands on selection.
● Arithmetic “(( .. ))”
○ Performs integer arithmetic. Use caution: there is no floating point, so division truncates (e.g. 7 / 2 gives 3).
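A short sketch of two of these constructs. The classify function and its categories are hypothetical; the file names echo the earlier data example.

```shell
#!/bin/bash
# case: pick a branch by matching the argument against patterns.
classify() {
    case "$1" in
        *.bam)        echo "alignment" ;;
        *.log)        echo "log" ;;
        *.txt|*.html) echo "text" ;;
        *)            echo "unknown" ;;
    esac
}

bam_kind=$(classify sample1.chr1.bam)
html_kind=$(classify report.html)
other_kind=$(classify LICENSE)

# Arithmetic: integers only, so the remainder of a division is dropped.
quotient=$(( 7 / 2 ))
remainder=$(( 7 % 2 ))

echo "$bam_kind $html_kind $other_kind $quotient $remainder"
```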
Tests “[[ .. ]]”
Double square brackets return an exit status of
0 (true) or 1* (false) depending on the
evaluation of the conditional expression inside.
● Standard Test
○ [[ expression ]]
● Negative Test
○ [[ ! expression ]]
● AND Test
○ [[ expression1 && expression2 ]]
● OR Test
○ [[ expression1 || expression2 ]]
scc1 $ [[ 1 == 1 ]] ; echo $?
0
scc1 $ [[ 1 == 2 ]] ; echo $?
1
scc1 $ [[ ! cow == dog ]]; echo $?
0
scc1 $ [[ 1 == 2 && cow == cow ]]; echo $?
1
scc1 $ [[ 1 == 1 || cow == dog ]]; echo $?
0
* Anything >=1 is considered false. Programs may have many possible exit codes. 0 is success, everything else is a descriptive error.
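A sketch of a few more test operators. Inside “[[ .. ]]”, “==” and “>” compare text, so numbers are usually compared with “-eq”, “-lt”, “-gt” and similar. The path /no/such/path is assumed not to exist.

```shell
#!/bin/bash
[[ 10 -gt 9 ]];  numeric=$?    # numeric comparison: 10 is greater than 9
[[ 10 > 9 ]];    lexical=$?    # text comparison: "10" sorts before "9"
[[ -d / ]];      is_dir=$?     # file test: / is a directory
[[ -d / && ! -e /no/such/path ]]; combined=$?

echo "numeric=$numeric lexical=$lexical is_dir=$is_dir combined=$combined"
```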
If Statement (Simple)
● An “if” statement executes commands
based on conditional tests.
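A minimal sketch of “if”, “elif”, and “else” built on the tests from the previous slide. The describe function and the temporary file are hypothetical, and /no/such/path is assumed not to exist.

```shell
#!/bin/bash
# Report what kind of thing a path points to.
describe() {
    if [[ -d "$1" ]]; then
        echo "directory"
    elif [[ -f "$1" ]]; then
        echo "file"
    else
        echo "missing"
    fi
}

tmpfile=$(mktemp)
dir_result=$(describe /)
file_result=$(describe "$tmpfile")
missing_result=$(describe /no/such/path)

echo "$dir_result $file_result $missing_result"
```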
$ cp /projectnb/scv/bash_examples.tar .
$ tar xf bash_examples.tar
$ cd bash_examples
$ ls
done
done
returns:
Scripting in bash makes many many things much easier, like putting this
sentence together.
SCC Job Submission Example
using a loop to submit jobs on SCC with names.
step 1 create a file with the names
$ for file in *_1.txt; do echo "$file" >> filenames.txt; done
$ cat filenames.txt
AG_1.txt
aA_1.txt
ab_1.txt
ac_1.txt
ad_1.txt
af_1.txt
ag_1.txt
ah_1.txt
ai_1.txt
aj_1.txt
order_1.txt
outof_1.txt
step 2 get the number of filenames
$ wc -l filenames.txt
12 filenames.txt
step 3 create a submission script that accepts inputs (remember to chmod +x)
#!/bin/bash -l
#$ -P tutorial
value1=$(cat "$1")
value2=$(cat "$2")
valueNew=$(( $value1 + $value2 ))
step 5 submit
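A dry-run sketch of how the loop over filenames.txt might be put together. It prints the “qsub” commands instead of running them. The script name add_values.sh, the “_2.txt” partner files, and the job-name prefix are all assumptions for illustration; “-N” is the qsub option that sets a job name.

```shell
#!/bin/bash
# Build a small filenames.txt in a temporary directory for the demonstration.
workdir=$(mktemp -d)
printf '%s\n' ab_1.txt ac_1.txt ad_1.txt > "$workdir/filenames.txt"

# Read one name per line and print the command that would submit it.
commands=$(
    while read -r name; do
        base=${name%_1.txt}    # strip the suffix, e.g. ab_1.txt becomes ab
        # add_values.sh and the _2.txt partner file are hypothetical.
        echo "qsub -N job_$base add_values.sh $name ${base}_2.txt"
    done < "$workdir/filenames.txt"
)

echo "$commands"
```

Printing the commands first makes it easy to check the job names and arguments; removing the “echo” in front of “qsub” would submit the jobs for real.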
Getting Help
How to Get Help
Support Website
● http://rcs.bu.edu (http://www.bu.edu/tech/support/research/)
Upcoming Tutorials:
● http://rcs.bu.edu/tutorials
Email Direct:
● [email protected]
Questions?
http://rcs.bu.edu/eval