
Sed Awk Grep Bash

The document provides information about training sessions offered by the FAS Research Computing group at Harvard in Spring 2017. The training series covers topics ranging from introductory sessions on Odyssey, Harvard's research computing environment, to more advanced topics like Unix commands, regex, sed, awk, and bash scripting. Details are provided about specific sessions on Extended Unix, Modules and Software, Choosing Resources Wisely, and Parallel Job Workflows. The document also provides objectives and overviews of regex, grep, sed, and basic Unix commands to help participants learn about searching, pattern matching, and stream editing.

Spring 2017

Extended Unix: sed, awk, grep, and bash scripting basics
Scott Yockel, PhD
Harvard - Research Computing

What is Research Computing?


Faculty of Arts and Sciences (FAS) department that handles non-enterprise IT requests from researchers. (Contact HUIT for most desktop, laptop, networking, printing, and email issues.)
•  RC Primary Services:
–  Odyssey Supercomputing Environment
–  Lab Storage
–  Instrument Computing Support
–  Hosted Machines (virtual or physical)
•  RC Staff:
–  20 staff with backgrounds ranging from systems administration to development-operations to Ph.D. research scientists.
–  Supporting 600 research groups and 3000+ users across FAS, SEAS, HSPH, HBS, GSE.
–  For bioinformatics researchers, the Harvard Informatics group is closely tied to RC and supports the specific problems of that domain.

https://rc.fas.harvard.edu/training/spring-2017/

FAS Research Computing
https://rc.fas.harvard.edu

FAS Research Computing will be offering a Spring Training series beginning February 2nd. This series will include topics ranging from our Intro to Odyssey training to more advanced job and software topics.

In addition to training sessions, FASRC has a large offering of self-help documentation at https://rc.fas.harvard.edu.

We also hold office hours every Wednesday from 12:00PM-3:00PM at 38 Oxford, Room 206.
https://rc.fas.harvard.edu/office-hours

For other questions or issues, please submit a ticket on the FASRC Portal https://portal.rc.fas.harvard.edu

Or, for shorter questions, chat with us on Odybot https://odybot.rc.fas.harvard.edu

Training schedule (all sessions 11:00AM – 12:00PM, NWL 426):
•  Intro to Odyssey: Thursday, February 2nd
•  Intro to Unix: Thursday, February 16th
•  Extended Unix: Thursday, March 2nd
•  Modules and Software: Thursday, March 16th
•  Choosing Resources Wisely: Thursday, March 30th
•  Troubleshooting Jobs: Thursday, April 6th
•  Parallel Job Workflows on Odyssey: Thursday, April 20th

Registration not required; limited seating.


Unix Command-Line Basics


•  Understanding the Terminal and Command-line:
–  STDIN, STDOUT, STDERR, |
–  env, ssh, exit, man, clear
•  Working with files/directories:
–  ls, mkdir, rmdir, cd, pwd, cp, rm, mv
–  scp, rsync, SFTP
•  Viewing file contents:
–  less
•  Searching with REGEXP – stdin/files:
–  *
•  Basic Linux System Commands:
–  which

Objectives
•  Unix commands for searching
–  REGEX
–  grep
–  sed
–  awk
•  Bash scripting basics
–  variable assignment
•  integers
•  strings
•  arrays
–  for loops


REGEX - Regular Expression


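•  Pattern matching for a certain amount of text
–  Single character: O
•  Odybot isn’t human (matches the "O")
–  Character sets: [a-z]
•  Odybot isn’t human (matches every lowercase letter)
–  Character sets: [aei]
•  Odybot isn’t human (matches the "i" in "isn’t" and the "a" in "human")
–  Character sets: [0-9]
•  Odybot isn’t human (no match: the text contains no digits)
–  Non printable characters
•  \t : tab
•  \r : carriage return
•  \n : new line (Unix)
•  \r\n : new line (Windows)
•  \s : any whitespace character (space, tab, new line)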

REGEX - Regular Expression


•  Pattern matching for a certain amount of text
–  Special Characters
•  . period or dot: match any character (except new line)
•  \ backslash: make next character literal
•  ^ caret: matches at the start of the line
•  $ dollar sign: matches at the end of line
•  * asterisk or star: match the preceding item zero or more times
•  ? question mark: preceding character is optional
•  + plus sign: match the preceding item one or more times
•  ( ) parentheses: create a capturing group
•  [ ] square bracket: set of characters, any one of which matches
–  also seen like [[:name:]] (POSIX character class) or [[.az.]] (collating symbol)
•  { } curly brace: place bounds on the number of repetitions
–  {1,6} : match the preceding item 1 to 6 times
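
A minimal sketch of several of these special characters in action, using grep -E (extended regex syntax). The word list and the variable names are made up for illustration and are not part of the original slides.

```shell
#!/bin/bash
# Hypothetical sample data: one word per line.
words=$(printf '%s\n' color colour colouur cat cart 2017)

# ^ and $ anchor the match; ? makes the preceding "u" optional.
optional_u=$(printf '%s\n' "$words" | grep -E '^colou?r$')

# + requires one or more of the preceding character.
one_or_more_u=$(printf '%s\n' "$words" | grep -E '^colou+r$')

# . matches any single character, so "ca.t" matches "cart" but not "cat".
any_char=$(printf '%s\n' "$words" | grep -E '^ca.t$')

# {4} bounds the repetition: exactly four digits.
four_digits=$(printf '%s\n' "$words" | grep -E '^[0-9]{4}$')

echo "optional u:    $optional_u"
echo "one or more u: $one_or_more_u"
echo "any character: $any_char"
echo "four digits:   $four_digits"
```

Anchoring with ^ and $ forces the whole line to match; without them, each pattern would also match lines that merely contain the text.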


grep - GNU REGEX Parser


•  grep is a line-by-line parser of stdin or files; by default it prints each line that matches the regex pattern.
•  syntax:
–  using stdin: cat file | grep pattern
–  using files: grep pattern file
•  common options:
–  -c : count the number of matching lines
–  -m # : stop after # matching lines
–  -R : search recursively through directories
–  -o : only print matching part of line
–  -n : print the line number
–  -v : invert match, print non-matching lines
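
The options above can be tried on a small file. This is a sketch under assumptions: the log contents, the file name job.log, and the variable names are all hypothetical.

```shell
#!/bin/bash
# Hypothetical sample file in a temporary directory.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf '%s\n' 'error: disk full' 'ok' 'error: timeout' 'ok' > "$tmpdir/job.log"

count=$(grep -c 'error' "$tmpdir/job.log")       # number of matching lines
first=$(grep -m 1 'error' "$tmpdir/job.log")     # stop after the first matching line
numbered=$(grep -n 'timeout' "$tmpdir/job.log")  # prefix the match with its line number
only=$(grep -o 'error' "$tmpdir/job.log")        # print only the matched text
inverted=$(grep -v 'error' "$tmpdir/job.log")    # print lines that do not match

echo "count:    $count"
echo "first:    $first"
echo "numbered: $numbered"
echo "only:     $only"
echo "inverted: $inverted"
```

Note that -c counts lines, not individual occurrences: a line containing the pattern twice still adds one to the count.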

sed - stream editor


•  sed reads a stream from stdin (or from files), applies editing commands to the lines that match, and writes the result to stdout.
–  Think amped-up Windows Find & Replace.
•  syntax:
–  using stdin: cat file | sed 'command'
–  using files: sed 'command' file
–  common uses:
•  4d : delete line 4
•  2,4d : delete lines 2-4
•  2w foo : write line 2 to file foo
•  /here/d : delete lines matching here
•  /here/,/there/d : delete from a line matching here through the next line matching there
•  s/pattern/text/ : substitute text for the first match of pattern on each line
•  s/pattern/text/g : substitute text for every match of pattern on each line
•  /pattern/a\text : append a line containing text after each line matching pattern
•  /pattern/c\text : replace each line matching pattern with text
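
A short sketch contrasting a few of these commands, in particular s/// with and without the g flag. The input lines and variable names are invented for illustration.

```shell
#!/bin/bash
# Hypothetical input: four short lines fed to sed on stdin.
input=$(printf '%s\n' 'one fish' 'two fish' 'red fish fish' 'blue fish')

deleted=$(printf '%s\n' "$input" | sed '2,3d')             # drop lines 2-3
first_only=$(printf '%s\n' "$input" | sed 's/fish/cat/')   # first match on each line
global=$(printf '%s\n' "$input" | sed 's/fish/cat/g')      # every match on each line
by_pattern=$(printf '%s\n' "$input" | sed '/two/,/red/d')  # from "two" through "red"

printf '%s\n' '--- 2,3d ---' "$deleted"
printf '%s\n' '--- s/fish/cat/ ---' "$first_only"
printf '%s\n' '--- s/fish/cat/g ---' "$global"
printf '%s\n' '--- /two/,/red/d ---' "$by_pattern"
```

The third input line shows the difference: without g it becomes "red cat fish", with g it becomes "red cat cat".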

sed - Examples
•  Take the time to create the abc.txt file below and try out the examples.

abc.txt:
abc
def
ghi
jkl
mno
pqr
stu
vwx
yz

Command: sed '2,4d' abc.txt
Output:
abc
mno
pqr
stu
vwx
yz

Command: sed 's/abc/123/' abc.txt
Output:
123
def
ghi
jkl
mno
pqr
stu
vwx
yz

awk
•  awk is a command/scripting language that splits text into records and fields, which you can select and display like an ad hoc database. You can manipulate these fields or records in many ways before they are displayed.
•  syntax:
–  using stdin: cat file | awk 'command'
–  using files: awk 'command' file
•  concepts:
–  Fields:
•  fields are separated by white space, or by regex FS.
•  The fields are denoted $1, $2, ..., while $0 refers to the entire line.
•  If FS is null, the input line is split into one field per character.
–  Records:
•  records are separated by \n (new line), or by regex RS.


awk
•  A pattern-action statement has the form:
pattern {action}

•  A missing {action} means print the line


•  A missing pattern always matches.

•  Pattern-action statements are separated by newlines or semicolons.


There are three separate action blocks:

BEGIN {action}
{action}
END {action}
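
The three kinds of block can be sketched together in one short program. The name/score input and the threshold of 80 are assumptions made up for this example.

```shell
#!/bin/bash
# Hypothetical input: a name and a score per line.
scores=$(printf '%s\n' 'ann 90' 'bob 72' 'cat 85')

report=$(printf '%s\n' "$scores" | awk '
BEGIN    { print "name score" }         # runs once, before any input is read
$2 >= 80 { print $1, $2; passed++ }     # pattern {action}: only records with score >= 80
END      { print "passed:", passed }    # runs once, after the last record
')

echo "$report"
```

The middle statement uses a pattern ($2 >= 80) in front of its action, so the action runs only for records where the pattern is true.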


Simple awk example


alpha.txt:
alpha beta gamma
delta epsilon phi

Command: awk '{print $1}' alpha.txt
Output:
alpha
delta

Command: awk '{print $1, $3}' alpha.txt
Output:
alpha gamma
delta phi


awk - built in variables


•  awk has some built-in variables that are useful (more exist, and they vary by platform)
–  NF – number of fields in the current record
–  NR – ordinal number of the current record
–  FS – regular expression used to separate fields; also settable by option -Ffs (default whitespace)
–  RS – input record separator (default newline)
–  OFS – output field separator (default a single space)
–  ORS – output record separator (default newline)

alpha.txt:
alpha beta gamma
delta epsilon phi

Command: awk '{OFS=",";print $1, $3}' alpha.txt
Output:
alpha,gamma
delta,phi

Command: awk -Fa '{print $2}' alpha.txt
Output:
lph
 epsilon phi

With "a" as the field separator, the first line splits into "", "lph", " bet", and so on, so $2 is "lph". On the second line $2 is everything after the only "a", including the leading space.
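
NR and NF are not exercised by the examples above, so here is a minimal sketch of them alongside -F and OFS. The colon-separated input is hypothetical.

```shell
#!/bin/bash
# Hypothetical colon-separated input with a varying number of fields.
data=$(printf '%s\n' 'a:b:c' 'd:e' 'f:g:h:i')

# -F: sets FS to ":"; NR is the record number, NF the field count,
# and $NF is the last field of the record.
summary=$(printf '%s\n' "$data" | awk -F: '{ print NR, NF, $NF }')

# Setting OFS in a BEGIN block applies it from the first record onward.
joined=$(printf '%s\n' "$data" | awk -F: 'BEGIN { OFS="-" } { print $1, $2 }')

echo "$summary"
echo "$joined"
```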

awk - statements
•  An action is a sequence of statements. A statement can be one of
the following:
–  if (expression) statement [ else statement ]
–  while (expression) statement
–  for (expression ; expression ; expression) statement
–  for (var in array) statement
–  do statement while (expression)

alpha.txt:
alpha beta gamma
delta epsilon phi

Command: awk '{if (NR > 1) print $2}' alpha.txt
Output:
epsilon

Command: awk '{if ($1 == "alpha") print}' alpha.txt
Output:
alpha beta gamma
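
The two for forms from the statement list can be sketched with a word count. The input text and the array name count are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical input: words, some repeated.
text=$(printf '%s\n' 'alpha beta alpha' 'beta alpha gamma')

# for (i = 1; i <= NF; i++) walks the fields of each record;
# count[] is an associative array keyed by word.
counts=$(printf '%s\n' "$text" | awk '
{ for (i = 1; i <= NF; i++) count[$i]++ }
END {
    # for (var in array) visits keys in an unspecified order,
    # so the output is piped through sort below.
    for (word in count) print word, count[word]
}' | sort)

echo "$counts"
```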

awk - variables
•  Using variables:
–  You can assign the built-in fields $1, $2, $3, … to your own variables in the action block.

alpha.txt:
alpha beta gamma
delta epsilon phi

Command: awk '{if (NR == 1) a=$1; else b=$1}END{print a, b}' alpha.txt
Output:
alpha delta

Command: awk '{if ($1 == "alpha") a=123; else b=456} END{print a " + " b}' alpha.txt
Output:
123 + 456

Command: awk '{if ($1 ~ /[a-z]/) sum+=1}END{print "Total: " sum}' alpha.txt
Output:
Total: 2

The ~ operator tests a field against a regex, whereas == compares it with a literal string.


awk - mathematics
The operators in AWK:

+ addition, - subtraction, * multiplication, / division, % modulus, and ^ exponentiation.

Assignment: = += -= *= /= %= ^=
•  Both absolute assignment (var = value) and operator-assignment (the other forms) are supported.

Trigonometric functions: cos(), sin()

Roots: sqrt()
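
A small sketch of these operators and sqrt(). The input (two legs of a right triangle per line) is invented for the example.

```shell
#!/bin/bash
# Hypothetical input: the two legs of a right triangle per line.
legs=$(printf '%s\n' '3 4' '5 12')

# ^ is exponentiation and sqrt() the square root.
hyp=$(printf '%s\n' "$legs" | awk '{ print sqrt($1^2 + $2^2) }')

# += accumulates across records; % is the modulus operator.
stats=$(printf '%s\n' "$legs" | awk '{ sum += $1 + $2 } END { print sum, sum % 5 }')

echo "$hyp"
echo "$stats"
```

Variables such as sum need no declaration; awk treats an unset variable as 0 in arithmetic.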


awk - formatted printing


•  awk supports printf with the standard C-style format specifiers
•  syntax: printf("format", expression list)

ps S -o pid,nlwp,%mem,rss,vsz,%cpu,cputime,args --forest -u $USER |\
awk '{pmem+=$3;rss+=$4;vsz+=$5; print $0}
END{printf("MEM SUM: %4.1f%% %3.1fGB %3.1fGB \n", pmem,rss/1024/1024,vsz/1024/1024)}'

PID NLWP %MEM RSS VSZ %CPU TIME COMMAND
27536 1 0.0 2052 99920 0.0 00:00:00 sshd: syockel@pts/86
27548 1 0.0 2044 120932 0.3 00:00:00 \_ -bash
22905 1 0.0 1252 106100 0.0 00:00:00 \_ /bin/bash ./ps.sh
22908 1 0.0 1156 122668 6.0 00:00:00 \_ ps S -o pid,nlwp,
22909 1 0.0 896 105956 0.0 00:00:00 \_ awk {pmem+=$3;rss
26570 1 0.0 2008 99920 0.0 00:00:00 sshd: syockel@pts/81
26587 1 0.0 2052 120932 0.0 00:00:00 \_ -bash
24831 1 0.0 5088 149524 0.0 00:00:00 \_ vim user_chk.sh
MEM SUM: 0.0% 0.0GB 0.9GB

The final MEM SUM line is the text created by printf in the END block. ps reports RSS and VSZ in kilobytes, so dividing by 1024 twice converts the sums to gigabytes.
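
A smaller sketch of printf format specifiers that does not depend on the output of ps. The file names and sizes are hypothetical.

```shell
#!/bin/bash
# Hypothetical input: a file name and its size in KiB per line.
sizes=$(printf '%s\n' 'a.dat 1048576' 'b.dat 524288')

# %-8s left-justifies a string in 8 columns; %6.2f prints a float
# in 6 columns with 2 decimals; %% prints a literal percent sign.
table=$(printf '%s\n' "$sizes" | awk '
{ total += $2; printf("%-8s %6.2f GiB\n", $1, $2/1024/1024) }
END { printf("TOTAL    %6.2f GiB (100%%)\n", total/1024/1024) }')

echo "$table"
```

Unlike print, printf adds no newline of its own, which is why each format string ends in \n.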


Shell Script Basics


•  To take advantage of cluster compute, you can predefine your commands in a shell script file to be executed by a job scheduler.
–  bash: Bourne-again shell
–  csh: C-like shell
–  zsh: shell for modern times

#!/bin/bash
# The sha-bang (#!) line above defines the shell that runs the script.

# Setting vars ("#" comments out the remainder of the line)
# Assign variables using "=" with no surrounding spaces, as either string or integer.
var1=input.txt
dir1=test.d

# Executing commands (use a variable with "$")
echo "Var 1 is set to: $var1"
cd "$dir1"
pwd
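
A minimal sketch of string and integer variables, the first two items under "variable assignment" in the objectives. The variable names and values are assumptions for illustration.

```shell
#!/bin/bash
# No spaces are allowed around "=" in an assignment.
name="input file.txt"     # string; quotes are needed because of the space
count=3                   # integer (stored as text, usable in arithmetic)

# $(( )) evaluates integer arithmetic.
total=$((count * 2 + 1))

# ${var} braces separate the variable name from adjacent text.
backup="${name}.bak"

# Command substitution $( ) captures the output of a command.
upper=$(echo "$name" | tr 'a-z' 'A-Z')

echo "name=$name total=$total backup=$backup upper=$upper"
```

Quoting "$name" when it is used keeps the value as a single word even though it contains a space.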


Shell Script Basics


•  If a string contains whitespace, it must be enclosed in quotes.

#!/bin/bash

# Setting vars
var1="1.txt 2.txt 3.txt 4.txt"   # string variable

# For loop
# $var1 is left unquoted on purpose so the shell splits it on whitespace,
# looping through each element in the string.
for i in $var1 ; do
    echo $i
done


Shell Script Basics


•  Bash allows array variables

#!/bin/bash

j=0
for i in {01..05} ; do     # { } defines a range
    j=$((j+1))             # increment j
    alpha[$j]=$i           # use j to index alpha array
    echo ${alpha[*]}       # print all elements of alpha array
done
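
A few more array operations, sketched with hypothetical file names. This block relies on bash arrays and will not run under plain sh.

```shell
#!/bin/bash
# Hypothetical file names, one containing a space.
files=("run 1.txt" "run2.txt" "run3.txt")

files+=("run4.txt")            # append an element
n=${#files[@]}                 # number of elements

# Quoting "${files[@]}" keeps each element intact, spaces included.
listing=""
for f in "${files[@]}"; do
    listing+="[$f]"
done

# ${!files[*]} expands to the indices (0-based by default).
indices="${!files[*]}"

echo "$n elements: $listing (indices: $indices)"
```

Arrays built with ( ) start at index 0. The slide's loop increments j before using it, so its alpha array starts at index 1.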


Questions ???

Scott Yockel, PhD
Harvard - Research Computing

SIGHPC: BigData
Supercomputing’16
