
UNIX II: grep, awk, sed

October 30, 2017
File searching and manipulation
• In many cases, you might have a file in which you need to find specific entries (for example, every occurrence of NaN in your datafile)
• Or you might want to reformat a long datafile (change the order of columns, or use only certain columns)
• This can be done by writing Python or other scripts; today we will use other UNIX tools
grep: global regular expression print
• Use it to search for a pattern and print the matching lines
• Amazingly useful! (a lot like Google)
grep
Basic syntax: >>> grep <pattern> <inputfile>

>>> grep Oklahoma one_week_eq.txt


2017-10-28T09:32:45.970Z,35.3476,-98.0622,5,2.7,mb_lg,,133,0.329,0.3,us,us1000ay0b,2017-10-28T09:47:05.040Z,"11km WNW of Minco,
Oklahoma",earthquake,1.7,1.9,0.056,83,reviewed,us,us
2017-10-28T04:08:45.890Z,36.2119,-97.2878,5,2.5,mb_lg,,41,0.064,0.32,us,us1000axz3,2017-10-28T04:22:21.040Z,"8km S of Perry,
Oklahoma",earthquake,1.4,2,0.104,24,reviewed,us,us

2017-10-27T18:39:28.100Z,36.4921,-98.7233,6.404,2.7,ml,,50,,0.41,us,us1000axpz,2017-10-28T02:02:23.625Z,"33km NW of Fairview,
Oklahoma",earthquake,1.3,2.6,,,reviewed,tul,tul

2017-10-27T10:00:07.430Z,36.2851,-97.506,5,2.8,mb_lg,,25,0.216,0.19,us,us1000axgi,2017-10-27T19:39:37.296Z,"19km W of Perry,
Oklahoma",earthquake,0.7,1.8,0.071,52,reviewed,us,us

2017-10-25T15:17:48.200Z,36.2824,-97.504,7.408,3.1,ml,,25,,0.23,us,us1000awq6,2017-10-25T21:38:59.678Z,"19km W of Perry,
Oklahoma",earthquake,1.1,5,,,reviewed,tul,tul
2017-10-25T11:05:21.940Z,35.4134,-97.0133,5,2.5,mb_lg,,157,0.152,0.31,us,us1000awms,2017-10-27T21:37:47.660Z,"7km ESE of McLoud,
Oklahoma",earthquake,1.7,2,0.117,19,reviewed,us,us
2017-10-25T01:50:53.100Z,36.9748,-99.4244,8.115,2.9,ml,,197,,0.64,us,us1000awir,2017-10-26T00:52:01.343Z,"23km NE of Buffalo,
Oklahoma",earthquake,2,7.6,,,reviewed,tul,tul
2017-10-24T23:18:09.000Z,35.3787,-98.0931,7.72,2.7,ml,,91,,0.49,us,us1000awhe,2017-10-26T00:47:37.010Z,"13km W of Union City,
Oklahoma",earthquake,2.4,5.7,,,reviewed,tul,tul

2017-10-23T15:57:10.890Z,36.6565,-97.8019,5,2.6,mb_lg,,39,0.2,0.15,us,us1000avxp,2017-10-23T18:30:47.642Z,"17km SSW of Medford,
Oklahoma",earthquake,1.2,1.8,0.132,15,reviewed,us,us
grep
• Lots of useful options available (read the man
page!)
• -w : look for a whole word
• -i : ignore case
• -v : omit matching lines
• -c : provide a count of matching lines
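For instance (a minimal sketch, assuming the one_week_eq.txt file from the example above):
>>> grep -c Oklahoma one_week_eq.txt
counts the matching lines, while
>>> grep -iv oklahoma one_week_eq.txt
prints only the lines that do not contain Oklahoma in any case.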
grep

What is a regular expression?
Regular Expression
• A set of characters that specifies a pattern
• Makes changing and searching for text easy, right from the command line
• Regular expressions are accepted input for grep, sed, awk, perl, vim and other unix commands
• It’s all about syntax… (and because it’s UNIX, it’s a little cryptic)
• http://www.regular-expressions.info/quickstart.html
Simple Regular Expression Symbols
Generally a good idea to surround a regular expression with single quotes on the command line to protect it from being interpreted by the shell.

• . (period) --- matches any single character
• B --- matches uppercase B
• b --- matches lowercase b
• * --- matches zero or more occurrences of the preceding character
• ^ --- matches the beginning of a line
• Example – search a file where # is used to comment lines
>>> grep ^# filename
Will pull out all the lines where # is the first character of the line
• $ --- matches the end of the line
Simple Regular Expression Symbols
• \ --- escapes the following symbol so it is matched literally
• [] --- matches any single character from the set or range within the brackets
• [^] --- matches anything except what’s in the brackets
• Non-printable characters:
• \t : for a tab character
• \r : for a carriage return
• \n : for a new line
• \s : for a white space character
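To illustrate the bracket syntax (a minimal sketch, again using the earthquake file from earlier):
>>> grep '^2017-10-2[3-5]' one_week_eq.txt
selects the records from October 23–25, and
>>> grep 'reviewed,us,us$' one_week_eq.txt
matches only the lines ending in reviewed,us,us.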
Sed – stream editor
• Command line tool for editing files line by line, largely used for substitution
• Like grep for searching, but it can replace the found pattern with something else
• Want to change every instance of mb to ml in my file?
>>> sed 's/mb/ml/' filename
Sed
>>> sed 's/mb/ml/' filename

• Basic structure for substitution:
• s --- the command that indicates substitution
• delimiter
• Can be anything you want; slash (/) is common, so is _ or : (see the sketch after this list)
• But if you need to search for something that contains a /, you will need to escape the slash with a backslash \
>>> sed 's/\/usr\/local\/bin/\/usr\/bin/' file
Will change /usr/local/bin to /usr/bin on lines in file that contain /usr/local/bin
• regular expression or pattern to search for
• replacement

• If you want the search and replace applied globally (every instance on each line), put “g” after the final delimiter. Otherwise sed replaces only the first instance found on each line
>>> sed 's/\/usr\/local\/bin/\/usr\/bin/g' file
• Sed uses regular expressions, same as grep
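As a quick illustration of the delimiter point (a sketch using the same hypothetical file as above), choosing a different delimiter avoids escaping every slash:
>>> sed 's_/usr/local/bin_/usr/bin_g' file
This performs exactly the same substitution as the escaped-slash version above.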
awk
• Programming language available on most Unix-like OS
• Developed in the 1970s (the name comes from the first letters of the developers’ last names)
• Useful for manipulating text files
• One of the most useful unix tools you can learn
• Also able to do floating point math
• Structured as a sequence of patterns and actions to perform when the patterns are found
• Used on text files: columns = fields; lines = records
awk vs nawk vs gawk
• Different versions exist
• awk – original
• nawk – “new awk”, version used on Macs as “awk”
• gawk – GNU awk, standard on linux, compatible with awk and nawk. Can access this on Macs as well – use “gawk” or set an alias for it
• A few minor differences in syntax between versions
Using awk
• Can call it from the command line:
>>> awk [options] '{commands}' variables infile
>>> awk -f scriptfile variables infile

• Or create an executable awk script
• File contains:
#!/usr/bin/awk -f
some set of commands
>>> chmod +x test.awk
>>> ./test.awk
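For example, a minimal test.awk (hypothetical contents, just a sketch) might be:

#!/usr/bin/awk -f
# print the first field (column) of every input line
{ print $1 }

>>> ./test.awk earthquake.txt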
awk and text
• awk commands are applied to every record (= line) of a file
• it is designed to separate the data in each line into fields (= columns)
• essentially, each field becomes a member of an array, so that the first field is $1, the second field $2, the third field $3 …
• $0 refers to the entire record
awk: Field separators
• the default field separator is one or more white spaces
$1 $2 $3 $4 $5 $6 $7 $8 $9 $10 $11
1 1918 9 22 9 54 49.29 -1.698 98.298 15.0 ehb
• the field separator may be modified by resetting the FS built-in variable
• Example: in a file like /etc/passwd the separator is “:”, so reset it (see the sketch below)
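A minimal sketch of resetting FS (assuming /etc/passwd, as in the print examples below):
>>> awk 'BEGIN { FS = ":" } { print $1 }' /etc/passwd
or, equivalently, using the -F option on the command line:
>>> awk -F":" '{ print $1 }' /etc/passwd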
awk - print
• One of the most common commands used in awk scripts is print
• awk is not sensitive to white space in the commands, so the two fields below run together in the output:
>>> awk -F":" '{ print $1 $3 }' /etc/passwd
nobody-2

• two solutions to this:
>>> awk -F":" '{ print $1 " " $3 }' /etc/passwd
>>> awk -F":" '{ print $1, $3 }' /etc/passwd
nobody -2
• any string or numeric text can be explicitly output by enclosing it in double quotes ""
Assume a starting file like so:


1 1 1918 9 22 9 54 49.29 -1.698 98.298 15.0 0.0 0.0 ehb FEQ x

>>> awk '{print "latitude:",$9,"longitude:",$10,"depth:",$11}' earthquake.txt


latitude: -1.698 longitude: 98.298 depth: 15.0
latitude: 9.599 longitude: 92.802 depth: 30.0
latitude: 4.003 longitude: 94.545 depth: 20.0

• you can specify a newline in two ways:
>>> awk '{print "latitude:",$9; print "longitude:",$10}' earthquake.txt
>>> awk '{print "latitude:",$9 "\nlongitude:",$10}' earthquake.txt

latitude: -1.698
longitude: 98.298
awk and if
• If statements are very useful in awk:
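A minimal sketch of an if statement (assuming the earthquake.txt file above, where $11 is the depth):
>>> awk '{ if ($11 > 20.0) print "deep:", $0; else print "shallow:", $0 }' earthquake.txt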
awk and math
• Big advantage – it does floating point math (remember bash does not)
• it stores all variables as strings, but when math operators are applied, it converts the
strings to floating point numbers if the string consists of numeric characters
• All basic arithmetic is left to right associative

• + : addition
• - : subtraction
• * : multiplication
• / : division
• % : remainder or modulus
• ^ : exponent
• other standard C programming operators
• Assignment operators
• = : set variable equal to value on right
• += : set variable equal to itself plus the value on right
• -= : set variable equal to itself minus the value on right
• *= : set variable equal to itself times the value on right
• /= : set variable equal to itself divided by value on right
• %= : set variable equal to the remainder of itself divided by the value on the right
• ^= : set variable equal to itself raised to the power of the value on the right
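A quick sketch combining arithmetic and an assignment operator (assuming the earthquake.txt file above, with depth in km in $11):
>>> awk '{ d = $11; d *= 1000; print "depth in m:", d }' earthquake.txt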
awk relational operators
• Returns 1 if true and 0 if false
• All relational operators are left to right associative

• < : test for less than
• <= : test for less than or equal to
• > : test for greater than
• >= : test for greater than or equal to
• == : test for equal to
• != : test for not equal
awk logical operators
• Boolean operators return 1 for true and 0 for false
• && : logical AND; tests that both expressions are true
• left to right associative

• || : logical OR; tests that one or both of the expressions are true
• left to right associative
• ! : logical negation; true when the expression is false (reverses the truth value)
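As a sketch combining relational and logical operators (again assuming earthquake.txt with depth in $11):
>>> awk '$11 >= 10.0 && $11 <= 30.0 { print $0 }' earthquake.txt
prints only the records whose depth is between 10 and 30 km.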
Useful awk built-in variables
• FS: Field Separator (separates columns)
• NR: record number (line number)
• OFS : output field separator
• Default is whitespace
• ORS : output record separator
• Default is \n (newline)
• OFMT : output format for numbers
• NF : number of fields in the current record
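A minimal sketch using a few of these built-ins (assuming /etc/passwd as before):
>>> awk -F":" 'BEGIN { OFS = "," } { print NR, NF, $1 }' /etc/passwd
prints the line number, the number of fields, and the first field of each line, separated by commas.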
Using variables in awk
• 1. Assign the shell variables to awk variables after the body
of the script, but before you specify the input file
awk '{print v1, v2, NF, NR}' v1=$VAR1 file1 v2=$VAR2 file2

Or

• 2. Use the -v switch to assign the shell variables to awk variables.
awk -v v1=$VAR1 -v v2=$VAR2 '{print v1, v2}' input_file
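For example (a sketch with a hypothetical shell variable):
>>> MINDEPTH=10
>>> awk -v d="$MINDEPTH" '$11 >= d { print $0 }' earthquake.txt
prints only the records whose depth ($11) is at least $MINDEPTH km.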
More awk …
• Developing more complex programs in awk
• Use of for loops, while loops, if/then/else
• Format output
• Define functions
• Matching regular expressions

• Worth spending time exploring websites/books on awk functionality – it will likely become one of your most used tools.
