
UNIX II: grep, awk, sed

October 30, 2017
File searching and manipulation
• In many cases, you might have a file in which you need to find specific entries (for example, every occurrence of NaN in your datafile)
• Or you might want to reformat a long datafile (change the order of columns, or use only certain columns)
• This can be done by writing Python or other scripts; today we will use other UNIX tools
grep: global regular expression print
• Use it to search for a pattern and print the matching lines
• Amazingly useful! (a lot like Google)
grep
Basic syntax: >>> grep <pattern> <inputfile>

>>> grep Oklahoma one_week_eq.txt


2017-10-28T09:32:45.970Z,35.3476,-98.0622,5,2.7,mb_lg,,133,0.329,0.3,us,us1000ay0b,2017-10-28T09:47:05.040Z,"11km WNW of Minco,
Oklahoma",earthquake,1.7,1.9,0.056,83,reviewed,us,us
2017-10-28T04:08:45.890Z,36.2119,-97.2878,5,2.5,mb_lg,,41,0.064,0.32,us,us1000axz3,2017-10-28T04:22:21.040Z,"8km S of Perry,
Oklahoma",earthquake,1.4,2,0.104,24,reviewed,us,us

2017-10-27T18:39:28.100Z,36.4921,-98.7233,6.404,2.7,ml,,50,,0.41,us,us1000axpz,2017-10-28T02:02:23.625Z,"33km NW of Fairview,
Oklahoma",earthquake,1.3,2.6,,,reviewed,tul,tul

2017-10-27T10:00:07.430Z,36.2851,-97.506,5,2.8,mb_lg,,25,0.216,0.19,us,us1000axgi,2017-10-27T19:39:37.296Z,"19km W of Perry,
Oklahoma",earthquake,0.7,1.8,0.071,52,reviewed,us,us

2017-10-25T15:17:48.200Z,36.2824,-97.504,7.408,3.1,ml,,25,,0.23,us,us1000awq6,2017-10-25T21:38:59.678Z,"19km W of Perry,
Oklahoma",earthquake,1.1,5,,,reviewed,tul,tul
2017-10-25T11:05:21.940Z,35.4134,-97.0133,5,2.5,mb_lg,,157,0.152,0.31,us,us1000awms,2017-10-27T21:37:47.660Z,"7km ESE of McLoud,
Oklahoma",earthquake,1.7,2,0.117,19,reviewed,us,us
2017-10-25T01:50:53.100Z,36.9748,-99.4244,8.115,2.9,ml,,197,,0.64,us,us1000awir,2017-10-26T00:52:01.343Z,"23km NE of Buffalo,
Oklahoma",earthquake,2,7.6,,,reviewed,tul,tul
2017-10-24T23:18:09.000Z,35.3787,-98.0931,7.72,2.7,ml,,91,,0.49,us,us1000awhe,2017-10-26T00:47:37.010Z,"13km W of Union City,
Oklahoma",earthquake,2.4,5.7,,,reviewed,tul,tul

2017-10-23T15:57:10.890Z,36.6565,-97.8019,5,2.6,mb_lg,,39,0.2,0.15,us,us1000avxp,2017-10-23T18:30:47.642Z,"17km SSW of Medford,
Oklahoma",earthquake,1.2,1.8,0.132,15,reviewed,us,us
grep
• Lots of useful options available (read the man
page!)
• -w : look for a whole word
• -i : ignore case
• -v : omit matching lines
• -c : provide a count of matching lines
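For instance (a minimal sketch, assuming the one_week_eq.txt file from the example above):
>>> grep -c Oklahoma one_week_eq.txt
counts the matching lines, while
>>> grep -iv oklahoma one_week_eq.txt
prints only the lines that do not contain Oklahoma in any case.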
grep

What is a regular expression?
Regular Expression
• A set of characters that specifies a pattern
• Makes changing and searching for text easy, right from the command line
• Regular expressions are accepted input for grep, sed, awk, perl, vim and other unix commands
• It’s all about syntax… (and because it’s UNIX, it’s a little cryptic)
• http://www.regular-expressions.info/quickstart.html
Simple Regular Expression Symbols
Generally a good idea to surround a regular expression with single quotes on the command line to protect it from being interpreted by the shell.

• . (period) --- matches any single character
• B --- matches uppercase B
• b --- matches lowercase b
• * --- matches zero or more occurrences of the preceding character
• ^ --- matches the beginning of a line
• Example – search a file where # is used to comment lines
>>> grep ^# filename
Will pull out all the lines where # is the first character of the line
• $ --- matches the end of the line
Simple Regular Expression Symbols
• \ --- escapes the following symbol so it is matched literally
• [] --- matches any single character from the set or range within the brackets
• [^] --- matches anything except what’s in the brackets
• Non-printable characters:
• \t : for a tab character
• \r : for a carriage return
• \n : for a new line
• \s : for a white space character
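To illustrate the bracket syntax (a minimal sketch, again using the earthquake file from earlier):
>>> grep '^2017-10-2[3-5]' one_week_eq.txt
selects the records from October 23–25, and
>>> grep 'reviewed,us,us$' one_week_eq.txt
matches only the lines ending in reviewed,us,us.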
Sed – stream editor
• Command line tool for editing files line by line, largely used for substitution
• Like grep for searching, but it can replace the found pattern with something else
• Want to change every instance of mb to ml in my file?
>>> sed 's/mb/ml/' filename
Sed
>>> sed 's/mb/ml/' filename

• Basic structure for substitution:
• s --- the command that indicates substitution
• delimiter
• Can be anything you want; slash (/) is common, so is _ or : (see the sketch after this list)
• But if you need to search for something that contains a /, you will need to escape the slash with a backslash \
>>> sed 's/\/usr\/local\/bin/\/usr\/bin/' file
Will change /usr/local/bin to /usr/bin on lines in file that contain /usr/local/bin
• regular expression or pattern to search for
• replacement

• If you want the search and replace applied globally (every instance on each line), put “g” after the final delimiter. Otherwise sed replaces only the first instance found on each line
>>> sed 's/\/usr\/local\/bin/\/usr\/bin/g' file
• Sed uses regular expressions, same as grep
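As a quick illustration of the delimiter point (a sketch using the same hypothetical file as above), choosing a different delimiter avoids escaping every slash:
>>> sed 's_/usr/local/bin_/usr/bin_g' file
This performs exactly the same substitution as the escaped-slash version above.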
awk
• Programming language available on most Unix-like OS
• Developed in the 1970s (the name comes from the first letters of the developers’ last names)
• Useful for manipulating text files
• One of the most useful unix tools you can learn
• Also able to do floating point math
• Structured as a sequence of patterns and actions to perform when the patterns are found
• Used on text files: columns = fields; lines = records
awk vs nawk vs gawk
• Different versions exist
• awk – original
• nawk – “new awk”, version used on Macs as “awk”
• gawk – GNU awk, standard on linux, compatible with awk and nawk. Can access this on Macs as well – use “gawk” or set an alias for it
• A few minor differences in syntax between versions
Using awk
• Can call it from the command line:
>>> awk [options] '{commands}' variables infile
>>> awk -f scriptfile variables infile

• Or create an executable awk script
• File contains:
#!/usr/bin/awk -f
some set of commands
>>> chmod +x test.awk
>>> ./test.awk
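For example, a minimal test.awk (hypothetical contents, just a sketch) might be:

#!/usr/bin/awk -f
# print the first field (column) of every input line
{ print $1 }

>>> ./test.awk earthquake.txt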
awk and text
• awk commands are applied to every record (= line) of a file
• it is designed to separate the data in each line into fields (= columns)
• essentially, each field becomes a member of an array, so that the first field is $1, the second field $2, the third field $3 …
• $0 refers to the entire record
awk: Field separators
• the default field separator is one or more white spaces
$1 $2 $3 $4 $5 $6 $7 $8 $9 $10 $11
1 1918 9 22 9 54 49.29 -1.698 98.298 15.0 ehb
• the field separator may be modified by resetting the FS built-in variable
• Example: in a file like /etc/passwd the separator is “:”, so reset it (see the sketch below)
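A minimal sketch of resetting FS (assuming /etc/passwd, as in the print examples below):
>>> awk 'BEGIN { FS = ":" } { print $1 }' /etc/passwd
or, equivalently, using the -F option on the command line:
>>> awk -F":" '{ print $1 }' /etc/passwd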
awk - print
• One of the most common commands used in awk scripts is print
• awk is not sensitive to white space in the commands, so the two fields below run together in the output:
>>> awk -F":" '{ print $1 $3 }' /etc/passwd
nobody-2

• two solutions to this:
>>> awk -F":" '{ print $1 " " $3 }' /etc/passwd
>>> awk -F":" '{ print $1, $3 }' /etc/passwd
nobody -2
• any string or numeric text can be explicitly output by enclosing it in double quotes ""
Assume a starting file like so:


1 1 1918 9 22 9 54 49.29 -1.698 98.298 15.0 0.0 0.0 ehb FEQ x

>>> awk '{print "latitude:",$9,"longitude:",$10,"depth:",$11}' earthquake.txt


latitude: -1.698 longitude: 98.298 depth: 15.0
latitude: 9.599 longitude: 92.802 depth: 30.0
latitude: 4.003 longitude: 94.545 depth: 20.0

• you can specify a newline in two ways:
>>> awk '{print "latitude:",$9; print "longitude:",$10}' earthquake.txt
>>> awk '{print "latitude:",$9 "\nlongitude:",$10}' earthquake.txt

latitude: -1.698
longitude: 98.298
awk and if
• If statements are very useful in awk:
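A minimal sketch of an if statement (assuming the earthquake.txt file above, where $11 is the depth):
>>> awk '{ if ($11 > 20.0) print "deep:", $0; else print "shallow:", $0 }' earthquake.txt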
awk and math
• Big advantage – it does floating point math (remember bash does not)
• it stores all variables as strings, but when math operators are applied, it converts the
strings to floating point numbers if the string consists of numeric characters
• All basic arithmetic is left to right associative

• + : addition
• - : subtraction
• * : multiplication
• / : division
• % : remainder or modulus
• ^ : exponent
• other standard C programming operators
• Assignment operators
• = : set variable equal to value on right
• += : set variable equal to itself plus the value on right
• -= : set variable equal to itself minus the value on right
• *= : set variable equal to itself times the value on right
• /= : set variable equal to itself divided by value on right
• %= : set variable equal to the remainder of itself divided by the value on the right
• ^= : set variable equal to itself raised to the power of the value on the right
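A quick sketch combining arithmetic and an assignment operator (assuming the earthquake.txt file above, with depth in km in $11):
>>> awk '{ d = $11; d *= 1000; print "depth in m:", d }' earthquake.txt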
awk relational operators
• Returns 1 if true and 0 if false
• All relational operators are left to right associative

• < : test for less than
• <= : test for less than or equal to
• > : test for greater than
• >= : test for greater than or equal to
• == : test for equal to
• != : test for not equal
awk logical operators
• Boolean operators return 1 for true and 0 for false
• && : logical AND; tests that both expressions are true
• left to right associative

• || : logical OR; tests that one or both of the expressions are true
• left to right associative
• ! : logical negation; true when the expression is false (reverses the truth value)
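As a sketch combining relational and logical operators (again assuming earthquake.txt with depth in $11):
>>> awk '$11 >= 10.0 && $11 <= 30.0 { print $0 }' earthquake.txt
prints only the records whose depth is between 10 and 30 km.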
Useful awk built-in variables
• FS: Field Separator (separates columns)
• NR: record number (line number)
• OFS : output field separator
• Default is whitespace
• ORS : output record separator
• Default is \n (newline)
• OFMT : output format for numbers
• NF : number of fields in the current record
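A minimal sketch using a few of these built-ins (assuming /etc/passwd as before):
>>> awk -F":" 'BEGIN { OFS = "," } { print NR, NF, $1 }' /etc/passwd
prints the line number, the number of fields, and the first field of each line, separated by commas.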
Using variables in awk
• 1. Assign the shell variables to awk variables after the body
of the script, but before you specify the input file
awk '{print v1, v2, NF, NR}' v1=$VAR1 file1 v2=$VAR2 file2

Or

• 2. Use the -v switch to assign the shell variables to awk variables.
awk -v v1=$VAR1 -v v2=$VAR2 '{print v1, v2}' input_file
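For example (a sketch with a hypothetical shell variable):
>>> MINDEPTH=10
>>> awk -v d="$MINDEPTH" '$11 >= d { print $0 }' earthquake.txt
prints only the records whose depth ($11) is at least $MINDEPTH km.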
More awk …
• Developing more complex programs in awk
• Use of for loops, while loops, if/then/else
• Format output
• Define functions
• Matching regular expressions

• Worth spending time exploring websites/books on awk functionality – it will likely become one of your most used tools.
