Linux Programming

% sed -n -e '1,50p' datafile
% head -50 datafile
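For comparison, awk (introduced below) can make the same selection with its record counter NR. This is a minimal sketch that assumes a POSIX awk; the helper name `first50` is hypothetical:

```shell
# Print the first 50 lines of the input (a file name or standard input).
# Hypothetical helper name: first50.
first50() {
    # The first rule has no pattern, so it prints every record it reads.
    # The second rule stops reading input once record number (NR) 50 is done.
    awk '{ print } NR == 50 { exit }' "$@"
}
```

The `exit` is only an optimization. It stops awk from reading the rest of a large file once line 50 has been printed.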

AWK
WHAT IS AWK?
- created by Aho, Weinberger, and Kernighan
- a scripting language used for manipulating data and generating reports
- versions of awk:
  - awk, nawk, mawk, pgawk, …
  - GNU awk: gawk
What can you do with awk?
- awk operation:
  - scans a file line by line
  - splits each input line into fields
  - compares the input line/fields to a pattern
  - performs action(s) on matched lines
- Useful for:
  - transforming data files
  - producing formatted reports
- Programming constructs:
  - formatted output lines
  - arithmetic and string operations
  - conditionals and loops

CREC,Dept.Of MCA Page 71



The Command: awk

Basic awk Syntax
  awk [options] 'script' file(s)
  awk [options] -f scriptfile file(s)
Options:
  -F  to change the input field separator
  -f  to name a script file
Basic awk Program
- consists of patterns and actions:
    pattern {action}
  - if the pattern is missing, the action is applied to all lines
  - if the action is missing, the matched line is printed
  - a rule must have either a pattern or an action
Example:
  awk '/for/' testfile
- prints all lines containing the string "for" in testfile
BASIC TERMINOLOGY: INPUT FILE
- A field is a unit of data in a line
- Each field is separated from the other fields by the field separator
  - the default field separator is whitespace
- A record is the collection of fields in a line
- A data file is made up of records


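You can try the three rule forms described above directly on whitespace-separated records. This is a minimal sketch; the helper names are hypothetical, and `$1`/`$2` are the first and second fields of each record:

```shell
# Pattern only: matching records are printed unchanged.
pattern_only() { awk '/for/' "$@"; }

# Action only: the action runs for every record ($2 is the second field).
action_only() { awk '{ print $2 }' "$@"; }

# Pattern and action: the action runs only on records that match.
pattern_and_action() { awk '/for/ { print $1 }' "$@"; }
```

For the records `a for`, `b bar`, and `c forum`, the three helpers print `a for`/`c forum`, then `for`/`bar`/`forum`, then `a`/`c`.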
Example Input File

Buffers
- awk supports two types of buffers: record and field
  - field buffers:
    - one for each field in the current record
    - names: $1, $2, …
  - record buffer:
    - $0 holds the entire record
Some System Variables
  FS        Field separator (default = whitespace)
  RS        Record separator (default = \n)
  NF        Number of fields in the current record
  NR        Number of the current record
  OFS       Output field separator (default = space)
  ORS       Output record separator (default = \n)
  FILENAME  Current filename
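The following sketch prints several system variables for every record. It assumes a single input file, and the helper name `show_vars` is hypothetical:

```shell
# For each record, print FILENAME, NR, NF, and the last field ($NF).
# OFS is set to ":" in BEGIN, so the comma-separated print arguments
# are joined with ":" instead of the default single space.
show_vars() {
    awk 'BEGIN { OFS = ":" } { print FILENAME, NR, NF, $NF }' "$1"
}
```

For the first record of `emps`, `show_vars emps` would print `emps:1:5:543354`.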
Example: Records and Fields
% cat emps
Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000
Billy Black 1683 9/23/44 336500
% awk '{print NR, $0}' emps
1 Tom Jones 4424 5/12/66 543354
2 Mary Adams 5346 11/4/63 28765
3 Sally Chang 1654 7/22/54 650000
4 Billy Black 1683 9/23/44 336500
Example: Space as Field Separator
% cat emps
Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000
Billy Black 1683 9/23/44 336500
% awk '{print NR, $1, $2, $5}' emps
1 Tom Jones 543354
2 Mary Adams 28765
3 Sally Chang 650000
4 Billy Black 336500
Example: Colon as Field Separator
% cat em2
Tom Jones:4424:5/12/66:543354
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:650000
Billy Black:1683:9/23/44:336500
% awk -F: '/Jones/{print $1, $2}' em2
Tom Jones 4424

AWK SCRIPTS
- awk scripts are divided into three major parts: BEGIN, BODY, and END
- comment lines start with #

awk Scripts
- BEGIN: pre-processing
  - performs processing that must be completed before file processing starts (i.e., before awk starts reading records from the input file)
  - useful for initialization tasks such as initializing variables and creating report headings
- BODY: processing
  - contains the main processing logic to be applied to input records
  - works like a loop that processes input data one record at a time: if a file contains 100 records, the body is executed 100 times, once for each record
- END: post-processing
  - contains logic to be executed after all input data has been processed
  - logic such as printing a report's grand total belongs in this part of the script
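The three parts appear together in the short program below. This is a minimal sketch that assumes field 2 holds a number to be totaled; the helper name `report_total` is hypothetical:

```shell
report_total() {
    awk '
    # BEGIN: runs once, before the first record is read
    BEGIN { print "Report"; total = 0 }
    # BODY: runs once for every input record
    { total += $2 }
    # END: runs once, after the last record has been processed
    END { print "Total:", total, "(" NR " records)" }
    ' "$@"
}
```

For the lines `john 85`, `andrea 89`, and `jasper 84`, it prints `Report` followed by `Total: 258 (3 records)`.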
Pattern / Action Syntax

Categories of Patterns

Expression Pattern Types
- match
  - entire input record: regular expression enclosed by '/'s
  - explicit pattern-matching expressions: ~ (match), !~ (not match)
- expression operators
  - arithmetic
  - relational
  - logical
Example: match input record
% cat employees2
Tom Jones:4424:5/12/66:543354
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:650000
Billy Black:1683:9/23/44:336500
% awk -F: '/00$/' employees2
Sally Chang:1654:7/22/54:650000
Billy Black:1683:9/23/44:336500
Example: explicit match
% cat datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
% awk '$5 ~ /\.[7-9]+/' datafile
southwest SW Lewis Dalsass 2.7 .8 2 18
central CT Ann Stephens 5.7 .94 5 13
Examples: matching with REs
% awk '$2 !~ /E/{print $1, $2}' datafile
northwest NW
southwest SW
southern SO
north NO
central CT
% awk '/^[ns]/{print $1}' datafile
northwest
southwest
southern
southeast
northeast
north
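The right-hand side of `~` and `!~` may also be a string, so the regular expression can be supplied at run time. This hedged sketch uses `-v` to pass the pattern in; the helper names are hypothetical:

```shell
# Print field 1 of every record whose field 2 matches the regex given as the
# first argument. The regex is passed with -v, so it is a string used as a
# dynamic regular expression (backslashes in it are processed as escapes).
regions_matching() {
    re=$1; shift
    awk -v re="$re" '$2 ~ re { print $1 }' "$@"
}

# Same, but print field 1 when field 2 does NOT match.
regions_not_matching() {
    re=$1; shift
    awk -v re="$re" '$2 !~ re { print $1 }' "$@"
}
```

For example, `regions_matching '^N' datafile` prints `northwest`, `northeast`, and `north`.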

ARITHMETIC OPERATORS
Operator  Meaning         Example
+         Add             x + y
-         Subtract        x - y
*         Multiply        x * y
/         Divide          x / y
%         Modulus         x % y
^         Exponentiation  x ^ y
Example:
% awk '$3 * $4 > 500 {print $0}' file
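The sketch below applies all six operators to the first two fields of a record, and the second helper reuses the slide's own pattern. The helper names are hypothetical:

```shell
# Apply each arithmetic operator to fields 1 and 2.
arith_demo() {
    awk '{ print $1 + $2, $1 - $2, $1 * $2, $1 / $2, $1 % $2, $1 ^ $2 }' "$@"
}

# The pattern from the example above, printing only field 1 of matching lines.
big_products() {
    awk '$3 * $4 > 500 { print $1 }' "$@"
}
```

For the input `7 2`, `arith_demo` prints `9 5 14 3.5 1 49`.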

Relational Operators
Operator  Meaning                              Example
<         Less than                            x < y
<=        Less than or equal to                x <= y
==        Equal to                             x == y
!=        Not equal to                         x != y
>         Greater than                         x > y
>=        Greater than or equal to             x >= y
~         Matched by regular expression        x ~ /y/
!~        Not matched by regular expression    x !~ /y/

Logical Operators
Operator  Meaning       Example
&&        Logical AND   a && b
||        Logical OR    a || b
!         Logical NOT   !a
Examples:
% awk '($2 > 5) && ($2 <= 15) {print $0}' file
% awk '$3 == 100 || $4 > 50' file
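Here is a sketch of the first example above, plus its negation with `!`. The helper names are hypothetical:

```shell
# Both comparisons must be true (&&) for the record to be printed.
in_range() {
    awk '($2 > 5) && ($2 <= 15)' "$@"
}

# ! negates the whole condition, selecting the remaining records.
outside_range() {
    awk '!(($2 > 5) && ($2 <= 15))' "$@"
}
```

For the records `a 3`, `b 10`, `c 15`, and `d 20`, `in_range` prints `b 10` and `c 15`, and `outside_range` prints the other two.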

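RANGE PATTERNS
- Match ranges of consecutive input lines
Syntax:
  pattern1, pattern2 {action}
- each pattern can be any simple pattern
- pattern1 turns the action on; the line that matches pattern1 is included
- pattern2 turns the action off; the line that matches pattern2 is also included
- after a range ends, awk looks for pattern1 again, so several ranges can match
Range Pattern Example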
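This is a minimal sketch of a range pattern, assuming hypothetical marker lines `START` and `STOP`. No action is given, so matching lines are printed:

```shell
# Print every line from a START line through the next STOP line, inclusive.
between_markers() {
    awk '/START/, /STOP/' "$@"
}
```

With the input lines `x`, `START`, `y`, `STOP`, `z`, `START`, `w`, it prints `START`, `y`, `STOP`, `START`, `w`. The second range never sees `STOP`, so it runs to the end of the input.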


AWK ACTIONS

AWK EXPRESSIONS
- An expression is evaluated and returns a value
  - it consists of any combination of numeric and string constants, variables, operators, functions, and regular expressions
- Can involve variables
  - as part of expression evaluation
  - as the target of an assignment
awk variables
- A user can define any number of variables within an awk script
- The variables can be numbers, strings, or arrays
- Variable names start with a letter or underscore, followed by any letters, digits, and underscores
- Variables come into existence the first time they are referenced; therefore, they do not need to be declared before use
- An uninitialized variable has the null string "" as its string value and 0 as its numeric value
awk Variables
Format:
  variable = expression
Examples:
% awk '$1 ~ /Tom/ {wage = $3 * $4; print wage}' filename
% awk '$4 == "CA" {$4 = "California"; print $0}' filename
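The two short sketches below illustrate the points above. The first uses counters without declaring them, and the second assigns to a field, like the `CA` example. The helper names are hypothetical:

```shell
# sum and n are never declared; they start out as 0 the first time
# they are used in arithmetic.
average_of_field2() {
    awk '{ sum += $2; n++ }
         END { if (n > 0) print sum / n; else print "no records" }' "$@"
}

# Assigning to a field changes that field and rebuilds $0,
# joining the fields with OFS (a single space by default).
expand_state() {
    awk '$4 == "CA" { $4 = "California" } { print }' "$@"
}
```

For `a 10` and `b 20`, the first helper prints `15`; with no input, it prints `no records`. In the second helper, a rebuilt record uses single spaces even if the input used more, while untouched records are printed as they were read.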

awk assignment operators
=    assign the result of the right-hand-side expression to the left-hand-side variable
++   add 1 to the variable
--   subtract 1 from the variable
+=   assign the result of addition
-=   assign the result of subtraction
*=   assign the result of multiplication
/=   assign the result of division
%=   assign the result of modulo
^=   assign the result of exponentiation
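Applying each operator in turn shows how the value changes. This minimal sketch runs entirely in a `BEGIN` block, so it needs no input file. The helper name is hypothetical:

```shell
assignment_demo() {
    awk 'BEGIN {
        x = 10      # x is 10
        x += 5      # 15
        x -= 3      # 12
        x *= 2      # 24
        x /= 4      # 6
        x %= 4      # 2
        x ^= 3      # 8
        x++         # 9
        x--         # 8
        print x
    }'
}
```

It prints `8`.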
Awk example
- File: grades
    john 85 92 78 94 88
    andrea 89 90 75 90 86
    jasper 84 88 80 92 84
- awk script: average
    # average five grades
    { total = $2 + $3 + $4 + $5 + $6
      avg = total / 5
      print $1, avg }
- Run as:
    awk -f average grades
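The `average` script assumes exactly five grades per line. The hedged variant below averages however many grades a line has, using `NF` and a `for` loop (described later in these notes). The name `average_grades` is hypothetical:

```shell
average_grades() {
    awk '{
        total = 0
        for (i = 2; i <= NF; i++)    # fields 2..NF hold the grades
            total += $i
        if (NF > 1)                  # skip lines that have a name but no grades
            print $1, total / (NF - 1)
    }' "$@"
}
```

On the `grades` file, it prints `john 87.4`, `andrea 86`, and `jasper 85.6`.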
Output Statements
print     easy and simple output
printf    formatted output (similar to C printf)
sprintf   returns a formatted string (similar to C sprintf)
Function: print
- Writes to standard output
- Output is terminated by ORS
  - the default ORS is a newline
- If called with no parameter, it prints $0
- Printed parameters are separated by OFS
  - the default OFS is a single space
- Escape sequences for control characters are allowed in strings:
  - \n \f \a \t \\ …
print example
% awk '{print}' grades
john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84
% awk '{print $0}' grades
john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84
% awk '{print($0)}' grades
john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84
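You can see the effect of `OFS` and `ORS` on `print` by changing them in `BEGIN`. This is a minimal sketch with hypothetical helper names. The second function leaves out the comma, so its arguments are concatenated without any `OFS`:

```shell
# Fields joined with "," (OFS) and each record ended with ";" (ORS).
first_two_csv() {
    awk 'BEGIN { OFS = ","; ORS = ";" } { print $1, $2 }' "$@"
}

# No comma between the arguments: they are concatenated directly.
concat_first_two() {
    awk '{ print $1 $2 }' "$@"
}
```

On `grades`, `first_two_csv` prints `john,85;andrea,89;jasper,84;` with no trailing newline, and `concat_first_two` prints `john85`, `andrea89`, and `jasper84`.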
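Redirecting print output
- Print output goes to standard output unless redirected via:
    > "file"      (truncates the file when it is first opened)
    >> "file"     (appends to the file)
    | "command"   (pipes the output to the command)
- each file or command is opened only once
  - subsequent redirections to it append to the already open stream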
print Example
% awk '{print $1 , $2 > "file"}' grades
% cat file
john 85
andrea 89
jasper 84
% awk '{print $1,$2 | "sort"}' grades
andrea 89
jasper 84
john 85
% awk '{print $1,$2 | "sort -k 2"}' grades
jasper 84
john 85
andrea 89
% date
Wed Nov 19 14:40:07 CST 2008
% date | awk '{print "Month: " $2 "\nYear: " $6}'
Month: Nov
Year: 2008
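The "opened only once" rule for redirection matters when the output file name depends on the data. The sketch below assumes a writable directory passed as the first argument; `split_by_key` is a hypothetical helper:

```shell
# Write field 2 of each record to DIR/<field 1>.txt.
# Each output file is opened (and truncated) only on its first print;
# later prints to the same name append to the already open stream.
split_by_key() {
    dir=$1; shift
    awk -v dir="$dir" '{
        out = dir "/" $1 ".txt"   # build the file name before redirecting
        print $2 > out
    }' "$@"
}
```

Each `key.txt` receives all field-2 values for its key, because `>` truncates a file only when awk first opens it. With very many distinct keys, awk can run out of open files. Calling `close(out)` after each `print` avoids that, but `>>` is then needed, since reopening with `>` would truncate the file again.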
printf: Formatting output
Syntax:
  printf(format-string, var1, var2, …)
- works like C printf
- each format specifier in format-string requires an argument of matching type
Format specifiers
  %d, %i  decimal integer
  %c      single character
  %s      string of characters
  %f      floating-point number
  %o      octal number
  %x      hexadecimal number
  %e      floating-point number in scientific notation
  %%      a literal "%" character
Format specifier examples
Given: x = "A", y = 15, z = 2.3, and $1 = "Bob Smith"

Specifier  Statement and what it prints
%c         printf("The character is %c \n", x)
           output: The character is A
%d         printf("The boy is %d years old \n", y)
           output: The boy is 15 years old
%s         printf("My name is %s \n", $1)
           output: My name is Bob Smith
%f         printf("z is %5.3f \n", z)
           output: z is 2.300
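The four examples above, plus `%o`, `%x`, `%e`, and `%%`, appear together in the runnable sketch below. The values of `x`, `y`, and `z` match the ones given; `show_formats` is a hypothetical name:

```shell
show_formats() {
    awk 'BEGIN {
        x = "A"; y = 15; z = 2.3
        # character, integer, string, and float with width 5 and precision 3
        printf("%c|%d|%s|%5.3f\n", x, y, "Bob Smith", z)
        # octal, hexadecimal, scientific notation, and a literal percent sign
        printf("%o|%x|%e|%%\n", y, y, z)
    }'
}
```

It prints `A|15|Bob Smith|2.300`, then `17|f|2.300000e+00|%`.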

Format specifier modifiers
- placed between "%" and the conversion letter, e.g.:
    %10s
    %7d
    %10.4f
    %-20s
- meaning:
  - the number before the "." is the field width; the value is printed right-justified
  - the number after the "." is the precision: the number of digits after the decimal point
  - "-" left-justifies the value
sprintf: Formatting text
Syntax:
  sprintf(format-string, var1, var2, …)
- works like printf, but does not produce output
- instead, it returns the formatted string
Example:
{
text = sprintf("1: %d - 2: %d", $1, $2)
print text
}
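Because `sprintf` uses the same format specifiers and modifiers as `printf`, one sketch can show width, precision, and left justification while capturing the result as a string. The helper `format_row` is hypothetical:

```shell
format_row() {
    awk '{
        # %-10s: left-justified in 10 columns
        # %6.2f: 6 columns wide, 2 digits after the decimal point
        # %4d:   right-justified integer in 4 columns
        row = sprintf("%-10s|%6.2f|%4d", $1, $2, $3)
        print "[" row "]"
    }' "$@"
}
```

For `cooler 14.89 7`, it prints `[cooler    | 14.89|   7]`. The brackets make the padding visible.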
AWK BUILTIN FUNCTIONS
tolower(string)
- returns a copy of string with each upper-case character converted to lower-case; nonalphabetic characters are left unchanged
  Example: tolower("MiXeD cAsE 123") returns "mixed case 123"
toupper(string)
- returns a copy of string with each lower-case character converted to upper-case
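The short sketch below combines both functions with the standard `substr` function to capitalize the first field of each line. The name `capitalize_first` is hypothetical:

```shell
capitalize_first() {
    # Upper-case the first character of $1 and lower-case the rest.
    awk '{ print toupper(substr($1, 1, 1)) tolower(substr($1, 2)) }' "$@"
}
```

For `mARINE`, it prints `Marine`.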

awk Example: list of products


103:sway bar:49.99
101:propeller:104.99
104:fishing line:0.99
113:premium fish bait:1.00
106:cup holder:2.49
107:cooler:14.89
112:boat cover:120.00
109:transom:199.00
110:pulley:9.88
105:mirror:4.99
108:wheel:49.99
111:lock:31.00
102:trailer hitch:97.95
awk Example: output
Marine Parts R Us
Main catalog
Part-id name price
======================================
101 propeller 104.99
102 trailer hitch 97.95
103 sway bar 49.99
104 fishing line 0.99
105 mirror 4.99
106 cup holder 2.49
107 cooler 14.89
108 wheel 49.99
109 transom 199.00
110 pulley 9.88
111 lock 31.00
112 boat cover 120.00
113 premium fish bait 1.00
======================================
Catalog has 13 parts
awk Example: complete
BEGIN {
    FS = ":"
    print "Marine Parts R Us"
    print "Main catalog"
    print "Part-id\tname\t\t\t price"
    print "======================================"
}
{
    printf("%3d\t%-20s\t%6.2f\n", $1, $2, $3)
    count++
}
END {
    print "======================================"
    print "Catalog has " count " parts"
}
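The catalog output above is ordered by part-id, but the input list is not, and the script itself does no sorting. One way to get that order is to sort the parts list numerically on its first field before awk reads it. In the sketch below, `catalog` is a hypothetical wrapper around the same awk program:

```shell
# Sort on field 1 (numeric, ":"-separated), then format with the catalog program.
catalog() {
    sort -t: -k1,1n "$@" | awk '
    BEGIN {
        FS = ":"
        print "Marine Parts R Us"
        print "Main catalog"
        print "Part-id\tname\t\t\t price"
        print "======================================"
    }
    {
        printf("%3d\t%-20s\t%6.2f\n", $1, $2, $3)
        count++
    }
    END {
        print "======================================"
        print "Catalog has " count " parts"
    }'
}
```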
awk Array
- awk allows one-dimensional arrays to store strings or numbers
- the index can be a number or a string
- an array does not need to be declared, and neither do:
  - its size
  - its elements
- array elements are created when first used
  - initialized to 0 or ""
Arrays in awk
Syntax:
  arrayName[index] = value
Examples:
  list[1] = "one"
  list[2] = "three"
  list["other"] = "oh my !"
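String indexes are often used for counting. The minimal sketch below uses a hypothetical `count_words` helper and a `for` loop described later in these notes. Since the order of `for (w in count)` is unspecified, the output is piped through `sort`:

```shell
count_words() {
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }   # each word is an index
         END { for (w in count) print w, count[w] }' "$@" | sort
}
```

For the lines `a b a` and `b a`, it prints `a 3` and `b 2`.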
Illustration: Associative Arrays
- awk arrays can use a string as the index

Awk builtin split function
split(string, array, fieldsep)
- divides string into pieces separated by fieldsep, stores the pieces in array, and returns the number of pieces
- if fieldsep is omitted, the value of FS is used
Example:
  split("auto-da-fe", a, "-")
- sets the contents of the array a as follows:
    a[1] = "auto"
    a[2] = "da"
    a[3] = "fe"
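`split` is handy for fields that have their own internal structure, such as the dates in `emps`. This is a minimal sketch; `split_date` is a hypothetical helper:

```shell
split_date() {
    awk '{
        n = split($4, part, "/")   # $4 is the date; n is the number of pieces
        print $1, $2, n, part[1], part[2], part[3]
    }' "$@"
}
```

For the `Tom Jones` record of `emps`, it prints `Tom Jones 3 5 12 66`.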
Example: process sales data
- input file: sales
- output: summary of category sales
Illustration: process each input line
Summary: awk program
Example: complete program
% cat sales.awk
{
deptSales[$2] += $3
}
END {
for (x in deptSales)
print x, deptSales[x]
}
% awk -f sales.awk sales
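The sales input file is not reproduced here, so the sketch below assumes hypothetical records with a department name in field 2 and an amount in field 3, as `sales.awk` expects. `for (x in deptSales)` visits keys in an unspecified order, so the sketch sorts the result:

```shell
sales_summary() {
    awk '{ deptSales[$2] += $3 }                    # accumulate per department
         END { for (x in deptSales) print x, deptSales[x] }' "$@" | sort
}
```

For the records `1 clothing 20`, `2 toys 5`, and `3 clothing 15`, it prints `clothing 35` and `toys 5`.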
Awk control structures
- Conditional
  - if-else
- Repetition
  - for
    - with counter
    - with array index
  - while
  - do-while
  - also: break, continue
if Statement
Syntax:
if (conditional expression)
statement-1
else
statement-2
Example:
if ( NR < 3 )
print $2
else
print $3
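The same `if`/`else` form can choose between two outputs for every record. This minimal sketch assumes a hypothetical pass mark of 60 in field 2:

```shell
grade_label() {
    awk '{
        if ($2 >= 60)          # numeric comparison on field 2
            print $1, "pass"
        else
            print $1, "fail"
    }' "$@"
}
```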
for Loop
Syntax:
for (initialization; limit-test; update)
statement
Example:
for (i = 1; i <= NF; i++)
{
total += $i
count++
}
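A counting loop does not have to count up. The minimal sketch below walks the fields from `NF` down to 1 and prints them in reverse order; `reverse_fields` is a hypothetical name:

```shell
reverse_fields() {
    awk '{
        line = ""; sep = ""
        for (i = NF; i >= 1; i--) {   # start at the last field and count down
            line = line sep $i
            sep = OFS                 # separator goes before every field but the first
        }
        print line
    }' "$@"
}
```

For `a b c`, it prints `c b a`.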
for Loop for arrays
Syntax:
for (var in array)
statement
Example:
for (x in deptSales)
{
print x, deptSales[x]
}
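The sketch below builds an array in the body and walks it with `for (var in array)` in `END`. It also uses the standard `(key in array)` test, which these notes do not cover, to detect the first value for each key. The helper name is hypothetical, and the output is sorted because the loop order is unspecified:

```shell
# Keep the largest field-2 value seen for each field-1 key.
max_by_key() {
    awk '{ if (!($1 in best) || $2 > best[$1]) best[$1] = $2 }
         END { for (k in best) print k, best[k] }' "$@" | sort
}
```

For the records `a 3`, `b 7`, `a 9`, and `b 2`, it prints `a 9` and `b 7`.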
while Loop
Syntax:
while (logical expression)
statement
Example:
i=1
while (i <= NF)
{
print i, $i
i++
}
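Because the condition is tested before each iteration, a `while` body may run zero times. The minimal sketch below counts how often a number can be halved before it drops below 2; `count_halvings` is a hypothetical name:

```shell
count_halvings() {
    awk '{
        n = $1; steps = 0
        while (n >= 2) {   # tested before every iteration, including the first
            n /= 2
            steps++
        }
        print $1, steps
    }' "$@"
}
```

For `8`, `1`, and `10`, it prints `8 3`, `1 0`, and `10 3`.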
do-while Loop
Syntax:
do
statement
while (condition)
- the statement is executed at least once, even if the condition is false at the beginning
Example:
i=1
do {
print $0
i++
} while (i <= 10)
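The difference from `while` shows up when the condition is false from the start. This minimal sketch runs in `BEGIN`, and the helper name is hypothetical:

```shell
do_vs_while() {
    awk 'BEGIN {
        i = 5
        while (i < 3) { print "while", i; i++ }   # condition false: body never runs
        i = 5
        do { print "do", i; i++ } while (i < 3)   # body runs once before the test
    }'
}
```

It prints only `do 5`.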
loop control statements
- break

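The sketch below uses both loop control statements listed under the control structures: `continue` skips the rest of the current iteration, and `break` leaves the loop. Odd fields are skipped, and the loop stops at the first negative field. The name `sum_even_until_negative` is hypothetical:

```shell
sum_even_until_negative() {
    awk '{
        total = 0
        for (i = 1; i <= NF; i++) {
            if ($i < 0) break         # leave the loop at the first negative field
            if ($i % 2) continue      # skip odd values, go to the next field
            total += $i
        }
        print total
    }' "$@"
}
```

For `2 3 4 -1 6`, it prints `6`.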