Awk Programming
WHAT IS AWK?
created by: Aho, Weinberger, and Kernighan
scripting language used for manipulating data
and generating reports
versions of awk
awk,
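awk operation:
scans the input line by line
splits each input line into fields
compares each line to the patterns in the script
performs the action of every pattern that matches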
Useful for:
transforming data files
producing formatted reports
Programming constructs:
format output lines
arithmetic and string operations
conditionals and loops
Options:
-F to change input field separator
-f to name script file
Example:
awk '/for/' testfile
prints all lines containing the string "for" in testfile
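BUFFERS
record buffer: $0 holds the entire current record
NF: number of fields in the current record
NR: number of the current record
OFS: output field separator
ORS: output record separator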
AWK SCRIPTS
awk scripts are divided into three major parts:
BEGIN: pre-processing
performs processing that must be completed before the first record is read
BODY: processing
works like a loop that processes input data one record at a time
contains pattern {action} statements that are applied to each record
END: post-processing
contains statements that run after all input has been read
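The three parts can be seen together in a minimal sketch. It assumes a colon-separated file laid out like the employees2 file used elsewhere in these notes; the temporary file and the variable names total and result are illustrative only.

```shell
# sample data with the same layout as employees2 (name:id:date:salary)
data=$(mktemp)
cat > "$data" << 'EOF'
Tom Jones:4424:5/12/66:543354
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:650000
Billy Black:1683:9/23/44:336500
EOF

result=$(awk '
BEGIN { FS = ":"; print "Name / Salary" }           # runs once, before any input
      { print $1, $4; total += $4 }                 # runs for every record
END   { print "records:", NR, "total:", total }     # runs once, after all input
' "$data")

echo "$result"
rm -f "$data"
```

The BEGIN block is the usual place to set FS, because it runs before the first record is split into fields.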
CATEGORIES OF PATTERNS
match entire input record
regular expression enclosed by /s
explicit pattern-matching expressions
~ (match), !~ (not match)
expression operators
arithmetic
relational
logical
% cat employees2
Tom Jones:4424:5/12/66:543354
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:650000
Billy Black:1683:9/23/44:336500
% awk -F: '/00$/' employees2
Sally Chang:1654:7/22/54:650000
Billy Black:1683:9/23/44:336500
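A pattern such as /00$/ is tested against the whole record. To test a single field, the ~ and !~ operators can be used instead. The sketch below reuses the employees2 data; the shell variable names are illustrative.

```shell
data=$(mktemp)
cat > "$data" << 'EOF'
Tom Jones:4424:5/12/66:543354
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:650000
Billy Black:1683:9/23/44:336500
EOF

# $4 ~ /00$/ tests only the fourth field against the regular expression
ends_in_00=$(awk -F: '$4 ~ /00$/ { print $1 }' "$data")

# !~ selects the records whose first field does not match
not_tom=$(awk -F: '$1 !~ /^Tom/ { print $1 }' "$data")

echo "$ends_in_00"
echo "$not_tom"
rm -f "$data"
```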
northwest  NW                     3.0  .98  34
western    WE  Sharon Gray        5.3  .97  23
southwest  SW  Lewis Dalsass      2.7  .8   18
southern   SO  Suan Chin          5.1  .95  15
southeast  SE  Patricia Hemenway  4.0  .7   17
eastern    EA  TB Savage          4.4  .84  20
northeast  NE  AM Main            5.1  .94  13
north      NO  Margot Weber       4.5  .89
central    CT  Ann Stephens       5.7  .94  13
ARITHMETIC OPERATORS
Operator  Meaning      Example
+         Add          x+y
-         Subtract     x-y
*         Multiply     x*y
/         Divide       x/y
%         Modulus      x%y
^         Exponential  x^y
Example:
% awk '$3 * $4 > 500 {print $0}' file
RELATIONAL OPERATORS
Operator  Meaning                   Example
<         Less than                 x<y
<=        Less than or equal to     x<=y
==        Equal to                  x == y
!=        Not equal to              x != y
>         Greater than              x>y
>=        Greater than or equal to  x>=y
~         Matched by reg exp        x ~ /y/
!~        Not matched by reg exp    x !~ /y/
LOGICAL OPERATORS
Operator  Meaning      Example
&&        Logical AND  a && b
||        Logical OR   a || b
!         NOT          !a
Examples:
% awk '($2 > 5) && ($2 <= 15) {print $0}' file
% awk '$3 == 100 || $4 > 50' file
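As a small illustration of how the arithmetic, relational, and logical operators combine in a pattern, the following sketch runs two commands against the grades data shown later in these notes. The temporary file and the shell variable names are illustrative.

```shell
grades=$(mktemp)
cat > "$grades" << 'EOF'
john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84
EOF

# both conditions must hold: first grade at least 85 AND second grade above 90
both=$(awk '$2 >= 85 && $3 > 90 { print $1 }' "$grades")

# either condition may hold; the action adds the first and last grade
either=$(awk '$2 < 85 || $6 > 87 { print $1, $2 + $6 }' "$grades")

echo "$both"
echo "$either"
rm -f "$grades"
```

The opening brace of the action stays on the same line as the pattern. If it moves to the next line, awk treats the pattern and the action as two separate rules.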
RANGE PATTERNS
Syntax:
pattern1 , pattern2 {action}
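A range pattern selects every record from the first one that matches pattern1 through the next one that matches pattern2, inclusive. The sketch below tries this on the employees2 data; the shell variable names are illustrative.

```shell
data=$(mktemp)
cat > "$data" << 'EOF'
Tom Jones:4424:5/12/66:543354
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:650000
Billy Black:1683:9/23/44:336500
EOF

# from the first record starting with Mary through the next one starting with Sally
by_name=$(awk -F: '/^Mary/,/^Sally/ { print $1 }' "$data")

# the two patterns can be any expressions, for example tests on NR
by_number=$(awk -F: 'NR == 2, NR == 4 { print NR ": " $1 }' "$data")

echo "$by_name"
echo "$by_number"
rm -f "$data"
```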
AWK ACTIONS
AWK EXPRESSIONS
AWK VARIABLES
A user can define any number of variables within
an awk script
The variables can be numbers, strings, or arrays
Variable names start with a letter, followed by
letters, digits, and underscore
Variables come into existence the first time they
are referenced; therefore, they do not need to be
declared before use
An uninitialized variable has the null string as its
string value and 0 as its numeric value
Format:
variable = expression
Examples:
% awk '$1 ~ /Tom/ {wage = $3 * $4; print wage}' filename
% awk '$4 == "CA" {$4 = "California"; print $0}' filename
Increment, decrement, and assignment operators:
++
--
+=
-=
*=
/=
%=
^=
AWK EXAMPLE
File: grades
john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84
awk script: average
# average five grades
{ total = $2 + $3 + $4 + $5 + $6
avg = total / 5
print $1, avg }
Run as:
awk -f average grades
OUTPUT STATEMENTS
print
print easy and simple output
printf
print formatted (similar to C printf)
sprintf
format string (similar to C sprintf)
FUNCTION: PRINT
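Writes to standard output
Output is terminated by ORS
default ORS is newline
Items separated by commas are output separated by OFS
default OFS is blank
Escape sequences such as \f \a \t \\ can be used in strings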
print output can be redirected:
> file (write to file)
>> file (append to file)
| command (pipe to command)
PRINT EXAMPLE
% awk '{print $1 , $2 > "file"}' grades
CSCI 330 - The UNIX System
% cat file
john 85
andrea 89
jasper 84
% awk '{print $1,$2 | "sort -k 2"}' grades
jasper 84
john 85
andrea 89
% date
Wed Nov 19 14:40:07 CST 2008
% date |
awk '{print "Month: " $2 "\nYear: " $6}'
Month: Nov
Year: 2008
FUNCTION: PRINTF
Syntax: printf(format-string, var1, var2, ...)
works like C printf
each format specifier in format-string requires an
argument of matching type
FORMAT SPECIFIERS
%d, %i  decimal integer
%c      single character
%s      string of characters
%f      floating point number
%o      octal number
%x      hexadecimal number
%e      scientific floating point notation
%%      the letter %
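A short sketch of these specifiers in use, applied to the grades data from these notes. The "|" separators are there only to make the field widths visible, and the shell variable names are illustrative.

```shell
grades=$(mktemp)
cat > "$grades" << 'EOF'
john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84
EOF

# %-8s  left-justified string in 8 columns
# %3d   integer in 3 columns
# %6.2f floating point number in 6 columns with 2 decimals
report=$(awk '{ printf("%-8s|%3d|%6.2f\n", $1, $2, ($2 + $3) / 2) }' "$grades")

# sprintf formats the same way but returns the string instead of printing it
label=$(awk 'BEGIN { s = sprintf("%5.1f%%", 87.456); print s }')

echo "$report"
echo "$label"
rm -f "$grades"
```

Unlike print, printf does not append ORS, so the format string has to end in \n when a line break is wanted.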
tolower(string)
returns a copy of string, with each upper-case
character converted to lower-case. Nonalphabetic
characters are left unchanged.
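One common use of tolower is a case-insensitive comparison. The sketch below first converts a whole line and then matches a name regardless of case; the input lines are made up for illustration.

```shell
# convert a whole record to lower case
lower=$(echo "Hello World 42" | awk '{ print tolower($0) }')

# compare the first field without regard to case (sample names are hypothetical)
matches=$(printf 'Tom Jones\nTOM SMITH\nMary Adams\n' |
    awk 'tolower($1) ~ /^tom$/ { print $0 }')

echo "$lower"
echo "$matches"
```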
103:sway bar:49.99
101:propeller:104.99
104:fishing line:0.99
113:premium fish bait:1.00
106:cup holder:2.49
107:cooler:14.89
112:boat cover:120.00
109:transom:199.00
110:pulley:9.88
105:mirror:4.99
108:wheel:49.99
111:lock:31.00
102:trailer hitch:97.95
Marine Parts R Us
Main catalog
Part-id name                     price
======================================
101     propeller               104.99
102     trailer hitch            97.95
103     sway bar                 49.99
104     fishing line              0.99
105     mirror                    4.99
106     cup holder                2.49
107     cooler                   14.89
108     wheel                    49.99
109     transom                 199.00
110     pulley                    9.88
111     lock                     31.00
112     boat cover              120.00
113     premium fish bait         1.00
======================================
Catalog has 13 parts
BEGIN {
FS= ":"
print "Marine Parts R Us"
print "Main catalog"
print "Part-id\tname\t\t\t price"
print "======================================"
}
{
printf("%3d\t%-20s\t%6.2f\n", $1, $2, $3)
count++
}
END {
print "======================================"
print "Catalog has " count " parts"
}
Is the output sorted? Only if the input is: the script prints records in input order.
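One way to get the catalog in part-id order is to sort the input before awk reads it. This is a sketch of that approach, not necessarily the one intended by the slides; the temporary file and the shell variable names are illustrative.

```shell
parts=$(mktemp)
cat > "$parts" << 'EOF'
103:sway bar:49.99
101:propeller:104.99
104:fishing line:0.99
113:premium fish bait:1.00
106:cup holder:2.49
107:cooler:14.89
112:boat cover:120.00
109:transom:199.00
110:pulley:9.88
105:mirror:4.99
108:wheel:49.99
111:lock:31.00
102:trailer hitch:97.95
EOF

# sort numerically on the first colon-separated field, then format with awk
catalog=$(sort -t: -k1,1n "$parts" | awk '
BEGIN {
    FS = ":"
    print "Marine Parts R Us"
    print "Main catalog"
    print "Part-id\tname\t\t\t price"
    print "======================================"
}
{
    printf("%3d\t%-20s\t%6.2f\n", $1, $2, $3)
    count++
}
END {
    print "======================================"
    print "Catalog has " count " parts"
}')

echo "$catalog"
rm -f "$parts"
```

Sorting outside awk keeps the header and footer lines out of the sort, since only the data records pass through the sort command.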
AWK ARRAY
awk allows one-dimensional arrays
to store strings or numbers
index can be number or string
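arrays are not declared, and their size grows as needed
awk initializes elements to 0 or the null string when they are first referenced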
ARRAYS IN AWK
Syntax:
arrayName[index] = value
Examples:
list[1] = "one"
list[2] = "three"
ILLUSTRATION: ASSOCIATIVE ARRAYS
split(string, array, separator)
divides string into pieces at each separator and stores the pieces in array
Example:
split("auto-da-fe", a, "-")
sets the contents of the array a as follows:
a[1] = "auto"
a[2] = "da"
a[3] = "fe"
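split also returns the number of pieces it stored, which makes it easy to loop over the result. The sketch below shows that, followed by a split of the date field from one employees2 record; the variable names n, d, pieces, and year are illustrative.

```shell
pieces=$(awk 'BEGIN {
    n = split("auto-da-fe", a, "-")      # n is the number of pieces
    for (i = 1; i <= n; i++)
        print i, a[i]
}')

# split the third field (a date such as 5/12/66) into month, day, year
year=$(echo "Tom Jones:4424:5/12/66:543354" |
    awk -F: '{ split($3, d, "/"); print $1, "19" d[3] }')

echo "$pieces"
echo "$year"
```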
output: summary of category sales
% cat sales.awk
{
deptSales[$2] += $3
}
END {
for (x in deptSales)
print x, deptSales[x]
}
% awk -f sales.awk sales
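To see the associative array at work, the script can be run against a small sales file. The file contents below are hypothetical (the notes do not show the real file); the sketch only assumes that the department is in field 2 and the amount in field 3, as sales.awk does. The output is piped through sort because the order of a for (x in array) loop is not defined.

```shell
sales=$(mktemp)
# hypothetical sales records: id, department, amount
cat > "$sales" << 'EOF'
1 clothing 100
1 computers 250
2 clothing 50
2 supplies 30
3 computers 150
EOF

summary=$(awk '
{ deptSales[$2] += $3 }                  # accumulate the amount per department
END {
    for (x in deptSales)
        print x, deptSales[x]
}' "$sales" | sort)

echo "$summary"
rm -f "$sales"
```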
Format:
delete array_name [index]
Example:
delete deptSales["supplies"]
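The effect of delete can be checked with the in operator, which tests whether an index exists without creating it. A minimal sketch with made-up values:

```shell
result=$(awk 'BEGIN {
    deptSales["clothing"] = 150
    deptSales["supplies"] = 30
    delete deptSales["supplies"]         # remove one element
    if ("supplies" in deptSales)
        print "still present"
    else
        print "deleted"
    n = 0
    for (x in deptSales)                 # count the remaining elements
        n++
    print n
}')

echo "$result"
```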
Conditional
if-else
Repetition
for
with counter
with array index
while
do-while
also:
break, continue
IF STATEMENT
Syntax:
if (conditional expression)
statement-1
else
statement-2
Example:
if ( NR < 3 )
print $2
else
print $3
FOR LOOP
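Syntax:
for (initialization; condition; increment)
statement
Example:
for (i = 1; i <= NF; i++)
{
total += $i
count++
}
Syntax for array indexes:
for (var in array)
statement
Example:
for (x in deptSales)
{
print x, deptSales[x]
}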
WHILE LOOP
Syntax:
while (condition)
statement
Example:
i = 1
while (i <= NF)
{
print i, $i
i++
}
DO-WHILE LOOP
Syntax:
do
statement
while (condition)
Example:
i = 1
do {
print $0
i++
} while (i <= 10)
break
exits the enclosing loop
continue
skips rest of current iteration, continues with
next iteration
for (x = 0; x <
if ( array[x]
printf "%d ",
if ( array[x]
}
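A complete loop using both statements might look like the sketch below. It is an independent illustration with made-up values, not a reconstruction of the slide's example: values above 100 are skipped with continue, and the first negative value ends the loop with break.

```shell
out=$(awk 'BEGIN {
    n = split("3 250 7 -1 9", v, " ")    # sample values, chosen for illustration
    for (i = 1; i <= n; i++) {
        if (v[i] > 100)
            continue                     # skip this value, go on with the next
        if (v[i] < 0)
            break                        # leave the loop entirely
        printf "%d ", v[i]
    }
    printf "\n"
}')

echo "$out"
```

Here 250 is skipped, and 9 is never reached because the loop ends at -1.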
% cat sensor-data
1 Temperature
2 Rainfall
3 Snowfall
4 Windspeed
5 Winddirection
% cat sensor-readings
2008-10-01/1/68
2008-10-02/2/6
2007-10-03/3/4
2008-10-04/4/25
2008-10-05/5/120
2008-10-01/1/89
2007-10-01/4/35
2008-11-01/5/360
2008-10-01/1/45
2007-12-01/1/61
2008-10-10/1/32
BEGIN {
printf("id\tSensor\n")
printf("----------------------\n")
}
{
printf("%d\t%s\n", $1, $2)
}
BEGIN {
FS="/"
printf(" Date\t\tValue\n")
printf("---------------------\n")
}
{
printf("%s %7.2f\n", $1, $3)
}
BEGIN {
FS="/"
}
{
sum[$2] += $3;
count[$2]++;
}
END {
for (i in sum) {
printf("%d %7.2f\n",i,sum[i]/count[i])
}
}
Remaining tasks:
recognize nature of input data
use: number of fields in record
substitute sensor id with sensor names
sort readings
use: sort -gr -k 2
EXAMPLE: SENSE.AWK
NF > 1 {
name[$1] = $2
}
NF < 2 {
split($0,fields,"/")
sum[fields[2]] += fields[3];
count[fields[2]]++;
}
END {
for (i in sum) {
printf("%15s %7.2f\n", name[i],
sum[i]/count[i]) | "sort -gr -k 2"
}
}
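The script can be tried end to end with the readings listed earlier. In this sketch the sensor names in sensor-data are an assumption inferred from the averages in the example output, the sort is done outside awk, and sort -n (the POSIX numeric option) is used in place of -g.

```shell
dir=$(mktemp -d)
# assumed contents of sensor-data: sensor id, sensor name
cat > "$dir/sensor-data" << 'EOF'
1 Temperature
2 Rainfall
3 Snowfall
4 Windspeed
5 Winddirection
EOF
cat > "$dir/sensor-readings" << 'EOF'
2008-10-01/1/68
2008-10-02/2/6
2007-10-03/3/4
2008-10-04/4/25
2008-10-05/5/120
2008-10-01/1/89
2007-10-01/4/35
2008-11-01/5/360
2008-10-01/1/45
2007-12-01/1/61
2008-10-10/1/32
EOF

averages=$(awk '
NF > 1 { name[$1] = $2 }                 # sensor-data line: remember the name
NF < 2 {                                 # readings line: date/id/value
    split($0, fields, "/")
    sum[fields[2]] += fields[3]
    count[fields[2]]++
}
END {
    for (i in sum)
        printf("%15s %7.2f\n", name[i], sum[i] / count[i])
}' "$dir/sensor-data" "$dir/sensor-readings" | sort -nr -k 2)

echo "$averages"
rm -rf "$dir"
```

sensor-data has to be named first on the command line so that every name is stored before the END block looks it up.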
Remaining tasks:
Sort
use: sort -gr
Substitute sensor id with sensor name
1. use:
join -j 1 sensor-data sensor-averages
2. within awk
#! /bin/bash
trap '/bin/rm /tmp/report-*-$$; exit' 1 2 3
cat << HERE > /tmp/report-awk-1-$$
BEGIN {FS="/"}
{
sum[\$2] += \$3;
count[\$2]++;
}
END {
for (i in sum) {
printf("%d %7.2f\n", i, sum[i]/count[i])
}
}
HERE
awk -f /tmp/report-awk-1-$$ sensor-readings |
sort > /tmp/report-r-$$
EXAMPLE: OUTPUT
         Sensor Average
----------------------
  Winddirection  240.00
    Temperature   59.00
      Windspeed   30.00
       Rainfall    6.00
       Snowfall    4.00
cat << HERE > /tmp/report-awk-3-$$
NF > 1 {
name[\$1] = \$2
}
NF < 2 {
split(\$0,fields,"/")
sum[fields[2]] += fields[3];
count[fields[2]]++;
}
END {
for (i in sum) {
printf("%15s %7.2f\n", name[i],
sum[i]/count[i])
}
}
HERE
echo "         Sensor Average"
echo "-----------------------"
awk -f /tmp/report-awk-3-$$ sensor-data sensor-readings | sort -gr -k 2
/bin/rm /tmp/report-*$$