Unix Talk #2: AWK Overview Patterns and Actions Records and Fields Print vs. Printf
AWK overview
Patterns and actions
Records and fields
Print vs. printf
1
Introduction
2
awk
Named for the initials of its authors' last
names: Aho, Weinberger, and
Kernighan
Which awk are we tawking about?
– awk
– nawk – new awk ( on CS machines )
– gawk – GNU awk ( bart )
3
AWK syntax
awk '/pattern/' file
awk '{action}' file
awk '/pattern/ {action;}' file
cat file | awk '{action}'
Awk automatically reads in the file for you
line by line.
– No need to open/close the file (as you would in C or Java)
– The pattern section FINDS the LINES that match the pattern
– The action section performs the actions you defined on
the lines it found
– The original file does not change.
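A minimal sketch of the three forms above. The file fruits.txt and its contents are made up for illustration; they are not the course's fruit_prices file.

```shell
# Hypothetical sample data in a temporary directory.
tmp=$(mktemp -d)
printf 'apple $0.40\nbanana $1.25\ncherry $3.00\n' > "$tmp/fruits.txt"

# Pattern only: print every line that matches.
pattern_only=$(awk '/an/' "$tmp/fruits.txt")

# Action only: run the action on every line.
action_only=$(awk '{ print $1 }' "$tmp/fruits.txt")

# Pattern and action: run the action only on matching lines.
both=$(awk '/^c/ { print $2 }' "$tmp/fruits.txt")

printf '%s\n' "$pattern_only" "$action_only" "$both"
rm -rf "$tmp"
```

In each case awk only reads fruits.txt; the results go to standard output.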
4
Simple example
awk '{ print }' fruit_prices
5
Simple example
awk '
/\$[0-9]*\.[0-9][0-9]*/ { print }
' fruit_prices
6
Action
Actions are written by the programmer; they are not limited to
print, delete, etc. (p/d/s from sed). That is why it is
so awesome!
Actions consist of
– variable assignments,
– arithmetic and logic operators,
– decision structures,
– looping structures.
For example, print, if, while and for
awk '{print}' filename
7
Execution types
format 1: awk 'script'
– where INPUT must come from a pipe or STDIN
– command | awk 'script'
format 2: awk 'script' input1 input2 ... inputn
– where we supply input FILES as input1, input2, etc.
format 3: awk -f script_file input1...
(in a script file, # starts a comment)
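A small sketch of the three formats, assuming two made-up input files (a.txt, b.txt) and a hypothetical script file count.awk that counts input lines:

```shell
tmp=$(mktemp -d)
printf 'one\ntwo\n' > "$tmp/a.txt"
printf 'three\n'    > "$tmp/b.txt"

# A script file; the line starting with # is a comment inside it.
cat > "$tmp/count.awk" <<'EOF'
# count input lines
END { print NR }
EOF

# Format 1: input arrives on standard input through a pipe.
from_pipe=$(cat "$tmp/a.txt" | awk 'END { print NR }')
# Format 2: input files are named on the command line and read in order.
from_files=$(awk 'END { print NR }' "$tmp/a.txt" "$tmp/b.txt")
# Format 3: the program itself is read from a file with -f.
from_script=$(awk -f "$tmp/count.awk" "$tmp/a.txt" "$tmp/b.txt")

echo "$from_pipe $from_files $from_script"
rm -rf "$tmp"
```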
8
Pattern
Types
– Regular expressions
– BEGIN
Do all the stuff BEFORE reading any input
– END
does all this stuff AFTER reading ALL input.
Pattern is optional
If no pattern is specified, the "action" will occur for EVERY
LINE, one at a time.
awk '{Action}' filename
awk '{print;}' names prints all lines
awk 'BEGIN {print "The average grades"}'
9
Awk Regular Expression
Metacharacters
Supports
– ^, $, ., *, +, ?, [ABC], [^ABC],
– [A-Z], A|B, (AB)+, \, &
Not supported
– Backreferencing, \( \)
– Repetition, \{ \} (some newer awks accept the unescaped {n,m} form)
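A few of the supported metacharacters in action. This is a sketch on a made-up word list, not an exhaustive test of the regex syntax:

```shell
words='color
colour
colr
abab
aba
x9'

optional=$(printf '%s\n' "$words" | awk '/^colou?r$/')    # ? : zero or one "u"
grouped=$(printf '%s\n' "$words" | awk '/^(ab)+$/')       # (AB)+ : one or more "ab"
either=$(printf '%s\n' "$words" | awk '/^(colr|aba)$/')   # A|B : alternation
class=$(printf '%s\n' "$words" | awk '/[0-9]$/')          # [0-9] : ends in a digit

printf '%s\n' "$optional" "$grouped" "$either" "$class"
```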
10
awk '
BEGIN { actions ; }
/pattern/ { actions ; }
/pattern/ { actions ; }
END { actions ; }
' files
Execution steps:
1) If a BEGIN pattern is present, awk executes its actions.
2) It reads an input line and parses it into fields.
3) It compares each of the specified patterns against the input line;
when a pattern matches, it executes that pattern's actions. This step is
repeated for all patterns.
4) It repeats steps 2 and 3 while input lines remain.
5) After reading all the input lines, if an END pattern is
present, it executes its actions.
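The execution order can be traced with a short script. The two input lines and the message strings are made up for illustration:

```shell
trace=$(printf 'apple\nbanana\n' | awk '
BEGIN { print "begin" }                      # runs before any input is read
/a/   { print "has a: " $0 }                 # tested against every line
/^b/  { print "starts with b: " $0 }         # also tested against every line
END   { print "end after " NR " lines" }     # runs after all input is read
')
printf '%s\n' "$trace"
```

Note that "banana" triggers both pattern rules, in the order they appear in the script.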
11
Try This!
Place the following in the file tryawk1.awk
BEGIN { print "Starting to read input";
nLines = 0; }
/^.*$/ { nLines++; }
END { print "DONE: Total lines = " nLines; }
– Run the command: cat tryawk1.awk |
awk -f tryawk1.awk
– Counts the # of lines in the input
nLines is a variable: no declaration needed, just use it
The print command prints a line of text and adds a newline to
the end of the line
12
Records and fields
awk has RECORDS (lines) and FIELDS
$0 represents the entire line of input
$1 represents the first field
print works just like echo
– print $1 $2 # $1 concat $2
– print $1, $2 # $1 OFS $2
cat fruit_prices
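A quick sketch of the difference between the two print forms, using one made-up input line:

```shell
line='John Robinson 234-3456'

# No comma: the two fields are concatenated.
concat=$(echo "$line" | awk '{ print $1 $2 }')
# Comma: the fields are joined with OFS (a single space by default).
with_ofs=$(echo "$line" | awk '{ print $1, $2 }')
# Same print statement after changing OFS.
custom=$(echo "$line" | awk 'BEGIN { OFS = "-" } { print $1, $2 }')

printf '%s\n' "$concat" "$with_ofs" "$custom"
```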
13
Examples
cat phones.data
John Robinson 234-3456
Yin Pan 123-4567
awk '{ print $1, $2, $3 }' phones.data
John Robinson 234-3456
Yin Pan 123-4567
awk '{ print $2 ",", $1, $3 }' phones.data
Robinson, John 234-3456
Pan, Yin 123-4567
awk '/^$/ { print x += 1 }' phones.data
awk '/Mary/ { print $0 }' phones.data
14
Examples (cont'd)
ls -l | awk '
$6 == "Oct" { sum += $5 ; }
END { print sum ; }
'
ls -l | awk -f block_use.awk
cat block_use.awk
$6 == "Oct" { sum += $5 ; }
END { print sum ; }
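Because the column layout of ls -l varies between systems, the sketch below runs the same script on a fixed, made-up listing in which field 5 is the size and field 6 is the month:

```shell
# Hypothetical ls -l style listing (owner, sizes and names are invented).
listing='-rw-r--r-- 1 yin staff 1200 Oct  3 10:15 notes.txt
-rw-r--r-- 1 yin staff  300 Sep 28 09:00 old.txt
-rw-r--r-- 1 yin staff  800 Oct 12 14:30 report.txt'

# Add up the sizes of the files whose month field is "Oct".
total=$(printf '%s\n' "$listing" | awk '
$6 == "Oct" { sum += $5 }
END         { print sum }
')
echo "$total"
```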
15
Taking Pattern-specific Actions
#!/bin/sh
awk '
/\$[1-9][0-9]*\.[0-9][0-9]*/ { print $0, "*"; }
/\$0\.[0-9][0-9]*/ { print ; }
' fruit_prices
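A runnable sketch of the same two rules. The contents of fruit_prices are not shown on the slide, so the two lines below are assumed sample data:

```shell
prices='apple $0.40
banana $1.25'

# Prices of one dollar or more get a trailing "*"; prices under a dollar print as-is.
marked=$(printf '%s\n' "$prices" | awk '
/\$[1-9][0-9]*\.[0-9][0-9]*/ { print $0, "*" }
/\$0\.[0-9][0-9]*/           { print }
')
printf '%s\n' "$marked"
```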
16
Intrinsic variables
awk defines RECORDS (lines) and FIELDS
– FS, input field separator (default=space/tab)
– OFS, output field separator (default=space)
– ORS, Output record separator (default=newline)
– RS, Input record separator (default=newline)
– NR, number of the current record being processed
– NF, number of fields within current record
– FILENAME, awk sets this variable to the name of the file
that it is currently reading. (If you have more than one input
file, awk resets this variable as it reads each file in turn.)
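A sketch that prints FILENAME, NR and NF for every record of two made-up files (first.txt and second.txt are hypothetical names):

```shell
tmp=$(mktemp -d)
printf 'a b c\nd e\n' > "$tmp/first.txt"
printf 'f\n'          > "$tmp/second.txt"

# Run from inside the directory so FILENAME shows the short names.
report=$(cd "$tmp" && awk '{ print FILENAME, NR, NF }' first.txt second.txt)

printf '%s\n' "$report"
rm -rf "$tmp"
```

NR keeps counting across files, while NF is recomputed for each record.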
17
How does awk work
awk '{print $1, $3}' names
– Reads a record of input into $0, using RS to find where it ends
– Breaks the record into fields based on FS and stores
them in numbered variables, starting with $1
– print writes the requested fields, using OFS to
separate them
– After awk writes its output, it goes to the next line and
repeats. The output lines are terminated by ORS.
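The roles of FS, OFS and ORS can be made visible by setting each to an unusual value. The input lines and separator choices here are arbitrary:

```shell
out=$(printf 'John:Robinson:234-3456\nYin:Pan:123-4567\n' | awk '
BEGIN { FS = ":"; OFS = " | "; ORS = ";\n" }   # split on ":", join with " | ", end with ";"
      { print $1, $3 }
')
printf '%s\n' "$out"
```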
18
Changing the Input Field Separator
Manually resetting FS in a BEGIN pattern
– Forces you to hard code the value of the field separator
– BEGIN { FS=":" ; }
– Example:
$ awk 'BEGIN { FS=":" ; } { print $1, $6 ; }' /etc/passwd
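A sketch on a single made-up passwd-style line, so it does not depend on the local /etc/passwd. The second command uses awk's standard -F option, which sets the separator from the command line instead of hard coding it in the script:

```shell
passwd_line='yin:x:1001:100:Yin Pan:/home/yin:/bin/sh'

# Separator hard coded in a BEGIN rule.
with_begin=$(echo "$passwd_line" | awk 'BEGIN { FS = ":" } { print $1, $6 }')
# Separator supplied on the command line.
with_flag=$(echo "$passwd_line" | awk -F: '{ print $1, $6 }')

printf '%s\n' "$with_begin" "$with_flag"
```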
19
Example
FirstName;LastName;Address;City;State;Zip;Phone
SSN:DOB:NumberOfDependents
HospitalizationCode,DentalCode,LifeCode
20
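awk 'BEGIN { OFS = ","; FS = ";" }
NR%3==1 { F=$1; L=$2; A=$3; …    # this line was split on ";"
          FS = ":" }             # prepare for line 2
NR%3==2 { SSN=$1; DOB=$2; …
          FS = "," }             # prepare for line 3
NR%3==0 { …; print F, L, A, …
          FS = ";" }             # prepare for the next line 1
' filename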
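A cut-down, runnable sketch of the same idea. It relies on the fact that a new FS value takes effect on the next record read, so each rule sets FS for the line that follows. The sample record and the variable names (first, last, ssn, dob) are made up, and only a few of the fields are carried through:

```shell
record='Yin;Pan;1 Main St;Rochester;NY;14623;123-4567
000-00-0000:1970-01-01:2
H1,D2,L3'

merged=$(printf '%s\n' "$record" | awk '
BEGIN   { FS = ";"; OFS = "," }
NR%3==1 { first = $1; last = $2; FS = ":" }             # next line is split on ":"
NR%3==2 { ssn = $1; dob = $2; FS = "," }                # next line is split on ","
NR%3==0 { print first, last, ssn, dob, $1; FS = ";" }   # back to ";" for the next group
')
echo "$merged"
```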
21
Print vs. Printf
printf
– 1st argument is a string … the ‘format’
– Prints each character of the format
Upon reaching a %, the next few characters are a format
specifier
The next argument is printed according to the specifier
– Does not append a newline
– More control over appearance of output
– Consider
awk 'BEGIN { printf "%5.2f\n", 2/3; }'
Prints " 0.67" (with one leading space: 0.67 is four characters, padded to five)
%5.2f means print a fractional number (the ‘f’) in a field 5
characters wide, with 2 digits to the right of the decimal point.
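The padding is easier to see with brackets around the field. The brackets are only there for illustration; the second command shows what plain print does with the same value:

```shell
padded=$(awk 'BEGIN { printf "[%5.2f]\n", 2/3 }')   # width 5, 2 decimals
plain=$(awk 'BEGIN { print 2/3 }')                  # print uses its default number format

printf '%s\n' "$padded" "$plain"
```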
22
Why Printf
printf - for formatting output of your
“print”
We already have print, so why printf?
– printf allows us to FORMAT the output.
– can FORCE printing of string
– Decimals
– whole numbers
– how many digits fall on either side of
decimal pt
– scientific notation
– make things line up nicely
23
printf
printf (format, what to print)
printf ( "%s", x)
– %s is a PLACEHOLDER for some OUTPUT.
– s is a specific type of output (string)
– Each item (%s) must have ONE matching thing to print in the "what to print" part
– The format goes inside quotes, followed by a comma, followed by the
variables to print, outside the quotes.
24
Printf format
s = A character string
f = A floating point number
d or i = the integer part of a decimal number
e = scientific notation of a floating point number
g = e or f style, whichever is more compact
c = An ASCII character
if x=65 and I use this print statement
printf ( " s = %c ", x )
output is " s = A "
awk 'BEGIN{x=65; printf("char: %c\n", x)}'
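A sketch that runs one value through each of the common specifiers. The values are arbitrary:

```shell
formats=$(awk 'BEGIN {
    printf "%s\n", "text"        # character string
    printf "%d\n", 3.99          # integer part only
    printf "%f\n", 3.5           # floating point, six decimals by default
    printf "%e\n", 12345.678     # scientific notation
    printf "%c\n", 65            # the character whose code is 65
}')
printf '%s\n' "$formats"
```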
25
Printf
More control:
– %wd
Print an integer out in a field of width w
If the number is smaller than w characters, print
leading spaces
Try awk 'BEGIN { printf "%10d\n", 10; }' /dev/null
– Try to add a ‘-’ immediately after the %
Left justifies the value in the field
26
Printf
%ws
– Print a string out in a field of width w
– Supply leading spaces as necessary
Place a ‘-’ immediately after the % to get left
justification
27
Printf
%w.df
– Prints the value out in a field of width w
– Places the decimal point d places from the right
end
– Place a ‘-’ immediately after the % to get left
justification
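The width and justification rules for %wd, %ws and %w.df can be checked side by side. The brackets mark the edges of each field and are not part of the format:

```shell
aligned=$(awk 'BEGIN {
    printf "[%5d]\n",    42         # right-justified integer, width 5
    printf "[%-5d]\n",   42         # left-justified integer
    printf "[%8s]\n",    "apple"    # right-justified string, width 8
    printf "[%-8s]\n",   "apple"    # left-justified string
    printf "[%8.2f]\n",  3.14159    # width 8, 2 digits after the point
    printf "[%-8.2f]\n", 3.14159    # same, left-justified
}')
printf '%s\n' "$aligned"
```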
28
Printf examples
Apple 10 20 25
<---10----><-5-><-5-><-5->
awk '{printf (" %10s %5d %5d %d\n", $1, $2, $3, $4 )}' file
awk '{printf (" %-10s %5d %5d %d\n", $1, $2, $3, $4 )}' file
awk '{printf (" %-10s %-5d %-5d %d\n", $1, $2, $3, $4 )}' file
29
Printf examples
Let’s put an average in there...
printf (" %-10s %-5d %-5d %-5d %f\n", $1, $2, $3, $4, average )
%f prints the RAW number, with the default 6 digits to the
RIGHT of the decimal point. To limit it to 2 digits:
printf (" %-10s %-5d %-5d %-5d %.2f\n", $1, $2, $3, $4, average )
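The slides do not show how average is computed. One possible sketch, assuming it is the mean of the three numeric fields on the sample line:

```shell
row=$(echo 'Apple 10 20 25' | awk '{
    average = ($2 + $3 + $4) / 3     # assumed definition of average
    printf "%-10s %-5d %-5d %-5d %.2f\n", $1, $2, $3, $4, average
}')
echo "$row"
```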
30
The OFMT variable
(stands for Output Formatting for
numbers)
A special awk variable
Controls how non-integer numbers are printed by the
print function
awk 'BEGIN { print 1.243434534; }'
awk 'BEGIN { OFMT="%.2f"; print 1.23344455; }'
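A sketch of the effect of OFMT. The third command is an extra check showing that integer values are printed as integers regardless of OFMT:

```shell
default_fmt=$(awk 'BEGIN { print 1.243434534 }')                  # default OFMT is "%.6g"
two_places=$(awk 'BEGIN { OFMT = "%.2f"; print 1.23344455 }')     # two decimals
integer=$(awk 'BEGIN { OFMT = "%.2f"; print 42 }')                # integers are unaffected

printf '%s\n' "$default_fmt" "$two_places" "$integer"
```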
31