Linux CMD AWK
Administration
CST 3523
AWK
AWK (cont’d)
• awk gets its input from
– Files
– Redirection and pipes
– Standard input directly
• There are several ways to run an awk program
– awk 'program' input_file(s)
• program and input_files are provided as command-line
arguments (program is enclosed in single quotation marks: 'program')
– awk 'program'
• program is a command-line argument; input is taken from
standard input (yes, awk can work as a filter!)
– awk -f program_file_name input_file(s)
• program is read from a file called program_file_name
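The three invocation styles above can be sketched as follows. The file names emp.data and first.awk, and their contents, are hypothetical and chosen only for this illustration; all three commands should print the same first column.

```shell
# Scratch directory; emp.data and first.awk are hypothetical names.
dir=$(mktemp -d)
printf 'Beth 4.00 0\nDan 3.75 0\nKathy 4.00 10\n' > "$dir/emp.data"
printf '{ print $1 }\n' > "$dir/first.awk"

# 1. Program and input file are both command-line arguments.
from_args=$(awk '{ print $1 }' "$dir/emp.data")

# 2. Program on the command line, input from standard input (awk as a filter).
from_pipe=$(cat "$dir/emp.data" | awk '{ print $1 }')

# 3. Program read from a file with -f.
from_file=$(awk -f "$dir/first.awk" "$dir/emp.data")

printf '%s\n' "$from_args"
rm -rf "$dir"
```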
AWK (cont’d)
• AWK recognizes/processes
– File
• Consists of records - the lines of the file
• awk operates on one record at a time
– Record
• Consists of fields - words separated by any number of
spaces or tabs
– Field
• Field 1 is accessed with $1, field 2 with $2, …
• $0 refers to the whole record (whole line)
• AWK processes fields
– sed (to be discussed) processes lines
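A small sketch of field access, assuming a made-up record whose words are separated by several spaces and a tab. It shows that $1 and $3 pick out single fields while $0 is the unchanged line.

```shell
# One sample record with mixed spaces and a tab between the words.
line=$(printf 'Beth   4.00\t0')

first=$(printf '%s\n' "$line" | awk '{ print $1 }')   # first field
third=$(printf '%s\n' "$line" | awk '{ print $3 }')   # third field
whole=$(printf '%s\n' "$line" | awk '{ print $0 }')   # whole record, unchanged

printf '%s\n' "$first" "$third" "$whole"
```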
AWK (cont’d)
• Figure: a file drawn as a grid; each row is a record/line (1, 2, 3, ..., 6, ...) and each column is a field (1, 2, 3)
Structure of an AWK Program
• awk program
– Sequence of statements
pattern { action }
pattern { action }
– Pattern (selector)
• Determines whether the action is to be executed
• Could be:
– Regular expressions
– Arithmetic relational expressions
– String-valued expressions
– Boolean combinations
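The four kinds of pattern can be sketched on a small sample file. The data (name, hourly rate, hours worked) is hypothetical and used only for illustration.

```shell
# Hypothetical sample data: name, hourly rate, hours worked.
data=$(mktemp)
printf 'Beth 4.00 0\nDan 3.75 0\nKathy 4.00 10\nMark 5.00 20\nMary 5.50 22\nSusie 4.25 18\n' > "$data"

by_regex=$(awk '/^M/ { print $1 }' "$data")               # regular expression
by_number=$(awk '$3 > 15 { print $1 }' "$data")           # relational expression
by_string=$(awk '$1 == "Dan" { print $2 }' "$data")       # string comparison
by_both=$(awk '$2 >= 5 && $3 > 20 { print $1 }' "$data")  # Boolean combination

printf '%s\n' "$by_regex" "$by_number" "$by_string" "$by_both"
rm -f "$data"
```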
AWK Variables
• $0 - Current record as a single variable
• $1, $2, … - Fields in the current record
• NF - Number of fields in the current record (no $ needed)
• NR - Record/line number of the current record
• FILENAME - name of current input file
• FS - Input field separator (space or TAB by default)
• RS - Input record separator (NEWLINE by default)
• OFS - Output field separator (space by default)
• ORS - Output record separator (NEWLINE by default)
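A brief sketch of the built-in variables in use. The colon-separated input is an assumption made for this demo; FS and OFS are set in a BEGIN block.

```shell
# Two colon-separated records (made-up input).
input=$(printf 'a:b:c\nd:e')

# NR is the record number, NF the field count, $NF the last field.
counts=$(printf '%s\n' "$input" | awk 'BEGIN { FS = ":" } { print NR, NF, $NF }')

# OFS is placed between the comma-separated items of print.
joined=$(printf '%s\n' "$input" | awk 'BEGIN { FS = ":"; OFS = "-" } { print $1, $2 }')

printf '%s\n' "$counts" "$joined"
```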
Operators
• = assignment operator; sets a variable equal to a
value or string
• == equality operator; returns TRUE if both sides
are equal
• != not-equal operator
• && logical AND
• || logical OR
• ! logical NOT
• <, >, <=, >= relational operators
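• +, -, *, /, %, ^ arithmetic operators (% is remainder, ^ is exponentiation)
• String concatenation has no operator symbol; write the values side by side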
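The operators can be tried in a BEGIN block, which needs no input. The variable names x, y, a, and b are arbitrary choices for this sketch.

```shell
# Arithmetic operators on x = 7 and y = 2.
arith=$(awk 'BEGIN { x = 7; y = 2; print x + y, x - y, x * y, x / y, x % y, x ^ y }')

# Comparisons and logical operators yield 1 (true) or 0 (false).
logic=$(awk 'BEGIN { x = 7; y = 2; both = (x > y && y != 0); same = (x == y); print both, same, !same }')

# Concatenation: two strings written next to each other.
concat=$(awk 'BEGIN { a = "foo"; b = "bar"; print a b }')

printf '%s\n' "$arith" "$logic" "$concat"
```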
AWK Variables (cont’d)
• Figure: the same grid labelled with AWK variables; each row is a record/line (NR 1, NR 2, ..., NR 6, ...) and each column is a field ($1, $2, $3)
User Variables
• Must begin with a letter (or underscore), followed by letters,
digits, or underscores
• Keywords cannot be used as variable names
• It is good practice to initialize AWK variables in the BEGIN section
• Whether an AWK variable is to be treated as a number
or as a string depends on the context it is used in
• E.g.,
BEGIN { total = 0 }
{
    # $1 = item number, $2 = book title, $3 and $4 are multiplied to get the amount
    item_no = $1; book = $2; book_amount = $3 * $4
    total = total + book_amount
    print item_no, " ", book, "\t", "$" book_amount
}
END { print "Total Amount = $" total }
AWK Arrays
• AWK allows one-dimensional arrays to store
numbers or strings
– Index can be number or string (associative array)
arrayName[index] = value
arr[3]="value"
grade["Korn"]=40.3
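A short sketch of both index styles. The input lines (a name and a number) are invented for illustration; the array named total accumulates a sum per name, using the name itself as the index.

```shell
# String index: add up the second field for each distinct first field.
totals=$(printf 'Korn 40\nBash 10\nKorn 2\n' |
  awk '{ total[$1] = total[$1] + $2 }
       END { print total["Korn"], total["Bash"] }')

# Numeric and string indexes, as in the assignments above.
lookup=$(awk 'BEGIN { arr[3] = "value"; grade["Korn"] = 40.3; print arr[3], grade["Korn"] }')

printf '%s\n' "$totals" "$lookup"
```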
Example of Actions
• The print statement produces at least one line of output
$ gawk 'BEGIN { print "line one\nline two\nline three" }'
or
$ gawk '{ print $1, $2 }'
• Print current record vs. empty line
{ print } vs. { print "" }, respectively
• Enclose literal strings in double quotation marks, but not
field references
$ gawk '{ print $1 " and " $2 }'
• Use comma to separate variables to enforce spacing
$ gawk '{ print $1, $2 }'
vs
$ gawk '{ print $1 $2 }'
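The difference between the comma and plain juxtaposition can be checked on a single made-up input line. This sketch uses awk rather than gawk; the behavior shown is the same.

```shell
with_comma=$(echo 'Beth 4.00' | awk '{ print $1, $2 }')        # items joined by OFS (a space)
no_comma=$(echo 'Beth 4.00' | awk '{ print $1 $2 }')           # items concatenated
with_text=$(echo 'Beth 4.00' | awk '{ print $1 " and " $2 }')  # literal string in between

printf '%s\n' "$with_comma" "$no_comma" "$with_text"
```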
Selection with AWK
• AWK patterns are good for selecting specific lines
from the input for further processing
• Selection by Comparison
$2 >=5 { print }
• Selection by Computation
$2 * $3 > 50 { printf("%6.2f for %s\n", $2 * $3, $1) }
• Combinations of Patterns
$2 >= 4 || $3 >= 20
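The three selections above can be run against a sample file. The data is hypothetical, with $1 a name, $2 an hourly rate, and $3 hours worked; a pattern with no action prints the matching line.

```shell
# Hypothetical sample data: name, hourly rate, hours worked.
data=$(mktemp)
printf 'Beth 4.00 0\nDan 3.75 0\nKathy 4.00 10\nMark 5.00 20\nMary 5.50 22\nSusie 4.25 18\n' > "$data"

by_comparison=$(awk '$2 >= 5 { print }' "$data")
by_computation=$(awk '$2 * $3 > 50 { printf("%6.2f for %s\n", $2 * $3, $1) }' "$data")
by_combination=$(awk '$2 >= 4 || $3 >= 20' "$data")

printf '%s\n' "$by_comparison" "$by_computation" "$by_combination"
rm -f "$data"
```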
Data Validation with AWK
• Validating data is a common operation
• AWK is excellent at data validation
NF != 3 { print $0, "number of fields not equal to 3" }
$2 < 3.35 { print $0, "rate is below minimum wage" }
$2 > 10 { print $0, "rate exceeds $10 per hour" }
$3 < 0 { print $0, "negative hours worked" }
$3 > 60 { print $0, "too many hours worked" }
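A sketch of the validation rules run as a program file. The file name check.awk and the input records are assumptions for this example; one record is valid and three are deliberately bad.

```shell
dir=$(mktemp -d)

# check.awk is a hypothetical file name; the quoted 'EOF' keeps the shell from expanding $0, $2, $3.
cat > "$dir/check.awk" <<'EOF'
NF != 3   { print $0, "number of fields not equal to 3" }
$2 < 3.35 { print $0, "rate is below minimum wage" }
$2 > 10   { print $0, "rate exceeds $10 per hour" }
$3 < 0    { print $0, "negative hours worked" }
$3 > 60   { print $0, "too many hours worked" }
EOF

# Made-up records: Beth is valid, the other three each break at least one rule.
printf 'Beth 4.00 0\nDan 3.00 10\nKathy 4.00 10 extra\nMark 12.00 70\n' > "$dir/emp.data"

report=$(awk -f "$dir/check.awk" "$dir/emp.data")
printf '%s\n' "$report"
rm -rf "$dir"
```

Every rule is tested against every record, so a record such as the last one can be reported more than once.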
Regular Expressions in AWK
• AWK uses extended regular expressions; the main metacharacters are
– ^ $ - beginning / end of the string
– . - any character
– [abcd] - character class
– [^abcd] - negated character class
– [a-z] - range of characters
– (regex1|regex2) - alternation
– * - zero or more occurrences of preceding expression
– + - one or more occurrences of preceding expression
– ? - zero or one occurrence of preceding expression
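A few of these metacharacters can be tried as /regex/ patterns on a made-up word list. A pattern with no action prints each matching line.

```shell
# Made-up word list, one word per line.
words=$(printf 'cat\ncot\ncoat\nct\ndog')

any_char=$(printf '%s\n' "$words" | awk '/^c.t$/')         # exactly one character between c and t
one_plus=$(printf '%s\n' "$words" | awk '/^c[ao]+t$/')     # one or more of a or o
zero_plus=$(printf '%s\n' "$words" | awk '/^co*t$/')       # zero or more o
optional=$(printf '%s\n' "$words" | awk '/^coa?t$/')       # optional a
either=$(printf '%s\n' "$words" | awk '/^(cat|dog)$/')     # alternation

printf '%s\n' "$any_char" "$one_plus" "$zero_plus" "$optional" "$either"
```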
Computing with AWK
• Counting is easy to do with AWK
$3 > 15 { emp = emp + 1 }
END { print emp, "employees worked more than 15 hrs" }
• for
for (init; condition; increment) {
commands
}
• Example: reading standard input with getline in a while loop
BEGIN { print "What is your first name and major? "
        while (getline > 0)
            print "Hi", $1 ", your major is", $2 "."
}
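The counting rule and the for syntax from this slide can be sketched as follows. The employee data is hypothetical, and the loop variable i and accumulator sum are arbitrary names.

```shell
# Counting: how many records have more than 15 in the third field.
count=$(printf 'Beth 4.00 0\nMark 5.00 20\nMary 5.50 22\nSusie 4.25 18\n' |
  awk '$3 > 15 { emp = emp + 1 }
       END { print emp, "employees worked more than 15 hrs" }')

# for loop: add up all fields of each record, from field 1 to field NF.
sums=$(printf '1 2 3\n10 20\n' |
  awk '{ sum = 0
         for (i = 1; i <= NF; i++) {
             sum = sum + $i
         }
         print sum }')

printf '%s\n' "$count" "$sums"
```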
Using getline With File and Pipe
• getline with a file
BEGIN { while ((getline < "emp.data") > 0)
print $0 }
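A runnable sketch of the same loop. The temporary emp.data file and its contents are made up, and the path is passed in through a variable named file (set with -v) instead of being hard-coded, which is a choice made only for this example.

```shell
dir=$(mktemp -d)
printf 'Beth 4.00 0\nDan 3.75 0\n' > "$dir/emp.data"

copied=$(awk -v file="$dir/emp.data" 'BEGIN {
    # getline returns 1 for a record, 0 at end of file, -1 on error
    while ((getline < file) > 0)
        print $0
    close(file)
}')

printf '%s\n' "$copied"
rm -rf "$dir"
```

Testing for > 0 rather than just a nonzero value keeps the loop from running forever when the file cannot be opened and getline returns -1.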