
Linux CMD AWK

This document discusses the AWK programming language. AWK is a pattern scanning and processing language that works with text and numbers. It can generate reports, filter text, and perform other tasks. The document covers the basics of AWK including its structure, variables, arrays, patterns, actions, and examples of tasks it can perform like selection, validation, computation, and handling text.

Uploaded by

Jace Hill
Copyright: © All Rights Reserved

Task Automation in System Administration

CST 3523

AWK

Including materials from Chapter 14 of A Practical Guide to Linux® Commands, Editors, and Shell Programming, 3rd Edition
AWK: Pattern-scanning and Processing Language
• Data-driven paradigm (AWK, SED (to be discussed))
– Describes the data to be matched
– Describes processing to be performed on that data
• Procedural (imperative) paradigm (C, Fortran, PL/SQL)
– Sequence of steps
• Object-oriented paradigm (Java, C#)
– All computations are carried out by objects
• AWK works with text and numbers
– Generate reports
– Filter text
AWK (cont’d)
• AWK is named after its authors: Aho, Weinberger, and Kernighan
– Originally implemented under UNIX as awk
• GNU implementation of awk
– gawk
• Fast (stripped-down) version of awk
– mawk
• gawk, awk, mawk
– Support the same core set of functions
– Take input from file(s), standard input, or over a network
– Write output to standard output (STDOUT)
AWK (cont’d)
• awk gets its input from
– Files
– Redirection and pipes
– Directly from standard input
• There are several ways to run an awk program
– awk 'program' input_file(s)
• program and input_file(s) are provided as command-line arguments (program is enclosed in single quotation marks)
– awk 'program'
• program is a command-line argument; input is taken from standard input (yes, awk can work as a filter!)
– awk -f program_file_name input_file(s)
• program is read from a file called program_file_name
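The three ways of running an awk program can be tried side by side. This is a minimal sketch: the file names input.txt and first.awk and the sample data are made up for illustration, and the program simply prints the first field of each record.

```shell
# Hypothetical scratch files, created only for this illustration
workdir=$(mktemp -d)
printf 'alpha 1\nbeta 2\n' > "$workdir/input.txt"
printf '{ print $1 }\n' > "$workdir/first.awk"

# 1. Program and input file both given as command-line arguments
from_args=$(awk '{ print $1 }' "$workdir/input.txt")

# 2. Program on the command line, input from standard input (awk as a filter)
as_filter=$(cat "$workdir/input.txt" | awk '{ print $1 }')

# 3. Program read from a file with -f
from_file=$(awk -f "$workdir/first.awk" "$workdir/input.txt")

printf '%s\n' "$from_args" "$as_filter" "$from_file"
rm -rf "$workdir"
```

All three invocations should produce the same two lines, since only the way the program and the input reach awk differs.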
AWK (cont’d)
• AWK recognizes/processes
– File
• Consists of records (the lines of the file)
• AWK operates on one record at a time
– Record
• Consists of fields (words separated by any number of spaces or tabs)
– Field
• Field 1 is accessed with $1, field 2 with $2, …
• $0 refers to the whole record (whole line)
• AWK processes fields
– sed (to be discussed) processes lines
AWK (cont’d)
[Figure: a file drawn as a grid. Each row is a record/line (1, 2, 3, 4, 5, 6, ...) and each column is a field (1, 2, 3).]
Structure of an AWK Program
• awk program
– Sequence of statements
pattern { action }
pattern { action }
– Pattern (selector)
• Determines whether the action is to be executed
• Could be:
– Regular expressions
– Arithmetic relational expressions
– String-valued expressions
– Boolean combinations

– Action (what to do)
• Each statement is terminated by a newline character
Structure of an AWK Program (cont’d)
• BEGIN segment
– Comprises actions executed only once, before the file is processed
– BEGIN segment's code is enclosed in braces {}
• pattern {action} pair(s)
– Both pattern and action are optional, but one or the other is required
– The default pattern matches every record
– The default action is to print the record
• END segment
– Comprises actions executed only once, after the file is processed
– END segment's code is enclosed in braces {}
• Overall layout
BEGIN {action}
pattern {action}
pattern {action}
...
pattern {action}
END {action}
Begin and End
• Special pattern
– BEGIN matches before the first input line is read
– END matches after the last input line has been read
• This allows for initial and wrap-up processing
BEGIN { print "NAME RATE HOURS"; print "" }
{ print }
END { print "total number of employees is", NR }
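To see the order in which BEGIN, the per-record rule, and END fire, the program can be run on a few records. This is only a sketch: the three sample records (name, hourly rate, hours) are invented for illustration.

```shell
# Hypothetical sample records: name, hourly rate, hours worked
report=$(printf 'Beth 4.00 0\nDan 3.75 0\nKathy 4.00 10\n' | awk '
BEGIN { print "NAME RATE HOURS"; print "" }   # runs once, before any input
      { print }                                 # runs for every record
END   { print "total number of employees is", NR }   # runs once, after all input
')
printf '%s\n' "$report"
```

The header and the blank line come first, then the three records unchanged, then the summary line with NR equal to 3.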

AWK Variables
• $0 - Current record as a single variable
• $1, $2, … - Fields in the current record
• NF - Number of fields in the current record (no $ needed)
• NR – Record/line number of the current record
• FILENAME - name of current input file
• FS - Input field separator (space or TAB by default)
• RS - Input record separator (NEWLINE by default)
• OFS - Output field separator (space by default)
• ORS - Output record separator (NEWLINE by default)
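A short sketch of how these built-in variables interact, assuming made-up colon-separated input: FS and OFS are set in BEGIN so they are in effect before the first record is read.

```shell
# Hypothetical colon-separated input with a different field count per line
out=$(printf 'root:x:0\ndaemon:x:1:extra\n' | awk '
BEGIN { FS = ":"; OFS = " | " }   # input and output field separators
      { print NR, NF, $1 }        # record number, field count, first field
')
printf '%s\n' "$out"
```

Because the items in the print statement are separated by commas, awk places OFS between them in the output.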
Operators
• = assignment operator; sets a variable equal to a
value or string
• == equality operator; returns TRUE if both sides
are equal
• != not-equal operator
• && logical AND
• || logical OR
• ! logical NOT
• <, >, <=, >= relational operators
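• +, -, /, *, %, ^ arithmetic operators (% is remainder, ^ is exponentiation)
• String concatenation has no explicit operator; place two expressions side by side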
AWK Variables (cont’d)
[Figure: a file drawn as a grid. Each row is a record/line (NR = 1, 2, 3, 4, 5, 6, ...) and each column is a field ($1, $2, $3).]
User Variables
• Must begin with a letter or underscore, which can be followed by letters, digits, or underscores
• Keywords cannot be used as an AWK variable
• It is better to initialize AWK variables in the BEGIN section
• Whether an AWK variable is treated as a number or as a string depends on the context in which it is used
• E.g.,
BEGIN { total = 0 }
{
    item_no = $1; book = $2; book_amount = $3 * $4
    total = total + book_amount
    print item_no, " ", book, "\t", "$" book_amount
}
END { print "Total Amount = $" total }
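The point that context decides whether a value is a number or a string can be illustrated with a tiny sketch. The variable names x and y are arbitrary.

```shell
out=$(awk 'BEGIN {
    x = "3"
    print x + 4        # arithmetic context: x is used as the number 3
    print x "4"        # concatenation context: x is used as the string "3"
    y = 10
    print y " items"   # the number 10 is converted to a string
}')
printf '%s\n' "$out"
```

The same variable x yields 7 under addition and 34 under concatenation, with no declaration or cast involved.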
AWK Arrays
• AWK allows one-dimensional arrays to store
numbers or strings
– Index can be number or string (associative array)
arrayName[index] = value
arr[3]="value"
grade["Korn"]=40.3

– There is no need to declare:
• its size
• its elements
• array elements are created when first used, with an initial value that acts as 0 or ""
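As a sketch of string-indexed (associative) arrays, the following counts how often each word appears in the first field. The input words and the array name count are illustrative only.

```shell
out=$(printf 'Korn\nBash\nKorn\nZsh\nKorn\n' | awk '
{ count[$1]++ }                      # element is created on first use, starting from 0
END {
    print "Korn", count["Korn"]
    print "Bash", count["Bash"]
    print "Fish", count["Fish"] + 0   # never seen in the input: evaluates to 0
}')
printf '%s\n' "$out"
```

Specific keys are looked up here instead of looping with for (key in count), because the order in which that loop visits keys is not specified.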
AWK Arrays (cont’d)
• AWK provides arrays for storing groups of related
data values
– Reverse - print input in reverse order by line
{ line[NR] = $0 }   # remember each line
END { i = NR        # print lines in reverse order
      while (i > 0) {
          print line[i]
          i = i - 1
      }
}
Examples of Patterns
• BEGIN and END
• expressions
$3 < 100
$4 == "Asia"
• string-matching
/regex/ - e.g., /^.*$/
/string/ - e.g., /abc/ (matches the literal text abc)
• compound (&& and ||)
$3 < 100 && $4 == "Asia"
• && is a logical AND
• || is a logical OR
Examples of Patterns (cont’d)
• Range of lines/records (comma)
– NR == 10, NR == 20
• matches records 10 through 20 inclusive

• Patterns can take any of the above forms
• For /regex/ and string
– Patterns match the first instance in the record
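A range pattern and a compound pattern can be combined in one program. The sketch below uses invented two-field records, and the labels "range:" and "compound:" are there only to show which rule fired.

```shell
out=$(printf 'a 5\nb 50\nc 500\nd 5000\ne 7\n' | awk '
NR == 2, NR == 4    { print "range:", $1 }      # records 2 through 4 inclusive
$2 < 100 && /^[ae]/ { print "compound:", $1 }   # numeric test AND regex match
')
printf '%s\n' "$out"
```

Every rule is tested against every record, so the two kinds of output interleave in input order.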
Example of Actions
• print statement makes at least one line of output
$ gawk 'BEGIN { print "line one\nline two\nline three" }'
or
$ gawk '{ print $1, $2 }'
• Print current record vs. empty line
{ print } vs. { print "" }, respectively
• Enclose literal strings in double quotation marks, but not field references
$ gawk '{ print $1 " and " $2 }'
• Use a comma to separate items so that print inserts the output field separator (a space by default) between them
$ gawk '{ print $1, $2 }'
vs
$ gawk '{ print $1 $2 }'
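The difference between a comma and plain juxtaposition in print is easy to check on a single line of input. The words alpha and beta are placeholders, and plain awk is used here in place of gawk so the sketch runs with any implementation.

```shell
with_comma=$(echo 'alpha beta' | awk '{ print $1, $2 }')          # OFS inserted between items
without_comma=$(echo 'alpha beta' | awk '{ print $1 $2 }')        # items concatenated directly
with_literal=$(echo 'alpha beta' | awk '{ print $1 " and " $2 }') # literal string in between
printf '%s\n' "$with_comma" "$without_comma" "$with_literal"
```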
Selection with AWK
• AWK patterns are good for selecting specific lines
from the input for further processing
• Selection by Comparison
$2 >= 5 { print }

• Selection by Computation
$2 * $3 > 50 { printf("%6.2f for %s\n", $2 * $3, $1) }

• Selection by Text Content
$1 == "Susie"
/Susie/

• Combinations of Patterns
$2 >= 4 || $3 >= 20
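The selection patterns can be exercised against a small data set. The six records below (name, hourly rate, hours worked) are assumed sample data, not taken from the slides.

```shell
# Hypothetical employee data: name, hourly rate, hours worked
data='Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18'

# Selection by computation: pay greater than 50
by_pay=$(printf '%s\n' "$data" | awk '$2 * $3 > 50 { printf("%6.2f for %s\n", $2 * $3, $1) }')

# Selection by text content: first field is exactly Susie (default action prints the record)
by_name=$(printf '%s\n' "$data" | awk '$1 == "Susie"')

printf '%s\n' "$by_pay" "$by_name"
```

The %6.2f format pads each amount to six characters, which is why the amount for Susie is printed with a leading space.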
Data Validation with AWK
• Validating data is a common operation
• AWK is excellent at data validation
NF != 3 { print $0, "number of fields not equal to 3" }
$2 < 3.35 { print $0, "rate is below minimum wage" }
$2 > 10 { print $0, "rate exceeds $10 per hour" }
$3 < 0 { print $0, "negative hours worked" }
$3 > 60 { print $0, "too many hours worked" }
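To show the validation rules in action, they can be run against records that are deliberately wrong in different ways. The input below is invented for the purpose; only the first record is valid.

```shell
# Hypothetical input: one clean record followed by four faulty ones
out=$(printf 'Beth 4.00 0\nDan 2.50 10\nEve 4.00 5 extra\nMark 5.00 70\nZed 12.00 -3\n' | awk '
NF != 3   { print $0, "number of fields not equal to 3" }
$2 < 3.35 { print $0, "rate is below minimum wage" }
$2 > 10   { print $0, "rate exceeds $10 per hour" }
$3 < 0    { print $0, "negative hours worked" }
$3 > 60   { print $0, "too many hours worked" }
')
printf '%s\n' "$out"
```

A record that breaks two rules, such as the last one, is reported twice, because each rule is checked independently.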
Regular Expressions in AWK
• AWK supports the following regular expression operators
– ^ $ - beginning of/end of string
– . - any character
– [abcd] - character class
– [^abcd] - negated character class
– [a-z] - range of characters
– (regex1|regex2) – alternation
– * - zero or more occurrences of preceding expression
– + - one or more occurrences of preceding expression
– ? - zero or one occurrence of preceding expression
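Several of these operators can be combined in one pattern. The following sketch uses a made-up word list; each program consists of a pattern only, so the default action prints the matching records.

```shell
words='cat
cart
cot
dog
ct'

# ^ and $ anchor the match, [ao] is a character class, r? is an optional r
anchored=$(printf '%s\n' "$words" | awk '/^c[ao]r?t$/')

# (dog|ct) is alternation between two whole-line candidates
alternation=$(printf '%s\n' "$words" | awk '/^(dog|ct)$/')

# [^c] is a negated class: first character is anything except c
negated=$(printf '%s\n' "$words" | awk '/^[^c]/')

printf '%s\n' "$anchored" "$alternation" "$negated"
```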
Computing with AWK
• Counting is easy to do with AWK
$3 > 15 { emp = emp + 1 }
END { print emp, "employees worked more than 15 hrs" }

• Computing sums and averages is also simple
{ pay = pay + $2 * $3 }
END { print NR, "employees"
      print "total pay is", pay
      print "average pay is", pay/NR
}
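Both computations can be combined into one program and run on sample data. This is a sketch with assumed records (name, hourly rate, hours worked); the NR > 0 guard is an addition that avoids a division by zero on empty input.

```shell
# Hypothetical employee data: name, hourly rate, hours worked
out=$(printf 'Beth 4.00 0\nDan 3.75 0\nKathy 4.00 10\nMark 5.00 20\nMary 5.50 22\nSusie 4.25 18\n' | awk '
$3 > 15 { emp = emp + 1 }          # count employees with more than 15 hours
        { pay = pay + $2 * $3 }    # accumulate total pay over all records
END {
    print emp, "employees worked more than 15 hrs"
    print "total pay is", pay
    if (NR > 0)
        print "average pay is", pay / NR
}')
printf '%s\n' "$out"
```

Neither emp nor pay is initialized explicitly; both start out acting as 0 the first time they are used in arithmetic.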
Handling Text in AWK
• One major advantage of Awk is its ability to handle
strings as easily as many languages handle
numbers
• Awk variables can hold strings of characters as well
as numbers, and Awk conveniently translates back
and forth as needed
• This program finds the employee who is paid the
most per hour:
$2 > maxrate { maxrate = $2; maxemp = $1 }
END { print "highest hourly rate:", maxrate, "for", maxemp }
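A quick run of the highest-rate program on assumed sample data shows numbers and strings being tracked together: maxrate is compared numerically while maxemp holds a name.

```shell
# Hypothetical employee data: name, hourly rate, hours worked
out=$(printf 'Beth 4.00 0\nDan 3.75 0\nMark 5.00 20\nMary 5.50 22\nSusie 4.25 18\n' | awk '
$2 > maxrate { maxrate = $2; maxemp = $1 }   # remember the best rate seen so far and its owner
END { print "highest hourly rate:", maxrate, "for", maxemp }
')
printf '%s\n' "$out"
```

Because maxrate is assigned directly from a field, it is printed with the same text that appeared in the input.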
Control Flow Statements
• AWK provides several control flow statements for
making decisions and writing loops
• if-else
if (condition){
commands1
}
else {
commands2
}
– where commands1 and/or commands2 can be multiple statements enclosed in curly braces { }
– the else and its associated commands2 are optional
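As an illustration of if-else inside an action, the sketch below classifies each record by its second field. The thresholds and the variable name status are arbitrary choices for the example.

```shell
out=$(printf 'Beth 0\nMark 20\nMary 45\n' | awk '{
    if ($2 > 40) {
        status = "overtime"
    } else if ($2 > 0) {
        status = "regular"
    } else {
        status = "no hours"
    }
    print $1 ": " status
}')
printf '%s\n' "$out"
```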
Control Flow Statements (cont’d)
• while
while (condition) {
commands
}

• for
for (init; condition; increment) {
commands
}

– for (;;) is an infinite loop
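Both loop forms are commonly used to walk over the fields of a record. In this sketch, with invented numeric input, a for loop sums the fields and a while loop rebuilds the record in reverse order.

```shell
out=$(printf '1 2 3\n10 20\n' | awk '{
    sum = 0
    for (i = 1; i <= NF; i++)      # visit every field, left to right
        sum += $i
    rev = ""
    i = NF
    while (i > 0) {                # visit every field, right to left
        rev = rev $i (i > 1 ? " " : "")
        i--
    }
    print sum, "|", rev
}')
printf '%s\n' "$out"
```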


String Operations
• String Concatenation
– New strings can be created by combining old ones
{ names = names $1 " " }
END { print names }

• Printing the Last Input Line
– Although NR retains its value after the last input line has been read, $0 is not guaranteed to in every AWK implementation, so the line should be saved in a variable as it is read
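Both ideas fit in one short program: names are accumulated by concatenation, and the last input line is saved in a variable so that it is available in END regardless of the implementation. The variable name last and the sample records are assumptions for this sketch.

```shell
# names accumulates every first field, each followed by a space
names=$(printf 'Beth 4.00\nDan 3.75\nKathy 4.00\n' | awk '
{ names = names $1 " " }
END { print names }')

# last is overwritten on every record, so in END it holds the final line
last_line=$(printf 'Beth 4.00\nDan 3.75\nKathy 4.00\n' | awk '
{ last = $0 }
END { print last }')

printf '[%s]\n' "$names" "$last_line"
```

The brackets in the final printf only make the trailing space after the last name visible.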
Built-In String Functions
Function                  Description
gsub(r, s)                substitute s for r globally in $0, return number of substitutions made
gsub(r, s, t)             substitute s for r globally in string t, return number of substitutions made
index(s, t)               return first position of string t in s, or 0 if t is not present
length(s)                 return number of characters in s
match(s, r)               test whether s contains a substring matched by r, return index or 0
sprintf(fmt, expr-list)   return expr-list formatted according to format string fmt
Built-In String Functions (cont’d)
Function                  Description
split(s, a)               split s into array a on FS, return number of fields
split(s, a, fs)           split s into array a on field separator fs, return number of fields
sub(r, s)                 substitute s for the leftmost longest substring of $0 matched by r
sub(r, s, t)              substitute s for the leftmost longest substring of t matched by r
substr(s, p)              return suffix of s starting at position p
substr(s, p, n)           return substring of s of length n starting at position p
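A compact sketch that calls most of the functions in the two tables. The strings are arbitrary, and everything runs in a BEGIN block so no input is needed.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    print length(s)                  # number of characters
    print index(s, "world")          # position where "world" starts
    print substr(s, 7)               # suffix starting at position 7
    print substr(s, 1, 5)            # 5 characters starting at position 1
    n = gsub(/o/, "0", s)            # replace every o in s, keep the count
    print n, s
    k = split("a:b:c", parts, ":")   # split on ":" into parts[1..k]
    print k, parts[2]
    print match("foobar", /ba+r/)    # index of the first match, or 0
    print sprintf("%05.1f", 3.14159) # formatted string, zero padded to width 5
}')
printf '%s\n' "$out"
```

Note that positions are counted from 1, and that gsub changes s in place while returning the number of replacements.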
Getting Input From the Command Line
• getline function provides input capabilities
– Used to read input from either the current input or
from a file or pipe
– Returns 1 if a record was present, 0 if an end-of-file was encountered, and -1 if some error occurred

• Example:
BEGIN {
    print "What is your first name and major? "
    while (getline > 0)
        print "Hi", $1 ", your major is", $2 "."
}
Using getline With File and Pipe
• getline with a file
BEGIN { while ((getline < "emp.data") > 0)
            print $0 }

• getline with a pipe
BEGIN { while (("who" | getline) > 0)
            nr++
        print "There are", nr, "people logged on right now." }
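Both forms can be tried without depending on a real emp.data file or on who being logged in. In this sketch a temporary file stands in for emp.data and a pair of echo commands stands in for who; both substitutions, and the variable names f, line, cmd and n, are assumptions made so the result is predictable.

```shell
tmp=$(mktemp)                                  # hypothetical stand-in for emp.data
printf 'Beth 4.00 0\nDan 3.75 0\n' > "$tmp"

# getline from a file: read records into the variable line until end of file
file_count=$(awk -v f="$tmp" 'BEGIN {
    while ((getline line < f) > 0)
        n++
    close(f)
    print n
}')

# getline from a pipe: each call reads one line of command output into $0
pipe_report=$(awk 'BEGIN {
    cmd = "echo one; echo two; echo three"     # stand-in for who
    while ((cmd | getline) > 0)
        n++
    close(cmd)
    print "There are", n, "lines of command output."
}')

printf '%s\n' "$file_count" "$pipe_report"
rm -f "$tmp"
```

Comparing the result with > 0 matters: getline returns -1 on error, and a bare while (getline ...) would treat that as true and loop forever.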
getline Function
Expression               Sets
getline                  $0, NF, NR, FNR
getline var              var, NR, FNR
getline <"file"          $0, NF
getline var <"file"      var
"cmd" | getline          $0, NF
"cmd" | getline var      var


Built-In Arithmetic Functions
Function      Return Value
atan2(y,x)    arctangent of y/x (-π to π)
cos(x)        cosine of x, with x in radians
sin(x)        sine of x, with x in radians
exp(x)        exponential of x, e^x
int(x)        integer part of x
log(x)        natural (base e) logarithm of x
rand()        random number r, where 0 <= r < 1
srand(x)      new seed for rand()
sqrt(x)       square root of x
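A few of these functions evaluated on simple arguments, as a sketch. The seed 42 is arbitrary, and because the actual value returned by rand() depends on the implementation, only its range is checked.

```shell
out=$(awk 'BEGIN {
    print int(7.9), int(-3.7)        # int truncates toward zero
    print sqrt(16), exp(0), log(1)
    printf "%.4f\n", atan2(0, -1)    # angle of the point (-1, 0), which is pi
    srand(42)                        # arbitrary seed
    r = rand()
    print (r >= 0 && r < 1)          # prints 1 when r is in the documented range
}')
printf '%s\n' "$out"
```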
