
Introawk

Awk is a programming language useful for manipulating text-based data and performing computations on data. It reads input files line-by-line, splits each line into fields, and allows the user to run commands to test or transform the data on matches. Awk programs consist of pattern-action statements that are applied to each line, and features include variables, built-in functions, flow control, arrays, and the ability to write both short one-liners and longer scripts.

Uploaded by

saleempkp
Copyright
© Attribution Non-Commercial (BY-NC)

Introduction to Awk

Awk is a convenient and expressive programming language that can be applied to a wide variety of computing and data manipulation tasks.

Awk
Works well on record-type data
Reads input file(s) a line at a time
Parses each line into fields
Performs user-defined tests against each line and performs actions on matches

Other Common Uses


Input validation
    Does every record have the same number of fields?
    Do the values make sense (negative time, hourly wage > $100, etc.)?

Filtering out certain fields

Searches
    Who got a zero on lab 3?
    Who got the highest grade?

Many others (it's late)
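
The field-count check listed under input validation can be sketched as a one-liner. This is only an illustration: the sample records and the expected count of 3 fields are assumptions made up for the example, not part of the original slides.

```shell
# Hypothetical sample data: three records, the second is missing a field.
data='Beth 4.00 0
Dan 3.75
Kathy 4.00 10'

# Report every record whose field count (NF) differs from the expected 3.
bad=$(printf '%s\n' "$data" |
      awk 'NF != 3 { print "line " NR ": expected 3 fields, got " NF }')
echo "$bad"
```

Only the malformed record is reported, because the pattern `NF != 3` is false for well-formed lines.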

Invocation
Can write little one-liners on the command line (very handy).
Print the 3rd field of every line:
$ awk '{ print $3 }' input.txt

Execute an awk script file:


$ awk -f script.awk input.txt

Or, use this sha-bang as the first line, and give your script execute permissions:
#!/bin/awk -f

Form of an AWK program


AWK programs are entries of the form:
pattern { action }

pattern: some test, looking for a pattern (regular expressions) or C-like conditions
    if null, the actions are applied to every line

action: a statement or set of statements
    if not provided, the default action is to print the entire line, much like grep

Form of an AWK program


Input files are parsed a record (line) at a time.
Each line is checked against each pattern, in order.
There are 2 special patterns:

BEGIN: true before any records are read
END: true at end of input (after all records have been read)
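
A minimal sketch of how the two special patterns bracket normal record processing. The three-line input and the counter name `n` are assumptions chosen for illustration.

```shell
# BEGIN runs once before any input is read, the middle rule runs once per
# record, and END runs once after the last record.
out=$(printf 'a\nb\nc\n' |
      awk 'BEGIN { print "start" } { n++ } END { print "records:", n }')
echo "$out"
```

The pattern-less middle rule matches every line, so `n` ends up equal to the number of records.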

Awk Features
Patterns can be regular expressions or C-like conditions. Each line of the input is matched against the patterns, one after the next. If a match occurs, the corresponding action is performed. Input lines are parsed and split into fields, which are accessed as $1, ..., $NF, where NF is a variable set to the number of fields. The variable $0 contains the entire line. By default, lines are split on white space (blanks, tabs).
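
The field variables described above can be seen on a single line of input. This is a small sketch; the input string is an assumption made up for the example.

```shell
# Print the field count, the first field, the last field, and the whole line.
out=$(echo 'alpha beta gamma' |
      awk '{ print NF; print $1; print $NF; print $0 }')
echo "$out"
```

Because NF holds the number of fields, `$NF` always refers to the last field on the current line, however many fields it has.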

Variables
Not declared, nor typed
No character type

Only strings and floats (with support for ints)

$n refers to the nth field (where n is some integer value)

# prints each field on the line (fields start at $1; $0 is the whole line)
for( i=1; i<=NF; ++i ) print $i

Some Built-in Variables


FS: the input field separator
OFS: the output field separator
NF: # of fields; changes w/each record
NR: the # of records read (so far). So, the current record #.
$0: the entire input line

Example
Print those employees who actually worked:

$ cat emp.data
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18

$ awk '$3>0 {print $1, $2*$3}' emp.data
Kathy 40
Mark 100
Mary 121
Susie 76.5

Example CSV file


$ cat students.csv
smith,john,js12
jones,fred,fj84
bee,sue,sb23
fife,ralph,rf86
james,jim,jj22
cook,nancy,nc54
banana,anna,ab67
russ,sam,sr77
loeb,lisa,guitarHottie

$ cat getEmails.awk
#!/bin/awk -f
BEGIN { FS = "," }
{ printf( "%s's email is: %[email protected]\n", $2, $3 ); }

$ getEmails.awk students.csv
john's email is: [email protected]
fred's email is: [email protected]
sue's email is: [email protected]
ralph's email is: [email protected]
jim's email is: [email protected]
nancy's email is: [email protected]
anna's email is: [email protected]
sam's email is: [email protected]
lisa's email is: guitarHottie@schoo

Example output separator


$ cat out.awk
#!/bin/awk -f
BEGIN { FS = ","; OFS = "-*-"; }
{ print $1, $2, $3; }

$ out.awk students.csv
smith-*-john-*-js12
jones-*-fred-*-fj84
bee-*-sue-*-sb23
fife-*-ralph-*-rf86
james-*-jim-*-jj22
cook-*-nancy-*-nc54
banana-*-anna-*-ab67
russ-*-sam-*-sr77
loeb-*-lisa-*-guitarHottie
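
One detail worth a sketch: OFS is used when print is given comma-separated arguments, or when $0 is rebuilt after a field is assigned. A bare print of an untouched line keeps the original separators. The single input record below is taken from students.csv; the variable names `plain` and `rebuilt` are assumptions for the example.

```shell
# Without touching any field, $0 is printed exactly as it was read.
plain=$(echo 'smith,john,js12' |
        awk 'BEGIN { FS = ","; OFS = "-*-" } { print }')

# Assigning a field (even to itself) makes awk rebuild $0 using OFS.
rebuilt=$(echo 'smith,john,js12' |
          awk 'BEGIN { FS = ","; OFS = "-*-" } { $1 = $1; print }')

echo "$plain"
echo "$rebuilt"
```

The `$1 = $1` idiom is a common way to re-join all fields with a new separator without listing each field by hand.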

Flow Control
Awk syntax is much like C: same loops, if statements, etc.
AWK is named after its authors: Aho, Weinberger, and Kernighan.
Kernighan also co-wrote the book The C Programming Language with Dennis Ritchie, who created C.
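
A short sketch of the C-like control flow, combining a for loop with an if statement to add up only the positive fields on a line. The input numbers and the variable name `s` are assumptions for illustration.

```shell
# Loop over every field on the line and sum the ones greater than zero.
out=$(echo '3 -1 4 -5' | awk '{
    s = 0
    for (i = 1; i <= NF; i++)
        if ($i > 0)
            s += $i
    print s
}')
echo "$out"
```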

Associative Arrays
Awk also supports arrays that can be indexed by arbitrary strings. They are implemented using hash tables.

Total["Sue"] = 100;

It is possible to loop over all indices that have currently been assigned values. The order in which the indices are visited is not specified.
for (name in Total) print name, Total[name];

Example using Associative Arrays


$ cat scores
Fred 90
Sue 100
Fred 85
Sam 70
Sue 98
Sam 50
Fred 70

$ cat total.awk
{ Total[$1] += $2 }
END { for (i in Total) print i, Total[i]; }

$ awk -f total.awk scores
Sue 198
Sam 120
Fred 245
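
Building on the totals above, the earlier question "who got the highest grade?" could be answered along these lines. This is a sketch, not part of the original slides: the variable name `best` is an assumption, and if two names tie, which one is reported depends on the unspecified iteration order.

```shell
# Same data as the scores file above, supplied inline.
scores='Fred 90
Sue 100
Fred 85
Sam 70
Sue 98
Sam 50
Fred 70'

# Accumulate a total per name, then scan the array for the largest total.
top=$(printf '%s\n' "$scores" | awk '
    { Total[$1] += $2 }
    END {
        for (name in Total)
            if (best == "" || Total[name] > Total[best])
                best = name
        print best, Total[best]
    }')
echo "$top"
```

Tracking the maximum inside the END loop avoids relying on the order in which `for (name in Total)` visits the indices.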

Useful one-liners
Line count:
awk 'END {print NR}'

grep
awk '/pat/'

head
awk 'NR<=10'

Add line #s to a file


awk '{print NR, $0}'
awk '{ printf( "%5d %s\n", NR, $0 )}'

Many more. See the resources tab on the course webpage for links to more examples.
