A Practical Guide to Learning GNU Awk
Opensource.com
CONTRIBUTORS
Jim Hall
Lazarus Lazaridis
Dave Neary
Moshe Zadka
CHAPTERS
LEARN
What is awk?
PRACTICE
CHEAT SHEET
What is awk?
awk is known for its robust ability to process and interpret data from text files.
AWK IS A POWERFUL text-parsing tool for Unix and Unix-like systems, but because it has programmed functions that you can use to perform common parsing tasks, it's also considered a programming language. You probably won't be developing your next GUI application with awk, and it likely won't take the place of your default scripting language, but it's a powerful utility for specific tasks.
What those tasks may be is surprisingly diverse. The best way to discover which of your problems might be best solved by awk is to learn awk; you'll be surprised at how awk can help you get more done but with a lot less effort.
Awk's basic syntax is:

awk [options] 'pattern {action}' file

To get started, create this sample file and save it as colours.txt:

name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5

This data is separated into columns by one or more spaces. It's common for data that you are analyzing to be organized in some way. It may not always be columns separated by whitespace, or even a comma or semicolon, but especially in log files or data dumps, there's generally a predictable pattern. You can use patterns of data to help awk extract and process the data that you want to focus on.

Printing a column
In awk, the print function displays whatever you specify. There are many predefined variables you can use, but some of the most common are integers designating columns in a text file. Try it out:

$ awk '{print $2;}' colours.txt
color
red
yellow
red
purple
green
purple
brown
brown
yellow

In this case, awk displays the second column, denoted by $2. This is relatively intuitive, so you can probably guess that print $1 displays the first column, and print $3 displays the third, and so on.
To display all columns, use $0.
The number after the dollar sign ($) is an expression, so $2 and $(1+1) mean the same thing.

Conditionally selecting columns
The example file you're using is very structured. It has a row that serves as a header, and the columns relate directly to one another. By defining conditional requirements, you can qualify what you want awk to return when looking at this data. For instance, to view items in column 2 that match "yellow" and print the contents of column 1:

awk '$2=="yellow"{print $1}' colours.txt
banana
pineapple
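Conditions can also combine. This sketch (not from the original article) uses the built-in NR variable to skip the header row and a numeric comparison on the third column:

$ awk 'NR > 1 && $3 > 5 {print $1, $3}' colours.txt
banana 6
grape 10
apple 8
potato 9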
Fields, records, and variables in awk
In the second article in this intro to awk series, learn about fields, records, and some powerful awk variables.

There are several implementations of awk, such as mawk, nawk, and the one that ships with most Linux distributions, GNU awk, or gawk. On most Linux distributions, awk and gawk are synonyms referring to GNU awk, and typing either invokes the same awk command. See the GNU awk user's guide [1] for the full history of awk and gawk.
The first article in this series showed that awk is invoked on the command line with this syntax:

$ awk [options] 'pattern {action}' inputfile

With the default field separator, any run of spaces or tabs counts as a single field break, so this record contains two fields:

raspberry red

As does this one:

tuxedo     black

Other separators are not treated this way. Assuming that the field separator is a comma, the following example record contains three fields, with one probably being zero characters long (assuming a non-printable character isn't hiding in that field):

raspberry,red,
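You can confirm the field count with the built-in NF variable. A quick check (this run isn't from the original article):

$ echo "raspberry,red," | awk -F, '{print NF}'
3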
A guide to intermediate awk scripting
Learn how to structure commands into executable scripts.

A rule can consist of only a pattern, in which case the entire record is written as if the action was { print }. Awk programs are essentially data-driven in that actions depend on the data, so they are quite a bit different from programs in many other programming languages.
The format argument (or format string) defines how each of the other arguments will be output. It uses format specifiers to do this, including %s to output a string and %d to output a decimal number. The following printf statement outputs the record followed by the number of fields in parentheses:

printf "%s (%d)\n", $0, NF;

Arguably, there's no advantage to having just one line in a script, but sometimes it's easier to execute a script than to remember and type even a single line. A script file also provides a good opportunity to document what a command does. Lines starting with the # symbol are comments, which awk ignores.
For instance, save this script as example2.awk; it prints the header row and then flags, with two asterisks, every record whose third field is 8 or greater:

#!/usr/bin/awk -f

NR == 1 {
    print $0;
    next;
}

$3 >= 8 {
    printf "%s\t%s\n", $0, "**";
}

Grant the file executable permission:

$ chmod u+x example2.awk

Run the script:

$ ./example2.awk colours.txt

Try writing an awk script with more than one rule and at least one conditional pattern. If you want to try more functions than just print and printf, refer to the gawk manual [3] online. Here's an idea to get you started:

#!/usr/bin/awk -f
#
# Print each record EXCEPT
# IF the first record contains "raspberry",
# THEN replace "red" with "pi"

$1 == "raspberry" {
    gsub(/red/,"pi")
}

END command
The END command, like BEGIN, allows you to perform actions in awk after it completes its scan through the text file you are processing. If you want to print cumulative results of some value in all records, you can do that only after all records have been scanned and processed.
The BEGIN and END commands run only once each. All rules between them run zero or more times on each record. In other words, most of your awk script is a loop that is executed at every new line of the text file you're processing, with the exception of the BEGIN and END rules, which run before and after the loop.
Here is an example that wouldn't be possible without the END command. This script accepts values from the output of the df Unix command and increments two custom variables (used and available) with each new record.

$1 != "tmpfs" {
    used += $3;
    available += $4;
}

END {
    printf "%d GiB used\n%d GiB available\n",
        used/2^20, available/2^20;
}

Save the script as total.awk and try it:

df -l | awk -f total.awk

The used and available variables act like variables in many other programming languages. You create them arbitrarily and without declaring their type, and you add values to them at will. At the end of the loop, the script adds the records in the respective columns together and prints the totals.

Math
As you can probably tell from all the logical operators and casual calculations so far, awk does math quite naturally. This arguably makes it a very useful calculator for your terminal. Instead of struggling to remember the rather unusual syntax of bc, you can just use awk along with its special BEGIN function to avoid the requirement of a file argument:

$ awk 'BEGIN { print 2*21 }'
42
$ awk 'BEGIN { print 8*log(4) }'
11.0904

Admittedly, that's still a lot of typing for simple (and not so simple) math, but it wouldn't take much effort to write a frontend, which is an exercise for you to explore.
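One possible frontend (a sketch, not from the original article): a small shell function that hands its arguments to an awk BEGIN block.

# add to ~/.bashrc or similar; the name "calc" is arbitrary
calc() {
    awk "BEGIN { print $* }"
}

After sourcing it, calc '2*21' prints 42. Quote the expression so the shell doesn't expand characters like *.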
While loop

#!/bin/awk -f

BEGIN {
    # Loop through 1 to 10

    i=1;
    while (i <= 10) {
        print i, " to the second power is ", i*i;
        i = i+1;
    }
    exit;
}

In this simple example, awk prints the square of whatever integer is contained in the variable i. The while (i <= 10) phrase tells awk to perform the loop only as long as the value of i is less than or equal to 10. After the final iteration (while i is 10), the loop ends.

Do while loop
The do while loop performs commands after the keyword do. It performs a test afterward to determine whether the loop should repeat.
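A sketch of the form (this script is an editorial reconstruction, not the article's original example):

#!/bin/awk -f

BEGIN {
    i=1;
    do {
        # the body runs once before the test is evaluated
        print i, " to the second power is ", i*i;
        i = i+1;
    } while (i <= 10);
    exit;
}

Because the test comes after the body, a do while loop always runs at least once.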
For loops
There are two kinds of for loops in awk.
One kind of for loop initializes a variable, performs a test, and increments the variable together, performing commands while the test is true.

#!/bin/awk -f

BEGIN {
    for (i=1; i <= 10; i++) {
        print i, " to the second power is ", i*i;
    }
    exit;
}

Another kind of for loop sets a variable to successive indices of an array, performing a collection of commands for each index. In other words, it uses an array to "collect" data from a record.
This example implements a simplified version of the Unix command uniq. By adding a list of strings into an array called a as a key and incrementing the value each time the same key occurs, you get a count of the number of times a string appears (like the --count option of uniq). If you print the keys of the array, you get every string that appears one or more times.
For example, using the demo file colours.txt (from the previous articles):

name color amount
apple red 4
banana yellow 6
raspberry red 99
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5

Here is a simple version of uniq -c in awk form:

#! /usr/bin/awk -f

NR != 1 {
    a[$2]++
}
END {
    for (key in a) {
        print a[key] " " key
    }
}

The third column of the sample data file contains the number of items listed in the first column. You can use an array and a for loop to tally the items in the third column by color:

#! /usr/bin/awk -f

BEGIN {
    FS=" ";
    OFS="\t";
    print("color\tsum");
}
NR != 1 {
    a[$2]+=$3;
}
END {
    for (b in a) {
        print b, a[b]
    }
}

As you can see, you are also printing a header row in the BEGIN function (which always happens only once) prior to processing the file.

Loops
Loops are a vital part of any programming language, and awk is no exception. Using loops can help you control how your awk script runs, what information it's able to gather, and how it processes your data. Our next article will cover switch statements, continue, and next.
The tilde operator (~) with an anchored expression selects records by pattern; here, ^r matches a first field that starts with r:

$ awk -e '$1 ~ /^r/ {print $0}' colours.txt
raspberry red 99

You have selected all records containing the letter p followed by either an e or an l.

plum
kiwi
potato
pinenut

The reason both apple and pineapple were replaced with nut is that both are the first match of their records. If the records were different, then the results could differ:

$ printf "apple apple\npineapple apple\n" | \
  awk -e 'sub(/apple/, "nut")'
nut apple
pinenut apple

The gsub command substitutes all matching items:

$ printf "apple apple\npineapple apple\n" | \
  awk -e 'gsub(/apple/, "nut")'
nut nut
pinenut nut

Gensub
An even more complex version of these functions, called gensub(), is also available.
The gensub function allows you to use the & character to recall the matched text. For example, if you have a file with the word Awk and you want to change it to GNU Awk, you could use this rule:

{ print gensub(/(Awk)/, "GNU &", 1) }

This searches for the group of characters Awk and stores it in memory, represented by the special character &. Then it substitutes the string for GNU &, meaning GNU Awk. The 1 character at the end tells gensub() to replace the first occurrence.

$ printf "Awk\nAwk is not Awkward" \
  | awk -e '{ print gensub(/(Awk)/, "GNU &", 1) }'
GNU Awk
GNU Awk is not Awkward
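The third argument doesn't have to be a number: pass "g" instead and gawk's gensub() replaces every occurrence. A quick sketch (not from the original article):

$ printf "Awk is Awk\n" | awk '{ print gensub(/Awk/, "GNU &", "g") }'
GNU Awk is GNU Awk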
There's a time and a place
Awk is a powerful tool, and regex are complex. You might think awk is so very powerful that it could easily replace grep and sed and tr and sort [2] and many more, and in a sense, you'd be right. However, awk is just one tool in a toolbox that's overflowing with great options. You have a choice about what you use and when you use it, so don't feel that you have to use one tool for every job great and small.
With that said, awk really is a powerful tool with lots of great functions. The more you use it, the better you get to know it. Remember its capabilities, and fall back on it occasionally so you can get comfortable with it.

Links
[1] https://opensource.com/article/19/7/what-posix-richard-stallman-explains
[2] https://opensource.com/article/19/10/get-sorted-sort
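A switch statement matches an expression against a series of case values and runs the matching branch; default catches everything else. Here is a sketch of the kind of rule that produces the classification output below (the case labels are illustrative reconstructions, not the original script):

#!/usr/bin/awk -f

NR > 1 {
    printf "The %s is classified as: ", $1;
    switch ($1) {
        case "apple":
            print "a fruit, pome";
            break;
        case "raspberry":
            print "a computer, pi";
            break;
        case "banana":
        case "grape":
        case "kiwi":
            print "a fruit, berry";
            break;
        case "plum":
            print "a fruit, drupe";
            break;
        case "pineapple":
            print "a fruit, fused berries (syncarp)";
            break;
        case "potato":
            print "a vegetable, tuber";
            break;
        default:
            print "[unclassified]";
    }
}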
The apple is classified as: a fruit, pome
The banana is classified as: a fruit, berry
The strawberry is classified as: [unclassified]
The raspberry is classified as: a computer, pi
The grape is classified as: a fruit, berry
The apple is classified as: a fruit, pome
The plum is classified as: a fruit, drupe
The kiwi is classified as: a fruit, berry
The potato is classified as: a vegetable, tuber
The pineapple is classified as: a fruit, fused berries (syncarp)

Break
The break statement is mainly used for the early termination of a for, while, or do-while loop or a switch statement. In a loop, break is often used where it's not possible to determine the number of iterations of the loop beforehand. Invoking break terminates the enclosing loop (which is relevant when there are nested loops or loops within loops).
This example, straight out of the GNU awk manual [2], shows a method of finding the smallest divisor. Read the additional comments for a clear understanding of how the code works:

#!/usr/bin/awk -f
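# NOTE: the body below follows the smallest-divisor example in
# the GNU awk manual; the comments are editorial additions.

{
    num = $1;
    for (divisor = 2; ; divisor++) {
        # an exact division means we found the smallest divisor
        if (num % divisor == 0) {
            printf "Smallest divisor of %d is %d\n", num, divisor;
            break;
        }
        # past the square root with no hit: num is prime
        if (divisor * divisor > num) {
            printf "%d is prime\n", num;
            break;
        }
    }
}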
As you can see, even though the script starts out with an explicit infinite loop with no end condition, the break function ensures that the script eventually terminates.

Continue
The continue function is similar to break. It can be used in a for, while, or do-while loop (it's not relevant to a switch statement, though). Invoking continue skips the rest of the enclosing loop and begins the next cycle.
Here's another good example from the GNU awk manual to demonstrate a possible use of continue:

#!/usr/bin/awk -f

# Loop, printing numbers 0-20, except 5

BEGIN {
    for (x = 0; x <= 20; x++) {
        if (x == 5)
            continue
        printf "%d ", x
    }
    print ""
}

The loop itself runs unbroken. Try the same code but with break instead to see the difference.

Next
This statement is not related to loops like break and continue are. Instead, next applies to the main record processing cycle of awk: the functions you place between the BEGIN and END functions. The next statement causes awk to stop processing the current input record and to move to the next one.
As you know from the earlier articles in this series, awk reads records from its input stream and applies rules to them. The next statement stops the execution of rules for the current record and moves to the next one.
Here's an example of next being used to "hold" information upon a specific condition:

#!/usr/bin/awk -f
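# NOTE: this body is a reconstruction matching the description
# that follows; the array name "skip" comes from the text, but
# the exact rule conditions are editorial assumptions.

# rule 1: skip the header row
NR == 1 {
    next;
}

# rule 2: skip short color names, but save them for later
length($2) < 6 {
    skip[NR] = $0;
    next;
}

# rule 3: print whatever made it past the first two rules
{
    print $0;
}

# print the lines held back in the array
END {
    for (n in skip)
        print skip[n];
}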
Here’s an example of next being used to “hold” information banana yellow 6
upon a specific condition: grape purple 10
plum purple 2
#!/usr/bin/awk -f pineapple yellow 5
How awk processes text streams
Awk reads text from its input file or stream one line at a time and uses a field separator to parse it into a number of fields. In the following example, every user whose shell is not /sbin/nologin can be printed by preceding the block with a pattern match:

awk 'BEGIN { FS=":" } ! /\/sbin\/nologin/ {print $1}' /etc/passwd

Advanced awk: Mail merge
Now that you have some of the basics, try delving deeper into awk with a more structured example: creating a mail merge.
A mail merge uses two files, one (called in this example email_template.txt) containing a template for an email you want to send:

From: Program committee <[email protected]>
To: {firstname} {lastname} <{email}>
Subject: Your presentation proposal

Dear {firstname},

Thank you for your presentation proposal:
{title}

You want to read the CSV file, replace the relevant fields in the first file (skipping the first line), then write the result to a file called acceptanceN.txt, incrementing N for each line you parse.
Write the awk program in a file called mail_merge.awk. Statements are separated by ; in awk scripts. The first task is to set the field separator variable and a couple of other variables the script needs. You also need to read and discard the first line in the CSV, or a file will be created starting with Dear firstname. To do this, use the special function getline and reset the record counter to 0 after reading it.

BEGIN {
    FS=",";
    template="email_template.txt";
    output="acceptance";
    getline;
    NR=0;
}

The main function is very straightforward: for each line processed, a variable is set for the various fields — firstname, lastname, email, and title. The template file is read line by line, and the function sub is used to substitute any occurrence of the special character sequences with the value of the relevant variable. Then the line, with any substitutions made, is output to the output file.
Since you are dealing with the template file and a different output file for each line, you need to clean up and close the file handles for these files before processing the next record.
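A sketch of the main rule (the CSV column order assumed here — firstname, lastname, email, title — and the variable name ln are editorial assumptions):

{
    # Read the relevant fields from the input record
    firstname=$1;
    lastname=$2;
    email=$3;
    title=$4;

    # Build a numbered output file name: acceptance1.txt, ...
    outfile=(output NR ".txt");

    # Copy the template line by line, substituting fields
    while ((getline ln < template) > 0) {
        sub(/{firstname}/, firstname, ln);
        sub(/{lastname}/, lastname, ln);
        sub(/{email}/, email, ln);
        sub(/{title}/, title, ln);
        print ln > outfile;
    }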
    # Close template and output file in advance of
    # next record
    close(outfile);
    close(template);
}

You're done! Run the script on the command line with:

awk -f mail_merge.awk proposals.csv

or, if you make the script executable, with:

./mail_merge.awk proposals.csv
awk '!visited[$0]++' your_file > deduplicated_file

How it works
The script keeps an associative array with indices equal to the unique lines of the file and values equal to their occurrences. For each line of the file, if the line occurrences are zero, then it increases them by one and prints the line; otherwise, it just increases the occurrences without printing the line.
I was not familiar with awk, and I wanted to understand how this can be accomplished with such a short script (awkward). I did my research, and here is what is going on:
• The awk "script" !visited[$0]++ is executed for each line of the input file.
• visited[] is a variable of type associative array [1] (a.k.a. Map [2]). We don't have to initialize it because awk will do it the first time we access it.
• The $0 variable holds the contents of the line currently being processed.
• visited[$0] accesses the value stored in the map with a key equal to $0 (the line being processed), a.k.a. the occurrences (which we set below).
• The ! negates the occurrences' value:
  • In awk, any nonzero numeric value or any nonempty string value is true [3].
  • By default, variables are initialized to the empty string [4], which is zero if converted to a number.
  • That being said:
    • If visited[$0] returns a number greater than zero, this negation is resolved to false.
    • If visited[$0] returns a number equal to zero or an empty string, this negation is resolved to true.
• The ++ operation increases the variable's value (visited[$0]) by one, so the expression as a whole evaluates to:
  • true if the occurrences are zero/empty string
  • false if the occurrences are greater than zero

awk statements consist of a pattern-expression and an associated action [5].

<pattern/expression> { <action> }

If the pattern succeeds, then the associated action is executed. If we don't provide an action, awk, by default, prints the input. An omitted action is equivalent to { print $0 }.
Our script consists of one awk statement with an expression, omitting the action. So this:

awk '!visited[$0]++' your_file > deduplicated_file

is equivalent to this:

awk '!visited[$0]++ { print $0 }' your_file > deduplicated_file

For every line of the file, if the expression succeeds, the line is printed to the output. Otherwise, the action is not executed, and nothing is printed.

Why not use the uniq command?
The uniq command removes only the adjacent duplicate lines. Here's a demonstration:

$ cat test.txt
A
A
A
B
B
B
A
A
C
C
C
B
B
A

$ uniq < test.txt
A
B
A
C
B
A
Other approaches
Using the sort command
We can also use the following sort [6] command to remove the duplicate lines, but the line order is not preserved.

sort -u your_file > sorted_deduplicated_file

Using cat, sort, and cut
The previous approach would produce a de-duplicated file whose lines would be sorted based on the contents. Piping a bunch of commands [7] can overcome this issue:

cat -n your_file | sort -uk2 | sort -nk1 | cut -f2-

For example, given a file containing the lines abc, ghi, abc, def, xyz, def, ghi, klm, cat -n prepends the line number to each line:

1 abc
2 ghi
3 abc
4 def
5 xyz
6 def
7 ghi
8 klm

sort -uk2 sorts the lines based on the second column (k2 option) and keeps only the first occurrence of the lines with the same second column value (u option).

1 abc
4 def
2 ghi
8 klm
5 xyz

sort -nk1 sorts the lines based on their first column (k1 option) treating the column as a number (-n option).

1 abc
2 ghi
4 def
5 xyz
8 klm

Finally, cut -f2- prints each line starting from the second column until its end (-f2- option: note the - suffix, which instructs it to include the rest of the line).

abc
ghi
def
xyz
klm
Start by looking at just one line. Establish what you want to do with one line, then test it (either mentally or with awk) on the next line and a few more. You'll end up with a good hypothesis on what your awk script must do in order to provide you with the data structure you want.
In this case, it's easy to see that each field is separated by a semicolon. For simplicity's sake, assume you want to sort the list by the very first field of each line.
Before you can sort, you must be able to focus awk on just the first field of each line, so that's the first step. The syntax of an awk command in a terminal is awk, followed by relevant options, followed by your awk command, and ending with the file of data you want to process.

$ awk --field-separator=";" '{print $1;}' penguins.list
Aptenodytes
Pygoscelis
Eudyptula
Spheniscus
Megadyptes
Eudyptes
Torvaldis

Because the field separator is a character that has special meaning to the Bash shell, you must enclose the semicolon in quotes or precede it with a backslash. This command is useful only to prove that you can focus on a specific field. You can try the same command using the number of another field to view the contents of another "column" of your data:

$ awk --field-separator=";" '{print $3;}' penguins.list
Miller,JF
Wagler
Bonaparte
Brisson
Milne-Edwards
Viellot
Ewing,L

Nothing has been sorted yet, but this is good groundwork.

Scripting
Awk is more than just a command; it's a programming language with indices and arrays and functions. That's significant because it means you can grab a list of fields you want to sort by, store the list in memory, process it, and then print the resulting data. For a complex series of actions such as this, it's easier to work in a text file, so create a new file called sorter.awk and enter this text:

#!/usr/bin/awk -f

BEGIN {
    FS=";";
}

This establishes the file as an awk script that executes the lines contained in the file.
The BEGIN statement is a special setup function provided by awk for tasks that need to occur only once. Defining the built-in variable FS, which stands for field separator and is the same value you set in your awk command with --field-separator, only needs to happen once, so it's included in the BEGIN statement.

Arrays in awk
You already know how to gather the values of a specific field by using the $ notation along with the field number, but in this case, you need to store it in an array rather than print it to the terminal. This is done with an awk array. The important thing about an awk array is that it contains keys and values. Imagine an array about this article; it would look something like this: author:"seth",title:"How to sort with awk",length:1200. Elements like author and title and length are keys, with the following contents being values.
The advantage to this in the context of sorting is that you can assign any field as the key and any record as the value, and then use the built-in awk function asorti() (sort by index) to sort by the key. For now, assume arbitrarily that you only want to sort by the second field.
Awk statements not preceded by the special keywords BEGIN or END are loops that happen at each record. This is the part of the script that scans the data for patterns and processes it accordingly. Each time awk turns its attention to a record, statements in {} (unless preceded by BEGIN or END) are executed.
To add a key and value to an array, create a variable (in this example script, I call it ARRAY, which isn't terribly original, but very clear) containing an array, and then assign it a key in brackets and a value with an equals sign (=).

{   # dump each field into an array
    ARRAY[$2] = $0;
}

In this statement, the contents of the second field ($2) are used as the key term, and the current record ($0) is used as the value.

The asorti() function
In addition to arrays, awk has several basic functions that you can use as quick and easy solutions for common tasks. One of the functions introduced in GNU awk, asorti(), provides the ability to sort an array by key (or index) or value. You can only sort the array once it has been populated, meaning that this action must not occur with every new record but only in the final stage of your script. For this purpose, awk provides the special END keyword. The inverse of BEGIN, an END statement happens only once and only after all records have been scanned.
END {
    # sort the keys of ARRAY into SARRAY; j is the number of keys
    j = asorti(ARRAY, SARRAY);

    for (i = 1; i <= j; i++) {
        printf("%s %s\n", SARRAY[i], ARRAY[SARRAY[i]])
    }
}

Make the script executable and run it:

$ chmod +x sorter.awk
$ ./sorter.awk penguins.list

As you can see, the data is sorted by the second field.
This is a little restrictive. It would be better to have the flexibility to choose at runtime which field you want to use as your sorting key so you could use this script on any dataset and get meaningful results.
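One way to get that flexibility (a sketch, not from the original article; the variable name k is arbitrary) is to pass the field number in with awk's -v option:

#!/usr/bin/awk -f

# usage: awk -v k=2 -f sorter.awk penguins.list
# k selects which field to use as the sorting key; default to 1
BEGIN {
    FS=";";
    if (k == "") {
        k = 1;
    }
}

{
    ARRAY[$k] = $0;
}

END {
    j = asorti(ARRAY, SARRAY);
    for (i = 1; i <= j; i++) {
        printf("%s %s\n", SARRAY[i], ARRAY[SARRAY[i]]);
    }
}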
A gawk script to convert smart quotes

I MANAGE a personal website and edit the web pages by hand. Since I don't have many pages on my site, this works well for me, letting me "scratch the itch" of getting into the site's code.
When I updated my website's design recently, I decided to turn all the plain quotes into "smart quotes," or quotes that look like those used in print material: “” instead of "".
Editing all of the quotes by hand would take too long, so I decided to automate the process of converting the quotes in all of my HTML files. But doing so via a script or program requires some intelligence. The script needs to know when to convert a plain quote to a smart quote, and which quote to use.
You can use different methods to convert quotes. Greg Pittman wrote a Python script [1] for fixing smart quotes in text. I wrote mine in GNU awk (gawk) [2].
To start, I wrote a simple gawk function to evaluate a single character. If that character is a quote, the function determines if it should output a plain quote or a smart quote. The function looks at the previous character; if the previous character is a space, the function outputs a left smart quote. Otherwise, the function outputs a right smart quote. The script does the same for single quotes.

function smartquote (char, prevchar) {
    # print smart quotes depending on the previous character;
    # otherwise just print the character as-is

    if (prevchar ~ /\s/) {
        # prev char is a space
        if (char == "'") {
            printf("‘");
        }
        else if (char == "\"") {
            printf("“");
        }
        else {
            printf("%c", char);
        }
    }
    else {
        # prev char is not a space
        if (char == "'") {
            printf("’");
        }
        else if (char == "\"") {
            printf("”");
        }
        else {
            printf("%c", char);
        }
    }
}

With that function, the body of the gawk script processes the HTML input file character by character. The script prints all text verbatim when inside an HTML tag (for example, <html lang="en">). Outside any HTML tags, the script uses the smartquote() function to print text. The smartquote() function does the work of evaluating when to print plain quotes or smart quotes.

function smartquote (char, prevchar) {
    ...
}

BEGIN {htmltag = 0}

{
    # for each line, scan one letter at a time:

    linelen = length($0);

    prev = "\n";

    for (i = 1; i <= linelen; i++) {
        char = substr($0, i, 1);
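        # NOTE: the rest of the loop is a reconstruction based on
        # the description above; the htmltag flag and the
        # smartquote() call come from the article, but the exact
        # tests are editorial assumptions.

        if (char == "<") {
            htmltag = 1;
        }

        if (htmltag == 1) {
            # inside an HTML tag: print text verbatim
            printf("%c", char);
        }
        else {
            # outside a tag: let smartquote() decide
            smartquote(char, prev);
        }

        if (char == ">") {
            htmltag = 0;
        }

        prev = char;
    }

    printf("\n");
}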
THE FOLLOWING is based on a true story, although some names and details have been changed.

A long time ago, in a place far away, there was an office. The office did not, for various reasons, buy instant coffee. Some workers in that office got together and decided to institute the "Coffee Corner."
A member of the Coffee Corner would buy some instant coffee, and the other members would pay them back. It came to pass that some people drank more coffee than others, so the level of a "half-member" was added: a half-member was allowed a limited number of coffees per week and would pay half of what a member paid.
Managing this was a huge pain. I had just read The Unix Programming Environment and wanted to practice my AWK [1] programming. So I volunteered to create a system.
Step 1: I kept a database of members and their debt to the Coffee Corner. I did it in an AWK-friendly format, where fields are separated by colons:

member:john:1:22
member:jane:0.5:33
member:pratyush:0.5:17
member:jing:1:27

The first field above identifies what kind of row this is (member). The second field is the member's name (i.e., their email username without the @). The next field is their membership level (full=1 or half=0.5). The last field is their debt to the Coffee Corner. A positive number means they owe money, a negative number means the Coffee Corner owes them.
Step 2: I kept a log of inputs to and outputs from the Coffee Corner:

payment:jane:33
payment:pratyush:17
bought:john:60
payback:john:50

Jane paid $33, Pratyush paid $17, John bought $60 worth of coffee, and the Coffee Corner paid John $50.
Step 3: I was ready to write some code. The code would process the members and payments and spit out an updated members file with the new debts.

#!/usr/bin/env --split-string=awk -F: -f

The shebang (#!) line required some work! I used the env command to allow passing multiple arguments from the shebang: specifically, the -F command-line argument to AWK tells it what the field separator is.
An AWK program is a sequence of rules. (It can also contain function definitions, but I don't need any for the Coffee Corner.)
The first rule reads the members file. When I run the command, I always give it the members file first, and the payments file second. It uses AWK associative arrays to record membership levels in the members array and current debt in the debt array.

$1 == "member" {
    members[$2]=$3
    debt[$2]=$4
    total_members += $3
}

The second rule reduces the debt when a payment is recorded.

$1 == "payment" {
    debt[$2] -= $3
}

Payback is the opposite: it increases the debt. This elegantly supports the case of accidentally giving someone too much money.

$1 == "payback" {
    debt[$2] += $3
}

The most complicated part happens when someone buys ("bought") instant coffee for the Coffee Club's use. It is treated as a payment and the person's debt is reduced by the appropriate amount. Next, it calculates the per-member fee. It iterates over all members and increases their debt, according to their level of membership.

$1 == "bought" {
    debt[$2] -= $3
    per_member = $3/total_members
    for (x in members) {
        debt[x] += per_member * members[x]
    }
}

With the sample files above, total_members is 3 (1 + 0.5 + 0.5 + 1), so John's $60 purchase first credits John the full $60 and then adds a $20 fee to each full member's debt and a $10 fee to each half-member's.
The END pattern is special: it happens exactly once, when AWK has no more lines to process. At this point, it spits out the new members file with updated debt levels.

END {
    for (x in members) {
        printf "%s:%s:%s\n", x, members[x], debt[x]
    }
}
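Putting it together (the file names here are illustrative; the article doesn't name them): save the program as coffee_corner.awk, make it executable, and run it with the members file first and the payments file second, capturing the updated members file.

$ chmod +x coffee_corner.awk
$ ./coffee_corner.awk members payments > members.new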
the appropriate amount. Next, it calculates the per-member sends a reminder email to people to pay their dues (for
fee. It iterates over all members and increases their debt, positive debts), this system managed the Coffee Corner for
according to their level of membership. quite a while.
$1 == "bought" {
debt[$2] -= $3 Links
per_member = $3/total_members [1] https://en.wikipedia.org/wiki/AWK
for (x in members) {
debt[x] += per_member * members[x]
}
}
Use this handy quick reference guide to the most commonly used features of GNU awk (gawk).

COMMAND-LINE USAGE
Run a gawk script using -f or include a short script right on the command line:

gawk -f file.awk file1 file2…

or:

gawk 'program' file1 file2…

All program lines are some combination of a pattern and actions:

pattern {action}

where pattern can be, for example:

BEGIN { FS=":"; }
{ print "Hello world"; }

FIELDS
Gawk does the work for you and splits input lines so you can reference them by field. Use -F on the command line or set FS to set the field separator.
• Reference fields using $
• $1 for the first string, and so on
• Use $0 for the entire line
For example:

gawk -F":" '{ print $1; }' /etc/passwd

or:

gawk 'BEGIN { FS=":"; } { print $1; }' /etc/passwd

FLOW CONTROL
You can use many common flow control and loop structures, including if, while, do-while, for, and switch.

if (i < 10) { print; }

REGULAR EXPRESSIONS
Common regular expression patterns include:

^        Matches start of a line
$        Matches end of a line
.        Matches any character, including newline
[^abc]   Negation; matches any character except a, b, or c
\.       Use backslash (\) to match a special character (like .)

You can also use character classes, including:

[:alnum:]   Alphanumeric characters
[:alpha:]   Alphabetic characters
[:digit:]   Numeric characters
[:lower:]   Lowercase alphabetic characters
[:space:]   Space characters
[:upper:]   Uppercase alphabetic characters

OPERATORS
(…)      Grouping
++ --    Increment and decrement
^        Exponents
&&       Logical AND
||       Logical OR
= += -= *= /= %= ^=   Assignment

STRING FUNCTIONS
substr(str, pos [, n])
Return the next n characters of the string str, starting at position pos. If n is omitted, return the rest of the string str.