Tawk
Contents
1 tawk
1.1 Description
1.2 Dependencies
1.3 Installation
1.4 Usage
1.5 -s and -N Options
1.6 Related Utilities
1.7 Functions
1.8 Examples
1.9 t2nfdump
1.10 t2custom
1.11 Writing a tawk Function
1.12 Using tawk Within Scripts
1.13 Using tawk With Non-Tranalyzer Files
1.14 Awk Cheat Sheet
1.15 Awk Templates
1.16 Examples
1.17 FAQ
Copyright © 2008–2024 by Tranalyzer Development Team
1 tawk
1.1 Description
This document describes tawk and its functionalities. tawk works just like awk, but provides access to the columns via
their names. In addition, it provides access to helper functions, such as host() or port(). Custom functions can be
added in the folder named t2custom where they will be automatically loaded.
1.2 Dependencies
gawk version 4.1 is required.
1.3 Installation
The recommended way to install tawk is to install t2_aliases as documented in README.md:
• Append the following lines to ~/.bashrc (make sure to replace $T2HOME with the actual path, e.g.,
$HOME/tranalyzer2-0.9.1):
if [ -f "$T2HOME/scripts/t2_aliases" ]; then
    . "$T2HOME/scripts/t2_aliases" # Note the leading '.'
fi
1.4 Usage
• To list the column numbers and names: tawk -l file_flows.txt
• To list the column numbers and names as 3 columns: tawk -l=3 file_flows.txt
• To save the original filename and filter used: tawk -c 'FILTER' file_flows.txt > file.txt
1 If the dnf command cannot be found, try yum instead.
2 Brew is a package manager for macOS that can be found here: https://brew.sh
• To extract all ICMP flows and the header: tawk 'hdr() || $l4Proto == 1' file_flows.txt > icmp.txt
• To extract all ICMP flows without the header: tawk -H 'icmp()' file_flows.txt > icmp.txt
• To extract the flow with index 1234: tawk '$flowInd == 1234' file_flows.txt
• To extract all DNS flows and the header: tawk 'hdr() || strtonum($dnsStat)' file_flows.txt
• To consult the documentation for the functions 'min' and 'max': tawk -d min,max
• To consult the documentation for all the available functions: tawk -d all
• To consult the documentation for the variable 'var' with value 0x8a: tawk -V var=0x8a
• To decode all variables from a tranalyzer2 log file (stdout): t2 -r file.pcap | tawk -L
• To create a PCAP with all packets from flow 42: tawk -x flow42.pcap '$flowInd == 42' file_flows.txt
Note that an option not recognized by tawk is internally passed to awk/gawk. One of the most useful is the -v option
to set the value of a variable:
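As a minimal sketch of the -v mechanism, the snippet below uses plain awk, on the assumption stated above that tawk forwards the option unchanged. The input data, the column layout and the names minbytes and big_flows are made up for illustration and are not part of tawk.

```shell
# Hypothetical two-column input (flow index, byte count), tab separated.
demo_file="$(mktemp)"
printf '1\t100\n2\t2500\n3\t900\n' > "$demo_file"

# -v assigns the awk variable minbytes before the program starts running.
big_flows() {
    awk -F'\t' -v minbytes="$1" '$2 >= minbytes { print $1 }' "$demo_file"
}

big_flows 900   # prints the indexes of flows with at least 900 bytes
```

Passing the threshold through -v keeps the awk program itself in single quotes, so no shell variable has to be spliced into the program text.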
1.6.2 lsx
Display columns with fixed width (default: 40), e.g., lsx file_flows.txt or lsx 45 file_flows.txt
1.6.3 sortu
Sort rows and count the number of times each row appears, then sort by the most frequent rows (alias for sort
| uniq -c | sort -rn). Useful, e.g., to analyze the most frequent user agents: tawk '{ print $httpUsrAg }'
FILE_flows.txt | sortu
sortup
Same as sortu, but display the relative percentage instead of the absolute count, e.g., to analyze the most frequent
user agents: tawk '{ print $httpUsrAg }' FILE_flows.txt | sortup
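The two helpers can be approximated as shell functions. sortu below follows the alias given above; sortup_sketch is a hypothetical re-implementation written for illustration only, and its output format (percentage, tab, row) is an assumption, not the format of the real sortup.

```shell
# sortu as described above: sort | uniq -c | sort -rn.
sortu() { sort | uniq -c | sort -rn; }

# Hypothetical sortup: same ranking, but print each row's share of all
# rows as a percentage instead of the absolute count.
sortup_sketch() {
    sort | uniq -c | sort -rn |
        awk '{
                 n = $1                      # count produced by uniq -c
                 sub(/^ *[0-9]+ /, "")       # strip the count, keep the row
                 cnt[NR] = n; row[NR] = $0; total += n
             }
             END {
                 for (i = 1; i <= NR; i++)
                     printf "%.2f%%\t%s\n", 100 * cnt[i] / total, row[i]
             }'
}

printf 'curl\nwget\ncurl\ncurl\n' | sortu
printf 'curl\nwget\ncurl\ncurl\n' | sortup_sketch
```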
1.6.4 tcol
Display columns with minimum width, e.g., tcol file_flows.txt.
1.7 Functions
Collection of functions for tawk:
• Parameters between brackets are optional.
Function Description
hdr() Use this function in your tests to keep the header (column names).
tuple6() Return the 6-tuple (source IP and port, dest. IP and port, proto, VLAN ID).
proto([p]) Return true if the protocol number appears in p (comma or semicolon separated).
Ranges may also be specified using a dash, e.g., proto("1-3").
If p is omitted, return the protocol number.
proto2str([p]) Return the string representation of the protocol number p.
If p is omitted, return the string representation of the protocol.
icmp([p]) Return true if the protocol is equal to 1 (ICMP).
igmp([p]) Return true if the protocol is equal to 2 (IGMP).
tcp([p]) Return true if the protocol is equal to 6 (TCP).
udp([p]) Return true if the protocol is equal to 17 (UDP).
rsvp([p]) Return true if the protocol is equal to 46 (RSVP).
gre([p]) Return true if the protocol is equal to 47 (GRE).
esp([p]) Return true if the protocol is equal to 50 (ESP).
ah([p]) Return true if the protocol is equal to 51 (AH).
icmp6([p]) Return true if the protocol is equal to 58 (ICMPv6).
sctp([p]) Return true if the protocol is equal to 132 (SCTP).
tcpflags([val]) If val is specified, return true if the specified flags are set.
If val is omitted, return a string representation of the TCP flags.
valcontains(val,sep,item) Return true if one item of val split by sep is equal to item.
cvalcontains(val,item) Alias for valcontains(val, "_", item).
rvalcontains(val,item) Alias for valcontains(val, ";", item).
strneq(val1,val2) Return true if val1 and val2 are not equal.
hasprefix(val,pre) Return true if val begins with the prefix pre.
hassuffix(val,suf) Return true if val ends with the suffix suf.
contains(val,txt) Return true if val contains the substring txt.
aggr(fields[,val[,num]]) Perform aggregation of fields and store the sum of val.
fields and val can be tab separated lists of fields, e.g., $srcIP4"\t"$dstIP4.
Results are sorted according to the first value of val.
If val is omitted, the empty string or equal to "flows" or "packets"
(case insensitive), count the number of records (flows or packets).
If num is omitted or 0, return the full list.
If num > 0 return the top num results.
If num < 0 return the bottom num results.
aggrrep(fields[,val[,num[,ign_e[,sep]]]])
Perform aggregation of the repetitive fields and store the sum of val.
val can be a tab separated lists of fields, e.g., $numBytesSnt"\t"$numPktsSnt.
Results are sorted according to the first value of val.
If val is omitted, the empty string or equal to "flows" or "packets"
(case insensitive), count the number of records (flows or packets).
If num is omitted or 0, return the full list.
If num > 0 return the top num results.
If num < 0 return the bottom num results.
If ign_e is omitted or 0, consider all values, otherwise ignore empty values.
sep can be used to change the separator character (default: ";").
t2rsort(col[,num[,type]])
Sort the file in reverse order according to col.
(Multiple column numbers can be specified by using ";" as separator,
e.g., 1 ";" 2)
If num is omitted or 0, return the full list.
If num > 0 return the top num results.
If num < 0 return the bottom num results.
type can be used to specify the type of data to sort:
"ip", "num" or "str" (default is based on the first matching record).
t2sort(col[,num[,type[,rev]]])
Sort the file according to col.
(Multiple column numbers can be specified by using ";" as separator,
e.g., 1 ";" 2)
If num is omitted or 0, return the full list.
If num > 0 return the top num results.
If num < 0 return the bottom num results.
type can be used to specify the type of data to sort:
"ip", "num" or "str" (default is based on the first matching record).
If rev > 0, sort in reverse order (alternatively, use the t2rsort() function).
wildcard(expr) Print all columns whose name matches the regular expression expr.
If expr is preceded by an exclamation mark, return all columns whose name
does NOT match expr.
json([s]) Convert the string s to JSON. The first record is used as column names.
If s is omitted, convert the entire row.
texscape(s) Escape the string s to make it LaTeX compatible.
bitshift(n,t[,d[,b]]) Shift a byte or a list of bytes n to the left or right by a given number of bits t.
To shift to the left, set d to 0 (default), to shift to the right set d ≠ 0.
Set b to 16 to force interpretation as hexadecimal,
e.g., interpret 45 as 69 (0x45) instead of 45.
nibble_swap(n[,b]) Swap the nibbles of a byte or of a list of bytes n.
Set b to 16 to force interpretation as hexadecimal,
e.g., interpret 45 as 69 (0x45) instead of 45.
tobits(u[,b]) Convert the unsigned integer u to its binary representation.
Set b to 16 to force interpretation as hexadecimal,
e.g., interpret 45 as 69 (0x45) instead of 45.
diff(file[,mode]) Compare two files (file and the input), and print the name and number of
the columns which differ. The mode parameter can be used to control the
format of the output.
ffsplit([s[,k[,h]]]) Split the input file into smaller more manageable files.
The files to create can be specified as argument to the function (one comma
separated string). If no argument is specified, create one file per column
whose name ends with Stat, e.g., dnsStat, and one for
pwxType (pw).
If k > 0, then only print relevant fields and those controlled by h, a
comma separated list of fields to keep in each file, e.g., "srcIP,dstIP".
flow([f]) Return all flows whose index appears in f (comma or semicolon separated).
Ranges may also be specified using a dash, e.g., flow("1-3").
If f is omitted, return the flow index.
packet([p]) Return all packets whose number appears in p (comma or semicolon separated).
Ranges may also be specified using a dash, e.g., packet("1-3").
If p is omitted, return the packet number.
follow_stream(f[,of[,d[,pf[,r[,nc]]]]])
Return the payload of the flow with index f.
of can be used to change the output format [default: 0]:
0: Payload only,
1: prefix each payload with packet/flow info,
2: JSON,
3: Reconstruct (pipe the output to xxd -p -r to reproduce the binary file).
d can be used to only extract a specific direction ("A" or "B")
[default: "" (A and B)].
pf can be used to change the payload format [default: 0]:
0: ASCII,
1: Hexdump,
2: Raw/Binary,
3: Base64.
r can be used to prevent the analysis of TCP sequence numbers
(no TCP reassembly and reordering).
nc can be used to print the data without colors.
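The aggregation behaviour described for aggr() in the table above (sum a value per key, sort by that sum, keep the top num results) can be approximated in plain awk. This is only a sketch of the idea, not tawk's implementation: the function name aggr_sketch, the three-column layout (source IP, destination IP, bytes) and the sample data are assumptions.

```shell
# aggr_sketch NUM: read tab-separated "srcIP dstIP bytes" rows (no header),
# sum the bytes per (srcIP, dstIP) pair and print the NUM largest sums.
aggr_sketch() {
    awk -F'\t' -v OFS='\t' '
        { sum[$1 OFS $2] += $3 }                 # key = srcIP <tab> dstIP
        END { for (k in sum) print k, sum[k] }   # unordered; sorted below
    ' | sort -t "$(printf '\t')" -k3,3nr | head -n "$1"
}

printf '10.0.0.1\t10.0.0.2\t100\n10.0.0.1\t10.0.0.3\t50\n10.0.0.1\t10.0.0.2\t25\n' |
    aggr_sketch 1
```

Sorting is delegated to sort(1) so that the awk part stays portable across awk implementations.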
1.8 Examples
Collection of examples using tawk functions:
Function Description
dnsZT() Return all flows where a DNS zone transfer was performed.
httpHostsURL([f]) Return all HTTP hosts and a list of the files hosted (sorted alphabetically).
If f > 0, print the number of times a URL was requested.
1.9 t2nfdump
Collection of functions for tawk allowing access to specific fields using a syntax similar to that of nfdump.
Function Description
ts() Start Time — first seen
te() End Time — last seen
td() Duration
pr() Protocol
sa() Source Address
da() Destination Address
sap() Source Address:Port
dap() Destination Address:Port
sp() Source Port
dp() Destination Port
pkt() Packets — default input
ipkt() Input Packets
opkt() Output Packets
byt() Bytes — default input
ibyt() Input Bytes
obyt() Output Bytes
flg() TCP Flags
mpls1() MPLS label 1
mpls2() MPLS label 2
mpls3() MPLS label 3
mpls4() MPLS label 4
mpls5() MPLS label 5
mpls6() MPLS label 6
mpls7() MPLS label 7
mpls8() MPLS label 8
mpls9() MPLS label 9
mpls10() MPLS label 10
mpls() MPLS labels 1–10
bps() Bits per second
pps() Packets per second
bpp() Bytes per packet
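For the three derived quantities at the end of the table, the usual definitions are bits per second = 8 × bytes / duration, packets per second = packets / duration and bytes per packet = bytes / packets. The sketch below computes them with plain awk; the function name rates_sketch, its argument order and the handling of zero durations are assumptions and may differ from what tawk does.

```shell
# rates_sketch DURATION_SECONDS BYTES PACKETS
rates_sketch() {
    awk -v d="$1" -v b="$2" -v p="$3" 'BEGIN {
        bps = (d > 0) ? 8 * b / d : 0   # bits per second
        pps = (d > 0) ? p / d : 0       # packets per second
        bpp = (p > 0) ? b / p : 0       # bytes per packet
        printf "bps=%d pps=%d bpp=%d\n", bps, pps, bpp
    }'
}

rates_sketch 2 1000 10
```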
1.10 t2custom
Copy your own functions into this folder. Refer to Section 1.11 for more details on how to write a tawk function. To have
your functions automatically loaded, include them in the file t2custom/t2custom.load.
• Use uppercase letters and two leading and two trailing underscores for global variables
#!/usr/bin/env awk
#
# Function description
#
# Parameters:
# - arg1: description
# - arg2: description (optional)
#
# Dependencies:
# - plugin1
# - plugin2 (optional)
#
# Examples:
# - tawk 'funcname()' file.txt
# - tawk '{ print funcname() }' file.txt
@include "hdr"
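@include "_validate_col"

function funcname(arg1, arg2,    _locvar1) {
    # _locvar1 is a local variable expected to hold a column number
    if (hdr()) {
        if (__PRIHDR__) print "header"
    } else {
        print "something", $_locvar1, $colname2
    }
}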
• The input field separator can be specified with the -F option, e.g., tawk -F ',' 'program' file.csv
• The row listing the column names can start with any character specified with the -s option, e.g., tawk -s '#'
'program' file.txt
• No column name may be identical to the name of a function or builtin
• Valid column names must start with a letter (a-z, A-Z) and can be followed by any number of alphanumeric
characters or underscores
• If the column names are different from those used by Tranalyzer, refer to Section 1.13.1.
BEGIN {
_my_srcIP = non_t2_name_for_srcIP
_my_dstIP = non_t2_name_for_dstIP
...
}
Once edited, run tawk with the -i $T2HOME/scripts/tawk/my_vars option and the external column names will be
automatically used by tawk functions, such as tuple2(). For more details, refer to the my_vars file.
– Always use awk -F'\t' (or awkf/tawk) when working with flow files.
• Load libraries, e.g., tawk functions, with -i: awk -i file.awk 'program' file.txt
• To use external variables, use the -v option, e.g., awk -v name="value" '{ print name }' file.txt.
        }
    }
    NR > 1 {
        print $srcIP4, $dstIP4
    }
' file.txt
– awkf '
    NR == 1 {
        for (i = 1; i <= NF; i++) {
            col[$i] = i
        }
    }
    NR > 1 {
        print $col["srcIP4"], $col["dstIP4"];
    }
' file.txt
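The name-to-index template above can be tried without a real flow file. The sketch below wraps it in a shell function and feeds it a small, made-up tab-separated input; the function name src_dst and the sample rows are hypothetical, and plain awk -F'\t' stands in for the awkf alias.

```shell
# src_dst: print the srcIP4 and dstIP4 columns, located by name.
src_dst() {
    awk -F'\t' -v OFS='\t' '
        NR == 1 {                       # first row: map column names to indexes
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        { print $col["srcIP4"], $col["dstIP4"] }
    '
}

printf 'flowInd\tsrcIP4\tdstIP4\n1\t10.0.0.1\t10.0.0.2\n2\t10.0.0.3\t10.0.0.4\n' | src_dst
```

Because columns are looked up by name, the program keeps working if the column order changes between files.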
1.16 Examples
1. Pivoting (variant 1):
(a) First extract an attribute of interest, e.g., an unresolved IP address in the Host: field of the HTTP header:
tawk 'aggr($httpHosts)' FILE_flows.txt | tawk '{ print unquote($1); exit }'
(b) Then, put the result of the last command in the badguy variable and use it to extract flows involving this IP:
tawk -v badguy="$(!!)" 'host(badguy)' FILE_flows.txt
2. Pivoting (variant 2):
(a) First extract an attribute of interest, e.g., an unresolved IP address in the Host: field of the HTTP header, and
store it into a badip variable:
badip="$(tawk 'aggr($httpHosts)' FILE_flows.txt | tawk '{ print unquote($1); exit }')"
(b) Then, use the badip variable to extract flows involving this IP:
tawk -v badguy="$badip" 'host(badguy)' FILE_flows.txt
3. Aggregate the number of bytes sent between source and destination addresses (independent of the protocol and
port) and output the top 10 results:
4. Sort the flow file according to the duration (longest flows first) and output the top 5 results:
5. Extract all TCP flows while keeping the header (column names):
6. Extract all flows whose destination port is between 6000 and 6008 (included):
8. Extract all flows whose source IP is in subnet 192.168.1.0/24 (using host or net):
10. Extract all flows whose source IP is in subnet 192.168.1.0/24 (using ipinnet):
11. Extract all flows whose source IP is in subnet 192.168.1.0/24 (using ipinnet and a hex mask):
12. Extract all flows whose source IP is in subnet 192.168.1.0/24 (using ipinnet and the CIDR notation):
13. Extract all flows whose source IP is in subnet 192.168.1.0/24 (using ipinnet and a CIDR mask):
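To illustrate what a subnet test such as the ones in the list above has to compute, here is a plain awk sketch of IPv4 membership in a CIDR block. It is not tawk's ipinnet(); the function name in_subnet and its calling convention are assumptions. Integer division is used instead of bitwise operators so that the code also runs on awk implementations other than gawk.

```shell
# in_subnet IP CIDR: exit status 0 if the IPv4 address IP lies inside CIDR.
in_subnet() {
    awk -v ip="$1" -v cidr="$2" '
        # Convert a dotted-quad address to a 32-bit integer.
        function ip2int(s,    o) {
            split(s, o, ".")
            return ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
        }
        BEGIN {
            split(cidr, part, "/")
            block = 2 ^ (32 - part[2])   # number of addresses in the subnet
            # Same subnet if both addresses fall into the same block.
            exit !(int(ip2int(ip) / block) == int(ip2int(part[1]) / block))
        }'
}

in_subnet 192.168.1.42 192.168.1.0/24 && echo "inside"
in_subnet 192.168.2.42 192.168.1.0/24 || echo "outside"
```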
For more examples, refer to the tawk -d option, e.g., tawk -d aggr, where every function is documented and comes with a
set of examples. The complete documentation can be consulted by running tawk -d all.
1.17 FAQ
1.17.1 Can I use tawk with non-Tranalyzer files?
Yes, refer to Section 1.13.
1.17.2 Can I use tawk functions with non-Tranalyzer column names?
Yes, edit the my_vars file and load it using the -i $T2HOME/scripts/tawk/my_vars option. Refer to Section 1.13.1 for
more details.
1.17.4 The row listing the column names starts with a '#' instead of a '%'... Can I still use tawk?
Yes, use the -s option to specify the first character, e.g., tawk -s '#' 'program'
1.17.6 Can I process a CSV (Comma Separated Value) file with tawk?
The simplest way to process CSV files is to use the --csv option. This sets the input and output separators to a comma
and considers the first row to be the column names.
tawk --csv 'program' file.csv
Alternatively, the input field separator can be changed with the -F option and the output separator with -O ',' or -v
OFS=','. Note that tawk expects the column names to be the last row starting with a '%'. This can be changed with the
-s and -N options (Section 1.5).
tawk -F ',' -v OFS=',' -s "" -N 1 'program' file.csv
1.17.7 Can I produce a CSV (Comma Separated Value) file from tawk?
The output field separator (OFS) can be changed with the -O 'fs' or -v OFS='fs' option. To produce a CSV file, run
tawk as follows: tawk -O ',' 'program' file.txt or tawk -v OFS=',' 'program' file.txt
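One detail of plain awk is worth keeping in mind here: OFS is only applied when awk rebuilds a record, i.e., when fields are printed individually or a field is assigned. The sketch below shows the usual idiom with plain awk; whether tawk's -O option takes care of this is not covered here, and the function name tsv2csv is made up. Fields that themselves contain commas or quotes are not escaped by this naive conversion.

```shell
# tsv2csv: convert tab-separated input to comma-separated output.
tsv2csv() {
    # Assigning a field to itself forces awk to rebuild $0 using OFS.
    awk -F'\t' -v OFS=',' '{ $1 = $1; print }'
}

printf 'srcIP\tdstIP\tbytes\n10.0.0.1\t10.0.0.2\t125\n' | tsv2csv
```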
1.17.8 Can I write my tawk programs in a file instead of the command line?
Yes, copy the program (without the single quotes) into a file, e.g., prog.txt, and run it as follows:
tawk -f prog.txt file.txt
1.17.9 Can I still use column names if I pipe data into tawk?
Yes, you can specify a file containing the column names with the -I option as follows:
cat file.txt | tawk -I colnames.txt 'program'
1.17.10 Can I use tawk if the row with the column names does not start with a special character?
Yes, you can specify the empty character with -s "". Refer to Section 1.5 for more details.
1.17.11 I get a list of syntax errors from gawk... What is the problem?
The column names are used to create variable names. If a name contains forbidden characters, then an error similar to the
following is reported.
gawk: /tmp/fileBndhdf:3: col-name = 3
gawk: /tmp/fileBndhdf:3: ^ syntax error
Although tawk will try to replace forbidden characters with underscore, the best practice is to use only alphanumeric
characters (A-Z, a-z, 0-9) and underscore as column names. Note that a column name MUST NOT start with a number.
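If renaming the columns by hand is impractical, the header row can be cleaned up before the file is given to tawk. The following is a hypothetical preprocessing step in plain awk, not tawk's own replacement logic: the function name sanitize_header, the assumption that the first row holds the column names, and the col_ prefix for names starting with a digit are all choices made for this sketch.

```shell
# sanitize_header: make the column names of a tab-separated file safe to
# use as awk variable names; all other rows are passed through unchanged.
sanitize_header() {
    awk -F'\t' -v OFS='\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                gsub(/[^A-Za-z0-9_]/, "_", $i)       # forbidden characters -> "_"
                if ($i ~ /^[0-9]/) $i = "col_" $i    # names must not start with a digit
            }
        }
        { print }'
}

printf 'col-name\t2ndCol\n1\t2\n' | sanitize_header
```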
1.17.12 I get a function name previously defined error from gawk... What is the problem?
The name of the columns is used to create variable names. If a column is named after a tawk function or a builtin, then an
error similar to the following is reported.
gawk: In file included from ah:21,
gawk: from /home/user/tranalyzer2/scripts/tawk/funcs/funcs.load:8,
gawk: proto:36: error: function name ‘proto’ previously defined
In this case, you have two options. Either rename the column(s) in your file, e.g., proto → l4Proto, or use the tawk -t
option. With the -t option, tawk tries to validate the column names by ensuring that no column name is equal to a
function name and that all column names used in the program exist. Note that this verification process can be slow.
1.17.13 Tawk cannot find the column names... What is the problem?
First, make sure the comment character (-s option) is correctly set for your file (the default is '%'). Second, make sure the
column names do not contain forbidden characters, i.e., use only alphanumeric characters and underscores, and do not start with a
number. If the row with column names is not the last one to start with the comment character, then specify the line number
with the -N option as follows: tawk -N 3 or tawk -s '#' -N 2. Refer to Section 1.5 for more details.
1.17.14 Wireshark refuses to open PCAP files generated with the tawk -k option...
If Wireshark displays the message Couldn’t run /usr/bin/dumpcap in child process: Permission Denied.,
then this means that your user does not belong to the wireshark group. To fix this issue, simply run the following
command sudo gpasswd -a YOUR_USERNAME wireshark (you will then need to log off and on again).
1.17.15 Tawk reports errors similar to free(): double free detected in tcache 2
Tawk uses the gawk -M option to handle IPv6 addresses. For some reason, this option is regularly affected by bugs. If you
do not need IPv6 support, you can simply comment out line 653 in tawk:
OPTS=(
#-M -v PREC=256 # <-- Add the leading sharp ('#') here
-v __PRIHDR__=$PRIHDR
-v __UNAME__="$(uname)"
)