Shell Scripting and Awk - Day01
CONFIDENTIAL
Agenda
Shell scripting
awk
Programming model
Records and fields
Pattern matching
Functions
System/built-in variables
Data manipulation and report generation
Training schedule
Day-1
Shell scripting
awk
Shell scripting
Ex.1
cat x1 vs cat < x1
cat x1 > x2 vs cat < x1 > x2
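A minimal sketch of Ex.1, assuming a scratch directory created with mktemp (the directory and the sample content are illustrative, not part of the exercise). Both forms print the same data: in cat x1, cat opens the file itself, whereas in cat < x1 the shell opens the file and cat only reads its standard input.

```shell
# Scratch directory so that no existing x1/x2 is overwritten (assumption).
workdir=$(mktemp -d)
printf 'line one\nline two\n' > "$workdir/x1"

# cat opens the file named in its argument.
from_argument=$(cat "$workdir/x1")

# The shell opens the file and connects it to cat's standard input.
from_stdin=$(cat < "$workdir/x1")

# The same pair, with standard output redirected to a file.
cat "$workdir/x1" > "$workdir/x2"
cat < "$workdir/x1" > "$workdir/x3"

[ "$from_argument" = "$from_stdin" ] && echo "same output"
cmp -s "$workdir/x2" "$workdir/x3" && echo "same file contents"
```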
Ex.2
1. Log in as a non-root user and go to the root directory: cd /
2. Find everything: find .
3. Again, find everything, but redirect output to a file:
find . > $HOME/x1
What is being shown on screen? Why?
4. Redirect output and errors to different files:
find . > $HOME/x1 2>$HOME/x2
5. Send output and errors to different files:
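Steps 3 and 4 of Ex.2 can be tried without scanning the whole filesystem. A minimal sketch, assuming a hypothetical function noisy that writes one line to standard output and one to standard error (find behaves similarly when it meets directories it may not read); the scratch directory is also an assumption:

```shell
# Hypothetical stand-in for "find .": one line on stdout, one on stderr.
noisy() {
    echo "regular output"
    echo "permission denied" >&2
}

workdir=$(mktemp -d)

# Step 3: only stdout goes to the file; stderr still reaches the screen.
noisy > "$workdir/x1"

# Step 4: stdout and stderr go to different files; nothing reaches the screen.
noisy > "$workdir/x1" 2> "$workdir/x2"

echo "x1: $(cat "$workdir/x1")"
echo "x2: $(cat "$workdir/x2")"
```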
FTP
ftp -n -i ftp.FreeBSD.org <<END_SCRIPT
user anonymous
pass MySecretPassword
ls
bye
END_SCRIPT
Receive input from a file:
ftp -n -i my.ftp.server < FileContainingCommands
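The <<END_SCRIPT construct is a here-document: the lines up to the terminator become the command's standard input. A minimal sketch, using cat in place of ftp so that no server is needed; the variable name is an illustrative assumption:

```shell
name="anonymous"

# Unquoted terminator: $name is expanded before the command reads the text.
expanded=$(cat <<END_SCRIPT
user $name
bye
END_SCRIPT
)

# Quoted terminator: the text is passed through literally.
literal=$(cat <<'END_SCRIPT'
user $name
bye
END_SCRIPT
)

printf '%s\n' "$expanded"
printf '%s\n' "$literal"
```

The unquoted form is what the FTP and SQL examples rely on when they embed shell variables in the commands they send.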
cat 07ExampleFTP.sh
#!/bin/sh
# Usage:
# 07ExampleFTP.sh machine file
# set -x
SOURCE=$1
FILE=$2
GETHOST="uname -n"
BFILE=`basename $FILE`
ftp -n $SOURCE <<EndFTP
ascii
user anonymous $USER@`$GETHOST`
get $FILE /tmp/$BFILE
EndFTP
Ex: Redirect output, and send errors to the same location as the output:
isql $DATBASE_NAME <<END_EXEC >/home/ymahajan/Log 2>&1
select
request_id ,
trade_date
from
trade_table
where
request_id = "$REQUEST_ID";
END_EXEC
Multiple SQLs:
sqlite3 TrainingDB <<END_SQL
.mode column
.header on
select DATA AS DATA from t1;
Select * from orders;
END_SQL
- A single column:
sqlite3 TrainingDB <<END_SQL
.mode column
.header on
select DATA AS DATA
from t1;
END_SQL
tee:
sqlite3 TrainingDB <<END_SQL | sed -e "s/-//g" | sed -e "/^$/d" -e "/MyOwnData/d" > TMPLOG 2>&1
.mode column
.header on
select DATA AS MyOwnData
from t1;
END_SQL
--- vs ---
sqlite3 TrainingDB <<END_SQL | sed -e "s/-//g" | sed -e "/^$/d" -e "/MyOwnData/d" | tee TMPLOG 2>&1
.mode column
.header on
select DATA AS MyOwnData
from t1;
END_SQL
Error Handling: Use redirection operators to direct output to a file for further analysis:
SomeVariable=`sqlite3 TrainingDB <<END_SQL 2>&1 | sed -e "s/-//g" | sed -e "/^$/d" -e "/MyOwnData/d" | tee TMPLOG
.mode column
.header on
select ColNotPresent AS MyOwnData
from t1;
END_SQL
`
echo $SomeVariable;
grep -ie error TMPLOG
if [ $? = 0 ]
then
    echo "$0: INFORMIX SQL ERROR: Script $0 failed in $Query" | tee -a $OPC >> $LOG
    exit $retcd
else
    echo "$0: QUERY $Query SUCCESSFUL" >> $LOG
fi
Why did we NOT test $? against 0/1 directly after the SQL query, instead of running grep on TMPLOG and then testing it?
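One likely reason can be sketched as follows: after a pipeline, $? holds the exit status of the last command (tee here), not of the query. The function failing_query and its error text are hypothetical stand-ins for the sqlite3 call, and the log file is a temporary file rather than TMPLOG.

```shell
# Hypothetical stand-in for a failing query: prints an error, returns non-zero.
failing_query() {
    echo "Error: no such column: ColNotPresent" >&2
    return 1
}

log=$(mktemp)

# $? after a pipeline is the status of the LAST command (tee), so the
# failure of failing_query is hidden.
failing_query 2>&1 | tee "$log" > /dev/null
pipeline_status=$?

# Searching the captured output for "error" still detects the failure.
if grep -qi error "$log"; then
    outcome="failed"
else
    outcome="successful"
fi

echo "pipeline status: $pipeline_status, query $outcome"
```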
Operator       True if
-S             file is a socket
-t             file descriptor is a terminal device, e.g. [ -t 0 ] tests whether stdin, and [ -t 1 ] whether stdout, of a given script is a terminal
-r / -w / -x   file has read / write / execute permission
-g             set-group-id (sgid) flag is set on the file or directory; if set on a directory, any file created in it gets the directory's group ID
-u             set-user-id (suid) flag is set on the file
-k             sticky bit is set (the t at the end of the permissions in ls -l output); the save-text-mode flag is a special type of file permission: if set on a file, the file is kept in cache memory; if set on a directory, write permission is restricted
-O             you are the owner of the file
-G             group ID of the file is the same as yours
-N             file has been modified since it was last read
f1 -nt f2      file f1 is newer than f2
f1 -ot f2      file f1 is older than f2
f1 -ef f2      files f1 and f2 are hard links to the same file
!              "not": reverses the sense of the tests above (returns true if the condition is absent)
File existence:
if [ -f /home/user11/Yogesh ]; then echo "File exists"; else echo "File NOT present"; fi;
Directory existence:
if [ -d /home/user11/Yogesh ]; then echo 'This is not a file' ;fi;
Executable file:
if [ -x /home/user11/TrainingScripts/01Hello.sh ]; then echo 'Wow, I can run this!' ;fi;
Writeable file:
if [ -w /home/user11/TrainingScripts/01Hello.sh ]; then echo 'Warning! This file could be over-written!!' ;fi;
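The tests above can be tried on throw-away files. A minimal sketch, assuming a scratch directory and a hypothetical helper named describe (neither is part of the training scripts):

```shell
workdir=$(mktemp -d)
touch "$workdir/empty.data"
echo "some data" > "$workdir/full.data"

# Hypothetical helper: prints a short description of the path it is given.
describe() {
    if [ -d "$1" ]; then
        echo "directory"
    elif [ -f "$1" ] && [ -s "$1" ]; then
        echo "non-empty file"
    elif [ -f "$1" ]; then
        echo "empty file"
    else
        echo "missing"
    fi
}

describe "$workdir"
describe "$workdir/full.data"
describe "$workdir/empty.data"
describe "$workdir/no-such-file"
```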
File-test operators
Logical Operators:
! : NOT
-a : AND
-o : OR
Examples:
File empty or not?
if [ -f /home/user11/Yogesh.data -a -s /home/user11/Yogesh.data ];
then echo 'Some data in file' ;
fi;
-a
-a
];
file being written over!' ;
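A minimal sketch of the three logical operators, assuming a scratch file in a temporary directory (the path is illustrative). The last test shows the same AND written with the shell's own && operator, which POSIX recommends over -a and -o:

```shell
workdir=$(mktemp -d)
data="$workdir/Yogesh.data"
echo "some data" > "$data"

# -a : both tests must be true
if [ -f "$data" -a -s "$data" ]; then and_result="non-empty file"; else and_result="no data"; fi

# -o : at least one test must be true
if [ -d "$data" -o -f "$data" ]; then or_result="exists"; else or_result="missing"; fi

# ! : reverses the test
if [ ! -d "$data" ]; then not_result="not a directory"; else not_result="directory"; fi

# Same AND, written as two tests joined by the shell's && operator
if [ -f "$data" ] && [ -s "$data" ]; then shell_and="non-empty file"; else shell_and="no data"; fi

echo "$and_result / $or_result / $not_result / $shell_and"
```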
Functions
To be used within the script
Functions or procedures
Functions: scoping
Declare as:
function_name () {
list of commands
}
Invoke as:
function_name
function_name 1 b 3 other-arguments
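A minimal sketch of declaring and invoking functions, and of scoping. The names greet, bump and counter are illustrative assumptions. Inside a function, $1, $2, ... and $# refer to the function's own arguments, while ordinary variables are shared with the rest of the script.

```shell
# Arguments are read through the positional parameters.
greet() {
    echo "Hello, ${1:-nobody} (received $# arguments)"
}

# Variables are global by default: the function changes the script's counter.
counter=0
bump() {
    counter=$((counter + 1))
}

greet
greet 1 b 3 other-arguments
bump
bump
echo "counter=$counter"
```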
awk
cat file
Medicine,200
Grocery,500
Rent,900
Grocery,800
Medicine,600
Restaurent,300
<empty line>
cat awkscript02.awk
BEGIN{
IGNORECASE=1
print("--START--")
- You can change the field separator with the -F option on the command line:
echo a,b,c,d | awk -F, 'BEGIN { one = 1; two = 2 } { print $(one + two) }'
- -f vs -F: -f names the program file, -F sets the field separator:
awk -F, -f awkScriptFile.awk inputDataFile.dat
A better option is to specify it in BEGIN:
BEGIN { FS = "," }
- FS = "\t"
A single tab as the field separator
- FS = "\t+"
One or more tabs
- FS = "[':\t]"
Any one of these three characters: ', : or tab
- awk -F 'word[0-9][0-9][0-9]' file
Fields separated by the string word followed by 3 digits
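The difference between a single-character separator and a regular-expression separator shows up when fields are empty. A minimal sketch with made-up input lines (the sample data is an assumption):

```shell
# -F on the command line: third comma-separated field.
third=$(echo a,b,c,d | awk -F, 'BEGIN { one = 1; two = 2 } { print $(one + two) }')

# FS = "\t": every tab separates, so two tabs in a row enclose an empty field.
nf_single=$(printf 'a\tb\t\tc\n' | awk 'BEGIN { FS = "\t" } { print NF }')

# FS = "\t+": a run of tabs counts as one separator.
nf_multi=$(printf 'a\tb\t\tc\n' | awk 'BEGIN { FS = "\t+" } { print NF }')

echo "third=$third nf_single=$nf_single nf_multi=$nf_multi"
```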
Range pattern, pat1, pat2: A pair of patterns separated by a comma, specifying a range of records. The range includes both the initial record that matches pat1 and the final record that matches pat2.
awk '$1 == "on", $1 == "off"'
Prints every record between on and off, inclusive.
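A minimal sketch of the range pattern on made-up input (the six input lines are an assumption):

```shell
# Records from the one whose first field is "on" up to and including
# the one whose first field is "off".
selected=$(printf 'a\non\nb\nc\noff\nd\n' | awk '$1 == "on", $1 == "off"')

printf '%s\n' "$selected"
```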
awk functions
Built-in functions:
C-like operations and operators.
Arithmetic functions
int(), sqrt(), sin(), cos(), exp(), log(), atan2(), rand(), srand()
http://www.staff.science.uu.nl/~oostr102/docs/nawk/nawk_91.html#SEC94
String functions
index(), length(), match(), split(), sprintf(), sub(), gsub(), substr(), tolower(), toupper()
http://www.staff.science.uu.nl/~oostr102/docs/nawk/nawk_92.html
awk '{ x = sqrt($2 + $3); printf("%s %.2f %d %d %d\n", substr($1, 3, 3), x, $2, $3, $2 + $3) }' file2
datafile
1.2 3.4 5.6 7.8
9.10 11.12 -13.14 15.16
17.18 19.20 21.22 23.24
User-defined functions:
Define:
function myprint(num)
{
    printf "%6.3g\n", num
}
File rev.awk
function rev(str)
{
    if (str == "")
        return ""
    return (rev(substr(str, 2)) substr(str, 1, 1))
}
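A user-defined function is called like a built-in one. A minimal sketch that wraps rev in a hypothetical shell function reverse_string (the wrapper name is an assumption; the awk function is the one defined above):

```shell
# Reverse the string given as the first argument, using the recursive rev().
reverse_string() {
    printf '%s\n' "$1" | awk '
    function rev(str)
    {
        if (str == "")
            return ""
        return (rev(substr(str, 2)) substr(str, 1, 1))
    }
    { print rev($0) }'
}

reverse_string "awk"
```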
awk Operators
Arithmetic operators: ^, **, -, +, *, /, %
Comparison operators: <, <=, >, >=, ==, !=, ~, !~, in
String Concatenation:
No explicit operator, simply write strings next to each other, e.g. print "Field number one: " $1
Assignment: =
Increment/Decrement: ++, -- : both post- and pre-fix
Regexp Operators:
\            Suppresses the special meaning of a character, e.g. \$ matches a literal $ and not the end of a line
^            Beginning of a string
$            End of a string
. (period)   Any single character
()           Groups regexps together, e.g. @(samp|code)\{[^}]+\} matches both @code{foo} and @samp{bar}
*            Repeats as many times as possible, e.g. ph* looks for one p followed by 0 or more h: p, ph, phhh
+            Repeats at least once, e.g. ph+ looks for one p followed by 1 or more h: ph, phh etc., but not p
?            Matches once or not at all, e.g. fe?d matches fd or fed, but not feed
{n} / {n,} / {n,m}   Matches exactly n / n or more / n to m times, e.g. wh{3}y matches whhhy; wh{1,2}y matches why and whhy; wh{1,}y matches why, whhy, whhhy etc.
[]           Bracket expression, matches any one of the enclosed characters, e.g. [Yog] matches any one of Y, o or g
[^]          Complemented bracket expression, e.g. [^Yog] matches any single character other than Y, o or g
|            Alternation operator, e.g. ^P|[aeiouy] matches a string that either starts with a P or contains any of a, e, i, o, u, y
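The repetition operators can be checked against the example words above. A minimal sketch; the patterns are anchored with ^ and $ so that the whole line must match, and the word lists are made-up input:

```shell
# ph* : one p followed by zero or more h
star=$(printf 'p\nph\nphhh\nq\n' | awk '/^ph*$/')

# ph+ : one p followed by one or more h
plus=$(printf 'p\nph\nphhh\nq\n' | awk '/^ph+$/')

# fe?d : the e is optional
optional=$(printf 'fd\nfed\nfeed\n' | awk '/^fe?d$/')

printf 'star: %s\n' "$(echo $star)"
printf 'plus: %s\n' "$(echo $plus)"
printf 'optional: %s\n' "$(echo $optional)"
```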
cat file2
#track
chr11  61731756  61735132  FTH1
chr12  6643584   6647537   GAPDH
chr11  18415935  18429765  LDHA
chr12  21788274  21810728  LDHB
chr22  24236564  24237409  MIF     +
chr4   6641817   6644470   MRFAP1
chr15  72491369  72523727  PKM
chr10  73576054  73611082  PSAP
chr2   85132762  85133799  TMSB10
chr13  45911303  45915297  TPT1
Ref. Stack Exchange:
http://unix.stackexchange.com/questions/127471/using-awk-for-data-manipulation
The -v option lets you assign a value to a variable before the awk program begins running (that is, before the BEGIN action). For example:
awk -v v1=10 -f prog datafile
Here v1 already holds 10 when the BEGIN action of prog runs.
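A minimal sketch contrasting -v with an assignment written among the file arguments; the one-line programs are illustrative assumptions:

```shell
# -v: the variable is already set when BEGIN runs.
with_v=$(awk -v v1=10 'BEGIN { print v1 * 2 }')

# Assignment given as an operand: it takes effect only when awk reaches it
# while working through its file arguments, so BEGIN still sees an empty v1.
as_operand=$(awk 'BEGIN { print "[" v1 "]" }' v1=10)

echo "with -v: $with_v"
echo "as operand: $as_operand"
```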
Report Generation - I
- Get employee names and salary:
awk '{print $2, $5}' employee.txt
employee.txt
100 Thomas Manager   Sales      $5,000
200 Jason  Developer Technology $5,500
300 Sanjay Sysadmin  Technology $7,000
400 Nisha  Manager   Marketing  $9,500
500 Randy  DBA       Technology $6,000
..
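A minimal sketch of the command on the five rows shown, fed through a pipe instead of employee.txt; the function employee_data is a hypothetical stand-in for the file:

```shell
# Hypothetical stand-in for employee.txt: the five rows listed above.
employee_data() {
    cat <<'END_DATA'
100 Thomas Manager   Sales      $5,000
200 Jason  Developer Technology $5,500
300 Sanjay Sysadmin  Technology $7,000
400 Nisha  Manager   Marketing  $9,500
500 Randy  DBA       Technology $6,000
END_DATA
}

# Field 2 is the name, field 5 the salary; runs of blanks count as one separator.
names_and_salaries=$(employee_data | awk '{print $2, $5}')

printf '%s\n' "$names_and_salaries"
```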
Report Generation - II
- An HTML report:
awk -f report02.awk -v v1=Technology employee.txt > abc.html
cat report02.awk:
BEGIN {
    title = "Salary Report by awk"
    print "<html>\n<title>" title "</title><body bgcolor=\"#aabbcc\">"
    print "\n<table border=1><tr><th colspan=3 align=center>Salary Report"
    print "for " v1 " department</th></tr>"
    print "<tr><td>#</td><td>EName</td><td>Salary</td></tr>"
    totalSal = 0
    count = 0
}
{
    # if ($4 == "Technology")
    if ($4 == v1) {
        count++
        print "<tr><td>" count "</td><td>" $2 "</td><td>" $5 "</td></tr>"
        # Strip "$" and "," so that "$5,500" is added as 5500, not as 0
        sal = $5
        gsub(/[$,]/, "", sal)
        totalSal += sal
    }
}
END {
    # Close the tags opened in BEGIN
    print "</table></body></html>"
}
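The department filter and the running total can be tried without the HTML. A minimal sketch, assuming the five employee rows shown earlier; the functions employee_data and dept_total are hypothetical, and the sketch prints plain text rather than the report itself. Salaries such as $5,500 are not numbers to awk, so the $ and the comma are removed before adding.

```shell
# Hypothetical stand-in for employee.txt.
employee_data() {
    cat <<'END_DATA'
100 Thomas Manager   Sales      $5,000
200 Jason  Developer Technology $5,500
300 Sanjay Sysadmin  Technology $7,000
400 Nisha  Manager   Marketing  $9,500
500 Randy  DBA       Technology $6,000
END_DATA
}

# Print "<employee count> <salary total>" for the department named in $1.
dept_total() {
    awk -v v1="$1" '
        $4 == v1 {
            sal = $5
            gsub(/[$,]/, "", sal)   # "$5,500" becomes "5500"
            total += sal
            count++
        }
        END { printf "%d %d\n", count, total }
    '
}

employee_data | dept_total Technology
```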
Assignment
Create an HTML report from the states data in states.dat (file shared with all)
Print the name of the state / UT, its capital, and the year in which the capital was established.
Skip the header and footer (first and last row) of the file
Background of UTs should be red
Names of UTs contain the words union territory; since you're using highlighting, don't print that part
States: set the background colour to blue
At the bottom, print counts of:
States
UTs
Email the report as an attachment to yourself
Questions?
Thank you