Lecture4-Data-Files-Text-Processing-Formatting
In this other example, the field separator is a comma, and there are 5 fields
Jane,Bolden,1932,author,economics
John,Talbot,1945,poet,english
Video Lecture4-intro-and-wc command-4min
• Download the zip file data-temp.zip from Canvas -> Files -> zip files
• Unzip the file data-temp.zip and a directory called data-temp will be created
• List the contents of the data-temp directory. You will find the data files temp.dat, temp-clean.dat,
and temp-clean1.dat
wc (options) filename
Try:
wc temp.dat
Use the man page to find out what information the wc command provides
man wc
Explore the file: display the first 10 lines, and the last 10 lines of the file
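One way to explore a file is with wc, head and tail. The sketch below is only an illustration: it builds a small made-up 12-line file (a stand-in, not the course file temp.dat) so that it can run anywhere. The variable names are assumptions.

```shell
# Made-up sample file standing in for temp.dat: 12 numbered lines
sample=$(mktemp)
i=1
while [ "$i" -le 12 ]; do
    echo "line $i"
    i=$((i + 1))
done > "$sample"

# wc -l counts lines; tr strips the padding some systems add
lines=$(wc -l < "$sample" | tr -d ' ')
first=$(head -n 10 "$sample")   # first 10 lines
last=$(tail -n 10 "$sample")    # last 10 lines

echo "number of lines: $lines"
echo "$first"
echo "$last"
rm -f "$sample"
```

With the course data, the same commands would be head -n 10 temp.dat and tail -n 10 temp.dat.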
Slides 2-4 video Lecture4-grep-5min
grep command
grep searches and prints lines that match one or more patterns
….pattern1…
….pattern2…
Ignore case
grep -i pattern filename
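A minimal sketch of grep with and without -i. It reuses the two comma-separated records shown earlier as sample data, written to a temporary file; the variable names are assumptions.

```shell
# Sample data: the two comma-separated records from the earlier example
sample=$(mktemp)
printf '%s\n' 'Jane,Bolden,1932,author,economics' \
              'John,Talbot,1945,poet,english' > "$sample"

match=$(grep poet "$sample")      # case-sensitive search
nocase=$(grep -i JANE "$sample")  # -i ignores case, so JANE matches Jane
count=$(grep -c -i j "$sample")   # -c prints the number of matching lines

echo "$match"
echo "$nocase"
echo "$count"
rm -f "$sample"
```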
Slide 4 video Lecture4-sort-4min
sort command to sort lines
sort (options) filename
Useful options of the sort command:
-u to sort and remove duplicates
-knumber to sort lines based on a certain field number
-n to sort numerically (if you do not specify this, it will sort alphabetically)
-tsep to specify the field separator (only needed if it is not whitespace)
If the fields are separated by one or more whitespace characters (spaces, tabs), you do not need to
use option -t.
Try These
sort temp-clean.dat #sort alphabetically based on 1st field
sort -n temp-clean.dat #sort numerically based on 1st field
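The options can be combined. The sketch below sorts comma-separated records by their third field, numerically, and then removes duplicates. The sample data reuses the records from the earlier example (one is repeated on purpose). Writing the key as -k3,3 limits it to field 3 only; the variable names are assumptions.

```shell
# Sample data: comma-separated records, one of them duplicated
sample=$(mktemp)
printf '%s\n' 'John,Talbot,1945,poet,english' \
              'Jane,Bolden,1932,author,economics' \
              'John,Talbot,1945,poet,english' > "$sample"

# -t, sets the separator, -k3,3 picks field 3, n sorts it numerically
by_year=$(sort -t, -k3,3n "$sample")

# -u sorts and keeps one copy of each duplicated line
unique=$(sort -u "$sample")

echo "$by_year"
echo "$unique"
rm -f "$sample"
```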
cut -c 1 temp-clean.dat #print the 1st character of each line
• option -d"sep" -f n to cut specific field(s) by specifying the field separator sep and field
number n. The field separator is also called a delimiter.
Try these
#field separator of temp-clean.dat is a blank character
#field separator of temp-clean1.dat is a :
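A sketch of cut with -d and -f. The course files are not reproduced here, so it uses one comma-separated record from the earlier example; for temp-clean1.dat the delimiter would be ":" instead of ",". The variable names are assumptions.

```shell
line='Jane,Bolden,1932,author,economics'

second=$(echo "$line" | cut -d"," -f 2)    # one field
range=$(echo "$line" | cut -d"," -f 1-3)   # a range of fields
first_char=$(echo "$line" | cut -c 1)      # one character, no delimiter

echo "$second"
echo "$range"
echo "$first_char"
```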
Try this:
Make a file called file1 and write Hello in it
Make another file called file2 and write Unix in it
sed (stream editor) is a Unix utility that parses and transforms text
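A common use of sed is substitution with s/old/new/. The sketch below is an illustration only; the input strings are made up and are not taken from the lecture files.

```shell
# s/old/new/ replaces the first match on each line
once=$(echo "Hello Unix" | sed 's/Unix/Linux/')

# adding g replaces every match on the line
all=$(echo "a-b-c" | sed 's/-/ /g')

echo "$once"
echo "$all"
```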
Example
var1=mass
var2=18.547
echo $var1 $var2 Kg
%type
%s string
%i or %d integer
%f float or real number
%e scientific notation or exponential
\n new line
Try these
printf "%f\n" 1.6547
printf "%e\n" 1250000
printf "%d\n" 2
printf "%s\n" Two
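To compare the types side by side, the results can be stored in variables and printed together. This is a sketch assuming a locale where the decimal separator is a dot; the variable names are assumptions.

```shell
as_float=$(printf "%f" 1.6547)   # %f defaults to 6 decimal places
as_exp=$(printf "%e" 1250000)    # scientific notation
as_int=$(printf "%d" 2)
as_str=$(printf "%s" Two)

printf "%s\n" "$as_float" "$as_exp" "$as_int" "$as_str"
```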
Slide 10 video Lecture4-printf-precision-3min
printf to format text: specify precision
printf "%[width].[precision]type" argument
Try these
printf "%.2f\n" 1.6547 #format float with 2 decimal places
printf "%.3f\n" 1.6547
printf "%.0f\n" 1.6547
#sometimes 1 is included for width: printf "%1.2f\n" 1.6547
%width.precisiontype format
The width is an integer that precedes the dot and sets the minimum number of characters printed.
If the width is larger than the number of characters in the
output, whitespace is added on the left.
Try these
printf "%.2f\n" 1.6547
printf "%5.2f\n" 1.6547
printf "%6.2f\n" 1.6547
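The padding is easier to see when the output is wrapped in brackets. A small sketch; the brackets and variable names are additions for illustration.

```shell
w0=$(printf "[%.2f]" 1.6547)    # no width: 4 characters
w5=$(printf "[%5.2f]" 1.6547)   # width 5: 1 space added on the left
w6=$(printf "[%6.2f]" 1.6547)   # width 6: 2 spaces added on the left

printf "%s\n" "$w0" "$w5" "$w6"
```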
printf – define type and precision of multiple arguments and include text
printf "format1 format2" argument1 argument2
The text you include within the " " is printed to the screen as written, including the space
characters
printf "Mass %.2f .. in %s\n" 65.4747 Kg
Mass 65.47 .. in Kg
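The arguments can also be shell variables. The sketch below reuses var1 and var2 from the earlier echo example; the layout of the format string and the name line are assumptions.

```shell
var1=mass
var2=18.547

# %s takes var1, %.2f takes var2, the second %s takes the unit
line=$(printf "%s = %.2f %s" "$var1" "$var2" Kg)
echo "$line"
```

Quoting "$var1" and "$var2" keeps each value as a single argument even if it contains spaces.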