
Data Manipulation with UNIX

Introduction

Who is this course for?


This course is for anyone who sometimes needs to manipulate UNIX data files – that is to say files in
plain text format – without needing all the power of a spreadsheet or database. It is for someone who
does not necessarily know much Unix but is comfortable with typing less at the command line to view a
file and moving around with cd, maybe piping grep to less to see just some lines of a file. For some
elements of the course it will be an advantage to know a little programming, but not much: everything we do will be explained.

Why do this course?


Besides learning that you can do quite a lot of useful data cleansing, manipulation, and simple reporting
from the command line, this course will improve your general confidence in using Unix.
I refer below to two text files which can be downloaded from the ISD training resources website.

Displaying file contents


In what follows, I assume that we are dealing with record oriented data where each line of the file
analysed is a case or record – a collection of data items that belong together.
First let's check the contents of your files with less or head. This will give you a clue to their format. We
will start with a comma-delimited file called xresults.csv. Each line has the following structure
Surname, Maths_score, English_score, History_score
There are four fields separated by commas. This is a very common file type and easy to work with, but
it has a disadvantage: you may have text fields that contain commas as data. In those cases it is
easiest to use another character as a field delimiter when you create the data (you may be able to do
this in Excel, for example; if you cannot, you may have to do some clever data munging).
You can use either
less xresults.csv
or
head xresults.csv
When we view the file with less we see
ADAMS,55,63,65
ALI,52,46,35
BAGAL,51,58,55
BENJAMIN,59,70,68
BLAKEMORE,56,38,40
BUCHAN,45,62,59
CHULANI,63,69,69
CLARK,52,64,65
DALE,50,55,52
DE SOUZA,44,60,62
DENCIK,57,67,65
DOBLE,64,56,65
DRURY,50,50,49
EL-DANA,62,59,60
FREEMAN,52,58,62
FROGGATT,39,57,59
GEORGARA,56,52,50
JAN,62,63,59
JENNER,56,67,65
JUNCO,48,57,55
LEFKARITIS,53,56,59
LUKKA,58,59,55
MILNER,53,62,58



MIYAJI,58,66,60
NICHOLSON,55,55,58
PATEL,60,59,54
PEIRIS,60,52,55
RAMANI,42,43,40
ROSEN,54,55,54
ROWLANDS,47,50,48
(less displays the file a screen at a time.)

Counting Data Items


First, let’s make some simple counts on this file. We can use wc to count characters, words (anything
surrounded by whitespace) and lines
wc xresults.csv
wc -w xresults.csv
wc -c xresults.csv
wc -l xresults.csv
When we are dealing with record-oriented data like ours, wc -l will display the number of records.

Selecting Data Items

Selecting Rows
Next, we will select some data using grep. Try the following command
grep '^R' xresults.csv
This will display only three rows of the file. The expression in quotes is the search string. If we wish we
can direct the output of this process to a new file, like this
grep '^R' xresults.csv > outputfile.txt
This command line uses the redirect output symbol. In Unix the default output destination is the screen,
and it’s known as stdout (when it needs naming). The default input source is the keyboard, known as
stdin. So when data is coming from or going to anywhere else, we use redirection. We use redirection
with > to pass the results of a process to a new output or with < to get data from a new input. If we want
to append the data to the end of an existing file (as new rows) we use >> instead of >.
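For example, assuming outputfile.txt already exists from the command above, we could append the surnames beginning with B to it as extra rows:
grep '^B' xresults.csv >> outputfile.txt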
We can use a similar command line to count the rows selected, but this time let’s change the grep
command slightly.
grep '^[RB]' xresults.csv | wc -l
This command line uses the pipe symbol. We use piping with | to pass the results of one process to
another process. If we only wanted a count of the lines that match then, instead of piping the result to
wc, we could use the -c parameter on grep, like this
grep -c '^[RB]' xresults.csv
Also notice that in the cases above we have used the anchor ^ to limit the match by grep to the start of
a line. The anchor $ limits the search to matches at the end of the line. We have used the character
class, indicated by [ and ], containing R and B, and grep will succeed if it finds any of the characters in
the class. We enclose this regular expression in single quotes.
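For example, an illustrative search using the $ anchor instead finds the pupils whose History score (the last field on the line) ends in a 5:
grep '5$' xresults.csv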
We use grep in this way to select row data.

More About Searching


The standard form of a basic grep command is
grep [options] 'search expression' filename
Typically search expression is a regular expression. The simplest type of expression is a string literal –
a succession of characters each treated literally, that is to say standing for themselves and nothing else.
If the string literal contains a space, we will need to surround it by single quote marks. In our data we
might look for the following
grep 'DE SOUZA' xresults.csv

The next thing to learn is how to match a class of characters rather than a specific character. Consider



grep '[A-Z]' xresults.csv
This matches any uppercase alphabetic character. Similarly
grep '[0-9]' xresults.csv
matches any numeric character. In both these cases any single character of the right class causes a
successful match. You can specify the class by listing as well. Consider
grep '[perl]' xresults.csv
which matches any character from the list p, e, r, l (the order in which they are listed is immaterial).
You can combine a character class and a literal in a search string. Consider
grep 'Grade [BC]' someresults.csv
this search would find lines containing Grade B and lines containing Grade C.
You can also search using special characters as wildcards. The character . for example, used in a
search stands for any single character except the newline character. So the search
grep . xresults.csv
succeeds for every non-empty line. (If . matched the newline character it would succeed for empty lines
as well). The character * stands for zero or any number of repetitions of a character. So
grep 'a*' xresults.csv

matches

a
aa
aaa
and so on. Notice the blank line there? Probably not, but it’s there. This regular expression matches
zero or more instances of the preceding character.
Suppose that I wish to find a string that contains any sequence of characters followed by, for example, m. The grep command would be
grep -E '.+m' xresults.csv
(The -E option switches grep to extended regular expressions, in which + means one or more repetitions of the preceding item.)
This is a greedy search: it is not satisfied with the very first successful match, it continues past the first
match it finds to match the longest string it can. For now we will just accept this greedy searching, but if
you investigate regular expressions further you will discover that some versions have non-greedy
matching strategies available.

Selecting Columns
We can also select columns. Because this is a delimited file we can split it into columns at each
delimiter - in this case a comma. This is equivalent to selecting fields from records.
Suppose that we want to extract column two from our data. We do this with the cut command. Here’s
an example
cut -d, -f2 xresults.csv | head
The first ten lines of the resulting display are
55
52
51
59
56
45
63
52
50
44
We can display several columns like this
cut -d, -f1-3 xresults.csv
which displays a contiguous range of columns, or
cut -d, -f1,3 xresults.csv



which displays a list of separate columns. The -d option on cut specifies the delimiter (if you do not
specify one, the default is the tab character) and the -f option specifies the column or field
number. We use cut in this way to select column data.
The general form of the cut command is
cut -ddelimiter -ffieldnumbers datafile
So in the examples, we specified comma as the delimiter and used fields 1 and 3 and the range of fields
1 to 3.

Selecting Columns and Rows


Suppose that we want to select just some columns for only some rows? We do this by first selecting
rows with grep and passing this to cut to select columns. You can try
grep '^[AR]' xresults.csv | cut -d, -f1,3 | less
(I added less because it is generally a good idea when you are about to send a lot of data to the screen; it
is not doing anything essential here.) Again, we use piping to pass the results of one process to another. You
could also redirect the output to a new file.
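For example (the output file name here is just an illustration):
grep '^[AR]' xresults.csv | cut -d, -f1,3 > ar_scores.txt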

Transforming Data
There is another comma delimited file called results.csv which has the following structure

Surname, Mean_score, Grade

Currently the grade is expressed as an alphabetic character. You should check this by viewing the
surnames and grades from this file. The command is
cut -d, -f1,3 results.csv
We can translate the alphabetic grade into a numeric grade (1=A, 2=B etc) with the command tr. Try
this
tr 'A' '1' < results.csv
In the example tr gets its input from the file by redirection. You can perform a multiple translation by
putting more than one character in each set; each character in the first set is replaced by the character
in the same position in the second set. For example
tr 'ABC' '123' < results.csv | less
Be careful, though: tr translates single characters wherever they occur, so these commands will also
change any A, B or C that appears in a surname. tr cannot restrict the change to the grade field; for
that you need a pattern-matching tool such as sed, which we meet later.
You can use special characters in a tr command. For example to search for or replace a tab there are
two methods:
1. use the escape string \t to represent the tab
2. at the position in the command line where you want to insert a tab, first type control-v (^v) and
then press the tab key.
There are a number of escape sequences (method 1 above) and control sequences (method 2 above)
for special characters: for example, \n represents a newline and \t a tab, while the control sequence ^M
represents a carriage return. In general the escape sequences are easier to use.
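For example, a small sketch that uses the escape sequence to turn the comma-delimited file into a tab-delimited one:
tr ',' '\t' < results.csv | less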

Sorting

Alphabetically
Unix sorts alphabetically by default. This means that 100 comes before 11.

On Rows
You can sort with the command sort. For example
sort results.csv | less
This sorts the file in UNIX order on each character of the entire line. The default alphanumeric sort
order means that the numbers one to ten would be sorted like this



1, 10, 2, 3, 4, 5, 6, 7, 8, 9
This makes perfect sense but it can be a surprise the first time you see it.

Descending
You can sort in reverse order with the option -r. Like this
sort -r results.csv | less

Numerically
To force a numeric sort, use the option -n.
sort -n results.csv
You can use a sort on numeric data to get maximum and minimum values for a variable. Sort, then pipe
to head -1 and tail -1, which will produce the first and last records in the output.
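For example, a sketch that finds the lowest and then the highest mean score by combining commands we have already met:
cut -d, -f2 results.csv | sort -n | head -1
cut -d, -f2 results.csv | sort -n | tail -1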

On Columns
To sort on columns you must specify a delimiter, with -t and a field number with -k. To sort on the third
column of the results data, try this
sort -n -t ',' -k3 xresults.csv | less
(I’ve used a slightly more verbose method of specifying the delimiter here). You can select rows after
sorting, like this
sort -n -t ',' -k3 xresults.csv | grep '^[A]' | less
which shows those pupils with surnames beginning with A sorted on the third field of the data file.
To sort on multiple columns we use more than one -k parameter. For example, to sort first on Maths
score and then on surname we use
sort -t ',' -k2,2n -k1,1 xresults.csv | less
(the 2,2 restricts the first key to field two only, sorted numerically, and 1,1 restricts the second key to the surname field).

Finding Unique Values in Columns


Suppose that you want to know how many different values appear in a particular column. With a little
work, you can find this out using the command uniq. Used alone, uniq compares each line with the line
before it and writes out only the first of any run of identical lines.
Before we try to use uniq we need a sorted column with some repeated values. We can use cut to
extract one. Test this first
cut -d, -f2 results.csv | less
This should list just the second column of data which has a few duplicate values.
We pass the output through sort and then to uniq
cut -d, -f2 results.csv | sort | uniq | less
to get data in which the adjacent duplicates have been squeezed to one.
We can now pipe this result to wc -l to get the count of unique values.
cut -d, -f2 results.csv | sort | uniq | wc -l
Effectively, we can now calculate frequency results for our data.
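In fact uniq has a -c option that prefixes each distinct value with the number of times it occurs, which gives a simple frequency table. For example, to count how many pupils received each grade:
cut -d, -f3 results.csv | sort | uniq -c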

Joining Data Files


There are two UNIX commands that will combine data from different files: paste and join. We will look
first at paste.

Paste
Paste has two modes of operation depending on the option selected. The first operation is simplest:
paste takes two files and treats each as column data and appends the second to the first. The
command is
paste first_file second_file



Consider this file:
one
two
three
Call this first_file. Then let this
four five six
seven eight nine
ten eleven twelve
be second_file. The output would be
one four five six
two seven eight nine
three ten eleven twelve
So paste appends the columns from the second file to the first row by row. As with other commands
you can redirect the output to a new file:
paste first_file second_file > new_file

The other use of paste is to linearize a file. Suppose I have a file in the format
Jim
Tyson
UCL
Information Services
You can create this in a text editor. I can use paste to merge the four lines of data into one line
Jim Tyson UCL Information Services
The command is
paste -s file
As well as the -s option, I can add a delimiter character with -d. Try this
paste -d: -s file
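For the four-line file above this would produce the single line
Jim:Tyson:UCL:Information Services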

Join
We have seen how to split a data file into different columns and we can also join two data files together.
To do this there must be a column of values that match in each file and the files must be sorted on the
field you are going to use to join them.
We start with files where for every row in file one there is a row in file two and vice versa.
Consider our two files. The first has the structure

Surname, Maths_score, English_score, History_score

The second

Surname, Mean_score, Grade

We can see then that these could be joined on the column surname with ease since surname is unique.
After sorting both files we can do this with the command line
join -t, -j1 results.csv xresults.csv | less
The option -t specifies the delimiter and -j allows us to specify a single field number when the join field
is in the same position in both files.
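If the files were not already in surname order we could sort them first; a sketch (the sorted file names here are just for illustration):
sort -t, -k1,1 results.csv > results_sorted.csv
sort -t, -k1,1 xresults.csv > xresults_sorted.csv
join -t, -j1 results_sorted.csv xresults_sorted.csv | less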
If the columns on which to match do not appear in the same position in each file, you can use the
-jn m option once for each file, where n is the file number (1 for the first file named on the command
line, 2 for the second) and m is the number of the join field in that file. In fact, we could write
join -t, -j1 1 -j2 1 results.csv xresults.csv | less
for the same result as our previous join command.
Essentially, join matches lines on the chosen fields and adds column data. We could send the resulting
output to a new file with > if we wished.



In my example there is (deliberately) one line in file one for each line in file two. There is of course no
guarantee that this will be the case. To list all the lines from a file regardless of whether a match is found,
we use the option -a followed by the file number.
join -t, -a1 -j1 1 -j2 1 results.csv xresults.csv | less
This would list every line of results.csv and only those lines of xresults.csv where a match is found.
The default join is that only items having a matching element in both files are displayed. We can also
produce a join where all the rows from the first file named and only the matching rows from the second
are selected as we did above. Finally we can produce a version where all the rows of the second file
are listed with only matching rows from the first with the following
join -t, -a2 -j1 1 -j2 1 results.csv xresults.csv | less
And lastly, we can produce all rows from both files, matching or not with
join -t, -a1 -a2 -j1 1 -j2 1 results.csv xresults.csv | less
The last thing we should learn about join is how to control the output. The option -o allows us to choose
which data fields from each file are displayed. For example
-o 0,1.2,2.3
displays the match column (always denoted 0), the second column from the first file (1.2) and the third
column from the second file (2.3).
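Putting this together (the particular field choices are just an illustration), the following displays each surname, the mean score from the first file and the English score from the second:
join -t, -j1 -o 0,1.2,2.3 results.csv xresults.csv | less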

sed and AWK - more powerful searching and replacing

sed
Sed is a powerful Unix tool and there are books devoted to explaining it. The name stands for stream
editor, a reminder that it reads and processes files line by line. One of the basic uses of sed is to search
a file - much like grep does - and replace the search expression with some other text specified by the
user. An example may make this clearer
sed 's/abc/def/g' input
After the command name, we have s for substitute, followed by the search string and then the replacement
string, surrounded and separated by /, and then g indicating that the operation is global: every occurrence
of abc on a line is replaced, not just the first. The filename follows, in this case a file called input.
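Returning to the grade translation earlier, sed can anchor the replacement to the end of the line, which tr cannot; a sketch:
sed 's/,A$/,1/; s/,B$/,2/; s/,C$/,3/' results.csv | less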

Some sed Hacks


Rather than pretend to cover sed in any real depth, here is a very short list of sed tricks that are
sometimes useful in processing data files. These are famous "sed one-liners" and are listed by Eric
Pement on his website at http://www.pement.org/sed/sed1line.txt.
sed G
Double spaces the file: for each line read, G appends the (empty) hold space preceded by a newline,
which puts a blank line after every line of input.
sed '/^$/d;G'
Double spaces a file that already contains some blank lines: the existing empty lines are deleted first,
then a blank line is appended after every remaining line.
sed 'G;G'
Triple spaces the file.

sed 'n;d'
This removes double line spacing - and does it in a rather crafty way. Assuming that the first line is not
blank, all the even-numbered lines should be blank, so alternately printing a line and deleting the next
results in a single-spaced file.
sed '/regex/{x;p;x;}'
This command puts a blank line before every occurrence of the search string regex.
sed -n '1~2p'
This command prints only the odd-numbered lines, so the even-numbered lines are dropped (the 1~2 address is a GNU sed extension).
I leave the investigation of more sed wizardry to you.



AWK
AWK is a programming language developed specifically for text data manipulation. You can write
complete programs in AWK and execute them in much the same way as a C or Java program (although
AWK is interpreted, not compiled to native code like C or to byte code like Java).
AWK allows for some sophisticated command line manipulation and I will use a few simple examples to
illustrate.
Because our file is comma delimited, we will invoke AWK with the option -F','. AWK automatically
identifies the columns of data and puts the fields, a row at a time, into its variables $1, $2 and so on;
the built-in variable NF holds the number of fields, so $NF always refers to the last field of data.
So, we can try
awk -F',' '{print $2, $NF}' results.csv
We can also find text strings in a particular column, for example with
awk -F',' '$n ~ /searchtext/' results.csv
where n in $n is a column number.
The ~ means 'matches'. The expression !~ means 'does not match'.
Conditional processing in simple cases can be carried out by just stating the condition before the block
of code to be executed (that is inside the braces). For example
awk -F',' '$2 > 55 {print $2}' xresults.csv
And we can create complex conditions
awk -F',' '$2 > 50 || $3 < 50 {print $3}' xresults.csv
The || means OR and && means AND in awk.
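For example, a sketch that prints the surnames of pupils who scored at least 50 in both Maths and English:
awk -F',' '$2 >= 50 && $3 >= 50 {print $1}' xresults.csv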

But we can construct more complex processes quite easily. The following code won’t be difficult to
understand if you know any mainstream programming language

cut -d, -f2-4 xresults.csv | awk -F',' '{sum = 0; for (i = 1; i <= NF; i++) sum += $i; print sum}'

This sums the three numeric exam fields on each line and prints the row total.
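A related pattern accumulates a single column over every row and reports at the end; for example, a sketch of the average Maths mark:
cut -d, -f2 xresults.csv | awk '{s += $1} END {print s / NR}'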
As with sed, there is a website of useful awk one-liners by Eric Pement at
http://www.pement.org/awk/awk1line.txt

In-line Perl - the Swiss army chainsaw of Unix data manipulation


The Perl programming language has always provided sophisticated data manipulation functions.
Learning Perl would be an even bigger project than learning sed but it is worth knowing at least
something about using Perl ‘in line’.
It is possible to use Perl code on the command line. Consider the simple Perl statement
print "hello"
(Programmers among you, notice that I omit the ';'.) We can execute this directly from the Unix prompt
by invoking the Perl interpreter with the option -e; the statement is wrapped in single quotes so that the
shell passes it to Perl as a single argument. Try this
perl -e 'print "Hello"'
or better
perl -e 'print "Hello\n"'
You remember \n: it gets us a nice new line. This way of running a Perl program combined with what
we can already do opens up the possibility of sophisticated data transformation. But still the -e option
runs a single (though possibly complex) Perl statement just once. So, we can have
perl -e '$number = 5; $number >= 4 ? print $number : print "less than four"'
Can you work out what the result should be? Try it.



The example makes use of the popular but initially puzzling ternary operator, which is a kind of
shorthand way of writing a conditional statement. Here the conditional is read

"if $number is greater than or equal to four, print $number, else print the
string 'less than four'".

The real value of in-line programming comes when we learn that we can loop through the output of
other command line operations and execute Perl code. We do this with the option -n. Here is an
example
cut -d, -f2 results.csv | perl -ne '$_ >= 55 ? print "well done\n" : print "what a shame\n"'
Or we could do some mathematics
cut -d, -f2 results.csv | perl -ne '$n += $_; END { print "$n\n" }'
which will sum the column of numbers.
Another very useful Perl function for command line use is split. In Perl, split takes a string and divides it
into separate data items at a delimiter character and then it puts the results into an array.
To illustrate this try the following
perl -ne '@fields = split(/,/, $_); print $fields[0], "\t", $fields[1], "\t", $fields[2], "\n"' results.csv

In this example the input from each line ($_ in Perl) is split at the comma (/,/, where the slashes are
delimiters that distinguish the comma we are splitting on from the commas separating split's arguments).
The print statement then prints the first three fields of the resulting array, separated by tabs, and ends
with a newline. This example uses the escape sequence for tab again: \t.

Final Exercise
To practise and consolidate, try the following:
1. Take the original results.csv data and find the average mark for each column of examination
marks. Can you see a way to write this set of values to the end of the file on a row labelled
averages?
2. Take the original results.csv and find the average examination mark for each pupil. Can you
add this new column of data to the original file? (Hint: > outputs the data from a process to a
new file but >> appends it to the end of an existing file)
3. Take the original results.csv and find the average examination mark for each pupil and on the
basis of the following rule, assign them to a stream

If the average exam mark is greater than or equal to 60 the student is in stream A; else if the
average exam mark is greater than or equal to 50 the student is in stream B; else the student is
in stream C.

Create a new file that includes these two new data items for each pupil.
THE END.

