
Lab03 Exercises

This document provides instructions for using Linux commands to process text streams and files. It contains 3 exercises: 1. The first exercise demonstrates how to take a text file with numbers on one line separated by spaces, sort the numbers numerically, and output them to a new file on one line separated by spaces. 2. The second exercise shows how to determine the oldest file modification date in a directory by extracting dates from 'ls -l' output and sorting them. 3. The third exercise builds on the previous ones to find the day with the most file modifications in a directory by counting modification dates, then lists the files modified on that day.

Uploaded by

BCO
Copyright
© All Rights Reserved
Linux 101

Laboratory 03. Processing Text Streams

Exercise 1

Let’s start with a classic exercise of high-school-level programming. We are given a text file containing a
single line of numbers. Adjacent numbers are separated by a single space. Our task is to sort the numbers
in ascending order and write the output to another file in the same format.

First of all we need an input file; we can create one using the echo command and output redirection:

$ echo "23 1 15 4 3 21 10 6 14 5" > tosort

Now we need to create a sequence of filter commands that, once piped together, produce the sorted
file.

The first command in the sequence has to be something that converts the input file into a text stream:

$ cat tosort
23 1 15 4 3 21 10 6 14 5

Now we face a small problem: the sort command orders the lines of a stream, but we have a single
line. So we need to put each number on a separate line and then feed the stream to sort. We know that
adjacent numbers are separated by a single space, so we can use tr to translate spaces into newline
characters:

$ cat tosort | tr ' ' '\n'
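As a quick check of what tr does here, the following sketch runs the same translation on a shorter line. The sample values and the variable name lines are illustrative only and are not part of the exercise.

```shell
# Translate every space into a newline so each number sits on its own line.
# The sample input "3 1 2" is made up for illustration.
lines=$(printf '3 1 2\n' | tr ' ' '\n')

# Print the result: three lines, one number per line, still unsorted.
printf '%s\n' "$lines"
```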

We now feed the stream to the sort command. Don’t forget that the default type of sorting is
lexicographic, while we are dealing with numbers. Thus, we need the -n option of the sort command to
activate numeric sorting.

$ cat tosort | tr ' ' '\n' | sort -n

To finish our task we have to put the sorted numbers back on a single line and output them to a file.

$ cat tosort | tr ' ' '\n' | sort -n | tr '\n' ' ' > sorted

We got our desired result. The only minor drawback is that the output file has no newline character at
the end, because the last tr turned the final newline into a space. Thus, if you cat the file, the prompt
will be displayed on the same line:

$ cat sorted
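The same filter chain can also be written without cat, by letting tr read the file through input redirection. This is only a sketch of an equivalent form, not a change to the exercise; it produces the same file, trailing space included.

```shell
# Recreate the input file from the exercise.
echo "23 1 15 4 3 21 10 6 14 5" > tosort

# Same filters as before, but tr reads tosort via input redirection (<)
# instead of receiving it from cat.
tr ' ' '\n' < tosort | sort -n | tr '\n' ' ' > sorted

# Count the words in the result to confirm no number was lost.
wc -w sorted
```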

Experiment

1. Try a command, either standalone or appended to the piped sequence, that would add a newline
character to the sorted file.
2. What would be the output if we omitted the -n option of the sort command?
3. What would have happened if the numbers in the input were separated by more than a single
space?

Exercise 2

We are given the task of determining the modification date of the oldest files in a certain
directory. Let’s see the sequence of filters that leads us to the desired result.

First we need to know where we can extract the modification date from. The answer is the
output of the ls command with the -l option.

-rw-r--r-- 1 linux101 linux101   26 2010-11-18 00:20 sorted
-rw-r--r-- 1 linux101 linux101 1972 2010-11-13 00:16 zless.1

As we can see, the field that we need is the 6th field of the ls command’s output. But there is a catch:
fields can be separated by more than one space (for example, there are three spaces between linux101
and 26 in the first line). Because of this, if we use the cut command with the space delimiter, the date
would be the 8th field on the first line and the 6th field on the second line. This is unacceptable for
automation. We have to make the output of ls cut-able by fields. One way to get this result is to shrink
every run of spaces to a single space character. The simplest way to do this is with the help of sed:

$ ls -l | sed 's/\s\s*/ /g'

In words, the command given above to sed means: substitute globally (g) every occurrence of one
whitespace character followed by a run of zero or more whitespace characters (the regular expression)
with a single space (the replacement string). Note that \s is a GNU sed extension that matches any
whitespace character, not only the space. Our problem is solved, but one detail remains: the default
delimiter of the cut command is the tab character, so we can cut the output of sed only if we specify
the space delimiter with the -d option of the cut command:

$ ls -l | sed 's/\s\s*/ /g' | cut -f 6 -d ' '
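To see the effect of the padding on a fixed input, the sketch below applies cut to one sample listing line, first as it is and then after squeezing. The variable names are made up, and \s assumes GNU sed; for plain spaces, tr -s ' ' is a portable alternative to the sed step.

```shell
# A sample long-listing line; the size column is padded with extra spaces.
line='-rw-r--r-- 1 linux101 linux101   26 2010-11-18 00:20 sorted'

# Without squeezing, the padding creates empty fields, so field 6 is empty.
raw=$(printf '%s\n' "$line" | cut -d ' ' -f 6)

# After squeezing each run of whitespace to one space, field 6 is the date.
squeezed=$(printf '%s\n' "$line" | sed 's/\s\s*/ /g' | cut -d ' ' -f 6)

printf 'raw field 6:      [%s]\n' "$raw"
printf 'squeezed field 6: [%s]\n' "$squeezed"
```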

By thinking ahead, we can simplify the cut command by asking sed to replace each run of repeated
spaces with a single tab instead of a single space:

$ ls -l | sed 's/\s\s*/\t/g'

Now we can append the simplified cut:

$ ls -l | sed 's/\s\s*/\t/g' | cut -f 6
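The field positions used here rely on ls printing the date as a single YYYY-MM-DD field, as in the sample listing. Depending on the system and locale, ls -l may instead print something like "Nov 18 00:20", which shifts the fields. The following is a minimal sketch, assuming GNU ls (for --time-style) and GNU touch (for -d); the scratch directory and the variable names are made up for the demonstration.

```shell
# Work in a scratch directory so the listing is predictable.
demo_dir=$(mktemp -d)
touch -d '2010-11-18 00:20' "$demo_dir/sorted"
touch -d '2010-11-13 00:16' "$demo_dir/zless.1"

# --time-style=long-iso (GNU ls) forces dates such as 2010-11-18 00:20,
# so the date is always the 6th field.
dates=$(ls -l --time-style=long-iso "$demo_dir" | sed 's/\s\s*/\t/g' | cut -f 6)
printf '%s\n' "$dates"

# Clean up the scratch directory.
rm -r "$demo_dir"
```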

We have obtained a list of modification dates of the files in the current directory. We now need to
determine the earliest date. We could put the output in a file and look through it by hand, but that
would be tedious. Instead we can sort the output in reverse order, so that the last date we see in the
result is the earliest date in the list.

$ ls -l | sed 's/\s\s*/\t/g' | cut -f 6 | sort -r
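A plain text sort is enough here because dates written as YYYY-MM-DD order chronologically when compared character by character. A small sketch with made-up dates (the variable name ordered is illustrative):

```shell
# Three sample dates in arbitrary order, sorted in reverse as in the exercise.
ordered=$(printf '%s\n' 2010-11-18 2009-12-31 2010-02-05 | sort -r)

# The newest date comes first and the earliest date comes last.
printf '%s\n' "$ordered"
```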

Experiment

1. Where does the blank line in the output (last line) come from?
2. Modify the command sequence such that the blank line does not appear any more.
3. What would happen if the regular expression given to sed were '\s*' instead of '\s\s*'?
4. After solving point 2, add a single command to the sequence such that the only output is the wanted
date.
5. Write this command sequence into a file; it can come in handy later. Devise a plan (or two) to move
it directly from the shell’s history instead of just writing it again.

Exercise 3

We are happily administering our Linux system (commonly known as a Linux box) when our boss comes
and says that a few days ago (he doesn’t know exactly which day) a process generated lots and lots of
files in a directory. He wants us to investigate the problem. First we want to determine on which day
the most files were modified. Once we have the date, we also want to know which files they are so we
can isolate them.

Having mastered the previous exercise we know of a simple way to get a list of modification dates for all
the files in a directory:

$ ls -l | sed 's/\s\s*/\t/g' | cut -f 6

Making the ls output cut-able could come in handy later, so instead of typing the pipeline each
time we can save it as a command alias and then just type the alias:

$ alias lscut="ls -l | sed 's/\s\s*/\t/g'"

$ lscut | cut -f 6
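In bash, aliases are expanded in interactive shells but not in scripts unless shopt -s expand_aliases is set, so a shell function is a common alternative when the pipeline is meant to be reused from a file. The sketch below is one possible form; the name lscut_fn is hypothetical, and \s again assumes GNU sed.

```shell
# Hypothetical function counterpart of the lscut alias.
# Any arguments (for example a directory name) are passed on to ls.
lscut_fn() {
    ls -l "$@" | sed 's/\s\s*/\t/g'
}

# Example: list the current directory and keep only the sixth field.
lscut_fn | cut -f 6
```

Unlike the alias, the function accepts a directory argument, as in lscut_fn /tmp | cut -f 6.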

We have the dates so now we just have to count them and see which one appears more than the
others. The uniq command comes to mind with its count (-c) option. We know that for the uniq
command to work properly we need to sort its input.

$ ls -l | sed 's/\s\s*/\t/g' | cut -f 6 | sort | uniq -c

We have now obtained a list of unique dates, each preceded by its number of occurrences. The list can
be quite long (or at least can scroll off the screen), so we want to append a command to the sequence
that shows us directly the line with the greatest count. We turn to sort once again. This time we tell
sort to order the output numerically on the first field:

$ lscut | cut -f 6 | sort | uniq -c | sort -n -k 1

Now the last line in the output shows us the date we wanted.
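The counting and ranking steps can be tried on a fixed list of dates, independent of any directory. The dates and variable names below are made up for illustration.

```shell
# Sample modification dates, one per line (made-up data).
dates='2010-11-13
2010-11-18
2010-11-13
2010-11-17
2010-11-13
2010-11-18'

# Sort so that equal dates are adjacent, count each date with uniq -c,
# then order the "count date" lines numerically by the count.
ranked=$(printf '%s\n' "$dates" | sort | uniq -c | sort -n -k 1)

# The most frequent date ends up on the last line.
printf '%s\n' "$ranked"
```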

Experiment

1. What would we have obtained if we had not sorted the stream before feeding it to uniq?
2. Automate the above task in a single command called impdate. Modify the command sequence
such that the output of impdate is just the needed date.

Having found what we were looking for, we now move on to determining which files were
modified on that particular date. We can get a long listing of the directory, but we have to select from it
only the lines whose 6th field is the date we obtained. We will use the join command. Here’s how:

First we get the date and store it in a file:

$ impdate > datefile

Then we put into another file a long listing of the directory sorted by the date field:

$ lscut | sort -k 6 > filelist

And finally we join the two files on the date field. The -t option of join expects a single character,
so in bash we write the tab as $'\t':

$ join -t $'\t' -1 6 -2 1 filelist datefile
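The join step can be tried on small hand-made files before running it on a real directory. This is a sketch under stated assumptions: the file names filelist.demo and datefile.demo, the third listing entry, and the variable names are all made up, and the tab is produced with printf so the example does not depend on bash-specific quoting.

```shell
tab=$(printf '\t')   # a literal tab character

# Made-up, tab-separated listing with the same field layout as the lscut
# output, already sorted on field 6 (the date).
printf '%s\n' \
  "-rw-r--r--${tab}1${tab}linux101${tab}linux101${tab}1972${tab}2010-11-13${tab}00:16${tab}zless.1" \
  "-rw-r--r--${tab}1${tab}linux101${tab}linux101${tab}26${tab}2010-11-18${tab}00:20${tab}sorted" \
  "-rw-r--r--${tab}1${tab}linux101${tab}linux101${tab}30${tab}2010-11-18${tab}00:21${tab}tosort" \
  > filelist.demo

# The date we are looking for, in the form impdate is meant to print it.
echo "2010-11-18" > datefile.demo

# Join on field 6 of the listing and field 1 of the date file.
# join prints the join field first, followed by the remaining fields.
matches=$(LC_ALL=C join -t "$tab" -1 6 -2 1 filelist.demo datefile.demo)
printf '%s\n' "$matches"

# Remove the demonstration files.
rm -f filelist.demo datefile.demo
```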

Experiment

1. What would we get if we didn’t sort the directory listing before storing it in the file?
