Unit-2 Part 4

Uploaded by komal
Copyright © All Rights Reserved

Pipes and Filters in Linux

What is a pipe in Linux?

In the terminal, we use a pipe (the | symbol) to connect two or more commands in order to produce a more specific output. In simpler words, the pipe takes the output of the first command and passes it as the input to the subsequent command. When we pipe the main command into any of the filters in Linux, we can get very specific answers.

Since a pipe lets you chain many commands together (the commands in a pipeline actually run at the same time), we can create powerful and efficient commands that handle complex tasks in a flash.

Instead of studying pipes separately through individual examples, let us first understand what filters are and then directly combine pipes and filters to get a better understanding and save time.

What are filters in Linux?

In Linux, filters are a set of commands that take in standard input (say, from the main command) and then perform different operations on it, like sorting, trimming, searching, counting and many more.

By definition, filters take in standard input (stdin) and produce an output on standard output (stdout). Let us
look at some of the filters in brief.
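As a minimal sketch of a pipe feeding filters (the file fruits.txt and its contents are hypothetical), the standard output of each command becomes the standard input of the next:

```shell
# Hypothetical sample file for illustration
printf 'apple\nbanana\ncherry\napple pie\n' > fruits.txt

# grep prints the matching lines; the pipe hands them to wc -l, which counts them (2)
grep apple fruits.txt | wc -l

# A longer pipeline: keep lines containing "a", sort them, keep the first two
grep a fruits.txt | sort | head -n 2
```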

1. Linux cat command

The cat command on its own is a versatile command that lets you create, display and concatenate files. As a filter, it simply copies standard input to standard output.

Syntax of Cat Command

cat <options> <filename>

Piping the output of the cat command into “tac” prints the lines of your file in reverse order. If you are interested in the other features of the cat command, feel free to check out my full-blown article on the command.
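A small sketch of cat used as a filter, assuming a hypothetical file sample.txt. Note that tac ships with GNU coreutils and may be missing on non-GNU systems such as macOS:

```shell
# Hypothetical three-line sample file
printf 'one\ntwo\nthree\n' > sample.txt

# As a filter, cat copies standard input to standard output unchanged
printf 'hello\n' | cat

# cat writes the file to stdout; tac prints the lines it receives in reverse order
cat sample.txt | tac    # three, two, one
```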
2. Linux cut command

The cut command is a pretty simple command: as the name suggests, it cuts out sections of each line at the positions you specify. After choosing how to divide the text, you can also choose which parts to print.

Syntax of cut command:

cut <options> <byte position> <filename>

The cut command is quite versatile: it lets you cut by byte position, by character position, or by field, where fields are separated by a single-character delimiter such as a hyphen (-) or a space.

The output is an example of cutting by byte position. Let us take a look at some of the options that we can use with the cut command:

1. -c: This option lets you specify the character positions you want to cut.

2. -d: This option lets you specify the delimiter (like a hyphen or space) that separates the fields.

3. -f: This option lets you specify the field(s) you want to print after cutting.

4. -b: This option lets you specify the byte positions you want to cut.

5. --help: This option displays the help menu with all the options, uses and information regarding the command.
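The options above can be tried on a hypothetical file of dates written as YYYY-MM-DD (a sketch; the file name and contents are assumptions):

```shell
# Hypothetical sample: one date per line
printf '2021-12-27\n2022-01-05\n' > dates.txt

cut -b 1-4 dates.txt        # -b: bytes 1 to 4 of each line (the year)
cut -c 6-7 dates.txt        # -c: characters 6 to 7 (the month)
cut -d '-' -f 3 dates.txt   # -d and -f: third '-'-separated field (the day)
```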

3. Linux grep command

Syntax of grep command

grep <options> <string pattern> <filename>

The grep command is one of the most used commands for filtering. It searches one or more files (or its standard input) for a word, a sentence, or any pattern of words or characters, and displays all the lines that contain a match.

We can use the grep command either on its own or by piping into it:

In the first output, grep was piped with the cat command, whereas in the second output, grep was run independently, yet it produced the same output.
However, we cannot always run grep directly on a file: when the data comes from another command rather than from a file, it has to be piped into grep. The grep command comes in really handy when we want to find something specific in really long pieces of information, be they text files or other data such as environment variables, network packets and so on.

Let us look at some options used with the grep command:

1. -i: This option makes grep case insensitive.

2. -n: This option prints the line number before each matching line.

3. -c: This option prints only the count of matching lines.

4. -v: This option makes grep work like an inverted filter: it prints the lines that do not match.

5. -w: This option matches only whole words.

6. -o: This option prints only the matching part of each line, not the whole line.

7. --help: This option displays the help menu with all the different options.

8. -V: This option displays the version information of the grep you are using.
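A sketch of these options on a hypothetical four-line file; the comments state what each command prints:

```shell
# Hypothetical sample file
printf 'Linux is fun\nI use linux daily\nWindows too\nlinuxes\n' > os.txt

grep -i linux os.txt            # lines 1, 2 and 4 (case insensitive)
grep -c -i linux os.txt         # 3: the number of matching lines
grep -n Windows os.txt          # 3:Windows too
grep -v -i linux os.txt         # Windows too (the lines that do NOT match)
grep -w -i linux os.txt         # lines 1 and 2 only: "linuxes" is not a whole-word match
cat os.txt | grep -o -i linux   # piped input; prints only the matched text: Linux, linux, linux
```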

4. Linux comm command

The comm command is used to compare two sorted files line by line. By default, its output has three columns: the first contains the lines found only in the first file, the second contains the lines found only in the second file, and the third contains the lines common to both files.

Syntax of Comm Command

comm <options> <filename1> <filename2>

Let us take the below-given text files for example:

Now if I compare both these text files using the comm command, the output will be as follows:
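A sketch with two hypothetical sorted lists (comm expects both inputs to be sorted); the options -1, -2 and -3 suppress the corresponding output columns:

```shell
# Hypothetical sorted input files
printf 'apple\nbanana\ncherry\n' > list1.txt
printf 'banana\ncherry\ndate\n' > list2.txt

comm list1.txt list2.txt       # column 1: apple, column 2: date, column 3: banana and cherry
comm -12 list1.txt list2.txt   # only the common lines: banana, cherry
comm -23 list1.txt list2.txt   # only the lines unique to list1.txt: apple
```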
5. Linux sed command

The sed command is another text editing and manipulation tool. Sed stands for stream editor: it reads its input as a stream, applies editing commands to it and writes the result to standard output. The main difference between sed and other file editing commands is that, by default, this editing is not permanent: the edits appear only in the output on the display, and the actual file contents remain the same (unless the -i option is used to edit the file in place).

Syntax of linux sed command

sed <options> <script> <filename>

Let us understand the sed command by taking a small example. Say I have the below-given text file:

If we pipe the output of the cat command into the sed command, the following will be the output:

The command I ran replaced the word “Lambo” with “ferrari”. As mentioned earlier, the changes only apply to the output on the screen, which means that if I open the file again, its content will be unchanged. Let us look at some of the options used along with the sed command:

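1. -n: This option activates quiet mode: lines are printed only when the script asks for it (for example, with the p command).

2. -e: This option adds the script to the commands that are to be executed.

3. -f: This option adds the contents of the script file to the commands that are to be executed.

4. --sandbox: This option operates in sandbox mode, in which the commands that read or write files (e, r and w) are rejected.

5. --help: This option displays the help manual.

6. --version: This option prints information about the version of the sed command.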
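A sketch of the Lambo example, assuming a hypothetical two-line cars.txt since the original file is not shown:

```shell
# Hypothetical sample file
printf 'I drive a Lambo\nThe Lambo is red\n' > cars.txt

cat cars.txt | sed 's/Lambo/ferrari/'         # the edited text goes to stdout only
cat cars.txt                                  # the file itself is unchanged
sed -n '2p' cars.txt                          # -n with the p command prints line 2 only
sed -e 's/I/We/' -e 's/drive/own/' cars.txt   # two -e scripts applied in turn
```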
6. Linux tee command

Like cat, the tee command passes its standard input through to standard output, but it goes one step further: it also writes a copy of that input to a file. We make the best use of tee by piping other commands into it.

Syntax of tee command:

tee <options> <filename>

Let us understand the tee command better by taking an example. Let us consider the car.txt file:

If I pipe the output of cat into the tee command:

The above command copied the contents of the file into another file (while also printing them). We can do the same copying with the cp command.

We can also add new content to an existing file, instead of overwriting it, by using the option “-a”. The commonly used options are:

1. -a: This option appends the data to the given file.

2. -i: This option ignores interrupt signals.

3. -p: This option diagnoses errors while writing to non-pipes.

4. --output-error[=MODE]: This option sets the behavior on write errors.

5. --version: This option displays the version information of the tee command.
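A sketch of both behaviors, assuming hypothetical contents for car.txt and a hypothetical output file car_copy.txt:

```shell
# Hypothetical sample file
printf 'Audi\nBMW\n' > car.txt

# tee prints its input and also writes it to car_copy.txt (overwriting that file)
cat car.txt | tee car_copy.txt

# -a appends to the file instead of overwriting it
echo 'Tesla' | tee -a car_copy.txt

cat car_copy.txt   # Audi, BMW, Tesla
```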
7. Linux tr command

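The tr command translates characters: it replaces the characters of one set with the corresponding characters of another, and its most common use is changing text from lowercase to uppercase and vice versa. Tr stands for translate. To use the tr command we follow the syntax:

tr <options> <set1> <set2>

tr reads only from standard input, so a file is given to it by piping, for example cat <filename> | tr 'a-z' 'A-Z'.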

In the above example, we uppercased the previously lowercase l, f, p, and b.

Apart from changing lowercase letters to uppercase and vice versa, tr can also squeeze repeated characters, delete characters, join the content into one single line and much more with the help of the options available.

In the above output, I joined the content of the text file into a single line by replacing every newline with a space, using the command cat <filename> | tr '\n' ' '

Since we are on the topic of options, let us look at some used with the tr command.

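1. -s: This option squeezes each run of a repeated character into a single occurrence.

2. ROT13: tr has no rot13 option, but the ROT13 cipher (each letter is replaced by the letter 13 places after it, keeping its case) can be applied with tr 'A-Za-z' 'N-ZA-Mn-za-m'.

3. -d: This option is used to delete characters.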
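A sketch of these uses of tr; since tr reads only standard input, every example pipes text into it:

```shell
echo 'hello world' | tr 'a-z' 'A-Z'         # HELLO WORLD
echo 'aaa   bbb' | tr -s ' '                 # aaa bbb (runs of spaces squeezed)
echo 'a1b2c3' | tr -d '0-9'                  # abc (digits deleted)
printf 'one\ntwo\nthree\n' | tr '\n' ' '     # one two three (newlines become spaces)
echo 'Hello' | tr 'A-Za-z' 'N-ZA-Mn-za-m'    # Uryyb (ROT13)
```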

8. Linux uniq command

The uniq command is short for unique. As the name suggests, it keeps only one copy of repeated lines; note, however, that it only detects duplicates that are on adjacent lines.

Syntax of linux uniq command

uniq <options> <filename>


Let us understand this command better by taking an example. Let us consider the below-given text file:

Now, if I want to remove the repeated entries, I need to pipe the output of the sort command into the uniq command, so that identical lines become adjacent, as follows:

Apart from suppressing duplicate entries in the file, the uniq command can also do different things, like counting how many times each line occurs:

Or displaying only the repeated lines in a file, as shown below, and so many more.

We can do many tasks using different available options like:

1. -c: This option prefixes each line with the number of times it occurs.

2. -d: This option displays only the repeated lines.

3. -u: This option displays only the unique lines (lines that occur once).

4. -s N: This option skips the first N characters of each line when comparing.

5. -f N: This option skips the first N fields of each line when comparing.
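A sketch showing why uniq is paired with sort, using a hypothetical file whose duplicates are not adjacent:

```shell
# Hypothetical sample: duplicates are scattered, not adjacent
printf 'apple\nbanana\napple\ncherry\nbanana\napple\n' > fruits2.txt

uniq fruits2.txt              # no adjacent duplicates, so all 6 lines are printed
sort fruits2.txt | uniq       # apple, banana, cherry
sort fruits2.txt | uniq -c    # each line prefixed by its count: 3 apple, 2 banana, 1 cherry
sort fruits2.txt | uniq -d    # only the repeated lines: apple, banana
sort fruits2.txt | uniq -u    # only the lines that occur once: cherry
```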

9. Linux wc command

The wc command is a handy tool for counting the lines, words and characters in a file. It is often used at the end of a pipeline, where it counts the output of some main command.

Syntax of wc command

wc <options> <filename>

By default, the wc command prints the number of lines, the number of words and the number of bytes, one after the other, separated by spaces.

In the above output, there are 8 lines, 8 words and 62 characters.

It can even tell you the count of lines, words and characters for multiple files at once, followed by a total.

You can also use the wc command by piping it, here’s an example:

Let us look at some of the options that are used with the wc command:

1. –c: This option prints the count of the bytes.

2. -m: This option prints the count of the characters.

3. –l:This option prints the count of the lines.

4. –w:This option prints the count of the words.


5. –L:This option prints the maximum display width.
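A sketch on a hypothetical two-line file ("one two" and "three"), so the counts are easy to verify by hand:

```shell
# Hypothetical sample: 2 lines, 3 words, 14 bytes
printf 'one two\nthree\n' > words.txt

wc words.txt            # 2 3 14 words.txt (column spacing varies between systems)
wc -l < words.txt       # 2
cat words.txt | wc -w   # 3 (wc at the end of a pipeline)
wc -c < words.txt       # 14
```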

10. Linux od command

Syntax of od command

od [-b | -t x1 | -c] <filename>

The od command is short for Octal Dump. This command displays the contents of files in different forms
like hexadecimal, octal, or ASCII characters. Let us see the different syntaxes we use for the different
formats along with an example for each:

1. od -b <filename>

This syntax displays the contents of a file in octal format.

2. od -t x1 <filename>

This syntax displays the contents of a file in hexadecimal bytes format.

3. od -c <filename>

This syntax displays the contents of a file in ASCII format.
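Applying the three forms to a hypothetical three-byte file containing H, i and a newline (each output line also starts with an offset, which the comments leave out):

```shell
# Hypothetical sample file: the bytes 'H', 'i', '\n'
printf 'Hi\n' > hi.txt

od -b hi.txt      # octal bytes: 110 151 012
od -t x1 hi.txt   # hexadecimal bytes: 48 69 0a
od -c hi.txt      # characters: H i \n
```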

11. Linux sort command

Syntax of sort command

sort <options> <filename>

The name says it all, the sort command helps in sorting the contents of a file in different ways and methods.
Let us take a look at some examples:
Normal sorting: This will sort the lines of a file alphabetically.

Sorting a column: with the -k option, sort orders the lines of the file by a specified column (field).
Numeric sorting: you can sort content in a file numerically by using the option “-n”. You may also have to specify the column number with the -k option.
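A sketch on a hypothetical file of names and prices, chosen so that the alphabetical and numeric orders differ:

```shell
# Hypothetical sample: a name and a number on each line
printf 'banana 12\napple 100\ncherry 3\n' > prices.txt

sort prices.txt          # alphabetical: apple 100, banana 12, cherry 3
sort -k 2 prices.txt     # text order of column 2: "100" sorts before "12" and "3"
sort -n -k 2 prices.txt  # numeric order of column 2: cherry 3, banana 12, apple 100
```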

12. Linux gzip/gunzip commands

The gzip command compresses files and the gunzip command decompresses them, much like zipping and unzipping files. We use the following commands to gzip and gunzip:

gzip <filename>
gunzip <filename>

Let us look at some of the commonly used options used with the command:

1. -a: This option converts end-of-lines using local conventions. It can also be written as '--ascii'. Please note that this option is supported only on some non-Unix systems.

2. -c: This option writes output on standard output and keeps the original files unchanged.

3. -d: This option decompresses the files. It can also be written as '--decompress' or '--uncompress'.

4. -L: This option displays the gzip/gunzip license.

5. -n: When compressing, this option does not save the original file name and timestamp; when decompressing, it does not restore them.

6. -N: This option is the exact opposite of the option '-n': it saves the original file name and timestamp when compressing and restores them when decompressing.

7. -q: This option suppresses all the warnings.

8. -S .suf: This option uses the suffix '.suf' instead of '.gz'.

9. -t: This option tests the integrity of the compressed file.

10. -v: This option enables verbose mode, reporting what is happening to each file.

11. -V: This option displays the version information of gzip/gunzip.

12. -h: This option displays the help menu.
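A sketch of a compress, test and decompress round trip on a hypothetical file notes.txt, followed by gzip and gunzip used as filters in a pipe:

```shell
rm -f notes.txt.gz                 # start clean so gzip does not refuse to overwrite
printf 'some text\n' > notes.txt   # hypothetical sample file

gzip notes.txt                     # replaces notes.txt with notes.txt.gz
gzip -t notes.txt.gz               # integrity test; silent when the file is OK
gzip -c -d notes.txt.gz            # -c -d: print the decompressed data, keep the .gz file
gunzip notes.txt.gz                # restores notes.txt and removes notes.txt.gz

# With -c, both commands read stdin and write stdout, so they work as filters
printf 'piped data\n' | gzip -c | gunzip -c
```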

13. Linux less command

The less command is a pager: instead of dumping lengthy content on the screen all at once, it displays it one screenful at a time, and you can scroll forward and backward through it (press q to quit).

We can also change the behavior of less with its options. For example, the option “-s” squeezes consecutive blank lines into a single blank line, and the option “-<n>” sets the scrolling window size to n lines.

Syntax of less command

less <options> <filename>

14. Linux more command

The more command is an older pager similar to less. It displays the information one screenful at a time; to see more of it, press Enter to advance one line or the Space bar to advance a whole screen. The main difference is that less provides richer navigation, such as scrolling backward freely.

Syntax of more command

more <options> <filename>

Creating your own filters in Linux

We can create our own filters too. We are not limited to the built-in commands (like grep, sed, sort, etc.): we can write shell programs that, when run, act as filters by reading standard input and writing standard output.

For example, we can create a filter that catches files greater than a specific size and then performs some actions, using the following syntax:

Let us create a simple program that prints the name, size, date, and time of files greater than 10000:
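The program itself is not reproduced in this text. As a minimal sketch, assuming the input is ls -l output whose fields are permissions, links, owner, group, size, month, day, time and name, such a filter could be written as a shell function; the name bigfiles and the sample lines are hypothetical:

```shell
# bigfiles (hypothetical): reads `ls -l` lines on standard input and prints
# name, size, date and time of regular files larger than 10000 bytes
bigfiles() {
    while read -r perms links owner group size month day clock name; do
        case "$perms" in
            -*) # regular files: the permission string starts with '-'
                if [ "$size" -gt 10000 ]; then
                    echo "$name $size $month $day $clock"
                fi
                ;;
        esac   # other lines ("total ...", directories, links) are skipped
    done
}

# Hypothetical ls -l style input, so the result is the same on every system
printf '%s\n' 'total 24' \
    '-rw-r--r-- 1 user staff 20480 Dec 27 10:53 big.log' \
    '-rw-r--r-- 1 user staff 512 Dec 27 10:54 small.txt' | bigfiles
```

On a real directory you would run ls -l | bigfiles. Because name is the last variable given to read, file names containing spaces are kept whole.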
Instead of writing such big programs, you can use the “awk” command to write simple yet effective one-line programs that work very much like filters. You can write the above shell program using the “awk” command as shown below:
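The one-liner is not shown here either, so the following is a sketch under the same assumption about the ls -l fields (the sample lines are hypothetical): awk splits each line into fields $1 to $9, and the pattern selects regular files whose size field exceeds 10000.

```shell
# Hypothetical ls -l style input
printf '%s\n' 'total 24' \
    '-rw-r--r-- 1 user staff 20480 Dec 27 10:53 big.log' \
    '-rw-r--r-- 1 user staff 512 Dec 27 10:54 small.txt' |
    awk '$1 ~ /^-/ && $5 > 10000 { print $9, $5, $6, $7, $8 }'

# On a real directory:
# ls -l | awk '$1 ~ /^-/ && $5 > 10000 { print $9, $5, $6, $7, $8 }'
```

Unlike the read-based version, $9 holds only the first word of a file name that contains spaces.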

Display beginning and ending of the file:


To display the first part of the file, we use the head command in the Linux system.

The head command is used to display the beginning of a text file or piped data. By default, it displays the
first ten lines of the specified files. The tail command is also used to display the ending part of the file.

Syntax

The general syntax of the head command is as follows:

head [OPTION]... [FILE]...

Brief description of options available in the head command:

1. -c, --bytes=[-]NUM: Display the first NUM bytes of each file. With a leading '-', print all but the last NUM bytes of each file.

2. -n, --lines=[-]NUM: Display the first NUM lines instead of the first ten. With a leading '-', display all but the last NUM lines of each file.

3. -q, --quiet, --silent: Never print headers giving file names.

4. -v, --verbose: Always print headers giving file names.

5. -z, --zero-terminated: The line delimiter is NUL, not newline.

6. --help: Displays a help message and then exits.

7. --version: Gives information about the version and then exits.
By default, the head command prints the first ten lines without any option, as shown in this example.

First, we will create a file containing more than ten lines using the cat command in the Linux system. Then, we will use the head command in the Linux system to display the first ten lines, as shown below.

To print the first n lines, we use the -n or --lines option with the head command as shown below.

Suppose we want to display the first four lines of the text.txt file; then we execute the command shown below.

$ head -n 4 text.txt
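A sketch of the default behavior and of the -n and -c options, using a hypothetical 15-line file whose lines are simply the numbers 1 to 15 (generated with seq):

```shell
seq 1 15 > lines.txt      # hypothetical sample: lines "1" to "15"

head lines.txt            # default: the first 10 lines (1 to 10)
head -n 4 lines.txt       # the first 4 lines (1 to 4)
head -c 5 lines.txt       # the first 5 bytes: "1\n2\n3"
```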

To print the lines between two line numbers, we combine the head and tail commands in the Linux system as shown below.

Print lines N to M (M>N): For this purpose, we use the head and tail commands and the pipe (|). The command is head -n M file_name | tail -n +N: head keeps the first M lines, and tail then prints those lines starting from line N until the end. We can also use head -n M file_name | tail -n K, where K = M-N+1: head keeps the first M lines, and tail keeps the last M-N+1 of them. Say that from the state.txt file we have to print the lines between 10 and 20. The command below keeps the last 10 of the first 20 lines, that is, lines 11 to 20:
$ head -n 20 state.txt | tail -n 10

Jharkhand
Karnataka
Kerala
Madhya Pradesh
Maharashtra
Manipur
Meghalaya
Mizoram
Nagaland
Odisha
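The two formulas can be checked on a hypothetical file in which each line's content equals its line number (generated with seq), taking M = 20 and N = 10:

```shell
seq 1 30 > numbers.txt                   # hypothetical sample: lines "1" to "30"

head -n 20 numbers.txt | tail -n +10     # lines 10 to 20 (11 lines)
head -n 20 numbers.txt | tail -n 11      # the same lines, since M-N+1 = 11
head -n 20 numbers.txt | tail -n 10      # lines 11 to 20, as in the state.txt example
```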
To check more information about the head command, we use the --help option with the head command in
the Linux operating system as shown below.
$ head --help

To check version information of the head command, we use the --version option with the head command in the Linux operating system as shown below.
$ head --version
To display the last part of the file, we use the tail command in the Linux system.

The tail command is used to display the end of a text file or piped data in the Linux operating system. By default, it displays the last 10 lines of its input on the standard output. It is the counterpart of the head command.

Syntax

The general syntax of the tail command is as follows:

tail [OPTION]... [FILE]...

Brief description of options available in the tail command:

1. -c, --bytes=[+]NUM: Display the last NUM bytes of each file, or use -c +NUM to display starting with byte NUM of each file.

2. -f, --follow[={name|descriptor}]: Display appended data as the file grows.

3. -F: Same as --follow=name --retry.

4. -n, --lines=[+]NUM: Display the last NUM lines instead of the last 10, or use -n +NUM to display starting with line NUM.

5. --max-unchanged-stats=N: With --follow=name, reopen a FILE which has not changed size after N (default 5) iterations, to see whether it has been unlinked or renamed.

6. --pid=PID: With the -f option, terminate after the process with ID PID dies.

7. -q, --quiet, --silent: Never print headers giving file names.

8. --retry: Keep trying to open a file if it is not accessible.

9. -v, --verbose: Always print headers giving file names.

10. -z, --zero-terminated: The line delimiter is NUL, not newline.

11. --help: Displays a help message and then exits.

12. --version: Gives information about the version and then exits.

By default, the tail command prints the last ten lines without any option, as shown in this example.

First, we will create a file containing more than ten lines using the cat command in the Linux system. Then, we will use the tail command in the Linux system to display the last ten lines, as shown below.

To print the last n lines, we use the -n or --lines option with the tail command as shown below.

Suppose we want to display the last four lines of the text.txt file; then we execute the command shown below.

$ tail -n 4 text.txt
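A sketch of tail's default behavior and of the -n, -n +NUM and -c forms, on a hypothetical 12-line file of numbers:

```shell
seq 1 12 > lines12.txt         # hypothetical sample: lines "1" to "12"

tail lines12.txt               # default: the last 10 lines (3 to 12)
tail -n 4 lines12.txt          # the last 4 lines (9 to 12)
tail -n +11 lines12.txt        # starting at line 11: 11 and 12
printf 'abcdef' | tail -c 3    # the last 3 bytes: def
```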

To check more information about the tail command, we use the --help option with the tail command in the Linux operating system as shown below.

$ tail --help

To check version information of the tail command, we use the --version option with the tail command in the
Linux operating system as shown below.

$ tail --version

Concatenating Files
One of the most common functions of the cat command is to concatenate files, as its name suggests.

The simplest concatenation is to display multiple files on the standard output:
cat file1 file2

The command above displays the files sequentially:

My file 1
My file 2

We can also use wildcards to display all the files that match a common pattern:
cat file*

So far, we’ve been displaying the files on the standard output, but we can also write the output into a new file:
cat file1 file2 > file3

Also, we can append a file to an existing file:
cat file1 >> file2

Another useful option is to read from the standard input, which we represent by using '-'. Here, whatever arrives on the standard input, followed by the contents of file1, is written to file2:

cat - file1 > file2

Finally, we can pipe cat output to other utilities to create more powerful commands:

cat file1 file2 file3 | sort > file4

In this case, we’ve concatenated three files, sorted the result of the concatenation, and written the sorted
output to a new file called file4.
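A runnable sketch of these commands, assuming hypothetical one-line files; to keep the steps independent, the standard-input example writes to a new hypothetical file, file5, instead of overwriting file2:

```shell
printf 'My file 1\n' > file1
printf 'My file 2\n' > file2

cat file1 file2                               # My file 1, My file 2
cat file1 file2 > file3                       # write the concatenation to a new file
echo 'typed on stdin' | cat - file1 > file5   # '-' stands for the standard input
cat file1 >> file2                            # append file1 to the end of file2
cat file1 file2 file3 | sort > file4          # concatenate, sort, save
```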

Cut and Paste command

The cut command enables you to extract a column or columns of information from a file. To specify the column (character position) to be extracted, we use the -c parameter, followed by the column number. To extract more than one column, a comma-separated list can be passed. Fields may also be specified by using the -f parameter, and a delimiter may be specified with the -d parameter. Unless specified otherwise, the delimiter is the tab character.

In the above example, we have specified a delimiter of "," and we are cutting fields 1 and 3 from the file "cutfile1.txt".
The above is an example of the cut command in its simplest form. Here we are cutting the first 4 characters of each line of the file "cutfile1.txt".

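The actual contents of cutfile1.txt are not shown, so this sketch assumes a small hypothetical comma-separated version of it:

```shell
# Hypothetical contents for cutfile1.txt
printf 'id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n' > cutfile1.txt

cut -d ',' -f 1,3 cutfile1.txt   # fields 1 and 3: id,city / 1,Pune / 2,Delhi
cut -c 1-4 cutfile1.txt          # first 4 characters: id,n / 1,As / 2,Ra
```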
Paste command examples

The paste command is useful for merging files together. Corresponding lines of the files are joined, separated by a Tab character: the first lines of all the files form the first output line, and so on. It is possible to specify a different delimiter with the -d parameter.

The next example is the same; however, we have changed the default Tab delimiter to a ":".
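A sketch with two hypothetical files whose lines correspond, first with the default Tab delimiter and then with ":":

```shell
# Hypothetical sample files
printf 'Asha\nRavi\n' > names.txt
printf 'Pune\nDelhi\n' > cities.txt

paste names.txt cities.txt          # Asha<Tab>Pune, Ravi<Tab>Delhi
paste -d ':' names.txt cities.txt   # Asha:Pune, Ravi:Delhi
```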

Comparing Two Files


The Linux diff command is used to compare two files line by line and display the difference between them.
This command-line utility lists changes you need to apply to make the files identical.

diff Syntax

diff [option] file1 file2

Output Syntax

When working with diff, it is crucial to know how to interpret the output, which consists of:

• Output starting with < refers to the content in the first file.

• Output starting with > refers to the content in the second file.

• Line numbers corresponding to the first file.

• A special symbol. Special symbols indicate how the first file needs to be edited to match the second file. The output may display:

  • a (add)

  • c (change)

  • d (delete)

• Line numbers corresponding to the second file.

Together, the line numbers and the symbol form a change command such as 4c5.

diff Example

To show how the diff command works, we created two sample files and compared their content.

Create Two Sample Files

1. First, using the terminal, create two Linux files named example1.txt and example2.txt, for example with the nano editor:
nano example1.txt

Compare the Files with the diff Command


1. With the two sample files in place, use the diff command to see how they differ and how to make them
identical:

diff example1.txt example2.txt

The output lists instructions on how to modify the first file to have the same content as in example2.txt. Let’s
look at the output for the sample files and decode the instructions.

• 1d0: The first line (1) of the first file should be deleted (d). The 0 is the position in the second file after which the line would have appeared had it not been deleted.

• < Apple: The content you need to delete (as referred to by 1d0).

• 2a2,3: After line 2 of the first file, you should add (a) lines 2 and 3 (2,3) of the second file.

• > Peach, > Apple: The content you need to add (as referred to by 2a2,3).

• 4c5: The fourth line (4) of the first file should be changed (c) into the fifth line (5) of the second file.

• < Watermelon: The content you need to change.

• > Melon: What you need to change it to.

Note: Once you have the output instructions, you can save them to a file and use the patch command to apply the differences.
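The sample files themselves are not shown, so this sketch reconstructs them from the output described above; the contents are hypothetical, and the last line (Cherry) in particular is an assumption:

```shell
# Hypothetical reconstruction of the two sample files
printf 'Apple\nOrange\nBanana\nWatermelon\nCherry\n' > example1.txt
printf 'Orange\nPeach\nApple\nBanana\nMelon\nCherry\n' > example2.txt

# Normal format: change commands such as 4c5, followed by < and > lines.
# diff exits with status 1 when the files differ; '|| true' keeps 'set -e' scripts running.
diff example1.txt example2.txt || true

# The same comparison in context (-c) and unified (-u) formats
diff -c example1.txt example2.txt || true
diff -u example1.txt example2.txt || true
```

Different diff implementations may group the changes slightly differently, so treat the output described in the text as one valid set of instructions rather than the only one.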

diff Options

Without additional options, diff displays the output in the default format. There are ways to modify this
output to make it more understandable or applicable for your use case. Read on to learn more
about diff command options.

-c Option: The context format is a diff command-line utility option that outputs several lines of context
around the lines that differ.

To display the difference between the files in context form, use the command:
diff -c file1 file2

Take a look at the output for the sample files in the context form in the image below.

Lines displaying information about the first file begin with ***, while lines indicating the second file
start with ---.

The first two lines display the name and timestamp of both files:

*** example1.txt 2021-12-27 10:53:30.700640904 +0100

--- example2.txt 2021-12-27 10:54:41.304939358 +0100

**************** - is used just as a separator.

Before listing the lines from each file, the output starts with the line range of the files:

*** 1,5 ****

--- 1,6 ----

The rest of the lines list the content of the files. The beginning of each line instructs how to
modify example1.txt to make it the same as example2.txt. If the line starts with:

- (minus) – it needs to be deleted from the first file.


+ (plus) – it needs to be added to the first file.
! (exclamation mark) – it needs to be changed to the corresponding line from the second file.

If there is no symbol, the line remains the same.

Therefore, in the example above, you should delete Apple from the first line,
replace Watermelon with Melon in line four, and add Peach and Apple to lines two and three.

-u Option: The unified format is an option you can add to display output without any redundant context
lines. To do so, use the command:
diff -u file1 file2

Now, let's examine the output for the sample files in the unified format:

Lines displaying information about the first file begin with ---, while lines indicating the second file start
with +++.

The first two lines display the name and timestamp of both files:

--- example1.txt 2021-12-27 10:53:30.700640904 +0100

+++ example2.txt 2021-12-27 10:54:41.304939358 +0100

@@ -1,5 +1,6 @@ - shows the line range for both files.

The lines below display the content of the files and how to modify example1.txt to make it identical
to example2.txt. When the line starts with:

- (minus) – it needs to be deleted from the first file.


+ (plus) – it needs to be added to the first file.

If there is no symbol, the line remains the same.

In the example above, the output instructs that Apple and Watermelon should be removed,
whereas Peach, Apple, and Melon should be added.

-i Option

By default, diff is case sensitive. If you want it to ignore case, add the -i option to the command:

diff -i file1 file2

The output with no additional options shows that there are differences between the files and gives instructions on how to modify them.
However, if you add the -i option, there is no output as the command doesn’t detect any differences.
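A sketch with two hypothetical one-line files that differ only in letter case:

```shell
# Hypothetical sample files
printf 'Hello World\n' > case1.txt
printf 'hello world\n' > case2.txt

diff case1.txt case2.txt || true   # 1c1, then "< Hello World", "---", "> hello world"
diff -i case1.txt case2.txt        # no output and exit status 0: case is ignored
```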

--version Option

To check the version of diff running on your system, run the command:

diff --version

--help Option

To output a summary of diff usage run:

diff --help

Other diff Options


Other options that diff supports include:

-a / --text View files as text and compare them line-by-line.

-b / --ignore-space-change Ignore changes in the amount of white space when comparing files.

-B / --ignore-blank-lines Ignore changes whose lines are all blank.

--binary Compare and write data in binary mode.

-d / --minimal Modify the algorithm to try hard to find a smaller set of changes.

-e / --ed Make output a valid ed script.

-E / --ignore-tab-expansion Ignore changes due to tab expansion when comparing files.

-l / --paginate Run the output through pr to paginate it.

-N / --new-file Treat a missing file as present but empty.

-q / --brief Output whether files differ without specifying details.

-s / --report-identical-files Output when the files are identical.

-w / --ignore-all-space Ignore white space when comparing files.
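A sketch of -q, -b and -s on hypothetical files that differ only in the amount of white space:

```shell
# Hypothetical sample files: "a  b" (two spaces) versus "a b" (one space)
printf 'a  b\nc\n' > ws1.txt
printf 'a b\nc\n' > ws2.txt
cp ws1.txt ws3.txt

diff -q ws1.txt ws2.txt || true   # -q: only reports that the files differ
diff -b ws1.txt ws2.txt           # -b: the lines now compare equal, so there is no output
diff -s ws1.txt ws3.txt           # -s: reports that the files are identical
```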
