
SPL_Lecture3

The document outlines key concepts in systems programming, particularly focusing on pattern matching, wildcards, redirection, and command manipulation in Linux. It explains the use of metacharacters, shell variables, and various shell types, along with examples of command substitution and the use of pipes. Additionally, it covers the handling of standard input and output, including redirection and the 'tee' command for duplicating output.

Systems Programming Laboratory,

Spring 2025

Instructor: Koustav Rudra


Teaching Assistants: Dipankar Mandal, Subhankar Maity

Department of Artificial Intelligence (AI)


Indian Institute of Technology Kharagpur
January 20, 2025
Overview

➢ Pattern Matching—The Wild Cards (i.e., Metacharacters)


➢ Escaping and Quoting
➢ Redirection
➢ Collective Manipulation
➢ Pipes
➢ tee: Creating a Tee
➢ Command Substitution
➢ Shell Variables
Pattern Matching—The Wild Cards (i.e., Metacharacters)

❖ The metacharacters used to match filenames belong to a category called wild cards.

e.g.,

$ ls chap chap01 chap02 chap03 chap04 chapx chapy chapz

For instance, chap* represents all filenames beginning with chap. You can use this
pattern as an argument to a command rather than supply a long list of filenames which
the pattern represents. The shell will expand it suitably before the command is executed.
The * and ?
❖ The * matches any number of characters, including none. When it is appended to the string
chap, the pattern chap* matches filenames beginning with the string chap, including the file chap.
You can now use this pattern as an argument to ls:

$ ls chap*
chap chap01 chap02 chap03 chap04 chapx chapy chapz

When the shell encounters this command line, it immediately identifies the * as a metacharacter. It
then creates a list of files from the current directory that match this pattern. It reconstructs the
command line as follows, and then hands it over to the kernel for execution:

ls chap chap01 chap02 chap03 chap04 chapx chapy chapz
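The expansion described above can be observed in a scratch directory. This is a minimal sketch, not part of the original slides: the directory and the file other.txt are assumptions made for illustration, and the chap names are the ones listed earlier.

```shell
# Work in a scratch directory so the sketch does not depend on existing files.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch chap chap01 chap02 chap03 chap04 chapx chapy chapz other.txt

# The shell replaces chap* with the matching names before echo runs.
expanded=$(echo chap*)
echo "$expanded"

# set -- stores the expanded words as positional parameters, so $# shows
# how many separate arguments a command would actually receive.
set -- chap*
count=$#
echo "$count"

cd / && rm -rf "$workdir"
```

The command never sees the * itself; it receives eight ordinary filename arguments.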
Caution!
❖ Be careful when you use the * with rm to remove files. You could land yourself in a real
mess if, instead of typing rm *.o, which removes all the C object files, you inadvertently
introduce a space between * and .o:

The error message here masks a disaster that has just occurred; rm has removed all
files in this directory! Whenever you use a * with rm, you should pause and check the
command line before you finally press [Enter].
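One way to pause and check is to let echo print the command line that rm would receive. The sketch below is an illustration under assumed filenames (main.c, main.o, util.c, util.o) in a scratch directory; nothing is deleted because rm is only echoed.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch main.c main.o util.c util.o

# Intended command: only the object files are named.
intended=$(echo rm *.o)

# Mistyped command: the stray space turns * into a pattern of its own,
# so every file is named, followed by the literal argument .o
mistyped=$(echo rm * .o)

echo "$intended"
echo "$mistyped"

cd / && rm -rf "$workdir"
```

Seeing every filename in the preview is the warning sign that the pattern is wider than intended.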
The ?
❖ The ? matches a single character. When used with the same string chap (as chap?),
the shell matches all five-character filenames beginning with chap. Place another ? at
the end of this string, and you have the pattern chap??. Use both of these expressions
separately, and the meaning of the ? becomes obvious:
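A minimal sketch of the two patterns, assuming the same chap files as before in a scratch directory (the directory itself is an assumption for illustration):

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch chap chap01 chap02 chap03 chap04 chapx chapy chapz

# chap? needs exactly one character after chap.
one_char=$(echo chap?)
# chap?? needs exactly two characters after chap.
two_chars=$(echo chap??)

echo "$one_char"
echo "$two_chars"

cd / && rm -rf "$workdir"
```

Neither pattern matches the file chap, because each ? must consume one character.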
The Character Class
❖ The character class comprises a set of characters enclosed by the square brackets [ and
], but it matches a single character in the class. The pattern [abcd] is a character class, and it
matches a single character: an a, b, c, or d.

❖ Range specification is also possible inside the class with a - (hyphen); the two characters on
either side of it form the range of the characters to be matched. Here are two examples:
❖ The expression [a-zA-Z]* matches all filenames beginning with a letter, irrespective
of case.

❖ You can match a word character by including numerals and the underscore character
as well: [a-zA-Z0-9_]
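The character class and the range can be tried out as follows. The two patterns are illustrative choices, not necessarily the ones shown on the original slide, and the files are created in an assumed scratch directory.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch chap01 chap02 chap03 chap04 chapx chapy chapz

# A class lists the permitted characters for one position.
class_match=$(echo chap0[124])
# A range covers every character between its two endpoints.
range_match=$(echo chap[x-z])

echo "$class_match"
echo "$range_match"

cd / && rm -rf "$workdir"
```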
Shells
The shell is itself a program, and it comes in many varieties:

➢ bash : The most popular shell and the default on most Linux systems. It is installed on
virtually all Linux systems.

➢ zsh : A bash-like shell with some extra features.

E.g., support for decimal (floating-point) arithmetic, spelling correction, etc.

➢ tcsh : An enhanced C shell (csh) with a C-like syntax for scripting; it supports
arguments for aliases, etc.

Source: https://www.purdue.edu/hla/sites/varalalab/wp-content/uploads/sites/20/2018/02/Lecture_5.pdf
Negating the Character Class (!)
❖ The solution that we prescribe here unfortunately doesn’t work with the C shell,
but with the other shells, you can use the ! as the first character in the class to
negate the class.
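A short sketch of the negated class in a Bourne-type shell. The filenames and both patterns are assumptions chosen for illustration.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch prog.c prog.o prog.h notes.txt 1st.log

# Names with a single-character extension that is neither c nor o.
not_c_or_o=$(echo *.[!co])
# Names that do not begin with a letter.
non_alpha=$(echo [!a-zA-Z]*)

echo "$not_c_or_o"
echo "$non_alpha"

cd / && rm -rf "$workdir"
```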
Matching the Dot

❖ If you want to list all hidden filenames in your directory having at least three characters
after the dot, then the dot must be matched explicitly:

❖ However, if the filename contains a dot anywhere but at the beginning, it need not be
matched explicitly. For example, the expression *c also matches all C programs (files ending
in .c), because the * can match the embedded dot. Note that it matches any other filename
ending in c as well.
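Both cases can be checked in a scratch directory. This is a sketch under assumed filenames; the pattern .???* is one way to express "a dot followed by at least three characters".

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch .a .ab .abc .profile visible.c music

# The leading dot is matched explicitly; ??? demands three more characters.
hidden=$(echo .???*)
# The embedded dot in visible.c is matched by the * without special handling.
ends_in_c=$(echo *c)

echo "$hidden"
echo "$ends_in_c"

cd / && rm -rf "$workdir"
```

The hidden file .abc also ends in c, but *c skips it because the leading dot is not matched by the *.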
There are two things that the * and ? can’t match

❖ First, they don’t match a filename beginning with a dot, although they can match any
number of embedded dots. For instance, apache*gz matches apache_1.3.20.tar.gz.

❖ Second, these characters don’t match the / in a pathname. You can’t use cd /usr*local
to switch to /usr/local.
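The two restrictions can be verified with a small sketch. It assumes a scratch directory containing a relative usr/local tree and a hidden file, and it assumes a Bourne-type shell with default settings, where an unmatched pattern is passed through unchanged.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
mkdir -p usr/local
touch .hidden apache_1.3.20.tar.gz

# Embedded dots are matched by the * without any special treatment.
embedded=$(echo apache*gz)
# A bare * skips names beginning with a dot.
visible=$(echo *)
# The * cannot match the / between usr and local, so nothing matches
# and the pattern is left as typed.
across_slash=$(echo usr*local)
# Matching within one pathname component works as usual.
within_component=$(echo usr/loc*)

echo "$embedded"; echo "$visible"; echo "$across_slash"; echo "$within_component"

cd / && rm -rf "$workdir"
```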
The Shell’s Wild Cards
Wild Card           Matches

*                   Any number of characters including none
?                   A single character
[ijk]               A single character: either an i, j, or k
[x-z]               A single character that is within the ASCII range of the characters x and z
[!ijk]              A single character that is not an i, j, or k (Not in C shell)
[!x-z]              A single character that is not within the ASCII range of the characters x and z (Not in C shell)
{pat1,pat2...}      pat1, pat2, etc. (Not in Bourne Shell; see Going Further)
!(flname)           All except flname (Korn and Bash; see Going Further)
!(fname1|fname2)    All except fname1 and fname2 (Korn and Bash; see Going Further)
Escaping

❖ Escaping: placing a \ (backslash) before a wild card removes (escapes) its special
meaning. In general, when the \ precedes a metacharacter, its special meaning is turned off.

To remove the file My Document.doc, which has a space embedded, similar reasoning
applies, because the space is the shell's argument delimiter:

$ rm My\ Document.doc
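Escaping can be tried safely in a scratch directory. In this sketch the file literally named chap* and the companion file chap01 are assumptions for illustration; only My Document.doc comes from the text.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
# Create a file whose name contains a *, one ordinary file,
# and one file whose name contains a space.
touch chap\* chap01 My\ Document.doc

# The escaped * is taken literally, so chap01 is not matched.
rm chap\*
# The escaped space keeps the name together as a single argument.
rm My\ Document.doc

remaining=$(ls)
echo "$remaining"

cd / && rm -rf "$workdir"
```

Without the backslash, rm chap* would have removed chap01 as well.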
Ignoring the Newline Character

❖ Command lines that use several arguments often overflow to the next line. To
ensure better readability, split the wrapped line into two lines, but make sure that
you input a \ before you press [Enter]:

The \ here makes the shell ignore [Enter]. The shell then displays the secondary prompt (which
could be a > or a ?) to indicate that the command line is incomplete.
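A minimal sketch of line continuation. In a script no secondary prompt is shown, but the effect on the command line is the same; the words echoed are arbitrary.

```shell
# The backslash at the end of the first line removes the newline,
# so the shell treats both lines as one command line.
joined=$(echo one two \
three)
echo "$joined"
```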
Quoting
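❖ There's another way to turn off the meaning of a metacharacter. When a command
argument is enclosed in quotes, the meanings of the enclosed special characters are
turned off.

Single quotes: every enclosed character is taken literally.

Double quotes: most special characters are taken literally, but the $, the ` (backquote),
and the \ keep their special meaning.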
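The difference between the two kinds of quotes shows up as soon as a variable is involved. This sketch uses a hypothetical variable called name.

```shell
name=world

# Single quotes: neither $name nor * is interpreted.
single='$name and * stay literal'
# Double quotes: $name is evaluated, but * is still protected.
double="$name is expanded, * stays literal"

echo "$single"
echo "$double"
```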
Redirection
Redirection in Linux is a mechanism that allows you to control the input and output of
commands by redirecting standard streams to files, other commands, or devices. Linux provides
three standard streams:

➢ Standard input: The file (or stream) representing input, which is connected to the keyboard.

➢ Standard output: The file (or stream) representing output, which is connected to the display.

➢ Standard error: The file (or stream) representing error messages that emanate from the
command or shell. This is also connected to the display.
Standard Input
Standard input can come from three sources:

❖ The keyboard, the default source.

❖ A file using redirection with the < symbol (a metacharacter).

❖ Another program using a pipeline (to be taken up later).


The wc command
The following options (or flags) are supported:

-c : Count bytes.

-m : Count characters.

-C : Same as -m (available on some systems, such as Solaris; GNU wc does not provide it).

-l : Count lines.

-w : Count words delimited by white space characters or new line characters. Delimiting characters
are Extended Unix Code (EUC) characters from any code set defined by iswspace().

If no option is specified, the default is -lwc (count lines, words, and bytes).
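The three most common options can be checked against a small file whose contents are known. This is a sketch with an assumed two-line sample; tr -d ' ' only strips the padding that some wc implementations add.

```shell
sample=$(mktemp) || exit 1
# Two lines, three words, fourteen bytes (including both newlines).
printf 'one two\nthree\n' > "$sample"

lines=$(wc -l < "$sample" | tr -d ' ')
words=$(wc -w < "$sample" | tr -d ' ')
bytes=$(wc -c < "$sample" | tr -d ' ')

echo "$lines $words $bytes"
rm -f "$sample"
```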
The keyboard, the default source

When you use wc without an argument and have no special symbols like the < and | in the
command line, wc obtains its input from the default source. You have to provide this input from the
keyboard and mark the end of input with [Ctrl-d]:
[Figure: wc obtaining its input from the default source; the three output columns are the number of lines, words, and characters.]

A file using redirection with the < symbol
❖ When you provide a file name to wc, it opens the file, reads its contents, and counts the lines,
words, and characters in the file.

❖ The shell opens the file and assigns it as standard input to the wc command. This redirection
requires the < symbol:

The filename is missing once again, which means that wc didn’t open /etc/passwd. It read the
standard input file as a stream but only after the shell made a reassignment of this stream to a disk
file. The sequence works like this:

➢ On seeing the <, the shell opens the disk file, /etc/passwd, for reading.
➢ It unplugs the standard input file from its default source and assigns it to /etc/passwd.
➢ wc reads from standard input that has previously been reassigned by the shell to /etc/passwd.
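The sequence above can be observed by comparing the two ways of running wc. The sketch uses a hypothetical three-line file, users.txt, in place of /etc/passwd so that the result is predictable.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
printf 'a\nb\nc\n' > users.txt

# wc opens the file itself, so it knows the name and prints it.
as_argument=$(wc -l users.txt)
# The shell opens the file; wc only sees an anonymous stream on standard input.
as_stdin=$(wc -l < users.txt | tr -d ' ')

echo "$as_argument"
echo "$as_stdin"

cd / && rm -rf "$workdir"
```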
Taking Input Both from File and Standard Input

When a command takes input from multiple sources, say, a file and standard input, the -
symbol must be used to indicate the sequence of taking input.
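A sketch of the - symbol with cat. The file foo and the printf that stands in for keyboard input are assumptions for illustration.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
printf 'from file\n' > foo

# Standard input first, then the file.
stdin_first=$(printf 'from stdin\n' | cat - foo)
# The file first, then standard input.
file_first=$(printf 'from stdin\n' | cat foo -)

echo "$stdin_first"
echo "$file_first"

cd / && rm -rf "$workdir"
```

The position of the - among the arguments decides where standard input is read in the sequence.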
Standard Output
All commands displaying output on the terminal actually write to the standard output file as a
stream of characters, and not directly to the terminal as such. There are three possible
destinations for this stream:

❖ The terminal, the default destination.

❖ A file, using the redirection symbols > and >>

❖ As input to another program using a pipeline


Examples of > (overwrite)
❖ You can replace the default destination (the terminal) with any file by using the > (right
chevron) operator, followed by the filename.

❖ If the output file doesn't exist, the shell creates it before executing the command. If it
exists, the shell overwrites it, so use this operator with caution.
Examples of >> (append)
❖ The shell also provides the >> symbol (the right chevron used twice) to append to a file.

❖ Redirection can also be used with multiple files. The following example saves all C
programs:
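The following sketch exercises both operators in a scratch directory. The names newfile, a.c, and b.c are assumptions; c_progs_all.txt is the output filename used later in these slides.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

echo first  > newfile    # the shell creates newfile
echo second > newfile    # the shell overwrites the previous contents
echo third >> newfile    # the shell appends to the existing contents
contents=$(cat newfile)

# Saving several files at once: the shell expands *.c, and cat
# writes their concatenated contents to a single output file.
printf 'int a;\n' > a.c
printf 'int b;\n' > b.c
cat *.c > c_progs_all.txt
all_sources=$(cat c_progs_all.txt)

echo "$contents"
echo "$all_sources"

cd / && rm -rf "$workdir"
```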
The File Descriptor

❖ Each of the three standard files is represented by a number, called a file descriptor.

0: Standard input

1: Standard output

2: Standard error

Note: > and 1> mean the same thing to the shell, while < and 0< also are identical. We need
to explicitly use one of these descriptors when handling the standard error stream.
Standard Error
❖ When a command runs unsuccessfully, diagnostic messages often show up on the screen.
This is the standard error stream whose default destination is the terminal.
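Because diagnostics travel on descriptor 2, a plain > does not capture them. The sketch below assumes a nonexistent file, no_such_file, and hypothetical output names out.txt and errorfile.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# > captures only standard output; 2> is needed for the diagnostic.
# cat fails here on purpose, so its exit status is ignored.
cat no_such_file > out.txt 2> errorfile || true

out_size=$(wc -c < out.txt | tr -d ' ')
err_msg=$(cat errorfile)

echo "$out_size"
echo "$err_msg"

cd / && rm -rf "$workdir"
```

The exact wording of the message depends on the system, but it lands in errorfile while out.txt stays empty.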
Filters: Using Both Standard Input and Standard Output

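A filter is a command that reads its data from standard input and writes its results to
standard output, so both streams can be redirected on the same command line.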
Example

These four commands are functionally equivalent in terms of their execution. They all use redirection to pass calc.txt as input to
the wc command and save the output to result.txt. Let's break them down:

1. < calc.txt: Redirects the contents of calc.txt as input to the wc command.
   > result.txt: Redirects the output of wc to result.txt.

2. > result.txt: Specifies that the output of wc should go to result.txt.
   < calc.txt: Specifies that the input for wc should come from calc.txt.

3. >result.txt: Redirects the output of wc to result.txt.
   <calc.txt: Redirects the contents of calc.txt as input to wc.

4. > result.txt: Redirects the output of wc to result.txt.
   < calc.txt: Redirects the contents of calc.txt as input to wc.
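The four command lines themselves did not survive in this copy of the slides. The sketch below is a plausible reconstruction from the descriptions above, not the original text: it varies the order and spacing of the two redirections and, as an assumption for the fourth form, places both redirections before the command name. Separate output files (result1.txt to result4.txt) are used here only so that the results can be compared.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
printf '2 + 3\n7 * 6\n' > calc.txt

wc < calc.txt > result1.txt      # input first, then output
wc > result2.txt < calc.txt      # output first, then input
wc>result3.txt<calc.txt          # no spaces around the operators
> result4.txt < calc.txt wc      # redirections before the command name

# Compare every result with the first one.
same=yes
for n in 2 3 4; do
    cmp -s result1.txt "result$n.txt" || same=no
done
echo "$same"

cd / && rm -rf "$workdir"
```

The shell processes the redirections before running wc, so their position on the command line does not change the result.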
Collective Manipulation

So far, we have used the > to handle a single stream of a single command. But the shell also
supports collective stream handling. This can happen in these two ways:

❖ Handle two standard streams as a single one using the 2>&1 and 1>&2 symbols.

❖ Form a command group by enclosing multiple commands with the ( and ) symbols or {
and } symbols. You can then use a single instruction to control all commands in the
group.
Replicating Descriptors

➢ 1>&2 Send the standard output to the destination of the standard error.
➢ 2>&1 Send the standard error to the destination of the standard output.

The 2> symbol reassigns standard error to error.txt, and 1>&2 sends the standard output of
echo to the standard error. Note the sequence: first we redirect, and then we specify the
replication of the descriptor.
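The importance of the sequence can be demonstrated with a sketch. The messages and the files error2.txt and outer.txt are assumptions; the outer { } group only exists to capture what would otherwise go to the terminal.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Redirect first, replicate second: standard output follows
# standard error into error.txt.
echo "saved via stderr" 2>error.txt 1>&2
in_file=$(cat error.txt)

# Reversed order: 1>&2 copies the destination standard error has at that
# moment (captured here in outer.txt), and only afterwards is standard
# error itself pointed at error2.txt.
{ echo "not in the file" 1>&2 2>error2.txt; } 2>outer.txt
reversed_file=$(cat error2.txt)
reversed_outer=$(cat outer.txt)

echo "$in_file"
echo "$reversed_outer"

cd / && rm -rf "$workdir"
```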
Command Grouping
Sometimes, we need to manipulate a group of commands collectively: redirect them, run
them in the background, and so on. The () and {} handle a command group.

$ ( ls -x *.c ; echo ; cat *.c ) > c_progs_all.txt

This saves all C program sources in a file preceded by a multicolumn list of programs acting
as a table of contents. The echo command serves to insert a blank line between them. The {}
can also be used for this purpose; note the spaces inside the braces and the ; before the
closing brace:

$ { ls -x *.c ; echo ; cat *.c ; } > c_progs_all.txt
() vs {}
()
❖ Commands inside () are executed in a subshell.
❖ A subshell is a separate instance of the shell that inherits the environment of the parent
shell but does not affect it. Changes to variables, directories, or environment settings inside
the parentheses are isolated and do not persist in the parent shell.
❖ A subshell is created for the command group ( ls -x *.c ; echo ; cat *.c ).
❖ The output of all commands inside the subshell is redirected to c_progs_all.txt.
❖ Any changes made inside the parentheses (e.g., variable assignments or cd) do not affect
the parent shell.

{}
❖ Commands inside {} are executed in the current shell.
❖ No subshell is created, so any changes made to variables, directories, or environment
settings inside {} persist in the current shell.
❖ The output of all commands inside the curly braces is redirected to c_progs_all.txt.
❖ Any changes made inside the curly braces (e.g., variable assignments or cd) will persist
after the group finishes executing.
❖ Curly braces ({}) are used extensively when programming with the shell.
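The difference is easy to observe with cd and a variable assignment. This sketch assumes a scratch directory with a subdirectory named sub and a hypothetical variable var.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
mkdir sub
start=$(pwd)
var=outer

# Parentheses: cd and the assignment happen in a subshell and are lost.
( cd sub; var=changed )
paren_dir=$(pwd)
paren_var=$var

# Braces: the same commands run in the current shell and persist.
{ cd sub; var=changed; }
brace_dir=$(pwd)
brace_var=$var

echo "$paren_dir $paren_var"
echo "$brace_dir $brace_var"

cd / && rm -rf "$workdir"
```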
Pipes
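To understand pipes, we'll set ourselves the task of counting the number of users currently
logged in. One way is to save the output of who in a file and then run wc on that file, but this
method has two drawbacks: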

➢ For long-running commands, this process can be slow. The second command can’t act unless the
first has completed its job.

➢ You require an intermediate file that has to be removed after completion of the job. When you are
handling large files, temporary files can build up easily and eat up disk space in no time.
Pipes
The shell can connect these streams using a special operator—the | (pipe)—and avoid the
creation of the disk file.

The output of who has been passed directly to the input of wc, and who is said to be piped to wc.
When a sequence of commands is combined together in this way, a pipeline is formed.
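The sketch below runs the task both ways so the two approaches can be compared. The temporary file is an assumption for illustration, and the count itself depends on who is logged in at the time.

```shell
# Two-step method: an intermediate file that must be removed afterwards.
tmpfile=$(mktemp) || exit 1
who > "$tmpfile"
via_file=$(wc -l < "$tmpfile" | tr -d ' ')
rm -f "$tmpfile"

# Pipeline: the output of who goes straight to the input of wc.
via_pipe=$(who | wc -l | tr -d ' ')

echo "$via_file"
echo "$via_pipe"
```

Both methods produce the same count, but the pipeline needs no disk file and lets wc start reading while who is still writing.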

# Combining Commands using |

$ cat largefile.txt | sort | uniq > sorted_unique.txt


tee: Creating a Tee
tee is an external command and not a feature of the shell. It duplicates its input, saves one
copy in a file, and sends the other to the standard output.

$ ls | tee files.txt | grep ".txt"


➔ ls lists directory contents.
➔ tee writes the output of ls to files.txt.
➔ The output is then passed to grep, which filters and displays lines containing .txt.
➔ The terminal will display only the lines from the output that contain .txt.

$ echo "Data for multiple files" | tee file1.txt file2.txt


➔ Writes "Data for multiple files" to both file1.txt and file2.txt.
➔ Simultaneously sends the output to the terminal (stdout).
Command Substitution
Command substitution is a powerful shell capability that allows the output of one command to be
used as an argument or input for another command. Unlike pipes (|), which connect data streams,
command substitution takes the output of a command and embeds it directly into another command
line as if it had been typed there. When scanning the command line, the ` (backquote or backtick)
is another metacharacter that the shell looks for.
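A sketch of both notations for command substitution. The three files are assumptions made so the count is predictable; the $( ) form is the POSIX alternative to backquotes and is not mentioned in the original slides.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch a.txt b.txt c.txt

# Backquotes: the enclosed pipeline runs first and its output replaces it.
count=`ls | wc -l | tr -d ' '`
# The $( ) form does the same and is easier to nest and to read.
message="There are $(ls | wc -l | tr -d ' ') files"

echo "$count"
echo "$message"

cd / && rm -rf "$workdir"
```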
Environment vs. Shell Variables
➢ Environment variables are ‘global’, i.e., inherited by all shells started AFTER the
variable is defined.

e.g.,
HOME=/home/kvarala
SHELL=/bin/bash

➢ Shell variables are only present in the shell in which they were defined.

➢ A variable assignment is of the form variable=value (no spaces around =), but its
evaluation requires the $ as prefix to the variable name:

e.g.,
$ count=5 # No $ required for assignment

$ echo $count # but needed for evaluation

$ total=$count # Assigning a value to another variable

$ echo $total

5
Environment vs. Shell Variables
$ export FOO=BAR (FOO defined in the environment or global variable)

$ FOO2=BAR2 (FOO2 defined in shell)

$ bash (Start new shell)

$ echo $FOO

BAR (echoes value of FOO)

$ echo $FOO2

(empty) [ The new shell cannot access FOO2 because it was not exported to the
environment in the parent shell.]

Source: https://www.purdue.edu/hla/sites/varalalab/wp-content/uploads/sites/20/2018/02/Lecture_5.pdf
Variable Concatenation is Simple

$ ext=.avi # This line assigns the string .avi to the variable ext

$ moviename=holmes # This line assigns the string holmes to the variable moviename.

$ filename=$moviename$ext # This line demonstrates string concatenation.

$ echo $filename
holmes.avi
All About Variables
➢ Variable names begin with a letter or the _ (underscore) and can contain numerals as the other characters.

➢ Names are case-sensitive; x and X are two different variables.

➢ Unlike in programming languages, shell variables are not typed; you don’t need to use a char,
int, or long prefix when you define them.

➢ All shell variables are of the string type, which means that even a number like 123 is stored as a
string rather than in binary. (This may not remain true in the future.)

➢ All shell variables are initialized to null strings by default. A null string can also be
assigned explicitly with x="" or x='' or simply x=.
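The string nature of variables and the three null assignments can be checked with a short sketch. The variable names are hypothetical, and the arithmetic expansion $(( )) is a standard shell feature that is not covered in these slides.

```shell
n=123

# Concatenation treats the value as text.
appended=${n}4
# Arithmetic expansion interprets the same text as a number.
summed=$((n + 4))

# Three equivalent ways of assigning a null string.
x=""
y=''
z=
if [ -z "$x$y$z" ]; then
    all_null=yes
else
    all_null=no
fi

echo "$appended $summed $all_null"
```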
Thank you!
