
Grep - Searching For A Pattern, Grep Options, Regular Expressions, Egrep and Fgrep

The document discusses UNIX processes and process APIs. It covers the fork, vfork, _exit, wait, waitpid, exec system calls. It describes how processes are created in UNIX using fork(), the attributes and segments of a process, and how the child process inherits attributes from the parent. It explains how a process can execute a new program using exec() and terminate using _exit(). The advantages of using fork() and exec() together are also summarized.

Uploaded by

Ahmad Raza
Copyright
© © All Rights Reserved

MODULE-4

UNIX Processes: UNIX kernel support for processes, Process APIs -
fork, vfork, _exit, wait, waitpid, exec, pipe - Process status,
running jobs in background, nice, signals, kill, at and batch,
cron
Simple filters and Regular Expressions: more, wc, od, pr, cmp,
diff, comm, head, tail, cut, paste, sort, tr, uniq, nl
grep – searching for a pattern, grep options, regular expressions,
egrep and fgrep

Unix Processes

A process is a program in execution. For example, a shell is a
process that is created when a user logs in to the system. When
the user enters another command, say ls, the shell creates another
process to execute that command.

When a process creates a child process, it becomes the parent
process, and the child process inherits most of the attributes of
the parent process.

Advantages of Process creation

• Users can create multitasking applications.
• Since the child process executes in its own virtual address
space, its success or failure in execution will not affect the
parent. A parent process can also query the exit status and
run-time statistics of the child process after it has
terminated.
• Users can write programs in which the child process is made
to execute a new program, for example to extend the
functionality of the parent, while the parent process continues
to execute in parallel.

PREPARED BY UMA.N ,ASST,PROF,NHCE ACY 2019-20 1

UNIX kernel support for processes

• The data structures and execution of processes depend on the
operating system implementation.
• A UNIX process consists minimally of a text segment, a data
segment and a stack segment. A segment is an area of memory
that is managed by the system as a unit.
• The text segment consists of the program text in
machine-executable instruction code format.
• The data segment contains static and global variables and
their corresponding data.
• The stack segment contains run-time variables and the return
addresses of all active functions of a process.
• The UNIX kernel has a process table that keeps track of all
active processes present in the system. Some of these processes
belong to the kernel and are called "system processes".
• Every entry in the process table contains pointers to the
text, data and stack segments and also to the U-area of a
process.
• The U-area of a process is an extension of the process table
entry and contains other process-specific data such as the
file descriptor table, current root and working directory
inode numbers, and a set of system-imposed process limits.

• All processes in a UNIX system, except the very first
process (process 0), which is created by the system boot
code, are created by the fork() system call.
• After the fork() system call, both the parent and the child
process resume execution at the return of the fork() system
call.
• A process is assigned the following attributes, which are
either inherited from its parent or set by the kernel:
• A real user identification number (rUID): the user ID of the
user who created the parent process.
• A real group identification number (rGID): the group ID of the
user who created the parent process.
• An effective user identification number (eUID): this allows
the process to access and create files with the same
privileges as the program file owner.
• An effective group identification number (eGID): this allows
the process to access and create files with the same
privileges as the group to which the program file belongs.
• Saved set-UID and saved set-GID: these are the assigned eUID
and eGID of the process respectively.
• Process group identification number (PGID) and session
identification number (SID): these identify the process group
and session of which the process is a member.
• Supplementary group identification numbers: this is a set of
additional group IDs for the user who created the process.
• Current directory: this is the reference (inode number) to a
working directory file.
• Root directory: this is the reference to a root directory.
• Signal handling: the signal handling settings.
• Signal mask: a signal mask that specifies which signals are
to be blocked.
• Umask: a file mode mask that is used in the creation of files
to specify which access rights should be taken out.
• Nice value: the process scheduling priority value.
• Controlling terminal: the controlling terminal of the
process.
In addition to the above attributes, the following attributes
are different between the parent and child processes:

• Process identification number (PID): an integer
identification number that is unique per process in the entire
operating system.
• Parent process identification number (PPID): the PID of the
parent process.
• Pending signals: the set of signals that are pending delivery
to the parent process. The child process starts with no
pending signals.
• Alarm clock time: the process alarm clock time is reset to
zero in the child process.
• File locks: the set of file locks owned by the parent process
is not inherited by the child process.

fork and exec are commonly used together to spawn a subprocess
that executes a different program. The advantages of this method
are:
• A process can create multiple processes to execute multiple
programs concurrently.
• Because each child process executes in its own virtual
address space, the parent process is not affected by the
execution status of its child process.

• After fork(), a parent process may choose to suspend its
execution until its child process terminates by calling the
wait or waitpid system call, or it may continue independently
of its child process. In the latter case, the parent process
may use the signal or sigaction function to detect or ignore
the termination of the child process.
• A process terminates its execution by calling the _exit
system call. The argument to _exit is the exit status code of
the process. By convention, an exit status code of 0 indicates
successful execution, whereas any non-zero exit code indicates
failure.
• A process can execute a different program by calling the
exec system call. If the call succeeds, the kernel replaces the
existing text, data and stack segments of the process with a
new set that represents the program to be executed. However,
the process remains the same: it keeps its process ID and its
parent process ID, and its file descriptors remain open, except
for those whose close-on-exec flags are set (open directory
streams, by contrast, are closed).
• The exit status code returned by a process may be polled by
the process's parent via the wait or waitpid function.

PROCESS APIS
 fork() and vfork()
NAME
fork - create a child process

SYNOPSIS
#include <unistd.h>
pid_t fork(void);

The fork() function takes no arguments.

Returns: 0 in the child, the process ID of the child in the
parent, and -1 on error, in which case errno is set to an error
code.
Some common causes of fork failure and the corresponding errno
values:

Errno value    Meaning
ENOMEM         Insufficient memory to create a process.
EAGAIN         The number of processes currently executing in
               the system exceeds a system-imposed limit, so
               the call should be tried again later. The limits
               are:
               CHILD_MAX - maximum number of processes that can
               be created by a single user.
               MAXPID - maximum number of processes that can
               execute concurrently.

DESCRIPTION
• The new process created by fork is called the child process.
• This function is called once but returns twice.
• The only difference in the returns is that the return value
in the child is 0, whereas the return value in the parent is
the process ID of the new child.
• The reason the child's process ID is returned to the parent
is that a process can have more than one child, and there is
no function that allows a process to obtain the process IDs
of its children.
• The reason fork returns 0 to the child is that a process can
have only a single parent, and the child can always call
getppid() to obtain the process ID of its parent. (Process ID
0 is reserved for use by the kernel, so it's not possible for
0 to be the process ID of a child.)
• Both the child and the parent continue executing with the
instruction that follows the call to fork.
• The child is a copy of the parent. For example, the child
gets a copy of the parent's data space, heap, and stack.
• Note that this is a copy for the child; the parent and the
child do not share these portions of memory.
• The parent and the child share the text segment.

Example programs:
Program 1
/* Program to demonstrate fork function. Program name - fork1.c */
#include<stdio.h>
#include<sys/types.h>
#include<unistd.h>
int main( )
{
    fork( ); /* from here on, two processes run this code */
    printf("\n hello USP");
    return 0;
}
Output :
$ cc fork1.c
$ ./a.out
hello USP
hello USP
Note : The statement hello USP is printed twice because both the
child and the parent execute that instruction.
Program 2
/* Program name - fork2.c */
#include<stdio.h>
#include<sys/types.h>
#include<unistd.h>
int main( )
{
    printf("HELLO\n");
    fflush(stdout); /* so the child does not inherit unwritten output */
    fork( );
    printf("hello USP\n");
    return 0;
}
Output :
$ cc fork2.c
$ ./a.out
HELLO
hello USP
hello USP
Note: The statement HELLO is printed only once, by the parent,
because it is executed before fork. The output buffer is flushed
before the call so that the child does not inherit a copy of the
pending text. The statement hello USP is printed twice, once by
the child and once by the parent.
Program 3
/* Program name - fork3.c */
#include<stdio.h>
#include<sys/types.h>
#include<unistd.h>
int main( )
{
    pid_t pid;
    pid=fork( );
    if(pid>0)
        printf("HELLO\n"); /* parent: fork returned the child's PID */
    else if(pid==0)
        printf("hello\n"); /* child: fork returned 0 */
    return 0;
}
Output :
$ cc fork3.c
$ ./a.out
HELLO
hello
Note: The statement HELLO is executed by the parent and the
statement hello is executed by the child. The order of the two
lines may vary from run to run.
Program 4
/* Program name - fork4.c */
#include<sys/types.h>
#include<stdio.h>
#include<unistd.h>
int main( )
{
    pid_t pid;
    pid=fork( );
    if(pid<0)
        printf("ERROR\n");
    else if(pid==0)
        printf("The process id of the child is %d\n",(int)getpid());
    else /* parent: print its own PID, not that of its parent */
        printf("The process id of the parent is %d\n",(int)getpid());
    return 0;
}

Output :
$ cc fork4.c
$ ./a.out
The process id of the parent is 2393
The process id of the child is 3079

Program 5 - fork5.c
#include<sys/types.h>
#include<stdio.h>
#include<unistd.h>
int main( )
{
    pid_t pid;
    int a=5,b=6;
    pid=fork( );
    if(pid<0)
        printf("ERROR\n");
    else if(pid==0)
    {
        /* the child changes only its own copy of a and b */
        a++;
        b--;
        printf("The value of a and b are %d \t %d\n",a,b);
        fflush(stdout); /* _exit does not flush stdio buffers */
        _exit(0);
    }
    else
        printf("The value of a and b are %d \t %d\n",a,b);
    return 0;
}
OUTPUT
cc fork5.c
./a.out
The value of a and b are 5 6
The value of a and b are 6 5
vfork()

NAME
vfork - create a child process and block parent

SYNOPSIS

#include <sys/types.h>
#include <unistd.h>
pid_t vfork(void);

• The function vfork has the same calling sequence and the same
return values as fork.
• The vfork function is intended to create a new process when
the purpose of the new process is to exec a new program.
• The vfork function creates the new process, just like fork,
but without copying the address space of the parent into the
child, as the child won't reference that address space; the
child simply calls exec (or exit) right after the vfork.
• Instead, while the child is running and until it calls either
exec or exit, the child runs in the address space of the
parent. This optimization provides an efficiency gain on some
paged virtual-memory implementations of the UNIX System.
• Another difference between the two functions is that vfork
guarantees that the child runs first, until the child calls
exec or exit. When the child calls either of these
functions, the parent resumes.
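Program 1: vfork.c
#include<sys/types.h>
#include<stdio.h>
#include<unistd.h>
int main( )
{
    pid_t pid;
    int a=5,b=6;
    pid=vfork( );
    if(pid<0)
        printf("ERROR\n");
    else if(pid==0)
    {
        /* The child runs in the parent's address space, so these
           changes are visible to the parent. Modifying data in a
           vfork child is shown for demonstration only; portable
           code should call only exec or _exit here. */
        a++;
        b--;
        printf("The value of a and b are %d \t %d\n",a,b);
        _exit(0);
    }
    else
        printf("The value of a and b are %d \t %d\n",a,b);
    return 0;
}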
OUTPUT
cc vfork.c
./a.out
The value of a and b are 6 5
The value of a and b are 6 5

What happens after vfork()

• The kernel suspends the execution of the parent process.
• The child process executes in the parent's address space.
• When the child process calls exec or _exit, the parent resumes
its execution.
• NOTE: vfork() is unsafe to use, because the child process can
modify the data of the parent. This can cause unexpected
behaviour in the parent, for example the parent's stream files
being closed or the parent's run-time stack being modified.

Modern UNIX systems have improved the efficiency of fork() by
allowing the parent and the child to share a common virtual
address space until the child calls the exec or _exit function.
If either the parent or the child modifies any data in the shared
address space, the kernel creates new memory pages that cover the
virtual address space modified. This mechanism is known as
copy-on-write.

_exit()
A process can terminate normally in five ways:
• Executing a return from the main function.
• Calling the exit function.
• Calling the _exit or _Exit function. In most UNIX system
implementations, exit(3) is a function in the standard C
library, whereas _exit(2) is a system call.
• Executing a return from the start routine of the last thread
in the process. When the last thread returns from its start
routine, the process exits with a termination status of 0.
• Calling the pthread_exit function from the last thread in
the process.
The three forms of abnormal termination are as follows:
• Calling abort. This is a special case of the next item, as it
generates the SIGABRT signal.
• Receiving certain signals. Examples of signals generated by
the kernel include the process referencing a memory location
not within its address space or trying to divide by 0.
• The last thread responding to a cancellation request. By
default, cancellation occurs in a deferred manner: one thread
requests that another be canceled, and sometime later, the
target thread terminates.

• The _exit system call terminates a process.
• This API causes the data, stack and U-area of the calling
process to be deallocated and all of its open file descriptors
to be closed.
• The process table entry of the terminated process is kept
intact so that the parent can retrieve its exit status. A
process that has terminated but whose exit status has not yet
been collected by its parent is called a zombie process.
• A child process whose parent terminates before it does is
called an orphan process. Orphan processes are adopted by init,
the ultimate parent process, which collects their exit status.
• The data in the process table is retrieved by the parent
with the wait or waitpid system call.
• SYNOPSIS
#include <unistd.h>
void _exit(int status);

• The argument is the exit status code.
• Only the lower 8 bits are passed to the parent process.
• NOTE: The _exit system call never fails and never returns.
wait,waitpid
 wait for process to change state-used by the parent to wait
for the child process to terminate or to retrieve the child
process exit status.

 SYNOPSIS
#include <sys/types.h>
#include <sys/wait.h>
pid_t wait(int *wstatus);
pid_t waitpid(pid_t pid, int *wstatus, int options);
 When a process terminates, either normally or abnormally, the
kernel notifies the parent by sending the SIGCHLD signal to
the parent. Because the termination of a child is an
asynchronous event - it can happen at any time while the
parent is running - this signal is the asynchronous
notification from the kernel to the parent.
 The parent can choose to ignore this signal, or it can
provide a function that is called when the signal occurs: a
signal handler.
 A process that calls wait or waitpid can:
◦ Block, if all of its children are still running
◦ Return immediately with the termination status of a child,
if a child has terminated and is waiting for its
termination status to be fetched
◦ Return immediately with an error, if it doesn't have any
child processes.

• The wait() system call suspends execution of the calling
process until one of its children terminates.
• Both wait and waitpid collect the exit status code and the
PID of the child upon its termination.
• The differences between these two functions are as follows:
• The wait function can block the caller until a child process
terminates, whereas waitpid has an option that prevents it
from blocking.
• The waitpid function doesn't wait for the child that
terminates first; it has a number of options that control
which process it waits for.
• If a child has already terminated and is a zombie,
wait returns immediately with that child's status.
Otherwise, it blocks the caller until a child
terminates.
• If the caller blocks and has multiple children,
wait returns when one terminates.
• The call wait(&wstatus) is equivalent to:
waitpid(-1, &wstatus, 0);
• The value of pid can be:

< -1   meaning wait for any child process whose process
       group ID is equal to the absolute value of pid.
-1     meaning wait for any child process.
0      meaning wait for any child process whose process
       group ID is equal to that of the calling process.
> 0    meaning wait for the child whose process ID
       is equal to the value of pid.

The waitpid function can operate in blocking or non-blocking mode.
The mode is selected through the options argument; for example,
the WNOHANG option makes waitpid return immediately instead of
blocking.

The status returned by wait or waitpid can be examined with the
following macros:

Macro                  Description
WIFEXITED(status)      Returns a non-zero value if the child
                       terminated normally (by calling exit,
                       _exit or _Exit, or by returning from
                       main), and 0 otherwise.
WEXITSTATUS(status)    Returns the low-order 8 bits of the
                       argument that the child passed to exit,
                       _exit or _Exit. This should be called
                       only if WIFEXITED returns a non-zero
                       value.
WIFSIGNALED(status)    Returns a non-zero value if the child
                       terminated abnormally because of a
                       signal.
WTERMSIG(status)       Returns the number of the signal that
                       caused the termination. This should be
                       called only if WIFSIGNALED returns a
                       non-zero value.
WIFSTOPPED(status)     Returns a non-zero value if the child is
                       stopped due to job control.
WSTOPSIG(status)       Returns the number of the signal that
                       stopped the process. This should be
                       called only if WIFSTOPPED returns a
                       non-zero value.

If the return value of wait or waitpid is a positive integer, it
is the PID of the child. Otherwise the return value is -1, which
means that no child satisfied the wait criteria or that the
function was interrupted by a caught signal; in either case errno
is set. (waitpid called with WNOHANG may also return 0 when no
child has changed state yet.) errno can take the following values:

Errno value    Meaning
EINTR          wait or waitpid returned because the system call
               was interrupted by a signal.
ECHILD         For wait, the calling process has no unwaited-for
               children. For waitpid, either the pid value is
               illegal or the process is in a state that cannot
               be specified by the options value.
EFAULT         The status argument points to an illegal address.
EINVAL         The options value is illegal.

#include<stdio.h>
#include<sys/types.h>
#include<sys/wait.h>
#include<unistd.h>

int main()
{
    pid_t pid;
    int status;
    pid=fork();
    if(pid<0)
    {
        printf("ERROR\n");
        return 1; /* no child to wait for */
    }
    if(pid==0)
    {
        printf("child process created \n");
        fflush(stdout); /* _exit does not flush stdio buffers */
        _exit(15);
    }
    printf("Parent process after fork with child process id as %d\n",(int)pid);
    pid=waitpid(pid,&status,WUNTRACED);
    if(pid<0)
        printf("waitpid ERROR\n");
    else if(WIFEXITED(status))
        printf("child process %d exits:%d\n",(int)pid,WEXITSTATUS(status));
    else if(WIFSTOPPED(status))
        printf("child process %d stopped by %d\n",(int)pid,WSTOPSIG(status));
    else if(WIFSIGNALED(status))
        printf("child process %d killed by %d\n",(int)pid,WTERMSIG(status));
    return 0;
}
OUTPUT
Parent process after fork with child process id as 2709
child process created
child process 2709 exits:15
In the above program,the child exited with status as 15.

exec
When a process calls one of the exec functions, that process is
completely replaced by the new program, and the new program starts
executing at its main function. The process ID does not change
across an exec, because a new process is not created; exec merely
replaces the current process - its text, data, heap, and stack
segments - with a brand new program from disk. There are six exec
functions:
#include <unistd.h>
int execl(const char *pathname, const char *arg0, ...
          /* (char *)0 */ );
int execv(const char *pathname, char *const argv[]);
int execle(const char *pathname, const char *arg0, ...
           /* (char *)0, char *const envp[] */ );
int execve(const char *pathname, char *const argv[],
           char *const envp[]);
int execlp(const char *filename, const char *arg0, ...
           /* (char *)0 */ );
int execvp(const char *filename, char *const argv[]);

All six return: -1 on error, no return on success.

The first difference in these functions is that the first four
take a pathname argument, whereas the last two take a filename
argument.

When a filename argument is specified:

• If filename contains a slash, it is taken as a pathname.

• Otherwise, the executable file is searched for in the
directories specified by the PATH environment variable.

The next difference concerns the passing of the argument list (l
stands for list and v stands for vector). The functions execl,
execlp, and execle require each of the command-line arguments to
the new program to be specified as separate arguments. For the
other three functions (execv, execvp, and execve), we have to
build an array of pointers to the arguments, and the address of
this array is the argument to these three functions.

The final difference is the passing of the environment list to
the new program. The two functions whose names end in an e
(execle and execve) allow us to pass a pointer to an array of
pointers to the environment strings. The other four functions,
however, use the environ variable in the calling process to copy
the existing environment for the new program.

If the exec call succeeds, the original text, data and stack
segments of the process are replaced by new segments for the
exec'ed program. However, the file descriptors remain unchanged;
only those whose close-on-exec flag is set are closed before the
new program starts. The following attributes are changed when the
new program runs:

• Effective UID: changed if the set-UID flag is set.

• Effective GID: changed if the set-GID flag is set.

• Saved set-UID: changed if the set-UID flag is set.

• Signal handling: signals that are set up to be caught in a
process are reset to their default actions when the process
execs a new program.

Program

#include<stdio.h>
#include<stdlib.h>
#include<unistd.h>
#include<errno.h>
#include<sys/types.h>
#include<sys/wait.h>

/* run one command string through the shell and wait for it */
void sys(char *cmdstr)
{
    pid_t pid;
    pid=fork();
    if(pid<0)
        perror("fork");
    else if(pid==0)
    {
        execl("/bin/bash","bash","-c",cmdstr,(char *)NULL);
        perror("execl"); /* reached only if execl fails */
        _exit(127);
    }
    else
        waitpid(pid,NULL,0);
}

int main(int argc,char *argv[])
{
    int i;
    for(i=1;i<argc;i++)
        sys(argv[i]);
    printf("\n");
    exit(0);
}

PIPE

• The pipe system call creates a communication channel between
two related processes (between parent and child, or between
siblings).

• It creates a pipe device file that serves as a temporary
buffer for a calling process to read and write data with
another process.

• The pipe device file has no name and is deallocated once all
the processes close their file descriptors referring to the pipe.

SYNOPSIS

#include <unistd.h>
int pipe(int fifo[2]);

The fifo argument is an array of two integers that are assigned by
the pipe API: fifo[0] is the file descriptor that a process can
use to read from the pipe, whereas fifo[1] is used to write to it.

Data stored in a pipe is accessed sequentially, in FIFO order.

A process cannot use lseek to perform random access in a pipe.

The common methods of using a pipe for IPC are:

Parent and child processes: the parent calls pipe to create a
pipe, forks a child, and then the two use fifo[0] and fifo[1] to
communicate.

Sibling child processes: the parent calls pipe to create a pipe,
forks two or more child processes, and the siblings use fifo[0]
and fifo[1] for communication.

The pipe buffer has a finite size. A write of at most PIPE_BUF
bytes is guaranteed to be atomic, that is, its data is not
interleaved with data written by other processes.

The kernel uses a blocking mechanism to synchronize the reading
and writing processes: if the buffer is empty, the reader is
blocked, and if the buffer is full, the writer is blocked.

There is no limit on the number of processes that can be attached
to a pipe.

The UNIX shell uses pipes to implement the | operator, which
connects the standard output of one process to the standard input
of the next process.

Return value: 0 on success, and -1 with errno set on failure.

Possible errno values are:

Errno value    Meaning
EFAULT         The fifo argument is illegal.
ENFILE         The system file table is full.


Program

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<sys/types.h>
#include<sys/wait.h>
#include<unistd.h>

int main( )
{
    pid_t pid;
    int fifo[2],status;
    ssize_t n;
    char buf[80];
    if(pipe(fifo)<0) /* create the pipe before forking */
    {
        printf("ERROR\n");
        return 1;
    }
    pid=fork( );
    if(pid<0)
    {
        printf("ERROR\n");
        return 2;
    }
    if(pid==0)
    {
        close(fifo[0]); /* the child only writes */
        printf("child process with id %d under execution\n",(int)getpid());
        write(fifo[1],"USP",strlen("USP"));
        close(fifo[1]);
        exit(0);
    }
    close(fifo[1]); /* the parent only reads */
    while((n=read(fifo[0],buf,sizeof(buf)-1))>0)
    {
        buf[n]='\0'; /* read does not terminate the data */
        printf("%s",buf);
    }
    close(fifo[0]);
    if(waitpid(pid,&status,0)==pid && WIFEXITED(status))
        return WEXITSTATUS(status);
    return 3;
}

OUTPUT

child process with id 2279 under execution.

Process

Basics
A process is a program in execution. A process is said to be born
when the program starts execution and remains alive as long as the
program is active. After execution is complete, the process is
said to die.
The kernel is responsible for the management of the processes. It
determines the time and priorities that are allocated to processes
so that more than one process can share the CPU resources.
Just as files have attributes, so have processes. These attributes
are maintained by the kernel in a data structure known as process
table. Two important attributes of a process are:
1. The Process-Id (PID): Each process is uniquely identified by a
unique integer called the PID, that is allocated by the kernel
when the process is born. The PID can be used to control a
process.
2. The Parent PID (PPID): The PID of the parent is available as a
process attribute.
The Shell Process
As soon as you log in, a process is set up by the kernel. This
process represents the login shell, which can be sh (Bourne
Shell), ksh (Korn Shell), bash (Bourne Again Shell) or csh (C
Shell).
Parents and Children
When you enter an external command at the prompt, the shell acts
as the parent process, which in turn starts the process
representing the command entered. Since every parent itself has a
parent, the ultimate ancestry of any process can be traced back to
the first process (PID 0), which is set up when the system is
booted. It is analogous to the root directory of the file system.
A process can have only one parent. However, a process can spawn
multiple child processes.
2. ps: Process Status
Because processes are so important to getting things done, UNIX
has several commands that enable you to examine processes and
modify their state. The most frequently used command is ps, which
prints out the process status for processes running on your
system. Each system has a slightly different version of the ps
command, but there are two main variants, the System V version
(POSIX) and the Berkeley version. The following table shows the
POSIX options available with the ps command.

Option     Significance
-f         Full listing showing the PPID of each process
-u user    Processes of user user only (Berkeley form: U user)
-a         Processes of all users, excluding processes not
           associated with a terminal
-l         Long listing
-t term    Processes running on the terminal term

Examples
$ ps
PID   TTY    TIME     CMD
4245  pts/7  00:00:00 bash
5314  pts/7  00:00:00 ps
The output shows the header specifying the PID, the terminal
(TTY), the cumulative processor time (TIME) that has been consumed
since the process was started, and the process name (CMD).
$ ps -f
UID     PID   PPID  C STIME    TTY   TIME COMMAND
root   14931   136  0 08:37:48 ttys0 0:00 rlogind
sartin 14932 14931  0 08:37:50 ttys0 0:00 -sh
sartin 15339 14932  7 16:32:29 ttys0 0:00 ps -f
The header includes the following information:
UID - Login name of the user
PID - Process ID
PPID - Parent process ID
STIME - Starting time of the process in hours, minutes and seconds
TTY - Terminal ID number
TIME - Cumulative CPU time consumed by the process
COMMAND - The name of the command being executed
System processes (-e or -A)
Apart from the processes a user generates, a number of system
processes keep running all the time. To list them, use:
$ ps -e
PID   TTY     TIME  CMD
0     ?       0:34  sched
1     ?       41:55 init
23274 Console 0:03  sh
272   ?       2:47  cron
7015  term/12 20:04 vi
3. Mechanism of Process Creation
The creation of a process has three distinct phases, each built on
an important system call: fork, exec, and wait. The three phases are
discussed below:
Fork: A process in UNIX is created with the fork system call,
which creates a copy of the process that invokes it. The copy (the
child) is nearly identical to its parent but gets a new PID.
Exec: The forked child overwrites its own image with the code and
data of the new program.
Wait: The parent then executes the wait system call to wait for
the child to complete. It picks up the exit status of the child
and continues with its other functions. Note that a parent need
not decide to wait for the child to terminate. To get a better
idea of this, let us explain with an example.

When you enter ls to look at the contents of your current working
directory, UNIX does a series of things to create an environment
for ls and then run it:
• The shell has UNIX perform a fork.
• This creates a new process (the child) that the shell will use to
  run the ls program.
• The child has UNIX perform an exec of the ls program.
• This replaces the copy of the shell program and data in the child
  with the program and data for ls, and then starts running that new
  program. The parent shell itself is not replaced.
• The ls program performs its task, listing the contents of the
  current directory.
• In the meanwhile, the shell executes the wait system call and
  waits for ls to complete.
• When a process is forked, the child has a different PID and PPID
  from its parent.
• However, it inherits most of the attributes of the parent.
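
The fork, exec and wait sequence described above can be imitated from the shell itself. The following is only a sketch: the variable names child_pid and child_status are chosen here for illustration, and the shell's & operator is used as a stand-in for fork.

```shell
# Parent shell: the trailing & makes the shell fork a child process.
# Child: "exec ls /" replaces the child's program image with ls.
sh -c 'exec ls /' > /dev/null &
child_pid=$!          # PID of the forked child, different from the parent's $$

# Parent: wait blocks until the child terminates, then yields its exit status.
wait "$child_pid"
child_status=$?

echo "parent $$ waited for child $child_pid, exit status $child_status"
```

The parent keeps its own PID throughout; only the child's image is overwritten by ls.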

How the Shell is created?

When the system moves to multiuser mode, init forks and execs a
getty for every active line. Each of these gettys prints the login
prompt on its terminal and then goes off to sleep. When a user tries
to log in, getty wakes up and fork-execs the login program to
verify the login name and password entered. On successful login,
login fork-execs the process representing the login shell. init goes
off to sleep, waiting for its children to terminate. The processes
getty and login overlay themselves. When the user logs out, init is
notified; it then wakes up and spawns another getty for that line to
monitor the next login.
4. Internal and External Commands
From the process viewpoint, the shell recognizes three types of
commands:
1. External commands: Commonly used commands like cat, ls etc. The
shell creates a process for each of these commands while remaining
their parent.
2. Shell scripts: The shell executes these scripts by spawning
another shell, which then executes the commands listed in the
script. The child shell becomes the parent of the commands that
feature in the script.
3. Internal commands: When an internal (built-in) command such as cd
or echo is entered, it is executed directly by the shell without
creating a process. Similarly, a variable assignment like x=5
doesn’t generate a process either.
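
One way to see the difference between an internal and an external command is to ask the shell how it resolves a name. The sketch below assumes a POSIX shell in which cd is a built-in and ls is a program found on PATH; the variable names are illustrative only.

```shell
# command -v prints a bare name for a built-in and a path for an
# external command found on PATH.
cd_kind=$(command -v cd)     # built-in: prints "cd"
ls_kind=$(command -v ls)     # external: prints a path such as /bin/ls

# A built-in runs inside the shell that executes it. Here cd runs in a
# throwaway child shell, so the calling shell's directory is unchanged.
start_dir=$(pwd)
sub_dir=$(sh -c 'cd / && pwd')
end_dir=$(pwd)

echo "cd -> $cd_kind, ls -> $ls_kind"
```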
6. Running Jobs in Background
&: No Logging out
Use the & symbol at the end of the command line to direct the
shell to execute the command in the background.
$ sort -o emp.dat emp.dat &
[1] 1413 The job’s PID
nohup: Log out Safely
A background job started with the & operator ceases to run when the
user logs out, because the shell is killed at logout and its
children are killed with it. The UNIX system provides the nohup
command which, when prefixed to a command, lets the process continue
even after the user has logged out. You must use the & with it as
well.
The syntax for the nohup command is as follows:
nohup command-string [input-file] output-file &
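
A minimal sketch of a background job started with nohup follows. The scratch directory, the file name job.out and the trivial echo command are assumptions made for the example; a real job would be something long-running such as the sort above.

```shell
workdir=$(mktemp -d)                      # scratch directory for the demo

# Run the job in the background; nohup makes it ignore the hangup signal.
nohup sh -c 'echo sorted' > "$workdir/job.out" 2> /dev/null &
job_pid=$!                                # PID of the background job

# ... the shell is free to run other commands here ...

wait "$job_pid"                           # collect the job when needed
job_output=$(cat "$workdir/job.out")
rm -rf "$workdir"
echo "$job_output"
```

Redirecting the output explicitly avoids relying on the default nohup.out file.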

7. nice: Job Execution with Low Priority
Processes in UNIX are sequentially assigned resources for
execution. The kernel assigns the CPU to a process for a time slice;
when the time elapses, the process is placed in a queue. How the
execution is scheduled depends on the priority assigned to the
process.
nice values are system dependent and typically range from 1 to 19.
A high nice value implies a lower priority. A program with a high
nice number is friendly to other programs, other users and the
system; it is not an important job. The lower the nice number, the
more important a job is and the more resources it will take
without sharing them.
Example:
$ nice wc -l hugefile.txt
OR
$ nice wc -l hugefile.txt &
We can specify the nice value explicitly with -n number option
where number is an offset to the default. If the -n number
argument is present, the priority is incremented by that amount up
to a limit of 20.
Example: $ nice -n 5 wc -l hugefile.txt &
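
The effect of -n can be observed directly. This sketch assumes GNU coreutils nice, which prints the current nice value when run without a command; other implementations may behave differently.

```shell
# Assumption: GNU coreutils nice prints the current nice value when it
# is run without a command.
base_nice=$(nice)                 # nice value of this shell
raised_nice=$(nice -n 5 nice)     # nice value seen by a command run with -n 5

echo "base: $base_nice, with -n 5: $raised_nice"
```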
9. Job Control
A job is a name given to a group of processes, typically created by
piping a series of commands with the pipeline character (|).
You can use job control facilities to:
1. Relegate a job to the background (bg)
2. Bring it back to the foreground (fg)
3. List the active jobs (jobs)
4. Suspend a foreground job ([Ctrl-z])
5. Kill a job (kill)
The following examples demonstrate the different job control
facilities. Assume a process is taking a long time. You can
suspend it by pressing [Ctrl-z].
[1] + Suspended wc -l hugefile.txt
A suspended job is not terminated. You can now relegate it to the
background with:
$ bg
You can start more jobs in the background any time:
$ sort employee.dat > sortedlist.dat &
[2] 530
$ grep 'director' emp.dat &
[3] 540
You can see a listing of these jobs using the jobs command:
$ jobs
[3] + Running grep 'director' emp.dat &
[2] - Running sort employee.dat > sortedlist.dat &
[1] Suspended wc -l hugefile.txt
You can bring a job to the foreground using fg %jobno or fg %jobname:
$ fg %2
OR
$ fg %sort

10. at And batch: Execute Later
at: One-Time Execution
To schedule one or more commands for a specified time, use the at
command. With this
command, you can specify a time, a date, or both.
For example,
$ at 14:23 Friday
at> lp /usr/sales/reports/*
The above job prints all files in the directory /usr/sales/reports
at 2:23 p.m. on Friday.
All at jobs go into a queue known as the at queue. at shows the job
number and the date and time of scheduled execution.
$ at 1 pm today
at> echo "^G^GLunch with Director at 1 PM^G^G" > /dev/term/43
The above job will display the following message on your screen
(/dev/term/43) at 1:00 PM, along with two beeps (^G^G):
Lunch with Director at 1 PM
To see which jobs you scheduled with at, enter at -l. Working with
the preceding examples, you may see the following results:
job 756603300.a at Tue Sep 11 13:00:00 2007
job 756604200.a at Fri Sep 14 14:23:00 2007
The following forms show some of the keywords and operations
permissible with the at command:
at hh:mm                 Schedules job at the hour (hh) and minute
                         (mm) specified, using a 24-hour clock
at hh:mm month day year  Schedules job at the hour (hh), minute (mm),
                         month, day, and year specified
at -r job_id             Cancels the job with the job number matching
                         job_id
batch: Execute in Batch Queue
The batch command lets the operating system decide an appropriate
time to run a process. When you schedule a job with batch, UNIX
starts and works on the process whenever the system load isn’t too
great.
To sort a collection of files and print the results, enter the
following commands:
$ batch
sort /usr/sales/reports/* | lp

11. cron: Running jobs periodically


The cron program is a daemon which is responsible for running
repetitive tasks on a regular schedule. It is a perfect tool for
running system administration tasks such as backup and system
logfile maintenance. It can also be useful for ordinary users to
schedule regular tasks including calendar reminders and report
generation.
Both at and batch schedule commands on a one-time basis. To
schedule commands or processes on a regular basis, you use the
cron (short for chronograph) program. You specify the times and
dates you want to run a command in crontab files. Times can be
specified in terms of minutes, hours, days of the month, months of
the year, or days of the week.
cron is listed in a shell script as one of the commands to run
during a system boot-up sequence. Individual users don’t have
permission to run cron directly.
If there’s nothing to do, cron “goes to sleep” and becomes
inactive; it “wakes up” every minute, however, to see if there are
commands to run.
cron looks for instructions to be performed in a control file
in /var/spool/cron/crontabs. After executing them, it goes back to
sleep, only to wake up the next minute.
A typical entry in crontab file
A typical entry in the crontab file of a user will have the
following format.
minute hour day-of-month month-of-year day-of-week command

where the time-field options are as follows:
Field           Range
----------------------------------------------------------------
minute          00 through 59 (number of minutes after the hour)
hour            00 through 23 (midnight is 00)
day-of-month    01 through 31
month-of-year   01 through 12
day-of-week     01 through 07 (Monday is 01, Sunday is 07)
----------------------------------------------------------------
The first five fields are time option fields. You must specify all
five of these fields. Use an asterisk (*) in a field to match every
value of that field.
Examples:
00-10 17 * 3,6,9,12 5 find / -newer .last_time -print >backuplist
In the above entry, the find command will be executed every minute
from 5:00 p.m. to 5:10 p.m. every Friday of the months March, June,
September and December of every year.
30 07 * * 01 sort /usr/wwr/sales/weekly | mail -s "Weekly Sales" srm
In the above entry, the sort command will be executed with
/usr/wwr/sales/weekly as argument and the output is mailed to a
user named srm at 7:30 a.m. each Monday.
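
To make the matching of time fields concrete, here is a small sketch of how one field can be tested against a value. The function field_matches is hypothetical and written only for this illustration; it is not part of cron, and it handles just the forms used above (an asterisk, single numbers, comma lists and ranges).

```shell
# field_matches FIELD VALUE
# Succeeds when VALUE is selected by one crontab time FIELD.
# Handles "*", single numbers, comma lists and low-high ranges only.
field_matches() {
    fm_field=$1
    fm_value=$2
    [ "$fm_field" = "*" ] && return 0

    # Split the field on commas into the positional parameters.
    fm_old_ifs=$IFS
    IFS=,
    set -- $fm_field
    IFS=$fm_old_ifs

    for fm_part in "$@"; do
        case $fm_part in
            *-*) fm_low=${fm_part%-*}; fm_high=${fm_part#*-} ;;
            *)   fm_low=$fm_part;      fm_high=$fm_part ;;
        esac
        # test compares in decimal, so leading zeros such as 07 are fine.
        if [ "$fm_value" -ge "$fm_low" ] && [ "$fm_value" -le "$fm_high" ]; then
            return 0
        fi
    done
    return 1
}

# Would "00-10 17 * 3,6,9,12 5" fire at 17:05 on a Friday (5) in June (6)?
if field_matches "00-10" 5 && field_matches "17" 17 &&
   field_matches "*" 14 && field_matches "3,6,9,12" 6 &&
   field_matches "5" 5; then
    echo "entry matches"
fi
```

An entry runs only when all five of its fields select the current time, which is why the checks are chained with &&.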

Signals
Signals are software interrupts. Signals provide a way of handling
asynchronous events: a user at a terminal typing the interrupt key
to stop a program or the next program in a pipeline terminating
prematurely.
When a signal is sent to a process, it remains pending until the
process handles it. The process can react to a pending signal in one
of three ways:
• Accept the default action of the signal, which for most signals
  will terminate the process.
• Ignore the signal. The signal will be discarded and it has no
  effect whatsoever on the recipient process.
• Invoke a user-defined function. The function is known as a
  signal handler routine and the signal is said to be caught
  when this function is called.
EXAMPLES

Name      Description                        Default action
SIGABRT   abnormal termination (abort)       terminate+core
SIGALRM   timer expired (alarm)              terminate
SIGCHLD   change in status of child          ignore
SIGKILL   termination                        terminate
SIGSTOP   stop                               stop process
SIGCONT   continue stopped process           continue/ignore
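
The reactions to a signal listed earlier can be tried from the shell with the trap built-in. This is a sketch only: the demo runs in a child shell so that $$ refers to that child, and the messages are made up for the example.

```shell
# Run the demo in a child shell so that $$ is the child's own PID.
signal_demo=$(sh -c '
    trap "echo caught USR1" USR1   # catch: run a handler when USR1 arrives
    kill -USR1 $$                  # send USR1 to this shell
    trap "" TERM                   # ignore: TERM is discarded
    kill -TERM $$
    echo "still running after TERM"
    trap - USR1                    # restore the default action for USR1
')
echo "$signal_demo"
```

Without either trap, both signals would take their default action and terminate the child shell.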

THE UNIX KERNEL SUPPORT OF SIGNALS

When a signal is generated for a process, the kernel will set the
corresponding signal flag in the process table slot of the
recipient process.
If the recipient process is asleep, the kernel will awaken the
process by scheduling it.
When the recipient process runs, the kernel will check the process
U-area that contains an array of signal handling specifications.
If the array entry contains a zero value, the process will accept
the default action of the signal.
If the array entry contains a value of 1, the process will ignore
the signal and the kernel will discard it.
If the array entry contains any other value, it is used as the
function pointer for a user-defined signal handler routine.
KILL
A process can send a signal to a related process via the kill API.
This is a simple means of inter-process communication or control.
The function prototype of the API is:

#include<signal.h>
int kill(pid_t pid, int signal_num);

Returns: 0 on success, -1 on failure.


The signal_num argument is the integer value of a signal to be
sent to one or more processes designated by pid. The possible
values of pid and its use by the kill API are:
pid > 0: The signal is sent to the process whose process ID is
pid.
pid == 0: The signal is sent to all processes whose process group
ID equals the process group ID of the sender and for which the
sender has permission to send the signal.
pid < -1: The signal is sent to all processes whose process group
ID equals the absolute value of pid and for which the sender has
permission to send the signal.
pid == -1: The signal is sent to all processes on the system for
which the sender has permission to send the signal.

From the shell, the kill command sends a signal to a process. To
forcefully kill a process, send SIGKILL with the option -9:

kill -9 19278
Note: 19278 is the PID of the process to be killed.
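
The kill command can be tried safely on a process of our own. The sketch below starts a sleep in the background as a stand-in target; the variable names are illustrative.

```shell
sleep 30 &                      # a long-running background process
target_pid=$!

# Signal 0 sends nothing; it only checks that the process can be signalled.
if kill -0 "$target_pid" 2> /dev/null; then
    was_running=yes
else
    was_running=no
fi

kill -TERM "$target_pid"        # default action of SIGTERM: terminate
wait "$target_pid"              # reap the child
term_status=$?                  # greater than 128 when a signal ended it

if kill -0 "$target_pid" 2> /dev/null; then
    still_running=yes
else
    still_running=no
fi
echo "before: $was_running, wait status: $term_status, after: $still_running"
```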
SIMPLE FILTERS
Filters are commands that accept data from standard input,
manipulate it and write the results to standard output.
Many UNIX commands, such as cat, grep, tee, sort, more, head, tail,
cut, and paste, are examples of filters.
Filters are the central tools of the UNIX tool kit, and each
filter performs a simple function. Some commands use a delimiter,
such as the pipe (|) or colon (:). Many filters work well with
delimited fields, and some simply won’t work without them. The
piping mechanism allows the standard output of one filter to serve
as the standard input of another. A filter reads data from standard
input when used without a filename as argument, and from the file
otherwise.

The Simple Database

To explain commands like cut, paste, grep and others, we need a
database file. Let us consider a file emp.lst.
This text file is designed in fixed format and contains a personnel
database. There are 15 lines, and the details of one employee are
stored on each line. Each line has six fields separated by five
delimiters; the delimiter is |.

$ cat emp.lst
2233 | a.k.shukla | g.m | sales | 12/12/52 | 6000
9876 | jai sharma | director | production | 12/03/50 | 7000
5678 | sumit chakrobarty | d.g.m. | marketing | 19/04/43 | 6000
2365 | barun sengupta | director | personnel | 11/05/47 | 7800
5423 | n.k.gupta | chairman | admin | 30/08/56 | 5400
1006 | chanchal singhvi | director | sales | 03/09/38 | 6700
6213 | karuna ganguly | g.m. | accounts | 05/06/62 | 6300
1265 | s.n. dasgupta | manager | sales | 12/09/63 | 5600
4290 | jayant choudhury | executive | production | 07/09/50 | 6000
2476 | anil aggarwal | manager | sales | 01/05/59 | 5000
6521 | lalit chowdury | director | marketing | 26/09/45 | 8200
3212 | shyam saksena | d.g.m. | accounts | 12/12/55 | 6000
3564 | sudhir agarwal | executive | personnel | 06/07/47 | 7500
2345 | j. b. sexena | g.m. | marketing | 12/03/45 | 8000
0110 | v.k.agrawal | g.m. | marketing | 31/12/40 | 9000

more: Paging output
more is a pager command offered by the UNIX system. It lets us
view a large file one page at a time.
Eg: more richard.txt
OUTPUT
Stallman grew up in New York City and went to a local public
school. He was a math whiz and put in a class for bright children
where he learnt little bit more advanced math than what most
classes taught (but mostly learning on his own). He was impacted
early in his childhood by American history and American civics
where he learnt about the American Civil War which was fought to
abolish slavery. The civil rights movement in the US was hitting
its peak because after slavery was abolished, many states
practiced legal discrimination against black people. He recalls,
“At that time, they (black people) were campaigning to put an end
to that (legal discrimination) which was successful. Protests and,
in some cases, killing of non-violent activists by segregationists
and advocates of inequality were in news. Even though I wasn't
tremendously engaged at the age of 10 years, I got some of this.”

There weren't many computers then. Stallman remembers going to a
summer camp a couple of times in 1962-63 where he read a manual
for a programming language and got fascinated by it. He says, “I
read manuals and started writing programs down on paper. There was
no computer available as it was too expensive for a summer camp to
have.”

The first time he actually got to see a computer and do something
with it was in 1969 at the IBM New York Scientific Centre.
--More--(23%)
The percentage shown at the bottom is the portion of the file that
has been viewed.
• We can use the internal commands f to move forward, b to move
  backward and q to quit. We can even specify a repeat factor.
• The man command uses a pager such as more or less to display its
  output.
• We can do a pattern search by specifying /pat.
• We can also specify multiple filenames, as in
  more file1 file2 file3
  which displays file1 first, followed by file2 and file3.

wc: Line, Word and Character Counting

The wc command counts the number of lines, words and characters,
depending upon the option used.

$ cat hello
how
r
u

$ wc hello
OUTPUT
3 3 8 hello
Options

OPTION   ACTION
-l       Displays the number of lines
-w       Displays the number of words
-c       Displays the number of characters

ls | wc -l     //Displays the number of files in the current directory
who | wc -l    //Displays the number of users logged in
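
The counts shown above can be reproduced with a short sketch. The scratch directory is an assumption of the example; the file contents are the three lines of hello.

```shell
workdir=$(mktemp -d)
printf 'how\nr\nu\n' > "$workdir/hello"

# Reading from standard input keeps the filename out of the output;
# tr strips the padding some wc versions add.
line_count=$(wc -l < "$workdir/hello" | tr -d ' ')
word_count=$(wc -w < "$workdir/hello" | tr -d ' ')
char_count=$(wc -c < "$workdir/hello" | tr -d ' ')
echo "$line_count $word_count $char_count"   # compare with: 3 3 8 hello
rm -rf "$workdir"
```

The character count is 8 because each of the three lines ends with a newline character.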

od: Displaying data in octal
Many files (especially executables) contain non-printing
characters, and most UNIX commands won’t display them. To display
these non-printable characters we use the od command.
Suppose the content of the file odfile is as follows:

^H ^G ^L
On executing the command cat odfile, nothing legible will be
displayed. To print the non-printable characters Ctrl-H, Ctrl-G and
Ctrl-L, separated by tabs as in the above example, we have to
execute the command
od odfile
OUTPUT
0000000 004410 004407 005014
0000006
The -b option displays the octal value of each character separately
od -b odfile
OUTPUT
0000000 010 011 007 011 014 012
0000006
To make sense of the above output we generally combine -b with the
-c option
od -bc odfile
OUTPUT
0000000 010 011 007 011 014 012
\b \t \a \t \f \n
0000006
Tab character-011
Enter[\n]-012
\f(Formfeed)-014
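
The same six bytes can be generated with printf and passed to od, as a quick check of the octal values listed above. This is a sketch; the -An option (suppress the offset column) is used only to simplify the output.

```shell
# printf writes the six bytes Ctrl-H, tab, Ctrl-G, tab, Ctrl-L, newline.
# -An drops the offset column; the unquoted $(...) squeezes od's padding
# so the result is one space-separated list.
octal_bytes=$(echo $(printf '\b\t\a\t\f\n' | od -An -b))
echo "$octal_bytes"
```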

pr : paginating files

The pr command prepares a file for printing by adding suitable
headers, footers and formatted text. pr adds five lines of margin at
the top and bottom. The header shows the date and time of last
modification of the file along with the filename and page number.
Let the contents of the file dept.lst be as follows

cat dept.lst
01|accounts|6213
02|progs|5423
03|marketing|6521
04|personnel|2365
05|production|9876
06|sales|1006

Now on execution of the command

$ pr dept.lst

May 06 10:38 1997 dept.lst page 1

01|accounts|6213
02|progs|5423
03|marketing|6521
04|personnel|2365
05|production|9876
06|sales|1006

…blank lines…

The six lines after the header are the original contents of dept.lst.

pr options

The different options for the pr command are:
OPTION   ACTION
-k       Prints in k columns (k is an integer)
-t       Suppresses the header and footer
-h       Prints a header of the user’s choice
-d       Double-spaces the output
-n       Numbers each line (helps in debugging)
-o n     Offsets the lines by n spaces, increasing the left margin
-l n     Sets the page length to n lines
+k       Starts printing from page k

Explanation
-k prints k (integer) columns
Consider a C program pgm1.c. On executing cat pgm1.c:

#include <stdio.h>
int main(void)
{
    int i;
    /* print the numbers 0 to 24, one per line */
    for (i = 0; i < 25; i++)
        printf("%d\n", i);
    return 0;
}

To compile the program use the command $ cc pgm1.c
To execute the program: $ ./a.out    // the numbers will be displayed
on 25 lines

On executing the command: $ ./a.out | pr -5
the numbers are displayed in 5 columns:

2015-03-12 16:15 page 1
0 5 10 15 20
1 6 11 16 21
………………………………..
4 9 14 19 24

….blank lines………………..

-t to suppress the header and footer

With the -t option the header and footer are suppressed and we see
only the content.
Eg: On executing
$ ./a.out | pr -t -5

0 5 10 15 20
1 6 11 16 21
………………………………..
4 9 14 19 24

-h to have a header of user’s choice

With the -h option the user can customize the header along with the
default information. On executing the command

$ ./a.out | pr -5 -h COPYWRITE

2015-03-12 16:15 COPYWRITE page 1

0 5 10 15 20
1 6 11 16 21
………………………………..
4 9 14 19 24
….blank lines………………..

-d double spaces input

On executing the command
$ ./a.out | pr -d -t -5

0 5 10 15 20
1 6 11 16 21
………………………………..
4 9 14 19 24

-n will number each line and helps in debugging

On executing the command
$ ./a.out | pr -t -5 -n

1 0 6 5 11 10 16 15 21 20
2 1 7 6 12 11 17 16 22 21
………………………………..
5 4 10 9 15 14 20 19 25 24

-o n offsets the lines by n spaces and increases left margin of
page

For example, with an offset of 10 ($ ./a.out | pr -t -5 -o 10):

          0 5 10 15 20
          1 6 11 16 21
          ………………………………..
          4 9 14 19 24

The output starts 10 spaces from the left margin.

pr +10 chap01
starts printing from page 10
2015-03-12 16:15 page 10

pr -l 54 chap01
this option sets the page length to 54
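
The column layout can be checked without compiling the C program. In this sketch awk stands in for a.out, and the behaviour described (columns filled downwards and balanced) is that of GNU pr; other versions of pr may lay the columns out differently.

```shell
# awk stands in for the compiled program and prints 0 to 24, one per line.
# echo with an unquoted $(...) squeezes the column padding to single spaces.
first_row=$(echo $(awk 'BEGIN { for (i = 0; i < 25; i++) print i }' |
    pr -t -5 | head -n 1))
echo "$first_row"
```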

cmp:Comparing two files


cmp command can be used to know whether two files are identical.
Let the contents of hello be
how
r
u
and let the contents of hello1 be
how
r
you

when we execute the command cmp hello hello1


OUTPUT
hello hello1 differ: byte 7, line 3
The two files are compared byte by byte, and the location of the
first mismatch is echoed on the screen. cmp stops there and does not
report any later mismatches.
When used with the -l option, it gives a detailed list of the byte
number and the differing bytes in octal for each byte that differs
between the two files.
cmp -l hello hello1

OUTPUT
7 165 171
8 12 157
cmp: EOF on hello
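
The comparison above can be reproduced as follows. This is a sketch that recreates hello and hello1 in a scratch directory (an assumption of the example) and also shows the exit status, which is what scripts usually test.

```shell
workdir=$(mktemp -d)
printf 'how\nr\nu\n'   > "$workdir/hello"
printf 'how\nr\nyou\n' > "$workdir/hello1"

# Exit status: 0 if the files are identical, 1 if they differ.
# -s suppresses the message and leaves only the status.
cmp -s "$workdir/hello" "$workdir/hello1"
cmp_status=$?

# -l lists the byte number and the two differing byte values in octal;
# the EOF message goes to standard error and is dropped here.
cmp_list=$(echo $(cmp -l "$workdir/hello" "$workdir/hello1" 2> /dev/null))
rm -rf "$workdir"
echo "status $cmp_status: $cmp_list"
```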

diff: Converting one file to another

diff is used to display file differences.
Unlike cmp, it also tells which lines in one file need to be
changed to make the two files identical.
diff hello hello1
OUTPUT
3c3
< u
---
> you

3c3 means: change line 3 of the first file with one line from the
second file, which remains line 3 after the change.

comm:what is common?
The comm command not only identifies the differences, but also
displays what is common to two sorted files.

comm hello hello1

OUTPUT
                how
                r
u
        you

The first column displays what is unique to the 1st file, the second
column what is unique to the 2nd file, and the third column the
lines common to both.

Options
comm -3 hello hello1    //Displays lines not common
OUTPUT
u
        you

comm -13 hello hello1   //Displays lines present only in 2nd file
OUTPUT
you
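
A short sketch of the three column selections follows. The files are recreated in a scratch directory, and LC_ALL=C is set so that the sort order comm expects is unambiguous; both are assumptions of the example.

```shell
LC_ALL=C; export LC_ALL              # comm expects input sorted in this order
workdir=$(mktemp -d)
printf 'how\nr\nu\n'   > "$workdir/hello"
printf 'how\nr\nyou\n' > "$workdir/hello1"

common=$(comm -12 "$workdir/hello" "$workdir/hello1")       # in both files
only_first=$(comm -23 "$workdir/hello" "$workdir/hello1")   # only in hello
only_second=$(comm -13 "$workdir/hello" "$workdir/hello1")  # only in hello1
rm -rf "$workdir"
echo "$common"
```

Each digit in the option suppresses the column with that number, so -12 leaves only the common lines.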

nl command:Line Numbering
The nl command has elaborate schemes for numbering lines.
nl numbers only logical lines, that is, those containing something
apart from the \n character.
eg: nl hello
OUTPUT
1 how
2 r
3 u

head – displaying the beginning of the file


The command displays the top of the file. It displays the first 10
lines of the file,
when used without an option.
head emp.lst

-n to specify a line count


head -n 3 emp.lst
will display the first three lines of the file.

vi `ls -t | head -n 1`
Opens the last edited file even if we don’t recall its name.

tail – displaying the end of a file


This command is the head counterpart which displays the end of the
file. It displays the last 10 lines of the file, by default when
used without an option.
tail emp.lst

-n to specify a line count
tail -n 3 emp.lst
displays the last three lines of the file. We can also address
lines from the beginning of the file instead of the end. The
+count option allows to do that, where count represents the line
number from where the selection should begin.

tail +11 emp.lst


Will display 11th line onwards
In Linux we have to give the command tail -n +11 emp.lst for
displaying from the 11th line onwards.
Different options for tail are:

• Monitoring the file growth (-f)

Use tail -f when we are running a program that continuously writes
to a file, and we want to see how the file is growing. We have to
terminate this command with the interrupt key.
Eg: tail -f /oracle/app/oracle/product/8.1/orainst/install.log
//watches the growth of the log file install.log
• Extracting bytes rather than lines (-c)
POSIX requires tail to support the -c option followed by a
positive or negative integer, depending upon whether we want the
extraction counted from the beginning or from the end.
tail -c -512 file1   //copies last 512 bytes from file1
tail -c +512 file1   //copies everything after skipping
511 bytes
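
The head and tail selections can be compared side by side. In this sketch fifteen generated lines stand in for emp.lst; the names line1 to line15 are made up for the example.

```shell
# Fifteen numbered lines stand in for emp.lst.
sample=$(awk 'BEGIN { for (i = 1; i <= 15; i++) print "line" i }')

first_three=$(printf '%s\n' "$sample" | head -n 3)
last_three=$(printf '%s\n' "$sample" | tail -n 3)
from_eleven=$(printf '%s\n' "$sample" | tail -n +11)   # line 11 onwards
last_bytes=$(printf '%s\n' "$sample" | tail -c 7)      # final 7 bytes

printf '%s\n' "$from_eleven"
```

The last 7 bytes are the six characters of line15 plus the newline that ends it.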

cut – slitting a file vertically


cut is a powerful text manipulator and often used in combination
with other commands or filters. It is used for slitting the file
vertically.
Before executing this command let’s make a copy of the first five
lines of emp.lst in shortlist using the command
head -n 5 emp.lst | tee shortlist
which selects the first five lines of emp.lst and saves them to
shortlist. We can cut by using the -c option with a list of column
numbers, delimited by a comma (cutting columns).
Eg: cut -c 6-22,24-32 shortlist
cut -c -3,6-22,28-34,55- shortlist
The expression 55- indicates column number 55 to end of line.
Similarly, -3 is the same as 1-3.
Most files don’t contain fixed length lines, so we have to cut
fields rather than columns (cutting fields).
Cutting fields (-f)
The -c option is useful for files with fixed-length lines, and not
for files whose lines vary in length. For such files we can use the
field option.
cut uses the tab as the default field delimiter, but can also work
with a different delimiter. Two options are needed:
-d for the field delimiter
-f for the field list
Example:
cut -d \| -f 2,3 shortlist | tee cutlist1
will look for the delimiter | in the file, display the second and
third fields of shortlist and save the output in cutlist1. Here |
is escaped to prevent the shell from treating it as the pipeline
character.
To print the fields 1, 4, 5 and 6, we have
cut -d "|" -f 1,4- shortlist > cutlist2
Extracting User List from the who output
cut can be used to extract the first word of a line by specifying
the space as the delimiter.
$ who | cut -d " " -f1
root
kumar
cse
sharma
Note: when using the cut command, one of the options -f and -c must
be specified.
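
The field and column forms of cut can be tried on a couple of records. The two records below are hypothetical lines in the style of emp.lst, written without padding around the delimiter so that the fields come out clean.

```shell
# Two hypothetical records in the style of emp.lst, without padding.
records='2233|a.k.shukla|g.m.|sales|12/12/52|6000
9876|jai sharma|director|production|12/03/50|7000'

names_and_titles=$(printf '%s\n' "$records" | cut -d '|' -f 2,3)
all_but_2_3=$(printf '%s\n' "$records" | cut -d '|' -f 1,4-)
first_four_chars=$(printf '%s\n' "$records" | cut -c 1-4)

printf '%s\n' "$names_and_titles"
```

Quoting the delimiter as '|' has the same effect as escaping it with a backslash.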
paste – pasting files
What we cut with cut can be pasted back with the paste command,
which joins files vertically (side by side) rather than one after
the other. We can view two files side by side by pasting them.
paste cutlist1 cutlist2
We can specify one or more delimiters with -d:
paste -d "|" cutlist1 cutlist2
where each field will be separated by the delimiter |. Even though
paste uses at least two files for concatenating lines, the data
for one file can be supplied through the standard input.
Eg: cut -d\| -f 1,4- shortlist | paste -d "|" cutlist1 -
//pastes the standard input after the cutlist1 contents
cut -d\| -f 1,4- shortlist | paste -d "|" - cutlist1
//pastes the standard input before the cutlist1 contents

Joining lines (-s)


Let us consider that the file addressbook contains the details of
two persons
$ cat addressbook
Anupkumar
[email protected]
24569083
Vinodsharma
[email protected]
123456789
Suppose we execute the following command
paste -s addressbook
The output will be
Anupkumar [email protected] 24569083 Vinodsharma
[email protected] 123456789 //in a single line

paste -s -d "||\n" addressbook

Anupkumar|[email protected]|24569083
Vinodsharma|[email protected]|123456789
The delimiters are used in a circular manner, i.e. the first and
second lines are joined with a |, the second and third with a |, and
the third and fourth are separated by "\n".
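
The circular use of the delimiter list can be seen with six short lines. The values a1 to b3 are placeholders for the three details of two entries, and the final - tells paste to read standard input.

```shell
# Six lines: three lines of details for each of two made-up entries.
joined=$(printf 'a1\na2\na3\nb1\nb2\nb3\n' | paste -s -d '||\n' -)
echo "$joined"
```

After the third delimiter (the newline) is used, paste starts again with the first one, which is what groups the lines in threes.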

sort : ordering a file


Sorting is the ordering of data in ascending or descending
sequence. The sort command orders a file and by default, the
entire line is sorted. We can sort the files based on a specific
field too. By default sort reorders lines in ASCII collating
sequence –whitespace first, then numerals, uppercase letters and
finally lowercase letters.
$sort shortlist
This default sorting sequence can be altered by using certain
options. We can also sort on one or more keys (fields) or use a
different ordering rule.
sort options
The important sort options are:
• Sorting based on primary key (-k)
Example: $ sort -t "|" -k 1 shortlist
1006 | chanchal singhvi | director | sales | 03/09/38 | 6700
2233 | a.k.shukla | g.m | sales | 12/12/52 | 6000
2365 | barun sengupta | director | personnel | 11/05/47 | 7800
5423 | n.k.gupta | chairman | admin | 30/08/56 | 5400
5678 | sumit chakrobarty | d.g.m. | marketing | 19/04/43 | 6000
6213 | karuna ganguly | g.m. | accounts | 05/06/62 | 6300
9876 | jai sharma | director | production | 12/03/50 | 7000

Note that the output is sorted on field 1, the empid.

To sort in the reverse order (-r)
$sort -t "|" -k 1 -r shortlist
9876 | jai sharma | director | production | 12/03/50 | 7000
6213 | karuna ganguly | g.m. | accounts | 05/06/62 | 6300
5678 | sumit chakrobarty | d.g.m. | marketing | 19/04/43 | 6000
5423 | n.k.gupta | chairman | admin | 30/08/56 | 5400
2365 | barun sengupta | director | personnel | 11/05/47 | 7800
2233 | a.k.shukla | g.m | sales | 12/12/52 | 6000
1006 | chanchal singhvi | director | sales | 03/09/38 | 6700
Note: The empid is sorted in descending order.
The above command can also be given as $ sort -t "|" -k 1r shortlist
• Sorting on secondary key
When we execute the command sort -t "|" -k 3 shortlist, we observe
that there are 3 director entries.
If we want to sort them again based on a secondary key, say field
1, we have to give the command as
$ sort -t "|" -k 3,3 -k 1,1 shortlist
OUTPUT
5423 | n.k.gupta | chairman | admin | 30/08/56 | 5400
5678 | sumit chakrobarty | d.g.m. | marketing | 19/04/43 | 6000
1006 | chanchal singhvi | director | sales | 03/09/38 | 6700
2365 | barun sengupta | director | personnel | 11/05/47 | 7800
9876 | jai sharma | director | production | 12/03/50 | 7000
2233 | a.k.shukla | g.m | sales | 12/12/52 | 6000
6213 | karuna ganguly | g.m. | accounts | 05/06/62 | 6300

NOTE: The entries with director are again sorted based on field 1,
the empid.
• Sorting on columns
We can also specify a character position within a field as the
beginning of the sort key, for example to sort on the year of
birth:
sort -t "|" -k 5.7,5.8 shortlist   // 5th field, 7th and 8th columns
OUTPUT
1006 | chanchal singhvi | director | sales |03/09/38 | 6700
5678 | sumit chakrobarty | d.g.m. | marketing |19/04/43 | 6000
2365 | barun sengupta | director | personnel |11/05/47 | 7800
9876 | jai sharma | director | production |12/03/50 | 7000
2233 | a.k.shukla | g.m | sales |12/12/52 | 6000
5423 | n.k.gupta | chairman | admin |30/08/56 | 5400
6213 | karuna ganguly | g.m. | accounts |05/06/62 | 6300

• Numeric sort
Suppose we have a file named numlist which contains only
numbers. When we sort the file we get a curious result: the numbers
are not in numeric order. That’s because sort compares them
character by character in the ASCII collating sequence. To override
this and sort by numeric value, use the -n option.
$ sort -n numlist
will give the numbers in sorted form.

• Removing repeated lines (-u)

cut -d "|" -f 3 emp.lst | sort -u | tee desig.lst
OUTPUT:-
chairman
d.g.m.
director //only once printed
executive
g.m.//only once printed
manager
 Other sort options

Summary of sort options
Option        Description
-tchar        Uses the delimiter char to identify fields
-k n          Sorts on the nth field
-k m,n        Starts sort on the mth field and ends on the nth field
-k m.n        Starts sort on the nth column of the mth field
-u            Removes repeated lines
-n            Sorts numerically
-r            Reverses sort order
-f            Folds lowercase to equivalent uppercase
-m list       Merges sorted files in list
-c            Checks if the file is sorted
-o filename   Places the sorted output in the file filename
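
A few of the options summarized above, applied to small inputs. This is a sketch: the three numbers and the id|designation records are made up for the example, and LC_ALL=C is set so that the ASCII collating sequence described earlier is the one actually used.

```shell
LC_ALL=C; export LC_ALL          # plain ASCII collating sequence

numbers='10
9
2'
ascii_order=$(printf '%s\n' "$numbers" | sort)       # character by character
numeric_order=$(printf '%s\n' "$numbers" | sort -n)  # by numeric value

# Hypothetical id|designation records: designation is the primary key,
# id the secondary key.
staff='9876|director
5423|chairman
1006|director'
by_title_then_id=$(printf '%s\n' "$staff" | sort -t '|' -k 2,2 -k 1,1)
unique_titles=$(printf '%s\n' "$staff" | cut -d '|' -f 2 | sort -u)

printf '%s\n' "$by_title_then_id"
```

Without -n, 10 sorts before 2 because the character 1 precedes the character 2.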

uniq command – locate repeated and nonrepeated lines


When we concatenate or merge files, we will face the problem of
duplicate entries creeping in. We saw how sort removes them with
the -u option. UNIX offers a special tool to handle these lines:
the uniq command. Consider a sorted dept.lst that includes
repeated lines:
cat dept.lst
displays all lines with duplicates, whereas
uniq dept.lst
simply fetches one copy of each line and writes it to the standard
output. Since uniq requires a sorted file as input, the general
procedure is to sort a file and pipe its output to uniq. The
following pipeline also produces the same output, except that the
output is saved in a file:
sort dept.lst | uniq - uniqlist

The different uniq options are:
Selecting the nonrepeated lines (-u)
cut -d "|" -f3 emp.lst | sort | uniq -u
OUTPUT:
chairman
Selecting the duplicate lines (-d)
Selects only one copy of each repeated line
cut -d "|" -f3 emp.lst | sort | uniq -d
OUTPUT
d.g.m.
director
executive
g.m.
manager
Counting frequency of occurrence (-c)
cut -d "|" -f3 emp.lst | sort | uniq -c
OUTPUT
1 chairman
2 d.g.m.
4 director
2 executive
4 g.m.
2 manager
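
The uniq options can be compared side by side with a small sketch like the following. The file desig_sample.lst and its six designations are hypothetical stand-ins for the third field of emp.lst.

```shell
# Hypothetical list of designations, deliberately unsorted.
printf '%s\n' g.m. director chairman g.m. director g.m. > desig_sample.lst

sort desig_sample.lst | uniq       # one copy of each line
sort desig_sample.lst | uniq -u    # only the lines that occur once: chairman
sort desig_sample.lst | uniq -d    # one copy of each repeated line: director, g.m.
sort desig_sample.lst | uniq -c    # each line prefixed by its count
uniq desig_sample.lst              # unsorted input: no adjacent duplicates, nothing removed
```

The last command shows why the input is sorted first: uniq only compares a line with its immediate neighbour.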
tr command – translating characters
The tr filter manipulates the individual characters in a line. It
translates characters using one or two compact expressions.
Syntax:
tr options expression1 expression2 < file

tr takes its input only from standard input; it does not accept a
filename as an argument. By default, it translates each character in
expression1 to its mapped counterpart in expression2. The first
character in the first expression is replaced with the first
character in the second expression, and similarly for the other
characters.
tr '|/' '~-' < emp.lst | head -n 3
Here every | is replaced with ~ and every / is replaced with -.
The expressions can also be held in shell variables:
exp1='|/' ; exp2='~-'
tr "$exp1" "$exp2" < emp.lst

NOTE: The lengths of the two expressions should be equal. If they
are not, the longer expression will have unmapped characters. (This
does not apply on Linux, where GNU tr extends a shorter second
expression by repeating its last character.)
Like wild cards, tr also accepts a range in the expressions.
RULES
The character on the right of the - must have a higher ASCII value
than the character on the left.
A character with a special meaning in an expression must be escaped
with a backslash to remove that meaning.
Eg: \[ stands for a literal [

Changing the case of text, for example from lowercase to uppercase,
is possible using tr:
head -n 3 emp.lst | tr '[a-z]' '[A-Z]'    // changes the first three
lines of emp.lst to uppercase
OUTPUT
2233 | A.K.SHUKLA |G.M.| SALES | 12/12/52 | 6000
9876 | JAI SHARMA | DIRECTOR | PRODUCTION | 12/03/50 | 7000
5678 | SUMIT CHAKROBARTY | D.G.M. | MARKETING | 19/04/43 | 6000
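
The two translations above can be reproduced with a short sketch. The file emp_sample.lst is a hypothetical two-line cut-down of emp.lst, created here only so that the commands have something to read.

```shell
# Hypothetical two-record sample in the same layout as emp.lst.
cat > emp_sample.lst <<'EOF'
2233|a.k.shukla|g.m.|sales|12/12/52|6000
9876|jai sharma|director|production|12/03/50|7000
EOF

tr '|/' '~-' < emp_sample.lst          # | becomes ~ and / becomes -
tr '[a-z]' '[A-Z]' < emp_sample.lst    # lowercase becomes uppercase
```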
 tr options
 Deleting characters (-d)
The -d option deletes every occurrence of the characters listed in
the expression.
Eg: the date field has the form dd/mm/yy. To convert it to
ddmmyy we simply delete the / character, which can be done with tr.
tr -d '|/' < emp.lst | head -n 3    // deletes the | and /
characters from the first three lines of emp.lst
OUTPUT:
2233 a.k.shukla g.m. sales 121252 6000
9876 jai sharma director production 120350 7000
5678 sumit chakrobarty d.g.m. marketing 190443 6000

 Compressing multiple consecutive characters (-s)


In UNIX we often work with lines that are not of fixed length, so
rather than the -c (column) option of cut we use its -f (field)
option, which requires delimiters. To remove the redundant spaces
around the fields we can use the -s option, which squeezes multiple
consecutive occurrences of a character to a single one.
Eg: tr -s ' ' < emp.lst | head -n 3
 Complementing values of expression (-c)
The -c option complements the set of characters in the expression.
Eg: tr -cd '|/' < emp.lst    // deletes all characters except | and /
OUTPUT
||||//|||||//|||||//|||||//|||||//|||||//|||||//|||||//|||||//||||
|//|||||//|||||//|||||//|||||//|||||//|$
NOTE:Even the new line was not spared from deletion
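
A minimal sketch of the three options on a single line of input follows. The file tr_sample.lst and its padded record are hypothetical.

```shell
# Hypothetical one-line sample with padded fields.
printf '%s\n' '2233 | a.k.shukla   | g.m. | 12/12/52' > tr_sample.lst

tr -d '|/' < tr_sample.lst           # delete every | and /
tr -s ' ' < tr_sample.lst            # squeeze each run of spaces to one space
tr -cd '|/' < tr_sample.lst; echo    # keep only | and /; echo supplies the newline that tr deleted
```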

 Using ASCII octal values and escape sequences


tr accepts ASCII octal values and escape sequences, which lets us
use non-printable characters in an expression.
Eg: To replace | with the line feed character \n (octal value
012) we can use the command

tr '|' '\012' < emp.lst | head -n 6

OUTPUT
2233
a.k.shukla
g.m.
sales
12/12/52
6000
FILTERS USING REGULAR EXPRESSIONS – grep and sed

We often need to search a file for a pattern, either
 to see the lines containing (or not containing) it,
 or to have it replaced with something else.
This chapter discusses two important filters that are specially
suited for these tasks: grep and sed.
grep – searching for a pattern

The grep command scans its input for a pattern and displays the
lines containing the pattern, the line numbers or the filenames
where the pattern occurs. It is a principal member of a special
family in UNIX for handling search requirements.

Syntax : grep options pattern filename(s)

Example : grep "sales" emp.lst

will display the lines containing "sales" from the file emp.lst.

2233 | a.k.shukla |g.m.| sales|12/12/52| 6000
1006 | chanchal singhvi |director| sales |03/09/38| 6700
1265 | s.n. dasgupta |manager| sales |12/09/63| 5600
2476 | anil aggarwal |manager| sales |01/05/59|5000

The pattern can be given with or without quotes, but it is
generally safer to quote it. Quoting is mandatory when the pattern
contains more than one word. If the pattern can't be located, grep
displays nothing and simply returns the prompt.

Example : grep president emp.lst    // no output if nothing matches

When grep is used with multiple filenames, it displays the
filename along with each line of output.

Example:
$ grep director emp.lst shortlist
OUTPUT

emp.lst:9876 | jai sharma |director|production|12/03/50| 7000
emp.lst:2365 | barun sengupta |director| personnel |11/05/47| 7800
emp.lst:1006 | chanchal singhvi |director| sales |03/09/38| 6700
emp.lst:6521 | lalit chowdury |director| marketing |26/09/45| 8200
shortlist:9876 | jai sharma |director|production|12/03/50| 7000
shortlist:2365 | barun sengupta |director| personnel |11/05/47| 7800
shortlist:1006 | chanchal singhvi |director| sales |03/09/38| 6700

Each output line shows the filename followed by the matching line.

$ grep 'jai sharma' emp.lst
NOTE: Always use ‘ ‘ or “ “ while searching for a pattern that has
space between them.
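
The basic behaviour described so far can be sketched as follows. The files emp_sample.lst and short_sample.lst are hypothetical cut-down copies of emp.lst and shortlist, created on the spot.

```shell
# Hypothetical three-record sample and a shorter second file.
cat > emp_sample.lst <<'EOF'
2233|a.k.shukla|g.m.|sales|12/12/52|6000
9876|jai sharma|director|production|12/03/50|7000
1006|chanchal singhvi|director|sales|03/09/38|6700
EOF
head -n 2 emp_sample.lst > short_sample.lst

grep "sales" emp_sample.lst                          # lines containing sales
grep 'jai sharma' emp_sample.lst                     # quotes keep the two words together
grep president emp_sample.lst || echo "no match"     # nothing printed by grep; exit status is 1
grep director emp_sample.lst short_sample.lst        # each line is prefixed with its filename
```

The exit status is what lets a script react to a failed search, as the third command does with ||.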

grep options

grep is one of the most important UNIX commands, and we must
know the options that POSIX requires grep to support. Linux
supports all of these options.

OPTION     ACTION
-i         ignores case for matching
-v         doesn't display lines matching the expression
-n         displays line numbers along with the lines
-c         displays a count of the lines containing the pattern
-l         displays a list of filenames only
-e exp     specifies the expression exp; can be used several times
-x         matches the pattern against the entire line
-f file    takes patterns from file, one per line

Explanation with Examples


-i option
When we are looking for a pattern but are not sure of its case, we
use the -i option.
Example:
grep -i 'agarwal' emp.lst
OUTPUT:
3564 | sudhir agarwal |executive| personnel |06/07/47| 7500

-v option
The -v (inverse) option selects all lines except those containing
the pattern.
Example:
grep -v 'director' emp.lst
OUTPUT:
2233 | a.k.shukla |g.m.| sales|12/12/52| 6000
5678 | sumit chakrobarty |d.g.m.| marketing |19/04/43| 6000
5423 | n.k.gupta |chairman| admin |30/08/56| 5400
6213 | karuna ganguly |g.m.| accounts |05/06/62| 6300
1265 | s.n. dasgupta |manager| sales |12/09/63| 5600
4290 | jayant choudhury |executive| production|07/09/50| 6000

2476 | anil aggarwal |manager| sales |01/05/59|5000
3212 | shyam saksena |d.g.m.| accounts |12/12/55|6000
3564 | sudhir agarwal |executive| personnel |06/07/47| 7500
2345 | j. b. sexena |g.m.| marketing |12/03/45|8000
0110 | v.k.agrawal |g.m.| marketing |31/12/40|9000

 -n option:to display the line numbers containing the pattern


along with the lines

grep -n marketing emp.lst
OUTPUT:
3:5678 | sumit chakrobarty |d.g.m.| marketing |19/04/43| 6000
11:6521 | lalit chowdury |director| marketing |26/09/45| 8200
14:2345 | j. b. sexena |g.m.| marketing |12/03/45|8000
15:0110 | v.k.agrawal |g.m.| marketing |31/12/40|9000
3, 11, 14 and 15 are the numbers of the lines where the pattern
occurs; each is separated from the line itself by a colon.

 -c option :To count the number of lines containing the


pattern

grep -c 'director' emp.lst

OUTPUT:
4    // four lines in emp.lst contain director

If we use the command with multiple files, a count is displayed
for each file.

Example: grep -c 'director' emp.lst shortlist
OUTPUT
emp.lst:4
shortlist:3

grep can also read from the standard input stream
ls|grep -c -i usp

OUTPUT
7
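
The -i, -v, -n and -c options can be tried together on a small file. The sketch below assumes a hypothetical four-record file, opt_sample.lst, with only four of the fields of emp.lst.

```shell
# Hypothetical four-record sample.
cat > opt_sample.lst <<'EOF'
2233|a.k.shukla|g.m.|sales
9876|jai sharma|director|production
5678|sumit chakrobarty|d.g.m.|marketing
0110|v.k.Agarwal|g.m.|marketing
EOF

grep -i 'agarwal' opt_sample.lst     # matches Agarwal despite the capital A
grep -v 'director' opt_sample.lst    # every line except the director
grep -n 'marketing' opt_sample.lst   # matching lines prefixed with 3: and 4:
grep -c 'director' opt_sample.lst    # number of matching lines: 1
```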

 Displaying names of files containing the pattern –l option


grep with the -l option prints only the names of the files that
contain the pattern.
Example:

$ grep -l 'manager' *.lst

OUTPUT
desig.lst
emp.lst
tempemp.lst

 To match multiple patterns we use

grep -e 'Agarwal' -e 'aggarwal' -e 'agrawal' emp.lst

prints the lines matching any one of the patterns.

We can also keep all the patterns in a separate file, one pattern
per line, and name that file with the -f option, followed by the
file in which the patterns are to be searched.

grep -f pattern.lst emp.lst

Here the three patterns above are stored in the file pattern.lst.
NOTE: the contents of pattern.lst will be
Agarwal
aggarwal
agrawal
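
The equivalence of the two forms can be checked with a sketch like this one. Both names_sample.lst and pattern_sample.lst are hypothetical files created for the purpose.

```shell
# Hypothetical sample records and a pattern file with one pattern per line.
cat > names_sample.lst <<'EOF'
2476|anil aggarwal|manager
3564|sudhir agarwal|executive
0110|v.k.agrawal|g.m.
2233|a.k.shukla|g.m.
EOF
printf '%s\n' Agarwal aggarwal agrawal > pattern_sample.lst

# Both commands select the same two lines (aggarwal and agrawal).
grep -e 'Agarwal' -e 'aggarwal' -e 'agrawal' names_sample.lst
grep -f pattern_sample.lst names_sample.lst
```

In this sample sudhir agarwal is not selected, because the pattern Agarwal is case-sensitive and no pattern is spelled agarwal.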

Basic Regular Expressions (BRE) – An Introduction

It is tedious to specify each pattern separately with the -e
option. grep uses an expression of a different type to match a
group of similar patterns.
The term regular expression comes from theoretical computer
science. It can be defined as a language for specifying patterns
that match a sequence of characters. The patterns can be made up of
any of the following:

1. Normal characters that match exactly the same character
in the input
2. Character classes that match any single character in
the class
Eg: [a-z] matches any lowercase character from a to z
3. Certain other special characters that specify the way in
which parts of an expression are to be matched against
the input
POSIX identifies regular expressions as belonging to two
categories: basic and extended. grep supports basic regular
expressions (BRE) by default and extended regular expressions (ERE)
with the -E option.
NOTE1: sed supports only the BRE set by default
NOTE2: Regular expressions are interpreted by the command and not
by the shell. Quoting ensures that the shell isn't able to interfere
and interpret the metacharacters in its own way.

BRE character subset

The basic regular expression character subset uses an
elaborate metacharacter set that overshadows the shell's wild
cards and can perform amazing matches.

Symbol or
Expression    Matches
*             Zero or more occurrences of the previous character
g*            Nothing or g, gg, ggg, etc.
.             A single character
.*            Nothing or any number of characters
[pqr]         A single character p, q or r
[c1-c2]       A single character within the ASCII range
              represented by c1 and c2
[1-3]         A digit between 1 and 3
[^pqr]        A single character which is not p, q or r
[^a-zA-Z]     A non-alphabetic character
^pat          Pattern pat at the beginning of the line
pat$          Pattern pat at the end of the line
bash$         bash at the end of the line
^bash$        bash as the only word in the line
^$            Lines containing nothing

Meta characters and their meaning


 The caret or circumflex character(^)
This metacharacter is used to search for or extract lines or
records that begin with a specific pattern.
Eg: to retrieve the lines of emp.lst that begin with 2

Case 1: grep "^2" emp.lst    // ^ outside a character class anchors
the pattern to the beginning of the line

OUTPUT
2233 | a.k.shukla |g.m.| sales|12/12/52| 6000
2365 | barun sengupta |director| personnel |11/05/47| 7800
2476 | anil aggarwal |manager| sales |01/05/59|5000
2345 | j. b. saxena |g.m.| marketing |12/03/45|8000

Case 2: grep "^[^2]" emp.lst    // a ^ at the start of a character
class negates the class, so this selects the lines that do not
begin with 2

OUTPUT

9876 | jai sharma |director|production|12/03/50| 7000


5678 | sumit chakrobarty |d.g.m.| marketing |19/04/43| 6000
5423 | n.k.gupta |chairman| admin |30/08/56| 5400
1006 | chanchal singhvi |director| sales |03/09/38| 6700
6213 | karuna ganguly |g.m.| accounts |05/06/62| 6300
1265 | s.n. dasgupta |manager| sales |12/09/63| 5600
4290 | jayant choudhury |executive| production|07/09/50| 6000
6521 | lalit chowdury |director| marketing |26/09/45| 8200
3212 | shyam saksena |d.g.m.| accounts |12/12/55|6000
3564 | sudhir agarwal |executive| personnel |06/07/47| 7500
0110 | v.k.Agarwal |g.m.| marketing |31/12/40|9000

Case 3: a ^ that is not the first character of a character class is
an ordinary character, so [2^4] matches a 2, a ^ or a 4

grep "[2^4]" emp.lst

2233 | a.k.shukla |g.m.| sales|12/12/52| 6000


9876 | jai sharma |director|production|12/03/50| 7000

5678 | sumit chakrobarty |d.g.m.| marketing |19/04/43| 6000
2365 | barun sengupta |director| personnel |11/05/47| 7800
2345 | j. b. saxena |g.m.| marketing |12/03/45|8000
0110 | v.k.Agarwal |g.m.| marketing |31/12/40|9000
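
The three roles of the caret can be told apart with a short sketch. The file caret_sample.lst is hypothetical; its last record contains a literal ^ so that the third case has something to match.

```shell
# Hypothetical sample; the last record contains a literal caret.
cat > caret_sample.lst <<'EOF'
2233|a.k.shukla|sales|6000
9876|jai sharma|production|7000
5423|n.k.gupta|admin|5400
x^y|caret demo|none|1
EOF

grep '^2' caret_sample.lst       # anchor: lines beginning with 2
grep '^[^2]' caret_sample.lst    # anchor plus negated class: lines not beginning with 2
grep '[2^4]' caret_sample.lst    # ordinary character: lines containing 2, ^ or 4
```

The patterns are single-quoted so that the shell passes them to grep unchanged.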

 The Dollar character ($)


This metacharacter is used to search for and extract lines or
records that end with a specific pattern.

Example 1: grep "7...$" emp.lst

This command prints the details of all employees whose salary lies
between 7000 and 7999, since each such line ends with a 7 followed
by any three characters.
9876 | jai sharma |director|production|12/03/50| 7000
2365 | barun sengupta |director| personnel |11/05/47| 7800
3564 | sudhir agarwal |executive| personnel |06/07/47| 7500

Example 2: Suppose a file named pattern contains the following lines:

veeeent
Murthy
bdhsgsdhhd Murthy
dhgd
bbnsbns Murthy ndjhdjk

On execution of the command: grep "Murthy$" pattern

OUTPUT
Murthy
bdhsgsdhhd Murthy
NOTE: Only the lines which end with Murthy are displayed
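
Both examples can be reproduced with the sketch below. The file pattern_sample repeats the five lines shown above; sal_sample.lst is a hypothetical three-record file that ends each line with the salary.

```shell
# The five lines of the pattern file shown above.
cat > pattern_sample <<'EOF'
veeeent
Murthy
bdhsgsdhhd Murthy
dhgd
bbnsbns Murthy ndjhdjk
EOF
# Hypothetical records whose last field is the salary.
cat > sal_sample.lst <<'EOF'
9876|jai sharma|director|7000
1006|chanchal singhvi|director|6700
2365|barun sengupta|director|7800
EOF

grep 'Murthy$' pattern_sample    # the two lines that end with Murthy
grep '7...$' sal_sample.lst      # salaries 7000 and 7800, but not 6700
```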

The Dot

A . (dot) matches any single character except the newline
character. The shell uses the ? character to indicate that.
Eg: grep "2..." emp.lst
matches every line that contains a 2 followed by any three
characters.

Example: grep "^2..." emp.lst

OUTPUT
2233 | a.k.shukla |g.m.| sales|12/12/52| 6000
2365 | barun sengupta |director| personnel |11/05/47| 7800
2476 | anil aggarwal |manager| sales |01/05/59|5000
2345 | j. b. saxena |g.m.| marketing |12/03/45|8000

$grep "Murth." pattern


Murthy
bdhsgsdhhd Murthy
bbnsbns Murthy ndjhdjk

The asterisk (*)
The asterisk indicates zero or more occurrences of the previous
character, so it can be used to match a run of repeated characters.
Eg: g* implies nothing or g, gg, ggg, etc.

grep "[aA]gg*[ar][ar]wal" emp.lst

NOTE: [aA] indicates that the name can start with a or A
gg* indicates one g followed by zero or more g's
[ar][ar] indicates two characters, each of which is a or r
wal indicates the literal string wal

.* signifies any number of characters or none

Suppose we want to look for j. saxena but are not sure of what
precedes saxena; we can use .* for the part we do not know.
Eg: grep "j.*saxena" emp.lst

OUTPUT
2345 | j. b. saxena |g.m.| marketing |12/03/45|8000
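
The dot and the asterisk can be exercised on a small file as sketched below. The file re_sample.lst is a hypothetical five-record extract with three spellings of the same surname.

```shell
# Hypothetical five-record sample.
cat > re_sample.lst <<'EOF'
2476|anil aggarwal|manager
3564|sudhir agarwal|executive
0110|v.k.agrawal|g.m.
2345|j. b. saxena|g.m.
2233|a.k.shukla|g.m.
EOF

grep '^2...' re_sample.lst                  # lines starting with 2 and three more characters
grep '[aA]gg*[ar][ar]wal' re_sample.lst     # aggarwal, agarwal and agrawal
grep 'j.*saxena' re_sample.lst              # j, anything or nothing, then saxena
```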

The character class

A regular expression allows a group of characters to be enclosed
within a pair of [ ], in which case the match is performed for any
single character in the group.

grep "[aA]gg*[ar][ar]wal" emp.lst


OUTPUT
2476 | anil aggarwal |manager| sales |01/05/59|5000
3564 | sudhir agarwal |executive| personnel |06/07/47| 7500
0110 | v.k.Agarwal |g.m.| marketing |31/12/40|9000


 The pattern [a-zA-Z0-9] matches a single alphanumeric


character.
 When we use a range, make sure that the character on the left
of the hyphen has a lower ASCII value than the one on the
right.
 Negating a class: the ^ (caret) can be used to negate the
character class. When the character class begins with this
character, all characters other than the ones grouped in the
class are matched.

Specifying Pattern Locations (^ and $)

Most of the regular expression characters are used for
matching patterns, but there are two that can anchor a pattern at
the beginning or end of a line. Anchoring a pattern is often
necessary when it can occur in more than one place in a line and
we are interested in its occurrence only at a particular location.
The two characters which achieve this are:

^ (caret) for matching at the beginning of a line
$ (dollar) for matching at the end of a line

Eg: grep "^2" emp.lst


OUTPUT:

2233 | a.k.shukla |g.m.| sales|12/12/52| 6000


2365 | barun sengupta |director| personnel |11/05/47| 7800
2476 | anil aggarwal |manager| sales |01/05/59|5000
2345 | j. b. saxena |g.m.| marketing |12/03/45|8000

Example 3: grep "^[^2]" emp.lst

This command displays the records of all employees whose emp-id
does not start with 2.
OUTPUT:
9876 | jai sharma |director|production|12/03/50| 7000
5678 | sumit chakrobarty |d.g.m.| marketing |19/04/43| 6000
5423 | n.k.gupta |chairman| admin |30/08/56| 5400
1006 | chanchal singhvi |director| sales |03/09/38| 6700
6213 | karuna ganguly |g.m.| accounts |05/06/62| 6300
1265 | s.n. dasgupta |manager| sales |12/09/63| 5600
4290 | jayant choudhury |executive| production|07/09/50| 6000
6521 | lalit chowdury |director| marketing |26/09/45| 8200
3212 | shyam saksena |d.g.m.| accounts |12/12/55|6000
3564 | sudhir agarwal |executive| personnel |06/07/47| 7500
0110 | v.k.Agarwal |g.m.| marketing |31/12/40|9000

Example 4: To display only directories

ls -l | grep "^d"

OUTPUT

drwxrwxrwx 2 hog hog 4096 Mar 19 10:41 dir1


drwxr-xr-x 2 hog hog 4096 Mar 6 07:23 links
drwxrwxr-x 3 hog hog 4096 Mar 25 17:08 USPCLASS
drwxrwxr-x 2 hog hog 4096 Mar 25 17:01 USPCLASS1
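
A self-contained way to try this is sketched below. The directory demo_dir and the names inside it are hypothetical and are created only for the demonstration.

```shell
# Build a hypothetical directory with two subdirectories and one ordinary file.
mkdir -p demo_dir/dir1 demo_dir/links
touch demo_dir/notes.txt

# In a long listing the first character of each entry is the file type;
# d marks a directory, so only dir1 and links are selected.
ls -l demo_dir | grep '^d'
```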

SUMMARY OF THE USE OF CARET CHARACTER


Case 1: ^ at the beginning of the pattern, outside a character class
grep "^2..." emp.lst    // selects the lines beginning with 2
OUTPUT
2233 | a.k.shukla |g.m.| sales|12/12/52| 6000
2365 | barun sengupta |director| personnel |11/05/47| 7800
2476 | anil aggarwal |manager| sales |01/05/59|5000
2345 | j. b. saxena |g.m.| marketing |12/03/45|8000

Case 2: ^ as the first character inside a character class (negation)
Displays all lines except those starting with 2

grep "^[^2]" emp.lst

9876 | jai sharma |director|production|12/03/50| 7000
5678 | sumit chakrobarty |d.g.m.| marketing |19/04/43| 6000
5423 | n.k.gupta |chairman| admin |30/08/56| 5400
1006 | chanchal singhvi |director| sales |03/09/38| 6700
1265 | s.n. dasgupta |manager| sales |12/09/63| 5600
4290 | jayant choudhury |executive| production|07/09/50| 6000
6521 | lalit chowdury |director| marketing |26/09/45| 8200
3212 | shyam saksena |d.g.m.| accounts |12/12/55|6000
3564 | sudhir agarwal |executive| personnel |06/07/47| 7500

When meta characters lose their meaning

It is possible that some of these special characters actually
exist as part of the text. To match them literally we need to
escape them with a \ (backslash).
To look for g*, we use g\*
To look for [, we use \[
To look for .*, we use \.\*
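
The effect of escaping can be seen in a sketch like the following. The file meta_sample.txt is hypothetical and contains the metacharacters as ordinary text.

```shell
# Hypothetical text in which *, [ and . appear literally.
cat > meta_sample.txt <<'EOF'
g* is a pattern
gggg
array[0]
a.*b
axyzb
EOF

grep 'g\*' meta_sample.txt     # literal g followed by a literal *: first line only
grep '\[' meta_sample.txt      # literal [: array[0]
grep '\.\*' meta_sample.txt    # literal .*: a.*b only
grep 'a.*b' meta_sample.txt    # unescaped: both a.*b and axyzb
```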

Extended Regular Expression (ERE) and grep

grep also supports extended regular expressions. Some of the
metacharacters that we use are

+     Matches one or more occurrences of the previous character
?     Matches zero or one occurrence of the previous character
b+    Matches b, bb, bbb, etc.
b?    Matches either a single instance of b or nothing

These characters restrict the scope of match as compared to the *

To run a command which uses an ERE, we should use the -E option,
which makes grep treat the pattern as an ERE. If the current
version of grep doesn't support -E, use egrep instead, without
the -E option.

Eg: grep -E "[aA]gg?arwal" emp.lst

OUTPUT:
2476 | anil aggarwal |manager|sales |01/05/59|5000
3564 | sudhir agarwal |executive| personnel |06/07/47|7500
0110 | v.k.Agarwal |g.m.|marketing |31/12/40|9000

Eg 2: egrep "[aA]gg+arwal" emp.lst

OUTPUT:
2476 | anil aggarwal |manager|sales |01/05/59|5000

The ERE set

ch+          Matches one or more occurrences of character ch
ch?          Matches zero or one occurrence of character ch
exp1|exp2    Matches exp1 or exp2
(x1|x2)x3    Matches x1x3 or x2x3

Matching multiple patterns (|, ( and ))

Example:
grep -E 'sengupta|dasgupta' emp.lst

OUTPUT:

2365 | barun sengupta |director| personnel |11/05/47|7800


1265 | s.n. dasgupta |manager|sales |12/09/63|5600

This locates both names without using the -e option twice. We can
also factor out the common part of the expression:

grep -E '(sen|das)gupta' emp.lst


OUTPUT

2365 | barun sengupta |director| personnel |11/05/47|7800


1265 | s.n. dasgupta |manager|sales |12/09/63|5600
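
The ERE metacharacters can be compared on one small file, as in this sketch. The file ere_sample.lst is a hypothetical five-record extract.

```shell
# Hypothetical five-record sample.
cat > ere_sample.lst <<'EOF'
2476|anil aggarwal|manager
3564|sudhir agarwal|executive
2365|barun sengupta|director
1265|s.n. dasgupta|manager
5423|n.k.gupta|chairman
EOF

grep -E '[aA]gg?arwal' ere_sample.lst        # one or two g's: aggarwal and agarwal
grep -E '[aA]gg+arwal' ere_sample.lst        # at least two g's: aggarwal only
grep -E 'sengupta|dasgupta' ere_sample.lst   # either of two alternatives
grep -E '(sen|das)gupta' ere_sample.lst      # same two lines; n.k.gupta is not selected
```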
