Grep - Searching For A Pattern, Grep Options, Regular Expressions, Egrep and Fgrep
Unix Processes
PROCESS APIS
fork() and vfork()
NAME
fork - create a child process
SYNOPSIS
#include <unistd.h>
pid_t fork(void);
DESCRIPTION
The new process created by fork is called the child process.
This function is called once but returns twice.
The only difference in the returns is that the return value
in the child is 0, whereas the return value in the parent is
the process ID of the new child.
The reason the child's process ID is returned to the parent
is that a process can have more than one child, and there is
no function that allows a process to obtain the process IDs
of its children.
The reason fork returns 0 to the child is that a process can
have only a single parent, and the child can always call
getppid() to obtain the process ID of its parent. (Process ID
0 is reserved for use by the kernel, so it's not possible for
0 to be the process ID of a child.)
Both the child and the parent continue executing with the
instruction that follows the call to fork.
The child is a copy of the parent.
For example, the child gets a copy of the parent's data
space, heap, and stack.
Note that this is a copy for the child; the parent and the
child do not share these portions of memory.
Example programs:
Program 1
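/* Program to demonstrate the fork function. Program name: fork1.c */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    fork();                  /* parent and child both continue from here */
    printf("\n hello USP");  /* so this line is printed twice */
    return 0;
}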
Output :
$ cc fork1.c
$ ./a.out
Output :
$ cc fork4.c
$ ./a.out
The process id of the parent is 2393
The process id of the child is 3079
Program5-fork5.c
NAME
vfork - create a child process and block parent
SYNOPSIS
#include <sys/types.h>
#include <unistd.h>
pid_t vfork(void);
DESCRIPTION
The function vfork has the same calling sequence and same
return values as fork.
The vfork function is intended to create a new process when
the purpose of the new process is to exec a new program.
The vfork function creates the new process, just like fork,
but without copying the address space of the parent into the
child, as the child won't reference that address space;
the child simply calls exec (or exit) right after the vfork.
Instead, while the child is running and until it calls either
exec or exit, the child runs in the address space of the
parent. This optimization provides an efficiency gain on some
paged virtual-memory implementations of the UNIX System.
Another difference between the two functions is that vfork
guarantees that the child runs first, until the child calls
exec or exit. When the child calls either of these
functions, the parent resumes.
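Program 1: vfork.c
#include <sys/types.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    pid_t pid;
    int a = 5, b = 6;

    pid = vfork();
    if (pid < 0) {
        printf("ERROR\n");
        return 1;
    }
    if (pid == 0) {
        /* child: runs in the parent's address space */
        a++;
        b--;
        _exit(0);
    }
    /* reconstructed ending (assumption): the parent resumes here
       and sees the values changed by the child */
    printf("a = %d, b = %d\n", a, b);
    return 0;
}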
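A process can terminate normally in five ways:
Executing a return from the main function.
Calling the exit function.
Calling the _exit or _Exit function.
Executing a return from the start routine of the last thread in the process.
Calling the pthread_exit function from the last thread in the process.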
wait, waitpid
wait for process to change state; used by the parent to wait
for the child process to terminate or to retrieve the child
process exit status.
SYNOPSIS
#include <sys/types.h>
#include <sys/wait.h>
pid_t wait(int *wstatus);
pid_t waitpid(pid_t pid, int *wstatus, int options);
When a process terminates, either normally or abnormally, the
kernel notifies the parent by sending the SIGCHLD signal to
the parent. Because the termination of a child is an
asynchronous event - it can happen at any time while the
parent is running - this signal is the asynchronous
notification from the kernel to the parent.
The parent can choose to ignore this signal, or it can
provide a function that is called when the signal occurs: a
signal handler.
A process that calls wait or waitpid can:
◦ Block, if all of its children are still running
◦ Return immediately with the termination status of a child,
if a child has terminated and is waiting for its
termination status to be fetched
◦ Return immediately with an error, if it doesn't have any
child processes.
Macro                 Description
WIFEXITED(status)     Returns a nonzero value if the child terminated
                      normally (by calling exit, _exit, or _Exit, or by
                      returning from main), and 0 otherwise.
WEXITSTATUS(status)   Returns the low-order 8 bits of the argument that
                      the child passed to exit, _exit, or _Exit. This
                      should be called only if WIFEXITED returns a
                      nonzero value.
WIFSIGNALED(status)   Returns a nonzero value if the child terminated
                      abnormally, by receipt of a signal that it did
                      not catch.
WTERMSIG(status)      Returns the number of the signal that caused the
                      termination. This should be called only if
                      WIFSIGNALED returns a nonzero value.
WIFSTOPPED(status)    Returns a nonzero value if the child is currently
                      stopped due to job control.
WSTOPSIG(status)      Returns the number of the signal that stopped the
                      child. This should be called only if WIFSTOPPED
                      returns a nonzero value.
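#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid;
    int status;

    pid = fork();
    if (pid < 0) {
        printf("ERROR\n");
        return 1;
    }
    if (pid == 0) {
        printf("child process created \n");
        fflush(stdout);          /* _exit does not flush stdio buffers */
        _exit(15);
    }
    /* reconstructed ending (assumption): the parent collects and
       prints the child's exit status */
    wait(&status);
    if (WIFEXITED(status))
        printf("child exit status = %d\n", WEXITSTATUS(status));
    return 0;
}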
exec
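When a process calls one of the exec functions, that process is
completely replaced by the new program, and the new program starts
executing at its main function. The process ID does not change
across an exec, because a new process is not created; exec merely
replaces the current process - its text, data, heap, and stack
segments - with a brand new program from disk. There are 6 exec
functions:
#include <unistd.h>
int execl(const char *path, const char *arg0, ... /* (char *)0 */);
int execv(const char *path, char *const argv[]);
int execle(const char *path, const char *arg0, ... /* (char *)0, char *const envp[] */);
int execve(const char *path, char *const argv[], char *const envp[]);
int execlp(const char *file, const char *arg0, ... /* (char *)0 */);
int execvp(const char *file, char *const argv[]);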
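Program
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Run cmd in a child process and wait for it to finish.
   The child's exec call is not in the listing; running the command
   through /bin/sh is an assumption. */
void sys(const char *cmd)
{
    pid_t pid;

    pid = fork();
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        perror("execl");         /* reached only if exec failed */
        _exit(errno);
    } else if (pid > 0)
        waitpid(pid, NULL, 0);
}

int main(int argc, char *argv[])
{
    int i;

    for (i = 1; i < argc; i++)
        sys(argv[i]);            /* one child per command-line argument */
    printf("\n");
    exit(0);
}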
PIPE
• The pipe device file has no name and is deallocated once all the
processes close their file descriptors referring to the pipe.
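SYNOPSIS
#include <unistd.h>
int pipe(int fifo[2]);
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid;
    int fifo[2], status;
    char buf[80];
    ssize_t n;

    if (pipe(fifo) == -1) {      /* fifo[0]: read end, fifo[1]: write end */
        perror("pipe");
        return 1;
    }
    pid = fork();
    if (pid < 0) {
        printf("ERROR\n");
        return 2;
    }
    if (pid == 0) {
        /* child: writes to the pipe */
        close(fifo[0]);
        write(fifo[1], "USP", strlen("USP"));
        close(fifo[1]);
        exit(0);
    }
    /* parent: reads from the pipe */
    close(fifo[1]);
    n = read(fifo[0], buf, sizeof(buf) - 1);
    buf[n > 0 ? n : 0] = '\0';   /* terminate the string before printing */
    printf("%s", buf);
    close(fifo[0]);
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        return WEXITSTATUS(status);
    return 3;
}
OUTPUT
USP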
Process
Basics
A process is a program in execution. A process is said to be born
when the program starts execution and remains alive as long as the
program is active. After execution is complete, the process is
said to die.
The kernel is responsible for the management of the processes. It
determines the time and priorities that are allocated to processes
so that more than one process can share the CPU resources.
Just as files have attributes, so have processes. These attributes
are maintained by the kernel in a data structure known as the process
table. Two important attributes of a process are:
1. The Process-Id (PID): Each process is identified by a unique
integer called the PID, which is allocated by the kernel when the
process is born. The PID can be used to control a process.
2. The Parent PID (PPID): The PID of the parent is also available as
a process attribute.
The Shell Process
Examples
$ ps
  PID TTY          TIME CMD
 4245 pts/7    00:00:00 bash
 5314 pts/7    00:00:00 ps
The output shows the header specifying the PID, the terminal
(TTY), the cumulative processor time (TIME) that has been consumed
since the process was started, and the process name (CMD).
$ ps -f
UID     PID   PPID  C STIME    TTY   TIME COMMAND
root    14931 136   0 08:37:48 ttys0 0:00 rlogind
sartin  14932 14931 0 08:37:50 ttys0 0:00 -sh
sartin  15339 14932 7 16:32:29 ttys0 0:00 ps -f
The header includes the following information:
UID - Login name of the user
PID - Process ID
PPID - Parent process ID
C - Processor utilization, used for scheduling
STIME - Starting time of the process in hours, minutes and seconds
TTY - Terminal ID number
TIME - Cumulative CPU time consumed by the process
COMMAND - The name of the command being executed
System processes (-e or -A)
Apart from the processes a user generates, a number of system
processes keep running all the time. To list them use,
$ ps -e
When the system moves to multiuser mode, init forks and execs a
getty for every active line.
Each of these gettys prints the login prompt on its terminal and
then goes off to sleep. When a user tries to log in, getty wakes up
and fork-execs the login program to verify the login name and
password entered. On successful login, login fork-execs the process
representing the login shell. init goes off to sleep, waiting for
the children to terminate. The processes getty and login overlay
themselves. When the user logs out, init is notified; it then wakes
up and spawns another getty for that line to monitor the next login.
Internal and External Commands
From the process viewpoint, the shell recognizes three types of
commands:
1. External commands: Commonly used commands like cat, ls etc. The
shell creates a process for each of these commands while remaining
their parent.
2. Shell scripts: The shell executes these scripts by spawning
another shell, which then executes the commands listed in the
script. The child shell becomes the parent of the commands that
feature in the script.
3. Internal commands: Commands that are built into the shell, such
as cd and echo. The shell does not create a process to run them.
Signals
Signals are software interrupts. Signals provide a way of handling
asynchronous events: a user at a terminal typing the interrupt key
to stop a program or the next program in a pipeline terminating
prematurely.
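When a signal is sent to a process, it remains pending until the
process handles it. The process can react to a pending signal in one
of three ways:
◦ Accept the default action of the signal, which for most signals
terminates the process.
◦ Ignore the signal.
◦ Catch the signal by invoking a user-defined function, the signal
handler.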
When a signal is generated for a process, the kernel will set the
corresponding signal flag in the process table slot of the
recipient process.
If the recipient process is asleep, the kernel will awaken the
process by scheduling it.
When the recipient process runs, the kernel will check the process
U-area that contains an array of signal handling specifications.
If the array entry for the signal contains a zero value, the process
will accept the default action of the signal.
#include<signal.h>
int kill(pid_t pid, int signal_num);
kill -f 19278
Note: 19278 is the PID of the process to be killed.
SIMPLE FILTERS
Filters are commands which accept data from standard input,
manipulate it and write the results to standard output.
$ cat emp.lst
2233 | a.k.shukla | g.m. | sales | 12/12/52 | 6000
9876 | jai sharma | director | production | 12/03/50 | 7000
5678 | sumit chakrobarty | d.g.m. | marketing | 19/04/43 | 6000
2365 | barun sengupta | director | personnel | 11/05/47 | 7800
5423 | n.k.gupta | chairman | admin | 30/08/56 | 5400
1006 | chanchal singhvi | director | sales | 03/09/38 | 6700
6213 | karuna ganguly | g.m. | accounts | 05/06/62 | 6300
1265 | s.n. dasgupta | manager | sales | 12/09/63 | 5600
4290 | jayant choudhury | executive | production | 07/09/50 | 6000
more: Paging output
more is a pager command offered by the UNIX system. It enables us
to view a large file one page at a time.
Eg: $ more richard.txt
OUTPUT
Stallman grew up in New York City and went to a local public
school. He was a math whiz and put in a class for bright children
where he learnt little bit more advanced math than what most
classes taught (but mostly learning on his own). He was impacted
early in his childhood by American history and American civics
where he learnt about the American Civil War which was fought to
abolish slavery. The civil rights movement in the US was hitting
its peak because after slavery was abolished, many states
practiced legal discrimination against black people. He recalls,
“At that time, they (black people) were campaigning to put an end
to that (legal discrimination) which was successful. Protests and,
in some cases, killing of non-violent activists by segregationists
and advocates of inequality were in news. Even though I wasn't
tremendously engaged at the age of 10 years, I got some of this.”
Eg: $ cat hello
how
r
u
$ wc hello
OUTPUT
3 3 8 hello
Options
OPTION   ACTION
-l       Displays the number of lines
-w       Displays the number of words
-c       Displays the number of characters
^H ^G ^L
On executing the command cat odfile, nothing will be displayed. To
print the non-printable characters, like Ctrl-H, Ctrl-G and Ctrl-L
separated by tabs as in the above example, we have to execute the
command
od odfile
OUTPUT
0000000 004410 004407 005014
0000006
The -b option displays the octal value of each character separately
od -b odfile
OUTPUT
0000000 010 011 007 011 014 012
0000006
To make sense of the above output, we generally combine -b with the
-c option
od -bc odfile
OUTPUT
0000000 010 011 007 011 014 012
\b \t \a \t \f \n
0000006
Tab character-011
Enter[\n]-012
\f(Formfeed)-014
cat dept.lst
01|accounts|6213
02|progs|5423
03|marketing|6521
04|personnel|2365
05|production|9876
06|sales|1006
01:accounts:6213
02:progs:5423
03:marketing:6521
04:personnel:2365
05:production:9876
06:sales:1006
(These six lines were in the file dept.lst.)
…blank lines…
pr options
Explanation
-k prints k (integer) columns
Consider a C program, pgm1.c.
On executing cat pgm1.c:
#include <stdio.h>
int main(void)
{
    int i;
    for (i = 0; i < 25; i++)
        printf("%d\n", i);   /* one number per line, 0 to 24 */
    return 0;
}
….blank lines………………..
0 5 10 15 20
1 6 11 16 21
4 9 14 19 24
$ ./a.out | pr -5 -h COPYWRITE
0 5 10 15 20
1 6 11 16 21
………………………………..
4 9 14 19 24
1 0 6 5 11 10 16 15 21 20
2 1 7 6 12 11 17 16 22 21
………………………………..
5 4 10 9 15 14 20 19 25 24
0 5 10 15 20
1 6 11 16 21
………………………………..
4 9 14 19 24
pr +10 chap01
starts printing from page 10
2015 -03-12 16:15 page10
OUTPUT
7 165 171
8 12 157
cmp: EOF on hello
3c3 changes line 3 with one line, which remains line 3 after the
change
comm: what is common?
The comm command not only identifies the differences, but also
displays the lines that the files have in common.
Options
comm -3 hello hello1 // Displays lines not common to both files
OUTPUT
u
you
nl command:Line Numbering
The nl command has elaborate schemes for numbering lines.
nl numbers only logical lines, those containing something apart
from the \n character.
Eg: $ nl hello
OUTPUT
1 how
2 r
3 u
vi `ls -t | head -n 1`
Opens the last edited file even if we don't recall the name.
Anupkumar|[email protected]|24569083
Vinodsharma|[email protected]|123456789
- are used in a circular manner, i.e. the first and second lines
are joined with a |, the second and third with a |, and the third
and fourth are separated by "\n".
NOTE: The entries with director are again sorted based on field 1
(empID).
Sorting on columns
Numeric sort
Suppose we have a file named numlist which contains only
numbers. When we sort the file, we get a curious result: the
numbers are not displayed in numeric order. That is because
sort orders them by the ASCII collating sequence. To override
this and get the numbers in numeric order, use the -n option:
$ sort -n numlist
Will give the numbers in sorted form
grep command scans its input for a pattern and displays lines
containing the pattern, the line numbers or filenames where the
pattern occurs. It is a principal member of a special family in
UNIX for handling search requirements.
Example:
$ grep director emp.lst shortlist
OUTPUT
grep options
-v option
-v(inverse option) selects all lines except those containing the
pattern
Example:
grep -v 'director' emp.lst
OUTPUT:
2233 | a.k.shukla |g.m.| sales|12/12/52| 6000
5678 | sumit chakrobarty |d.g.m.| marketing |19/04/43| 6000
5423 | n.k.gupta |chairman| admin |30/08/56| 5400
6213 | karuna ganguly |g.m.| accounts |05/06/62| 6300
1265 | s.n. dasgupta |manager| sales |12/09/63| 5600
4290 | jayant choudhury |executive| production|07/09/50| 6000
grep -n marketing emp.lst
OUTPUT:
3:5678 | sumit chakrobarty |d.g.m.| marketing |19/04/43| 6000
11:6521 | lalit chowdury |director| marketing |26/09/45| 8200
14:2345 | j. b. saxena |g.m.| marketing |12/03/45|8000
15:0110 | v.k.agrawal |g.m.| marketing |31/12/40|9000
3, 11, 14 and 15 are the numbers of the lines where the pattern
occurs, each separated from the actual line by a :.
OUTPUT:
4 // we have four occurrences of director in emp.lst
OUTPUT
7
OUTPUT
desig.lst
emp.lst
tempemp.lst
We can keep all the patterns in a separate file, one pattern per
line. We can then search for them using the -f option, giving the
pattern file followed by the name of the file to be searched.
Symbols or Expression   Matches
*                       Zero or more occurrences of the previous character
g*                      nothing or g, gg, ggg, etc.
.                       A single character
.*                      nothing or any number of characters
[pqr]                   a single character p, q or r
[c1-c2]                 a single character within the ASCII range
                        represented by c1 and c2
[1-3]                   A digit between 1 and 3
[^pqr]                  A single character which is not p, q or r
[^a-zA-Z]               A non-alphabetic character
^pat                    Pattern pat at the beginning of the line
pat$                    Pattern pat at the end of the line
OUTPUT
2233 | a.k.shukla |g.m.| sales|12/12/52| 6000
2365 | barun sengupta |director| personnel |11/05/47| 7800
2476 | anil aggarwal |manager| sales |01/05/59|5000
2345 | j. b. saxena |g.m.| marketing |12/03/45|8000
OUTPUT
veeeent
Murthy
bdhsgsdhhd Murthy
dhgd
bbnsbns Murthy ndjhdjk
The Dot
OUTPUT
2233 | a.k.shukla |g.m.| sales|12/12/52| 6000
2365 | barun sengupta |director| personnel |11/05/47| 7800
2476 | anil aggarwal |manager| sales |01/05/59|5000
2345 | j. b. saxena |g.m.| marketing |12/03/45|8000
The asterisk(*)
The asterisk (*) indicates zero or more occurrences of the previous
character.
Eg: g* implies nothing or g, gg, ggg, etc.
OUTPUT
2345 | j. b. saxena |g.m.| marketing |12/03/45|8000
OUTPUT
OUTPUT:
2476 | anil aggarwal |manager|sales |01/05/59|5000
3564 | sudhir agarwal |executive| personnel |06/07/47|7500
0110 | v.k.Agarwal |g.m.|marketing |31/12/40|9000
Example:
grep -E 'sengupta|dasgupta' emp.lst