CSE-IV UNIX and Shell Programming (10CS44) - Notes
Subject Code: 10CS44                    I.A. Marks: 25
PART A                                  Exam Hours: 03
                                        Exam Marks: 100

UNIT 1                                                                6 Hours
The Unix Operating System, The UNIX Architecture and Command Usage, The File System
UNIT 2: 6 Hours    UNIT 3: 7 Hours    UNIT 4: 7 Hours    UNIT 5: 6 Hours
UNIT 6: 6 Hours    UNIT 7: 7 Hours    UNIT 8: 7 Hours
Table of contents

Sl No.   Unit Description                                    Page No.
1        UNIT 1: The Unix Operating System                   1-19
2        UNIT 2: Basic File Attributes, the vi Editor        20-33
3        UNIT 3                                              34-62
4        UNIT 4                                              63-76
5        UNIT 5                                              77-87
6        UNIT 6                                              88-121
7        UNIT 7                                              122-143
8        UNIT 8                                              144-159
UNIT 1
1.1. The UNIX Operating System
Introduction
This chapter introduces you to the UNIX operating system. We first look at what an operating
system is, and then proceed to discuss the different features of UNIX that have made it a popular
operating system.
Objectives
System Utilities
The UNIX operating system allows complex tasks to be performed with a few keystrokes. It doesn't
tell or warn the user about the consequences of the command.
Kernighan and Pike (The UNIX Programming Environment) lamented long ago that as the
UNIX system has spread, the fraction of its users who are skilled in its application has
decreased. However, the capabilities of UNIX are limited only by your imagination.
Features of UNIX OS
Several features of UNIX have made it popular. Some of them are:
Portable
UNIX can be installed on many hardware platforms. Its widespread use can be traced to the
decision to develop it using the C language.
Multiuser
The UNIX design allows multiple users to concurrently share hardware and software.
Multitasking
UNIX allows a user to run more than one program at a time. In fact, more than one program can
be running in the background while a user is working in the foreground.
Networking
While UNIX was developed to be an interactive, multiuser, multitasking system, networking is
also incorporated into the heart of the operating system. Access to another system uses a standard
communications protocol known as Transmission Control Protocol/Internet Protocol (TCP/IP).
Organized File System
UNIX has a very organized file and directory system that allows users to organize and maintain
files.
Device Independence
UNIX treats input/output devices like ordinary files. The source or destination for file input and
output is easily controlled through a UNIX design feature called redirection.
Utilities
UNIX provides a rich library of utilities that can be used to increase user productivity.
needed to store them and the time needed to decode them - hence the tradition of short
UNIX commands we use today, e.g. ls, cp, rm, mv etc.
Ken Thompson then teamed up with Dennis Ritchie, the author of the first C compiler in
1973. They rewrote the UNIX kernel in C - this was a big step forwards in terms of the
system's portability - and released the Fifth Edition of UNIX to universities in 1974. The
Seventh Edition, released in 1978, marked a split in UNIX development into two main
branches: SYSV (System 5) and BSD (Berkeley Software Distribution). BSD arose from
the University of California at Berkeley where Ken Thompson spent a sabbatical year. Its
development was continued by students at Berkeley and other research institutions.
SYSV was developed by AT&T and other commercial companies. UNIX flavors based
on SYSV have traditionally been more conservative, but better supported than BSD-based flavors.
Until recently, UNIX standards were nearly as numerous as its variants. In early days, AT&T
published a document called System V Interface Definition (SVID). X/OPEN (now The Open
Group), a consortium of vendors and users, had one too, in the X/Open Portability Guide (XPG).
In the US, yet another set of standards, named Portable Operating System Interface for Computer
Environments (POSIX), was developed at the behest of the Institute of Electrical and
Electronics Engineers (IEEE).
In 1998, X/OPEN and IEEE undertook an ambitious program of unifying the two standards. In
2001, this joint initiative resulted in a single specification called the Single UNIX Specification,
Version 3 (SUSV3), that is also known as IEEE 1003.1:2001 (POSIX.1). In 2002, the
International Organization for Standardization (ISO) approved SUSV3 and IEEE 1003.1:2001.
Some of the commercial UNIX flavors based on System V are:
IBM's AIX
Hewlett-Packard's HP-UX
SCO's Open Server Release 5
Silicon Graphics' IRIX
DEC's Digital UNIX
Sun Microsystems' Solaris 2
1.2. The UNIX Architecture and Command Usage
Introduction
In order to understand the subsequent chapters, we first need to understand the architecture of
UNIX and the concept of division of labor between two agencies viz., the shell and the kernel.
This chapter introduces the architecture of UNIX. Next we discuss the rich collection of UNIX
command set, with a specific discussion of command structure and usage of UNIX commands.
We also look at the man command, used for obtaining online help on any UNIX command.
Sometimes the keyboard sequences don't work, in which case you need to know what to do to
fix them. The final topic of this chapter is troubleshooting some terminal problems.
Objectives
(Figure: the layered UNIX architecture - Users interact with the Shell, the Shell talks to the Kernel through System Calls, and the Kernel drives the Hardware.)
The UNIX architecture comprises two major components, viz. the shell and the kernel. The
kernel interacts with the machine's hardware, and the shell with the user.
The kernel is the core of the operating system. It is a collection of routines written in C. It is
loaded into memory when the system is booted and communicates directly with the hardware.
User programs that need to access the hardware use the services of the kernel via system
calls, and the kernel performs the job on behalf of the user. The kernel is also responsible for
managing the system's memory, scheduling processes and deciding their priorities.
The shell performs the role of command interpreter. Even though there's only one kernel running
on the system, there could be several shells in action, one for each user who's logged in. The shell
is responsible for interpreting the meaning of metacharacters, if any, found on the command line
before dispatching the command to the kernel for execution.
Locating Files
All UNIX commands are single words like ls, cd, cat, etc. These names are in lowercase. These
commands are essentially files containing programs, mainly written in C. Files are stored in
directories, and so are the binaries associated with these commands. You can find the location of
an executable program using type command:
$ type ls
ls is /bin/ls
This means that when you execute the ls command, the shell locates this file in the /bin directory
and makes arrangements to execute it.
The Path
The sequence of directories that the shell searches to look for a command is specified in its own
PATH variable. These directories are colon separated. When you issue a command, the shell
searches this list in the sequence specified to locate and execute it.
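A quick way to inspect this search list is to echo the variable (the directories shown here are only
illustrative; the actual value differs from system to system):
$ echo $PATH
/bin:/usr/bin:/usr/local/bin:.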
When a command exists both as an external program and as a shell built-in of the same name, the
shell gives top priority to its own internal command. Some built-in commands are echo, pwd, etc.
Command Structure
UNIX commands take the following general form:
verb [options] [arguments]
where verb is the command name that can take a set of optional options and one or more optional
arguments.
Commands, options and arguments have to be separated by spaces or tabs to enable the shell to
interpret them as words. A contiguous string of spaces and tabs together is called a whitespace.
The shell compresses multiple occurrences of whitespace into a single whitespace.
Options
An option is preceded by a minus sign (-) to distinguish it from filenames.
Example: $ ls -l
There must not be any whitespace between the - and l. Options are also arguments, but are given a
special name because they are predetermined. Options can normally be combined with only one -
sign. i.e., instead of using
$ ls -l -a -t
we can as well use,
$ ls -lat
Because UNIX was developed by people who had their own ideas as to what options should look
like, there will be variations in the options. Some commands use + as an option prefix instead of -.
Filename Arguments
Many UNIX commands use a filename as argument so that the command can take input from the
file. If a command uses a filename as argument, it will usually be the last argument, after all
options.
Example:
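For instance, a command of this form (the filename is only illustrative):
$ wc -l sample.txt
Here the -l option comes first and the filename sample.txt, from which wc reads its input, is the
last argument.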
The command with its options and arguments is known as the command line, which is considered
complete after the [Enter] key is pressed, so that the entire line is fed to the shell as its input for
interpretation and execution.
Exceptions
Some commands in UNIX like pwd do not take any options or arguments. Some commands like
who may or may not be specified with arguments. The ls command can run without arguments
(ls), with only options (ls -l), with only filenames (ls f1 f2), or using a combination of both
(ls -l f1 f2). Some commands compulsorily take options (cut). Some commands like grep and sed
can take an expression as an argument, or a set of instructions as argument.
Combining Commands
Instead of executing commands on separate lines, where each command is processed and
executed before the next could be entered, UNIX allows you to specify more than one command
in the single command line. Each command has to be separated from the other by a ; (semicolon).
wc sample.txt ; ls -l sample.txt
You can even group several commands together so that their combined output is redirected to a
file.
(wc sample.txt ; ls -l sample.txt) > newfile
When a command line contains a semicolon, the shell understands that the command on each side
of it needs to be processed separately. Here ; is known as a metacharacter.
Note: When a command overflows into the next line or needs to be split into multiple lines, just
press enter, so that the secondary prompt (normally >) is displayed and you can enter the
remaining part of the command on the next line.
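A minimal illustration (the text is arbitrary) - a command left incomplete at the end of a line makes
the shell put up the secondary prompt until the command is completed:
$ echo "This is
> a two-line command"
This is
a two-line command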
A pager is a program that displays one screenful of information and pauses for the user to view the
contents. The user can make use of internal commands of the pager to scroll up and scroll down
the information. The two popular pagers are more and less. more is Berkeley's pager, which
is a superior alternative to the original pg command. less is the standard pager used on Linux
systems. less is modeled after a popular editor called vi and is more powerful than more as it
provides vi-like navigational and search facilities. We can use pagers with commands like ls |
more. The man command is configured to work with a pager.
wc(1)
NAME
wc - displays a count of lines, words and characters in a file
SYNOPSIS
wc [-c | -m | -C] [-lw] [file ...]
DESCRIPTION
The wc utility reads one or more input files and, by default, writes the number of
newline characters, words and bytes contained in each input file to the standard output. The
utility also writes a total count for all named
files, if more than one input file is specified.
OPTIONS
The following options are supported:
-c
Count bytes.
-m
Count characters.
-C
Same as -m.
-l
Count lines.
-w
Count words.
OPERANDS
The following operand is supported:
file
A path name of an input file. If no file operands are specified, the
standard input will be used.
EXIT STATUS
See largefile(5) for the description of the behavior of wc when encountering files
greater than or equal to 2 Gbytes (2**31 bytes).
SEE ALSO
cksum(1),
A man page is divided into a number of compulsory and optional sections. Not every command's
page needs all sections, but the first three (NAME, SYNOPSIS and DESCRIPTION) are
generally seen in all man pages. NAME presents a one-line introduction of the command.
SYNOPSIS shows the syntax used by the command and DESCRIPTION provides a detailed
description.
The SYNOPSIS follows certain conventions and rules:
If a command argument is enclosed in rectangular brackets, then it is optional; otherwise,
the argument is required.
The ellipsis (a set of three dots) implies that there can be more instances of the preceding
word.
The | means that only one of the options shown on either side of the pipe can be used.
All the options used by the command are listed in OPTIONS section. There is a separate section
named EXIT STATUS which lists possible error conditions and their numeric representation.
Note: You can use man command to view its own documentation ($ man man). You can also set
the pager to use with man ($ PAGER=less ; export PAGER). To understand which pager is being
used by man, use $ echo $PAGER.
The following table shows the organization of man documentation.
Section     Subject (SVR4)               Subject (Linux)
1           User programs                User programs
3           Library functions            Library functions
5           Miscellaneous                Miscellaneous
6           Games                        Games
1M          Administration commands      Administration commands (Section 8)

For example, nawk is documented in nawk(1) and ftp in ftp(1), while the daemon ftpd is
documented in in.ftpd(1m). A reference of the form cp(1) - copy files thus shows both the section
and the one-line description of the command.
When Things Go Wrong
Terminals and keyboards have no uniform behavioral pattern. Terminal settings directly impact
the keyboard operation. If you observe a different behavior from that expected, when you press
certain keystrokes, it means that the terminal settings are different. In such cases, you should
know which keys to press to get the required behavior. The following table lists keyboard
commands to try when things go wrong.
Keystroke or command     Function
[Ctrl-h]                 Erases text
[Ctrl-c] or Delete       Interrupts a command
[Ctrl-d]                 Terminates the login session or a program that expects its input from the keyboard
[Ctrl-s]                 Stops scrolling of screen output and locks the keyboard
[Ctrl-q]                 Resumes scrolling of screen output and unlocks the keyboard
[Ctrl-u]                 Kills the command line without executing it
[Ctrl-\]                 Kills running program but creates a core file containing the memory
                         image of the program
[Ctrl-z]                 Suspends a process and returns the shell prompt
[Ctrl-j]                 Alternative to [Enter]
[Ctrl-m]                 Alternative to [Enter]
stty sane                Restores the terminal to normal status
1.3. The File System
Introduction
In this chapter we will look at the file system of UNIX. We also look at the types of files and their
significance. We then look at two ways of specifying a file, viz. with absolute pathnames and
relative pathnames. A discussion on commands used with directory files viz., cd, pwd, mkdir,
rmdir and ls will be made. Finally we look at some of the important directories contained under
UNIX file system.
Objectives
Types of files
UNIX Filenames
Directories and Files
Absolute and Relative Pathnames
pwd print working directory
cd change directory
mkdir make a directory
rmdir remove directory
The PATH environmental variable
ls list directory contents
The UNIX File System
Types of files
A simple description of the UNIX system is this:
On a UNIX system, everything is a file; if something is not a file, it is a process.
A UNIX system makes no difference between a file and a directory, since a directory is just a file
containing names of other files. Programs, services, texts, images, and so forth, are all files. Input
and output devices, and generally all devices, are considered to be files, according to the system.
Most files are just files, called regular files; they contain normal data, for example text files,
executable files or programs, input for or output from a program and so on.
While it is reasonably safe to suppose that everything you encounter on a UNIX system is a file,
there are some exceptions.
Directories: files that are lists of other files.
Special files or Device Files: All devices and peripherals are represented by files. To read or write
a device, you have to perform these operations on its associated file. Most special files are in
/dev.
Links: a system to make a file or directory visible in multiple parts of the system's file tree.
(Domain) sockets: a special file type, similar to TCP/IP sockets, providing inter-process
networking protected by the file system's access control.
Named pipes: act more or less like sockets and form a way for processes to communicate with
each other, without using network socket semantics.
Directory File
A directory contains no data, but keeps details of the files and subdirectories that it contains. A
directory file contains one entry for every file and subdirectory that it houses. Each entry has two
components namely, the filename and a unique identification number of the file or directory
(called the inode number).
When you create or remove a file, the kernel automatically updates its corresponding directory by
adding or removing the entry (filename and inode number) associated with the file.
Device File
All the operations on the devices are performed by reading or writing the file representing the
device. It is advantageous to treat devices as files as some of the commands used to access an
ordinary file can be used with device files as well.
Device filenames are found in a single directory structure, /dev. A device file is not really a
stream of characters. It is the attributes of the file that entirely govern the operation of the device.
The kernel identifies a device from its attributes and uses them to operate the device.
Filenames in UNIX
On a UNIX system, a filename can consist of up to 255 characters. Files may or may not have
extensions and can consist of practically any ASCII character except the / and the Null character.
You are permitted to use control characters or other nonprintable characters in a filename.
However, you should avoid using these characters while naming a file. It is recommended that
only the following characters be used in filenames:
Alphabets and numerals.
The period (.), hyphen (-) and underscore (_).
UNIX imposes no restrictions on the extension. In all cases, it is the application that imposes that
restriction. E.g., a C compiler expects C program filenames to end with .c, and Oracle requires SQL
scripts to have the .sql extension.
A file can have any number of dots embedded in its name. A filename can also begin with or end
with a dot.
UNIX is case sensitive; cap01, Chap01 and CHAP01 are three different filenames that can
coexist in the same directory.
$ echo $HOME
/home/kumar
What you see above is an absolute pathname, which is a sequence of directory names starting
from root (/). The subsequent slashes are used to separate the directories.
cd - change directory
You can change to a new directory with the cd, change directory, command. cd will accept both
absolute and relative path names.
Syntax
cd [directory]
Examples
cd          changes to the user's home directory
cd /        changes to the root directory
cd ..       changes to the parent directory
Examples
mkdir patch
Note the order of specifying arguments when creating a directory tree, as in the sketch below. The
parent directory should be specified first, followed by the subdirectories to be created under it.
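A sketch of such a command (the directory names are only illustrative):
mkdir progs progs/include progs/lib
This creates progs first, and then the subdirectories include and lib under it.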
The system may refuse to create a directory due to the following reasons:
1. The directory already exists.
2. There may be an ordinary file by the same name in the current directory.
3. The permissions set for the current directory don't permit the creation of files and directories
by the user.
E.g.
rmdir patch
A command runs in UNIX by executing a disk file. When you specify a command like date, the
system will locate the associated file from a list of directories specified in the PATH variable and
then executes it. The PATH variable normally includes the current directory also.
Whenever you enter any UNIX command, you are actually specifying the name of an executable
file located somewhere on the system. The system goes through the following steps in order to
determine which program to execute:
1. Built in commands (such as cd and history) are executed within the shell.
2. If an absolute path name (such as /bin/ls) or a relative path name (such as ./myprog) is given,
the system executes the program from the specified directory.
3. Otherwise, the directories listed in the PATH variable are searched in sequence, and the first
matching program found is executed.
The mode field is given by the -l option and consists of 10 characters. The first character is one
of the following:
CHARACTER     IF ENTRY IS A
d             directory
-             plain file
l             symbolic link
s             socket
The next 9 characters are in 3 sets of 3 characters each. They indicate the file access permissions:
the first 3 characters refer to the permissions for the user, the next three for the users in the Unix
group assigned to the file, and the last 3 to the permissions for other users on the system.
Designations are as follows:
r read permission
w write permission
x execute permission
- no permission
Examples
1. To list the files in a directory:
$ ls
2. To list all files in a directory, including the hidden (dot) files:
$ ls -a
3. To get a long listing:
$ ls -al
total 24
drwxr-sr-x 5 workshop acs 512 Jun 7 11:12 .
drwxr-xr-x 6 root sys 512 May 29 09:59 ..
-rwxr-xr-x 1 workshop acs 532 May 20 15:31 .cshrc
Directory    Content
/bin         Common programs, shared by the system, the system administrator and the users.
/dev         Contains references to all the CPU peripheral hardware, which are represented as files with
             special properties.
/etc         Most important system configuration files are in /etc; this directory contains data similar to
             those in the Control Panel in Windows.
/home        Home directories of the common users.
/lib         Library files, includes files for all kinds of programs needed by the system and the users.
/sbin        Programs for use by the system and the system administrator.
/tmp         Temporary space for use by the system, cleaned upon reboot, so don't use this for saving any
             work!
/usr         Programs, libraries, documentation etc. for all user-related programs.
/var         Storage for variable and temporary files created by users and the system (log files, the mail
             queue, the print spooler area, etc.).
UNIT 2
2.1. Basic File Attributes
The UNIX file system allows the user to access other files not belonging to them and without
infringing on security. A file has a number of attributes (properties) that are stored in the inode. In
this chapter, we discuss,
The file type and its permissions are associated with each file. Links indicate the number of
filenames maintained by the system for the file; this does not mean that there are that many copies
of the file. A file is created by its owner, and every user is attached to a group owner. The file size
in bytes is displayed next. The last modification time is the next field; if you change only the
permissions or ownership of the file, the modification time remains unchanged. The last field
displays the file name.
For example,
$ ls -l
total 72
-rw-r--r--
-rw-r--r--
-rw-rw-rw-
1 kumar metal
Directories are easily identified in the listing by the first character of the first column,
which here shows a d. The significance of the attributes of a directory differs a good deal from
that of an ordinary file. To see the attributes of a directory rather than the files contained in it, use
ls -ld with the directory name. Note that simply using ls -d will not list all subdirectories in the
current directory. Strange though it may seem, ls has no option to list only directories.
File Ownership
When you create a file, you become its owner. Every owner is attached to a group owner.
Several users may belong to a single group, but the privileges of the group are set by the owner of
the file and not by the group members. When the system administrator creates a user account, he
has to assign these parameters to the user:
The user-id (UID) - both its name and numeric representation
The group-id (GID) - both its name and numeric representation
File Permissions
UNIX follows a three-tiered file protection system that determines a file's access rights.
It is displayed in the following format:
Filetype owner (rwx) groupowner (rwx) others (rwx)
For Example:
-rwxr-xr-- 1 kumar metal 20500 may 10 19:21 chap02
rwx     owner/user
r-x     group owner
r--     others
The first group has all three permissions. The file is readable, writable and executable by
the owner of the file. The second group has a hyphen in the middle slot, which indicates the
absence of write permission by the group owner of the file. The third group has the write and
execute bits absent. This set of permissions is applicable to others.
You can set different permissions for the three categories of users - owner, group and
others. It's important that you understand them, because a little learning here can be a dangerous
thing. A faulty file permission is a sure recipe for disaster.
Relative Permissions
chmod only changes the permissions specified in the command line and leaves the other
permissions unchanged. Its syntax is:
chmod category operation permission filename(s)
chmod takes an expression as its argument which contains:
user category (user, group, others)
operation to be performed (assign or remove a permission)
type of permission (read, write, execute)
Category        Operation       Permission
u - user        + assign        r - read
g - group       - remove        w - write
o - others      = absolute      x - execute
a - all (ugo)
Let us discuss some examples:
Initially,
-rw-r--r--    1   kumar   ...   23:38 xstart

chmod u+x xstart

-rwxr--r--    1   kumar   ...   23:38 xstart

The command assigns (+) execute (x) permission to the user (u); other permissions remain
unchanged. To assign execute permission to all three categories, any of the following forms may
be used:

chmod ugo+x xstart
or
chmod a+x xstart
or
chmod +x xstart

-rwxr-xr-x    1   kumar   ...   23:38 xstart
Absolute Permissions
Here, we need not know the current file permissions. We can set all nine permissions
explicitly. A string of three octal digits is used as an expression. The permission can be
represented by one octal digit for each category. For each category, we add the octal digits. If we
represent the permissions of each category by one octal digit, this is how the permission can be
represented:
Octal     Permissions     Significance
0         ---             no permissions
1         --x             execute only
2         -w-             write only
3         -wx             write and execute
4         r--             read only
5         r-x             read and execute
6         rw-             read and write
7         rwx             read, write and execute
We have three categories and three permissions for each category, so three octal digits
can describe a file's permissions completely. The most significant digit represents the user and the
least significant one represents others. chmod can use this three-digit string as the expression.
Using relative permission, we have,
chmod a+rw xstart
Using absolute permission, we have,
chmod 666 xstart
which assigns read and write permissions to all three categories. Similarly,
chmod 644 xstart
assigns read and write permissions to the owner, and read-only permission to the group and others, and
chmod 761 xstart
will assign all permissions to the owner, read and write permissions for the group and only
execute permission to the others.
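The octal value for each category is simply the sum of 4 (read), 2 (write) and 1 (execute). For 761,
for instance:
owner  : 4 + 2 + 1 = 7   (rwx)
group  : 4 + 2     = 6   (rw-)
others :         1 = 1   (--x)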
777 signifies all permissions for all categories, but we can still prevent such a file from being
deleted. 000 signifies absence of all permissions for all categories, but we can still delete such a
file. It is the directory permissions that determine whether a file can be deleted or not. Only the
owner can change a file's permissions; a user cannot change the permissions of files owned by
other users. The system administrator, however, can do anything.
The Security Implications
Let the default permission for the file xstart be
-rw-r--r--
chmod u-rw,go-r xstart
(or, equivalently, chmod 000 xstart) removes all the permissions from all three categories.
Directory Permissions
It is possible that a file cannot be accessed even though it has read permission, and can be
removed even when it is write protected. The default permissions of a directory are,
rwxr-xr-x (755)
A directory must never be writable by group and others
Example:
mkdir c_progs
ls -ld c_progs
drwxr-xr-x
If a directory has write permission for group and others also, be assured that every user
can remove every file in the directory. As a rule, you must not make directories universally
writable unless you have definite reasons to do so.
chown
Changing ownership requires superuser permission, so use the su command.
ls -l note
-rwxr----x
Once ownership of the file has been given away to sharma, the user file permissions that
previously applied to kumar now apply to sharma. Thus, kumar can no longer edit note since
there is no write privilege for group and others. He cannot get back the ownership either. But he
can copy the file to his own directory, in which case he becomes the owner of the copy.
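A minimal sketch of the command itself, using the names from the example above (to be run by
the superuser):
chown sharma note
ls -l note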
chgrp
This command changes the file's group owner. No superuser permission is required.
ls -l dept.lst
-rw-r--r--
-rw-r--r--
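A sketch of typical usage (the group name dba is only illustrative):
$ chgrp dba dept.lst
$ ls -l dept.lst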
In this chapter we considered two important file attributes - permissions and ownership.
After we complete the first round of discussions related to files, we will take up the other file
attributes.
vi Basics
To add some text to a file, we invoke,
vi <filename>
In all probability, the file doesn't exist, and vi presents you a full screen with the filename
shown at the bottom with the qualifier [New File]. The cursor is positioned at the top and all
remaining lines of the screen show a ~. They are non-existent lines. The last line is reserved for commands that
you can enter to act on text. This line is also used by the system to display messages. This is the
command mode. This is the mode where you can pass commands to act on text, using most of the
keys of the keyboard. This is the default mode of the editor where every key pressed is interpreted
as a command to run on text. You will have to be in this mode to copy and delete text.
For text editing, vi uses 24 out of the 25 lines that are normally available in the terminal. To
enter text, you must switch to the input mode. First press the key i, and you are in this mode ready
to input text. Subsequent key depressions will then show up on the screen as text input.
After text entry is complete, the cursor is positioned on the last character of the last line.
This is known as current line and the character where the cursor is stationed is the current cursor
position. This mode is used to handle files and perform substitution. After the command is run,
you are back to the default command mode. If a word has been misspelled, use ctrl-w to erase the
entire word.
Now press the esc key to revert to command mode. Press it again and you will hear a beep. A
beep in vi indicates that a key has been pressed unnecessarily. Actually, the text entered has not
been saved on disk yet but exists in some temporary storage called a buffer. To save the entered
text, you must switch to the execute mode (the last line mode). Invoke the execute mode from the
command mode by entering a : (colon), which shows up in the last line.
Input mode command      FUNCTION
i                       inserts text
a                       appends text

Ex mode command         Action
:w                      saves the file and remains in editing mode
:x                      saves the file and quits editing mode
:wq                     saves the file and quits editing mode
:w <filename>           save as
:w! <filename>          save as, overwriting an existing file
:q                      quits editing mode
:q!                     quits editing mode, abandoning any changes made
:sh                     escapes to the UNIX shell
:recover                recovers a file from a crash
Navigation
A command mode command doesn't show up on screen but simply performs a function.
To move the cursor in the four directions,
h    moves the cursor left
j    moves the cursor down
k    moves the cursor up
l    moves the cursor right
Word Navigation
Moving by one character is not always enough. You will often need to move faster along a
line. vi understands a word as a navigation unit which can be defined in two ways, depending on
the key pressed. If your cursor is a number of words away from your desired position, you can
use the word-navigation commands to go there directly. There are three basic commands:
w    moves the cursor forward by one word
b    moves the cursor back by one word
e    moves the cursor to the end of the current word
Example,
5b takes the cursor 5 words back
3w takes the cursor 3 words forward
Scrolling
Faster movement can be achieved by scrolling text in the window using the control keys.
The two commands for scrolling a page at a time are
ctrl-f    scrolls a page forward
ctrl-b    scrolls a page backward
and the commands for scrolling half a page at a time are
ctrl-d    scrolls half a page forward
ctrl-u    scrolls half a page backward
Absolute Movement
The editor displays the current line number and the total number of lines in the last line when you
press ctrl-g.
40G    goes to line number 40
1G     goes to the first line of the file
Editing Text
The editing facilities in vi are very elaborate and involve the use of operators, such as,
d    delete
y    yank (copy)
Deleting Text
x      deletes the character at the cursor position
dd     deletes the current line
6dd    deletes the current line and the next five lines
Moving Text
Moving text (p) puts the text at the new location.
p and P place text on the right and left of the cursor only when you delete parts of lines. But the
same keys get associated with "below" and "above" when you delete complete lines.
Copying Text
Copying text (y and p) is achieved as,
yy      copies the current line
10yy    copies the current line and the nine lines below it
Joining Lines
J     joins the next line with the current line
4J    joins the next three lines with the current line
Pattern Search
/pattern    searches forward for the first occurrence of the pattern
?pattern    searches backward for the most previous instance of the pattern
n doesn't necessarily repeat a search in the forward direction. The direction depends on
the search command used. If you used ?printf to search in the reverse direction in the first place,
then n also follows the same direction. In that case, N will repeat the search in the forward
direction, and not n.
Command    Function
n          repeats the search in the same direction along which the previous search was made
N          repeats the search in the direction opposite to that along which the previous search was made
Substitution - the ex mode s command replaces one string with another:
:3,10s/director/member/g    substitutes director with member in lines 3 through 10
:.s/director/member/g       substitutes only in the current line
:$s/director/member/g       substitutes only in the last line
Interactive substitution: sometimes you may like to selectively replace a string. In that case, add
the c parameter as the flag at the end:
:1,$s/director/member/gc
Each line is selected in turn, followed by a sequence of carets in the next line, just below the
pattern that requires substitution. The cursor is positioned at the end of this caret sequence,
waiting for your response.
The ex mode is also used for substitution. Both search and replace operations also use
regular expressions for matching multiple patterns.
The features of vi editor that have been highlighted so far are good enough for a beginner
who should not proceed any further before mastering most of them. There are many more
functions that make vi a very powerful editor. Can you copy three words or even the entire file
using simple keystrokes? Can you copy or move multiple sections of text from one file to another
in a single file switch? How do you compile your C and Java programs without leaving the
editor? vi can do all this.
UNIT 3
3.1. The Shell
Introduction
In this chapter we will look at one of the major components of the UNIX architecture - the shell.
The shell acts both as a command interpreter and as a programming facility. We will look at the
interpretive nature of the shell in this chapter.
Objectives
The shell issues the prompt and waits for you to enter a command.
After a command is entered, the shell scans the command line for metacharacters and
expands abbreviations (like the * in rm *) to recreate a simplified command line.
It then passes on the command line to the kernel for execution.
The shell waits for the command to complete and normally can't do any work while the
command is running.
After the command execution is complete, the prompt reappears and the shell returns to
its waiting role to start the next cycle. You are free to enter another command.
Pattern Matching - The Wild-cards
Wild-card       Matches
*               Any number of characters, including none
?               A single character
[ijk]           A single character - either an i, j or k
[x-z]           A single character that is within the ASCII range of the characters x and z
[!ijk]          A single character that is not an i, j or k (Not in C Shell)
[!x-z]          A single character that is not within the ASCII range of the characters x and z (Not in C Shell)
{pat1,pat2}     pat1 or pat2 (Not in Bourne shell)
Examples:
To list all files that begin with chap, use
$ ls chap*
To list all files whose filenames are six characters long and start with chap, use
$ ls chap??
Note: Both * and ? operate with some restrictions. For example, the * doesn't match all files
beginning with a . (dot) or the / of a pathname. If you wish to list all hidden filenames in your
directory having at least three characters after the dot, the dot must be matched explicitly.
$ ls .???*
However, if the filename contains a dot anywhere but at the beginning, it need not be matched
explicitly.
Similarly, these characters dont match the / in a pathname. So, you cannot use
$ cd /usr?local to change to /usr/local.
To match the filenames chapx, chapy and chapz, use
$ ls chap[x-z]
You can negate a character class to reverse the matching criteria. For example,
- To match all filenames with a single-character extension but not the .c or .o files,
use *.[!co]
- To match all filenames that don't begin with an alphabetic character,
use [!a-zA-Z]*
To remove only the file actually named chap*, you cannot simply issue the command rm chap*,
as it will remove all files beginning with chap. Hence, to suppress the special meaning of *, use
the command rm chap\*
To list the contents of the file chap0[1-3], use
$ cat chap0\[1-3\]
A filename can contain a whitespace character also. Hence to remove a file named
My Document.doc, which has a space embedded, a similar reasoning should be followed:
$ rm My\ Document.doc
Quoting means enclosing the wild-card, or even the entire pattern, within quotes. Anything within
these quotes (barring a few exceptions) is left alone by the shell and not interpreted.
When a command argument is enclosed in quotes, the meanings of all enclosed special characters
are turned off.
Examples:
$ rm 'chap*'
$ rm 'My Document.doc'
Pipes
With piping, the output of a command can be used as input (piped) to a subsequent command.
$ command1 | command2
Output from command1 is piped into input for command2.
This is equivalent to, but more efficient than:
$ command1 > temp
$ command2 < temp
$ rm temp
Examples
$ ls -al | more
$ who | sort | lpr
Creating a tee
tee is an external command that handles a character stream by duplicating its input. It saves one
copy in a file and writes the other to standard output. It is also a filter and hence can be placed
anywhere in a pipeline.
Example: The following command sequence uses tee to display the output of who and saves this
output in a file as well.
$ who | tee users.lst
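Since tee is a filter, it can also sit in the middle of a pipeline; for example (the filename is
illustrative), the following saves the who output and also passes it on for counting:
$ who | tee users.lst | wc -l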
Command substitution
The shell enables the connecting of two commands in yet another way. While a pipe enables a
command to obtain its standard input from the standard output of another command, the shell
enables one or more command arguments to be obtained from the standard output of another
command. This feature is called command substitution.
Example:
$ echo "Current date and time is `date`"
Observe the use of backquotes around date in the above command. Here the output of the
command execution of date is taken as argument of echo. The shell executes the enclosed
command and replaces the enclosed command line with the output of the command.
Similarly the following command displays the total number of files in the working directory.
$ echo "There are `ls | wc -l` files in the current directory"
Observe the use of double quotes around the argument of echo. If you use single quotes, the
backquotes are not interpreted by the shell.
Shell variables
Environmental variables are used to provide information to the programs you use. You can have
both global environment and local shell variables. Global environment variables are set by your
login shell and new programs and shells inherit the environment of their parent shell. Local shell
variables are used only by that shell and are not passed on to other processes. A child process
cannot pass a variable back to its parent process.
To declare a local shell variable we use the form variable=value (no spaces around =) and its
evaluation requires the $ as a prefix to the variable.
Example:
$ count=5
$ echo $count
5
A variable can be removed with unset and protected from reassignment by readonly. Both are
shell internal commands.
Note: In C shell, we use set statement to set variables. Here, there either has to be whitespace on
both sides of the = or none at all.
$ set count=5
$ set size = 10
file=$base$ext
echo $file          # prints foo.c
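For completeness, a minimal sketch of the full sequence this fragment belongs to (the values of
base and ext are assumed from the printed result):
$ base=foo
$ ext=.c
$ file=$base$ext
$ echo $file
foo.c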
Objectives
Process Basics
ps: Process Status
Mechanism of Process Creation
Internal and External Commands
Process States and Zombies
Background Jobs
nice: Assigning execution priority
Processes and Signals
Job Control
at and batch: Execute Later
cron command: Running Jobs Periodically
time: Timing Usage Statistics at process runtime
Process Basics
UNIX is a multiuser and multitasking operating system. Multiuser means that several people can
use the computer system simultaneously (unlike a single-user operating system, such as MSDOS). Multitasking means that UNIX, like Windows NT, can work on several tasks concurrently;
it can begin work on one task and take up another before the first task is finished.
When you execute a program on your UNIX system, the system creates a special environment for
that program. This environment contains everything needed for the system to run the program as
if no other program were running on the system. Stated in other words, a process is created. A
process is a program in execution. A process is said to be born when the program starts execution
and remains alive as long as the program is active. After execution is complete, the process is said
to die.
The kernel is responsible for the management of the processes. It determines the time and
priorities that are allocated to processes so that more than one process can share the CPU
resources.
Just as files have attributes, so have processes. These attributes are maintained by the kernel in a
data structure known as process table. Two important attributes of a process are:
1. The Process-Id (PID): Each process is identified by a unique integer called the
PID, which is allocated by the kernel when the process is born. The PID can be used to
control a process.
2. The Parent PID (PPID): The PID of the parent is available as a process attribute.
There are three types of processes viz.,
1.
2.
3.
It may wait for the child to die so that it can spawn the next process. The death of the
child is intimated to the parent by the kernel. Shell is an example of a parent that waits
for the child to terminate. However, the shell can be told not to wait for the child to
terminate.
It may not wait for the child to terminate and may continue to spawn other processes. init
process is an example of such a parent process.
POSIX option    BSD option    Significance
-f                            Full listing showing the PPID of each process
-e or -A        aux           All processes, including user and system processes
-u user         U user        Processes of the user user only
-a                            Processes of all users, excluding those not associated with a terminal
-l                            Long listing
-t term         t term        Processes running on the terminal term
Examples
$ ps
PID   TTY   TIME   CMD
The output shows the header specifying the PID, the terminal (TTY), the cumulative processor
time (TIME) that has been consumed since the process was started, and the process name (CMD).
$ ps -f
UID PID PPID C STIME TTY TIME COMMAND
root 14931 136 0 08:37:48 ttys0 0:00 rlogind
sartin 14932 14931 0 08:37:50 ttys0 0:00 -sh
sartin 15339 14932 7 16:32:29 ttys0 0:00 ps -f
The header includes the following information:
UID Login name of the user
PID Process ID
PPID Parent process ID
C An index of recent processor utilization, used by kernel for scheduling
STIME Starting time of the process in hours, minutes and seconds
TTY Terminal ID number
TIME Cumulative CPU time consumed by the process
CMD The name of the command being executed
A listing with the -e option additionally shows system processes such as sched, init and cron,
with the same TTY, TIME and CMD fields; a listing restricted to a terminal would similarly show,
for example, a vi session running on term/12.
Fork: A process in UNIX is created with the fork system call, which creates a copy of the
process that invokes it. The process image is identical to that of the calling process,
except for a few parameters like the PID. The child gets a new PID.
Exec: The forked child overwrites its own image with the code and data of the new
program. This mechanism is called exec, and the child process is said to exec a new
program, using one of the family of exec system calls. The PID and PPID of the exec'd
process remain unchanged.
Wait: The parent then executes the wait system call to wait for the child to complete. It
picks up the exit status of the child and continues with its other functions. Note that a
parent need not decide to wait for the child to terminate.
To get a better idea of this, let us explain with an example. When you enter ls to look at the
contents of your current working directory, UNIX does a series of things to create an environment
for ls and then run it:
The shell has UNIX perform a fork. This creates a new process that the shell will use to
run the ls program.
The shell has UNIX perform an exec of the ls program. This replaces the shell program
and data with the program and data for ls and then starts running that new program.
The ls program is loaded into the new process context, replacing the text and data of the
shell.
The ls program performs its task, listing the contents of the current directory. In the
meanwhile, the shell executes the wait system call, waiting for ls to complete.
When a process is forked, the child has a different PID and PPID from its parent. However, it
inherits most of the attributes of the parent. The important attributes that are inherited are:
User name of the real and effective user (RUID and EUID): the owner of the process.
The real owner is the user issuing the command, the effective user is the one determining
access to system resources. RUID and EUID are usually the same, and the process has
the same access rights the issuing user would have.
Real and effective group owner (RGID and EGID): The real group owner of a process is
the primary group of the user who started the process. The effective group owner is
usually the same, except when SGID access mode has been applied to a file.
The current directory from where the process was run.
The file descriptors of all files opened by the parent process.
Environment variables like HOME, PATH.
The inheritance here means that the child has its own copy of these parameters and thus can alter
the environment it has inherited. But the modified environment is not available to the parent
process.
How the Shell is Created
(Figure: init --fork-exec--> getty --fork-exec--> login --fork-exec--> shell)
When the system moves to multiuser mode, init forks and execs a getty for every active
communication port.
Each one of these gettys prints the login prompt on the respective terminal and then goes off
to sleep.
When a user tries to log in, getty wakes up and fork-execs the login program to verify login
name and password entered.
On successful login, login fork-execs the process representing the login shell.
init goes off to sleep, waiting for the children to terminate. The processes getty and login
overlay themselves.
When the user logs out, it is intimated to init, which then wakes up and spawns another getty
for that line to monitor the next login.
It is possible for the parent itself to die before the child dies. In such a case, the child becomes an
orphan and the kernel makes init the parent of the orphan. When this adopted child dies, init
waits for its death.
Note:
1. Observe that the shell acknowledges the background command with two numbers. First
number [1] is the job ID of this command. The other number 1413 is the PID.
2. When you specify a command line in a pipeline to run in the background, all the
commands are run in the background, not just the last command.
3. The shell remains the parent of the background process.
If you try to run a command with nohup and haven't redirected the standard error, UNIX
automatically places any error messages in a file named nohup.out in the directory from which
the command was run.
In the following command, the sorted file and any error messages are placed in the file nohup.out.
$ nohup sort sales.dat &
1252
Sending output to nohup.out
Note that the shell has returned the PID (1252) of the process.
When the user logs out, the child turns into an orphan. The kernel handles such situations by
reassigning the PPID of the orphan to the system's init process (PID 1) - the parent of all shells.
When the user logs out, init takes over the parentage of any process run with nohup. In this way,
you can kill a parent (the shell) without killing its child.
Additional Points
When you run a command in the background, the shell disconnects the standard input from the
keyboard, but does not disconnect its standard output from the screen. So, output from the
command, whenever it occurs, shows up on screen. It can be confusing if you are entering
another command or using another program. Hence, make sure that both standard output and
standard error are redirected suitably.
$ find . -name "*.log" -print > log_file 2> err.dat &
OR
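One plausible form of such an alternative (an assumption, not necessarily the original), sending
both the output and the errors to the same file:
$ find . -name "*.log" -print > log_file 2>&1 &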
Important:
1. You should relegate time-consuming or low-priority jobs to the background.
2. If you log out while a background job is running, it will be terminated.
Background jobs execute without a terminal attached and are usually run in the background for
two reasons:
1. the job is expected to take a relatively long time to finish, and
2. the job's results are not needed immediately.
Interactive processes, however, are usually shells where the speed of execution is critical because
it directly affects the system's apparent response time. It would therefore be nice for everyone
(others as well as you) to let interactive processes have priority over background work.
nice values are system dependent and typically range from 1 to 19.
A high nice value implies a lower priority. A program with a high nice number is friendly to other
programs, other users and the system; it is not an important job. The lower the nice number, the
more important a job is and the more resources it will take without sharing them.
Example:
$ nice wc -l hugefile.txt
OR
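A common alternative form runs the job in the background with an explicit nice increment (the
value 10 here is only illustrative):
$ nice -n 10 wc -l hugefile.txt &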
A process may have to be stopped when, for example, it is using the wrong files for input or
output because of an operator or programming error.
If the process to be stopped is a background process, use the kill command to get out of these
situations. To stop a command that isn't in the background, press <ctrl-c>.
To use kill, specify the PID of the process to be killed, or a space-separated list of PIDs:
kill PID(s)
The system variable $! stores the PID of the last background job. You can kill the last background
job without knowing its PID by specifying $ kill $!
Note: You can kill only those processes that you own; you can't kill processes of other
users. To kill all background jobs, enter kill 0.
Job Control
A job is a name given to a group of processes that is typically created by piping a series of
commands using pipeline character. You can use job control facilities to manipulate jobs. You
can use job control facilities to,
1. Relegate a job to the background (bg)
2. Bring it back to the foreground (fg)
3. List the active jobs (jobs)
4. Suspend a foreground job ([Ctrl-z])
5. Kill a job (kill)
The following examples demonstrate the different job control facilities.
Assume a process is taking a long time. You can suspend it by pressing [Ctrl-z].
[1] + Suspended             wc -l hugefile.txt
A suspended job is not terminated. You can now relegate it to background by,
$ bg
You can start more jobs in the background any time:
$ sort employee.dat > sortedlist.dat &
[2]   530

The jobs command lists the active jobs:

$ jobs
[2] - Running               sort employee.dat > sortedlist.dat
[1] + Suspended             wc -l hugefile.txt

A job can also be brought back to the foreground by a name that identifies it:

$ fg %sort
The at command takes the time to execute a job as its argument:

at hh:mm                    schedules the job at the specified time (24-hour clock)
at hh:mm month day year     schedules the job at the specified date and time
at -l                       lists the jobs scheduled with at
at -r job_id                removes the scheduled job job_id
To sort a collection of files, print the results, and notify the user named boss that the job is done,
enter the following commands:
$ batch
sort /usr/sales/reports/* | lp
echo "Files printed, Boss!" | mailx -s "Job done" boss
The system returns the following response:
job 7789001234.b at Fri Sep 7 11:43:09 2007
The date and time listed are the date and time you pressed <Ctrl-d> to complete the batch
command. When the job is complete, check your mail; anything that the commands normally
display is mailed to you. Note that any job scheduled with batch command goes into a special at
queue.
A crontab entry begins with five time fields, whose permissible ranges are:
-----------------------------------------------------------------------------------------------
Field              Range
-----------------------------------------------------------------------------------------------
minute             00 through 59
hour               00 through 23
day-of-month       01 through 31
month-of-year      01 through 12
day-of-week        00 through 06 (00 indicates Sunday)
-----------------------------------------------------------------------------------------------
The first five fields are time option fields. You must specify all five of these fields. Use an
asterisk (*) in a field if you want to ignore that field.
Examples:
00-10 17 * 3,6,9,12 5 find / -newer .last_time -print > backuplist
In the above entry, the find command will be executed every minute in the first 10 minutes after 5
p.m. every Friday of the months March, June, September and December of every year.
30 07 * * 01 sort /usr/wwr/sales/weekly | mail -s "Weekly Sales" srm
In the above entry, the sort command will be executed with /usr/wwr/sales/weekly as argument
and the output is mailed to a user named srm at 7:30 a.m. each Monday.
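A brief sketch of how such entries are typically installed (cron.txt is an illustrative filename
holding lines like the ones above):
$ crontab cron.txt        # install the file as your crontab
$ crontab -l              # list the current crontab entries
$ crontab -r              # remove your crontab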
The time command executes the specified command and displays the time usage on the terminal.
Example: You can find out the time taken to perform a sorting operation by preceding the sort
command with time.
$ time sort employee.dat > sortedlist.dat
real
0m29.811s
user
0m1.370s
sys
0m9.990s
where,
the real time is the clock elapsed from the invocation of the command until its termination.
the user time shows the time spent by the program in executing itself.
the sys time indicates the time used by the kernel in doing work on behalf of a user process.
The sum of user time and sys time actually represents the CPU time. This could be significantly
less than the real time on a heavily loaded system.
3.3. Customizing the Environment
Introduction
The UNIX environment can be highly customized by manipulating the settings of the shell.
Commands can be made to change their default behavior, environment variables can be
redefined, the initialization scripts can be altered to obtain a required shell environment. This
chapter discusses different ways and approaches for customizing the environment.
Objectives
The Shell
Environment Variables
Common Environment Variables
Command Aliases (bash and korn)
Command History Facility (bash and korn)
In-Line Command Editing (bash and korn)
Miscellaneous Features (bash and korn)
The Initialization Scripts
The Shell
The UNIX shell is both an interpreter and a scripting language. An interactive shell turns
noninteractive when it executes a script.
Bourne Shell - This shell was developed by Steve Bourne. It is the original UNIX shell. It has
strong programming features, but it is a weak interpreter.
C Shell - This shell was developed by Bill Joy. It has improved interpretive features, but it
wasn't suitable for programming.
Korn Shell - This shell was developed by David Korn. It combines the best features of the Bourne
and C shells. It has features like aliases and command history, but it lacks some features of the C
shell.
Bash Shell - This was developed by the GNU project. It can be considered a superset that combines
the features of the Korn and C shells. More importantly, it conforms to the POSIX shell specification.
Environment Variables
We already mentioned a couple of environment variables, such as PATH and HOME. Until now,
we only saw examples in which they serve a certain purpose to the shell. But there are many other
UNIX utilities that need information about you in order to do a good job.
What other information do programs need apart from paths and home directories? A lot of
programs want to know about the kind of terminal you are using; this information is stored in the
TERM variable. The shell you are using is stored in the SHELL variable, the operating system
type in OS and so on. A list of all variables currently defined for your session can be viewed by
entering the env command.
The environment variables are managed by the shell. As opposed to regular shell variables,
environment variables are inherited by any program you start, including another shell. New
processes are assigned a copy of these variables, which they can read, modify and pass on in turn
to their own child processes.
The set statement displays all variables available in the current shell, but the env command displays
only environment variables. Note that env is an external command and runs in a child process.
There is nothing special about the environment variable names. The convention is to use
uppercase letters for naming one.
Variable        Stored information
HISTFILESIZE    size of the shell history file in number of lines
HOME            path to your home directory
HOSTNAME        name of the host
LOGNAME         login name
MAIL            location of your incoming mail folder
MANPATH         list of directories searched for man pages
PATH            list of directories searched for commands
PS1             primary prompt
PS2             secondary prompt
PWD             present working directory
SHELL           pathname of the login shell
TERM            terminal type
UID             user ID
USER            user name
MAILCHECK       interval at which the shell checks for new mail
CDPATH          list of directories searched by cd when a relative pathname is given
Bash and Korn also support a history facility that treats a previous command as an event and
associates it with a number. This event number is represented as !.
$ PS1='[!] '
[42] _
$ PS1='[! $PWD] '
[42 /home/srm/progs] _
In Bash, the escape sequence \h in the prompt string displays the hostname of the machine:
$ PS1='\h> '
saturn> _
Aliases
Bash and korn support the use of aliases that let you assign shorthand names to frequently used
commands. Aliases are defined using the alias command. Here are some typical aliases that one
may like to use:
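A couple of illustrative definitions (the shorthand names and the commands they stand for are
arbitrary):
alias l='ls -ltr'
alias h='history 20'
alias cdpro='cd /home/srm/progs'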
Command History
Bash and Korn support a history feature that treats a previous command as an event and
associates it with an event number. Using this number you can recall previous commands, edit
them if required and reexecute them.
The history command displays the history list showing the event number of every previously
executed command. With bash, the complete history list is displayed, while with korn, the last 16
commands. You can specify a numeric argument to specify the number of previous commands to
display, as in, history 5 (in bash) or history -5 (korn).
By default, bash stores all previous commands in $HOME/.bash_history and korn stores them in
$HOME/.sh_history. When a command is entered and executed, it is appended to the list
maintained in the file.
$ !38               The command with event number 38 is displayed and executed (use r 38 in korn)
$ !38:p             The command is only displayed and not executed
$ !!                Repeats the previous command (in bash)
$ !-2               Repeats the command prior to the previous one (in bash)
$ r cp doc=txt      Repeats the last cp command, replacing the string doc with txt (in korn)
Variable HISTFILE determines the filename that saves the history list. Bash uses two variables
HISTSIZE for setting the size of the history list in memory and HISTFILESIZE for setting the
size of disk file. Korn uses HISTSIZE for both the purposes.
1. Using set -o
The set statement by default displays the variables in the current shell, but in Bash and Korn, it
can make several environment settings with the -o option.
File Overwriting (noclobber): The shell's > symbol overwrites (clobbers) an existing file, and to
prevent such accidental overwriting, use the noclobber argument:
set -o noclobber
Now, if you redirect output of a command to an existing file, the shell will respond with a
message that says it cannot overwrite the existing file or that the file already exists. To override this
protection, use the | after the > as in,
head -n 5 emp.dat >| file1
Accidental Logging out (ignoreeof): The [Ctrl-d] key combination has the effect of terminating
the standard input as well as logging out of the system. In case you accidentally press [Ctrl-d]
twice while terminating the standard input, it will log you off! The ignoreeof keyword offers
protection from accidental logging out:
set -o ignoreeof
But note that you can then log out only by using the exit command.
A set option is turned off with set +o keyword. To reverse the noclobber feature, use
set +o noclobber
2. Tilde Substitution
The ~ acts as a shorthand representation for the home directory. A configuration file like .profile
that exists in the home directory can be referred to both as $HOME/.profile and ~/.profile.
You can also toggle between the directory you switched to most recently and your current
directory. This is done with the ~- symbols (or simply -, a hyphen). For example, either of the
following commands changes to your previous directory:
cd ~-
OR
cd -
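A short illustration (the directory names are hypothetical):
$ pwd
/home/srm/progs
$ cd /etc
$ cd -                 # back to the previous directory
/home/srm/progs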
The Profile
When logging into an interactive login shell, login will do the authentication, set the environment
and start your shell. In the case of bash, the next step is reading the general profile from /etc, if
that file exists. bash then looks for ~/.bash_profile, ~/.bash_login and ~/.profile, in that order, and
reads and executes commands from the first one that exists and is readable. If none exists,
/etc/bashrc is applied.
When a login shell exits, bash reads and executes commands from the file, ~/.bash_logout, if it
exists.
The profile contains commands that are meant to be executed only once in a session. It can also
be used to customize the operating environment to suit user requirements. Every time you change
the profile file, you should either log out and log in again, or execute it with the . (dot) command:
$ . .profile
The rc File
Normally the profiles are executed only once, upon login. The rc files are designed to be executed
every time a separate shell is created. There is no rc file in Bourne, but bash and korn use one.
This file is defined by an environment variable BASH_ENV in Bash and ENV in Korn.
export BASH_ENV=$HOME/.bashrc
export ENV=$HOME/.kshrc
Korn automatically executes .kshrc during login if ENV is defined. Bash merely ensures that a
sub-shell executes this file. If the login shell also has to execute this file then a separate entry
must be added in the profile:
. ~/.bashrc
The rc file is used to define command aliases, variable settings, and shell options. Some sample
entries of an rc file are:
alias cp='cp -i'
alias rm='rm -i'
set -o noclobber
set -o ignoreeof
set -o vi
The rc file will be executed after the profile. However, if the BASH_ENV or ENV variables are
not set, the shell executes only the profile.
UNIT 4
4.1. MORE FILE ATTRIBUTES
Apart from permissions and ownership, a UNIX file has several other attributes, and in
this chapter, we look at most of the remaining ones. A file also has properties related to its time
stamps and links. It is important to know how these attributes are interpreted when applied to a
directory or a device.
This chapter also introduces the concept of the file system. It also looks at the inode, the
lookup table that contains almost all file attributes. Though a detailed treatment of file systems is
taken up later, knowledge of their basics is essential to our understanding of the significance of
some of the file attributes. Basic file attributes have already shown us how ls -l displays file
attributes (properties), lists a specific directory, and reports ownership, group ownership and file
permissions. ls -l provides attributes like permissions, links, owner, group owner, size, date and
the file name.
File Systems and inodes
The hard disk is split into distinct partitions, with a separate file system in each partition.
Every file system has a directory structure headed by root.
n partitions = n file systems = n separate root directories
All attributes of a file except its name and contents are available in a table called the inode
(index node), accessed by the inode number. The inode contains the following attributes of a file:
File type
File permissions
Number of links
The UID of the owner
The GID of the group owner
File size in bytes
Date and time of last modification
Date and time of last access
Date and time of last change of the inode
An array of pointers that keep track of all disk blocks used by the file
Please note that neither the name of the file nor the inode number is stored in the inode. To know
the inode number of a file, use the -i option to ls:
ls -il tulec05
9059 -rw-r--r-- 1 kumar metal 51813 Jan 31 11:15 tulec05
Where, 9059 is the inode number and no other file can have the same inode number in the same
file system.
Hard Links
The link count is displayed in the second column of the listing. This count is normally 1, but the
following files have two links,
-rwxr-xr--   2 kumar metal 163 Jul 13 21:36 backup.sh
-rwxr-xr--   2 kumar metal 163 Jul 13 21:36 restore.sh
All attributes seem to be identical, but the files could still be copies. It's the link count that
suggests the files are linked to each other. This can only be confirmed by using the -i option to ls.
ls -li backup.sh restore.sh
478274 -rwxr-xr--   2 kumar metal 163 Jul 13 21:36 backup.sh
478274 -rwxr-xr--   2 kumar metal 163 Jul 13 21:36 restore.sh
ln returns an error when the destination file exists. Use the -f option to force the removal of the
existing link before creation of the new one.
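A minimal illustration of how such a link could have been created, using the filenames from the listing above:
$ ln backup.sh restore.sh          # both names now refer to the same inode
$ ln -f backup.sh restore.sh       # forces the link even if restore.sh already exists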
Symbolic Links
Unlike a hard link, a symbolic link doesn't have the file's contents, but simply
provides the pathname of the file that actually has the contents.
ln -s note note.sym
ls -li note note.sym
9948 -rw-r--r--   1 kumar group 80 feb 16 14:52 note
9952 lrwxrwxrwx   1 kumar group  4 feb 16 15:07 note.sym -> note
Here, l indicates the symbolic link file category, and -> indicates that note.sym contains the pathname
of the file note. The size of the symbolic link is only 4 bytes; it is the length of the pathname of note.
It's important to note that this time we indeed have two files, and they are not identical.
Removing note.sym won't affect us much because we can easily recreate the link. But if we
remove note, we would lose the file containing the data. In that case, note.sym would point to a
nonexistent file and become a dangling symbolic link.
Symbolic links can also be used with relative pathnames. Unlike hard links, they can also
span multiple file systems and also link directories. If you have to link all filenames in a directory
to another directory, it makes sense to simply link the directories. Like other files, a symbolic link
has a separate directory entry with its own inode number. This means that rm can remove a
symbolic link even if it points to a directory.
A symbolic link has an inode number separate from the file that it points to. In most
cases, the pathname is stored in the symbolic link and occupies space on disk. However, Linux
uses a fast symbolic link, which stores the pathname in the inode itself provided it doesn't exceed
60 characters.
The Directory
A directory has its own permissions, owners and links. The significance of the file
attributes changes a great deal when applied to a directory. For example, the size of a directory is
in no way related to the size of the files that exist in it, but rather to the number of files housed by
it. The higher the number of files, the larger the directory size. Permission also acquires a
different meaning when the term is applied to a directory.
ls -l -d progs
drwxr-xr-x 2 kumar metal 320 may 9 09:57 progs
The default permissions are different from those of ordinary files. The user has all
permissions, and group and others have read and execute permissions only. The permissions of a
directory also impact the security of its files. To understand how that can happen, we must know
what permissions for a directory really mean.
Read permission
Read permission for a directory means that the list of filenames stored in that directory is
accessible. Since ls reads the directory to display filenames, if a directory's read permission is
removed, ls won't work. Consider removing the read permission from the directory progs:
ls -ld progs
drwxr-xr-x 2 kumar metal 128 jun 18 22:41 progs
chmod -r progs ; ls progs
progs: permission denied
Write permission
We can't write to a directory file; only the kernel can do that. If that were possible, any
user could destroy the integrity of the file system. Write permission for a directory implies that
you are permitted to create or remove files in it. To try that out, restore the read permission and
remove the write permission from the directory before you try to copy a file to it.
chmod 555 progs ; ls -ld progs
dr-xr-xr-x 2 kumar metal 128 jun 18 22:41 progs
cp emp.lst progs
cp: cannot create progs/emp.lst: permission denied
The write permission for a directory determines whether we can create or remove files in
it because these actions modify the directory
Whether we can modify a file depends on whether the file itself has write permission.
Changing a file doesn't modify its directory entry
Execute permission
If a single directory in the pathname doesn't have execute permission, then it can't be
searched for the name of the next directory. That's why the execute privilege of a directory is
often referred to as the search permission. A directory has to be searched for the next directory, so
the cd command won't work if the search permission for the directory is turned off.
chmod 666 progs ; ls -ld progs
drw-rw-rw- 2 kumar metal 128 jun 18 22:41 progs
cd progs
permission denied        (the directory can no longer be searched)
ls -l        displays the time of last modification
ls -lu       displays the time of last access
The access time is displayed when ls -l is combined with the -u option. Knowledge of files
modification and access times is extremely important for the system administrator. Many of the
tools used by them look at these time stamps to decide whether a particular file will participate in
a backup or not.
find path_list selection_criteria action
where,
The path_list comprises one or more subdirectories separated by white space. There can also be a
host of selection_criteria that you use to match a file, and multiple actions to dispose of the file.
This makes the command difficult to use initially, but it is a program that every user must master
since it lets them select files under practically any condition.
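A representative invocation (the pathname and the criteria here are only illustrative):
$ find /home -name "*.sh" -print        # locate all shell scripts under /home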
pr : paginating files
We know that,
cat dept.lst
01|accounts|6213
02|progs|5423
03|marketing|6521
04|personnel|2365
05|production|9876
06|sales|1006
pr command adds suitable headers, footers and formatted text. pr adds five lines of margin at the
top and bottom. The header shows the date and time of last modification of the file along with the
filename and page number.
pr dept.lst
May 06 10:38 1997  dept.lst  page 1

01|accounts|6213
02|progs|5423
03|marketing|6521
04|personnel|2365
05|production|9876
06|sales|1006
(followed by blank lines)
pr options
The different options for pr command are:
-k      prints in k (integer) columns
-t      suppresses the header and footer
-h      uses a header of the user's choice
-d      double-spaces the input
-n      numbers each line, which helps in debugging
-o n    offsets the lines by n spaces and increases the left margin of the page
pr +10 chap01
starts printing from page 10
pr -l 54 chap01
this option sets the page length to 54
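Options can also be combined; for instance (an illustrative invocation):
pr -t -3 dept.lst        prints the file in three columns, without headers and footers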
Use tail -f when we are running a program that continuously writes to a file and we want to see
how the file is growing. We have to terminate this command with the interrupt key.
cut: slitting a file vertically
It is used for slitting the file vertically. head -n 5 emp.lst | tee shortlist selects the first five
lines of emp.lst and saves them in shortlist. We can cut by using the -c option with a list of column
numbers, delimited by a comma (cutting columns).
cut -c 6-22,24-32 shortlist
cut -c -3,6-22,28-34,55- shortlist
The expression 55- indicates column number 55 to end of line. Similarly, -3 is the same as 1-3.
Most files dont contain fixed length lines, so we have to cut fields rather than columns (cutting
fields).
-d for the field delimiter
-f for the field list
cut -d \| -f 2,3 shortlist | tee cutlist1
will display the second and third fields of shortlist and save the output in cutlist1.
Here | is escaped to prevent the shell from interpreting it as the pipeline character.
Where each field will be separated by the delimiter |. Even though paste uses at least two files for
concatenating lines, the data for one file can be supplied through the standard input.
Joining lines (-s)
Let us consider that the file address book contains the details of three persons
cat addressbook
paste -s addressbook                     prints the file in one single line
paste -s -d"| | \n" addressbook          the delimiters |, | and newline are used in a circular manner
sort options
The important sort options are:
-tchar      uses char as the field delimiter
-k n        sorts on the nth field
-k m,n      starts sort on the mth field and ends sort on the nth field
-k m.n      starts sort on the nth column of the mth field
-u          removes repeated lines
-n          sorts numerically
-r          reverses the sort order
-f          folds lowercase to equivalent uppercase (case-insensitive sort)
-m list     merges sorted files in list
-c          checks if the file is sorted
-o flname   places output in the file flname
sort -t"|" -k 2 shortlist
or
sort -t"|" -k 2r shortlist
The sort order can be reversed with the r option, as shown above.
sort -t"|" -k 3,3 -k 2,2 shortlist
Sorting on a secondary key is also possible, as shown above.
sort -t"|" -k 5.7,5.8 shortlist
We can also specify a character position within a field to be the beginning of the sort, as shown
above (sorting on columns).
sort -n numfile
When sort acts on numerals, strange things can happen. When we sort a file containing only
numbers, we get a curious result. This can be overridden by the -n (numeric) option.
cut -d "|" -f3 emp.lst | sort -u | tee desigx.lst
Removing repeated lines is possible using the -u option as shown above. If we cut out the
designation field from emp.lst, we can pipe it to sort to find out the unique designations that
occur in the file.
Other sort options are:
sort -o sortedlist -k 3 shortlist
sort -o shortlist shortlist
sort -c shortlist
sort -t"|" -c -k 2 shortlist
sort -m foo1 foo2 foo3
uniq simply fetches one copy of each line and writes it to the standard output. Since uniq requires a
sorted file as input, the general procedure is to sort a file and pipe its output to uniq. The
following pipeline also produces the same output, except that the output is saved in a file:
sort dept.lst | uniq - uniqlist
Different uniq options are :
Selecting the nonrepeated lines (-u)
cut -d "|" -f3 emp.lst | sort | uniq -u
Selecting the duplicate lines (-d)
cut -d "|" -f3 emp.lst | sort | uniq -d
Counting frequency of occurrence (-c)
cut -d "|" -f3 emp.lst | sort | uniq -c
UNIT 5
5.1. Filters Using Regular Expression
grep and sed
We often need to search a file for a pattern, either to see the lines containing (or not
containing) it or to have it replaced with something else. This chapter discusses two important
filters that are specially suited for these tasks: grep and sed. grep takes care of all the search
requirements we may have. sed goes further and can even manipulate the individual characters in
a line. In fact sed can do several things, some of them quite well.
grep: searching for a pattern
It scans a file or the standard input for a pattern and displays the lines containing the pattern, the
line numbers or the filenames where the pattern occurs. It's a command from a special family in
UNIX for handling search requirements.
grep options pattern filename(s)
grep "sales" emp.lst
will display lines containing sales from the file emp.lst. Patterns with and without quotes are
possible, but it is generally safe to quote the pattern. Quoting is mandatory when the pattern
involves more than one word. grep silently returns the prompt in case the pattern can't be located.
grep 'president' emp.lst
When grep is used with multiple filenames, it displays the filenames along with the output.
grep "director" emp1.lst emp2.lst
Where it shows filename followed by the contents
grep options
grep is one of the most important UNIX commands, and we must know the options that
POSIX requires grep to support. Linux supports all of these options.
-i        ignores case when matching the pattern
-v        selects lines not containing the pattern
-n        displays line numbers along with the lines
-c        displays the count of matching lines
-l        displays only a list of filenames containing the pattern
-e exp    specifies expression exp (can be used multiple times)
-f file   takes patterns from file, one per line
-E        treats the pattern as an extended regular expression (ERE)
-F        matches multiple fixed strings (in the style of fgrep)
The basic regular expression (BRE) character set includes:
g*          nothing or g, gg, ggg, etc.
.           a single character
.*          nothing or any number of characters
[pqr]       a single character p, q or r
[c1-c2]     a single character within the ASCII range represented by c1 and c2
The *
The asterisk refers to the immediately preceding character. * indicates zero or more occurrences
of the previous character.
g* nothing or g, gg, ggg, etc.
grep "[aA]gg*[ar][ar]wal" emp.lst
Notice that we don't need to use the -e option three times to get the same output.
The dot
A dot matches a single character. (The shell uses the ? character for the same purpose.) The
pattern .* matches nothing or any number of characters.
grep "^2" emp.lst
Selects lines where the emp_id starts with 2
grep "7...$" emp.lst
Selects lines where the emp_salary ranges between 7000 and 7999
grep "^[^2]" emp.lst
Selects lines where the emp_id doesn't start with 2
The extended regular expression (ERE) set, used with grep -E, includes:
ch?             matches zero or one occurrence of ch
exp1|exp2       matches exp1 or exp2
(x1|x2)x3       matches x1x3 or x2x3
Line Addressing
sed '3q' emp.lst
Just similar to head -n 3 emp.lst; selects the first three lines and quits.
sed -n '1,2p' emp.lst
The p command prints the selected lines as well as, by default, all lines. To suppress this behavior,
we use the -n option whenever we use the p command.
sed -n '$p' emp.lst
Selects the last line of the file.
sed -n '9,11p' emp.lst
Selects lines from anywhere in the file, here the lines 9 to 11.
sed -n '1,2p
7,9p
$p' emp.lst
Selects lines 1 and 2, 7 to 9 and the last line.
Context Addressing
We can specify one or more patterns to locate lines
sed -n '/director/p' emp.lst
We can also specify a comma-separated pair of context addresses to select a group of lines.
sed -n '/dasgupta/,/saxena/p' emp.lst
Line and context addresses can also be mixed.
sed -n '1,/dasgupta/p' emp.lst
Text Editing
sed supports the inserting (i), appending (a), changing (c) and deleting (d) commands for text.
$ sed '1i\
> #include <stdio.h>\
> #include <unistd.h>
> ' foo.c > $$
This will add the two include lines at the beginning of foo.c. sed identifies the line without the \ as
the last line of input. The output is redirected to $$, a temporary file named after the PID of the
shell. The same technique has to be followed when using the a and c commands. To insert a blank
line after each line of the file (double-spacing text), we have:
sed 'a\
' emp.lst
Substitution (s)
Substitution is the most important feature of sed, and this is one job that sed does
exceedingly well.
[address]s/expression1/expression2/flags
Just as in the vi editor, we use the s (substitute) command in sed.
sed 's/|/:/' emp.lst | head -n 2
2233:a.k.shukla |gm |sales |12/12/52|6000
9876:jai sharma |director|production|12/03/50|7000
Only the first instance of | in each line has been replaced. We need to use the g (global) flag
to replace all the pipes.
sed 's/|/:/g' emp.lst | head -n 2
We can limit the vertical boundaries too by specifying an address (here, the first three lines only).
sed '1,3s/|/:/g' emp.lst
To replace the word director with member in the first five lines of emp.lst:
sed '1,5s/director/member/' emp.lst
sed also uses regular expressions for the patterns to be substituted. To replace all occurrences of
agarwal, aggarwal and agrawal with simply Agarwal, we have:
sed 's/[Aa]gg*[ar][ar]wal/Agarwal/g' emp.lst
We can also use ^ and $ with their usual meanings. To add the prefix 2 to all emp-ids:
sed 's/^/2/' emp.lst | head -n 1
22233 | a.k.shukla | gm | sales | 12/12/52 | 6000
To add the suffix .00 to all salaries:
sed 's/$/.00/' emp.lst | head -n 1
2233 | a.k.shukla | gm | sales | 12/12/52 | 6000.00
Performing multiple substitutions
sed 's/<I>/<EM>/g
s/<B>/<STRONG>/g
s/<U>/<EM>/g' form.html
Each instruction processes the output of the previous instruction, as sed is a stream editor and
works on a data stream.
sed 's/<I>/<EM>/g
s/<EM>/<STRONG>/g' form.html
When a g is used at the end of a substitution instruction, the change is performed globally along
the line. Without it, only the leftmost occurrence is replaced. When there is a group of
instructions to execute, you should place these instructions in a file instead and use sed with the
-f option.
Compressing multiple spaces
sed 's/ *|/|/g' emp.lst | tee empn.lst | head -n 3
2233|a.k.shukla|g.m|sales|12/12/52|6000
9876|jai sharma|director|production|12/03/50|7000
5678|sumit chakrobarty|dgm|mrking|19/04/43|6000
The remembered patterns
Consider the three lines below, which do the same job:
sed 's/director/member/' emp.lst
sed '/director/s//member/' emp.lst
sed '/director/s/director/member/' emp.lst
The //, representing an empty regular expression, is interpreted to mean that the search and
substituted patterns are the same.
sed 's/|//g' emp.lst
The interval RE - { }
sed and grep use the IRE, which uses an integer to specify the number of times the preceding
character can occur. The IRE uses an escaped pair of curly braces and takes three forms:
ch\{m\}      the ch can occur m times
ch\{m,n\}    ch can occur between m and n times
ch\{m,\}     ch can occur at least m times
The value of m and n can't exceed 255. Let teledir.txt maintain landline and mobile phone
numbers. To select only mobile numbers, use the IRE to indicate that a numeral can occur 10 times:
grep "[0-9]\{10\}" teledir.txt
Line length between 101 and 150:
grep "^.\{101,150\}$" foo
Line length at least 101:
sed -n '/.\{101,\}/p' foo
UNIT 6
6.1. Essential Shell Programming
Definition:
Shell is an agency that sits between the user and the UNIX system.
Description:
Shell is the one which understands all user directives and carries them out. It processes the
commands issued by the user. The content is based on a type of shell called Bourne shell.
Shell Scripts
When groups of command have to be executed regularly, they should be stored in a file, and the
file itself executed as a shell script or a shell program by the user. A shell program runs in
interpretive mode. It is not complied with a separate executable file as with a C program but each
statement is loaded into memory when it is to be executed. Hence shell scripts run slower than the
programs written in high-level language. .sh is used as an extension for shell scripts. However the
use of extension is not mandatory.
Shell scripts are executed in a separate child shell process which may or may not be same as the
login shell.
Example: script.sh
#! /bin/sh
# script.sh: Sample Shell Script
echo "Welcome to Shell Programming"
echo "Today's date : `date`"
echo "This month's calendar:"
cal `date +"%m 20%y"`
$ chmod +x script.sh
Then invoke the script name as:
$ script.sh
Once this is done, we can see the following output :
Welcome to Shell Programming
Today's date: Mon Oct 8 08:02:45 IST 2007
This month's calendar:
October 2007
Su Mo Tu We Th Fr Sa
    1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31
My Shell: /bin/sh
As stated above, the child shell reads and executes each statement in interpretive mode. We can
also explicitly spawn a child shell of our choice with the script name as argument:
sh script.sh
Note: Here the script requires neither executable permission nor an interpreter line.
Example: A shell script that uses read to take a search string and filename from the
terminal.
#! /bin/sh
# emp1.sh: Interactive version, uses read to accept two inputs
#
echo "Enter the pattern to be searched: \c"        # No newline
read pname
echo "Enter the file to be used: \c"
read fname
echo "Searching for pattern $pname from the file $fname"
grep "$pname" $fname
echo "Selected records shown above"
Running of the above script by specifying the inputs when the script pauses twice:
$ emp1.sh
Enter the pattern to be searched : director
Enter the file to be used: emp.lst
Searching for pattern director from the file emp.lst
9876    Jai Sharma    Director    Productions
2356    Rohit         Director    Sales
Shell parameter      Significance
$1, $2 ...           Positional parameters representing command line arguments
$#                   Number of arguments specified in the command line
$0                   Name of the executed command
$*                   Complete set of positional parameters as a single string
$@                   Each quoted string treated as a separate argument
$?                   Exit status of the last command
$$                   PID of the current shell
$!                   PID of the last background job
The && delimits two commands; cmd2 is executed only when cmd1 succeeds. Conversely, with || the second command is executed only when the first fails.
Example1:
$ grep 'director' emp.lst && echo "Pattern found"
Output:
9876    Jai Sharma    Director    Productions
2356    Rohit         Director    Sales
Pattern found
Example 2:
$ grep 'clerk' emp.lst || echo "Pattern not found"
Output:
Pattern not found
Example 3:
grep "$1" $2 || exit 2
echo "Pattern Found Job Over"
The if Conditional
The if statement makes two way decisions based on the result of a condition. The following forms
of if are available in the shell:
Form 1:
if command is successful
then
    execute commands
fi

Form 2:
if command is successful
then
    execute commands
else
    execute commands
fi

Form 3:
if command is successful
then
    execute commands
elif command is successful
then
    execute commands
else
    execute commands
fi
If the command succeeds, the statements within if are executed or else statements in else block
are executed (if else present).
Example:
#! /bin/sh
if grep "^$1" /etc/passwd 2>/dev/null
then
    echo "Pattern Found"
else
    echo "Pattern Not Found"
fi
Output1:
$ emp3.sh ftp
ftp:*:325:15:FTP User:/users1/home/ftp:/bin/true
Pattern Found
Output2:
$ emp3.sh mail
Pattern Not Found
test doesn't display any output but simply returns a value that sets the parameter $?.
Numeric Comparison
Operator    Meaning
-eq         Equal to
-ne         Not equal to
-gt         Greater than
-ge         Greater than or equal to
-lt         Less than
-le         Less than or equal to
Operators always begin with a - (hyphen), followed by a two-character word, and are enclosed
on either side by whitespace.
Numeric comparison in the shell is confined to integer values only, decimal values are simply
truncated.
Ex:
$ x=5; y=7; z=7.2
1. $ test $x -eq $y; echo $?
1                                  Not equal
2. $ test $x -lt $y; echo $?
0                                  True
3. $ test $z -eq $y; echo $?
0                                  7.2 is equal to 7 (the decimal portion is truncated)
String Comparison
The test command is also used for testing strings. test can compare strings with the following set
of comparison operators:
Test          True if
s1 = s2       String s1 is equal to s2
s1 != s2      String s1 is not equal to s2
-n stg        String stg is not a null string
-z stg        String stg is a null string
stg           String stg is assigned and not null
s1 == s2      String s1 is equal to s2 (Korn and Bash only)
Output2:
$emp1.sh
Enter the string to be searched :root
Enter the filename to be searched :/etc/passwd
root:x:0:1:Super-User:/:/usr/bin/bash
When we run the script with arguments, emp1.sh bypasses all the above activities and calls
emp.sh to perform all validation checks:
$ emp1.sh jai
You didn't enter two arguments
$ emp1.sh jai emp.lst
9878|jai sharma|director|sales|12/03/56|70000
$ emp1.sh jai sharma emp.lst
You didn't enter two arguments
Because $* treats jai and sharma as separate arguments, $# makes a wrong argument count. The
solution is to replace $* with "$@" (with quotes) and then run the script.
File Tests
test can be used to test various file attributes like its type (file, directory or symbolic link) or its
permissions (read, write, execute, SUID, etc.).
Example:
$ ls -l emp.lst
-rw-rw-rw-   ...
$ [ -f emp.lst ]; echo $?                Ordinary file
0
$ [ -x emp.lst ]; echo $?                Not an executable
1
$ [ ! -w emp.lst ] || echo "False that file is not writable"
False that file is not writable
Example: filetest.sh
#! /bin/sh
#
if [ ! -e $1 ]; then
    echo "File does not exist"
elif [ ! -r $1 ]; then
    echo "File not readable"
elif [ ! -w $1 ]; then
    echo "File not writable"
else
    echo "File is both readable and writable"
fi
Output:
$ filetest.sh emp3.lst
File does not exist
$ filetest.sh emp.lst
File is both readable and writable
The following table depicts file-related tests with test:
Test          True if
-f file       file exists and is a regular file
-r file       file exists and is readable
-w file       file exists and is writable
-x file       file exists and is executable
-d file       file exists and is a directory
-s file       file exists and has a size greater than zero
-e file       file exists (Korn and Bash only)
-u file       file exists and has its SUID bit set
-k file       file exists and has its sticky bit set
-L file       file exists and is a symbolic link (Korn and Bash only)
f1 -nt f2     f1 is newer than f2 (Korn and Bash only)
f1 -ot f2     f1 is older than f2 (Korn and Bash only)
f1 -ef f2     f1 is linked to f2 (Korn and Bash only)
esac
case first matches expression with pattern1. If the match succeeds, then it executes
commands1, which may be one or more commands. If the match fails, then pattern2 is
matched, and so forth. Each command list is terminated with a pair of semicolons, and the
entire construct is closed with esac (reverse of case).
Example:
#! /bin/sh
#
echo "          Menu\n
1. List of files\n2. Processes of user\n3. Today's Date
4. Users of system\n5. Quit\nEnter your option: \c"
read choice
case "$choice" in
    1) ls -l ;;
    2) ps -f ;;
    3) date ;;
    4) who ;;
    5) exit ;;
    *) echo "Invalid option"
esac
Output
$ menu.sh
Menu
1. List of files
2. Processes of user
3. Today's Date
4. Users of system
5. Quit
Enter your option: 3
Mon Oct 8 08:02:45 IST 2007
Note:
case cannot handle relational and file tests, but it matches strings with compact code. It is
very effective when the string is fetched by command substitution.
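A small sketch of this usage (not taken from the notes; the messages are arbitrary):
case `date +%a` in
    Sat|Sun) echo "It is the weekend" ;;
    *)       echo "It is a weekday" ;;
esac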
$ expr $y / $x
1
$ expr 13 % 5
3
expr is also used with command substitution to assign a variable.
Example1:
$ x=6 y=2 ; z=`expr $x + $y`
$ echo $z
8
Example2:
$ x=5
$ x=`expr $x + 1`
$ echo $x
6
String Handling:
expr is also used to handle strings. For manipulating strings, expr uses two expressions
separated by a colon (:). The string to be worked upon is placed on the left of the colon
and a regular expression is placed on its right. Depending on the composition of the
expression, expr can perform the following three functions:
1. Determine the length of the string.
2. Extract a substring.
3. Locate the position of a character in a string.
1. Length of the string:
The regular expression .* is used to print the number of characters matching the pattern.
Example1:
$ expr "abcdefg" : '.*'
7
Example2:
while echo "Enter your name: \c"; do
    read name
    if [ `expr "$name" : '.*'` -gt 20 ]; then
        echo "Name is very long"
    else
        break
    fi
done
2. Extracting a substring:
expr can extract a string enclosed by the escape characters \( and \).
Example:
$ st=2007
$ expr "$st" : '..\(..\)'
07
Example:
$ stg=abcdefgh ; expr "$stg" : '[^d]*d'
4
Extracts the position of the character d.
$0: Calling a Script by Different Names
There are a number of UNIX commands that can be used to call a file by different names
and do different things depending on the name by which it is called. $0 can also be used to
call a script by different names.
Example:
#! /bin/sh
#
lastfile=`ls -t *.c | head -1`
command=$0
exe=`expr $lastfile : '\(.*\).c'`
case $command in
    *runc) $exe ;;
    *vic)  vi $lastfile ;;
    *comc) cc -o $exe $lastfile &&
           echo "$lastfile compiled successfully" ;;
esac
After this create the following three links:
ln comc.sh comc
ln comc.sh runc
ln comc.sh vic
Output:
$ comc
hello.c compiled successfully.
While: Looping
To carry out a set of instructions repeatedly, the shell offers three features, namely while, until and for.
Syntax:
while condition is true
do
    commands
done
The commands enclosed by do and done are executed repeatedly as long as the condition is true.
Example:
#! /bin/sh
ans=y
while [ "$ans" = "y" ]
do
    echo "Enter the code and description : \c" > /dev/tty
    read code description
    echo "$code|$description" >> newlist
    echo "Enter any more [Y/N]"
    read any
    case $any in
        Y* | y* ) ans=y ;;
        N* | n* ) ans=n ;;
        * )       ans=y ;;
    esac
done
Input:
Enter the code and description : 03 analgestics
Enter any more [Y/N] :y
Enter the code and description : 04 antibiotics
Enter any more [Y/N] : [Enter]
Enter the code and description : 05 OTC drugs
Enter any more [Y/N] : n
Output:
$ cat newlist
03 | analgestics
04 | antibiotics
05 | OTC drugs
(2) Waiting for a file to become readable, checking at regular intervals:
while [ ! -r $1 ]; do
    sleep $2
done
for: Looping with a List
Syntax:
for variable in list
do
    commands
done
list here comprises a series of character strings. Each string is assigned to the variable
specified.
Example:
for file in ch1 ch2; do
> cp $file ${file}.bak
> echo $file copied to $file.bak
done
Output:
ch1 copied to ch1.bak
ch2 copied to ch2.bak
Sources of list:
List from variables: Series of variables are evaluated by the shell before
executing the loop
Example:
$ for var in $PATH $HOME; do echo "$var"; done
Output:
/bin:/usr/bin:/home/local/bin
/home/user1
List from wildcards: Here the shell interprets the wildcards as filenames.
Example:
for file in *.htm *.html; do
    sed 's/strong/STRONG/g
    s/img src/IMG SRC/g' $file > $$
    mv $$ $file
done
Jai Sharma    Director    Productions
2356    Rohit    Director    Sales
Example1:
$ basename /home/user1/test.pl
Output:
test.pl
Example2:
$ basename test2.doc .doc
Output:
test2
Example3: Renaming filename extension from .txt to .doc
for file in *.txt ; do
    leftname=`basename $file .txt`        # stores the left part of the filename
    mv $file ${leftname}.doc
done
set and shift: Manipulating the Positional Parameters
The set statement assigns positional parameters $1, $2 and so on, to its arguments. This is used
for picking up individual fields from the output of a program.
Example 1:
$ set 9876 2345 6213
$
This assigns the value 9876 to the positional parameters $1, 2345 to $2 and 6213 to $3. It also
sets the other parameters $# and $*.
Example 2:
$ set `date`
$ echo $*
Mon Oct 8 08:02:45 IST 2007
Example 3:
$ echo "The date today is $2 $3, $6"
The date today is Oct 8, 2007
$ echo $@
Mon Oct 8 08:02:45 IST 2007
$ echo $1 $2 $3
Mon Oct 8
$ shift                          Shifts one place
$ echo $1 $2 $3
Oct 8 08:02:45
$ shift 2                        Shifts 2 places
$ echo $1 $2 $3
08:02:45 IST 2007
Example 2: emp.sh
#! /bin/sh
case $# in
9876    Jai Sharma    Director    Productions
2356    Rohit         Director    Sales
Jai Sharma    Director    Productions
2356    Rohit    Director    Sales
Example: To remove all temporary files named after the PID number of the shell:
trap 'rm $$* ; echo "Program Interrupted" ; exit' HUP INT TERM
trap is a signal handler. It first removes all files expanded from $$*, echoes a message and finally
terminates the script when the signals SIGHUP (1), SIGINT (2) or SIGTERM (15) are sent to the
shell process running the script.
A script can also be made to ignore these signals by using a null command list.
Example:
trap '' 1 2 15
Programs
1)
#!/bin/sh
IFS="|"
while echo "enter dept code:\c"; do
    read dcode
    set -- `grep "^$dcode" <<limit
01|CSE|22
02|CSE|45
03|ECE|25
04|TCE|58
limit`
    case $# in
        3) echo "dept name :$2 \n emp-id:$3\n" ;;
        *) echo "invalid code"; continue ;;
    esac
done
Output:
$ valcode.sh
echo "length is $x"
done
4)
This is a non-recursive shell script that accepts any number of arguments and prints them in a
reverse order.
For example if A B C are entered then output is C B A.
#!/bin/sh
if [ $# -lt 2 ]; then
echo "please enter 2 or more arguments"
exit
fi
for x in $@
do
y=$x" "$y
done
echo "$y"
Run1:
[root@localhost shellprgms]# sh sh1a.sh 1 2 3 4 5 6 7
7 6 5 4 3 2 1
Run2: [root@localhost shellprgms]# sh ps1a.sh this is an argument
argument an is this
5)
The following shell script to accept 2 file names checks if the permission for these files are
identical and if they are not identical outputs each filename followed by permission.
#!/bin/sh
if [ $# -lt 2 ]
then
exit
fi
if [ -d $1 ]
then
ls -lR $1|grep -v ^d|cut -c 34-43,56-69|sort -n|tail -1>fn1
echo "file name is `cut -c 10- fn1`"
echo " the size is `cut -c -9 fn1`"
else
echo "invalid dir name"
fi
Run1:
[root@localhost shellprgms]# sh 3a.sh
file name is a.out
the size is
12172
7)This shell script that accepts valid log-in names as arguments and prints their corresponding
home directories. If no arguments are specified, print a suitable error message.
if [ $# -lt 1 ]
then
echo " Invalid Arguments....... "
exit
fi
for x in "$@"
do
grep -w "^$x" /etc/passwd | cut -d ":" -f 1,6
done
Run1:
(output: the month's calendar, with the current date replaced by **)
10) This shell script implements terminal locking. Prompt the user for a password after accepting,
prompt for confirmation, if match occurs it must lock and ask for password, if it matches terminal
must be unlocked
trap '' 1 2 3 5 20
clear
echo -e "\nenter password to lock terminal:"
stty -echo
read keynew
stty echo
echo -e "\nconfirm password:"
stty -echo
read keyold
stty echo
if [ $keyold = $keynew ]
then
echo "terminal locked!"
while [ 1 ]
do
echo "retype the password to unlock:"
stty -echo
read key
if [ $key = $keynew ]
then
stty echo
echo "terminal unlocked!"
stty sane
exit
fi
echo "invalid password!"
done
else
echo " passwords do not match!"
fi
stty sane
Run1:
UNIT 7
7.1. Awk- An Advanced Filter
Introduction
awk is a programmable, pattern-matching, and processing tool available in UNIX. It
works equally well with text and numbers. It derives its name from the first letter of the
last name of its three authors namely Alfred V. Aho, Peter J. Weinberger and Brian W.
Kernighan.
Simple awk Filtering
awk is not just a command, but a programming language too. In other words, awk utility
is a pattern scanning and processing language. It searches one or more files to see if they
contain lines that match specified patterns and then perform associated actions, such as
writing the line to the standard output or incrementing a counter each time it finds a
match.
Syntax:
awk option 'selection_criteria {action}' file(s)
Here, selection_criteria filters input and selects lines for the action component to act
upon. The selection_criteria is enclosed within single quotes and the action within the
curly braces. Together, the selection_criteria and action form an awk program.
Example: $ awk '/manager/ { print }' emp.lst
Output:
9876    Jai Sharma    Manager    Productions
2356    Rohit         Manager    Sales
5683    Rakesh        Manager    Marketing
In the above example, /manager/ is the selection_criteria which selects lines that are processed in
the action section i.e. {print}. Since the print statement is used without any field specifiers, it
prints the whole line.
Note: If no selection_criteria is used, then action applies to all lines of the file.
Since printing is the default action of awk, any one of the following three forms can be used:
awk '/manager/' emp.lst
Rohit     Manager    Sales
5683    Rakesh    Manager    Marketing

Manager       Productions
Rahul         Accountant    Productions
Rakesh        Clerk         Productions
In the above example, comma (,) is used to delimit field specifications to ensure that each field is
separated from the other by a space so that the program produces a readable output.
Note: We can also specify the number of lines we want using the built-in variable NR as
illustrated in the following example:
Example: awk -F"|" 'NR==2, NR==4 { print NR, $2, $3, $4 }' emp.lst
Output:
2    Jai Sharma    Manager       Productions
3    Rahul         Accountant    Productions
4    Rakesh        Clerk         Productions

R Kumar              Manager
Sunil kumaar         Accountant
Anil Kummar          Clerk
Here, the name and designation have been printed in spaces 20 and 12 characters wide
respectively.
Note: The printf requires \n to print a newline after each line.
The first prints Hello; the second prints two tabs followed by the string Hello and sounds a beep.
String concatenation can also be performed. Awk does not provide any operator for this,
however strings can be concatenated by simply placing them side-by-side.
Example 1: z = "Hello" "World"
           print z                 # z is now HelloWorld
Example 3: x = "UNIX"
           y = "LINUX"
           print x y               # prints UNIXLINUX
Expressions also have true and false values associated with them. A nonempty string or
any positive number has true value.
Example: if (c)
This is true if c is a nonempty string or a positive number.
The Comparison Operators
awk also provides the comparison operators like >, <, >=, <= ,==, !=, etc..,
Example 1: $ awk -F"|" '$3 == "manager" || $3 == "chairman" {
> printf "%-20s %-12s %d\n", $2, $3, $5}' emp.lst
Output:
ganesh        chairman    15000
jai sharma    manager      9000
rohit         manager      8750
rakesh        manager      8500
The program tests the third field ($3) for two strings; the second test is attempted only if (||) the
first match fails.
Note: awk uses the || and && logical operators as in C and UNIX shell.
Accountant    7000
Anil Kummar    Clerk    6000
Rahul     Accountant    7000
Rakesh    Clerk         6000
The above example illustrates the use of != and && operators. Here all the employee
records other than that of manager and chairman are displayed.
~ and !~ : The Regular Expression Operators
The operators ~ and !~ work only with field specifiers like $1, $2, etc.
For instance, to locate the g.m.s, the following command does not display the expected output,
because the string g.m. is also embedded in d.g.m. and c.g.m.
$ awk -F"|" '$3 ~ /g.m./ {printf ...
prints fields including g.m., like g.m., d.g.m. and c.g.m.
To avoid such unexpected output, awk provides the two anchors ^ and $ that indicate the
beginning and end of the field respectively. So the above command should be modified as
follows:
$ awk -F"|" '$3 ~ /^g.m./ {printf ...
prints only the fields containing g.m. and not d.g.m. or c.g.m.
The following table depicts the comparison and regular expression matching operators.
Operator    Significance
<           Less than
<=          Less than or equal to
==          Equal to
!=          Not equal to
>=          Greater than or equal to
>           Greater than
~           Matches a regular expression
!~          Doesn't match a regular expression
Number Comparison:
awk can handle numbers (both integer and floating point). Relational tests or comparisons can
also be performed on them.
Example: $ awk -F"|" '$5 > 7500 {
> printf "%-20s %-12s %d\n", $2, $3, $5}' emp.lst
Output:
ganesh        chairman    15000
jai sharma    manager      9000
rohit         manager      8750
rakesh        manager      8500
chairman      15000    30/12/1950
jai sharma    manager       9000    01/01/1980
rohit         manager       8750    10/05/1975
rakesh        manager       8500    20/05/1975
Rahul         Accountant    6000    01/10/1980
Anil          Clerk         5000    20/05/1980
In the above example, the details of employees getting salary greater than 7500 or whose year of
birth is 1980 are displayed.
Number Processing
Numeric computations can be performed in awk using the arithmetic operators like +, -, /, *, %
(modulus). One of the main feature of awk w.r.t. number processing is that it can handle even
decimal numbers, which is not possible in shell.
Example: $ awk -F"|" '$3 == "manager" {
> printf "%-20s %-12s %d %d\n", $2, $3, $5, $5*0.4}' emp.lst
Output:
jai sharma    manager    9000    3600
rohit         manager    8750    3500
rakesh        manager    8500    3400
Variables
awk allows the user to use variables of their choice. You can now print a serial number,
using the variable kount, and apply it to those directors drawing a salary exceeding 6700:
$ awk -F"|" '$3 == "director" && $6 > 6700 {
    kount = kount + 1
    printf "%3d %-20s %-12s %d\n", kount, $2, $3, $6 }' empn.lst
The initial value of kount was 0 (by default). That's why the first line is correctly
assigned the number 1. awk also accepts the C-style incrementing forms:
kount++
kount += 2
printf "%3d\n", ++kount
THE -f OPTION: STORING awk PROGRAMS IN A FILE
You should hold large awk programs in separate files and provide them with the
.awk extension for easier identification. Let's first store the previous program in the file
empawk.awk:
$ cat empawk.awk
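A sketch of what empawk.awk could contain, based on the preceding example (the exact listing is assumed):
$3 == "director" && $6 > 6700 {
    kount = kount + 1                                   # serial number
    printf "%3d %-20s %-12s %d\n", kount, $2, $3, $6
}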
Observe that this time we haven't used quotes to enclose the awk program. You
can now use awk with the -f filename option to obtain the same output:
awk -F"|" -f empawk.awk empn.lst
THE BEGIN AND END SECTIONS
awk statements are usually applied to all lines selected by the address, and if there
are no addresses, then they are applied to every line of input. But if you have to print
something before processing the first line, for example a heading, then the BEGIN
section can be used gainfully. Similarly, the END section is useful in printing some totals
after processing is over.
The BEGIN and END sections are optional and take the form
BEGIN {action}
END {action}
These two sections, when present, are delimited by the body of the awk program. You
can use them to print a suitable heading at the beginning and the average salary at the
end. Store this program in a separate file, empawk2.awk.
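A sketch of what empawk2.awk could contain, consistent with the description that follows (the heading text, the 7500 cut-off and the format strings are assumptions):
BEGIN { FS = "|" ; printf "\t\tEmployee abstract\n\n" }
$6 > 7500 {                              # selects lines with pay exceeding 7500
    kount++ ; tot += $6
    printf "%3d %-20s %-12s %d\n", kount, $2, $3, $6
}
END { printf "\n\tThe average basic pay is %6d\n", tot/kount }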
Like the shell, awk also uses the # for providing comments. The BEGIN section
prints a suitable heading, offset by two tabs (\t\t), while the END section prints the
average pay (tot/kount) for the selected lines. To execute this program, use the -f option:
$ awk -F"|" -f empawk2.awk empn.lst
Like all filters, awk reads the standard input when the filename is omitted. We can make awk
behave like a simple scripting language by doing all the work in the BEGIN section. This is
how you perform floating point arithmetic:
$ awk 'BEGIN {printf "%f\n", 22/7 }'
3.142857
This is something that you can't do with expr. Depending on the version of awk, the
prompt may or may not be returned, which means that awk may still be reading
standard input. Use [Ctrl-d] to return to the prompt.
BUILT-IN VARIABLES
awk has several built-in variables. They are all assigned automatically, though it
is also possible for a user to reassign some of them. You have already used NR, which
signifies the record number of the current line. We'll now have a brief look at some of the
other variables.
The FS Variable: As stated elsewhere, awk uses a contiguous string of spaces as the
default field delimiter. FS redefines this field separator, which in the sample database
happens to be the |. When used at all, it must occur in the BEGIN section so that the body
of the program knows its value before it starts processing:
BEGIN {FS="|"}
This is an alternative to the -F option, which does the same thing.
The OFS Variable: When you use the print statement with comma-separated arguments,
each argument is separated from the other by a space. This is awk's default output field
separator, and it can be reassigned using the variable OFS in the BEGIN section:
BEGIN { OFS="~" }
When you reassign this variable with a ~ (tilde), awk will use this character for delimiting
the print arguments. This is a useful variable for creating lines with delimited fields.
The NF Variable: NF comes in quite handy for cleaning up a database of lines that don't
contain the right number of fields. By using it on a file, say emp.lst, you can locate those
lines not having 6 fields, which have crept in due to faulty data entry:
$ awk 'BEGIN { FS = "|" }
NF != 6 {
    print "Record No", NR, "has", NF, "fields"}' empx.lst
The FILENAME Variable: FILENAME stores the name of the current file being
processed. Like grep and sed, awk can also handle multiple filenames in the command
line. By default, awk doesn't print the filename, but you can instruct it to do so:
$6 < 4000 {print FILENAME, $0 }
With FILENAME, you can devise logic that does different things depending on the file
that is processed.
ARRAYS
An array is also a variable except that this variable can store a set of values or
elements. Each element is accessed by a subscript called the index. Awk arrays are
different from the ones used in other programming languages in many respects:
They are not formally defined. An array is considered declared the
moment it is used.
Array elements are initialized to zero or an empty string unless initialized
explicitly.
Arrays expand automatically.
The index can be virtually anything: it can even be a string.
In the program empawk3.awk, we use arrays to store the totals of the basic pay, da, hra
and gross pay of the sales and marketing people. Assume that the da is 25%, and hra 50%
of basic pay. Use the tot[] array to store the totals of each element of pay, and also the
gross pay:
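A sketch of what empawk3.awk could contain, matching this description (the field positions and the output format are assumptions):
/sales|marketing/ {
    da = 0.25 * $6 ; hra = 0.50 * $6 ; gp = $6 + da + hra
    tot[1] += $6 ; tot[2] += da ; tot[3] += hra ; tot[4] += gp
    kount++
}
END { printf "Average basic, da, hra and gross pay: %d %d %d %d\n",
      tot[1]/kount, tot[2]/kount, tot[3]/kount, tot[4]/kount }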
Note that this time we didn't match the patterns sales and marketing specifically in a field.
We could afford to do that because the patterns occur only in the fourth field, and there's
no scope here for ambiguity. When you run the program, it outputs the averages of the
elements of pay:
$ awk -f empawk3.awk empn.lst
C programmers will find the syntax quite comfortable to work with, except that awk simplifies a
number of things that require explicit specification in C. There are no type declarations, no
initialization and no statement terminators.
Associative arrays
Even though we used integers as subscripts in the tot[] array, awk doesn't treat array
indexes as integers. awk arrays are associative, where information is held as key-value pairs. The
index is the key that is saved internally as a string. When we set an array element using
mon[1]="mon", awk converts the number 1 to a string. There's no specified order in which the
array elements are stored. As the following example suggests, the index "1" is different from
"01":
$ awk 'BEGIN {
direction["N"] = "North" ; direction["S"] = "South" ;
direction["E"] = "East" ; direction["W"] = "West" ;
printf("N is %s and W is %s\n", direction["N"], direction["W"]) ;
}'
N is North and W is West
FUNCTIONS
awk has several built-in functions, performing both arithmetic and string operations. The
arguments are passed to a function in C-style, delimited by commas and enclosed by a matched
pair of parentheses. Even though awk allows the use of functions with and without parentheses
(like printf and printf()), POSIX discourages the use of functions without parentheses.
Some of these functions take a variable number of arguments, and one (length) uses no arguments
as a variant form. The functions are adequately explained here so you can confidently use them in
perl, which often uses identical syntaxes.
There are two arithmetic functions which a programmer will expect awk to offer. int calculates
the integral portion of a number (without rounding off), while sqrt calculates the square root of a
number. awk also has some of the common string handling functions you can hope to find in any
language. These are:
length: It determines the length of its argument, and if no argument is present, the entire line is
assumed to be the argument. You can use length (without any argument) to locate lines whose
length exceeds 1024 characters:
awk -F"|" 'length > 1024' empn.lst
You can use length with a field as well. The following program selects those people who have
short names:
awk -F"|" 'length($2) < 11' empn.lst
index(s1, s2): It determines the position of a string s2 within a larger string s1. This function is
especially useful in validating single-character fields. If a field takes the values a, b, c, d or e, you
can use this function to find out whether this single-character field can be located within the
string abcde:
x = index("abcde", "b")
This returns the value 2.
substr(stg, m, n): It extracts a substring from a string stg. m represents the starting point of
extraction and n indicates the number of characters to be extracted. Because string values can also
be used for computation, the returned string from this function can be used to select those born
between 1946 and 1951:
awk -F"|" 'substr($5, 7, 2) > 45 && substr($5, 7, 2) < 52' empn.lst
2365|barun sengupta|director|personel|11/05/47|7800|2365
3564|sudhir ararwal|executive|personnel|06/07/47|7500|2365
4290|jaynth Choudhury|executive|production|07/09/50|6000|9876
9876|jai sharma|director|production|12/03/50|7000|9876
You can never get this output with either sed or grep because regular expressions can never
match the numbers between 46 and 51. Note that awk does indeed possess a mechanism of
identifying the type of an expression from its context. It identified the date field as a string for
using substr and then converted it to a number for making a numeric comparison.
split(stg, arr, ch): It breaks up a string stg on the delimiter ch and stores the fields in an array
arr[]. Here's how you can convert the date field to the format YYYYMMDD:
$ awk -F"|" '{split($5, ar, "/"); print "19"ar[3]ar[2]ar[1]}' empn.lst
19521212
19501203
19430419
..
You can also do it with sed, but this method is superior because it explicitly picks up the fifth
field, whereas sed would transform the first date-like string that it finds.
system: You may want to print the system date at the beginning of the report. For running a UNIX
command within awk, you'll have to use the system function. Here are two examples:
BEGIN {
    system("tput clear")                 # clears the screen
    system("date")                       # prints the system date
}
The built-in functions are summarized below:
Function             Description
int(x)               returns the integer part of x
sqrt(x)              returns the square root of x
length               returns the length of the complete line
length(x)            returns length of x
substr(stg, m, n)    returns the portion of stg of length n, starting from position m
index(s1, s2)        returns the position of string s2 within string s1
split(stg, arr, ch)  splits stg into the array arr on delimiter ch; returns the number of fields
system(cmd)          runs the UNIX command cmd and returns its exit status
The if statement can be used when the && and || are found to be inadequate for certain
tasks. Its behavior is well known to all programmers. The statement here takes the form:
If (condition is true) {
Statement
} else {
Statement
}
Like in C, none of the control flow constructs needs to use curly braces if there's only one
statement to be executed. But when there are multiple actions to take, the statements must be
enclosed within a pair of curly braces. Moreover, the control command must be enclosed in
parentheses.
Most of the addresses that have been used so far reflect the logic normally used in the if
statement. In a previous example, you have selected lines where the basic pay exceeded 7500, by
using the condition as the selection criteria:
$6 > 7500 {
An alternative form of this logic places the condition inside the action component rather
than the selection criteria. But this form requires the if statement:
awk -F"|" '{ if ($6 > 7500) printf ...
if can be used with the comparison operators and the special symbols ~ and !~ to match a regular
expression. When used in combination with the logical operators || and &&, awk programming
becomes quite easy and powerful. Some of the earlier pattern matching expressions are rephrased
below, this time in the form used by if:
if (NR >= 3 && NR <= 6)
if ($3 == "director" || $3 == "chairman")
if ($3 ~ /^g.m/)
if ($2 !~ /[aA]gg?[ar]+wal/)
if ($2 ~ /[cC]ho[wu]dh?ury|sa[xk]s?ena/)
To illustrate the use of the optional else statement, let's assume that the dearness
allowance is 25% of basic pay when the latter is less than 6000, and 1000 otherwise. The if-else
structure that implements this logic looks like this:
if ( $6 < 6000 )
    da = 0.25*$6
else
    da = 1000
You can even replace the above if construct with a compact conditional structure:
da = $6 < 6000 ? 0.25*$6 : 1000
This is the form that C and perl use to implement the logic of a simple if-else construct.
The ? and : act as separators of the two actions.
When you have more than one statement to be executed, they must be bounded by a pair
of curly braces (as in C). For example, if the factors determining the hra and da are in turn
dependent on the basic pay itself, then you need to use the braces:
if ( $6 < 6000 ) {
    hra = 0.50*$6
    da = 0.25*$6
} else {
    hra = 0.40*$6
    da = 1000
}
for (k in arr)
    commands
Here, k is the subscript of the array arr. Because k can also be a string, we can use this loop to
print all environment variables. We simply have to pick up each subscript of the ENVIRON
array:
$ nawk 'BEGIN {
> for ( key in ENVIRON )
>     print key "=" ENVIRON[key]
> }'
LOGNAME=praveen
MAIL=/var/mail/Praveen
PATH=/usr/bin::/usr/local/bin::/usr/ccs/bin
TERM=xterm
HOME=/home/praveen
SHELL=/bin/bash
Because the index is actually a string, we can use any field as the index. We can even use the
elements of the array as counters. Using our sample database, we can display the count of the
employees, grouped according to the designation (the third field). You can use the string value of
$3 as the subscript of the array kount[]:
$ awk -F"|" '{ kount[$3]++ }
> END { for ( desig in kount )
> print desig, kount[desig] }' empn.lst
g.m
chairman
executive
director 4
manager
d.g.m
The program here analyzes the database to print a break-up of the employees, grouped on their
designation. The array kount[] takes as its subscripts non-numeric values like g.m., chairman,
executive, etc. for is invoked in the END section to print the subscript (desig) and the number of
occurrences of the subscript (kount[desig]). Note that you don't need to sort the input file to print
the report!
if(flag)
{
x[i]=$0;
printf "%s \n",x[i];
i++;
}
}
Run1:
[root@localhost shellprgms]$ cat >for7.txt
hello
world
world
hello
this
is
this
Output:
[root@localhost shellprgms]$ awk -F "|" -f 11.awk for7.txt
hello
world
this
is
2)awk script to print the transpose of a matrix.
BEGIN {
    system("tput clear")
    count = 0
}
{
    split($0, a);
    for (j = 1; j <= NF; j++)
    {
        count = count + 1
        arr[count] = a[j]
    }
    K = NF
}
END {
    printf("Transpose\n");
    for (j = 1; j <= K; j++)
    {
        for (i = j; i <= count; i = i + K)
        {
            printf("%s\t", arr[i]);
        }
        printf("\n");
    }
}
Run1:
[root@localhost shellprgms]$ awk -f p8.awk
2
Transpose
2
3)Awk script that folds long line into 40 columns. Thus any line that exceeds 40 Characters
must be broken after 40th and is to be continued with the residue. The inputs to be supplied
through a text file created by the user.
BEGIN{
start=1; }
{ len=length;
for(i=$0; length(i)>40; len-=40)
{
print substr(i,1,40) "\\"
i=substr(i,41,len);
}
print i; }
Run1:
[root@localhost shellprgms]$ awk -F "|" -f 15.awk sample.txt
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\
aaaaaaaaaaaa
aaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\
aaaaaaaaa
Output:
[root@localhost shellprgms]$ cat sample.txt
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaa
aaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
4)This is an awk program to provide extra spaces at the end of the line so that the line length is
maintained as 127.
awk '{ y = 127 - length($0)
    printf "%s", $0
    if (y > 0)
        for (i = 0; i < y; i++)
            printf "%s", " "
    printf "\n"
}' foo
5)A file contains a fixed number of fields in the form of space-delimited numbers. This is an awk
program to print the lines as well as total of its rows.
awk '{ split($0, a)
    for (i = 1; i <= NF; i++) {
        row[NR] += a[i]
    }
    printf "%s ", $0
    printf "%d\n", row[NR]
}' foo
UNIT 8
8.1. Perl - The Master Manipulator
Introduction
The following sections tell you what Perl is, and describe the variables and operators in perl and
the string handling functions. The chapter also discusses file handling in perl, as well as the lists,
arrays and associative arrays (hashes) that have made perl a popular scripting language. One or
two lines of perl code often accomplish what many lines of code in a high-level language do. We
finally discuss writing subroutines in perl.
Objectives
perl preliminaries
The chop function
Variables and Operators
String handling functions
Specifying filenames in a command line
$_(Default Variable)
$. (Current Line Number) and .. (The Range Operator)
Lists and Arrays
ARGV[]: Command Line Arguments
foreach: Looping Through a List
split: Splitting into a List or Array
join: Joining a List
dec2bin.pl: Converting a Decimal Number to Binary
grep: Searching an Array for a Pattern
Associative Arrays
Regular Expressions and Substitution
File Handling
Subroutines
Conclusion
Perl preliminaries
Perl: Perl stands for Practical Extraction and Reporting Language. The language was developed
by Larry Wall. Perl is a popular programming language because of its powerful pattern matching
capabilities, rich library of functions for arrays, lists and file handling. Perl is also a popular
choice for developing CGI (Common Gateway Interface) scripts on the www (World Wide Web).
Perl is a simple yet useful programming language that provides the convenience of shell scripts
and the power and flexibility of high-level programming languages. Perl programs are interpreted
and executed directly, just as shell scripts are; however, they also contain control structures and
operators similar to those found in the C programming language. This gives you the ability to
write useful programs in a very short time.
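As an illustration of how little is needed to get started, a minimal first script might look like the one below (the filename hello.pl is assumed only for illustration):
#!/usr/bin/perl
# hello.pl - a first perl script
print("Hello, world\n");
Run it with perl hello.pl, or make the file executable with chmod +x hello.pl and invoke it directly.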
Perl variables have no type and need no initialization. However, we need to precede the variable
name with a $ both when assigning a value to the variable and when evaluating it.
Example:
$var=10;
print $var;
Comparison Operators
Perl supports operators similar to C's for performing numeric comparison. It also provides
operators for performing string comparison, unlike C where we have to use either strcmp() or
strcmpi() for string comparison. They are listed next.
Numeric comparison      String comparison
==                      eq
!=                      ne
>                       gt
<                       lt
>=                      ge
<=                      le
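The difference matters when the operands look like numbers. A minimal sketch (the values are chosen only for illustration):
#!/usr/bin/perl
$x = "10";
$y = "10.0";
print("numerically equal\n") if ($x == $y);   # true: both evaluate to the number 10
print("strings equal\n") if ($x eq $y);       # false: "10" and "10.0" differ as strings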
Example (string concatenation with the . operator and repetition with the x operator):
$a = "Info" . "sys";    # $a is now "Infosys"
$a = "R" x 5;           # $a is now "RRRRR"
$. (Current Line Number) and .. (The Range Operator)
$. stores the current line number of the file being read, so it can be used to select lines by their position.
Example:
perl -ne 'print if ($. < 4)' in.dat
perl -ne 'print if ($. > 7 && $. < 11)' in.dat    # is similar to sed -n '8,10p'
.. is the range operator.
Example:
perl -ne 'print if (1..3)' in.dat     # Prints lines 1 to 3 from in.dat
perl -ne 'print if (8..10)' in.dat    # Prints lines 8 to 10 from in.dat
You can also use compound conditions for selecting multiple segments from a file.
Example: if ((1..2) || (13..15)) { print ; }    # Prints lines 1 to 2 and 13 to 15
Arrays
Perl allows you to store lists in special variables designed for that purpose. These variables are
called array variables. Note that arrays in perl need not contain data of the same type. Also, arrays
in perl can dynamically grow or shrink at run time.
@array = (1, 2, 3); # Here, the list (1, 2, 3) is assigned to the array variable @array.
Because Perl uses @ and $ to distinguish array variables from scalar variables, the same name can
be used in an array variable and in a scalar variable:
$var = 1;
@var = (11, 27.1, "a string");
Here, the name var is used in both the scalar variable $var and the array variable @var. These are
two completely separate variables. You retrieve the value of the scalar variable by specifying $var,
and the array element at index 1 by specifying $var[1].
Following are some examples of arrays with their description.
@x = 27;            # the one-element list (27) is assigned to @x
@y = @x;            # the contents of @x are copied into @y
@x = (2, 3, 4);
@y = (1, @x, 5);    # the list (2, 3, 4) is substituted for @x, and the resulting
                    # list (1, 2, 3, 4, 5) is assigned to @y
$len = @y;          # in a scalar context, @y evaluates to the number of elements
                    # it contains, so $len is 5
perl provides functions such as push, pop, shift and unshift to add and remove elements at either end of an array:
@list = (4, 5);
unshift(@list, 1, 2, 3);    # Adds 1, 2 and 3 at the beginning - @list is now (1, 2, 3, 4, 5)
pop(@list);                 # Removes and returns the last element - @list is now (1, 2, 3, 4)
foreach: Looping Through a List
The foreach construct iterates over each element of a list. For example, to print the square root of
each command line argument:
foreach $number (@ARGV) {
    print("The square root of $number is " . sqrt($number) . "\n");
}
You can even use the following code segment for performing the same task. Here note the use of
$_ as a default variable.
foreach (@ARGV) {
    print("The square root of $_ is " . sqrt() . "\n");
}
Another Example
#!/usr/bin/perl
@list = ("This", "is", "a", "list", "of", "words");
print("Here are the words in the list: \n");
foreach $temp (@list) {
print("$temp ");
}
print("\n");
Here, the loop defined by the foreach statement executes once for each element in the list @list.
The resulting output is
Here are the words in the list:
This is a list of words
The current element of the list being used as the counter is stored in a special scalar variable,
which in this case is $temp. This variable is special because it is only defined for the statements
inside the foreach loop.
perl has a for loop as well, whose syntax is similar to C's.
Example:
for($i=0 ; $i < 3 ; $i++) { . . .
split: Splitting into a List or Array
split breaks up a line or expression into fields. These fields are assigned either to variables or to an
array.
Syntax:
($var1, $var2, $var3, ...) = split(/sep/, str);
@arr = split(/sep/, str);
It splits the string str on the pattern sep. Here sep can be a regular expression or a literal string.
str is optional, and if absent, $_ is used as the default. The fields resulting from the split are assigned
to a set of variables, or to an array.
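A small sketch of split in action on a pipe-delimited record (the field values here are assumed only for illustration):
#!/usr/bin/perl
$line = "2365|barun sengupta|director|personnel";
($empid, $name, $desig) = split(/\|/, $line);   # first three fields go into scalars
@fields = split(/\|/, $line);                   # all the fields go into an array
print("$name is a $desig; the record has ", scalar(@fields), " fields\n");
Note that the | has to be escaped in the pattern because it is a regular expression metacharacter.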
join: Joining a List
join does the reverse of split: it combines the elements of a list into a single string, using the string
given as its first argument as the separator. The dec2bin.pl script uses it to assemble the computed
bits (collected in @bit_arr) into the binary form; only the closing portion of the script is shown here:
    }
    $binary_num = join("", @bit_arr);
    print("Binary form of $temp is $binary_num\n");
    splice(@bit_arr, 0, $#bit_arr+1);    # empty @bit_arr before the next number
}
The output of the above script (assuming script name is dec2bin.pl) is,
$ dec2bin.pl 10
Binary form of 10 is 1010
$ dec2bin.pl 8 12 15 10
Binary form of 8 is 1000
Binary form of 12 is 1100
Binary form of 15 is 1111
Binary form of 10 is 1010
$
Associative Arrays
In ordinary arrays, you access an array element by specifying an integer as the index:
@fruits = (9, 23, 11);
$count = $fruits[0];
# $count is now 9
In associative arrays, you do not have to use numbers such as 0, 1, and 2 to access array elements.
When you define an associative array, you specify the scalar values you want to use to access the
elements of the array. For example, here is a definition of a simple associative array:
%fruits=("apple", 9, "banana", 23, "cherry", 11);
The list alternates the array subscripts and their values in a comma-separated string; i.e., it is
basically a set of key-value pairs, where you can refer to a value by specifying its key.
$fruits{apple} will retrieve 9. $fruits{banana} will retrieve 23 and so on.
Note the use of {} instead of [] here.
There are two associative array functions, keys and values.
keys: Returns the list of subscripts (keys) of the associative array.
values: Returns the list of values of the associative array.
Normally, keys returns the key strings in a random sequence. To order the list alphabetically, use
the sort function with keys.
1. foreach $key (sort(keys %region)) { # sorts on keys in the associative array, region
2. @key_list = reverse sort keys %region; # reverse sorts on keys in assoc. array, region
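Putting keys and values together, a minimal sketch using the %fruits array defined above (the order of keys is unpredictable unless sorted):
#!/usr/bin/perl
%fruits = ("apple", 9, "banana", 23, "cherry", 11);
foreach $key (sort(keys %fruits)) {        # sort gives a predictable order
    print("$key -> $fruits{$key}\n");
}
@counts = values %fruits;                  # just the counts, in no particular order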
Regular Expressions and Substitution
The s operator performs substitution:
$val =~ s/a+/xyz/;    # replaces the first sequence of a's with xyz
$val =~ s/a/b/g;      # replaces all a's with b's; the g flag asks for global substitution
Here, the s prefix indicates that the pattern between the first / and the second is to be replaced by
the string between the second / and the third.
perl also has the tr operator for character-by-character translation (for example, $val =~ tr/a-z/A-Z/
converts lowercase to uppercase). Here, any character matched by the first pattern is replaced by
the corresponding character in the second pattern.
7. Some sets are so common that special characters exist to represent them:
\d matches any digit, and is equivalent to [0-9].
\D doesn't match a digit, same as [^0-9].
\w matches any character that can appear in a variable name; it is equivalent to [A-Za-z0-9_].
\W doesn't match a word character, same as [^A-Za-z0-9_].
\s matches any whitespace (any character not visible on the screen); it is equivalent to [ \r\t\n\f].
perl accepts the IRE and TRE used by grep and sed, except that the curly braces and
parentheses are not escaped.
For example, to locate lines longer than 512 characters using an IRE:
perl -ne 'print if /.{513,}/' filename    # Note that we didn't escape the curly braces
File Handling
To access a file on your UNIX file system from within your Perl program, you must perform the
following steps:
1. First, your program must open the file. This tells the system that your Perl program wants to
access the file.
2. Then, the program can either read from or write to the file, depending on how you have
opened the file.
3. Finally, the program can close the file. This tells the system that your program no longer needs
access to the file.
To open a file we use the open() function.
open(INFILE, "/home/srm/input.dat");
INFILE is the file handle. The second argument is the pathname. If only the filename is supplied,
the file is assumed to be in the current working directory.
open(OUTFILE, ">report.dat");
close(INFILE);
close(OUTFILE);
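Tying the three steps together, a minimal sketch (using the filenames assumed above) that copies the input file to the report file line by line:
#!/usr/bin/perl
open(INFILE, "/home/srm/input.dat") || die("Cannot open input file");
open(OUTFILE, ">report.dat") || die("Cannot create report.dat");
while ($line = <INFILE>) {     # read the input file one line at a time
    print OUTFILE $line;       # and copy each line to the output file
}
close(INFILE);
close(OUTFILE);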
File Tests
perl has an elaborate system of file tests that overshadows the capabilities of the Bourne shell and
even the find command that we have already seen. You can perform tests on filenames to see whether
the file is a directory file or an ordinary file, whether the file is readable, executable or writable,
and so on. Some of the file tests are listed next, along with a description of what they do.
if -d filename    True if the file is a directory
if -e filename    True if the file exists
if -f filename    True if it is an ordinary file
if -l filename    True if the file is a symbolic link
if -s filename    True if the file has a size greater than zero
if -w filename    True if the file is writable
if -x filename    True if the file is executable
if -z filename    True if the file has zero size (is empty)
if -B filename    True if the file is a binary file
if -T filename    True if the file is a text file
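For instance, a small sketch (the pathname is only illustrative) that applies a few of these tests before reporting on a file:
#!/usr/bin/perl
$file = "/home/srm/input.dat";                 # illustrative pathname
if (-e $file && -f $file) {                    # exists and is an ordinary file
    print("$file is empty\n") if (-z $file);
    print("$file is writable\n") if (-w $file);
} else {
    print("$file is missing or is not an ordinary file\n");
}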
Subroutines
The use of subroutines results in a modular program. We already know the advantages of a modular
approach: code reuse, ease of debugging and better readability.
Frequently used segments of code can be stored in separate sections, known as subroutines. The
general form of defining a subroutine in perl is:
sub procedure_name {
# Body of the subroutine
}
Example: The following is a routine to read a line of input from a file and break it into words.
sub get_words {
    $inputline = <>;                      # read a line of input
    @words = split(/\s+/, $inputline);    # break the line into words
}
Return Values
In perl subroutines, the value of the last expression evaluated by the subroutine becomes the
subroutine's return value. That is the reason why we could refer to the array variable @words in the
calling routine.
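A minimal sketch (the subroutine name and values are illustrative) showing how the last expression evaluated becomes the return value:
#!/usr/bin/perl
sub add {
    $sum = $_[0] + $_[1];    # arguments are available in the special array @_
    $sum;                    # last expression evaluated - this is the return value
}
$total = &add(3, 4);
print("$total\n");           # prints 7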