Advanced Editing On UNIX - Kernighan
Brian W. Kernighan
Bell Laboratories
Murray Hill, New Jersey 07974
ABSTRACT
This paper is meant to help secretaries, typists and programmers to make effective use of
the UNIX facilities for preparing and editing text. It provides explanations and examples of
• special characters, line addressing and global commands in the editor ed;
• commands for “cut and paste” operations on files and parts of files, including the mv, cp, cat and rm commands,
  and the r, w, m and t commands of the editor;
• editing scripts and editor-based programs like grep and sed.
Although the treatment is aimed at non-programmers, new users with any background should find helpful hints on
how to get their jobs done more easily.
November 2, 1997
The $ sign here provides context to make specific which comma we mean.
Without it, of course, the s command would operate on the first comma to
produce

    Now is the time. for all good men,

As another example, to convert

    Now is the time.

into

    Now is the time?

as we did earlier, we can use

    s/.$/?/

Like ‘.’, the ‘$’ has multiple meanings depending on context. In the line

    $s/$/$/

the first ‘$’ refers to the last line of the file, the second refers to
the end of that line, and the third is a literal dollar sign, to be added
to that line.

The Circumflex ‘^’

The circumflex (or hat or caret) ‘^’ stands for the beginning of the line.
For example, suppose you are looking for a line that begins with ‘the’. If
you simply say

    /the/

you will in all likelihood find several lines that contain ‘the’ in the
middle before arriving at the one you want. But with

    /^the/

you narrow the context, and thus arrive at the desired one more easily.

The other use of ‘^’ is of course to enable you to insert something at the
beginning of a line:

    s/^/ /

places a space at the beginning of the current line.

Metacharacters can be combined. To search for a line that contains only
the characters

    .PP

you can use the command

    /^\.PP$/

The Star ‘*’

Suppose you have a line that looks like this:

    text x          y text

where text stands for lots of text, and there are some indeterminate
number of spaces between the x and the y. Suppose the job is to replace
all the spaces between x and y by a single space. The line is too long to
retype, and there are too many spaces to count. What now?

This is where the metacharacter ‘*’ comes in handy. A character followed
by a star stands for as many consecutive occurrences of that character as
possible. To refer to all the spaces at once, say

    s/x *y/x y/

The construction ‘ *’ means ‘as many spaces as possible’. Thus ‘x *y’
means ‘an x, as many spaces as possible, then a y’.

The star can be used with any character, not just space. If the original
example was instead

    text x--------y text

then all ‘-’ signs can be replaced by a single space with the command

    s/x-*y/x y/

Finally, suppose that the line was

    text x..................y text

Can you see what trap lies in wait for the unwary? If you blindly type

    s/x.*y/x y/

what will happen? The answer, naturally, is that it depends. If there are
no other x’s or y’s on the line, then everything works, but it’s blind
luck, not good management. Remember that ‘.’ matches any single character?
Then ‘.*’ matches as many single characters as possible, and unless you’re
careful, it can eat up a lot more of the line than you expected. If the
line was, for example, like this:

    text x text x................y text y text

then saying

    s/x.*y/x y/

will take everything from the first ‘x’ to the last ‘y’, which, in this
example, is undoubtedly more than you wanted.

The solution, of course, is to turn off the special meaning of ‘.’ with
‘\.’:

    s/x\.*y/x y/

Now everything works, for ‘\.*’ means ‘as many periods as possible’.

There are times when the pattern ‘.*’ is exactly what you want. For
example, to change

    Now is the time for all good men ....

into

    Now is the time.

use ‘.*’ to eat up everything after the ‘for’:

    s/ for.*/./

There are a couple of additional pitfalls associated with ‘*’ that you
should be aware of. Most notable
and the ‘&’ will stand for ‘the’. Of course this isn’t much of a saving if
the thing matched is just ‘the’, but if it is something truly long or
awful, or if it is something like ‘.*’ which matches a lot of text, you
can save some tedious typing. There is also much less chance of making a
typing error in the replacement text. For example, to parenthesize a line,
regardless of its length,

    s/.*/(&)/

The ampersand can occur more than once on the right side:

    s/the/& best and & worst/

makes

    Now is the best and the worst time

and

    s/.*/&? &!!/

converts the original line into

    Now is the time? Now is the time!!

To get a literal ampersand, naturally the backslash is used to turn off
the special meaning:

    s/ampersand/\&/

converts the word into the symbol. Notice that ‘&’ is not special on the
left side of a substitute, only on the right side.

Substituting Newlines

ed provides a facility for splitting a single line into two or more
shorter lines by ‘substituting in a newline’. As the simplest example,
suppose a line has gotten unmanageably long because of editing (or merely
because it was unwisely typed). If it looks like

    text xy text

you can break it between the ‘x’ and the ‘y’ like this:

    s/xy/x\
    y/

This is actually a single command, although it is typed on two lines.
Bearing in mind that ‘\’ turns off special meanings, it seems relatively
intuitive that a ‘\’ at the end of a line would make the newline there no
longer special.

You can in fact make a single line into several lines with this same
mechanism. As a large example, consider underlining the word ‘very’ in a
long line by splitting ‘very’ onto a separate line, and preceding it by
the roff or nroff formatting command ‘.ul’.

    text a very big text

The command

    s/ very /\
    .ul\
    very\
    /

converts the line into four shorter lines, preceding the word ‘very’ by
the line ‘.ul’, and eliminating the spaces around the ‘very’, all at the
same time.

When a newline is substituted in, dot is left pointing at the last line
created.

Joining Lines

Lines may also be joined together, but this is done with the j command
instead of s. Given the lines

    Now is
     the time

and supposing that dot is set to the first of them, then the command

    j

joins them together. No blanks are added, which is why we carefully showed
a blank at the beginning of the second line.

All by itself, a j command joins line dot to line dot+1, but any
contiguous set of lines can be joined. Just specify the starting and
ending line numbers. For example,

    1,$jp

joins all the lines into one big one and prints it. (More on line numbers
in Section 3.)

Rearranging a Line with \( ... \)

(This section should be skipped on first reading.) Recall that ‘&’ is a
shorthand that stands for whatever was matched by the left side of an s
command. In much the same way you can capture separate pieces of what was
matched; the only difference is that you have to specify on the left side
just what pieces you’re interested in.

Suppose, for instance, that you have a file of lines that consist of names
in the form

    Smith, A. B.
    Jones, C.

and so on, and you want the initials to precede the name, as in

    A. B. Smith
    C. Jones

It is possible to do this with a series of editing commands, but it is
tedious and error-prone. (It is instructive to figure out how it is done,
though.)

The alternative is to ‘tag’ the pieces of the pattern (in this case, the
last name, and the initials), and then rearrange the pieces. On the left
side of a substitution, if part of the pattern is enclosed between \( and
\), whatever matched that part is remembered, and available for use on the
right
side. On the right side, the symbol ‘\1’ refers to whatever matched the
first \(...\) pair, ‘\2’ to the second \(...\), and so on.

The command

    1,$s/^\([^,]*\), *\(.*\)/\2 \1/

although hard to read, does the job. The first \(...\) matches the last
name, which is any string up to the comma; this is referred to on the
right side with ‘\1’. The second \(...\) is whatever follows the comma and
any spaces, and is referred to as ‘\2’.

Of course, with any editing sequence this complicated, it’s foolhardy to
simply run it and hope. The global commands g and v discussed in section 4
provide a way for you to print exactly those lines which were affected by
the substitute command, and thus verify that it did what you wanted in all
cases.

3. LINE ADDRESSING IN THE EDITOR

The next general area we will discuss is that of line addressing in ed,
that is, how you specify what lines are to be affected by editing
commands. We have already used constructions like

    1,$s/x/y/

to specify a change on all lines. And most users are long since familiar
with using a single newline (or return) to print the next line, and with

    /thing/

to find a line that contains ‘thing’. Less familiar, surprisingly enough,
is the use of

    ?thing?

to scan backwards for the previous occurrence of ‘thing’. This is
especially handy when you realize that the thing you want to operate on is
back up the page from where you are currently editing.

The slash and question mark are the only characters you can use to delimit
a context search, though you can use essentially any character in a
substitute command.

Address Arithmetic

The next step is to combine the line numbers like ‘.’, ‘$’, ‘/.../’ and
‘?...?’ with ‘+’ and ‘-’. Thus

    $-1

is a command to print the next to last line of the current file (that is,
one line before line ‘$’). For example, to recall how far you got in a
previous editing session,

    $-5,$p

prints the last six lines. (Be sure you understand why it’s six, not
five.) If there aren’t six, of course, you’ll get an error message.

As another example,

    .-3,.+3p

prints from three lines before where you are now (at line dot) to three
lines after, thus giving you a bit of context. By the way, the ‘+’ can be
omitted:

    .-3,.3p

is absolutely identical in meaning.

Another area in which you can save typing effort in specifying lines is to
use ‘-’ and ‘+’ as line numbers by themselves.

    -

by itself is a command to move back up one line in the file. In fact, you
can string several minus signs together to move back up that many lines:

    ---

moves up three lines, as does ‘-3’. Thus

    -3,+3p

is also identical to the examples above.

Since ‘-’ is shorter than ‘.-1’, constructions like

    -,.s/bad/good/

are useful. This changes ‘bad’ to ‘good’ on the previous line and on the
current line.

‘+’ and ‘-’ can be used in combination with searches using ‘/.../’ and
‘?...?’, and with ‘$’. The search

    /thing/--

finds the line containing ‘thing’, and positions you two lines before it.

Repeated Searches

Suppose you ask for the search

    /horrible thing/

and when the line is printed you discover that it isn’t the horrible thing
that you wanted, so it is necessary to repeat the search again. You don’t
have to re-type the search, for the construction

    //

is a shorthand for ‘the previous thing that was searched for’, whatever it
was. This can be repeated as many times as necessary. You can also go
backwards:

    ??

searches for the same thing, but in the reverse direction.

Not only can you repeat the search, but you can use ‘//’ as the left side
of a substitute command, to mean ‘the most recent pattern’.

    /horrible thing/
      .... ed prints line with ‘horrible thing’ ...
mnemonic significance to the letter ‘v’.) So

    v/^\./p

prints all the lines that don’t begin with ‘.’, that is, the actual text
lines.

The command that follows g or v can be anything:

    g/^\./d

deletes all lines that begin with ‘.’, and

    g/^$/d

deletes all empty lines.

Probably the most useful command that can follow a global is the
substitute command, for this can be used to make a change and print each
affected line for verification. For example, we could change the word
‘Unix’ to ‘UNIX’ everywhere, and verify that it really worked, with

    g/Unix/s//UNIX/gp

Notice that we used ‘//’ in the substitute command to mean ‘the previous
pattern’, in this case, ‘Unix’. The p command is done on every line that
matches the pattern, not just those on which a substitution took place.

The global command operates by making two passes over the file. On the
first pass, all lines that match the pattern are marked. On the second
pass, each marked line in turn is examined, dot is set to that line, and
the command executed. This means that it is possible for the command that
follows a g or v to use addresses, set dot, and so on, quite freely.

    g/^\.PP/+

prints the line that follows each ‘.PP’ command (the signal for a new
paragraph in some formatting packages). Remember that ‘+’ means ‘one line
past dot’. And

    g/topic/?^\.SH?1

searches for each line that contains ‘topic’, scans backwards until it
finds a line that begins ‘.SH’ (a section heading) and prints the line
that follows that, thus showing the section headings under which ‘topic’
is mentioned. Finally,

    g/^\.EQ/+,/^\.EN/-p

prints all the lines that lie between lines beginning with ‘.EQ’ and ‘.EN’
formatting commands.

The g and v commands can also be preceded by line numbers, in which case
the lines searched are only those in the range specified.

Multi-line Global Commands

It is possible to do more than one command under the control of a global
command, although the syntax for expressing the operation is not
especially natural or pleasant. As an example, suppose the task is to
change ‘x’ to ‘y’ and ‘a’ to ‘b’ on all lines that contain ‘thing’. Then

    g/thing/s/x/y/\
    s/a/b/

is sufficient. The ‘\’ signals the g command that the set of commands
continues on the next line; it terminates on the first line that does not
end with ‘\’. (As a minor blemish, you can’t use a substitute command to
insert a newline within a g command.)

You should watch out for this problem: the command

    g/x/s//y/\
    s/a/b/

does not work as you expect. The remembered pattern is the last pattern
that was actually executed, so sometimes it will be ‘x’ (as expected), and
sometimes it will be ‘a’ (not expected). You must spell it out, like this:

    g/x/s/x/y/\
    s/a/b/

It is also possible to execute a, c and i commands under a global command;
as with other multi-line constructions, all that is needed is to add a ‘\’
at the end of each line except the last. Thus to add a ‘.nf’ and ‘.sp’
command before each ‘.EQ’ line, type

    g/^\.EQ/i\
    .nf\
    .sp

There is no need for a final line containing a ‘.’ to terminate the i
command, unless there are further commands being done under the global. On
the other hand, it does no harm to put it in either.

5. CUT AND PASTE WITH UNIX COMMANDS

One editing area in which non-programmers seem not very confident is in
what might be called ‘cut and paste’ operations: changing the name of a
file, making a copy of a file somewhere else, moving a few lines from one
place to another in a file, inserting one file in the middle of another,
splitting a file into pieces, and splicing two or more files together.

Yet most of these operations are actually quite easy, if you keep your
wits about you and go cautiously. The next several sections talk about cut
and paste. We will begin with the UNIX commands for moving entire files
around, then discuss ed commands for operating on pieces of files.
Answer: for most purposes, no. You might reasonably ask why there are two
programs in that case, since cat is obviously all you need. The answer is
that cp will do some other things as well, which you can investigate for
yourself by reading the manual. For now we’ll stick to simple usages.

Adding Something to the End of a File

Sometimes you want to add one file to the end of another. We have enough
building blocks now that you can do it; in fact before reading further it
would be valuable if you figured out how. To be specific, how would you
use cp, mv and/or cat to add the file ‘good1’ to the end of the file
‘good’? You could try

    cat good good1 >temp
    mv temp good

which is probably most direct. You should also understand why

    cat good good1 >good

doesn’t work. (Don’t practice with a good ‘good’!)

The easy way is to use a variant of >, called >>. In fact, >> is identical
to > except that instead of clobbering the old file, it simply tacks stuff
on at the end. Thus you could say

    cat good1 >>good

and ‘good1’ is added to the end of ‘good’. (And if ‘good’ didn’t exist,
this makes a copy of ‘good1’ called ‘good’.)

6. CUT AND PASTE WITH THE EDITOR

Now we move on to manipulating pieces of files: individual lines or groups
of lines. This is another area where new users seem unsure of themselves.

Filenames

The first step is to ensure that you know the ed commands for reading and
writing files. Of course you can’t go very far without knowing r and w.
Equally useful, but less well known, is the ‘edit’ command e. Within ed,
the command

    e newfile

says ‘I want to edit a new file called newfile, without leaving the
editor.’ The e command discards whatever you’re currently working on and
starts over on newfile. It’s exactly the same as if you had quit with the
q command, then re-entered ed with a new file name, except that if you
have a pattern remembered, then a command like // will still work.

If you enter ed with the command

    ed file

ed remembers the name of the file, and any subsequent e, r or w commands
that don’t contain a filename will refer to this remembered file. Thus

    ed file1
    ... (editing) ...
    w              (writes back in file1)
    e file2        (edit new file, without leaving editor)
    ... (editing on file2) ...
    w              (writes back on file2)

(and so on) does a series of edits on various files without ever leaving
ed and without typing the name of any file more than once. (As an aside,
if you examine the sequence of commands here, you can see why many UNIX
systems use e as a synonym for ed.)

You can find out the remembered file name at any time with the f command;
just type f without a file name. You can also change the name of the
remembered file name with f; a useful sequence is

    ed precious
    f junk
    ... (editing) ...

which gets a copy of a precious file, then uses f to guarantee that a
careless w command won’t clobber the original.

Inserting One File into Another

Suppose you have a file called ‘memo’, and you want the file called
‘table’ to be inserted just after the reference to Table 1. That is, in
‘memo’ somewhere is a line that says

    Table 1 shows that ...

and the data contained in ‘table’ has to go there, probably so it will be
formatted properly by nroff or troff. Now what?

This one is easy. Edit ‘memo’, find ‘Table 1’, and add the file ‘table’
right there:

    ed memo
    /Table 1/
    Table 1 shows that ...    [response from ed]
    .r table

The critical line is the last one. As we said earlier, the r command reads
a file; here you asked for it to be read in right after line dot. An r
command without any address adds lines at the end, so it is the same as
$r.

Writing out Part of a File

The other side of the coin is writing out part of the document you’re
editing. For example, maybe you want to split out into a separate file
that table from the previous example, so it can be formatted and tested
separately. Suppose that in the file being edited we have

    .TS
    ...[lots of stuff]
number precedes the k, that line is marked. (The mark name must be a
single lower case letter.) Now you can refer to the marked line with the
address

    'x

Marks are most useful for moving things around. Find the first line of the
block to be moved, and mark it with 'a. Then find the last line and mark
it with 'b. Now position yourself at the place where the stuff is to go
and say

    'a,'bm.

Bear in mind that only one line can have a particular mark name associated
with it at any given time.

Copying Lines

We mentioned earlier the idea of saving a line that was hard to type or
used often, so as to cut down on typing time. Of course this could be more
than one line; then the saving is presumably even greater.

ed provides another command, called t (for ‘transfer’) for making a copy
of a group of one or more lines at any point. This is often easier than
writing and reading.

The t command is identical to the m command, except that instead of moving
lines it simply duplicates them at the place you named. Thus

    1,$t$

duplicates the entire contents that you are editing. A more common use for
t is for creating a series of lines that differ only slightly. For
example, you can say

    a
    .......... x .........    (long line)
    .
    t.                        (make a copy)
    s/x/y/                    (change it a bit)
    t.                        (make third copy)
    s/y/z/                    (change it a bit)

and so on.

The Temporary Escape ‘!’

Sometimes it is convenient to be able to temporarily escape from the
editor to do some other UNIX command, perhaps one of the file copy or move
commands discussed in section 5, without leaving the editor. The ‘escape’
command ! provides a way to do this.

If you say

    !any UNIX command

your current editing state is suspended, and the UNIX command you asked
for is executed. When the command finishes, ed will signal you by printing
another !; at that point you can resume editing.

You can really do any UNIX command, including another ed. (This is quite
common, in fact.) In this case, you can even do another !.

7. SUPPORTING TOOLS

There are several tools and techniques that go along with the editor, all
of which are relatively easy once you know how ed works, because they are
all based on the editor. In this section we will give some fairly cursory
examples of these tools, more to indicate their existence than to provide
a complete tutorial. More information on each can be found in [3].

Grep

Sometimes you want to find all occurrences of some word or pattern in a
set of files, to edit them or perhaps just to verify their presence or
absence. It may be possible to edit each file separately and look for the
pattern of interest, but if there are many files this can get very
tedious, and if the files are really big, it may be impossible because of
limits in ed.

The program grep was invented to get around these limitations. The search
patterns that we have described in the paper are often called ‘regular
expressions’, and ‘grep’ stands for

    g/re/p

That describes exactly what grep does: it prints every line in a set of
files that contains a particular pattern. Thus

    grep 'thing' file1 file2 file3 ...

finds ‘thing’ wherever it occurs in any of the files ‘file1’, ‘file2’,
etc. grep also indicates the file in which the line was found, so you can
later edit it if you like.

The pattern represented by ‘thing’ can be any pattern you can use in the
editor, since grep and ed use exactly the same mechanism for pattern
searching. It is wisest always to enclose the pattern in the single quotes
'...' if it contains any non-alphabetic characters, since many such
characters also mean something special to the UNIX command interpreter
(the ‘shell’). If you don’t quote them, the command interpreter will try
to interpret them before grep gets a chance.

There is also a way to find lines that don’t contain a pattern:

    grep -v 'thing' file1 file2 ...

finds all lines that don’t contain ‘thing’. The -v must occur in the
position shown. Given grep and grep -v, it is possible to do things like
selecting all lines that contain some combination of patterns. For
example, to get all lines that contain ‘x’ but not ‘y’:

    grep x file... | grep -v y
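
As a small illustration of combining grep and grep -v, the sketch below builds two scratch files and selects the lines that contain ‘x’ but not ‘y’. The scratch directory, the file names and their contents are invented for the example, and it assumes a shell in which mktemp is available. Bear in mind that when several files are named, grep prefixes each line with its file name, and the second grep sees that prefix too; the example sidesteps the issue by using file names that contain no ‘y’.

```shell
# Scratch directory and files; the names and contents are made up for this sketch.
dir=$(mktemp -d)
printf 'box\nboxy\ntoy\nplain\n' > "$dir/file1"
printf 'six\nsixty\n' > "$dir/file2"

# The first grep keeps lines containing 'x'; the second drops those containing 'y'.
# With two files named, each surviving line carries a "file1:" or "file2:" prefix.
x_not_y=$(cd "$dir" && grep 'x' file1 file2 | grep -v 'y')
printf '%s\n' "$x_not_y"

# Remove the scratch files.
rm -r "$dir"
```

Only ‘box’ and ‘six’ survive both filters here; ‘boxy’ and ‘sixty’ are caught by the first grep but discarded by the second.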
Sed
sed (‘stream editor’) is a version of the editor with restricted
capabilities, but one that can process unlimited amounts of input.
Basically sed copies its input to its output, applying one or more
editing commands to each line of input.
As an example, suppose that we want to do the
‘Unix’ to ‘UNIX’ part of the example given above,
but without rewriting the files. Then the command
    sed 's/Unix/UNIX/g' file1 file2 ...
applies the command ‘s/Unix/UNIX/g’ to all lines
from ‘file1’, ‘file2’, etc., and copies all lines to the
output. The advantage of using sed in such a case
is that it can be used with input too large for ed to
handle. All the output can be collected in one
place, either in a file or perhaps piped into another
program.
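
A minimal sketch of that usage, assuming a scratch file whose name and contents are invented here: the converted text is collected in a separate output file, and the input is left exactly as it was.

```shell
# Scratch directory and input file; both are hypothetical, for demonstration only.
dir=$(mktemp -d)
printf 'Unix is small. Unix is portable.\nNothing to change here.\n' > "$dir/file1"

# Apply the substitution to every line and collect the result in a new file.
sed 's/Unix/UNIX/g' "$dir/file1" > "$dir/collected"

converted=$(cat "$dir/collected")   # what sed wrote
original=$(cat "$dir/file1")        # the input, which sed did not rewrite
printf '%s\n' "$converted"

# Remove the scratch files.
rm -r "$dir"
```

Lines with no ‘Unix’ in them are copied through unchanged, which is why the second line appears in the output as it stands.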
If the editing transformation is so complicated that more than one editing
command is needed, commands can be supplied from a file, or on the command
line, with a slightly more complex syntax. To take commands from a file,
for example,

    sed -f cmdfile input-files...
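
To make the two forms concrete, here is a hedged sketch that puts two editing commands in a command file and then gives the same two commands on the command line with -e, one flag per command. The command file name follows the text; the scratch directory, the input file and the choice of commands (the ‘Unix’ substitution plus deletion of empty lines) are assumptions made for illustration.

```shell
# Scratch directory and input; names and contents are invented for this sketch.
dir=$(mktemp -d)
printf 'Unix one\n\nUnix two\n' > "$dir/input"

# One editing command per line in the command file.
printf '%s\n' 's/Unix/UNIX/g' '/^$/d' > "$dir/cmdfile"
from_file=$(sed -f "$dir/cmdfile" "$dir/input")

# The same commands supplied directly, each introduced by -e.
from_args=$(sed -e 's/Unix/UNIX/g' -e '/^$/d' "$dir/input")

printf '%s\n' "$from_file"

# Remove the scratch files.
rm -r "$dir"
```

Both invocations should produce the same two lines; the file form is easier to manage once the list of commands grows.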
sed has further capabilities, including conditional
testing and branching, which we cannot go into
here.