Advanced Editing on UNIX

Brian W. Kernighan
Bell Laboratories
Murray Hill, New Jersey 07974

ABSTRACT

This paper is meant to help secretaries, typists and programmers to make effective use of the UNIX† facilities for preparing and editing text. It provides explanations and examples of
• special characters, line addressing and global commands in the editor ed;
• commands for “cut and paste” operations on files and parts of files, including the mv, cp, cat and rm commands, and the r, w, m and t commands of the editor;
• editing scripts and editor-based programs like grep and sed.
Although the treatment is aimed at non-programmers, new users with any background should find helpful hints on how to get their jobs done more easily.

November 2, 1997

†UNIX is a Trademark of Bell Laboratories.



1. INTRODUCTION

Although UNIX provides remarkably effective tools for text editing, that by itself is no guarantee that everyone will automatically make the most effective use of them. In particular, people who are not computer specialists (typists, secretaries, casual users) often use the system less effectively than they might.

This document is intended as a sequel to A Tutorial Introduction to the UNIX Text Editor [1], providing explanations and examples of how to edit with less effort. (You should also be familiar with the material in UNIX For Beginners [2].) Further information on all commands discussed here can be found in The UNIX Programmer’s Manual [3].

Examples are based on observations of users and the difficulties they encounter. Topics covered include special characters in searches and substitute commands, line addressing, the global commands, and line moving and copying. There are also brief discussions of effective use of related tools, like those for file manipulation, and those based on ed, like grep and sed.

A word of caution. There is only one way to learn to use something, and that is to use it. Reading a description is no substitute for trying something. A paper like this one should give you ideas about what to try, but until you actually try something, you will not learn it.

2. SPECIAL CHARACTERS

The editor ed is the primary interface to the system for many people, so it is worthwhile to know how to get the most out of ed for the least effort.

The next few sections will discuss shortcuts and labor-saving devices. Not all of these will be instantly useful to any one person, of course, but a few will be, and the others should give you ideas to store away for future use. And as always, until you try these things, they will remain theoretical knowledge, not something you have confidence in.

The List command ‘l’

ed provides two commands for printing the contents of the lines you’re editing. Most people are familiar with p, in combinations like

    1,$p

to print all the lines you’re editing, or

    s/abc/def/p

to change ‘abc’ to ‘def’ on the current line. Less familiar is the list command l (the letter ‘l’), which gives slightly more information than p. In particular, l makes visible characters that are normally invisible, such as tabs and backspaces. If you list a line that contains some of these, l will print each tab as -> and each backspace as -<. This makes it much easier to correct the sort of typing mistake that inserts extra spaces adjacent to tabs, or inserts a backspace followed by a space.

The l command also ‘folds’ long lines for printing: any line that exceeds 72 characters is printed on multiple lines; each printed line except the last is terminated by a backslash \, so you can tell it was folded. This is useful for printing long lines on short terminals.

Occasionally the l command will print in a line a string of numbers preceded by a backslash, such as \07 or \16. These combinations are used to make visible characters that normally don’t print, like form feed or vertical tab or bell. Each such combination is a single character. When you see such characters, be wary: they may have surprising meanings when printed on some terminals. Often their presence means that your finger slipped while you were typing; you almost never want them.

The Substitute Command ‘s’

Most of the next few sections will be taken up with a discussion of the substitute command s. Since this is the command for changing the contents of individual lines, it probably has the most complexity of any ed command, and the most potential for effective use.

As the simplest place to begin, recall the meaning of a trailing g after a substitute command. With

    s/this/that/
and

    s/this/that/g

the first one replaces the first ‘this’ on the line with ‘that’. If there is more than one ‘this’ on the line, the second form with the trailing g changes all of them.

Either form of the s command can be followed by p or l to ‘print’ or ‘list’ (as described in the previous section) the contents of the line:

    s/this/that/p
    s/this/that/l
    s/this/that/gp
    s/this/that/gl

are all legal, and mean slightly different things. Make sure you know what the differences are.

Of course, any s command can be preceded by one or two ‘line numbers’ to specify that the substitution is to take place on a group of lines. Thus

    1,$s/mispell/misspell/

changes the first occurrence of ‘mispell’ to ‘misspell’ on every line of the file. But

    1,$s/mispell/misspell/g

changes every occurrence in every line (and this is more likely to be what you wanted in this particular case).

You should also notice that if you add a p or l to the end of any of these substitute commands, only the last line that got changed will be printed, not all the lines. We will talk later about how to print all the lines that were modified.

The Undo Command ‘u’

Occasionally you will make a substitution in a line, only to realize too late that it was a ghastly mistake. The ‘undo’ command u lets you ‘undo’ the last substitution: the last line that was substituted can be restored to its previous state by typing the command

    u

The Metacharacter ‘.’

As you have undoubtedly noticed when you use ed, certain characters have unexpected meanings when they occur in the left side of a substitute command, or in a search for a particular line. In the next several sections, we will talk about these special characters, which are often called ‘metacharacters’.

The first one is the period ‘.’. On the left side of a substitute command, or in a search with ‘/.../’, ‘.’ stands for any single character. Thus the search

    /x.y/

finds any line where ‘x’ and ‘y’ occur separated by a single character, as in

    x+y
    x-y
    x␣y
    x.y

and so on. (We will use ␣ to stand for a space whenever we need to make it visible.)

Since ‘.’ matches a single character, that gives you a way to deal with funny characters printed by l. Suppose you have a line that, when printed with the l command, appears as

    .... th\07is ....

and you want to get rid of the \07 (which represents the bell character, by the way).

The most obvious solution is to try

    s/\07//

but this will fail. (Try it.) The brute force solution, which most people would now take, is to re-type the entire line. This is guaranteed, and is actually quite a reasonable tactic if the line in question isn’t too big, but for a very long line, re-typing is a bore. This is where the metacharacter ‘.’ comes in handy. Since ‘\07’ really represents a single character, if we say

    s/th.is/this/

the job is done. The ‘.’ matches the mysterious character between the ‘h’ and the ‘i’, whatever it is.

Bear in mind that since ‘.’ matches any single character, the command

    s/./,/

converts the first character on a line into a ‘,’, which very often is not what you intended.

As is true of many characters in ed, the ‘.’ has several meanings, depending on its context. This line shows all three:

    .s/././

The first ‘.’ is a line number, the number of the line we are editing, which is called ‘line dot’. (We will discuss line dot more in Section 3.) The second ‘.’ is a metacharacter that matches any single character on that line. The third ‘.’ is the only one that really is an honest literal period. On the right side of a substitution, ‘.’ is not special. If you apply this command to the line

    Now is the time.

the result will be

    .ow is the time.

which is probably not what you intended.
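The behavior of ‘.’ described in this section can also be tried outside the editor. The following is a minimal sketch in Python, whose re module is used here only as a stand-in for ed’s pattern matching (that substitution is an assumption of this example, not something the paper itself uses). The helper name substitute_first is hypothetical; count=1 imitates an s command without a trailing g.

```python
import re


def substitute_first(pattern, replacement, line):
    """Imitate ed's s/pattern/replacement/ (no trailing g) on one line."""
    # count=1 replaces only the first match, like s without g.
    return re.sub(pattern, replacement, line, count=1)


# A line containing the bell character, which ed's l command lists as \07.
noisy = "th\x07is"

# '.' matches the single unprintable character between 'h' and 'i'.
cleaned = substitute_first(r"th.is", "this", noisy)

# s/./,/ replaces whatever character happens to come first on the line.
comma_first = substitute_first(r".", ",", "Now is the time.")

# In the replacement text a period is literal, as in .s/././
dot_first = substitute_first(r".", ".", "Now is the time.")

print(repr(cleaned))
print(comma_first)
print(dot_first)
```

The last two calls differ only in the replacement text, which may help separate the two roles of the period: a wildcard on the left, an ordinary character on the right.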
The Backslash ‘\’

Since a period means ‘any character’, the question naturally arises of what to do when you really want a period. For example, how do you convert the line

    Now is the time.

into

    Now is the time?

The backslash ‘\’ does the job. A backslash turns off any special meaning that the next character might have; in particular, ‘\.’ converts the ‘.’ from a ‘match anything’ into a period, so you can use it to replace the period in

    Now is the time.

like this:

    s/\./?/

The pair of characters ‘\.’ is considered by ed to be a single real period.

The backslash can also be used when searching for lines that contain a special character. Suppose you are looking for a line that contains

    .PP

The search

    /.PP/

isn’t adequate, for it will find a line like

    THE APPLICATION OF ...

because the ‘.’ matches the letter ‘A’. But if you say

    /\.PP/

you will find only lines that contain ‘.PP’.

The backslash can also be used to turn off special meanings for characters other than ‘.’. For example, consider finding a line that contains a backslash. The search

    /\/

won’t work, because the ‘\’ isn’t a literal ‘\’, but instead means that the second ‘/’ no longer delimits the search. But by preceding a backslash with another one, you can search for a literal backslash. Thus

    /\\/

does work. Similarly, you can search for a forward slash ‘/’ with

    /\//

The backslash turns off the meaning of the immediately following ‘/’ so that it doesn’t terminate the /.../ construction prematurely.

As an exercise, before reading further, find two substitute commands each of which will convert the line

    \x\.\y

into the line

    \x\y

Here are several solutions; verify that each works as advertised.

    s/\\\.//
    s/x../x/
    s/..y/y/

A couple of miscellaneous notes about backslashes and special characters. First, you can use any character to delimit the pieces of an s command: there is nothing sacred about slashes. (But you must use slashes for context searching.) For instance, in a line that contains a lot of slashes already, like

    //exec //sys.fort.go // etc...

you could use a colon as the delimiter; to delete all the slashes, type

    s:/::g

Second, if # and @ are your character erase and line kill characters, you have to type \# and \@; this is true whether you’re talking to ed or any other program.

When you are adding text with a or i or c, backslash is not special, and you should only put in one backslash for each one you really want.

The Dollar Sign ‘$’

The next metacharacter, the ‘$’, stands for ‘the end of the line’. As its most obvious use, suppose you have the line

    Now is the

and you wish to add the word ‘time’ to the end. Use the $ like this:

    s/$/ time/

to get

    Now is the time

Notice that a space is needed before ‘time’ in the substitute command, or you will get

    Now is thetime

As another example, replace the second comma in the following line with a period without altering the first:

    Now is the time, for all good men,

The command needed is

    s/,$/./
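The effect of the trailing ‘$’ in this command can be checked with a small sketch. It is written in Python, with the re module standing in for ed’s patterns (an assumption of this example, not part of the paper), and count=1 imitating an s command without a trailing g.

```python
import re

line = "Now is the time, for all good men,"

# Anchored: ',$' can only match a comma that is the last character.
anchored = re.sub(r",$", ".", line, count=1)

# Unanchored: the first comma on the line is the one that changes.
unanchored = re.sub(r",", ".", line, count=1)

print(anchored)
print(unanchored)
```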
The $ sign here provides context to make specific which comma we mean. Without it, of course, the s command would operate on the first comma to produce

    Now is the time. for all good men,

As another example, to convert

    Now is the time.

into

    Now is the time?

as we did earlier, we can use

    s/.$/?/

Like ‘.’, the ‘$’ has multiple meanings depending on context. In the line

    $s/$/$/

the first ‘$’ refers to the last line of the file, the second refers to the end of that line, and the third is a literal dollar sign, to be added to that line.

The Circumflex ‘^’

The circumflex (or hat or caret) ‘^’ stands for the beginning of the line. For example, suppose you are looking for a line that begins with ‘the’. If you simply say

    /the/

you will in all likelihood find several lines that contain ‘the’ in the middle before arriving at the one you want. But with

    /^the/

you narrow the context, and thus arrive at the desired one more easily.

The other use of ‘^’ is of course to enable you to insert something at the beginning of a line:

    s/^/␣/

places a space at the beginning of the current line.

Metacharacters can be combined. To search for a line that contains only the characters

    .PP

you can use the command

    /^\.PP$/

The Star ‘*’

Suppose you have a line that looks like this:

    text x                y text

where text stands for lots of text, and there are some indeterminate number of spaces between the x and the y. Suppose the job is to replace all the spaces between x and y by a single space. The line is too long to retype, and there are too many spaces to count. What now?

This is where the metacharacter ‘*’ comes in handy. A character followed by a star stands for as many consecutive occurrences of that character as possible. To refer to all the spaces at once, say

    s/x␣*y/x␣y/

The construction ‘␣*’ means ‘as many spaces as possible’. Thus ‘x␣*y’ means ‘an x, as many spaces as possible, then a y’.

The star can be used with any character, not just space. If the original example was instead

    text x--------y text

then all ‘-’ signs can be replaced by a single space with the command

    s/x-*y/x␣y/

Finally, suppose that the line was

    text x..................y text

Can you see what trap lies in wait for the unwary? If you blindly type

    s/x.*y/x␣y/

what will happen? The answer, naturally, is that it depends. If there are no other x’s or y’s on the line, then everything works, but it’s blind luck, not good management. Remember that ‘.’ matches any single character? Then ‘.*’ matches as many single characters as possible, and unless you’re careful, it can eat up a lot more of the line than you expected. If the line was, for example, like this:

    text x text x................y text y text

then saying

    s/x.*y/x␣y/

will take everything from the first ‘x’ to the last ‘y’, which, in this example, is undoubtedly more than you wanted.

The solution, of course, is to turn off the special meaning of ‘.’ with ‘\.’:

    s/x\.*y/x␣y/

Now everything works, for ‘\.*’ means ‘as many periods as possible’.

There are times when the pattern ‘.*’ is exactly what you want. For example, to change

    Now is the time for all good men ....

into

    Now is the time.

use ‘.*’ to eat up everything after the ‘for’:

    s/␣for.*/./

There are a couple of additional pitfalls associated with ‘*’ that you should be aware of. Most notable
is the fact that ‘as many as possible’ means zero or more. The fact that zero is a legitimate possibility is sometimes rather surprising. For example, if our line contained

    text xy text x                y text

and we said

    s/x␣*y/x␣y/

the first ‘xy’ matches this pattern, for it consists of an ‘x’, zero spaces, and a ‘y’. The result is that the substitute acts on the first ‘xy’, and does not touch the later one that actually contains some intervening spaces.

The way around this, if it matters, is to specify a pattern like

    /x␣␣*y/

which says ‘an x, a space, then as many more spaces as possible, then a y’, in other words, one or more spaces.

The other startling behavior of ‘*’ is again related to the fact that zero is a legitimate number of occurrences of something followed by a star. The command

    s/x*/y/g

when applied to the line

    abcdef

produces

    yaybycydyeyfy

which is almost certainly not what was intended. The reason for this behavior is that zero is a legal number of matches, and there are no x’s at the beginning of the line (so that gets converted into a ‘y’), nor between the ‘a’ and the ‘b’ (so that gets converted into a ‘y’), nor ... and so on. Make sure you really want zero matches; if not, in this case write

    s/xx*/y/g

‘xx*’ is one or more x’s.

The Brackets ‘[ ]’

Suppose that you want to delete any numbers that appear at the beginning of all lines of a file. You might first think of trying a series of commands like

    1,$s/^1*//
    1,$s/^2*//
    1,$s/^3*//

and so on, but this is clearly going to take forever if the numbers are at all long. Unless you want to repeat the commands over and over until finally all numbers are gone, you must get all the digits on one pass. This is the purpose of the brackets [ and ].

The construction

    [0123456789]

matches any single digit; the whole thing is called a ‘character class’. With a character class, the job is easy. The pattern ‘[0123456789]*’ matches zero or more digits (an entire number), so

    1,$s/^[0123456789]*//

deletes all digits from the beginning of all lines.

Any characters can appear within a character class, and just to confuse the issue there are essentially no special characters inside the brackets; even the backslash doesn’t have a special meaning. To search for special characters, for example, you can say

    /[.\$^[]/

Within [...], the ‘[’ is not special. To get a ‘]’ into a character class, make it the first character.

It’s a nuisance to have to spell out the digits, so you can abbreviate them as [0-9]; similarly, [a-z] stands for the lower case letters, and [A-Z] for upper case.

As a final frill on character classes, you can specify a class that means ‘none of the following characters’. This is done by beginning the class with a ‘^’:

    [^0-9]

stands for ‘any character except a digit’. Thus you might find the first line that doesn’t begin with a tab or space by a search like

    /^[^(space)(tab)]/

Within a character class, the circumflex has a special meaning only if it occurs at the beginning. Just to convince yourself, verify that

    /^[^^]/

finds a line that doesn’t begin with a circumflex.

The Ampersand ‘&’

The ampersand ‘&’ is used primarily to save typing. Suppose you have the line

    Now is the time

and you want to make it

    Now is the best time

Of course you can always say

    s/the/the best/

but it seems silly to have to repeat the ‘the’. The ‘&’ is used to eliminate the repetition. On the right side of a substitute, the ampersand means ‘whatever was just matched’, so you can say

    s/the/& best/
and the ‘&’ will stand for ‘the’. Of course this isn’t much of a saving if the thing matched is just ‘the’, but if it is something truly long or awful, or if it is something like ‘.*’ which matches a lot of text, you can save some tedious typing. There is also much less chance of making a typing error in the replacement text. For example, to parenthesize a line, regardless of its length,

    s/.*/(&)/

The ampersand can occur more than once on the right side:

    s/the/& best and & worst/

makes

    Now is the best and the worst time

and

    s/.*/&? &!!/

converts the original line into

    Now is the time? Now is the time!!

To get a literal ampersand, naturally the backslash is used to turn off the special meaning:

    s/ampersand/\&/

converts the word into the symbol. Notice that ‘&’ is not special on the left side of a substitute, only on the right side.

Substituting Newlines

ed provides a facility for splitting a single line into two or more shorter lines by ‘substituting in a newline’. As the simplest example, suppose a line has gotten unmanageably long because of editing (or merely because it was unwisely typed). If it looks like

    text xy text

you can break it between the ‘x’ and the ‘y’ like this:

    s/xy/x\
    y/

This is actually a single command, although it is typed on two lines. Bearing in mind that ‘\’ turns off special meanings, it seems relatively intuitive that a ‘\’ at the end of a line would make the newline there no longer special.

You can in fact make a single line into several lines with this same mechanism. As a large example, consider underlining the word ‘very’ in a long line by splitting ‘very’ onto a separate line, and preceding it by the roff or nroff formatting command ‘.ul’.

    text a very big text

The command

    s/␣very␣/\
    .ul\
    very\
    /

converts the line into four shorter lines, preceding the word ‘very’ by the line ‘.ul’, and eliminating the spaces around the ‘very’, all at the same time.

When a newline is substituted in, dot is left pointing at the last line created.

Joining Lines

Lines may also be joined together, but this is done with the j command instead of s. Given the lines

    Now is
    ␣the time

and supposing that dot is set to the first of them, then the command

    j

joins them together. No blanks are added, which is why we carefully showed a blank at the beginning of the second line.

All by itself, a j command joins line dot to line dot+1, but any contiguous set of lines can be joined. Just specify the starting and ending line numbers. For example,

    1,$jp

joins all the lines into one big one and prints it. (More on line numbers in Section 3.)

Rearranging a Line with \( ... \)

(This section should be skipped on first reading.) Recall that ‘&’ is a shorthand that stands for whatever was matched by the left side of an s command. In much the same way you can capture separate pieces of what was matched; the only difference is that you have to specify on the left side just what pieces you’re interested in.

Suppose, for instance, that you have a file of lines that consist of names in the form

    Smith, A. B.
    Jones, C.

and so on, and you want the initials to precede the name, as in

    A. B. Smith
    C. Jones

It is possible to do this with a series of editing commands, but it is tedious and error-prone. (It is instructive to figure out how it is done, though.)

The alternative is to ‘tag’ the pieces of the pattern (in this case, the last name, and the initials), and then rearrange the pieces. On the left side of a substitution, if part of the pattern is enclosed between \( and \), whatever matched that part is remembered, and available for use on the right
side. On the right side, the symbol ‘\1’ refers to whatever matched the first \(...\) pair, ‘\2’ to the second \(...\), and so on.

The command

    1,$s/^\([^,]*\), *\(.*\)/\2 \1/

although hard to read, does the job. The first \(...\) matches the last name, which is any string up to the comma; this is referred to on the right side with ‘\1’. The second \(...\) is whatever follows the comma and any spaces, and is referred to as ‘\2’.

Of course, with any editing sequence this complicated, it’s foolhardy to simply run it and hope. The global commands g and v discussed in section 4 provide a way for you to print exactly those lines which were affected by the substitute command, and thus verify that it did what you wanted in all cases.

3. LINE ADDRESSING IN THE EDITOR

The next general area we will discuss is that of line addressing in ed, that is, how you specify what lines are to be affected by editing commands. We have already used constructions like

    1,$s/x/y/

to specify a change on all lines. And most users are long since familiar with using a single newline (or return) to print the next line, and with

    /thing/

to find a line that contains ‘thing’. Less familiar, surprisingly enough, is the use of

    ?thing?

to scan backwards for the previous occurrence of ‘thing’. This is especially handy when you realize that the thing you want to operate on is back up the page from where you are currently editing.

The slash and question mark are the only characters you can use to delimit a context search, though you can use essentially any character in a substitute command.

Address Arithmetic

The next step is to combine the line numbers like ‘.’, ‘$’, ‘/.../’ and ‘?...?’ with ‘+’ and ‘-’. Thus

    $-1

is a command to print the next to last line of the current file (that is, one line before line ‘$’). For example, to recall how far you got in a previous editing session,

    $-5,$p

prints the last six lines. (Be sure you understand why it’s six, not five.) If there aren’t six, of course, you’ll get an error message.

As another example,

    .-3,.+3p

prints from three lines before where you are now (at line dot) to three lines after, thus giving you a bit of context. By the way, the ‘+’ can be omitted:

    .-3,.3p

is absolutely identical in meaning.

Another area in which you can save typing effort in specifying lines is to use ‘-’ and ‘+’ as line numbers by themselves.

    -

by itself is a command to move back up one line in the file. In fact, you can string several minus signs together to move back up that many lines:

    ---

moves up three lines, as does ‘-3’. Thus

    -3,+3p

is also identical to the examples above.

Since ‘-’ is shorter than ‘.-1’, constructions like

    -,.s/bad/good/

are useful. This changes ‘bad’ to ‘good’ on the previous line and on the current line.

‘+’ and ‘-’ can be used in combination with searches using ‘/.../’ and ‘?...?’, and with ‘$’. The search

    /thing/--

finds the line containing ‘thing’, and positions you two lines before it.

Repeated Searches

Suppose you ask for the search

    /horrible thing/

and when the line is printed you discover that it isn’t the horrible thing that you wanted, so it is necessary to repeat the search again. You don’t have to re-type the search, for the construction

    //

is a shorthand for ‘the previous thing that was searched for’, whatever it was. This can be repeated as many times as necessary. You can also go backwards:

    ??

searches for the same thing, but in the reverse direction.

Not only can you repeat the search, but you can use ‘//’ as the left side of a substitute command, to mean ‘the most recent pattern’.

    /horrible thing/
     .... ed prints line with ‘horrible thing’ ...
    s//good/p

To go backwards and change a line, say

    ??s//good/

Of course, you can still use the ‘&’ on the right hand side of a substitute to stand for whatever got matched:

    //s//& &/p

finds the next occurrence of whatever you searched for last, replaces it by two copies of itself, then prints the line just to verify that it worked.

Default Line Numbers and the Value of Dot

One of the most effective ways to speed up your editing is always to know what lines will be affected by a command if you don’t specify the lines it is to act on, and on what line you will be positioned (i.e., the value of dot) when a command finishes. If you can edit without specifying unnecessary line numbers, you can save a lot of typing.

As the most obvious example, if you issue a search command like

    /thing/

you are left pointing at the next line that contains ‘thing’. Then no address is required with commands like s to make a substitution on that line, or p to print it, or l to list it, or d to delete it, or a to append text after it, or c to change it, or i to insert text before it.

What happens if there was no ‘thing’? Then you are left right where you were: dot is unchanged. This is also true if you were sitting on the only ‘thing’ when you issued the command. The same rules hold for searches that use ‘?...?’; the only difference is the direction in which you search.

The delete command d leaves dot pointing at the line that followed the last deleted line. When line ‘$’ gets deleted, however, dot points at the new line ‘$’.

The line-changing commands a, c and i by default all affect the current line: if you give no line number with them, a appends text after the current line, c changes the current line, and i inserts text before the current line.

a, c, and i behave identically in one respect: when you stop appending, changing or inserting, dot points at the last line entered. This is exactly what you want for typing and editing on the fly. For example, you can say

    a
    ... text ...
    ... botch ...            (minor error)
    .
    s/botch/correct/         (fix botched line)
    a
    ... more text ...

without specifying any line number for the substitute command or for the second append command. Or you can say

    a
    ... text ...
    ... horrible botch ...   (major error)
    .
    c                        (replace entire line)
    ... fixed up line ...

You should experiment to determine what happens if you add no lines with a, c or i.

The r command will read a file into the text being edited, either at the end if you give no address, or after the specified line if you do. In either case, dot points at the last line read in. Remember that you can even say 0r to read a file in at the beginning of the text. (You can also say 0a or 1i to start adding text at the beginning.)

The w command writes out the entire file. If you precede the command by one line number, that line is written, while if you precede it by two line numbers, that range of lines is written. The w command does not change dot: the current line remains the same, regardless of what lines are written. This is true even if you say something like

    /^\.AB/,/^\.AE/w abstract

which involves a context search.

Since the w command is so easy to use, you should save what you are editing regularly as you go along just in case the system crashes, or in case you do something foolish, like clobbering what you’re editing.

The least intuitive behavior, in a sense, is that of the s command. The rule is simple: you are left sitting on the last line that got changed. If there were no changes, then dot is unchanged.

To illustrate, suppose that there are three lines in the buffer, and you are sitting on the middle one:

    x1
    x2
    x3

Then the command

    -,+s/x/y/p

prints the third line, which is the last one changed. But if the three lines had been

    x1
    y2
    y3

and the same command had been issued while dot pointed at the second line, then the result would be to change and print only the first line, and that is where dot would be set.
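The rule for where dot ends up after an s command over a range of lines can be written down as a short simulation. The sketch below is in Python and models only the rule as stated in the text; it is not how ed is implemented. The function substitute_range and its parameters are hypothetical names introduced for this example, and Python’s re module stands in for ed’s patterns.

```python
import re


def substitute_range(buffer, first, last, pattern, replacement, dot):
    """Apply a substitution to lines first..last (1-based, inclusive).

    Returns (new_buffer, new_dot). Following the rule in the text, dot
    ends up on the last line where a substitution was made, or stays
    where it was if no line in the range matched.
    """
    new_buffer = list(buffer)
    new_dot = dot
    for number in range(first, last + 1):
        # count=1 mirrors an s command without a trailing g.
        new_line, hits = re.subn(pattern, replacement,
                                 new_buffer[number - 1], count=1)
        if hits:
            new_buffer[number - 1] = new_line
            new_dot = number
    return new_buffer, new_dot


# Dot is on line 2, so the address pair -,+ covers lines 1 through 3.
dot = 2
first_case = substitute_range(["x1", "x2", "x3"], dot - 1, dot + 1, "x", "y", dot)
second_case = substitute_range(["x1", "y2", "y3"], dot - 1, dot + 1, "x", "y", dot)

print(first_case)
print(second_case)
```

In the first case every line changes and dot lands on line 3; in the second only line 1 matches, so dot moves back to line 1, as in the two examples above.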
Semicolon ‘;’

Searches with ‘/.../’ and ‘?...?’ start at the current line and move forward or backward respectively until they either find the pattern or get back to the current line. Sometimes this is not what is wanted. Suppose, for example, that the buffer contains lines like this:

    .
    .
    .
    ab
    .
    .
    .
    bc
    .
    .

Starting at line 1, one would expect that the command

    /a/,/b/p

prints all the lines from the ‘ab’ to the ‘bc’ inclusive. Actually this is not what happens. Both searches (for ‘a’ and for ‘b’) start from the same point, and thus they both find the line that contains ‘ab’. The result is to print a single line. Worse, if there had been a line with a ‘b’ in it before the ‘ab’ line, then the print command would be in error, since the second line number would be less than the first, and it is illegal to try to print lines in reverse order.

This is because the comma separator for line numbers doesn’t set dot as each address is processed; each search starts from the same place. In ed, the semicolon ‘;’ can be used just like comma, with the single difference that use of a semicolon forces dot to be set at that point as the line numbers are being evaluated. In effect, the semicolon ‘moves’ dot. Thus in our example above, the command

    /a/;/b/p

prints the range of lines from ‘ab’ to ‘bc’, because after the ‘a’ is found, dot is set to that line, and then ‘b’ is searched for, starting beyond that line.

This property is most often useful in a very simple situation. Suppose you want to find the second occurrence of ‘thing’. You could say

    /thing/
    //

but this prints the first occurrence as well as the second, and is a nuisance when you know very well that it is only the second one you’re interested in. The solution is to say

    /thing/;//

This says to find the first occurrence of ‘thing’, set dot to that line, then find the second and print only that.

Closely related is searching for the second previous occurrence of something, as in

    ?something?;??

Printing the third or fourth or ... in either direction is left as an exercise.

Finally, bear in mind that if you want to find the first occurrence of something in a file, starting at an arbitrary place within the file, it is not sufficient to say

    1;/thing/

because this fails if ‘thing’ occurs on line 1. But it is possible to say

    0;/thing/

(one of the few places where 0 is a legal line number), for this starts the search at line 1.

Interrupting the Editor

As a final note on what dot gets set to, you should be aware that if you hit the interrupt or delete or rubout or break key while ed is doing a command, things are put back together again and your state is restored as much as possible to what it was before the command began. Naturally, some changes are irrevocable: if you are reading or writing a file or making substitutions or deleting lines, these will be stopped in some clean but unpredictable state in the middle (which is why it is not usually wise to stop them). Dot may or may not be changed.

Printing is more clear cut. Dot is not changed until the printing is done. Thus if you print until you see an interesting line, then hit delete, you are not sitting on that line or even near it. Dot is left where it was when the p command was started.

4. GLOBAL COMMANDS

The global commands g and v are used to perform one or more editing commands on all lines that either contain (g) or don’t contain (v) a specified pattern.

As the simplest example, the command

    g/UNIX/p

prints all lines that contain the word ‘UNIX’. The pattern that goes between the slashes can be anything that could be used in a line search or in a substitute command; exactly the same rules and limitations apply.

As another example, then,

    g/^\./p

prints all the formatting commands in a file (lines that begin with ‘.’).

The v command is identical to g, except that it operates on those lines that do not contain an occurrence of the pattern. (Don’t look too hard for
mnemonic significance to the letter ‘v’.) So

    v/^\./p

prints all the lines that don’t begin with ‘.’, that is, the actual text lines.

The command that follows g or v can be anything:

    g/^\./d

deletes all lines that begin with ‘.’, and

    g/^$/d

deletes all empty lines.

Probably the most useful command that can follow a global is the substitute command, for this can be used to make a change and print each affected line for verification. For example, we could change the word ‘Unix’ to ‘UNIX’ everywhere, and verify that it really worked, with

    g/Unix/s//UNIX/gp

Notice that we used ‘//’ in the substitute command to mean ‘the previous pattern’, in this case, ‘Unix’. The p command is done on every line that matches the pattern, not just those on which a substitution took place.

The global command operates by making two passes over the file. On the first pass, all lines that match the pattern are marked. On the second pass, each marked line in turn is examined, dot is set to that line, and the command executed. This means that it is possible for the command that follows a g or v to use addresses, set dot, and so on, quite freely.

    g/^\.PP/+

prints the line that follows each ‘.PP’ command (the signal for a new paragraph in some formatting packages). Remember that ‘+’ means ‘one line past dot’. And

    g/topic/?^\.SH?1

searches for each line that contains ‘topic’, scans backwards until it finds a line that begins ‘.SH’ (a section heading) and prints the line that follows that, thus showing the section headings under which ‘topic’ is mentioned. Finally,

    g/^\.EQ/+,/^\.EN/-p

prints all the lines that lie between lines beginning with ‘.EQ’ and ‘.EN’ formatting commands.

The g and v commands can also be preceded by line numbers, in which case the lines searched are only those in the range specified.

Multi-line Global Commands

It is possible to do more than one command under the control of a global command, although the syntax for expressing the operation is not especially natural or pleasant. As an example, suppose the task is to change ‘x’ to ‘y’ and ‘a’ to ‘b’ on all lines that contain ‘thing’. Then

    g/thing/s/x/y/\
    s/a/b/

is sufficient. The ‘\’ signals the g command that the set of commands continues on the next line; it terminates on the first line that does not end with ‘\’. (As a minor blemish, you can’t use a substitute command to insert a newline within a g command.)

You should watch out for this problem: the command

    g/x/s//y/\
    s/a/b/

does not work as you expect. The remembered pattern is the last pattern that was actually executed, so sometimes it will be ‘x’ (as expected), and sometimes it will be ‘a’ (not expected). You must spell it out, like this:

    g/x/s/x/y/\
    s/a/b/

It is also possible to execute a, c and i commands under a global command; as with other multi-line constructions, all that is needed is to add a ‘\’ at the end of each line except the last. Thus to add a ‘.nf’ and ‘.sp’ command before each ‘.EQ’ line, type

    g/^\.EQ/i\
    .nf\
    .sp

There is no need for a final line containing a ‘.’ to terminate the i command, unless there are further commands being done under the global. On the other hand, it does no harm to put it in either.

5. CUT AND PASTE WITH UNIX COMMANDS

One editing area in which non-programmers seem not very confident is in what might be called ‘cut and paste’ operations: changing the name of a file, making a copy of a file somewhere else, moving a few lines from one place to another in a file, inserting one file in the middle of another, splitting a file into pieces, and splicing two or more files together.

Yet most of these operations are actually quite easy, if you keep your wits about you and go cautiously. The next several sections talk about cut and paste. We will begin with the UNIX commands for moving entire files around, then discuss ed commands for operating on pieces of files.

Changing the Name of a File

You have a file named ‘memo’ and you want it to be called ‘paper’ instead. How is it done?

The UNIX program that renames files is called mv (for ‘move’); it ‘moves’ the file from one name to another, like this:

    mv memo paper

That’s all there is to it: mv from the old name to the new name.

    mv oldname newname

Warning: if there is already a file around with the new name, its present contents will be silently clobbered by the information from the other file. The one exception is that you can’t move a file to itself:

    mv x x

is illegal.

Making a Copy of a File

Sometimes what you want is a copy of a file, an entirely fresh version. This might be because you want to work on a file, and yet save a copy in case something gets fouled up, or just because you’re paranoid.

In any case, the way to do it is with the cp command. (cp stands for ‘copy’; the system is big on short command names, which are appreciated by heavy users, but sometimes a strain for novices.) Suppose you have a file called ‘good’ and you want to save a copy before you make some dramatic editing changes. Choose a name (‘savegood’ might be acceptable), then type

    cp good savegood

This copies ‘good’ onto ‘savegood’, and you now have two identical copies of the file ‘good’. (If ‘savegood’ previously contained something, it gets overwritten.)

Now if you decide at some time that you want to get back to the original state of ‘good’, you can say

    mv savegood good

(if you’re not interested in ‘savegood’ any more), or

    cp savegood good

if you still want to retain a safe copy.

In summary, mv just renames a file; cp makes a duplicate copy. Both of them clobber the ‘target’ file if it already exists, so you had better be sure that’s what you want to do before you do it.

Removing a File

If you decide you are really done with a file forever, you can remove it with the rm command:

    rm savegood

throws away (irrevocably) the file called ‘savegood’.

Putting Two or More Files Together

The next step is the familiar one of collecting two or more files into one big one. This will be needed, for example, when the author of a paper decides that several sections need to be combined into one. There are several ways to do it, of which the cleanest, once you get used to it, is a program called cat. (Not all programs have two-letter names.) cat is short for ‘concatenate’, which is exactly what we want to do.

Suppose the job is to combine the files ‘file1’ and ‘file2’ into a single file called ‘bigfile’. If you say

    cat file

the contents of ‘file’ will get printed on your terminal. If you say

    cat file1 file2

the contents of ‘file1’ and then the contents of ‘file2’ will both be printed on your terminal, in that order. So cat combines the files, all right, but it’s not much help to print them on the terminal; we want them in ‘bigfile’.

Fortunately, there is a way. You can tell the system that instead of printing on your terminal, you want the same information put in a file. The way to do it is to add to the command line the character > and the name of the file where you want the output to go. Then you can say

    cat file1 file2 >bigfile

and the job is done. (As with cp and mv, you’re putting something into ‘bigfile’, and anything that was already there is destroyed.)

This ability to ‘capture’ the output of a program is one of the most useful aspects of the system. Fortunately it’s not limited to the cat program; you can use it with any program that prints on your terminal. We’ll see some more uses for it in a moment.

Naturally, you can combine several files, not just two:

    cat file1 file2 file3 ... >bigfile

collects a whole bunch.

Question: is there any difference between

    cp good savegood

and

    cat good >savegood
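
One way to explore both the concatenation described above and this question is to try the commands in a scratch directory and compare the results. The following is only a sketch: the file contents are invented for illustration, `savegood2` is a hypothetical second name used so the two copies can be compared, and `mktemp` and `cmp` are assumed to be available on your system.

```shell
# Work in a scratch directory so no real files get clobbered.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two small input files (contents are made up for illustration).
printf 'first section\n' > file1
printf 'second section\n' > file2

# Concatenate them; > replaces whatever bigfile held before.
cat file1 file2 > bigfile

# Copy 'good' two ways, then compare the two copies byte for byte.
printf 'some text worth keeping\n' > good
cp good savegood
cat good > savegood2
cmp savegood savegood2 && echo "copies are identical"
```

cmp reports only on file contents; any other attributes that cp may carry over are not compared here.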

Answer: for most purposes, no. You might reasonably ask why there are two programs in that case, since cat is obviously all you need. The answer is that cp will do some other things as well, which you can investigate for yourself by reading the manual. For now we’ll stick to simple usages.

Adding Something to the End of a File

Sometimes you want to add one file to the end of another. We have enough building blocks now that you can do it; in fact before reading further it would be valuable if you figured out how. To be specific, how would you use cp, mv and/or cat to add the file ‘good1’ to the end of the file ‘good’? You could try

    cat good good1 >temp
    mv temp good

which is probably most direct. You should also understand why

    cat good good1 >good

doesn’t work. (Don’t practice with a good ‘good’!)

The easy way is to use a variant of >, called >>. In fact, >> is identical to > except that instead of clobbering the old file, it simply tacks stuff on at the end. Thus you could say

    cat good1 >>good

and ‘good1’ is added to the end of ‘good’. (And if ‘good’ didn’t exist, this makes a copy of ‘good1’ called ‘good’.)

6. CUT AND PASTE WITH THE EDITOR

Now we move on to manipulating pieces of files: individual lines or groups of lines. This is another area where new users seem unsure of themselves.

Filenames

The first step is to ensure that you know the ed commands for reading and writing files. Of course you can’t go very far without knowing r and w. Equally useful, but less well known, is the ‘edit’ command e. Within ed, the command

    e newfile

says ‘I want to edit a new file called newfile, without leaving the editor.’ The e command discards whatever you’re currently working on and starts over on newfile. It’s exactly the same as if you had quit with the q command, then re-entered ed with a new file name, except that if you have a pattern remembered, then a command like // will still work.

If you enter ed with the command

    ed file

ed remembers the name of the file, and any subsequent e, r or w commands that don’t contain a filename will refer to this remembered file. Thus

    ed file1
    ... (editing) ...
    w                (writes back in file1)
    e file2          (edit new file, without leaving editor)
    ... (editing on file2) ...
    w                (writes back on file2)

(and so on) does a series of edits on various files without ever leaving ed and without typing the name of any file more than once. (As an aside, if you examine the sequence of commands here, you can see why many UNIX systems use e as a synonym for ed.)

You can find out the remembered file name at any time with the f command; just type f without a file name. You can also change the name of the remembered file name with f; a useful sequence is

    ed precious
    f junk
    ... (editing) ...

which gets a copy of a precious file, then uses f to guarantee that a careless w command won’t clobber the original.

Inserting One File into Another

Suppose you have a file called ‘memo’, and you want the file called ‘table’ to be inserted just after the reference to Table 1. That is, in ‘memo’ somewhere is a line that says

    Table 1 shows that ...

and the data contained in ‘table’ has to go there, probably so it will be formatted properly by nroff or troff. Now what?

This one is easy. Edit ‘memo’, find ‘Table 1’, and add the file ‘table’ right there:

    ed memo
    /Table 1/
    Table 1 shows that ...   [response from ed]
    .r table

The critical line is the last one. As we said earlier, the r command reads a file; here you asked for it to be read in right after line dot. An r command without any address adds lines at the end, so it is the same as $r.

Writing out Part of a File

The other side of the coin is writing out part of the document you’re editing. For example, maybe you want to split out into a separate file that table from the previous example, so it can be formatted and tested separately. Suppose that in the file being edited we have

    .TS
    ...[lots of stuff]
    .TE

which is the way a table is set up for the tbl program. To isolate the table in a separate file called ‘table’, first find the start of the table (the ‘.TS’ line), then write out the interesting part:

    /^\.TS/
    .TS                  [ed prints the line it found]
    .,/^\.TE/w table

and the job is done. If you are confident, you can do it all at once with

    /^\.TS/;/^\.TE/w table

The point is that the w command can write out a group of lines, instead of the whole file. In fact, you can write out a single line if you like; just give one line number instead of two. For example, if you have just typed a horribly complicated line and you know that it (or something like it) is going to be needed later, then save it; don’t re-type it. In the editor, say

    a
    ...lots of stuff...
    ...horrible line...
    .
    .w temp
    a
    ...more stuff...
    .
    .r temp
    a
    ...more stuff...
    .

This last example is worth studying, to be sure you appreciate what’s going on.

Moving Lines Around

Suppose you want to move a paragraph from its present position in a paper to the end. How would you do it? As a concrete example, suppose each paragraph in the paper begins with the formatting command ‘.PP’. Think about it and write down the details before reading on.

The brute force way (not necessarily bad) is to write the paragraph onto a temporary file, delete it from its current position, then read in the temporary file at the end. Assuming that you are sitting on the ‘.PP’ command that begins the paragraph, this is the sequence of commands:

    .,/^\.PP/-w temp
    .,//-d
    $r temp

That is, from where you are now (‘.’) until one line before the next ‘.PP’ (‘/^\.PP/-’) write onto ‘temp’. Then delete the same lines. Finally, read ‘temp’ at the end.

As we said, that’s the brute force way. The easier way (often) is to use the move command m that ed provides; it lets you do the whole set of operations at one crack, without any temporary file.

The m command is like many other ed commands in that it takes up to two line numbers in front that tell what lines are to be affected. It is also followed by a line number that tells where the lines are to go. Thus

    line1, line2 m line3

says to move all the lines between ‘line1’ and ‘line2’ after ‘line3’. Naturally, any of ‘line1’ etc., can be patterns between slashes, $ signs, or other ways to specify lines.

Suppose again that you’re sitting at the first line of the paragraph. Then you can say

    .,/^\.PP/-m$

That’s all.

As another example of a frequent operation, you can reverse the order of two adjacent lines by moving the first one to after the second. Suppose that you are positioned at the first. Then

    m+

does it. It says to move line dot to after one line after line dot. If you are positioned on the second line,

    m--

does the interchange.

As you can see, the m command is more succinct and direct than writing, deleting and re-reading. When is brute force better anyway? This is a matter of personal taste; do what you have most confidence in. The main difficulty with the m command is that if you use patterns to specify both the lines you are moving and the target, you have to take care that you specify them properly, or you may well not move the lines you thought you did. The result of a botched m command can be a ghastly mess. Doing the job a step at a time makes it easier for you to verify at each step that you accomplished what you wanted to. It’s also a good idea to issue a w command before doing anything complicated; then if you goof, it’s easy to back up to where you were.

Marks

ed provides a facility for marking a line with a particular name so you can later reference it by name regardless of its actual line number. This can be handy for moving lines, and for keeping track of them as they move. The mark command is k; the command

    kx

marks the current line with the name ‘x’. If a line
number precedes the k, that line is marked. (The mark name must be a single lower case letter.) Now you can refer to the marked line with the address

    'x

Marks are most useful for moving things around. Find the first line of the block to be moved, and mark it with ka. Then find the last line and mark it with kb. Now position yourself at the place where the stuff is to go and say

    'a,'bm.

Bear in mind that only one line can have a particular mark name associated with it at any given time.

Copying Lines

We mentioned earlier the idea of saving a line that was hard to type or used often, so as to cut down on typing time. Of course this could be more than one line; then the saving is presumably even greater.

ed provides another command, called t (for ‘transfer’) for making a copy of a group of one or more lines at any point. This is often easier than writing and reading.

The t command is identical to the m command, except that instead of moving lines it simply duplicates them at the place you named. Thus

    1,$t$

duplicates the entire contents that you are editing. A more common use for t is for creating a series of lines that differ only slightly. For example, you can say

    a
    .......... x .........   (long line)
    .
    t.                       (make a copy)
    s/x/y/                   (change it a bit)
    t.                       (make third copy)
    s/y/z/                   (change it a bit)

and so on.

The Temporary Escape ‘!’

Sometimes it is convenient to be able to temporarily escape from the editor to do some other UNIX command, perhaps one of the file copy or move commands discussed in section 5, without leaving the editor. The ‘escape’ command ! provides a way to do this.

If you say

    !any UNIX command

your current editing state is suspended, and the UNIX command you asked for is executed. When the command finishes, ed will signal you by printing another !; at that point you can resume editing.

You can really do any UNIX command, including another ed. (This is quite common, in fact.) In this case, you can even do another !.

7. SUPPORTING TOOLS

There are several tools and techniques that go along with the editor, all of which are relatively easy once you know how ed works, because they are all based on the editor. In this section we will give some fairly cursory examples of these tools, more to indicate their existence than to provide a complete tutorial. More information on each can be found in [3].

Grep

Sometimes you want to find all occurrences of some word or pattern in a set of files, to edit them or perhaps just to verify their presence or absence. It may be possible to edit each file separately and look for the pattern of interest, but if there are many files this can get very tedious, and if the files are really big, it may be impossible because of limits in ed.

The program grep was invented to get around these limitations. The search patterns that we have described in the paper are often called ‘regular expressions’, and ‘grep’ stands for

    g/re/p

That describes exactly what grep does: it prints every line in a set of files that contains a particular pattern. Thus

    grep 'thing' file1 file2 file3 ...

finds ‘thing’ wherever it occurs in any of the files ‘file1’, ‘file2’, etc. grep also indicates the file in which the line was found, so you can later edit it if you like.

The pattern represented by ‘thing’ can be any pattern you can use in the editor, since grep and ed use exactly the same mechanism for pattern searching. It is wisest always to enclose the pattern in the single quotes '...' if it contains any non-alphabetic characters, since many such characters also mean something special to the UNIX command interpreter (the ‘shell’). If you don’t quote them, the command interpreter will try to interpret them before grep gets a chance.

There is also a way to find lines that don’t contain a pattern:

    grep -v 'thing' file1 file2 ...

finds all lines that don’t contain ‘thing’. The -v must occur in the position shown. Given grep and grep -v, it is possible to do things like selecting all lines that contain some combination of patterns. For example, to get all lines that contain ‘x’ but not ‘y’:

    grep x file... | grep -v y
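
The grep usages described above can be tried on a few throwaway files. This is a small sketch rather than a complete tour: the file names `sample`, `file1` and `file2` and their contents are invented for the example, and `mktemp` is assumed to be available.

```shell
# Work in a scratch directory with a few made-up files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'box\nboxy\nyarn\nplain\n' > sample
printf '.PP\ntext with thing\n' > file1
printf 'no match here\nanother thing\n' > file2

# Lines that contain 'x' but not 'y': only "box" survives both filters.
grep 'x' sample | grep -v 'y'

# With several files, each matching line is prefixed by its file name.
grep 'thing' file1 file2

# Quoting keeps the shell from interpreting ^ and \ before grep sees them;
# this prints the lines of file1 that do not begin with '.'.
grep -v '^\.' file1
```

The first command prints `box`; the second prints `file1:text with thing` and `file2:another thing`; the third prints `text with thing`.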

(The notation | is a ‘pipe’, which causes the output of the first command to be used as input to the second command; see [2].)

Editing Scripts

If a fairly complicated set of editing operations is to be done on a whole set of files, the easiest thing to do is to make up a ‘script’, i.e., a file that contains the operations you want to perform, then apply this script to each file in turn.

For example, suppose you want to change every ‘Unix’ to ‘UNIX’ and every ‘Gcos’ to ‘GCOS’ in a large number of files. Then put into the file ‘script’ the lines

    g/Unix/s//UNIX/g
    g/Gcos/s//GCOS/g
    w
    q

Now you can say

    ed file1 <script
    ed file2 <script
    ...

This causes ed to take its commands from the prepared script. Notice that the whole job has to be planned in advance.

And of course by using the UNIX command interpreter, you can cycle through a set of files automatically, with varying degrees of ease.

Sed

sed (‘stream editor’) is a version of the editor with restricted capabilities but which is capable of processing unlimited amounts of input. Basically sed copies its input to its output, applying one or more editing commands to each line of input.

As an example, suppose that we want to do the ‘Unix’ to ‘UNIX’ part of the example given above, but without rewriting the files. Then the command

    sed 's/Unix/UNIX/g' file1 file2 ...

applies the command ‘s/Unix/UNIX/g’ to all lines from ‘file1’, ‘file2’, etc., and copies all lines to the output. The advantage of using sed in such a case is that it can be used with input too large for ed to handle. All the output can be collected in one place, either in a file or perhaps piped into another program.

If the editing transformation is so complicated that more than one editing command is needed, commands can be supplied from a file, or on the command line, with a slightly more complex syntax. To take commands from a file, for example,

    sed -f cmdfile input-files...

sed has further capabilities, including conditional testing and branching, which we cannot go into here.

Acknowledgement

I am grateful to Ted Dolotta for his careful reading and valuable suggestions.

References

[1] Brian W. Kernighan, A Tutorial Introduction to the UNIX Text Editor, Bell Laboratories internal memorandum.
[2] Brian W. Kernighan, UNIX For Beginners, Bell Laboratories internal memorandum.
[3] Ken L. Thompson and Dennis M. Ritchie, The UNIX Programmer’s Manual. Bell Laboratories.
