.de P1
.KS
.DS
.ft CW
.ta 5n 10n 15n 20n 25n 30n 35n 40n 45n 50n 55n 60n 65n 70n 75n 80n
..
.de P2
.ft 1
.DE
.KE
..
.de CW
.lg 0
\%\&\\$3\f(CW\\$1\fP\&\\$2
.lg
..
.de WC
.lg 0
\%\&\\$3\f(CI\\$1\fP\&\\$2
.lg
..
.TL
A tutorial for the
.CW sam
.B
command language
.AU
Rob Pike
.AI
.MH
.AB
.CW sam
is an interactive text editor with a command language that makes heavy use
of regular expressions.
Although the language is syntactically similar to
.CW ed (1),
the details are interestingly different.
This tutorial introduces the command language, but does not discuss
the screen and mouse interface.
With apologies to those unfamiliar with the Ninth Edition Blit software,
it is assumed that the similarity of
.CW sam
to
.CW mux (9)
at this level makes
.CW sam 's
mouse language easy to learn.
.PP
The
.CW sam
command language applies identically to two environments:
when running
.CW sam
on an ordinary terminal
(\f2via\f1
.CW sam\ -d ),
and in the command window of a
.I downloaded
.CW sam ,
that is, one using the bitmap display and mouse.
.AE
.SH
Introduction
.PP
This tutorial describes the command language of
.CW sam ,
an interactive text editor that runs on Blits and
some computers with bitmap displays.
For most editing tasks, the mouse-based editing features
are sufficient, and they are easy to use and to learn.
.PP
The command language is often useful, however, particularly
when making global changes.
Unlike the commands in
.CW ed ,
which are necessary to make changes,
.CW sam
commands tend to be used
only for complicated or repetitive editing tasks.
It is in these more involved uses that
the differences between
.CW sam
and other text editors are most evident.
.PP
.CW sam 's
language makes it easy to do some things that other editors,
including programs like
.CW sed
and
.CW awk ,
do not handle gracefully, so this tutorial serves partly as a
lesson in
.CW sam 's
manner of manipulating text.
The examples below therefore concentrate entirely on the language,
assuming that facility with the use of the mouse in
.CW sam
is at worst easy to pick up.
In fact,
.CW sam
can be run without the mouse at all (not
.I downloaded ),
by specifying the
.CW -d
flag, and it is this domain that the tutorial
occupies; the command language in the two modes
is identical.
.PP
A word to the Unix adept:
although
.CW sam
is syntactically very similar to
.CW ed ,
it is fundamentally and deliberately different in design and detailed semantics.
You might use knowledge of
.CW ed
to predict how the substitute command works,
but you'd only be right if you had used some understanding of
.CW sam 's
workings to influence your prediction.
Be particularly careful about idioms.
Idioms form in curious nooks of languages and depend on
undependable peculiarities.
.CW ed
idioms simply don't work in
.CW sam :
.CW 1,$s/a/b/
makes one substitution in the whole file, not one per line.
.CW sam
has its own idioms.
Much of the purpose of this tutorial is to publish them
and make fluency in
.CW sam
a matter of learning, not cunning.
.PP
The tutorial depends on familiarity with regular expressions, although
some experience with a more traditional Unix editor may be helpful.
To aid readers familiar with
.CW ed ,
I have pointed out in square brackets [] some of
the relevant differences between
.CW ed
and
.CW sam .
Read these comments only if you wish
to understand the differences; the lesson is about
.CW sam ,
not
.CW sam
.I vs.
.CW ed .
Another typographic convention is that output appears in
.CW "this font,
while typed input appears as
.WC "slanty text.
.PP
Nomenclature:
.CW sam
keeps a copy of the text it is editing.
This copy is called a
.I file .
To avoid confusion, I have called the permanent storage on disc a
.I
Unix file.
.R
.SH
Text
.PP
To get started, we need some text to play with.
Any text will do; try something from
James Gosling's Emacs manual:
.P1
$ \f(CIsam -d
a
This manual is organized in a rather haphazard manner. The first
several sections were written hastily in an attempt to provide a
general introduction to the commands in Emacs and to try to show
the method in the madness that is the Emacs command structure.
\&.
.ft
.P2
.WC "sam -d
starts
.CW sam
running.
The
.CW a
command adds text until a line containing just a period, and sets the
.I
current text
.R
(also called
.I dot )
to what was typed \(em everything between the
.CW a
and the period.
.CW ed "" [
would leave dot set to only the last line.]
The
.CW p
command prints the current text:
.P1
.WC p
This manual is organized in a rather haphazard manner. The first
several sections were written hastily in an attempt to provide a
general introduction to the commands in Emacs and to try to show
the method in the madness that is the Emacs command structure.
.P2
[Again,
.CW ed
would print only the last line.]
The
.CW a
command adds its text
.I after
dot; the
.CW i
command is like
.CW a ,
but adds the text
.I before
dot.
.P1
.ft CI
i
Introduction
\&.
p
.ft
Introduction
.P2
There is also a
.CW c
command that changes (replaces) the current text,
and
.CW d
that deletes it; these are illustrated below.
.PP
To see all the text, we can specify what text to print;
for the moment, suffice it to say that
.WC 0,$
specifies the entire file.
.CW ed "" [
users would probably type
.WC 1,$ ,
which in practice is the same thing, but see below.]
.P1
.WC 0,$p
Introduction
This manual is organized in a rather haphazard manner. The first
several sections were written hastily in an attempt to provide a
general introduction to the commands in Emacs and to try to show
the method in the madness that is the Emacs command structure.
.P2
Except for the
.CW w
command described below,
.I all
commands,
including
.CW p ,
set dot to the text they touch.
Thus,
.CW a
and
.CW i
set dot to the new text,
.CW p
to the text printed, and so on.
Similarly, all commands
(except
.CW w )
by default operate on the current
text [unlike
.CW ed ,
for which some commands (such as
.CW g )
default to the entire file].
.PP
Things are not going to get very interesting until we can
set dot arbitrarily.
This is done by
.I addresses ,
which specify a piece of the file.
The address
.CW 1 ,
for example, sets dot to the first line of the file.
.P1
.WC 1p
Introduction
.WC c
.WC Preamble
.WC .
.P2
The
.CW c
command didn't need to specify dot; the
.CW p
left it on line one.
It's therefore easy to delete the first line utterly;
the last command left dot set to line one:
.P1
.WC d
.WC 1p
This manual is organized in a rather haphazard manner. The first
.P2
(Line numbers change
to reflect changes to the file.)
.PP
The address \f(CW/\f2text\f(CW/\f1
sets dot to the first appearance of
.I text ,
after dot.
.CW ed "" [
matches the first line containing
.I text .]
If
.I text
is not found, the search restarts at the beginning of the file
and continues until dot.
.P1
.WC /Emacs/p
Emacs
.P2
It's difficult to indicate typographically, but in this example no newline appears
after
.CW Emacs :
the text to be printed is the string
.CW Emacs ', `
exactly.
(The final
.CW p
may be left off \(em it is the default command.
When downloaded, however, the default is instead to select the text,
to highlight it,
and to make it visible by moving the window on the file if necessary.
Thus,
.CW /Emacs/
indicates on the display the next occurrence of the text.)
.PP
Imagine we wanted to change the word
.CW haphazard
to
.CW thoughtless .
Obviously, what's needed is another
.CW c
command, but the method used so far to insert text includes a newline.
The syntax for including text without newlines is to surround the
text with slashes (which is the same as the syntax for
text searches, but what is going on should be clear from context).
The text must appear immediately after the
.CW c
(or
.CW a
or
.CW i ).
Given this, it is easy to make the required change:
.P1
.WC /haphazard/c/thoughtless/
.WC 1p
This manual is organized in a rather thoughtless manner. The first
.P2
[Changes can always be done with a
.CW c
command, even if the text is smaller than a line].
You'll find that this way of providing text to commands is much
more common than is the multiple-lines syntax.
If you want to include a slash
.CW /
in the text, just precede it with a backslash
.CW \e ,
and use a backslash to protect a backslash itself.
.P1
.WC /Emacs/c/Emacs\e\e360/
.WC 3p
general introduction to the commands in Emacs\e360 and to try to show
.P2
We could also make this particular change by
.P1
.WC /Emacs/a/\e\e360/
.P2
.PP
This is as good a place as any to introduce the
.CW u
command, which undoes the last command.
A second
.CW u
will undo the penultimate command, and so on.
.P1
.WC u
.WC 3p
general introduction to the commands in Emacs and to try to show
.WC u
.WC 1p
This manual is organized in a rather haphazard manner. The first
.P2
Undoing can only back up; there is no way to undo a previous
.CW u .
.SH
Addresses
.PP
We've seen the simplest forms of addresses, but there is more
to learn before we can get too much further.
An address selects a region in the file \(em a substring \(em
and therefore must define the beginning and the end of a region.
Thus, the address
.CW 13
selects from the beginning of line thirteen to the end of line thirteen, and
.CW /Emacs/
selects from the beginning of the word
.CW Emacs ' `
to the end.
.PP
Addresses may be combined with a comma:
.P1
13,15
.P2
selects lines thirteen through fifteen. The definition of the comma
operator is to select from the beginning of the left hand address (the
beginning of line 13) to the end of the right hand address (the
end of line 15).
.PP
A few special simple addresses come in handy:
.CW .
(a period) represents dot, the current text,
.CW 0
(line zero) selects the null string at the beginning of the file, and
.CW $
selects the null string at the end of the file
[not the last line of the file].
Therefore,
.P1
0,13
.P2
selects from the beginning of the file to the end of line thirteen,
.P1
\&.,$
.P2
selects from the beginning of the current text to the end of the file, and
.P1
0,$
.P2
selects the whole file [that is, a single string containing the whole file,
not a list of all the lines in the file].
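.PP
A short sketch may make these addresses concrete.
It is not taken from the session above, and it assumes the file
holds just the four lines of the sample text:
.P1
.WC 0,2p
This manual is organized in a rather haphazard manner. The first
several sections were written hastily in an attempt to provide a
.WC 3,$p
general introduction to the commands in Emacs and to try to show
the method in the madness that is the Emacs command structure.
.WC $p
.P2
The last command prints nothing, because
.CW $
is the null string at the end of the file, not its last line.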
.PP
These are all
.I absolute
addresses: they refer to specific places in the file.
.CW sam
also has relative addresses, which depend
on the value of dot,
and in fact we have already seen one form:
.CW /Emacs/
finds the first occurrence of
.CW Emacs
searching forwards from dot.
Which occurrence of
.CW Emacs
it finds depends on the value of dot.
What if you wanted the first occurrence
.I before
dot? Just precede the pattern with a minus sign, which reverses the direction
of the search:
.P1
-/Emacs/
.P2
In fact, the complete syntax for forward searching is
.P1
+/Emacs/
.P2
but the plus sign is the default, and in practice is rarely used.
Here is an example that includes it for clarity:
.P1
0+/Emacs/
.P2
selects the first occurrence of
.CW Emacs
in the file; read it as ``go to line 0, then search forwards for
.CW Emacs .''
Since the
.CW +
is optional, this can be written
.CW 0/Emacs/ .
Similarly,
.P1
$-/Emacs/
.P2
finds the last occurrence in the file, so
.P1
0/Emacs/,$-/Emacs/
.P2
selects the text from the first to last
.CW Emacs ,
inclusive.
Slightly more interesting:
.P1
/Emacs/+/Emacs/
.P2
(there is an implicit
.CW .+
at the beginning) selects the second
.CW Emacs
following dot.
.PP
Line numbers may also be relative.
.P1
-2
.P2
selects the second previous line, and
.P1
+5
.P2
selects the fifth following line (here the plus sign is obligatory).
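.PP
As a further sketch, again not from the original session and assuming
the four-line sample text, each relative address below is measured
from the dot left by the previous command:
.P1
.WC 2p
several sections were written hastily in an attempt to provide a
.WC +1p
general introduction to the commands in Emacs and to try to show
.WC -2p
This manual is organized in a rather haphazard manner. The first
.P2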
.PP
Since addresses may select (and dot may be) more than one line,
we need a definition of `previous' and `following:'
`previous' means
.I
before the beginning
.R
of dot, and `following'
means
.I
after the end
.R
of dot.
For example, if the file contains \f(CWA\f(CIAA\f(CWA\f1,
with dot set to the middle two
.CW A 's
(the slanting characters),
.CW -/A/
sets dot to the first
.CW A ,
and
.CW +/A/
sets dot to the last
.CW A .
Except under odd circumstances (such as when the only occurrence of the
text in the file is already the current text), the text selected by a
search will be disjoint from dot.
.PP
To select the
.CW "troff -ms
paragraph containing dot, however long it is, use
.P1
-/.PP/,/.PP/-1
.P2
which will include the
.CW .PP
that begins the paragraph, and exclude the one that ends it.
.PP
When typing relative line number addresses, the default number is
.CW 1 ,
so the above could be written slightly more simply:
.P1
-/.PP/,/.PP/-
.P2
.PP
What does the address
.CW +1-1
or the equivalent
.CW +-
mean? It looks like it does nothing, but recall that dot need not be a
complete line of text.
.CW +1
selects the line after the end of the current text, and
.CW -1
selects the line before the beginning. Therefore
.CW +1-1
selects the line before the line after the end of dot, that is,
the complete line containing the end of dot.
We can use this construction to expand a selection to include a complete line,
say the first line in the file containing
.CW Emacs :
.P1
.WC 0/Emacs/+-p
general introduction to the commands in Emacs and to try to show
.P2
The address
.CW +-
is an idiom.
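.PP
Because
.CW +-
yields an ordinary address, it can appear on either side of a comma.
The following sketch, which is not from the original session and assumes
the four-line sample text, selects from the line containing
.CW haphazard
through the line containing
.CW hastily :
.P1
.WC 0/haphazard/+-,0/hastily/+-p
This manual is organized in a rather haphazard manner. The first
several sections were written hastily in an attempt to provide a
.P2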
.SH
Loops
.PP
Above, we changed one occurrence of
.CW Emacs
to
.CW Emacs\e360 ,
but if the name of the editor is really changing, it would be useful
to change
.I all
instances of the name in a single command.
.CW sam
provides a command,
.CW x
(extract), for just that job.
The syntax is
\f(CWx/\f2pattern\f(CW/\f2command\f1.
For each occurrence of the pattern in the selected text,
.CW x
sets dot to the occurrence and runs command.
For example, to change
.CW Emacs
to
.CW vi ,
.P1
.WC 0,$x/Emacs/c/vi/
.WC 0,$p
This manual is organized in a rather haphazard manner. The first
several sections were written hastily in an attempt to provide a
general introduction to the commands in vi and to try to show
the method in the madness that is the vi command structure.
.P2
This
works by subdividing the current text
.CW 0,$ "" (
\(em the whole file) into appearances of its textual argument
.CW Emacs ), (
and then running the command that follows
.CW c/vi/ ) (
with dot set to the text.
We can read this example as, ``find all occurrences of
.CW Emacs
in the file, and for each one,
set the current text to the occurrence and run the command
.CW c/vi/ ,
which will replace the current text by
.CW vi. ''
[This command is somewhat similar to
.CW ed 's
.CW g
command. The differences will develop below, but note that the
default address, as always, is dot rather than the whole file.]
.PP
A single
.CW u
command is sufficient to undo an
.CW x
command, regardless of how many individual changes the
.CW x
makes.
.P1
.WC u
.WC 0,$p
This manual is organized in a rather haphazard manner. The first
several sections were written hastily in an attempt to provide a
general introduction to the commands in Emacs and to try to show
the method in the madness that is the Emacs command structure.
.P2
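.PP
The address in front of
.CW x
limits where it looks.
In this sketch, which is not from the original session and assumes
the four-line sample text, only line 3 is searched, so the
.CW Emacs
on line 4 is left alone; a final
.CW u
restores the file:
.P1
.WC 3x/Emacs/c/vi/
.WC 0,$p
This manual is organized in a rather haphazard manner. The first
several sections were written hastily in an attempt to provide a
general introduction to the commands in vi and to try to show
the method in the madness that is the Emacs command structure.
.WC u
.P2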
.PP
Of course,
.CW c
is not the only command
.CW x
can run. An
.CW a
command can be used to put proprietary markings on
.CW Emacs :
.P1
.WC 0,$x/Emacs/a/{TM}/
.WC /Emacs/+-p
general introduction to the commands in Emacs{TM} and to try to show
.P2
[There is no way to see the changes as they happen, as in
.CW ed 's
.CW g/Emacs/s//&{TM}/p ;
see the section on Multiple Changes, below.]
.PP
The
.CW p
command is also useful when driven by an
.CW x ,
but be careful that you say what you mean;
.P1
.WC 0,$x/Emacs/p
EmacsEmacs
.P2
since
.CW x
sets dot to the text in the slashes, printing only that text
is not going to be very
informative. But the command that
.CW x
runs can contain addresses. For example, if we want to print all
lines containing
.CW Emacs ,
just use
.CW +- :
.P1
.WC 0,$x/Emacs/+-p
general introduction to the commands in Emacs{TM} and to try to show
the method in the madness that is the Emacs{TM} command structure.
.P2
Finally, let's restore the state of the file with another
.CW x
command, and make use of a handy shorthand:
a comma in an address has its left side default to
.CW 0 ,
and its right side default to
.CW $ ,
so the easy-to-type address
.CW ,
refers to the whole file:
.P1
.WC ",x/Emacs/ /{TM}/d
.WC ,p
This manual is organized in a rather haphazard manner. The first
several sections were written hastily in an attempt to provide a
general introduction to the commands in Emacs and to try to show
the method in the madness that is the Emacs command structure.
.P2
Notice what this
.CW x
does: for each occurrence of
.CW Emacs ,
find the
.CW {TM}
that follows, and delete it.
.PP
The `text'
.CW sam
accepts
for searches in addresses and in
.CW x
commands is not simple text, but rather
.I regular\ expressions.
Unix has several distinct interpretations of regular expressions.
The form used by
.CW sam
is that of
.CW egrep (1),
including parentheses
.CW ()
for grouping and an `or' operator
.CW |
for matching strings in parallel.
.CW sam
makes two extensions:
although
.CW .
(the most overloaded character in Unix) matches any character
.I except
newline, the regular expression
.CW @
(think of it as a big dot) matches any character, even newlines;
and the character sequence
.CW \en
matches a newline character.
Replacement text, such as used in the
.CW a
and
.CW c
commands, is still plain text, but the sequence
.CW \en
represents newline in that context, too.
.PP
Here is an example. Say we wanted to double space the document, that is,
turn every newline into two newlines.
The following all do the job:
.P1
.WC ",x/\en/ a/\en/
.WC ",x/\en/ c/\en\en/
.WC ",x/$/ a/\en/
.WC ",x/^/ i/\en/
.P2
The last example is slightly different, because it puts a newline
.I before
each line; the other examples place it after.
The first two examples manipulate newlines directly
[something outside
.CW ed 's
ken]; the last two
use regular expressions:
.CW $
is the empty string at the end of a line, while
.CW ^
is the empty string at the beginning.
.PP
These solutions all have a possible drawback: if there is already a blank line
(that is, two consecutive newlines), they make it much larger (four
consecutive newlines).
A better method is to extend every group of newlines by one:
.P1
.WC ",x/\en+/ a/\en/
.P2
The regular expression operator
.CW +
means `one or more;'
.CW \en+
is identical to
.CW \en\en* .
Thus, this example
takes every sequence of newlines and adds another
to the end.
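.PP
The same pattern can be turned around.
As a sketch that is not part of the original text, replacing each run
of newlines by a single newline removes every blank line, including
any that were in the file before it was double spaced:
.P1
.WC ",x/\en+/ c/\en/
.P2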
.PP
A more common example is indenting a block of text by a tab stop.
The following all work,
although the first is arguably the cleanest (the blank text in slashes is a tab):
.P1
.WC ",x/^/a/ /
.WC ",x/^/c/ /
.WC ",x/.*\en/i/ /
.P2
The last example uses the pattern (idiom, really)
.CW .*\en
to match lines:
.CW .*
matches the longest possible string of non-newline characters.
Taking initial tabs away is just as easy:
.P1
.WC ",x/^ /d
.P2
In these examples I have specified an address (the whole file), but
in practice commands like these are more likely to be run without
an address, using the value of dot set by selecting text with the mouse.
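.PP
In the same spirit, here is a sketch, not from the original text,
that deletes trailing spaces from every line; the
.CW $
anchors each run of spaces to the end of its line:
.P1
.WC ",x/ +$/d
.P2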
.SH
Conditionals
.PP
The
.CW x
command is a looping construct:
for each match of a regular expression,
it extracts (sets dot to) the match and runs a command.
.CW sam
also has a conditional,
.CW g :
\f(CWg/\f2pattern\f(CW/\f2command\f1
runs the command if dot contains a match of the pattern
.I
without changing the value of dot.
.R
The inverse,
.CW v ,
runs the command if dot does
.I not
contain a match of the pattern.
(The letters
.CW g
and
.CW v
are historical and have no mnemonic significance. You might
think of
.CW g
as `guard.')
.CW ed "" [
users should read the above definitions very carefully; the
.CW g
command in
.CW sam
is fundamentally different from that in
.CW ed .]
Here is an example of the difference between
.CW x
and
.CW g :
.P1
,x/Emacs/c/vi/
.P2
changes each occurrence of the word
.CW Emacs
in the file to the word
.CW vi ,
but
.P1
,g/Emacs/c/vi/
.P2
changes the
.I "whole file
to
.CW vi
if there is the word
.CW Emacs
anywhere in the file.
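.PP
A smaller sketch shows that
.CW g
and
.CW v
test dot without moving it.
It is not from the original session and assumes the four-line sample text:
.P1
.WC 1g/manual/p
This manual is organized in a rather haphazard manner. The first
.WC 2g/manual/p
.WC 2v/manual/p
several sections were written hastily in an attempt to provide a
.P2
The first command prints all of line 1, not just the word
.CW manual ;
the second prints nothing, because line 2 does not contain the pattern.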
.PP
Neither of these commands is particularly interesting in isolation,
but they are valuable when combined with
.CW x
and with themselves.
.SH
Composition
.PP
One way to think about the
.CW x
command is that, given a selection (a value of dot)
it iterates through interesting subselections (values of dot within).
In other words, it takes a piece of text and cuts it into smaller pieces.
But the text that it cuts up may already be a piece cut by a previous
.CW x
command or selected by a
.CW g .
.CW sam 's
most interesting property is the ability to define a sequence of commands
to perform a particular task.\(dg
.FS
\(dg
The obvious analogy with shell pipelines is only partially valid,
because the individual
.CW sam
commands are all working on the same text; it is only how the text is
sliced up that is changing.
.FE
A simple example is to change all occurrences of
.CW Emacs
to
.CW emacs ;
certainly the command
.P1
.WC ",x/Emacs/ c/emacs/
.P2
will work, but we can use an
.CW x
command to save retyping most of the word
.CW Emacs :
.P1
.WC ",x/Emacs/ x/E/ c/e/
.P2
(Blanks can be used
to separate commands on a line to make them easier to read.)
What this command does is find all occurrences of
.CW Emacs
.CW ,x/Emacs/ ), (
and then
.I
with dot set to that text,
.R
find all occurrences of the letter
.CW E
.CW x/E/ ), (
and then
.I
with dot set to that text,
.R
run the command
.CW c/e/
to change the character to lower case.
Note that the address for the command \(em the whole file, specified by a comma
\(em is only given to the leftmost
piece of the command; the rest of the pieces have dot set for them by
the execution of the pieces to their left.
.PP
As another simple example, consider a problem
solved above: printing all lines in the file containing the word
.CW Emacs :
.P1
.WC ",x/.*\en/ g/Emacs/p
general introduction to the commands in Emacs and to try to show
the method in the madness that is the Emacs command structure.
.P2
This command says to break the file into lines
.CW ,x/.*\en/ ), (
and for each line that contains the string
.CW Emacs
.CW g/Emacs/ ), (
run the command
.CW p
with dot set to the line (not the match of
.CW Emacs ),
which prints the line.
To save typing, because
.CW .*\en
is a common pattern in
.CW x
commands,
if the
.CW x
is followed immediately by a space, the pattern
.CW .*\en
is assumed.
Therefore, the above could be written more succinctly:
.P1
.WC ",x g/Emacs/p
.P2
The solution we used before was
.P1
.WC ,x/Emacs/+-p
.P2
which runs the command
.CW +-p
with dot set to each match of
.CW Emacs
in the file (recall that the idiom
.CW +-p
prints the line containing the end of dot).
.PP
The two commands usually produce the same result
(the
.CW +-p
form will print a line twice if it contains
.CW Emacs
twice). Which is better?
.CW ,x/Emacs/+-p
is easier to type and will be much faster if the file is large and
there are few occurrences of the string, but it is really an odd special case.
.CW ",x/.*\en/ g/Emacs/p
is slower \(em it breaks each line out separately, then examines
it for a match \(em but is conceptually cleaner, and generalizes more easily.
For example, consider the following piece of the Emacs manual:
.P1
command name="append-to-file", key="[unbound]"
Takes the contents of the current buffer and appends it to the
named file. If the files doesn't exist, it will be created.
command name="apropos", key="ESC-?"
Prompts for a keyword and then prints a list of those commands
whose short description contains that keyword. For example,
if you forget which commands deal with windows, just type
"@b[ESC-?]@t[window]@b[ESC]".
\&\f2and so on\f(CW
.P2
This text consists of groups of non-empty lines, with a simple format
for the text within each group.
Imagine that we wanted to find the description of the `apropos'
command.
The problem is to break the file into individual descriptions,
and then to find the description of `apropos' and to print it.
The solution is straightforward:
.P1
.WC ,x/(.+\en)+/\ g/command\ name="apropos"/p
command name="apropos", key="ESC-?"
Prompts for a keyword and then prints a list of those commands
whose short description contains that keyword. For example,
if you forget which commands deal with windows, just type
"@b[ESC-?]@t[window]@b[ESC]".
.P2
The regular expression
.CW (.+\en)+
matches one or more lines with one or more characters each, that is,
the text between blank lines, so
.CW ,x/(.+\en)+/
extracts each description; then
.CW g/command\ name="apropos"/
selects the description for `apropos' and
.CW p
prints it.
.PP
Imagine that we had a C program containing the variable
.CW n ,
but we wanted to change it to
.CW num .
This command is a first cut:
.P1
.WC ",x/n/ c/num/
.P2
but is obviously flawed: it will change all
.CW n 's
in the file, not just the
.I identifier
.CW n .
A better solution is to use an
.CW x
command to extract the identifiers, and then use
.CW g
to find the
.CW n 's:
.P1
.WC ",x/[a-zA-Z_][a-zA-Z_0-9]*/ g/n/ v/../ c/num/
.P2
It looks awful, but it's fairly easy to understand when read
left to right.
A C identifier is an alphabetic or underscore followed by zero or more
alphanumerics or underscores, that is, matches of the regular expression
.CW [a-zA-Z_][a-zA-Z_0-9]* .
The
.CW g
command selects those identifiers containing
.CW n ,
and the
.CW v
is a trick: it rejects those identifiers containing more than one
character. Hence the
.CW c/num/
applies only to free-standing
.CW n 's.
.PP
There is still a problem here:
we don't want to change
.CW n 's
that are part of the character constant
.CW \en .
There is a command
.CW y ,
complementary to
.CW x ,
that is just what we need:
\f(CWy/\f2pattern\f(CW/\f2command\f1
runs the command on the pieces of text
.I between
matches of the pattern;
if
.CW x
selects,
.CW y
rejects.
Here is the final command:
.P1
.WC ",y/\e\en/ x/[a-zA-Z_][a-zA-Z_0-9]*/ g/n/ v/../ c/num/
.P2
The
.CW y/\e\en/
(with backslash doubled to make it a literal character)
removes the two-character sequence
.CW \en
from consideration, so the rest of the command will not touch it.
There is more we could do here; for example, another
.CW y
could be prefixed to protect comments in the code.
I won't elaborate the example any further, but you should have
an idea of the way in which the looping and conditional commands
in
.CW sam
may be composed to do interesting things.
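.PP
As one hedged sketch of such an extension, not from the original text, a
.CW y
that skips double-quoted strings can be prefixed in the same way.
The pattern \f(CW"[^"]*"\f1 is deliberately naive:
it assumes that a string sits on one line and contains no escaped quotes.
.P1
.ft CI
,y/"[^"]*"/ y/\e\en/ x/[a-zA-Z_][a-zA-Z_0-9]*/ g/n/ v/../ c/num/
.ft
.P2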
.SH
Grouping
.PP
There is another way to arrange commands.
By enclosing them in brace brackets
.CW {} ,
commands may be applied in parallel.
This example uses the
.CW =
command, which reports the line and character numbers of dot,
together with
.CW p ,
to report on appearances of
.CW Emacs
in our original file:
.P1
.WC ,p
This manual is organized in a rather haphazard manner. The first
several sections were written hastily in an attempt to provide a
general introduction to the commands in Emacs and to try to show
the method in the madness that is the Emacs command structure.
.ft CI
,x/Emacs/{
=
+-p
}
.ft
3; #171,#176
general introduction to the commands in Emacs and to try to show
4; #234,#239
the method in the madness that is the Emacs command structure.
.P2
(The number before the semicolon is the line number;
the numbers beginning with
.CW #
are character numbers.)
As a more interesting example, consider changing all occurrences of
.CW Emacs
to
.CW vi
and vice versa. We can type
.P1
.ft CI
,x/Emacs|vi/{
g/Emacs/ c/vi/
g/vi/ c/Emacs/
}
.ft
.P2
or even
.P1
.ft CI
,x/[a-zA-Z]+/{
g/Emacs/ v/....../ c/vi/
g/vi/ v/.../ c/Emacs/
}
.ft
.P2
to make sure we don't change strings embedded in words.
.SH
Multiple Changes
.PP
You might wonder why, once
.CW Emacs
has been changed to
.CW vi
in the above example,
the second command in the braces doesn't put it back again.
The reason is that the commands are run in parallel:
within any top-level
.CW sam
command, all changes to the file refer to the state of the file
before any of the changes in that command are made.
After all the changes have been determined, they are all applied
simultaneously.
.PP
This means, as mentioned, that commands within a compound
command see the state of the file before any of the changes apply.
This method of evaluation makes some things easier (such as the exchange of
.CW Emacs
and
.CW vi ),
and some things harder.
For instance, it is impossible to use a
.CW p
command to print the changes as they happen,
because they haven't happened when the
.CW p
is executed.
An indirect ramification is that changes must occur in forward
order through the file,
and must not overlap.
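.PP
A sketch of a compound command that obeys these rules
(not from the original session, and assuming the four-line sample text)
puts brackets around each
.CW Emacs :
.P1
.ft CI
,x/Emacs/{
i/[/
a/]/
}
3,4p
.ft
general introduction to the commands in [Emacs] and to try to show
the method in the madness that is the [Emacs] command structure.
.P2
The
.CW i
is written first because its change lies before that of the
.CW a
in the file; both commands see dot as it was before either insertion.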
.SH
Unix
.PP
.CW sam
has a few commands to connect to Unix processes.
The simplest is
.CW ! ,
which runs the command with input and output connected to the terminal.
.P1
.WC !date
Wed May 28 23:25:21 EDT 1986
!
.P2
(When downloaded, the input is connected to
.CW /dev/null
and only the first few lines of output are printed;
any overflow is stored in
.CW $HOME/sam.err .)
The final
.CW !
is a prompt to indicate when the command completes.
.PP
Slightly more interesting is
.CW > ,
which provides the current text as standard input to the Unix command:
.P1
.WC "1,2 >wc
2 22 131
!
.P2
The complement of
.CW >
is, naturally,
.CW < :
it replaces the current text with the standard output of the Unix command:
.P1
.WC "1 <date
!
.WC 1p
Wed May 28 23:26:44 EDT 1986
.P2
The last command is
.CW | ,
which is a combination of
.CW <
and
.CW > :
the current text is provided as standard input to the Unix command,
and the Unix command's standard output is collected and used to
replace the original text.
For example,
.P1
.WC ",| sort
.P2
runs
.CW sort (1)
on the file, sorting the lines of the text lexicographically.
Note that
.CW < ,
.CW >
and
.CW |
are
.CW sam
commands, not Unix shell operators.
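.PP
Because these commands take addresses like any other,
the null-string addresses are handy with
.CW < .
In this sketch, which is not from the original session, the output of
.CW date
is added at the end of the file rather than replacing any text, since
.CW $
selects the null string there:
.P1
.WC "$ <date
.P2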
.PP
The next example converts all appearances of
.CW Emacs
to upper case using
.CW tr (1):
.P1
.WC ",x/Emacs/ | tr a-z A-Z
.P2
.CW tr
is run once for each occurrence of
.CW Emacs .
Of course, you could do this example more efficiently with a simple
.CW c
command, but here's a trickier one:
given a Unix mail box as input,
convert all the
.CW Subject
headers to distinct fortunes:
.P1
.WC ",x/^Subject:.*\en/ x/[^:]*\en/ < /usr/games/fortune
.P2
(The regular expression
.CW [^:]
refers to any character
.I except
.CW :
and newline; the negation operator
.CW ^
excludes newline from the list of characters.)
Again,
.CW /usr/games/fortune
is run once for each
.CW Subject
line, so each
.CW Subject
line is changed to a different fortune.
.SH
A few other text commands
.PP
For completeness, I should mention three other commands that
manipulate text. The
.CW m
command moves the current text to after the text specified by the
(obligatory) address after the command.
Thus
.P1
.WC "/Emacs/+- m 0
.P2
moves the next line containing
.CW Emacs
to the beginning of the file.
Similarly,
.CW t
(another historic character) copies the text:
.P1
.WC "/Emacs/+- t 0
.P2
would make, at the beginning of the file, a copy of the next line
containing
.CW Emacs .
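.PP
Two more sketches, offered as illustrations rather than taken from
the original session:
.P1
.WC "1 m $
.WC "1 t 1
.P2
The first moves line 1 to the end of the file;
the second copies line 1 to just after itself, duplicating it.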
.PP
The third command is more interesting: it makes substitutions.
Its syntax is
\f(CWs/\f2pattern\f(CW/\f2replacement\f(CW/\f1.
Within the current text, it finds the first occurrence of
the pattern and replaces it by the replacement text,
leaving dot set to the entire address of the substitution.
.P1
.WC 1p
This manual is organized in a rather haphazard manner. The first
.WC s/haphazard/thoughtless/
.WC p
This manual is organized in a rather thoughtless manner. The first
.P2
Occurrences of the character
.CW &
in the replacement text stand for the text matching the pattern.
.P1
.WC s/T/"&&&&"/
.WC p
"TTTT"his manual is organized in a rather thoughtless manner. The first
.P2
There are two variants. The first is that a number may be specified
after the
.CW s ,
to indicate which occurrence of the pattern to substitute; the default
is the first.
.P1
.WC s2/is/was/
.WC p
"TTTT"his manual was organized in a rather thoughtless manner. The first
.P2
The second is that suffixing a
.CW g
(global) causes replacement of all occurrences, not just the first.
.P1
.WC s/[a-zA-Z]/x/g
.WC p
"xxxx"xxx xxxxxx xxx xxxxxxxxx xx x xxxxxx xxxxxxxxxxx xxxxxxx xxx xxxxx
.P2
Notice that in all these examples
dot is left
set to the entire line.
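.PP
The pattern may be any regular expression, and
.CW &
combines naturally with
.CW g .
Here is a small sketch, using a made-up line of text rather than
the manual page above, and assuming dot is already set to that line:
.P1
.WC p
Chapter 3 begins on page 41.
.WC s/[0-9]+/(&)/g
.WC p
Chapter (3) begins on page (41).
.P2
Each run of digits is replaced by itself, wrapped in parentheses.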
.PP
[The substitute command is vital to
.CW ed ,
because it is the only way to make changes within a line.
It is less valuable in
.CW sam ,
in which the concept of a line is much less important.
For example, many
.CW ed
substitution idioms are handled well by
.CW sam 's
basic commands. Consider the commands
.P1
s/good/bad/
s/good//
s/good/& bye/
.P2
which are equivalent in
.CW sam
to
.P1
/good/c/bad/
/good/d
/good/a/ bye/
.P2
and for which the context search is likely unnecessary because the desired
text is already dot.
Also, beware this
.CW ed
idiom:
.P1
1,$s/good/bad/
.P2
which changes the first
.CW good
on each line; the same command in
.CW sam
will only change the first one in the whole file.
The correct
.CW sam
version is
.P1
,x s/good/bad/
.P2
but what is more likely meant is
.P1
,x/good/ c/bad/
.P2
.CW sam
operates under different rules.]
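.PP
[If every
.CW good
in the file is to become
.CW bad ,
the
.CW g
suffix offers one more spelling, since the address
.CW ,
makes the whole file the current text:
.P1
,s/good/bad/g
.P2
This is only a sketch that follows from the rules already given;
the
.CW x
form is the one that extends to commands other than substitution.]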
.SH
Files
.PP
So far, we have only been working with a single file,
but
.CW sam
is a multi-file editor.
Only one file may be edited at a time, but
it is easy to change which file is the `current' file for editing.
To see how to do this, we need a
.CW sam
with a few files;
the easiest way to get one is to start it
with a list of Unix file names to edit.
.P1
$ \f(CIecho *.ms\f(CW
conquest.ms death.ms emacs.ms famine.ms slaughter.ms
$ \f(CIsam -d *.ms\f(CW
-. conquest.ms
.P2
(I'm sorry the Horsemen don't appear in liturgical order.)
The line printed by
.CW sam
is an indication that the Unix file
.CW conquest.ms
has been read, and is now the current file.
.CW sam
does not read the Unix file until
the associated
.CW sam
file becomes current.
.PP
The
.CW n
command prints the names of all the files:
.P1
.WC n
-. conquest.ms
- death.ms
- emacs.ms
- famine.ms
- slaughter.ms
.P2
This list is also available in the menu on mouse button 3.
The command
.CW f
tells the name of just the current file:
.P1
.WC f
-. conquest.ms
.P2
The characters to the left of the file name encode helpful information about
the file.
The minus sign becomes a plus sign if the file has a window open, and an
asterisk if more than one is open.
The period (another meaning of dot) identifies the current file.
The leading blank changes to an apostrophe if the file is different
from the contents of the associated Unix file, as far as
.CW sam
knows.
This becomes evident if we make a change.
.P1
.WC 1d
.WC f
\&'-. conquest.ms
.P2
If the file is restored by an undo command, the apostrophe disappears.
.P1
.WC u
.WC f
-. conquest.ms
.P2
The file name may be changed by providing a new name with the
.CW f
command:
.P1
.WC "f pestilence.ms
\&'-. pestilence.ms
.P2
.CW f
prints the new status of the file,
that is, it changes the name if one is provided, and prints the
name regardless.
A file name change may also be undone.
.P1
.WC u
.WC f
-. conquest.ms
.P2
.PP
When
.CW sam
is downloaded, the current file may be changed simply by selecting
the desired file from the menu (selecting the same file subsequently
cycles through the windows opened on the file).
Otherwise, the
.CW b
command can be used to choose the desired file:\(dg
.FS
\(dg A bug prevents the
.CW b
command from working when downloaded.
Because the menu is more convenient anyway, and
because the method
of choosing files from the command language is slated to change,
the bug hasn't been fixed.
.FE
.P1
.WC "b emacs.ms
-. emacs.ms
.P2
Again,
.CW sam
prints the name (actually, executes an implicit
.CW f
command) because the Unix file
.CW emacs.ms
is being read for the first time.
It is an error to ask for a file
.CW sam
doesn't know about, but the
.CW B
command will prime
.CW sam 's
menu with a new file, and make it current.
.P1
.WC "b flood.pic
?no such file `flood.pic'
.WC "B flood.pic
-. flood.pic
.WC n
- conquest.ms
- death.ms
- emacs.ms
- famine.ms
-. flood.pic
- slaughter.ms
.P2
Both
.CW b
and
.CW B
will accept a list of file names.
.CW b
simply takes the first file in the list, but
.CW B
loads them all.
The list may be typed on one line \(em
.P1
.WC "B devil.tex satan.tex 666.tex emacs.tex
.P2
\(em or generated by a Unix command \(em
.P1
.WC "B <echo *.tex
.P2
The latter form requires a Unix command;
.CW sam
does not understand the shell file name metacharacters, so
.CW "B *.tex
attempts to load a single file named
.CW *.tex .
(The
.CW <
form is of course derived from
.CW sam 's
.CW <
command.)
.CW echo
is not the only useful command to run subservient to
.CW B ;
for example,
.P1
.WC "B <grep -l Emacs *
.P2
will load only those files containing the string
.CW Emacs .
Finally, a special case: a
.CW B
with no arguments creates an empty, nameless file within
.CW sam .
.PP
The complement of
.CW B
is
.CW D :
.P1
.WC "D devil.tex satan.tex 666.tex emacs.tex
.P2
eradicates the files from
.CW sam 's
memory (not from the Unix machine's disc).
.CW D
without any file names removes the current file from
.CW sam .
.PP
There are three other commands that relate the current file
to Unix files.
The
.CW w
command writes the file to disc;
without arguments, it writes the entire file to the Unix file associated
with the current file in
.CW sam
(it is the only command whose default address is not dot).
Of course, you can specify an address to be written,
and a different file name, with the obvious syntax:
.P1
.WC "1,2w /tmp/revelations
/tmp/revelations: #44
.P2
.CW sam
responds with the file name and the number of characters written to the file.
The
.CW write
command on the button 3 menu is identical in function to an unadorned
.CW w
command.
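.PP
The address and the file name are independent, so a file name
may be given without an address.
As a sketch (the name
.CW /tmp/backup
is only an example),
.P1
.WC "w /tmp/backup
.P2
should write the entire current file to
.CW /tmp/backup ,
with
.CW sam
responding as before with the name and the character count.
If the current file already has a name, that name is not changed by the write.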
.PP
The other two commands,
.CW e
and
.CW r ,
read data from Unix files.
The
.CW e
command clears out the current file,
reads the data from the named file (or uses the current file's old name if
none is explicitly provided), and sets the file name.
It's much like a
.CW B
command, but puts the information in the current file instead of a new one.
.CW e
without any file name is therefore an easy way to refresh
.CW sam 's
copy of a Unix file.
[Unlike in
.CW ed ,
.CW e
doesn't complain if the file is modified. The principle is not
to protect against things that can be undone if wrong.]
Since its job is to replace the whole text,
.CW e
never takes an address.
.PP
The
.CW r
command is like
.CW e ,
but it doesn't clear the file:
the text in the Unix file replaces dot, or the specified text if an
address is given.
.P1
.WC "r emacs.ms
.P2
has essentially the effect of
.P1
.WC "<cat emacs.ms
.P2
The commands
.CW r
and
.CW w
will set the name of the file if the current file has no name already defined;
.CW e
sets the name even if the file already has one.
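.PP
Because
.CW r
replaces the addressed text, the address decides where the new text lands.
As a sketch, using a file name from the list above,
.P1
.WC "$r death.ms
.P2
should append the contents of
.CW death.ms
to the current file, since
.CW $
addresses the empty text at the end of the file, whereas
.P1
.WC ",r death.ms
.P2
should replace everything in the current file,
leaving its name alone if it already has one.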
.PP
There is a command, analogous to
.CW x ,
that iterates over files instead of pieces of text:
.CW X
(capital
.CW x ).
The syntax is easy; it's just like that of
.CW x
\(em \f(CWX/\f2pattern\f(CW/\f2command\f1.
(The complementary command is
.CW Y ,
analogous to
.CW y .)
The effect is to run the command in each file whose menu entry
(that is, whose line printed by an
.CW f
command) matches the pattern.
For example, since an apostrophe identifies modified files,
.P1
.WC "X/'/ w
.P2
writes the changed files out to disc.
Here is a longer example: find all uses of a particular variable
in the C source files:
.P1
.WC "X/\e.c$/ ,x/variable/+-p
.P2
We can use an
.CW f
command to identify which file the variable appears in:
.P1
.ft CI
X/\e.c$/ ,g/variable/ {
f
,x/variable/+-{
=
p
}
}
.ft
.P2
Here, the
.CW g
command guarantees that only the names of files containing the variable
will be printed (but beware that
.CW sam
may confuse matters by printing the names of files it reads in during
the command).
The
.CW =
command shows where in the file the variable appears, and the
.CW p
command prints the line.
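.PP
The same machinery makes changes across files.
As a sketch (the replacement text is arbitrary), this pair of commands
should change every
.CW Emacs
to
.CW vi
in all the
.CW .ms
files, and then write out whichever files were actually modified:
.P1
.WC "X/\e.ms$/ ,x/Emacs/ c/vi/
.WC "X/'/ w
.P2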
.PP
The
.CW D
command is handy as the target of an
.CW X .
This example deletes from the menu all C files that do not contain
a particular variable:
.P1
.WC "X/\e.c$/ ,v/variable/ D
.P2
If no pattern is provided for the
.CW X ,
the command (which defaults to
.CW f )
is run in all files, so
.P1
.WC "X D
.P2
cleans
.CW sam
up for a fresh start.
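.PP
.CW Y
fits the same mold.
As a sketch, assuming the menu holds a mixture of files,
.P1
.WC "Y/\e.c$/ D
.P2
should remove from the menu every file whose entry does not end in
.CW .c ,
leaving only the C source files.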
.PP
But rather than working any further, let's stop now:
.P1
.WC q
$
.P2
.fi
.PP
Some of the file-manipulating commands can be undone:
undoing a
.CW f ,
.CW e ,
or
.CW r
restores the previous state of the file,
but
.CW w ,
.CW B
and
.CW D
are irrevocable.
And, of course, so is
.CW q .