560 lines
11 KiB
Text
Executable file
560 lines
11 KiB
Text
Executable file
.TH AWK 1
|
|
.SH NAME
|
|
awk \- pattern-directed scanning and processing language
|
|
.SH SYNOPSIS
|
|
.B awk
|
|
[
|
|
.B -F
|
|
.I fs
|
|
]
|
|
[
|
|
.B -d
|
|
]
|
|
[
|
|
.BI -mf
|
|
.I n
|
|
]
|
|
[
|
|
.B -mr
|
|
.I n
|
|
]
|
|
[
|
|
.B -safe
|
|
]
|
|
[
|
|
.B -v
|
|
.I var=value
|
|
]
|
|
[
|
|
.B -f
|
|
.I progfile
|
|
|
|
|
.I prog
|
|
]
|
|
[
|
|
.I file ...
|
|
]
|
|
.SH DESCRIPTION
|
|
.I Awk
|
|
scans each input
|
|
.I file
|
|
for lines that match any of a set of patterns specified literally in
|
|
.I prog
|
|
or in one or more files
|
|
specified as
|
|
.B -f
|
|
.IR progfile .
|
|
With each pattern
|
|
there can be an associated action that will be performed
|
|
when a line of a
|
|
.I file
|
|
matches the pattern.
|
|
Each line is matched against the
|
|
pattern portion of every pattern-action statement;
|
|
the associated action is performed for each matched pattern.
|
|
The file name
|
|
.L -
|
|
means the standard input.
|
|
Any
|
|
.IR file
|
|
of the form
|
|
.I var=value
|
|
is treated as an assignment, not a file name,
|
|
and is executed at the time it would have been opened if it were a file name.
|
|
The option
|
|
.B -v
|
|
followed by
|
|
.I var=value
|
|
is an assignment to be done before the program
|
|
is executed;
|
|
any number of
|
|
.B -v
|
|
options may be present.
|
|
.B -F
|
|
.IR fs
|
|
option defines the input field separator to be the regular expression
|
|
.IR fs .
|
|
.PP
|
|
An input line is normally made up of fields separated by white space,
|
|
or by regular expression
|
|
.BR FS .
|
|
The fields are denoted
|
|
.BR $1 ,
|
|
.BR $2 ,
|
|
\&..., while
|
|
.B $0
|
|
refers to the entire line.
|
|
If
|
|
.BR FS
|
|
is null, the input line is split into one field per character.
|
|
.PP
|
|
To compensate for inadequate implementation of storage management,
|
|
the
|
|
.B -mr
|
|
option can be used to set the maximum size of the input record,
|
|
and the
|
|
.B -mf
|
|
option to set the maximum number of fields.
|
|
.PP
|
|
The
|
|
.B -safe
|
|
option causes
|
|
.I awk
|
|
to run in
|
|
``safe mode,''
|
|
in which it is not allowed to
|
|
run shell commands or open files
|
|
and the environment is not made available
|
|
in the
|
|
.B ENVIRON
|
|
variable.
|
|
.PP
|
|
A pattern-action statement has the form
|
|
.IP
|
|
.IB pattern " { " action " }
|
|
.PP
|
|
A missing
|
|
.BI { " action " }
|
|
means print the line;
|
|
a missing pattern always matches.
|
|
Pattern-action statements are separated by newlines or semicolons.
|
|
.PP
|
|
An action is a sequence of statements.
|
|
A statement can be one of the following:
|
|
.PP
|
|
.EX
|
|
.ta \w'\fLdelete array[expression]'u
|
|
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
|
|
while(\fI expression \fP)\fI statement\fP
|
|
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
|
|
for(\fI var \fPin\fI array \fP)\fI statement\fP
|
|
do\fI statement \fPwhile(\fI expression \fP)
|
|
break
|
|
continue
|
|
{\fR [\fP\fI statement ... \fP\fR] \fP}
|
|
\fIexpression\fP #\fR commonly\fP\fI var = expression\fP
|
|
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
|
|
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
|
|
return\fR [ \fP\fIexpression \fP\fR]\fP
|
|
next #\fR skip remaining patterns on this input line\fP
|
|
nextfile #\fR skip rest of this file, open next, start at top\fP
|
|
delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP
|
|
delete\fI array\fP #\fR delete all elements of array\fP
|
|
exit\fR [ \fP\fIexpression \fP\fR]\fP #\fR exit immediately; status is \fP\fIexpression\fP
|
|
.EE
|
|
.DT
|
|
.PP
|
|
Statements are terminated by
|
|
semicolons, newlines or right braces.
|
|
An empty
|
|
.I expression-list
|
|
stands for
|
|
.BR $0 .
|
|
String constants are quoted \&\fL"\ "\fR,
|
|
with the usual C escapes recognized within.
|
|
Expressions take on string or numeric values as appropriate,
|
|
and are built using the operators
|
|
.B + \- * / % ^
|
|
(exponentiation), and concatenation (indicated by white space).
|
|
The operators
|
|
.B
|
|
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
|
|
are also available in expressions.
|
|
Variables may be scalars, array elements
|
|
(denoted
|
|
.IB x [ i ] )
|
|
or fields.
|
|
Variables are initialized to the null string.
|
|
Array subscripts may be any string,
|
|
not necessarily numeric;
|
|
this allows for a form of associative memory.
|
|
Multiple subscripts such as
|
|
.B [i,j,k]
|
|
are permitted; the constituents are concatenated,
|
|
separated by the value of
|
|
.BR SUBSEP .
|
|
.PP
|
|
The
|
|
.B print
|
|
statement prints its arguments on the standard output
|
|
(or on a file if
|
|
.BI > file
|
|
or
|
|
.BI >> file
|
|
is present or on a pipe if
|
|
.BI | cmd
|
|
is present), separated by the current output field separator,
|
|
and terminated by the output record separator.
|
|
.I file
|
|
and
|
|
.I cmd
|
|
may be literal names or parenthesized expressions;
|
|
identical string values in different statements denote
|
|
the same open file.
|
|
The
|
|
.B printf
|
|
statement formats its expression list according to the format
|
|
(see
|
|
.IR fprintf (2)) .
|
|
The built-in function
|
|
.BI close( expr )
|
|
closes the file or pipe
|
|
.IR expr .
|
|
The built-in function
|
|
.BI fflush( expr )
|
|
flushes any buffered output for the file or pipe
|
|
.IR expr .
|
|
If
|
|
.IR expr
|
|
is omitted or is a null string, all open files are flushed.
|
|
.PP
|
|
The mathematical functions
|
|
.BR exp ,
|
|
.BR log ,
|
|
.BR sqrt ,
|
|
.BR sin ,
|
|
.BR cos ,
|
|
and
|
|
.BR atan2
|
|
are built in.
|
|
Other built-in functions:
|
|
.TF length
|
|
.TP
|
|
.B length
|
|
If its argument is a string, the string's length is returned.
|
|
If its argument is an array, the number of subscripts in the array is returned.
|
|
If no argument, the length of
|
|
.B $0
|
|
is returned.
|
|
.TP
|
|
.B rand
|
|
random number on (0,1)
|
|
.TP
|
|
.B srand
|
|
sets seed for
|
|
.B rand
|
|
and returns the previous seed.
|
|
.TP
|
|
.B int
|
|
truncates to an integer value
|
|
.TP
|
|
.B utf
|
|
converts its numerical argument, a character number, to a
|
|
.SM UTF
|
|
string
|
|
.TP
|
|
.BI substr( s , " m" , " n\fL)
|
|
the
|
|
.IR n -character
|
|
substring of
|
|
.I s
|
|
that begins at position
|
|
.IR m
|
|
counted from 1.
|
|
.TP
|
|
.BI index( s , " t" )
|
|
the position in
|
|
.I s
|
|
where the string
|
|
.I t
|
|
occurs, or 0 if it does not.
|
|
.TP
|
|
.BI match( s , " r" )
|
|
the position in
|
|
.I s
|
|
where the regular expression
|
|
.I r
|
|
occurs, or 0 if it does not.
|
|
The variables
|
|
.B RSTART
|
|
and
|
|
.B RLENGTH
|
|
are set to the position and length of the matched string.
|
|
.TP
|
|
.BI split( s , " a" , " fs\fL)
|
|
splits the string
|
|
.I s
|
|
into array elements
|
|
.IB a [1]\f1,
|
|
.IB a [2]\f1,
|
|
\&...,
|
|
.IB a [ n ]\f1,
|
|
and returns
|
|
.IR n .
|
|
The separation is done with the regular expression
|
|
.I fs
|
|
or with the field separator
|
|
.B FS
|
|
if
|
|
.I fs
|
|
is not given.
|
|
An empty string as field separator splits the string
|
|
into one array element per character.
|
|
.TP
|
|
.BI sub( r , " t" , " s\fL)
|
|
substitutes
|
|
.I t
|
|
for the first occurrence of the regular expression
|
|
.I r
|
|
in the string
|
|
.IR s .
|
|
If
|
|
.I s
|
|
is not given,
|
|
.B $0
|
|
is used.
|
|
.TP
|
|
.B gsub
|
|
same as
|
|
.B sub
|
|
except that all occurrences of the regular expression
|
|
are replaced;
|
|
.B sub
|
|
and
|
|
.B gsub
|
|
return the number of replacements.
|
|
.TP
|
|
.BI sprintf( fmt , " expr" , " ...\fL)
|
|
the string resulting from formatting
|
|
.I expr ...
|
|
according to the
|
|
.I printf
|
|
format
|
|
.I fmt
|
|
.TP
|
|
.BI system( cmd )
|
|
executes
|
|
.I cmd
|
|
and returns its exit status
|
|
.TP
|
|
.BI tolower( str )
|
|
returns a copy of
|
|
.I str
|
|
with all upper-case characters translated to their
|
|
corresponding lower-case equivalents.
|
|
.TP
|
|
.BI toupper( str )
|
|
returns a copy of
|
|
.I str
|
|
with all lower-case characters translated to their
|
|
corresponding upper-case equivalents.
|
|
.PD
|
|
.PP
|
|
The ``function''
|
|
.B getline
|
|
sets
|
|
.B $0
|
|
to the next input record from the current input file;
|
|
.B getline
|
|
.BI < file
|
|
sets
|
|
.B $0
|
|
to the next record from
|
|
.IR file .
|
|
.B getline
|
|
.I x
|
|
sets variable
|
|
.I x
|
|
instead.
|
|
Finally,
|
|
.IB cmd " | getline
|
|
pipes the output of
|
|
.I cmd
|
|
into
|
|
.BR getline ;
|
|
each call of
|
|
.B getline
|
|
returns the next line of output from
|
|
.IR cmd .
|
|
In all cases,
|
|
.B getline
|
|
returns 1 for a successful input,
|
|
0 for end of file, and \-1 for an error.
|
|
.PP
|
|
Patterns are arbitrary Boolean combinations
|
|
(with
|
|
.BR "! || &&" )
|
|
of regular expressions and
|
|
relational expressions.
|
|
Regular expressions are as in
|
|
.IR regexp (6).
|
|
Isolated regular expressions
|
|
in a pattern apply to the entire line.
|
|
Regular expressions may also occur in
|
|
relational expressions, using the operators
|
|
.BR ~
|
|
and
|
|
.BR !~ .
|
|
.BI / re /
|
|
is a constant regular expression;
|
|
any string (constant or variable) may be used
|
|
as a regular expression, except in the position of an isolated regular expression
|
|
in a pattern.
|
|
.PP
|
|
A pattern may consist of two patterns separated by a comma;
|
|
in this case, the action is performed for all lines
|
|
from an occurrence of the first pattern
|
|
though an occurrence of the second.
|
|
.PP
|
|
A relational expression is one of the following:
|
|
.IP
|
|
.I expression matchop regular-expression
|
|
.br
|
|
.I expression relop expression
|
|
.br
|
|
.IB expression " in " array-name
|
|
.br
|
|
.BI ( expr , expr,... ") in " array-name
|
|
.PP
|
|
where a
|
|
.I relop
|
|
is any of the six relational operators in C,
|
|
and a
|
|
.I matchop
|
|
is either
|
|
.B ~
|
|
(matches)
|
|
or
|
|
.B !~
|
|
(does not match).
|
|
A conditional is an arithmetic expression,
|
|
a relational expression,
|
|
or a Boolean combination
|
|
of these.
|
|
.PP
|
|
The special patterns
|
|
.B BEGIN
|
|
and
|
|
.B END
|
|
may be used to capture control before the first input line is read
|
|
and after the last.
|
|
.B BEGIN
|
|
and
|
|
.B END
|
|
do not combine with other patterns.
|
|
.PP
|
|
Variable names with special meanings:
|
|
.TF FILENAME
|
|
.TP
|
|
.B CONVFMT
|
|
conversion format used when converting numbers
|
|
(default
|
|
.BR "%.6g" )
|
|
.TP
|
|
.B FS
|
|
regular expression used to separate fields; also settable
|
|
by option
|
|
.BI \-F fs\f1.
|
|
.TP
|
|
.BR NF
|
|
number of fields in the current record
|
|
.TP
|
|
.B NR
|
|
ordinal number of the current record
|
|
.TP
|
|
.B FNR
|
|
ordinal number of the current record in the current file
|
|
.TP
|
|
.B FILENAME
|
|
the name of the current input file
|
|
.TP
|
|
.B RS
|
|
input record separator (default newline)
|
|
.TP
|
|
.B OFS
|
|
output field separator (default blank)
|
|
.TP
|
|
.B ORS
|
|
output record separator (default newline)
|
|
.TP
|
|
.B OFMT
|
|
output format for numbers (default
|
|
.BR "%.6g" )
|
|
.TP
|
|
.B SUBSEP
|
|
separates multiple subscripts (default 034)
|
|
.TP
|
|
.B ARGC
|
|
argument count, assignable
|
|
.TP
|
|
.B ARGV
|
|
argument array, assignable;
|
|
non-null members are taken as file names
|
|
.TP
|
|
.B ENVIRON
|
|
array of environment variables; subscripts are names.
|
|
.PD
|
|
.PP
|
|
Functions may be defined (at the position of a pattern-action statement) thus:
|
|
.IP
|
|
.L
|
|
function foo(a, b, c) { ...; return x }
|
|
.PP
|
|
Parameters are passed by value if scalar and by reference if array name;
|
|
functions may be called recursively.
|
|
Parameters are local to the function; all other variables are global.
|
|
Thus local variables may be created by providing excess parameters in
|
|
the function definition.
|
|
.SH EXAMPLES
|
|
.TP
|
|
.L
|
|
length($0) > 72
|
|
Print lines longer than 72 characters.
|
|
.TP
|
|
.L
|
|
{ print $2, $1 }
|
|
Print first two fields in opposite order.
|
|
.PP
|
|
.EX
|
|
BEGIN { FS = ",[ \et]*|[ \et]+" }
|
|
{ print $2, $1 }
|
|
.EE
|
|
.ns
|
|
.IP
|
|
Same, with input fields separated by comma and/or blanks and tabs.
|
|
.PP
|
|
.EX
|
|
{ s += $1 }
|
|
END { print "sum is", s, " average is", s/NR }
|
|
.EE
|
|
.ns
|
|
.IP
|
|
Add up first column, print sum and average.
|
|
.TP
|
|
.L
|
|
/start/, /stop/
|
|
Print all lines between start/stop pairs.
|
|
.PP
|
|
.EX
|
|
BEGIN { # Simulate echo(1)
|
|
for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
|
|
printf "\en"
|
|
exit }
|
|
.EE
|
|
.SH SOURCE
|
|
.B /sys/src/cmd/awk
|
|
.SH SEE ALSO
|
|
.IR sed (1),
|
|
.IR regexp (6),
|
|
.br
|
|
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
|
|
.I
|
|
The AWK Programming Language,
|
|
Addison-Wesley, 1988. ISBN 0-201-07981-X
|
|
.SH BUGS
|
|
There are no explicit conversions between numbers and strings.
|
|
To force an expression to be treated as a number add 0 to it;
|
|
to force it to be treated as a string concatenate
|
|
\&\fL""\fP to it.
|
|
.br
|
|
The scope rules for variables in functions are a botch;
|
|
the syntax is worse.
|
|
.br
|
|
UTF is not always dealt with correctly,
|
|
though
|
|
.I awk
|
|
does make an attempt to do so.
|
|
The
|
|
.I split
|
|
function with an empty string as final argument now copes
|
|
with UTF in the string being split.
|