Links & files
Awk reference and resources |
Example files |
Patterns, Actions, and Variables#
Pattern Elements#
Patterns control the execution of rules. A rule is executed when its pattern matches the current input record.
/regular expression/ | a regular expression
expression | a single expression
begpat, endpat | a pair of patterns
BEGIN, END | special patterns to supply startup or cleanup actions
BEGINFILE, ENDFILE | special patterns to supply startup or cleanup actions on a per-file basis
empty | the empty pattern matches every input record
Expressions as Patterns#
awk '$1 == "li" { print $2 }' mail-list
Regular Expressions as Patterns#
awk '$1 ~ /li/ { print $2 }' mail-list
awk '/edu/ && /li/' mail-list
awk '/edu/ || /li/' mail-list
awk '! /li/' mail-list
Specifying Record Ranges with Patterns#
A range pattern is used to match ranges of consecutive input records.
begpat,endpat
cat myfile
no first 100 65
on user1 1 12
on user2 4 345
off user3 12 73
no last 2 123
awk '$1 == "on", $1 == "off" { printf "%s %-3s %-3s\n", $2, $3, $4 }' myfile
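To make the on/off behavior of a range easier to see, here is a minimal sketch on made-up inline input (the single-word records are illustrative and are not taken from myfile). It assumes only POSIX awk. The range starts at each record matching begpat, runs through the next record matching endpat inclusive, and can start again later in the input.

```shell
# Hypothetical input: one word per record, with two on/off ranges.
range_out=$(printf '%s\n' a on b off c on d off e |
  awk '$1 == "on", $1 == "off" { print NR ": " $1 }')
echo "$range_out"
```

Records 1, 5, and 9 fall outside both ranges and are not printed.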
The BEGIN and END Special Patterns#
Startup and cleanup actions#
awk 'BEGIN { print "Analysis of \"li\"" }
/li/ { ++n }
END { print "\"li\" appears in", n, "records." }' mail-list
Analysis of "li"
"li" appears in 4 records.
Input/output from BEGIN and END rules#
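Danger
Be careful when referencing $0 in these rules. In a BEGIN rule no input has been read yet, so $0 is empty and NF is 0. In an END rule, gawk and most current implementations keep $0 and NF from the last record read, but some older awks do not.
The next and nextfile statements are not allowed in BEGIN or END rules.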
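The state of the record variables in these rules can be checked with a small sketch. The two-line input is invented for illustration, and only POSIX awk features are assumed.

```shell
# When BEGIN runs, no input has been read: NF is 0 and $0 is empty.
begin_out=$(printf 'a b c\nd e\n' | awk 'BEGIN { print NF, length($0) }')
# When END runs, NR holds the total number of records read.
end_out=$(printf 'a b c\nd e\n' | awk 'END { print NR }')
echo "BEGIN: $begin_out"
echo "END: $end_out"
```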
The BEGINFILE and ENDFILE Special Patterns#
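In a BEGINFILE rule, FILENAME is set to the name of the current file and FNR is set to zero. If the file could not be opened, ERRNO is set to a string describing the error; otherwise it is empty. The next statement is not allowed in BEGINFILE or ENDFILE rules; nextfile is allowed only in BEGINFILE.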
The Empty Pattern#
An empty pattern matches every input record.
awk '{ print $1 }' mail-list
Using Shell Variables in Programs#
Variable substitution via quoting:
printf "Enter search pattern: "; read pattern
Enter search pattern: ri
awk '$1 ~ '"/$pattern/"'{ nmatches++ }
END { print nmatches, "found."}' mail-list
1 found.
Using awk's variable assignment (-v) to assign the shell variable's value to an awk variable:
printf "Enter search pattern: "; read pattern
Enter search pattern: li
awk -v pat="$pattern" '$1 ~ pat { nmatches++ }
END { print nmatches, "found."}' mail-list
2 found.
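A third option, sketched here on the assumption that the shell variable can be exported, is to read it through the ENVIRON array. The four inline names are a hypothetical stand-in for the first column of mail-list.

```shell
# Hypothetical stand-in for the first field of mail-list.
pattern=li
export pattern
env_out=$(printf '%s\n' Amelia Bill Julie Martin |
  awk '$1 ~ ENVIRON["pattern"] { nmatches++ }
       END { print nmatches + 0, "found." }')
echo "$env_out"
```

Unlike a -v assignment, a value read from ENVIRON is not subject to escape-sequence processing, which can matter when the pattern contains backslashes.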
Actions#
awk '/li/' mail-list
Types of statements:
Expressions
Control statements
Compound statements
Input statements
Output statements
Deletion statements
Control Statements in Actions#
The if-else Statement#
if (condition) then-body [else else-body]
awk '{ if ( $2 ~ /99/ ) print }' mail-list
The while Statement#
awk '{ i = 1 ; while ( i <= 3 ) { print $i ; i++ } }' inventory-shipped
The do-while Statement#
awk '{ i = 1 ; do { print $0 ; i++ } while ( i <= 5 ) }' inventory-shipped
The for Statement#
awk '{ for ( i = 1 ; i <= 3 ; i++ ) print $i }' inventory-shipped
The switch Statement#
awk '{ switch ($1) {
case "Bill":
print $1, "was here"
break
case "Julie":
print $1, "was here"
break
default:
break
} }' mail-list
The break Statement#
The break statement jumps out of the innermost for, while, or do loop.
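As an illustration, the following sketch uses break to stop a for loop as soon as a divisor is found. The input numbers are arbitrary and only POSIX awk is assumed.

```shell
# For each input number, look for its smallest divisor greater than 1.
break_out=$(printf '%s\n' 15 7 |
  awk '{
    num = $1
    for (divisor = 2; divisor * divisor <= num; divisor++)
      if (num % divisor == 0)
        break              # leave the for loop at the first divisor found
    if (num % divisor == 0)
      print "Smallest divisor of " num " is " divisor
    else
      print num " is prime"
  }')
echo "$break_out"
```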
The continue Statement#
The continue statement is used only inside for, while, and do loops; it causes the next cycle around the loop to begin immediately.
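A minimal sketch of continue, assuming a made-up input in which "-" marks a field to skip:

```shell
# Rebuild each record without its "-" placeholder fields.
continue_out=$(printf 'a - b\n- c -\n' |
  awk '{
    line = ""
    for (i = 1; i <= NF; i++) {
      if ($i == "-")
        continue           # skip the rest of this iteration
      sep = (line == "" ? "" : " ")
      line = line sep $i
    }
    print line
  }')
echo "$continue_out"
```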
The next Statement#
The next statement forces awk to immediately stop processing the current record and go on to the next record.
awk '{ if ( $1 !~ /Bill|Julie/ ) next ; else print }' mail-list
The nextfile Statement#
The nextfile statement instructs awk to stop processing the current datafile.
awk '{ if ( $1 !~ /Bill|Julie/ ) print ; else nextfile }' mail-list
The exit Statement#
exit [return code]
awk '{ if ( $1 == "Bill" ) { print "Bill scares me" > "/dev/stderr" ; exit 1 } else print }' mail-list
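The sketch below shows two properties of exit that are easy to miss: an exit in a main rule stops reading input but still runs the END rule, and the given code becomes the exit status of awk. The three names and the status value 3 are arbitrary choices for illustration.

```shell
# exit in a main rule skips the remaining input but still runs END.
exit_out=$(printf '%s\n' Amelia Bill Julie |
  awk '$1 == "Bill" { exit 3 }
       { print }
       END { print "END still runs" }')
exit_status=$?
echo "$exit_out"
echo "status: $exit_status"
```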
Predefined Variables#
Built-in Variables That Control awk#
Variables marked with # are specific to gawk.
BINMODE # | specifies use of binary mode for all I/O
CONVFMT | controls the conversion of numbers to strings (default "%.6g")
FIELDWIDTHS # | space-separated list of column widths for fixed-width input
FPAT # | regexp that tells gawk to create the fields based on regexp matches
FS | input field separator
IGNORECASE # | if nonzero or non-null, comparisons and regexp matching are case-independent
LINT # | if true, provides warnings about dubious or nonportable constructs
OFMT | controls the conversion of numbers to strings for output by print (default "%.6g")
OFS | output field separator
ORS | output record separator
PREC # | working precision of arbitrary-precision floating-point numbers
ROUNDMODE # | rounding mode to use for arbitrary-precision arithmetic on numbers
RS | input record separator
SUBSEP | subscript separator used to join the indices of multidimensional arrays
TEXTDOMAIN # | text domain used for internationalization (default "messages")
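A short sketch of a few of these variables in use, restricted to the POSIX ones so it should run on any awk. The colon-separated input and the format strings are invented for illustration.

```shell
# FS splits input on ":"; OFS is placed between the arguments of print.
fs_out=$(printf 'root:x:0\nbin:x:1\n' |
  awk 'BEGIN { FS = ":"; OFS = " -> " } { print $1, $3 }')
echo "$fs_out"

# OFMT applies when print outputs a number; CONVFMT applies when a
# number is converted to a string, here by concatenation with "".
fmt_out=$(awk 'BEGIN { OFMT = "%.2f"; CONVFMT = "%.3f"
                       x = 3.14159; print x; print (x "") }')
echo "$fmt_out"
```

Integer values are printed as integers regardless of OFMT and CONVFMT.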
Built-in Variables That Convey Information#
ARGC | number of command-line arguments
ARGV | command-line arguments stored in an array
ARGIND # | index in ARGV of the current file
ENVIRON | associative array containing the values of the environment
ERRNO # | string describing the error (getline or close)
FILENAME | name of the current input file
FNR | current record number in the current file
NF | number of fields in the current input record
FUNCTAB # | array of all functions in the program
NR | number of input records awk has processed so far
PROCINFO # | array of information about the running awk program
RLENGTH | length of the substring matched by match()
RSTART | start index in characters of the substring matched by match()
RT # | input text that matched the text denoted by RS
SYMTAB # | array of all defined global variables and arrays in the program
awk -v foo=4 'BEGIN { SYMTAB["foo"] = "toto" ; print foo, ENVIRON["HOME"] }'
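For the portable variables, a minimal sketch on one made-up input line shows NR, NF, and the RSTART/RLENGTH pair set by match():

```shell
# match() sets RSTART and RLENGTH to the position and length of the match.
match_out=$(printf 'foo bar123 baz\n' |
  awk '{
    if (match($0, /[0-9]+/))
      print NR, NF, RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
  }')
echo "$match_out"
```

The digits start at character 8 and span 3 characters, so substr() with those two values recovers the matched text.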
Using ARGC and ARGV#
awk 'BEGIN { for ( i = 0 ; i < ARGC ; i++ )
printf "\tARGV[%d] = %s\n", i, ARGV[i] }' toto tata