Get ready to GAWK
Learn about the AWK programming language and the differences between implementations, and prepare your GAWK installation so that you can begin programming.
AWK is the name of the programming language itself, which was created in 1977. The name is formed from the surname initials of its three principal authors: Drs. A. Aho, P. Weinberger, and B. Kernighan.
Because AWK is a text-processing and pattern-matching language, it's often called a data-driven language: the program statements describe the input data to match and process rather than a sequence of program steps, as is the case with many languages. An AWK program searches its input for records that match its patterns and performs the specified actions on each matching record until it reaches the end of input. AWK programs are excellent for work on databases and tabular data, such as pulling out columns from multiple data sets, making reports, or analyzing data. AWK is also useful for writing short, one-off programs to perform some feat of text hackery for which another language might be overkill. In addition, AWK is often used on the command line or in pipelines as a power tool.
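The pattern-action model described above can be sketched with two one-line programs. This is a minimal illustration, assuming a POSIX shell and any AWK installed as awk; the sample records and the variable names staff_data, high_earners, and eng_total are hypothetical and exist only for this example.

```shell
# Hypothetical sample data: name, department, salary (whitespace-separated).
staff_data='alice eng 120
bob ops 95
carol eng 110'

# Pattern: third field greater than 100. Action: print the first field.
high_earners=$(printf '%s\n' "$staff_data" | awk '$3 > 100 { print $1 }')
printf '%s\n' "$high_earners"

# Pattern: second field equals "eng". Action: add the third field to a total.
# The END block runs once, after the last record has been read.
eng_total=$(printf '%s\n' "$staff_data" | awk '$2 == "eng" { total += $3 } END { print total }')
printf '%s\n' "$eng_total"
```

Neither program contains a loop over the input: AWK reads each record, tests it against the pattern, and runs the action only when the pattern matches.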
Like Perl -- which it inspired -- AWK is an interpreted language, so AWK programs are generally not compiled. Instead, the program scripts are passed to the AWK interpreter at run time.
Systems programmers find themselves immediately at home with the C-like syntax of AWK's input language. In fact, many of its features, including control statements and string functions such as printf and sprintf, appear virtually identical. However, some differences do exist.
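As a rough illustration of that C-like flavor, the following sketch uses sprintf, printf, a for loop, and an if statement inside a BEGIN block, which runs without reading any input. It assumes a POSIX shell and any AWK installed as awk; the shell variable name formatted is hypothetical.

```shell
formatted=$(awk 'BEGIN {
    # sprintf returns the formatted string instead of printing it
    label = sprintf("%-6s|%5.2f", "pi", 3.14159)
    # printf adds no newline of its own, just as in C
    printf "%s\n", label
    # C-style for loop and if statement
    for (i = 1; i <= 3; i++)
        if (i % 2 == 1)
            printf "%d is odd\n", i
}')
printf '%s\n' "$formatted"
```

Some of the differences are visible here as well: the variables label and i are never declared, statements need no terminating semicolon when they end at a newline, and printf can be written without parentheses.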
The AWK language was updated and more or less replaced in the mid-1980s with an enhanced version called NAWK (New AWK). The old AWK interpreter still exists on many systems, but it's often installed as the oawk (Old AWK) command, while the NAWK interpreter is installed as the main awk command as well as being available as nawk. Dr. Kernighan still maintains NAWK; like GAWK, it is open source and freely available (see Resources).
GAWK is the GNU Project's open source implementation of the AWK interpreter. While the early GAWK releases were replacements for the old AWK, GAWK has since been updated to contain the features of NAWK.
In this tutorial, AWK always refers to the language in general, while features specific to the GAWK or NAWK implementations are referred to by those names. You'll find links to GAWK, NAWK, and other important AWK sites in the Resources section.
GAWK has the following unique features and benefits:
- It is available for all major UNIX platforms as well as other operating systems, including Mac OS X and Microsoft® Windows®.
- It is Portable Operating System Interface (POSIX) compliant and contains all features from the 1992 POSIX standard.
- It has no predefined memory limits.
- Helpful new built-in functions and variables are available.
- It contains special regexp operators.
- Record separators can contain regexp operators.
- Special file support is available to access standard UNIX streams.
- Lint checking is available.
- It uses extended regular expressions by default.
- It allows unlimited line lengths and continuations with the backslash character (\).
- It has better, more informative error messages.
- It includes TCP/IP networking functions.
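One of the features listed above, record separators that contain regexp operators, can be sketched as follows. The example assumes that GAWK is available on the PATH under the name gawk and skips itself otherwise; the input string and the variable name rs_demo are hypothetical.

```shell
# Assumption: GAWK is installed as "gawk". If it is not, the demo is skipped.
if command -v gawk >/dev/null 2>&1; then
    # RS is treated as a regular expression: any run of digits ends a record.
    # The input has no trailing newline, so the last record is just "d".
    rs_demo=$(printf 'a1b22c333d' | gawk 'BEGIN { RS = "[0-9]+" } { print NR ": " $0 }')
    printf '%s\n' "$rs_demo"
else
    rs_demo=''
    echo 'gawk not found; skipping the record separator demo' >&2
fi
```

With a single-character RS, the same input would have to be split by hand inside the program; here the interpreter does the splitting before any action runs.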
After you have installed GAWK, first determine where your local copy has been placed. Many systems use GAWK as their primary AWK install, typically with /usr/bin/awk as a symbolic link to /usr/bin/gawk, so that awk is the name of the command for the GAWK interpreter. This tutorial assumes such an installation. On systems with another flavor of AWK already installed or taking precedence, you might have to call GAWK as gawk.
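A quick way to check which interpreter answers to a given command name is sketched below. It assumes a POSIX shell; awk_flavor is a hypothetical helper written for this example, not part of GAWK, and it relies only on the fact that GAWK's --version output begins with "GNU Awk".

```shell
# Hypothetical helper: prints "gawk" if the named command is GNU Awk,
# and "other" if it is another AWK or is not installed at all.
awk_flavor() {
    cmd=${1:-awk}
    # Read only the first line of the version banner. Standard input is
    # redirected so that an AWK that does not know --version cannot block.
    case $("$cmd" --version 2>/dev/null </dev/null | head -n 1) in
        "GNU Awk"*) echo gawk ;;
        *)          echo other ;;
    esac
}

# Show where the shell finds awk. ls -l displays a symbolic link target
# if there is one (only the first level of a chain of links).
awk_path=$(command -v awk || true)
if [ -n "$awk_path" ]; then
    ls -l "$awk_path"
fi

awk_flavor awk
```

If the last command prints other, try awk_flavor gawk to see whether GAWK is installed under its own name.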
You'll know that you have everything installed correctly if you type awk and get the GNU usage screen, as shown in Listing 1. Most other flavors of AWK return nothing at all.
Listing 1. GAWK installed as awk
    $ awk
    Usage: gawk [POSIX or GNU style options] -f progfile [--] file ...
    Usage: gawk [POSIX or GNU style options] [--] 'program' file ...
    POSIX options:                  GNU long options:
            -f progfile                     --file=progfile
            -F fs                           --field-separator=fs
            -v var=val                      --assign=var=val
            -m[fr] val
            -W compat                       --compat
            -W copyleft                     --copyleft
            -W copyright                    --copyright
            -W dump-variables[=file]        --dump-variables[=file]
            -W gen-po                       --gen-po
            -W help                         --help
            -W lint[=fatal]                 --lint[=fatal]
            -W lint-old                     --lint-old
            -W non-decimal-data             --non-decimal-data
            -W profile[=file]               --profile[=file]
            -W posix                        --posix
            -W re-interval                  --re-interval
            -W source=program-text          --source=program-text
            -W traditional                  --traditional
            -W usage                        --usage
            -W version                      --version

    To report bugs, see node `Bugs' in `gawk.info', which is
    section `Reporting Problems and Bugs' in the printed version.
As you can see, GAWK takes the GNU-standard option for getting the version (--version). The output, which includes a notice from the Free Software Foundation concerning the licensing of GAWK and its lack of warranty, should look like Listing 2.
Listing 2. Displaying the GAWK version
    $ gawk --version
    GNU Awk 3.1.5
    Copyright (C) 1989, 1991-2005 Free Software Foundation.

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
    $
Now that you have a working GAWK installation and you know how to call it, you're ready to begin programming. The next section describes basic AWK programming concepts.