Gawk/Programs-Exercises
11.5 Exercises
- Rewrite cut.awk (see section Cutting Out Fields and Columns) using split() with "" as the separator.
- In Searching for Regular Expressions in Files, we mentioned that ‘egrep -i’ could be simulated in versions of awk without IGNORECASE by using tolower() on the line and the pattern. In a footnote there, we also mentioned that this solution has a bug: the translated line is output, and not the original one. Fix this problem.
- The POSIX version of id takes options that control which information is printed. Modify the awk version (see section Printing Out User Information) to accept the same arguments and perform in the same way.
- The split.awk program (see section Splitting a Large File into Pieces) assumes that letters are contiguous in the character set, which isn’t true for EBCDIC systems. Fix this problem. (Hint: Consider a different way to work through the alphabet, without relying on ord() and chr().)
- In uniq.awk (see section Printing Nonduplicated Lines of Text), the logic for choosing which lines to print represents a state machine, which is “a device that can be in one of a set number of stable conditions depending on its previous condition and on the present values of its inputs.” [84] Brian Kernighan suggests that “an alternative approach to state machines is to just read the input into an array, then use indexing. It’s almost always easier code, and for most inputs where you would use this, just as fast.” Rewrite the logic to follow this suggestion.
- Why can’t the wc.awk program (see section Counting Things) just use the value of FNR in endfile()? Hint: Examine the code in Noting Data file Boundaries.
- Manipulation of individual characters in the translate program (see section Transliterating Characters) is painful using standard awk functions. Given that gawk can split strings into individual characters using "" as the separator, how might you use this feature to simplify the program?
- The extract.awk program (see section Extracting Programs from Texinfo Source Files) was written before gawk had the gensub() function. Use it to simplify the code.
- Compare the performance of the awksed.awk program (see section A Simple Stream Editor) with the more straightforward:

      BEGIN {
          pat = ARGV[1]
          repl = ARGV[2]
          ARGV[1] = ARGV[2] = ""
      }

      { gsub(pat, repl); print }

- What are the advantages and disadvantages of awksed.awk versus the real sed utility?
- In An Easy Way to Use Library Functions, we mentioned that not trying to save the line read with getline in the pathto() function when testing for the file’s accessibility for use with the main program simplifies things considerably. What problem does this engender though?
- As an additional example of the idea that it is not always necessary to add new features to a program, consider the idea of having two files in a directory in the search path:

      default.awk
          This file contains a set of default library functions, such as getopt() and assert().
      site.awk
          This file contains library functions that are specific to a site or installation; i.e., locally developed functions. Having a separate file allows default.awk to change with new gawk releases, without requiring the system administrator to update it each time by adding the local functions.

  One user suggested that gawk be modified to automatically read these files upon startup. Instead, it would be very simple to modify igawk to do this. Since igawk can process nested @include directives, default.awk could simply contain @include statements for the desired library functions. Make this change.
- Modify anagram.awk (see section Finding Anagrams from a Dictionary) to avoid the use of the external sort utility.
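
Two of the exercises above rely on split() with "" as the separator. As a starting point, here is a minimal sketch of what that call does on its own; it is not a solution to either exercise. The function name split_chars is made up for this illustration, and the sketch assumes that awk on your system is gawk (or another awk that splits on every character when given an empty separator, which POSIX does not require).

```shell
# split_chars is a hypothetical helper, used only for this illustration.
# Assumption: "awk" is gawk, or another awk that treats "" as
# "one element per character"; POSIX leaves this behavior unspecified.
split_chars() {
    awk -v str="$1" 'BEGIN {
        # chars[1] .. chars[n] each receive one character of str
        n = split(str, chars, "")
        for (i = 1; i <= n; i++)
            print i, chars[i]
    }'
}

split_chars "awk"
```

With gawk, this prints three lines: "1 a", "2 w", and "3 k". Each array element holds exactly one character, so later code can index into the string by position without repeated calls to substr().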