9.1.3 Squeezing repeats and deleting
When given just the --delete (-d) option, tr
removes any input characters that are in set1.
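Here is a minimal illustration. It assumes a POSIX shell and GNU tr, and the sample strings are made up for the example:

```shell
#!/bin/sh
# set1 is "a": every 'a' is removed
printf 'banana\n' | tr -d 'a'         # prints: bnn
# set1 is "lo": each listed character is removed wherever it occurs
printf 'hello world\n' | tr -d 'lo'   # prints: he wrd
```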
When given just the --squeeze-repeats (-s) option
and not translating, tr replaces each input sequence of a
repeated character that is in set1 with a single occurrence of
that character.
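The following sketch shows squeezing on its own, using arbitrary sample inputs. Only runs of one repeated character collapse, so alternating characters from set1 are left alone:

```shell
#!/bin/sh
# Runs of spaces collapse to one space
printf 'a   b    c\n' | tr -s ' '    # prints: a b c
# Only characters in set1 are squeezed: "aa" and "bb" collapse, "cc" stays
printf 'aabbcc\n' | tr -s 'ab'       # prints: abcc
# "abab" contains no run of a single repeated character, so nothing changes
printf 'abab\n' | tr -s 'ab'         # prints: abab
```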
When given both --delete and --squeeze-repeats, tr
first performs any deletions using set1, then squeezes repeats
from any remaining characters using set2.
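The order of the two steps matters. In this sketch, which assumes GNU tr and uses an arbitrary input, deleting the ‘1’ makes two ‘a’s adjacent. The squeeze step runs afterwards on the remaining characters, so it then collapses them:

```shell
#!/bin/sh
# Delete '1' (set1), then squeeze repeats of 'a' (set2).
# "a1a22b" -> "aa22b" after deletion -> "a22b" after squeezing;
# the "22" stays because '2' is not in set2.
printf 'a1a22b\n' | tr -ds '1' 'a'   # prints: a22b
```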
The --squeeze-repeats option may also be used when translating,
in which case tr first performs translation, then squeezes
repeats from any remaining characters using set2.
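In the same way, when translating, the squeeze applies to the translated output and uses set2. Here is a sketch with made-up inputs:

```shell
#!/bin/sh
# Translate to uppercase, then squeeze repeats of uppercase letters (set2)
printf 'balloon\n' | tr -s 'a-z' 'A-Z'   # prints: BALON
# 'a' and 'b' both become 'x', so "abba" -> "xxxx" -> "x"
printf 'abba\n' | tr -s 'ab' 'xx'        # prints: x
```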
Here are some examples to illustrate various combinations of options:
- Remove all zero bytes:
tr -d '\0'
- Put all words on lines by themselves. This converts all
non-alphanumeric characters to newlines, then squeezes each string
of repeated newlines into a single newline:
tr -cs '[:alnum:]' '[\n*]'
- Convert each sequence of repeated newlines to a single newline.
I.e., delete blank lines:
tr -s '\n'
- Find doubled occurrences of words in a document.
For example, people often write “the the” with the repeated words
separated by a newline. The Bourne shell script below works first
by converting each sequence of punctuation and blank characters to a
single newline. That puts each “word” on a line by itself.
Next it maps all uppercase characters to lower case, and finally it
runs uniq with the -d option to print out only the words that were repeated.
#!/bin/sh
cat -- "$@" \
  | tr -s '[:punct:][:blank:]' '[\n*]' \
  | tr '[:upper:]' '[:lower:]' \
  | uniq -d
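You can try the examples above on small sample inputs. This sketch makes three choices that are not part of the original examples:

- It assumes a POSIX shell with GNU tr and uniq.
- It sets LC_ALL=C so that the character classes behave predictably.
- It feeds the doubled-word pipeline an inline string instead of the files named by "$@".

```shell
#!/bin/sh
# Assumption: the C locale keeps [:alnum:], [:punct:], [:blank:], [:upper:] predictable
LC_ALL=C
export LC_ALL

# Remove zero bytes ('\000' is a portable way to emit a NUL with printf)
printf 'a\000b\000c\n' | tr -d '\0'                             # prints: abc

# One word per line: prints Hello, world, 42, times on separate lines
printf 'Hello, world! 42 times\n' | tr -cs '[:alnum:]' '[\n*]'

# Delete blank lines: prints one, two on separate lines
printf 'one\n\n\ntwo\n' | tr -s '\n'

# Doubled-word finder on inline sample text: prints "the" then "and"
printf 'This is the\nthe end. And And so on.\n' \
  | tr -s '[:punct:][:blank:]' '[\n*]' \
  | tr '[:upper:]' '[:lower:]' \
  | uniq -d
```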
Deleting a small set of characters is usually straightforward. For example, to remove all ‘a’s, ‘x’s, and ‘M’s you would do this:
tr -d axM
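For instance, with a made-up input word:

```shell
#!/bin/sh
# 'M', both 'a's and the 'x' are deleted; 'i' and 'm' pass through
printf 'Maxima\n' | tr -d axM   # prints: im
```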
However, when ‘-’ is one of those characters, it can be tricky because ‘-’ has special meanings. Performing the same task as above but also removing all ‘-’ characters, we might try tr -d -axM, but that would fail because tr would try to interpret -a as a command-line option. Alternatively, we could try putting the hyphen inside the string, tr -d a-xM, but that wouldn’t work either because it would make tr interpret a-x as the range of characters ‘a’…‘x’ rather than the three characters ‘a’, ‘-’, and ‘x’. One way to solve the problem is to put the hyphen at the end of the list of characters:
tr -d axM-
Or you can use ‘--’ to terminate option processing:
tr -d -- -axM
More generally, use the equivalence class notation [=c=] with ‘-’ (or any other character) in place of the ‘c’:
tr -d '[=-=]axM'
Note how single quotes are used in the above example to protect the square brackets from interpretation by a shell.
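The following sketch compares these approaches on one sample string, which was chosen only for illustration. It assumes GNU tr. Each of the three workarounds should leave ‘bz’, while the two naive attempts misbehave as described:

```shell
#!/bin/sh
input='a-b-x-M-z'

# Workarounds: each removes 'a', 'x', 'M' and '-', printing "bz"
printf '%s\n' "$input" | tr -d axM-          # hyphen placed last
printf '%s\n' "$input" | tr -d -- -axM       # '--' ends option processing
printf '%s\n' "$input" | tr -d '[=-=]axM'    # equivalence class for '-'

# Pitfall 1: a-x is a range, so 'a', 'b' and 'x' go but every '-' stays
printf '%s\n' "$input" | tr -d a-xM          # prints: ----z

# Pitfall 2: -axM is parsed as options; tr rejects -a and exits nonzero
if tr -d -axM </dev/null 2>/dev/null; then
  echo 'unexpected: tr accepted -axM'
else
  echo 'tr -d -axM failed, as expected'
fi
```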