Gnu/coreutils/Squeezing-and-deleting
9.1.3 Squeezing repeats and deleting
When given just the --delete (-d) option, tr removes any input characters that are in set1.
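As a minimal sketch of deletion (the sample input and the variable name `deleted` are made up for illustration), the following removes every 'l' and 'o' from a line:

```shell
# set1 is "lo": every 'l' and every 'o' in the input is removed.
deleted=$(printf 'hello world\n' | tr -d 'lo')
printf '%s\n' "$deleted"    # prints: he wrd
```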
When given just the --squeeze-repeats (-s) option and not translating, tr replaces each input sequence of a repeated character that is in set1 with a single occurrence of that character.
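A small illustration of squeezing on its own, using a made-up input string. Only characters listed in set1 are squeezed, so the run of 'c' and the pair of 'd' below are left alone:

```shell
# set1 is "ab " (a, b, and space): runs of those characters collapse to one.
squeezed=$(printf 'aaabbbccc  dd\n' | tr -s 'ab ')
printf '%s\n' "$squeezed"    # prints: abccc dd
```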
When given both --delete and --squeeze-repeats, tr first performs any deletions using set1, then squeezes repeats from any remaining characters using set2.
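The two-step behavior can be sketched as follows; the input string and the variable name `cleaned` are illustrative only. Digits (set1) are deleted first, and then runs of spaces (set2) are squeezed:

```shell
# Step 1: delete every digit (set1). Step 2: squeeze repeated spaces (set2).
cleaned=$(printf 'a1b22c   d\n' | tr -ds '[:digit:]' ' ')
printf '%s\n' "$cleaned"    # prints: abc d
```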
The --squeeze-repeats option may also be used when translating, in which case tr first performs translation, then squeezes repeats from any remaining characters using set2.
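Here is a short sketch of translating and squeezing together, with a made-up input. Because squeezing uses set2, underscores that were already in the input are squeezed along with those produced by the translation:

```shell
# Translate each space to '_', then squeeze runs of '_' (set2).
joined=$(printf 'a  b__c\n' | tr -s ' ' '_')
printf '%s\n' "$joined"    # prints: a_b_c
```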
Here are some examples to illustrate various combinations of options:
- Remove all zero bytes:
tr -d '\0'
- Put all words on lines by themselves. This converts all
non-alphanumeric characters to newlines, then squeezes each string
of repeated newlines into a single newline:
tr -cs '[:alnum:]' '[\n*]'
- Convert each sequence of repeated newlines to a single newline.
I.e., delete empty lines:
tr -s '\n'
- Find doubled occurrences of words in a document.
For example, people often write “the the” with the repeated words
separated by a newline. The Bourne shell script below works first
by converting each sequence of punctuation and blank characters to a
single newline. That puts each “word” on a line by itself.
Next it maps all uppercase characters to lower case, and finally it
runs uniq with the -d option to print out only the words that were repeated.
#!/bin/sh
cat -- "$@" \
  | tr -s '[:punct:][:blank:]' '[\n*]' \
  | tr '[:upper:]' '[:lower:]' \
  | uniq -d
Deleting a small set of characters is usually straightforward. For example, to remove all ‘a’s, ‘x’s, and ‘M’s you would do this:
tr -d axM
However, when ‘-’ is one of those characters, it can be tricky because ‘-’ has special meanings. Performing the same task as above but also removing all ‘-’ characters, we might try tr -d -axM, but that would fail because tr would try to interpret -a as a command-line option. Alternatively, we could try putting the hyphen inside the string, tr -d a-xM, but that wouldn’t work either because it would make tr interpret a-x as the range of characters ‘a’…‘x’ rather than the three individual characters. One way to solve the problem is to put the hyphen at the end of the list of characters:
tr -d axM-
Or you can use ‘--’ to terminate option processing:
tr -d -- -axM
More generally, use the equivalence class notation [=c=] with ‘-’ (or any other character) in place of the ‘c’:
tr -d '[=-=]axM'
Note how single quotes are used in the above example to protect the square brackets from interpretation by a shell.
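The three working forms, and the range pitfall, can be compared side by side. This sketch assumes GNU tr (support for [=c=] may vary in other implementations); the sample input and variable names are illustrative only.

```shell
input='a-x-M-b'

# Pitfall: a-x is read as a range, so 'b' is deleted and the hyphens survive.
as_range=$(printf '%s\n' "$input" | tr -d 'a-xM')

# Working forms: hyphen last, '--' before the set, or the [=-=] notation.
hyphen_last=$(printf '%s\n' "$input" | tr -d 'axM-')
end_of_opts=$(printf '%s\n' "$input" | tr -d -- '-axM')
equiv_class=$(printf '%s\n' "$input" | tr -d '[=-=]axM')

printf '%s\n' "$as_range"       # prints: ---
printf '%s\n' "$hyphen_last"    # prints: b
printf '%s\n' "$end_of_opts"    # prints: b
printf '%s\n' "$equiv_class"    # prints: b
```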