Gnu/coreutils/Squeezing-and-deleting

From Get docs

Previous: Translating, Up: tr invocation   [Contents][Index]


9.1.3 Squeezing repeats and deleting

When given just the --delete (-d) option, tr removes any input characters that are in set1.

When given just the --squeeze-repeats (-s) option and not translating, tr replaces each input sequence of a repeated character that is in set1 with a single occurrence of that character.

When given both --delete and --squeeze-repeats, tr first performs any deletions using set1, then squeezes repeats from any remaining characters using set2.

The --squeeze-repeats option may also be used when translating, in which case tr first performs translation, then squeezes repeats from any remaining characters using set2.

Here are some examples to illustrate various combinations of options:

  • Remove all zero bytes:
    tr -d '\0'
  • Put all words on lines by themselves. This converts all non-alphanumeric characters to newlines, then squeezes each string of repeated newlines into a single newline:
    tr -cs '[:alnum:]' '[\n*]'
  • Convert each sequence of repeated newlines to a single newline. I.e., delete blank lines:
    tr -s '\n'
  • Find doubled occurrences of words in a document. For example, people often write “the the” with the repeated words separated by a newline. The Bourne shell script below works first by converting each sequence of punctuation and blank characters to a single newline. That puts each “word” on a line by itself. Next it maps all uppercase characters to lower case, and finally it runs uniq with the -d option to print out only the words that were repeated.
    #!/bin/sh
    cat -- "$@" \
      | tr -s '[:punct:][:blank:]' '[\n*]' \
      | tr '[:upper:]' '[:lower:]' \
      | uniq -d
  • Deleting a small set of characters is usually straightforward. For example, to remove all ‘a’s, ‘x’s, and ‘M’s you would do this:

    tr -d axM

    However, when ‘-’ is one of those characters, it can be tricky because ‘-’ has special meanings. Performing the same task as above but also removing all ‘-’ characters, we might try tr -d -axM, but that would fail because tr would try to interpret -a as a command-line option. Alternatively, we could try putting the hyphen inside the string, tr -d a-xM, but that wouldn’t work either because it would make tr interpret a-x as the range of characters ‘a’…‘x’ rather than the three. One way to solve the problem is to put the hyphen at the end of the list of characters:

    tr -d axM-

    Or you can use ‘--’ to terminate option processing:

    tr -d -- -axM

    More generally, use the character class notation [=c=] with ‘-’ (or any other character) in place of the ‘c’:

    tr -d '[=-=]axM'

    Note how single quotes are used in the above example to protect the square brackets from interpretation by a shell.

Previous: Translating, Up: tr invocation   [Contents][Index]