Removing junk characters from text files?

William O'Higgins william.ohiggins-H217xnMUJC0sA/PxXw9srA at public.gmane.org
Thu Feb 10 17:06:13 UTC 2005


Thanks for all of the replies thus far.  To recap:

I have files with "bad" characters in them - stuff that doesn't print,
but does screw up the regexes and other text processing.  I identified
one of these as (I think) \240, but I wasn't sure.
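
One way to stop guessing is to count every byte that is neither
printable ASCII nor ordinary whitespace. What follows is a minimal
sketch in Python, not something proposed in the thread; the function
name and the sample data are made up for illustration. For what it is
worth, octal \240 is byte 0xA0, which is the non-breaking space in
ISO-8859-1.

```python
from collections import Counter


def find_junk_bytes(data: bytes) -> Counter:
    """Count bytes that are not printable ASCII, tab, LF or CR."""
    # Hypothetical definition of "clean": 0x20-0x7E plus tab, LF, CR.
    allowed = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    return Counter(b for b in data if b not in allowed)


# Made-up sample containing a \240 byte and a NUL byte.
sample = b"price:\xa0100\r\nname:\x00bob\n"

for byte, count in sorted(find_junk_bytes(sample).items()):
    print(f"octal \\{byte:03o}  hex 0x{byte:02x}  count {count}")
```

For a real file, read it in binary mode (open(path, "rb").read()) so
that no decoding happens before the bytes are counted.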

Several people suggested tricks for removing DOS line endings, both in
vi and using utilities like dos2unix (I use flip, but we're on the same
page).

We also had people suggesting transliteration operators, usually
looking like tr///.  I agree whole-heartedly with this advice - these
are good tools.

Lennart asked the incredibly salient question of "what does file say?"
The answer is that file thinks it is text, encoded with an ISO-8859
charset.  These files are often multi-generational Windoze documents
that have passed via the beauty of Object Linking and Embedding through
several programs, each of which "knows" best.

The problem I have is that I don't know what to call some of these junk
characters for transliteration.  When vi hands you "||" in blue, what
does that mean, and how do you strip it?  Thanks.
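
One way around not knowing the names is to delete by byte value
instead: list the bytes you want to keep and drop everything else. This
is a rough sketch in Python under my own assumptions, not a tested
recipe for these particular files. The function name, the keep_high
switch and the choice to turn \240 into a plain space are all
hypothetical.

```python
def strip_junk(data: bytes, keep_high: bool = False) -> bytes:
    """Delete every byte outside an allowed set.

    Assumed policy: keep printable ASCII, tab and LF; drop CR (which
    also undoes DOS line endings); optionally keep 0xA1-0xFF so that
    accented ISO-8859-1 letters survive.
    """
    allowed = set(range(0x20, 0x7F)) | {0x09, 0x0A}
    if keep_high:
        allowed |= set(range(0xA1, 0x100))
    # Assumption: a \240 (non-breaking space) should become a space.
    data = data.replace(b"\xa0", b" ")
    delete = bytes(b for b in range(256) if b not in allowed)
    # table=None means "no remapping, only delete the listed bytes".
    return data.translate(None, delete)


cleaned = strip_junk(b"price:\xa0100\r\nname:\x00bob\n")
print(cleaned)
```

This is the same idea as tr with a delete list, except that the set is
built from byte values, so nothing has to be typed or named.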
-- 

yours,

William
