Removing junk characters from text files?
William O'Higgins
william.ohiggins-H217xnMUJC0sA/PxXw9srA at public.gmane.org
Thu Feb 10 21:01:43 UTC 2005
On Thu, Feb 10, 2005 at 01:51:27PM -0500, Stewart C. Russell wrote:
>tcs sounds like the little brother of Gnu recode, which handles more
>charsets than most people can even imagine could exist; 281, in the
>version I have.
These look very neat, but the problem isn't really the encoding.
>William, what do you want to do with these 'junk' characters? It's
>getting harder and harder to work in just plain ASCII these days. It
>just doesn't support the glyphs that people need to use.
What I am looking for is a way to strip these characters out. They seem
to be coming from formatting code, and they have 0 semantic value - they
just prevent CSV files from being cleanly pulled into databases or
correctly interpreted by spreadsheets. Basically, the problem is that
when I see these junk characters (vim syntax colouring shows them in
blue on a console) I want to do this:
:%s/$junkcharacter//g
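A scripted alternative to that vim substitution, as a minimal Python sketch. It works on raw bytes, so no charset has to be guessed. It assumes "junk" means the ASCII control characters (0x00 to 0x1F, plus 0x7F) other than tab and newline. That set includes the carriage return, so CRLF line endings come out as plain LF. If the offending characters turn out to be high-bit bytes instead, they would have to be added to `JUNK`. The names `strip_junk` and `clean_file` are hypothetical.

```python
# Assumption: "junk" = ASCII control bytes 0x00-0x1F and DEL (0x7F),
# except tab (0x09) and newline (0x0A), which CSV files legitimately use.
# Carriage return (0x0D) is treated as junk, so CRLF becomes LF.
KEEP = {0x09, 0x0A}
JUNK = bytes(b for b in list(range(0x20)) + [0x7F] if b not in KEEP)


def strip_junk(data: bytes) -> bytes:
    """Return a copy of data with every byte listed in JUNK deleted."""
    # bytes.translate with a None table only performs deletion.
    return data.translate(None, JUNK)


def clean_file(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: write a junk-free copy of src_path to dst_path."""
    # Binary mode: bytes are copied untouched apart from the deletions,
    # whatever encoding the file happens to use.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(strip_junk(data))
```

For example, `clean_file("export.csv", "export-clean.csv")` would leave a copy that a database import or spreadsheet sees without the stray control characters (file names are placeholders).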
The problem is that I don't know how to obtain values for $junkcharacter
based on the crap I see on the screen. F'rinstance, a CRLF shows up as
^M in vim (with a line break) and I know that that is called "\r" in
my search pattern - but I don't know what to call some of this other
stuff that I see. I can't copy/paste it, because it is represented on
the screen as something other than what is found with a regex. Does
that help?
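One way to get a value for $junkcharacter is to read the file as raw bytes and print the numeric code of anything outside printable ASCII. A rough Python sketch follows. What counts as "suspicious" (control bytes other than tab and newline, DEL, and anything at or above 0x80) is an assumption, and `find_junk` and `summarize` are hypothetical names.

```python
from collections import Counter


def find_junk(data: bytes):
    """Yield (line, column, byte) for each suspicious byte in data.

    Assumption: "suspicious" means an ASCII control byte other than tab
    and newline, DEL (0x7F), or any byte >= 0x80. Line and column are
    1-based and counted in bytes, not characters.
    """
    line, col = 1, 0
    for b in data:
        col += 1
        if b == 0x0A:  # newline: move to the next line, reset the column
            line, col = line + 1, 0
            continue
        if (b < 0x20 and b != 0x09) or b >= 0x7F:
            yield line, col, b


def summarize(data: bytes) -> dict:
    """Map each suspicious byte, written as hex, to its number of occurrences."""
    counts = Counter(b for _, _, b in find_junk(data))
    return {f"0x{b:02x}": n for b, n in sorted(counts.items())}
```

Feeding it `open("export.csv", "rb").read()` (a placeholder file name) gives the hex codes actually present. For the ASCII control characters, such a code can then go straight into vim with the `\%x` pattern atom, e.g. `:%s/\%x0c//g` to delete form feeds; `ga` with the cursor on a character also shows its value.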
--
yours,
William
More information about the Legacy mailing list