Removing junk characters from text files?

Devin Whalen devin-Gq53QDLGkWIleAitJ8REmdBPR1lH4CV8 at public.gmane.org
Thu Feb 10 21:18:38 UTC 2005


On Thu, 2005-02-10 at 16:01 -0500, William O'Higgins wrote:
> On Thu, Feb 10, 2005 at 01:51:27PM -0500, Stewart C. Russell wrote:
> >tcs sounds like the little brother of Gnu recode, which handles more 
> >charsets than most people can even imagine could exist; 281, in the 
> >version I have.
> 
> These look very neat, but the problem isn't really the encoding.
> 
> >William, what do you want to do with these 'junk' characters? It's 
> >getting harder and harder to work in just plain ASCII these days. It 
> >just doesn't support the glyphs that people need to use.
> 
> What I am looking for is a way to strip these characters out.  They seem
> to be coming from formatting code, and they have 0 semantic value - they
> just prevent CSV files from being cleanly pulled into databases or
> correctly interpreted by spreadsheets.  Basically, the problem is that
> when I see these junk characters (vim syntax colouring shows them in
> blue on a console) I want to do this:
> 
> :%s/$junkcharacter//g
> 
> The problem is that I don't know how to obtain values for $junkcharacter
> based on the crap I see on the screen.  F'rinstance, a CRLF shows up as
> ^M in vim (with a line break) and I know that that is called "\r" in
> my replacement string - but I don't know what to call some of this other
> stuff that I see.  I can't copy/paste it, because it is represented on
> the screen as something other than what is found with a regex.  Does
> that help?

Can you send a file with some examples?  I am pretty sure the perl
script I sent will work.  I used it to strip junk characters out of a
file from an AIX server.
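
In case it helps with working out what the junk actually is, here is a
minimal sketch in Python (not the perl script mentioned above) that
reports the value of every suspicious byte in a file and writes a
cleaned copy.  The KEEP whitelist and the names report_junk, strip_junk
and clean_file are my own assumptions for illustration; adjust the
whitelist if the data contains legitimate non-ASCII characters.

```python
from collections import Counter

# Bytes treated as ordinary text: printable ASCII plus tab, LF and CR.
# This whitelist is an assumption, not a rule; widen it as needed.
KEEP = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def report_junk(data: bytes) -> Counter:
    """Count every byte value in data that is not in KEEP."""
    return Counter(b for b in data if b not in KEEP)


def strip_junk(data: bytes) -> bytes:
    """Return data with every byte outside KEEP removed."""
    return bytes(b for b in data if b in KEEP)


def clean_file(src: str, dst: str) -> None:
    """Print a hex report of the junk bytes in src, then write a cleaned copy to dst."""
    with open(src, "rb") as fh:
        raw = fh.read()
    for value, count in sorted(report_junk(raw).items()):
        print(f"0x{value:02x}  seen {count} time(s)")
    with open(dst, "wb") as fh:
        fh.write(strip_junk(raw))
```

Calling clean_file("export.csv", "export.clean.csv") (hypothetical file
names) prints one line per junk byte value.  Those hex values can also
be used directly in vim, where \%xNN matches a character by its hex
code, e.g. :%s/\%x0c//g to remove form feeds.  Typing ga in vim shows
the code of the character under the cursor.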

Later



-- 
Devin Whalen
Programmer
Synaptic Vision Inc
Phone-(416) 539-0801
Fax- (416) 539-8280
1179A King St. West
Toronto, Ontario
Suite 309 M6K 3C5
Home-(416) 653-3982


Take back the Web with FireFox....a browser you can trust
www.getfirefox.com

   .-.
   /v\    L   I   N   U   X
  // \\  
 /(   )\
  ^^-^^   

--
The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml




