dos to unix CR/LF conversion?

James Knott james.knott-bJEeYj9oJeDQT0dZR+AlfA at public.gmane.org
Sun Nov 9 12:14:30 UTC 2003


Henry Spencer wrote:
> On Sat, 8 Nov 2003, Max Blanco wrote:
> 
>>you were right.  My mac/ie5 converted linefeed to ^m, instead of \r.
>>I can't see straight anymore between '\n', '\r', "^M", "^L", C, perl, tr, 
>>unix, dos, mac, dog, cat.  Arrgh.  
> 
> 
> Let's get this one sorted out once and for all...
> 
> Unix end-of-line is ASCII LF = linefeed = newline = \n = \012 = ^J.  (This
> actually follows the ASCII standard, which says that if one character is
> used for newline, it shall be LF.)

You should indicate that \012 is octal.  In decimal, ^J is 10.  It's 
straight counting: ^A is 1, ^B is 2, and so on, so J, the tenth letter, 
is 10 (and ^M is 13, which is \015 in octal).
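
As a rough illustration of that counting, here is a small shell sketch.
The function name ctrl_code is made up for this example; it assumes a
POSIX printf, where a leading quote in the argument yields the character's
numeric value, and it relies on the control code being the letter's ASCII
value minus 64.

```shell
# Hypothetical helper: show the decimal and octal code for ^<letter>.
ctrl_code() {
    # "'J" makes printf print the numeric value of J (74 in ASCII)
    n=$(printf '%d' "'$1")
    # A control character is its letter's code minus 64 (^A = 1, ^B = 2, ...)
    printf '^%s = %d decimal = \\%03o octal\n' "$1" $((n - 64)) $((n - 64))
}

ctrl_code J    # ^J = 10 decimal = \012 octal
ctrl_code M    # ^M = 13 decimal = \015 octal
```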

> 
> Mac end-of-line is ASCII CR = carriage return = \r = \015 = ^M.  (This is
> just wrong, a stupid violation of standards, but it's a bit late to fix
> Apple's botch.)
> 
> DOS/Windows end-of-line is ASCII CR followed by ASCII LF.  (This also
> follows the ASCII standard, which by default separates CR "go back to left
> margin" from LF "go down to next line".  However, with two characters, you
> have all sorts of fun questions about what happens if they show up in the
> wrong order, or if only one of them shows up.  Unix made the right choice
> by using a single character.)

Back in the days of mechanical printers, you were supposed to send the CR 
first, to allow time for the carriage to move back to the left margin. 
If you sent the LF first, you'd often get the first character of the next 
line printed somewhere in the middle of the line.

> 
> ^L is formfeed, and ^R is an obscure device-control character, and neither
> of those is involved at all. 

Again with the old mechanical devices, ^R was often used to control the 
tape punch, though that was not its only use.  According to my trusty 
ASCII and Baudot card, from back in my tech days, it is called Device 
Control 2 (DC2).
> 
> Converting Unix to Mac or vice-versa is easy on most modern systems:
> 
> 	tr '\n' '\r' <input >output	# Unix to Mac
> 	tr '\r' '\n' <input >output	# Mac to Unix
> 
> (On old systems, it may be necessary to spell \n as \012 and \r as \015.)
> This just transforms one character into the other.
> 
> Converting DOS to Unix is likewise easy:
> 
> 	tr -d '\r' <input >output	# DOS to Unix
> 
> This deletes the silly CRs, leaving the LFs which are already in the 
> right places.
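
One thing to keep in mind is that tr -d '\r' deletes every CR, not just
the ones in front of an LF.  If a file might contain stray CRs that should
be kept, a possible variant is to strip only a CR at the end of each line.
This is a sketch, not part of the original advice: the function name
strip_trailing_cr is made up, and the CR is put in a shell variable so
that no sed escape convention for unprintable characters is needed.

```shell
# Hypothetical helper: remove a CR only where it ends a line.
strip_trailing_cr() {
    cr=$(printf '\r')        # a literal carriage return
    sed "s/${cr}\$//"        # delete it only when it is the last character
}

# Usage: strip_trailing_cr <input >output
```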
> 
> Going *to* DOS is a little harder, because it requires inserting the
> missing characters, which "tr" can't do.  You could do it with "sed", but
> that's a little awkward because of the lack of good escape conventions for
> the unprintable characters.  So let's use awk instead: 
> 
> 	awk '{ print $0 "\r" }' <input >output		# Unix to DOS
> 
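
To check a conversion, dumping the bytes with od is usually the least
confusing way.  The sketch below wraps the tr and awk commands above in
two filters; the names to_unix and to_dos are made up for this example.
The sub() call in to_dos is an added assumption, not in the command above:
it drops a CR that is already there, so converting a file twice should not
leave CR CR LF at the end of each line.

```shell
# Hypothetical wrappers; each reads stdin and writes stdout.
to_unix() {
    tr -d '\r'                                   # DOS to Unix: drop the CRs
}

to_dos() {
    # Unix to DOS: remove any existing trailing CR, then append exactly one
    awk '{ sub(/\r$/, ""); print $0 "\r" }'
}

# Dump the result as hex bytes: 0d is CR, 0a is LF.
printf 'a\nb\n' | to_dos | od -An -tx1
```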
>                                                           Henry Spencer
>                                                        henry-lqW1N6Cllo0sV2N9l4h3zg at public.gmane.org
> 
> --
> The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
> TLUG requests: Linux topics, No HTML, wrap text below 80 columns
> How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml






