filename completion, UTF-16 [was Re: rsync backup]

Lennart Sorensen lsorense-1wCw9BSqJbv44Nm34jS7GywD8/FfD2ys at public.gmane.org
Thu Dec 9 19:06:35 UTC 2010


On Thu, Dec 09, 2010 at 12:16:32PM -0500, D. Hugh Redelmeier wrote:
> Spoken like a touch-typist.  I agree.
> 
> This is somewhat mitigated by the fact that the odd behaviour on Space
> is triggered by having previously used tab.  But not completely: it
> means that you need to remember that the shell is in
> incomplete-tab-completion mode.  And modes add a cognitive burden.
> "Don't mode me in" is an old EMACS-culture refrain that I subscribe
> to.
> 
> My guess is that touch-typing habits are swamped by the opposite: the
> mass of thumb-typing folks that want anything that can save
> thumb-strokes, including a lot of very modish things.  Kids these days!
> 
> I'm old school.  I like EMACS keystrokes but not its complexity
> ("there's an app for that").  So I use a small subset implementation
> called JOVE.  Richer than nano/pico but small enough that it fit on a
> 64K machine (PDP-11).
> 
> Some day I'll switch because I'm too lazy to make JOVE support UTF-8.
> It survives UTF-8 but doesn't support it.  For example, the following
> line:
> | >  - If you type a space, that space replaces the trailing slash. (Or the
> 
> looks like this to me as I edit it in JOVE:
> | > \302\240- If you type a space, that space replaces the trailing slash. (Or the
> 
> 
> My opinion is that UTF-16 is a mistake.
> 
> - a single UTF-16 code unit cannot represent all of Unicode.  So each
>   character takes one or two 16-bit code units.  So, like UTF-8 (and
>   unlike ASCII, ISO-8859-1, UCS-2, UTF-32) a character is variable
>   length, making processing a little more awkward.
> 
> - it doubles the size of ordinary text (unlike UTF-8)
> 
> - I think that it is harder to convert old C code to support UTF-16
>   than UTF-8.
> 
> - a chunk of plain ASCII in memory does not look like UTF-16 and vice
>   versa.  But it does look like UTF-8 and vice versa.
> 
> - UTF-16 raises big-endian vs little-endian issues (in which order do 
>   the two bytes go?).  Unfortunately, both orders are legitimate, so 
>   everything has to support both.  I seem to recollect that HTML 
>   standards have to make this survivable.  Sheesh!
> 
> Yet UTF-16 is what MS Windows, Java, Python, and who knows what else have 
> adopted.  I imagine the reason was that they thought UCS-2 was good enough 
> but had to back down to UTF-16 when Unicode grew past 65,536 code points 
> (Han Unification kept the count down, but not enough).
> UTF-16 replaced UCS-2 in 1996 (according to my reading of Wikipedia).  
> Surely that was early enough to prevent some of those adoptions.
> 
> Am I wrong?
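Hugh's points are easy to demonstrate with Python's built-in codecs (a
quick sketch of my own, not from the original message; Python 3 assumed):

```python
# Variable length: characters outside the Basic Multilingual Plane
# need a surrogate pair, i.e. two 16-bit code units.
s = "A\U0001F600"                   # 'A' plus an emoji beyond U+FFFF
print(len(s.encode("utf-16-be")))   # 6 bytes: one unit + a surrogate pair

# ASCII in memory does not look like UTF-16 (every other byte is NUL),
# but it is byte-for-byte identical to its UTF-8 encoding.
print("ABC".encode("utf-16-be"))    # b'\x00A\x00B\x00C'
print("ABC".encode("utf-8"))        # b'ABC'

# Endianness: both byte orders are legitimate, so a BOM (U+FEFF) gets
# prepended when the order isn't specified out of band.
print("A".encode("utf-16-le"))      # b'A\x00'
print("A".encode("utf-16-be"))      # b'\x00A'
print("A".encode("utf-16"))         # BOM first, then native byte order
```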

UTF-16 is useless.  It makes ASCII take twice the space, and still needs
multiple code units to handle all of Unicode anyhow.  UTF-8 leaves ASCII
alone, handles everything, and is usually more space-efficient than
UTF-16.  UTF-16 is simply a bad mistake.
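The space argument is easy to check (a quick Python illustration; the
exact strings are made up, but the ratios hold for any mostly-ASCII text):

```python
# Pure ASCII: UTF-8 is byte-identical to ASCII; UTF-16 doubles it.
ascii_text = "hello world" * 100
print(len(ascii_text.encode("utf-8")))      # 1100 bytes
print(len(ascii_text.encode("utf-16-be")))  # 2200 bytes

# Even with some Latin-1 accents mixed in, UTF-8 usually still wins,
# because the ASCII majority stays single-byte.
mixed = "naïve café " * 100           # 2 of 11 chars are non-ASCII
print(len(mixed.encode("utf-8")))     # 1300 bytes (accents cost 2 each)
print(len(mixed.encode("utf-16-be"))) # 2200 bytes (everything costs 2)
```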

-- 
Len Sorensen
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists