filename completion, UTF-16 [was Re: rsync backup]
Lennart Sorensen
lsorense-1wCw9BSqJbv44Nm34jS7GywD8/FfD2ys at public.gmane.org
Thu Dec 9 19:06:35 UTC 2010
On Thu, Dec 09, 2010 at 12:16:32PM -0500, D. Hugh Redelmeier wrote:
> Spoken like a touch-typist. I agree.
>
> This is somewhat mitigated by the fact that the odd behaviour on Space is
> triggered by having previously used tab. But not completely: it means
> that you need to remember that the shell is in
> incomplete-tab-completion mode. And modes add a cognitive burden.
> "Don't mode me in" is an old EMACS-culture refrain that I subscribe
> to.
>
> My guess is that touch-typing habits are swamped by the opposite: the
> mass of thumb-typing folks that want anything that can save
> thumb-strokes, including a lot of very modish things. Kids these days!
>
> I'm old school. I like EMACS keystrokes but not its complexity
> ("there's an app for that"). So I use a small subset implementation
> called JOVE. Richer than nano/pico but small enough that it fit on a
> 64K machine (PDP-11).
>
> Some day I'll switch because I'm too lazy to make JOVE support UTF-8.
> It survives UTF-8 but doesn't support it. For example, the following
> line:
> | > - If you type a space, that space replaces the trailing slash. (Or the
>
> looks like this to me as I edit it in JOVE:
> | > \302\240- If you type a space, that space replaces the trailing slash. (Or the
>
>
> My opinion is UTF-16 is a mistake.
>
> - a single UTF-16 code unit cannot represent all of Unicode. So each
> character takes one or two 16-bit code units. So, like UTF-8 (and
> unlike ASCII, ISO-8859-1, UCS-2, UTF-32) a character is variable
> length, making processing a little more awkward.
>
> - it doubles the size of ordinary text (unlike UTF-8)
>
> - I think that it is harder to convert old C code to support UTF-16
> than UTF-8.
>
> - a chunk of plain ASCII in memory does not look like UTF-16 and vice
> versa. But it does look like UTF-8 and vice versa.
>
> - UTF-16 raises big-endian vs little-endian issues (in which order do
>   the two bytes go?). Unfortunately, both orders are legitimate, leading to
>   having to support both. I seem to recollect that the HTML standards had
>   to make this survivable. Sheesh!
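An aside from me: three of these points are easy to demonstrate with Python's built-in codecs (my examples, not from the original message):

```python
# Surrogate pairs: a character outside the BMP takes two 16-bit code units.
gclef = "\U0001D11E"                        # MUSICAL SYMBOL G CLEF
assert len(gclef.encode("utf-16-be")) == 4  # two code units = 4 bytes
assert len(gclef.encode("utf-32-be")) == 4  # one fixed-width UTF-32 unit

# ASCII transparency: ASCII bytes are already valid UTF-8, but not UTF-16.
assert "hello".encode("utf-8") == b"hello"
assert "hello".encode("utf-16-be") == b"\x00h\x00e\x00l\x00l\x00o"

# Endianness: the same string has two legitimate UTF-16 byte orders, so a
# BOM (U+FEFF) is commonly prepended to say which order is in use.
assert "A".encode("utf-16-be") == b"\x00A"
assert "A".encode("utf-16-le") == b"A\x00"
assert "A".encode("utf-16")[:2] in (b"\xfe\xff", b"\xff\xfe")  # leading BOM
```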
>
> Yet UTF-16 is what MS Windows, Java, Python, and who knows what else have
> adopted. I imagine the reason was that they thought UCS-2 was good enough
> but had to back down to UTF-16 when Han Unification was rejected.
> UTF-16 replaced UCS-2 in 1996 (according to my reading of Wikipedia).
> Surely that was early enough to prevent some of those adoptions.
>
> Am I wrong?
UTF-16 is useless. It makes ASCII take twice the space, and still doesn't
handle all of Unicode in a single code unit anyhow. UTF-8 leaves ASCII
alone, handles everything, and is usually more space efficient than UTF-16.
UTF-16 is simply a bad mistake.
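The size claim is easy to check (my sample strings, not from the post):

```python
# Plain ASCII: UTF-8 leaves it byte-for-byte identical, UTF-16 doubles it.
ascii_text = "The quick brown fox jumps over the lazy dog."
assert len(ascii_text.encode("utf-8")) == len(ascii_text)
assert len(ascii_text.encode("utf-16-le")) == 2 * len(ascii_text)

# Mostly-Latin text with a few accented characters: UTF-8 still smaller.
mixed = "naïve café"
assert len(mixed.encode("utf-8")) < len(mixed.encode("utf-16-le"))
```

(East Asian text is the usual counterexample — BMP CJK characters take three bytes in UTF-8 versus two in UTF-16 — hence "usually" rather than "always".)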
--
Len Sorensen
--
The Toronto Linux Users Group. Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists