[GTALUG] interesting article and comments about UCS-16, UTF-16, UTF-8
Lennart Sorensen
lsorense at csclub.uwaterloo.ca
Tue Aug 6 12:20:59 EDT 2019
On Sat, Aug 03, 2019 at 06:18:51PM -0400, D. Hugh Redelmeier via talk wrote:
> https://news.ycombinator.com/item?id=20600195
>
> There are so many hairy details!
>
> UTF-8 gets a bit less coverage since it has fewer hairy details.
>
> From this I learned that Java and JavaScript now have optimizations to
> use Latin-1 when they can. Normally they use UTF-16 (originally
> UCS-2). I take it that using Latin-1 is an opportunistic
> optimization hidden from the program. I don't think Python 3 uses
> this.
>
> I think that Linux does this right and needs no such hack: just use
> UTF-8. Of course Java, JavaScript, Python 2, and Python 3 on Linux
> don't get it right.
UTF-8 just makes much more sense. It is backwards compatible with ASCII,
has no endianness issues, and is stupidly simple.
16-bit characters are just all sorts of pain. :)
Of course, given who invented UTF-8 (Ken Thompson and Rob Pike), it is
no wonder it is brilliant and simple.
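
The ASCII-compatibility and byte-order points above are easy to check.
Here is a rough Python 3 sketch, using only the built-in str.encode();
the variable names are made up for illustration and nothing here comes
from the linked article:

```python
# Illustrative only: compare how UTF-8 and UTF-16 encode the same text.
text = "hello"

utf8 = text.encode("utf-8")
ascii_bytes = text.encode("ascii")

# Pure ASCII text is byte-for-byte identical when encoded as UTF-8.
same_as_ascii = utf8 == ascii_bytes

# UTF-16 comes in two byte orders, so the same text yields different bytes.
utf16_le = text.encode("utf-16-le")
utf16_be = text.encode("utf-16-be")
byte_order_matters = utf16_le != utf16_be

# A character outside the Basic Multilingual Plane needs a surrogate pair
# (two 16-bit units) in UTF-16, so 16-bit units are not fixed-width
# characters either.
emoji = "\U0001F600"
utf16_units = len(emoji.encode("utf-16-le")) // 2
utf8_len = len(emoji.encode("utf-8"))

print(same_as_ascii, byte_order_matters, utf16_units, utf8_len)
```

UTF-8 is defined as a byte sequence, so there is no byte order to get
wrong. UTF-16 needs either a BOM or an out-of-band agreement on LE vs BE.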
--
Len Sorensen