Test for invalid unicode in file name

Henry Spencer henry-lqW1N6Cllo0sV2N9l4h3zg at public.gmane.org
Sat May 7 04:44:13 UTC 2005


On Fri, 6 May 2005, Lennart Sorensen wrote:
> Well yes, you MUST use the shortest form possible although why anyone
> would write a parser to NOT accept all possible forms I don't know...

There are people who want to be able to do security checks by examining
the UTF-8 encoding, and hence are deeply averse to multiple encodings of
the same character.  This is a large part of why the non-shortest
encodings are now officially invalid.
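
To make the "multiple encodings" problem concrete, here is a minimal Python
sketch of a shortest-form check. The names MIN_CODE_POINT, raw_code_point
and is_overlong are made up for this illustration, and the helper assumes it
is handed exactly one sequence of 1 to 4 bytes whose lead and continuation
bytes are already structurally plausible.

```python
# Smallest code point that genuinely needs a sequence of each length.
MIN_CODE_POINT = {1: 0x00, 2: 0x80, 3: 0x800, 4: 0x10000}


def raw_code_point(seq: bytes) -> int:
    """Extract the payload bits of one 1-4 byte UTF-8-style sequence.

    Assumption: seq is a single sequence with a plausible lead byte and
    continuation bytes; no other validation is done here.
    """
    n = len(seq)
    if n == 1:
        return seq[0] & 0x7F
    # Lead byte payload mask: 2 bytes -> 0x1F, 3 bytes -> 0x0F, 4 bytes -> 0x07.
    value = seq[0] & (0xFF >> (n + 1))
    for b in seq[1:]:
        value = (value << 6) | (b & 0x3F)  # each continuation byte adds 6 bits
    return value


def is_overlong(seq: bytes) -> bool:
    """True if the sequence is longer than its code point requires."""
    return raw_code_point(seq) < MIN_CODE_POINT[len(seq)]


def strict_decode_ok(data: bytes) -> bool:
    """True if Python's built-in strict UTF-8 decoder accepts the data."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True
```

With this, b"/" and b"\xc0\xaf" both carry the payload 0x2F, but only the
first is the shortest form. A filter that scans the raw bytes for "/" would
miss the second one, which is the security concern in a nutshell. Python's
own strict decoder refuses the overlong form.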

I'm not saying I think this was a good idea, mind you, but there's a
significant user community that wants at least the option of rejecting
such forms.

The "surrogates" area of Unicode (U+D800 through U+DFFF) is also not allowed
to show up in UTF-8, since it's used only in the UTF-16 encoding, and you are
not supposed to nest encodings.
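
A surrogate check can be sketched in the same spirit. The function name
contains_encoded_surrogate is hypothetical. It relies on the fact that a
code point in U+D800..U+DFFF, if someone did push it through the UTF-8
bit layout, would come out as the bytes ED A0..BF 80..BF. The scan assumes
the input is otherwise UTF-8-shaped; on arbitrary binary data the same byte
pattern could turn up by accident.

```python
def contains_encoded_surrogate(data: bytes) -> bool:
    """Look for the 3-byte pattern ED A0..BF 80..BF.

    That pattern is what U+D800..U+DFFF would look like if a surrogate
    were (incorrectly) encoded with the UTF-8 bit layout.
    """
    for i in range(len(data) - 2):
        if (
            data[i] == 0xED
            and 0xA0 <= data[i + 1] <= 0xBF
            and 0x80 <= data[i + 2] <= 0xBF
        ):
            return True
    return False


def strict_utf8_accepts(data: bytes) -> bool:
    """True if Python's built-in strict UTF-8 decoder accepts the data."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

For example, b"\xed\xa0\x80" (the would-be encoding of U+D800) is flagged by
the scan and refused by Python's strict decoder, while b"\xed\x9f\xbf"
(U+D7FF, the last code point before the surrogate range) passes both.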

                                                          Henry Spencer
                                                       henry-lqW1N6Cllo0sV2N9l4h3zg at public.gmane.org

--
The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml




