Test for invalid unicode in file name
Henry Spencer
henry-lqW1N6Cllo0sV2N9l4h3zg at public.gmane.org
Sat May 7 04:44:13 UTC 2005
On Fri, 6 May 2005, Lennart Sorensen wrote:
> Well yes, you MUST use the shortest form possible although why anyone
> would write a parser to NOT accept all possible forms I don't know...
There are people who want to be able to do security checks by examining
the UTF-8 encoding directly, and hence are deeply averse to multiple
encodings of the same character. This is a large part of why the
non-shortest ("overlong") encodings are now officially invalid.
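A minimal Python sketch of that security concern. It assumes Python 3's built-in UTF-8 codec, which in its default strict mode rejects overlong forms. The helper name is_valid_utf8 is hypothetical. A naive check that scans the raw bytes for "/" does not see the two-byte overlong form 0xC0 0xAF of the same character, but a strict decoder refuses that form outright:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")  # errors="strict" is the default; overlong forms raise
        return True
    except UnicodeDecodeError:
        return False


shortest = b"/"          # U+002F SOLIDUS in its shortest (only legal) form
overlong = b"\xc0\xaf"   # two-byte overlong encoding of U+002F

# A byte-level filter looking for "/" misses the overlong spelling...
print(b"/" in overlong)          # False
# ...but a strict decoder accepts only the shortest form.
print(is_valid_utf8(shortest))   # True
print(is_valid_utf8(overlong))   # False
```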
I'm not saying I think this was a good idea, mind you, but there is a
significant user community that does want at least the option of
rejecting such forms.
The "surrogates" area of Unicode (U+D800 to U+DFFF) is also not allowed to
appear in UTF-8, since it is used only by the UTF-16 encoding, and you are
not supposed to nest encodings.
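A small Python sketch of the "no nesting" rule, again assuming Python 3's strict UTF-8 codec (the helper is_valid_utf8 is hypothetical). U+1F600 is shown once as its proper 4-byte UTF-8 sequence and once "nested": first split into the UTF-16 surrogate pair D83D DE00, then each half written as a 3-byte UTF-8-style sequence. Only the proper form is accepted, and encoding a lone surrogate is refused as well:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")  # strict mode rejects encoded surrogates
        return True
    except UnicodeDecodeError:
        return False


# U+1F600 written correctly as a single 4-byte UTF-8 sequence.
proper = "\U0001F600".encode("utf-8")   # b"\xf0\x9f\x98\x80"
# Same character, nested: surrogates D83D and DE00, each as 3 bytes.
nested = b"\xed\xa0\xbd\xed\xb8\x80"

print(is_valid_utf8(proper))   # True
print(is_valid_utf8(nested))   # False

# The encoder also refuses to emit a lone surrogate code point.
try:
    "\ud800".encode("utf-8")
except UnicodeEncodeError:
    print("surrogate rejected")
```

To test file names this way, one option is to pass a bytes path to os.listdir (for example os.listdir(b".")), which returns the names as raw bytes, and run each one through the check.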
Henry Spencer
henry-lqW1N6Cllo0sV2N9l4h3zg at public.gmane.org
--
The Toronto Linux Users Group. Meetings: http://tlug.ss.org