Test for invalid unicode in file name
Lennart Sorensen
lsorense-1wCw9BSqJbv44Nm34jS7GywD8/FfD2ys at public.gmane.org
Fri May 6 19:23:25 UTC 2005
On Fri, May 06, 2005 at 03:17:30PM -0400, William O'Higgins wrote:
> I'm not sure if this will help, but I found this one-liner (reconstruct
> it using " \\" as the separator):
>
> perl -ne 'use bytes;/^(([\x00-\x7f]|[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef] \\
> [\x80-\xbf]{2}|[\xf0-\xf7][\x80-\xbf]{3})*)(.*)$/;print "$ARGV:$.:".($ \\
> -[3]+1).":$_" if length($3)'
>
> I found the above here: http://www.cl.cam.ac.uk/~mgk25/unicode.html#perl
Neat, although to be complete the pattern should also allow {4} and {5}
in the matching, since UTF-8 does permit that, although I don't think
there are any defined characters in that range yet.
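
For what it's worth, here is a rough sketch of the same check with the two
extra alternatives added, written in Python rather than Perl. The lead byte
ranges for the longer forms (0xF8-0xFB and 0xFC-0xFD) come from the original
UTF-8 definition and are my assumption about what {4} and {5} should pair
with; RFC 3629 has since limited UTF-8 to four bytes, so a strict checker
would leave them out. The names UTF8_PREFIX and first_bad_offset are made up
for the example. Like the one-liner, it only checks the byte structure and
does not reject overlong encodings or surrogates.

```python
import re

# Structural UTF-8 check in the spirit of the quoted Perl one-liner.
# The last two alternatives are the 5- and 6-byte forms; they are an
# assumption of this sketch and are not allowed by RFC 3629.
UTF8_PREFIX = re.compile(
    rb"(?:[\x00-\x7f]"
    rb"|[\xc0-\xdf][\x80-\xbf]"
    rb"|[\xe0-\xef][\x80-\xbf]{2}"
    rb"|[\xf0-\xf7][\x80-\xbf]{3}"
    rb"|[\xf8-\xfb][\x80-\xbf]{4}"
    rb"|[\xfc-\xfd][\x80-\xbf]{5})*"
)


def first_bad_offset(name: bytes):
    """Return the 0-based offset of the first malformed byte, or None."""
    # The pattern always matches (possibly the empty prefix), so the end
    # of the match is the length of the longest well-formed prefix.
    end = UTF8_PREFIX.match(name).end()
    return None if end == len(name) else end
```

To use it on file names, pass the raw bytes, for example the entries
returned by os.listdir(b"."), and report any name for which the result is
not None.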
Lennart Sorensen
--
The Toronto Linux Users Group. Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml