Test for invalid unicode in file name
William O'Higgins
william.ohiggins-H217xnMUJC0sA/PxXw9srA at public.gmane.org
Fri May 6 19:17:30 UTC 2005
On Thu, May 05, 2005 at 10:20:48PM -0400, Madison Kelly wrote:
>Hi all,
>
> I've run into a problem where a bulk postgres "COPY..." statement is
>dying because one of the lines contains a file name with an invalid
>unicode character. In nautilus this file has '(invalid encoding)' and
>the postgres error is 'CONTEXT: COPY file_info_3, line 228287, column
>file_name: "Femme Fatal\uffff.url"'.
>
> Is there a way in perl (something like 'stat') where I can check to
>make sure a file name has valid encoding? If there is, then I can catch
>this problem before adding it to, and corrupting, my COPY statement. I
>already 'quote' the file names first, but that didn't catch it.
I'm not sure if this will help, but I found this one-liner. It prints every
input line that contains malformed UTF-8, prefixed with the file name, the
line number, and the 1-based byte offset of the first bad byte:
perl -ne 'use bytes;/^(([\x00-\x7f]|[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xf7][\x80-\xbf]{3})*)(.*)$/;print "$ARGV:$.:".($-[3]+1).":$_" if length($3)'
I found the above here: http://www.cl.cam.ac.uk/~mgk25/unicode.html#perl
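The one-liner checks the lines of a file, but you want to catch bad file
names before they go into the COPY data. Below is a minimal sketch, untested
against your setup, that reuses the same regex in a sub and runs it over the
entries of one directory. The sub name first_bad_byte and the directory loop
are my own illustration, not part of the FAQ, and it assumes the names arrive
as raw bytes, which is what readdir returns. Note that the pattern is
permissive: it accepts overlong forms such as C0 80 and lead bytes F5-F7, so a
strict UTF-8 decoder would reject a few names that this lets through.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from the one-liner: group 1 is the longest run of
# well-formed 1- to 4-byte UTF-8 sequences, group 3 is whatever is
# left over. /s and \z let a name containing a newline be checked whole.
my $utf8_prefix = qr/^(([\x00-\x7f]|[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xf7][\x80-\xbf]{3})*)(.*)\z/s;

# Hypothetical helper (not from the FAQ): returns undef if $name is
# well-formed UTF-8, otherwise the 1-based byte offset of the first bad
# byte. Assumes $name holds raw bytes, as readdir returns them.
sub first_bad_byte {
    my ($name) = @_;
    $name =~ $utf8_prefix;    # always matches; (.*) absorbs the rest
    return length($3) ? $-[3] + 1 : undef;
}

# Usage: report bad names in one directory (default: the current one),
# e.g. before they are written into the COPY data.
my $dir = shift // '.';
opendir(my $dh, $dir) or die "Cannot open $dir: $!";
while (defined(my $entry = readdir $dh)) {
    my $pos = first_bad_byte($entry);
    print "$dir/$entry: invalid UTF-8 at byte $pos\n" if defined $pos;
}
closedir $dh;
```

Names it flags could then be skipped, logged, or converted before the COPY
line is built.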
Good luck.
--
yours,
William