Test for invalid unicode in file name

Lennart Sorensen lsorense-1wCw9BSqJbv44Nm34jS7GywD8/FfD2ys at public.gmane.org
Fri May 6 16:33:14 UTC 2005


On Thu, May 05, 2005 at 10:20:48PM -0400, Madison Kelly wrote:
>   I've run into a problem where a bulk postgres "COPY..." statement is 
> dying because one of the lines contains a file name with an invalid 
> unicode character. In nautilus this file has '(invalid encoding)' and 
> the postgres error is 'CONTEXT:  COPY file_info_3, line 228287, column 
> file_name: "Femme Fatal\uffff.url"'.
> 
>   Is there a way in perl (something like 'stat') where I can check 
> whether a file name has valid encoding? If there is, then I can catch 
> this problem before adding it to, and corrupting, my COPY statement. I 
> already 'quote' the file names first, but that didn't catch it.
> 
>   Thanks!
> 
> Madison
> 
> PS - I posted this on TPM, for anyone subscribed there, but I didn't 
> get any replies, so I am hoping for better luck here. :p

I ran into this problem when postgres started enforcing encodings some
years ago.  I had LATIN1 data in a database configured for unicode,
which used to work fine because postgresql didn't care.  Once it
started caring, upgrades failed because the dump from the old database
wouldn't load into the new one.  Switching the database encoding to
LATIN1 to match the data solved the problem.  You can also set the
client encoding for a single session with a postgresql command and then
send it LATIN1 data; it will be converted to unicode for storage
automatically.
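
As a rough sketch (assuming the database itself was created as UNICODE
and the incoming file names really are LATIN1), the per-session version
looks something like this in psql:

    -- Tell the server the client is sending LATIN1; the server
    -- converts it to the database encoding (unicode) as it stores it.
    SET client_encoding TO 'LATIN1';

    -- Run the bulk load as usual, e.g. COPY file_info_3 FROM stdin ...

    -- Restore the default if the rest of the session is not LATIN1.
    RESET client_encoding;

The same SET can be issued from perl DBI right after connecting, before
the COPY statement is built.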

Lennart Sorensen
--
The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml




