Perl "Wide character in print" error...

Tue Sep 11 18:08:03 UTC 2007

Hi all,

   This is a follow-up to my earlier "Locale/UTF8 problem in firefox"
thread. With your help I've since ruled out Firefox as the culprit and
written a bare-bones script that reproduces the exact error from the
command line alone. Hopefully I can ask for more of your help... This
one is really stumping me!

   So;

   I've got a program that prints to a browser. Simple enough so far.
Some text comes from an XML file (via XML::Simple), and other text comes
from a postgres database (set to UTF-8).

   If I print to the browser normally, the Unicode text coming from the
pgsql DB prints fine, but when Unicode text from the XML file is printed
I get the "Wide character in print" error, although it still prints
okay. I don't want to "just live with it", though, because this floods
the logs...

   If I switch binmode on STDOUT to ':utf8' (binmode STDOUT, ":utf8";),
then the text coming from the DB is double-encoded and looks garbled,
but the data from the XML file looks fine and *doesn't* generate the
"Wide character..." error.

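   The two symptoms above can be reproduced without a browser, database
or XML file. The sketch below is only an illustration and rests on an
assumption that nothing here confirms: that the DB driver hands back raw
UTF-8 bytes while XML::Simple returns decoded character strings. The
helper names through_utf8_layer and through_plain_handle are made up for
this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# ASSUMPTION: this is what the DB driver returns, i.e. the raw UTF-8 bytes
# of the text (here the bytes of U+65E5 U+672C U+8A9E), not characters.
my $bytes = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";

# ASSUMPTION: this is what the XML parser returns, i.e. the same text as a
# decoded character string (3 characters).
my $chars = decode('UTF-8', $bytes);

# Print a string to an in-memory handle that has the ":utf8" layer and
# return the bytes that were actually written.
sub through_utf8_layer {
    my ($string) = @_;
    my $out = '';
    open(my $fh, '>:utf8', \$out) or die "Cannot open in-memory handle: $!";
    print {$fh} $string;
    close($fh);
    return $out;
}

# Print a string to an in-memory handle without any layer and return the
# bytes written plus any warning Perl raised while printing.
sub through_plain_handle {
    my ($string) = @_;
    my ($out, $warning) = ('', '');
    local $SIG{__WARN__} = sub { $warning .= $_[0] };
    open(my $fh, '>', \$out) or die "Cannot open in-memory handle: $!";
    print {$fh} $string;
    close($fh);
    return ($out, $warning);
}

my ($plain_out, $plain_warning) = through_plain_handle($chars);

# Characters on a handle with no layer: Perl warns, then writes UTF-8 anyway.
printf "characters, no layer: warning raised? %s\n",
    ($plain_warning ? 'yes' : 'no');

# Characters on a :utf8 handle: encoded once, no warning.
printf "characters, :utf8   : same bytes as the original? %s\n",
    (through_utf8_layer($chars) eq $bytes ? 'yes' : 'no');

# Bytes on a :utf8 handle: every byte is treated as a Latin-1 character
# and encoded again, which is the double encoding.
printf "bytes, :utf8        : %d bytes in, %d bytes out\n",
    length($bytes), length(through_utf8_layer($bytes));
```

If the assumption holds, the first case corresponds to the XML text with
binmode off, and the last case to the DB text with binmode on.
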
   I've written a very stripped down script to test this:

-=] test.pl [=-
#!/usr/bin/perl

use strict;
use warnings;
use DBI qw(:sql_types);
use XML::Simple;

# Tell the program where to find the 'words' file.
my $file="./test.xml";

# Read in the words file.
my $word=XMLin($file);

my $dbh=DBI->connect("DBI:Pg:dbname=dbname", "user", "secret",
	{
		RaiseError => 1,
		AutoCommit => 1
	}
) || die "DBI connect error: $DBI::errstr\n";

binmode STDOUT, ":utf8";

# Print unicode data from the DB
my $query="SELECT usr_note FROM users WHERE usr_id=1";
my $DBreq=$dbh->prepare($query) || die "Error with query: [$query;], error: $DBI::errstr\n";
$DBreq->execute() || die "Error with query: [$query;], error: $DBI::errstr\n";
my ($note)=$DBreq->fetchrow_array();
print "Note: [$note]\n";

# Print unicode data from the XML file
print "English : [$$word{lang}{en_CA}{key}{long_name}{content}]\n";
print "Japanese: [$$word{lang}{jp}{key}{long_name}{content}]\n";

exit(0);
-=] test.pl [=-

   Here is the XML file:

-=] test.xml [=-
<?xml version="1.0" encoding="UTF-8" ?>

<words>
	<lang name="en_CA">
		<key name="long_name">Canadian English</key>
		<key name="bar">FOO.</key>
	</lang>
	<lang name="jp">
		<key name="long_name">日本語</key>
		<key name="bar">FOO.</key>
	</lang>
</words>
-=] test.xml [=-

   And this is a copy of the DB encoding string and stripped down schema
and data:

-=] PgSQL stuff [=-
SET client_encoding = 'UTF8';
SET check_function_bodies = false;
SET client_min_messages = warning;

CREATE SEQUENCE usr_seq
     INCREMENT BY 1
     NO MAXVALUE
     MINVALUE 0
     CACHE 1;

CREATE TABLE users (
     usr_id integer DEFAULT nextval('usr_seq'::regclass) NOT NULL,
     usr_note text
);

COPY users (usr_id, usr_note) FROM stdin;
1	Just some text but now with kanji; 私は名前ケッリです。 How is this handled?
\.

-=] PgSQL stuff [=-

   When I run it with line 21 commented out (# binmode STDOUT, ":utf8";),
I get:

-=] With 'binmode' commented out [=-
Note: [Just some text but now with kanji; 私は名前ケッリです。 How is
this handled? And still again...]
English : [Canadian English]
Wide character in print at ./test.pl line 32.
Japanese: [日本語]
-=] With 'binmode' commented out [=-

   And when I run it with line 21 enabled (binmode STDOUT, ":utf8";),
I get:

-=] With 'binmode' enabled [=-
Note: [Just some text but now with kanji; ç§ã¯åå‰ã‚±ãƒƒãƒªã§ã™ã€‚
How is this handled? And still again...]
English : [Canadian English]
Japanese: [日本語]
-=] With 'binmode' enabled [=-
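
   One possible workaround, sketched under the same unconfirmed
assumption that the DB value arrives as raw UTF-8 bytes and the XML
value as characters: keep the ":utf8" layer on STDOUT and turn everything
into character strings before printing. The helper as_characters is
hypothetical, and the two variables are hard-coded stand-ins for the DB
and XML values rather than real query results.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return $value as a character string.
# Strings Perl already flags as characters pass through untouched;
# anything else is ASSUMED to be UTF-8 bytes and is decoded.
sub as_characters {
    my ($value) = @_;
    return $value if !defined($value) || utf8::is_utf8($value);
    return decode('UTF-8', $value);
}

# Encode characters exactly once, on the way out.
binmode STDOUT, ':utf8';

# Stand-in for the DB value: the raw UTF-8 bytes of U+79C1.
my $note = "\xE7\xA7\x81";

# Stand-in for the XML value: already a character string.
my $long_name = "\x{65E5}\x{672C}\x{8A9E}";

print "Note: [", as_characters($note), "]\n";
print "Japanese: [", as_characters($long_name), "]\n";
```

The utf8::is_utf8 check looks at an internal flag, so it is a heuristic:
a Latin-1 character string that Perl has not flagged would be decoded by
mistake. DBD::Pg also documents a pg_enable_utf8 connection attribute
that is meant to do this decoding inside the driver; whether it applies
to this setup is untested here.
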

   Any tips/help?

   Thanks!

A desperate Madi :)

--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/