I'm puzzled by this perl behaviour

D. Hugh Redelmeier hugh-pmF8o41NoarQT0dZR+AlfA at public.gmane.org
Sat Sep 20 20:05:37 UTC 2003


[Disclaimer: I don't actually know Perl.]

I don't understand why the regular expression match in the following
Perl script fails if and only if LANG selects a UTF-8 locale.

If the bracketed character class can match without a "+" suffix, surely
it can also match with one.  Matching exactly once should be a stronger
condition than matching at least once.
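A minimal sketch, not part of the original report, that checks this
reasoning directly: it applies both patterns from whacky.pl to the same
text, once as a plain byte string and once after utf8::upgrade, which
stores the identical characters in Perl's internal UTF-8 form.  The
assumption (not established by this post) is that the UTF-8 locale
matters because input then arrives as UTF-8 character data; the file
name sketch.pl and the helper captures() are made up for illustration.

================ sketch.pl ================
#!/usr/bin/perl
# Sketch (not from the original post): apply both patterns from whacky.pl
# to the same text, stored once as bytes and once as internal UTF-8.
# Assumption: the UTF-8 locale matters because input arrives as UTF-8
# character data; utf8::upgrade imitates that without touching LANG.
use strict;
use warnings;

# Hypothetical helper: returns the captures of the single-character
# pattern and the one-or-more pattern, or undef where a pattern fails.
sub captures {
	my ($s) = @_;
	my $one  = $s =~ /^\s*([^\s=+])/  ? $1 : undef;
	my $many = $s =~ /^\s*([^\s=+]+)/ ? $1 : undef;
	return ($one, $many);
}

my $bytes = "package";
my $upgraded = $bytes;
utf8::upgrade($upgraded);	# same characters, internal UTF-8 form

my %cases = (bytes => $bytes, upgraded => $upgraded);
for my $label (sort keys %cases) {
	my ($one, $many) = captures($cases{$label});
	printf "%-8s one=%s many=%s\n", $label,
		defined $one  ? $one  : '(no match)',
		defined $many ? $many : '(no match)';
}
================ end ================

On a Perl whose regex engine behaves as expected, both lines should
report one=p and many=package.  A "(no match)" only on the "upgraded"
line would point at the internal UTF-8 representation rather than at
the pattern itself.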

My guess is that there is some Perl feature that I don't know about
that explains this behaviour.

Help!

Hugh Redelmeier
hugh-pmF8o41NoarQT0dZR+AlfA at public.gmane.org  voice: +1 416 482-8253

PS: you can use en_CA in place of en_US.  I wrote en_US in the hope
that it is the better-debugged setting.

================ whacky.pl ================
#!/usr/bin/perl

# whacky.pl: demonstrate oddity in Red Hat Linux 9.0's Perl (5.8.0)

# works:	echo "package" | LANG=en_US.utf8 ./whacky.pl
# fails:	echo "package" | LANG=en_US ./whacky.pl


use strict;
use warnings;
use Data::Dumper;

print $ENV{"LANG"}, "\n";
while (<>) {
#	print Dumper($_);
	chomp;	# strip only the trailing newline (chop removes any last character)
#	$_ = "package=defaults";
	print Dumper($_);
# happy (no "+"):	if( /^\s*([^\s=+])/ ){
# sad (with "+"):
	if( /^\s*([^\s=+]+)/ ){
		die "$1 happy";
	}
	else {
		warn "unknown input in \"$ARGV\" line $. of: $_\n";
		die "sad";
	}
}
================ end ================

--
The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml