printf bad ?

Sat Mar 19 11:11:38 UTC 2005

On Fri, 18 Mar 2005, John Macdonald wrote:

> As far as bash is concerned, the argument to printf is just
> a bunch of characters, not a string.  Imagine a program
> addinventory that takes an incoming shipment and adds it to
> the amount of inventory on hand - with pairs of arguments that
> are part-key and quantity-received.
>
> addinventory x3y2 24
> addinventory 0z99 144
> addinventory 0x33 75
>
> You wouldn't want bash to change the part key on the last
> command to 51 because bash decided to treat it as a number?

No, but I'd expect the command line to be sane decimal number-wise (2005 
style, not 1970s style). So I'd expect 0012 to be evaluated as 12 
decimal not 10 decimal in *any* number context and that behavior to be 
consistent, or at least 10#0012 to be decimal in any numeric context 
(for example, when bash::printf is converting an argument for a %d 
option). Especially when handling directory structures as often used 
with databases, mail systems etc. (directories 00, 01, 02 ...). All is well
until one writes a script to iterate over them numerically by
incrementing the already formatted 'name' as a number, and the script
then reaches 00008, and oops, we have a problem. Or execve a program
written in C with a filename of the form 00012 as argument, convert it 
using strtoul in the hope of being able to iterate over more files like 
it (like 00013, 00014 ...), and end up with unexpected results. Lesson: 
*always* step past leading zeros in a number to be converted with 
strtoul(_,_,0) unless expecting octal input base.
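
To make the lesson concrete, here is a minimal sketch of stepping past
leading zeros before a base-0 conversion. The names skip_leading_zeros
and parse_no_octal are made up for illustration; only strtoul itself is
standard. The last digit is kept so that "000" still converts to 0, and
a 0x prefix is left alone so that hex detection keeps working.

```c
#include <stdlib.h>

/* Hypothetical helper: step past leading zeros but keep the final digit,
   so "000" still parses as 0 and a "0x" prefix is left untouched. */
static const char *skip_leading_zeros(const char *s)
{
    while (s[0] == '0' && s[1] >= '0' && s[1] <= '9')
        ++s;
    return s;
}

/* Hypothetical wrapper: base-0 conversion that can no longer select octal,
   because no leading zero is followed by another digit. */
static unsigned long parse_no_octal(const char *s, char **end)
{
    return strtoul(skip_leading_zeros(s), end, 0);
}
```

With plain strtoul(s, NULL, 0), "00012" converts to 10 and "00008" to 0
(the conversion stops at the 8). Through the wrapper they convert to 12
and 8, and "0x1f" still converts to 31.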

The previous time this 'bug' happened (in September - see thread on this 
list) I did not even realise what was causing the error. Now I am wiser 
thanks to the list. Of course it is my mistake that I did not think 
about this myself.

For reference, the thread was 'Re: [TLUG]: re: bash limits' and there I 
also tried to iterate over formatted numerical directory names in a 
loop, using something like N=$(( $N + 1 )) to increment the directory 
name where $N was of the form 0000xxx with xxx decimal digits. Some 
files and directories 'disappeared' for no apparent reason from the 
test-set while I was running that script. They were probably teleported 
to octal-land (in the sense that the filenames changed in unexpected 
ways) and subsequently inaccessible using the script's addressing logic. 
I guess it would have worked if I had used octal throughout.

I understand what happened then only now: The first few runs always 
worked, and this was because the script looked at the directory to pick 
up the last file used (as a number). If none was found, it started with 
0 and all was well. Or it would find file 1234 and continue with 1235. 
But sometimes, it would find something like 0013 as last filename, and
use it in a numerical context to find the next free filename. Since 0013
is read as octal (11 decimal), that would give $((0013+1)) = 12 decimal.
The next formatted output would use 0012 as filename and overwrite
files 0012 and 0013. Voila, disappeared files ...
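
The same arithmetic can be modelled in C. This is only a sketch of the
increment step, not the original bash script: the function names and the
4-digit "%04lu" format are assumptions. strtoul with base 0 stands in
for shell arithmetic, since both let a leading zero select octal.

```c
#include <stdio.h>
#include <stdlib.h>

/* Mimics N=$(( $N + 1 )) followed by zero-padded formatting: base 0 lets
   a leading zero select octal, so "0013" is read as 11. */
static void next_name_octal_trap(const char *last, char *out, size_t outsz)
{
    unsigned long n = strtoul(last, NULL, 0);
    snprintf(out, outsz, "%04lu", n + 1);
}

/* Same step with the base forced to 10; leading zeros are then harmless. */
static void next_name_decimal(const char *last, char *out, size_t outsz)
{
    unsigned long n = strtoul(last, NULL, 10);
    snprintf(out, outsz, "%04lu", n + 1);
}
```

Given "0013", the first function produces "0012" (an existing file) and
the second produces "0014". Given "1234", both produce "1235", which is
why the first few runs looked fine.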

> Why would 008 be decimal and not base 9?  Is 0101 binary
> or octal?

Normal people assume that the default base for numbers on the command 
line is the one they use everyday. With or without leading zeros. By 
this I do not mean sysadmins and programmers, but average users who know 
10 commands and don't want to learn more (not my case but this is the 
idea).

> If you guess the base from which digits are used, there are
> lots of numbers in that base that can't be written because
> they happen to only use small digits.

I agree. But when set to automatically detect the base then the program 
*should* try a number in the default *user* number base first (as 
opposed to the system default, octal), then try the rest. At least 
that's my opinion. By program I mean most shell commands meant to be 
used by users. I realise that this would break the usual use of chmod 
but that could be made an exception. It's just an opinion.

The 'new' way to interpret numbers could be merged into the existing
strtoul using base=1 as flag. It would convert as decimal by default,
despite leading zeros, and also allow binary input using 0b as
prefix.

I realise that 20 years of doing things in a certain way cannot be 
overturned (and should not be), but I still want my convenience.

Can you quote one application, excepting chmod, where octal use on the 
command line is common ?

For example, scanf %i also has the 'octal bug', but %i is relatively
seldom used. So is octal. Using 0 (a valid digit in any base) to flag
an octal number is a really bad idea imho, especially since there is no 
way to escape it in a consistent way ( $((10#0012)) won't work if the 
program is execv'd directly and uses strtoul to convert some arguments 
passed on the command line).
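
A small sketch of the %i behaviour next to %d, using sscanf so it can be
tried on fixed strings. The two function names are made up, and the -1
return for "nothing converted" is just a convention for this example (it
is ambiguous if the input itself is -1).

```c
#include <stdio.h>

/* Reads s with %i, which guesses the base from the prefix
   (0 means octal, 0x means hex). Returns -1 if nothing converts. */
static int scan_auto_base(const char *s)
{
    int n;
    return sscanf(s, "%i", &n) == 1 ? n : -1;
}

/* Reads s with %d, which is always decimal. Returns -1 if nothing converts. */
static int scan_decimal(const char *s)
{
    int n;
    return sscanf(s, "%d", &n) == 1 ? n : -1;
}
```

For "0012", %i yields 10 and %d yields 12. For "0x33", %i yields 51
while %d reads only the leading 0 and stops at the x.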

thanks,
Peter

PS: Here is a revision of strtoul(), tested. A test program and makefile 
were mailed separately to the list on this thread.

#include <ctype.h>
#include <stdlib.h>

// use base=1 to convert numbers using decimal default base,
// even if leading 0s are present. Also accept binary using
// prefix 0b. - plp
unsigned long int my_strtoul(const char *a, char **z, int b)
{
         if(b==1) {
                 // step past leading zeros, keeping the last digit
                 while( (*a=='0') && isdigit((unsigned char)*(a+1)) )
                         ++a;
                 // 0b or 0B followed by a binary digit: convert as base 2
                 if( (*a=='0') && ((*(a+1)=='b') || (*(a+1)=='B')) &&
                         ((*(a+2)=='0') || (*(a+2)=='1')) )
                                 return(strtoul(a+2,z,2));
                 // hex (0x) is still detected; octal no longer can be
                 b=0;
         }
         // strtoul() sets *z itself, on success and on failure
         return(strtoul(a,z,b));
}
