64 bit linux on Intel T9600

D. Hugh Redelmeier hugh-pmF8o41NoarQT0dZR+AlfA at public.gmane.org
Mon Jun 22 20:18:21 UTC 2009


| From: Lennart Sorensen <lsorense-1wCw9BSqJbv44Nm34jS7GywD8/FfD2ys at public.gmane.org>

| On Fri, Jun 19, 2009 at 05:45:01PM -0400, D. Hugh Redelmeier wrote:
| > | From: Lennart Sorensen <lsorense-1wCw9BSqJbv44Nm34jS7GywD8/FfD2ys at public.gmane.org>
| > 
| > | On Fri, Jun 19, 2009 at 02:39:18PM -0400, D. Hugh Redelmeier wrote:
| > 
| > | >  Unfortunately the x86-64 ABI panders to this 
| > | > disease (evidence: sizeof(int) == 4!).
| > |  
| > | How is that a problem?  As long as sizeof(long) == 8 on a 64bit machine,
| > | then you should be happy.  Windows unfortunately does NOT do that.
| > | They have a special long type for storing pointer size things on 64bit
| > | systems.
| > 
| > K&R, first edition, describes int as:
| > 	an integer, typically reflecting the natural size of integers
| > 	on the host machine
| 
| Well K&R did many stupid things that people have since realized and
| tried to fix.  That was probably one of them.

I consider it quite sensible.  Remember, C evolved from a "typeless" 
language, B (which evolved from BCPL, a typeless variant of CPL).  
Everything was a machine word.  "int" was how this was spelled in C.

"int" is the type to be used when you don't have any reason to demand
special properties.

| > Furthermore, in describing short and long, it says:
| > 
| > 	The intent is that short and long should provide different
| > 	lengths of integers where practical; int will normally reflect
| > 	the most ``natural'' size for a particular machine.
| > 
| > To me, on a machine that I think of as 64-bit, that would be 64 bits.
| 
| Well that's not how any modern system treats it.

It is true that LP64 rather than ILP64 was adopted by most
implementations.  See
  http://www.unix.org/version2/whatsnew/lp64_wp.html
and
  http://gcc.gnu.org/ml/gcc-help/2009-02/msg00030.html
(The first place I noticed this discussion and terminology was in the
C Standards working paper 92-038 from 1992 April 1 (oops).  It seemed
to recommend ILP64, but certainly not as a requirement.)

At the root, this was for the pragmatic reason that it made old C
programs work more often.  On the other hand, it was neither in the
Spirit of C, nor did it maximize the benefit of the transition to
64-bit.

This was particularly sad on the Alpha, the first version of which
didn't even have partial word instructions (if I remember correctly).

The trade-off was: less short-term pain for less long-term gain.

I will admit that the change of dynamic range from 16-bit to 32-bit
was way more important than the change from 32 to 64.

| > To be honest, I think that the structure of integral types in C are a
| > mess.  One fix-up adopted by the C Committee looks ugly too: adding
| > types with explicit widths in bits.
| 
| Sometimes that is what you require for the ability to control structs,
| packing of data in files, on network links, etc.

That is a sin.  C structs are not a correct tool for layout: the
standard leaves padding and alignment up to the implementation.

| > Pascal's subranges are more natural.  You get to specify the range in
| > terms that are relevant to your problem.  Intermediate expressions should
| > be as-if calculated in infinite width.  If a programmer knows that
| > an intermediate expression won't overflow a narrower type, and the
| > compiler cannot know that, the programmer can add a cast to help the
| > compiler.  The default is then correct, if possibly inefficient,
| > instead of efficient, if possibly incorrect.
| 
| Pascal is a higher level language than C.

Sure, but that's not the issue.  Subrange specification of integral
types is actually simpler than the weird integral-type mixed-up
partial order and easier to implement.
- simpler to specify sanely
- simpler for the programmer to understand
- simpler to handle syntactically
- fewer keywords
- fewer symbols to define in <inttypes.h>
- simpler to write portable code

Quiz: does char promote to int or unsigned int?

| > Examples of the as-if rule in practice:
| > 
| > 	int i, j, k;
| > 	long m, n;
| > 
| > 	m = i + j;
| > 
| > Now, if i + j does not fit in int, but does fit long, overflow occurs.
| > With my rule, the correct result will be calculated.
| > 
| > 	i = i + j;
| >
| > No difference: since the result is stored in int, the calculation can
| > be done in int without loss.
| 
| If i and j are both MAXINT, then it doesn't fit and loss will happen.

Sorry that I wasn't clearer: "no difference" meant "just as wrong,
just as right, just as efficient, the same compiled code".

| > 	i = i + j + 1;
| > This case is interesting because, for certain values of i and j, i + j
| > could overflow int and yet have i + j + 1 still representable as int.
| 
| Well at least the bits left over after throwing away the overflow.

I meant what I said: the result could be representable.  For example,
the case
	MININT + (-1) + 1

Under the current rules, integer addition is not associative:
different associations can cause different overflows.  Signed integer
overflow makes the behaviour undefined (the C standard's term for
it).  Under my rules, integer addition would be associative, just as
naive folks assumed.

One reason that it is easy to assume that addition is associative is
that wrapping arithmetic really is associative, and silently wrapping
on overflow is what most hardware does these days.

BTW, I have written IBM/360 assembly programs where I could decide
what to do with overflow: ignore or trap.  I found that trapping was
the best choice because most overflows detected actual errors.

Just as silent buffer overflow has led to bugs in C code, some with
security implications, silent integer overflow has led to bugs and
vulnerabilities.

My first public exploit (1976, I'd guess) was based on undetected
integer overflow.
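A classic instance of the vulnerability class: multiplying a count by
an element size can wrap, so malloc succeeds with a too-small buffer.
A sketch of the usual guard (checked_alloc is an illustrative name,
not something from this thread):

```c
/* Refuse an allocation whose size computation would wrap around
 * SIZE_MAX, instead of silently allocating a too-small buffer. */
#include <stdint.h>
#include <stdlib.h>

void *checked_alloc(size_t count, size_t size)
{
    /* count * size wraps exactly when count > SIZE_MAX / size. */
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;            /* overflow detected, refuse */
    return malloc(count * size);
}
```

calloc(count, size) performs the equivalent check internally, which is
one reason it is preferred over malloc(count * size).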

| > Most current hardware silently ignores overflow (essentially copying
| > the PDP-11).  I've found this to be unfortunate: overflow is generally
| > a sign of a bug.  It is legal for a C implementation to consider this
| > an error.
| > 
| > With existing C, the program could fail due to overflow whereas my
| > rule would make this correct.  The cost is that, under my rules, on a
| > machine with trapping overflow, the calculation must be done with
| > wider intermediate results.
| 
| Which involves serious hardware changes in which case why not just make
| everything in the machine wider.  Of course then someone will want a
| wider one still.

No, the case presupposes hardware that already does it.  I'm not
demanding such hardware, only talking about how to deal with it.

| > The programmer could have written i = (int) (i + j) + 1;
| > thus allowing the calculation to be done in int under my rule.
| 
| And it still can cause overflow, so what's your point?

The point is that the programmer can cause the efficient-but-risky
code to be generated, even if it isn't the default behaviour.

| > | After all, short int and int just have to be at least 16bit, and long
| > | has to be at least 32bit.  long long has to be at least 64bit if it is
| > | even supported on a system.  Linux seems to have decided to make it:
| > | short is 16bit
| > | int is 32bit
| > | long is equal in size to pointers (so 32 or 64bit depending on cpu)
| > | long long is 64bit
| > 
| > As far as whether pointers should be able to be stored in int or long,
| > I don't have an expectation.  I have an anti-expectation: code that
| > depends on this is suspect.
| 
| Well modern use of C seems to expect sizeof(long) == sizeof(void*)
| except on windows 64bit.

That isn't modern, that is broken.  The C standard does not support
such usage, nor should it.  The modern use of C is to get the types
right (and the C standard tries to provide the necessary types).

Linus can require that of the environment in which the kernel lives.

Why would you want this feature?  Would intptr_t satisfy that
requirement?

| > When I learned C, there was no "long".  When long was introduced, it
| > was twice as wide as a pointer (PDP-11).
| 
| Must be a long time ago.
| 
| I am rather pleased that linux has tried to make some sanity of things
| by standardizing the size of most of the types on linux.  The only
| one that varies is long, and it is defined as the size of a pointer,
| so even it is rather standard.  The only other one that varies is the sign
| of char when not specified.  Some systems are signed, some are unsigned,
| although apparently C now says that char isn't the same as unsigned char
| or signed char but is its own third type for string use.

That attitude would have left us stuck with PDP-11 representations:
    char 8
    short 16
    int 16
    long 32

I can tell you that folks learned lessons on cleaner coding during
each transition.  The main ones seem to have been:
    PDP-11 => IBM/370, GE635
    PDP-11 => VAX
    DOS => Win32 (?  I wasn't involved)
    VAX => 68k (different endian)
    68k => i386 (different endian)
    => Alpha, Itanium, SPARC64 (almost all stayed at 32, I think), PPC64,
    i386 => x86_64.

--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists
