in Toronto this month: International Symposium on Code Generation and Optimization

Lennart Sorensen lsorense-1wCw9BSqJbv44Nm34jS7GywD8/FfD2ys at public.gmane.org
Tue Apr 6 15:14:36 UTC 2010


On Mon, Apr 05, 2010 at 10:20:27PM -0400, D. Hugh Redelmeier wrote:
> As I read it, this is not just about Itanium.  "Eighth Workshop on
> Explicitly Parallel Instruction Computing Architectures and Compiler
> Technology (EPIC-8)".  Note the plural "Architectures".
> 
> I don't know how they define EPIC.  Some say it is a synonym for
> Itanium, but then again, other architectures claim to be EPIC too (eg.
> recent Elbrus architectures).

Well, EPIC, as it says above, is simply an instruction set where the
compiler is responsible for grouping instructions that can execute at the
same time into one long instruction word at compile time.  The Itanium
has a VLIW with space for 3 instructions at a time.
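
To make the packing concrete, here is a minimal sketch of a compiler
filling 3-slot words.  It is a toy model, not the real Itanium encoding:
the Instr type, pack_bundles() and the register names are made up for
illustration, and the only rule modelled is that an instruction cannot
share a word with an earlier one whose result it reads or overwrites.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction: one destination register, some sources."""
    dest: str
    srcs: Tuple[str, ...]


def pack_bundles(instrs, width=3):
    """Greedily pack instructions, in program order, into words of `width`.

    A new word is started when the current one is full, or when the next
    instruction reads or overwrites a register written earlier in the
    same word (a simplification of real dependency rules).
    """
    bundles, current, written = [], [], set()
    for ins in instrs:
        conflict = ins.dest in written or any(s in written for s in ins.srcs)
        if current and (conflict or len(current) == width):
            bundles.append(current)
            current, written = [], set()
        current.append(ins)
        written.add(ins.dest)
    if current:
        bundles.append(current)
    return bundles


# Three independent instructions versus a chain of dependent ones.
independent = [Instr("r1", ("a", "b")),
               Instr("r2", ("c", "d")),
               Instr("r3", ("e", "f"))]
chain = [Instr("r1", ("a", "b")),
         Instr("r2", ("r1", "c")),
         Instr("r3", ("r2", "d"))]

print([len(b) for b in pack_bundles(independent)])
print([len(b) for b in pack_bundles(chain)])
```

The three independent instructions share one word, while the chain ends
up one instruction per word, leaving two of the three slots empty.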

> To me it is VLIW (quite interesting) plus a bunch of additional
> features to try to make this practical.
> 
> I don't deeply understand the Itanium's problems.  I've heard or made
> up many plausible explanations but they may not be right.

The EPIC/VLIW design was made on the assumption that compiler technology
would advance and that compile-time parallel instruction scheduling would
become something compilers just did.  Well, the assumption has so far
turned out to be completely wrong in general.  Very few types of software
seem able to be efficiently scheduled at compile time.  Of course doing
it at compile time would be great, since you then save the out-of-order
execution hardware, which is rather costly (but very effective in general).
Out-of-order execution is generally considered to double the performance
of a CPU.  The Atom doesn't have it.  Most x86 chips do.  That's the main
reason the Atom performs so much slower than its clock speed would make
you think.
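
A rough sketch of why the out-of-order hardware pays off when latencies
are not known in advance.  This is a toy single-issue model, not a
description of any real chip: the program, the latencies (including the
20 cycle cache miss) and both function names are assumptions, and the
out-of-order version assumes an unlimited window and perfect register
renaming.

```python
def run_in_order(prog):
    """Issue one instruction per cycle, strictly in program order.

    Each entry in prog is (dest, srcs, latency).  An instruction waits
    until all of its sources are ready, and everything behind it waits too.
    Returns the cycle at which the last result becomes available.
    """
    ready = {}     # register -> cycle its value becomes available
    cycle = 0      # earliest cycle the next instruction may issue
    finish = 0
    for dest, srcs, latency in prog:
        start = max([cycle] + [ready.get(s, 0) for s in srcs])
        ready[dest] = start + latency
        finish = max(finish, start + latency)
        cycle = start + 1
    return finish


def run_out_of_order(prog):
    """Issue each instruction as soon as its sources are ready.

    Still one issue per cycle, but later independent instructions may
    slip ahead of a stalled one.
    """
    ready = {}
    used = set()   # issue cycles already taken
    finish = 0
    for dest, srcs, latency in prog:
        start = max([0] + [ready.get(s, 0) for s in srcs])
        while start in used:
            start += 1
        used.add(start)
        ready[dest] = start + latency
        finish = max(finish, start + latency)
    return finish


def program(first_load_latency):
    """Two load/add pairs plus an unrelated multiply (made-up example)."""
    return [("r1", ("a",), first_load_latency),  # load, may miss the cache
            ("r2", ("r1", "r1"), 1),             # needs the first load
            ("r3", ("b",), 2),                   # independent load
            ("r4", ("r3", "r3"), 1),
            ("r5", ("x", "y"), 3)]               # independent multiply


for label, latency in (("hit", 2), ("miss", 20)):
    prog = program(latency)
    print(label, run_in_order(prog), run_out_of_order(prog))
```

In this model the in-order machine pays for the miss and then still has
to run everything queued behind it, while the out-of-order machine gets
the independent work done during the miss.  A compiler would have to
know at compile time which loads miss to match that.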

Maybe someday compiler technology will do what Intel (and HP) hoped
it would do.  The Open64 compiler may be heading that way (not sure
really, although it does seem better at optimizing parallel code than
most compilers, as far as I understand the description of it).

So far gcc and most other compilers rarely manage to fill more than
one slot of the VLIW at a time.  The Itanium would like three
instructions at a time.  This of course doesn't make for good use of
the CPU.  Some database code seems able to be optimized a bit better,
but most code so far apparently does not compile well for the Itanium
at all, and hence the performance is in general rather awful for a CPU
of that complexity (and cost).  The only thing saving the Itanium from
complete death so far is that it was designed for high reliability
systems.  Unfortunately for the Itanium, the new Xeon CPUs are starting
to get those features too.  Hence most vendors (anyone not named HP),
including Microsoft, are now abandoning the Itanium.

> - Intel spent all its energy on x86, its bread and butter (see
>   Christensen's "The Innovator's Dilemma" for an explanation of why
>   this is to be expected)

Well, for a while they didn't (hence AMD made x86_64, not Intel, given
Intel was busy trying to convince the world Itanium was the future).
That has now changed (and Intel seems to have dumped their ARM division
to Marvell or something to focus pretty much entirely on x86).

> - static scheduling is really hard when cached memory systems are
>   very effective but each reference's time is very hard to predict.
>   Some workloads could be tractable (eg. linear algebra).
> 
> - memory bandwidth is the main barrier; perhaps the organization of the
>   processor is not that important.
> 
> - I think that VLIW has poor code-density.  Not just because of the
>   size of instructions but also VLIW optimization techniques such as
>   trace scheduling.
> 
> - VLIW seems a poor match for the kind of code I usually write: short
>   basic blocks and lots of them.

So far it has been a poor match for almost all code written by anyone.

> I think that the case against VLIW and EPIC is not yet proven.

Well, so far there haven't been any success stories for it that I have
encountered.  To me that is pretty much as good as proving it.  I don't
think x86 is a good instruction set by any means, but it sure looks a
lot better than the Itanium's instruction set.  PowerPC is certainly
nicer, as is SPARC.  Not sure about ARM.  MIPS is pretty nice (well,
the new MIPS instruction set is nice; the old one lacked a few rather
essential features).  Not a fan of ColdFire/m68k instructions either
(I really don't like variable length instructions).

There are CPU designs that impress me when they are announced.  There are
some that concern me.  Some I am not sure about.

When the Pentium 4 was announced, I expected problems.  The pipeline
simply went way beyond anything traditional CPU architecture design says
makes sense.  I am sure AMD loved it.  It really did turn out to be as
bad and inefficient on general purpose code as traditional textbooks on
CPU architecture design predicted.
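
The textbook argument can be put as back-of-the-envelope arithmetic:
every mispredicted branch flushes the pipeline, and the cost of a flush
grows with pipeline depth.  The sketch below uses the usual effective
CPI formula.  The branch fraction, misprediction rate and flush
penalties are illustrative assumptions, not measurements of any real
chip, and the function name is made up.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    """Average cycles per instruction once mispredict flushes are counted."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


# Assumed workload: 20% of instructions are branches, 10% of those are
# mispredicted.  The flush penalty is taken to scale with pipeline depth.
shallow = effective_cpi(1.0, 0.20, 0.10, flush_penalty=10)
deep = effective_cpi(1.0, 0.20, 0.10, flush_penalty=20)

# Clock speed advantage the deep pipeline needs just to break even.
breakeven_clock_ratio = deep / shallow

print(round(shallow, 2), round(deep, 2), round(breakeven_clock_ratio, 3))
```

Under these assumptions the deeper pipeline has to clock noticeably
higher before it gains anything, and the gap widens on branchier or less
predictable code.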

The Itanium I wasn't sure about.  Certainly avoiding all the complexity
of out-of-order and speculative execution and all that seems reasonable.
Having the compiler do the scheduling, on the other hand, was not something
I had ever heard of being done.  I can't even imagine how a compiler
would be able to do that in general (only in specific cases).  But not
being a compiler designer, I figured Intel/HP must have some idea what
they were getting into.  Well, it turns out they didn't know, and
apparently no one else knows how to make a compiler do it either.

The PowerPC chips from IBM have usually been very impressive when
announced.  They have so far been rather successful too.  Too bad they
cost so much and use so much power.  They really are monster sized
CPUs these days.  I am very impressed by them.  Now if only they were
affordable enough to get one to actually use.

The Atom has not impressed me.  Dropping out-of-order execution is a
terrible idea on x86 instructions.  It just kills performance.  If you
want an efficient low power CPU, forget x86 and go for ARM or MIPS.
They can get the same performance at 1/4 the power, it seems, even though
Intel has more advanced manufacturing processes to play with.

OK, I will shut up now and go back to doing real work again. :)

-- 
Len Sorensen
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists




