<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    A number of groups have tried to develop extremely parallel

    processors but all seem to have gained little traction.<br>

    <br>

    There was the XPU 128, the Epiphany (<a class="moz-txt-link-freetext" href="http://www.adapteva.com/">http://www.adapteva.com/</a>) and
    more recently the Xeon Phi and AMD Epyc.<br>

    <br>

    At one point I remember reading an article about Sun developing an
    asynchronous (clockless) CPU, which would be interesting.<br>

    <br>

    All these processors run into the same set of problems.<br>

        1) x86 silicon is amazingly cheap.<br>

        2) Supporting multiple CPU architectures means more software
    support work for each new architecture.<br>

        3) Very little software is capable of truly taking advantage of
    many parallel threads without really funky compilers and software
    design tools.<br>

        4) Having designed a fancy CPU, most companies try very hard to
    keep their proprietary knowledge within their own control, whereas
    the x86 instruction set must be just about open source nowadays.<br>

        5) Getting motherboard manufacturers to take a chance on a new
    CPU is not easy.<br>

    <br>

    My benchmark for processor success is: do several of Asus,
    Supermicro, Tyan, Gigabyte et al. make a motherboard for this CPU?<br>

    <br>

    Even companies with deep pockets, like DEC with its Alpha CPU and
    IBM with its Power CPUs, have not been able to make significant
    inroads into the commodity server world.<br>

    MIPS has had some luck with low- to mid-range systems for routers
    and storage, but its server business is long gone with the death of
    SGI.<br>

    Sun/Oracle has had some luck with SPARC, but not much outside its
    own use. I am just speculating, but I would bet that Sun/Oracle
    sells more x86 systems than SPARC systems.<br>

    <br>

    ARM seems to be having some luck, but I believe that is because its
    popularity in the small-systems world is sliding into supporting
    larger systems, not because it was designed for servers from the
    get-go.<br>

    <br>

    I am a bit of a processor geek and have put lots of effort in the

    past into elegant processors that just seem to go nowhere.<br>

    I would love to see some technology other than the current,
    somewhat parallel von Neumann SMP, but I have a sad feeling that it
    will be a long time coming.<br>

    <p>With the latest screw-up from Intel and the huge exploit surface
      that is the Intel ME (Management Engine), someone may be able to
      get some traction by coming up with a processor that is designed
      and verified for security.<br>

    </p>

    <br>

    <div class="moz-cite-prefix">On 01/29/2018 05:36 PM, David

      Collier-Brown via talk wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:dc5468e9-b8fb-c7e6-155c-ba006cb6267a@rogers.com">

      <meta http-equiv="content-type" content="text/html; charset=utf-8">

      <p>Kunle Olukotun didn't like systems that wasted their time
        stalled on loads and branches. He and his team at Afara
        Websystems therefore designed a non-speculating processor that
        kept doing useful work instead of waiting. It became the Sun T1.</p>

      <h1>Speed without speculating</h1>

      <p>The basic idea is to have more decoders than ALUs, so that many
        threads compete for an ALU. If, for example, thread 0 comes to a
        load, it will stall, so on the next cycle thread 1 gets the ALU
        and runs... until it stalls and thread 2 gets the ALU. Ditto for
        thread 3, and then control goes back to thread 0, which has
        completed a multi-cycle fetch from cache and is ready to proceed
        once more.</p>

      <p>That is the basic idea of the Sun T-series processors.</p>

      <p>The strength is that the ALUs are never waiting for work. The

        weakness is that individual threads still have to wait for data

        to come from cache.</p>
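      <p>Here is a toy Python sketch of that round-robin scheme, to make
        the strength and the weakness concrete. It assumes one shared
        ALU, a made-up three-cycle load latency, hypothetical instruction
        names "ld" and "add", and simple "next ready thread" selection.
        It is an illustration, not a model of the real T1 pipeline.</p>
      <pre>
```python
"""Toy model of a barrel-style core: several hardware threads share one ALU.
Each cycle the ALU goes to the next thread (round-robin) that isn't stalled.
Load latency, thread count and the "ld"/"add" ops are made-up assumptions."""

LOAD_LATENCY = 3  # hypothetical: cycles a thread is stalled after a load


def run(programs, latency=LOAD_LATENCY):
    """Simulate until every thread finishes.

    Returns (total_cycles, alu_busy_cycles, stalled_cycles_per_thread).
    """
    n = len(programs)
    pc = [0] * n          # index of each thread's next instruction
    ready_at = [0] * n    # first cycle at which each thread may issue
    stalled = [0] * n     # cycles each thread spent waiting on a load
    alu_busy = 0
    last = n - 1          # thread that issued most recently
    cycle = 0

    def unfinished(t):
        return len(programs[t]) > pc[t]

    while any(unfinished(t) for t in range(n)):
        # Pick the first ready thread after the one that issued last.
        chosen = None
        for step in range(1, n + 1):
            t = (last + step) % n
            if unfinished(t) and cycle >= ready_at[t]:
                chosen = t
                break
        # Count threads that are still waiting for their load data.
        for t in range(n):
            if unfinished(t) and ready_at[t] > cycle:
                stalled[t] += 1
        if chosen is not None:
            op = programs[chosen][pc[chosen]]
            pc[chosen] += 1
            alu_busy += 1
            if op == "ld":
                # Data arrives `latency` cycles later; stall until then.
                ready_at[chosen] = cycle + 1 + latency
            last = chosen
        cycle += 1
    return cycle, alu_busy, stalled


if __name__ == "__main__":
    print(run([["ld", "add", "add"]] * 4))  # (12, 12, [3, 3, 3, 3])
    print(run([["ld", "add", "add"]]))      # (6, 3, [3])
```
</pre>
      <p>With four threads each running ld, add, add, the ALU is busy on
        all 12 cycles, yet every thread still spends 3 cycles stalled on
        its load. A single thread on its own needs 6 cycles for its 3
        instructions, leaving the ALU idle half the time.</p>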

      <h1>You can improve on that</h1>

      <p>Now imagine that the available resources aren't entire ALUs
        but individual ALU components, such as adders. The scenario then
        becomes:</p>

      <ul>

        <li>thread 0 stalls</li>

        <li>thread 1 gets an adder</li>

        <li>thread 2 gets a compare (really a subtracter)</li>

        <li>thread 3 gets a branch unit, and will probably need to wait

          in the next cycle</li>

        <li>thread 4 gets an adder</li>

        <li>thread 5 gets an FPU</li>

      </ul>

      <p>... and so on. Each cycle, the hardware hands out as many of
        its available ALU components as it can to threads that are ready
        to run. Only the stalled threads wait, and they don't need any
        ALU bits to do that.</p>
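      <p>A minimal sketch of one cycle of that allocation follows. It
        assumes a hypothetical pool of two adders, one compare unit, one
        branch unit and one FPU, and serves threads in thread-number
        order; real hardware would presumably rotate priority for
        fairness.</p>
      <pre>
```python
from collections import Counter


def allocate(requests, units):
    """Hand out ALU components for one cycle.

    requests: list of (thread_id, kind) in priority order; kind is the
              component the thread's next instruction needs, or None if
              the thread is stalled and needs nothing.
    units:    mapping of component kind -> how many exist.
    Returns {thread_id: kind} for the threads that got a component.
    """
    free = Counter(units)   # copy, so the caller's pool is untouched
    granted = {}
    for tid, kind in requests:
        if kind is None:
            continue        # stalled threads don't need ALU bits
        if free[kind] > 0:
            free[kind] -= 1
            granted[tid] = kind
        # otherwise the thread simply tries again next cycle
    return granted


if __name__ == "__main__":
    pool = {"adder": 2, "compare": 1, "branch": 1, "fpu": 1}  # hypothetical
    cycle = [(0, None), (1, "adder"), (2, "compare"), (3, "branch"),
             (4, "adder"), (5, "fpu"), (6, "adder")]
    print(allocate(cycle, pool))
```
</pre>
      <p>Fed the scenario above plus a hypothetical thread 6 that also
        wants an adder, it grants components to threads 1 through 5.
        Thread 0 is stalled and asks for nothing, and thread 6 waits
        for the next cycle because both adders are taken.</p>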

      <p>Now more threads can run at the same time, the ALU components

        are (probabilistically) all busy, and we have increased

        capacity. But individual threads are still waiting for cache...<br>

      </p>

      <h1>Do I feel lucky?</h1>

      <p>In principle, we could allocate two adders to thread 5, one
        doing the current instruction and another doing a subsequent,
        non-dependent instruction. It's not speculative, but it is
        out-of-order. That makes some threads up to twice as fast when
        doing non-interacting calculations. Allocate three adders and
        such a thread can be up to three times as fast.</p>
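      <p>To see where the "up to" comes from, here is a rough sketch of
        a thread issuing several non-dependent adds per cycle. It assumes
        three-register instructions written as (dest, src1, src2),
        single-cycle adds, a hypothetical eight-instruction lookahead
        window, and a conservative rule: an instruction may not issue
        ahead of (or alongside) any earlier unissued instruction it
        shares a register hazard with.</p>
      <pre>
```python
def conflicts(a, b):
    """True if instruction b (later in program order) must not issue
    before or alongside a. Instructions are (dest, src1, src2)."""
    a_dest, a_srcs = a[0], set(a[1:])
    b_dest, b_srcs = b[0], set(b[1:])
    return (a_dest in b_srcs      # read-after-write: b needs a's result
            or b_dest in a_srcs   # write-after-read: b would clobber a's input
            or b_dest == a_dest)  # write-after-write: results land out of order


def schedule(program, adders, window=8):
    """Issue up to `adders` non-dependent instructions per cycle.

    Scans a lookahead window in program order; an instruction may jump
    ahead of earlier unissued ones only if it conflicts with none of them.
    Assumes every add finishes in one cycle. Returns a list with the
    instruction indices issued in each cycle.
    """
    if not adders >= 1:
        raise ValueError("need at least one adder")
    pending = list(range(len(program)))
    cycles = []
    while pending:
        issued = []
        for pos, idx in enumerate(pending[:window]):
            if len(issued) == adders:
                break
            earlier = pending[:pos]  # still unissued, or issued this cycle
            if not any(conflicts(program[e], program[idx]) for e in earlier):
                issued.append(idx)
        pending = [i for i in pending if i not in issued]
        cycles.append(issued)
    return cycles


if __name__ == "__main__":
    independent = [("r%d" % i, "x%d" % i, "y%d" % i) for i in range(6)]
    for n in (1, 2, 3):
        print(n, "adders:", len(schedule(independent, n)), "cycles")
    chain = [("r1", "a", "b"), ("r2", "r1", "c"), ("r3", "r2", "d")]
    print(schedule(chain, 3))   # [[0], [1], [2]]
    mixed = [("r1", "a", "b"), ("r2", "r1", "c"), ("r3", "d", "e")]
    print(schedule(mixed, 2))   # [[0, 2], [1]]
```
</pre>
      <p>Six independent adds take 6, 3 and 2 cycles with one, two and
        three adders. A chain where each add reads the previous result
        takes one cycle per instruction no matter how many adders there
        are. The "mixed" program shows the non-speculative out-of-order
        case: instruction 2 issues alongside 0, ahead of the dependent
        instruction 1.</p>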

      <p>If we're prepared to have more ALU components than decoders,
        to decode deeply, and to have enough of each that we're likely
        to find lots of non-dependent instructions, then we can execute
        multiple instructions at once in multiple streams and,
        probabilistically, get <em>startlingly</em> better performance.</p>

      <p>I can see a new kind of optimizing compiler, too: one which

        tries to group non-dependent instructions together.</p>
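      <p>A sketch of what the core of such a compiler pass might look
        like, assuming the same hypothetical (dest, src1, src2)
        instruction format. Each instruction is placed in the earliest
        group that comes after every earlier instruction it has a
        register hazard with. Everything within a group is then mutually
        independent, and the groups run one after another. This is plain
        as-soon-as-possible levelling with no limit on group width, not
        any particular real compiler's algorithm.</p>
      <pre>
```python
def group_independent(program):
    """Group instructions so that each group holds only mutually
    independent instructions and groups can run one after another.

    Instructions are (dest, src1, src2). Each one goes into the earliest
    group after every earlier instruction it has a register hazard with.
    Returns a list of groups of instruction indices.
    """
    level = []
    for i, (dest, *srcs) in enumerate(program):
        lvl = 0
        for j in range(i):
            j_dest, *j_srcs = program[j]
            hazard = (j_dest in srcs       # read-after-write
                      or dest in j_srcs    # write-after-read
                      or dest == j_dest)   # write-after-write
            if hazard:
                lvl = max(lvl, level[j] + 1)
        level.append(lvl)
    groups = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lvl in enumerate(level):
        groups[lvl].append(i)
    return groups


if __name__ == "__main__":
    prog = [("r1", "a", "b"), ("r2", "r1", "c"),
            ("r3", "d", "e"), ("r4", "r3", "f")]
    print(group_independent(prog))  # [[0, 2], [1, 3]]
```
</pre>
      <p>For r1 = a+b, r2 = r1+c, r3 = d+e, r4 = r3+f, it pairs the two
        independent adds into one group and their two dependents into
        the next.</p>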

      <h1>Conclusion</h1>

      <p>Is this what happens in a T5? That's a question for a hardware
        developer: I have no idea... yet.</p>


      <p>Links:<br>

      </p>

      <p><a class="moz-txt-link-freetext"

          href="https://en.wikipedia.org/wiki/Kunle_Olukotun"

          moz-do-not-send="true">https://en.wikipedia.org/wiki/Kunle_Olukotun</a></p>

      <p><a class="moz-txt-link-freetext"

          href="https://en.wikipedia.org/wiki/Afara_Websystems"

          moz-do-not-send="true">https://en.wikipedia.org/wiki/Afara_Websystems</a></p>

      <p><a class="moz-txt-link-freetext"

href="https://web.archive.org/web/20110720050850/http://www-hydra.stanford.edu/%7Ekunle/"

          moz-do-not-send="true">https://web.archive.org/web/20110720050850/http://www-hydra.stanford.edu/~kunle/</a></p>

      <pre class="moz-signature" cols="72">-- 

David Collier-Brown,         | Always do right. This will gratify

System Programmer and Author | some people and astonish the rest

<a class="moz-txt-link-abbreviated" href="mailto:davecb@spamcop.net" moz-do-not-send="true">davecb@spamcop.net</a>           |                      -- Mark Twain

</pre>

      <br>

      <fieldset class="mimeAttachmentHeader"></fieldset>

      <br>

      <pre wrap="">---

Talk Mailing List

<a class="moz-txt-link-abbreviated" href="mailto:talk@gtalug.org">talk@gtalug.org</a>

<a class="moz-txt-link-freetext" href="https://gtalug.org/mailman/listinfo/talk">https://gtalug.org/mailman/listinfo/talk</a>

</pre>

    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Alvin Starr                   ||   land:  (905)513-7688

Netvel Inc.                   ||   Cell:  (416)806-0133

<a class="moz-txt-link-abbreviated" href="mailto:alvin@netvel.net">alvin@netvel.net</a>              ||

</pre>

  </body>

</html>