[GTALUG] example of why RISC was a good idea

Sun May 22 12:00:39 EDT 2016

On Sat, May 21, 2016 at 01:33:50PM -0400, D. Hugh Redelmeier wrote:
> <https://software.intel.com/en-us/articles/google-vp9-optimization>
> 
> Intel describing how they improved the performance of the VP9 decoder for 
> Silvermont, a recent Atom core.
> 
> The meat is several not-really-obvious changes to the code to overcome 
> limitations of the instruction decoder.  The optimizations seem particular 
> to Silvermont but the article says:
> 	Testing against the future Intel Atom platforms, codenamed Goldmont and 
> 	Tremont, the VP9 optimizations delivered additional gains.
> 
> These optimizations did nothing for Core processors as far as I can tell.  
> I don't know if it affects any AMD processors.
> 
> A RISC processor would not have a complex instruction decoder so this kind 
> of hacking would not apply.  I will admit that there are "hazards" in RISC 
> processors that are worth paying attention to when selecting and ordering 
> instructions but these tend to be clearer.
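A toy sketch of why variable-length encoding makes the decoder's job
harder. It assumes a made-up encoding, not real x86 or any real RISC
ISA, and the names fixed_width_starts, toy_length and
variable_length_starts are hypothetical. With fixed 4-byte instructions
every start offset is known up front. With variable lengths, each start
depends on first decoding the length of the instruction before it.

```python
# Toy model only: the encodings below are invented for illustration and do
# not correspond to real x86 or to any real RISC instruction set.

def fixed_width_starts(code: bytes, width: int = 4) -> list[int]:
    """Start offsets when every instruction is exactly `width` bytes.

    Each offset is simply a multiple of `width`, so all of them are known
    without reading any instruction bytes (and could be found in parallel).
    """
    return [pc for pc in range(0, len(code), width) if pc + width <= len(code)]


def toy_length(first_byte: int) -> int:
    """Hypothetical length rule: the low two bits of the first byte give a
    length of 1 to 4 bytes."""
    return 1 + (first_byte & 0x03)


def variable_length_starts(code: bytes) -> list[int]:
    """Start offsets under the toy variable-length encoding.

    The next start is only known after the current instruction's length
    has been decoded, so the scan is inherently sequential.
    """
    starts = []
    pc = 0
    while pc < len(code):
        length = toy_length(code[pc])
        if pc + length > len(code):  # truncated final instruction: stop
            break
        starts.append(pc)
        pc += length
    return starts


if __name__ == "__main__":
    code = bytes([0x00, 0x03, 0xAA, 0xBB, 0xCC, 0x01, 0xDD, 0x02, 0xEE, 0xFF])
    print(fixed_width_starts(bytes(12)))
    print(variable_length_starts(code))
```

Changing a single byte early in the toy stream (say offset 1, from 0x03
to 0x01) shifts where the following instructions start. Nothing like that
can happen with the fixed-width layout.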
> 
> Another thing in the paper:
> 
> 	The overall results were outstanding. The team improved user-level 
> 	performance by up to 16 percent (6.2 frames per second) in 64-bit 
> 	mode and by about 12 percent (1.65 frames per second) in 32-bit 
> 	mode. This testing included evaluation of 32-bit and 64-bit GCC 
> 	and Intel® compilers, and concluded that the Intel compilers 
> 	delivered the best optimizations by far for Intel® Atom™ 
> 	processors. When you multiply this improvement by millions of 
> 	viewers and thousands of videos, it is significant. The WebM team 
> 	at Google also recognized this performance gain as extremely 
> 	significant. Frank Gilligan, a Google engineering manager, 
> 	responded to the team’s success: “Awesome. It looks good. I can’t 
> 	wait to try everything out.” Testing against the future Intel Atom 
> 	platforms, codenamed Goldmont and Tremont, the VP9 optimizations 
> 	delivered additional gains.
> 
> Consider 64-bit.  If a 16% improvement is 6.2 f/s, then the remaining 84% 
> would be 32.55 f/s.  Not great, but OK.
> 
> For 32-bit, 12% is 1.65 f/s; the remaining 88% would be about 12.1 f/s.  
> Totally useless, I think.
> 
> Quite interesting how different these two are.
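A quick way to check the arithmetic above. It uses the same reading as
the quoted message: the gain is that percentage of the new,
post-optimization frame rate, and the rest is the old rate. The function
name split_frame_rate is just for illustration.

```python
def split_frame_rate(gain_fps: float, gain_fraction: float) -> tuple[float, float]:
    """Return (new_rate, old_rate), assuming gain_fps is gain_fraction
    of the new rate and the remainder is the old rate."""
    new_rate = gain_fps / gain_fraction
    old_rate = new_rate - gain_fps
    return new_rate, old_rate


if __name__ == "__main__":
    # Figures quoted from the Intel article: (mode, gain in f/s, gain fraction)
    for mode, gain_fps, fraction in [("64-bit", 6.2, 0.16), ("32-bit", 1.65, 0.12)]:
        new_rate, old_rate = split_frame_rate(gain_fps, fraction)
        print(f"{mode}: {old_rate:.2f} f/s -> {new_rate:.2f} f/s")
```

If the percentages are instead read as relative to the old rate, 64-bit
goes from 38.75 to 44.95 f/s and 32-bit from 13.75 to 15.40 f/s.  The
large gap between the two modes remains either way.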

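64-bit mode has twice as many registers (16 general-purpose and 16 SSE
registers instead of 8 of each), which makes a huge difference for a lot
of code.  That is the biggest improvement AMD made to x86.  Moving floating
point from x87 to SSE2 is probably number 2.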

-- 
Len Sorensen