[GTALUG] example of why RISC was a good idea

D. Hugh Redelmeier hugh at mimosa.com
Sat May 21 13:33:50 EDT 2016


<https://software.intel.com/en-us/articles/google-vp9-optimization>

Intel describes how they improved the performance of the VP9 decoder for 
Silvermont, a recent Atom core.

The meat is several not-really-obvious changes to the code to overcome 
limitations of the instruction decoder.  The optimizations seem particular 
to Silvermont but the article says:
	Testing against the future Intel Atom platforms, codenamed Goldmont and 
	Tremont, the VP9 optimizations delivered additional gains.

These optimizations did nothing for Core processors as far as I can tell.  
I don't know if they affect any AMD processors.

A RISC processor would not have a complex instruction decoder so this kind 
of hacking would not apply.  I will admit that there are "hazards" in RISC 
processors that are worth paying attention to when selecting and ordering 
instructions but these tend to be clearer.

Another thing in the paper:

	The overall results were outstanding. The team improved user-level 
	performance by up to 16 percent (6.2 frames per second) in 64-bit 
	mode and by about 12 percent (1.65 frames per second) in 32-bit 
	mode. This testing included evaluation of 32-bit and 64-bit GCC 
	and Intel® compilers, and concluded that the Intel compilers 
	delivered the best optimizations by far for Intel® Atom™ 
	processors. When you multiply this improvement by millions of 
	viewers and thousands of videos, it is significant. The WebM team 
	at Google also recognized this performance gain as extremely 
	significant. Frank Gilligan, a Google engineering manager, 
	responded to the team’s success: “Awesome. It looks good. I can’t 
	wait to try everything out.” Testing against the future Intel Atom 
	platforms, codenamed Goldmont and Tremont, the VP9 optimizations 
	delivered additional gains.

Consider 64-bit.  If the 6.2 f/s improvement is 16% of the optimized rate, 
then the remaining 84% (the rate before optimization) would be 32.55 f/s.  
Not great, but OK.

For 32-bit, 12% is 1.65 f/s; the remaining 88% would be about 12.1 f/s.  
Totally useless, I think.

Quite interesting how different these two are.
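
The arithmetic above can be checked with a small sketch.  The article does 
not say whether the percentage is relative to the rate before or after 
optimization, so this computes both readings.  The function name and the 
"reading A/B" labels are mine, not the article's; reading A is the one used 
in the figures above.

```python
def rates(gain_fps, percent):
    """Return ((before, after), (before, after)) for two readings of percent.

    Reading A: the gain is `percent` of the rate after optimization.
    Reading B: the gain is `percent` of the rate before optimization.
    Both readings are assumptions; the article does not say which applies.
    """
    frac = percent / 100.0
    base = gain_fps / frac                      # rate that the percentage refers to
    reading_a = (base - gain_fps, base)         # base is the "after" rate
    reading_b = (base, base + gain_fps)         # base is the "before" rate
    return reading_a, reading_b


for label, gain, pct in (("64-bit", 6.2, 16), ("32-bit", 1.65, 12)):
    (before_a, after_a), (before_b, after_b) = rates(gain, pct)
    print(f"{label}: A {before_a:.2f} -> {after_a:.2f} f/s, "
          f"B {before_b:.2f} -> {after_b:.2f} f/s")
```

Under either reading the 32-bit rate stays well below the 64-bit rate, so 
the conclusion about the gap between the two does not depend on the choice.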

