[GTALUG] (question) GPU + Data center = ?

David Mason dmason at ryerson.ca
Thu Jul 16 18:49:52 EDT 2020


Thanks, Hugh! Nicely explained.

../Dave
On Jul 14, 2020, 10:15 AM -0400, D. Hugh Redelmeier via talk <talk at gtalug.org>, wrote:
> | From: David Mason via talk <talk at gtalug.org>
>
> | The short answer is: Machine Learning (and other data-mining-like applications)
>
> A much LONGER answer:
>
> General-purpose computing on GPUs has been a field for perhaps a
> dozen years. GPUs have evolved into having a LOT of floating-point
> units that can act simultaneously, mostly in lock-step.
>
> They are nasty to program: conventional high-level languages and
> programmers aren't very good at exploiting GPUs.
>
> NVidia's Cuda (dominant) and the industry-standard OpenCL (struggling)
> are used to program the combination of the host CPU and the GPU.
>
> Generally, a set of subroutines is written to exploit a GPU, and those
> subroutines get called by conventional programs. Examples of such
> libraries: TensorFlow, PyTorch, OpenBLAS. The first two are for machine
> learning.
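>
> For example, a conventional program can hand the heavy lifting to such
> a library without containing any GPU code of its own. Here is a sketch
> using NVidia's cuBLAS, a GPU analogue of OpenBLAS (the helper name
> saxpy_gpu is just illustrative):
>
>     #include <cublas_v2.h>
>     #include <cuda_runtime.h>
>
>     // Compute y = 2*x + y on the GPU via a single library call.
>     void saxpy_gpu(const float *x, float *y, int n) {
>         float *d_x, *d_y;
>         cudaMalloc(&d_x, n * sizeof(float));
>         cudaMalloc(&d_y, n * sizeof(float));
>         cudaMemcpy(d_x, x, n * sizeof(float), cudaMemcpyHostToDevice);
>         cudaMemcpy(d_y, y, n * sizeof(float), cudaMemcpyHostToDevice);
>
>         cublasHandle_t handle;
>         cublasCreate(&handle);
>         const float alpha = 2.0f;
>         cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1);
>         cublasDestroy(handle);
>
>         cudaMemcpy(y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost);
>         cudaFree(d_x);
>         cudaFree(d_y);
>     }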
>
> Some challenges GPU programmers face:
>
> - GPUs cannot do everything that programmers are used to. A program
> using a GPU must be composed of a host CPU program and a GPU
> program. (Some languages let you do the split within a single
> program, but there still is a split.)
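>
> A minimal sketch of that split in a single Cuda source file (the
> kernel name and sizes are just illustrative): the __global__ function
> is the GPU program; everything else is the host program.
>
>     #include <cuda_runtime.h>
>
>     // GPU program: runs on the device, one thread per element.
>     __global__ void scale_kernel(float *data, float factor, int n) {
>         int i = blockIdx.x * blockDim.x + threadIdx.x;
>         if (i < n) data[i] *= factor;
>     }
>
>     // Host program: allocates device memory, launches the GPU part.
>     int main() {
>         const int n = 1024;
>         float *d_data;
>         cudaMalloc(&d_data, n * sizeof(float));
>         scale_kernel<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
>         cudaDeviceSynchronize();
>         cudaFree(d_data);
>         return 0;
>     }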
>
> - GPU programming requires a lot of effort designing how data gets
> shuffled in and out of the GPU's dedicated memory. Without care,
> the time eaten by this can easily overwhelm the time saved by using a
> GPU instead of just the host CPU.
>
> Like any performance problem, one needs to measure to get an
> accurate understanding. The result might easily suggest massive
> changes to a program.
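>
> A sketch of the measuring side, assuming the Cuda runtime: time the
> host-to-device copy by itself with events, then time the kernel the
> same way. If the copies dominate, the GPU may be a net loss.
>
>     #include <cuda_runtime.h>
>
>     // How long does the host->device copy alone take?
>     float copy_time_ms(const float *h_data, float *d_data, size_t bytes) {
>         cudaEvent_t start, stop;
>         cudaEventCreate(&start);
>         cudaEventCreate(&stop);
>
>         cudaEventRecord(start);
>         cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);
>         cudaEventRecord(stop);
>         cudaEventSynchronize(stop);
>
>         float ms = 0.0f;
>         cudaEventElapsedTime(&ms, start, stop);
>         cudaEventDestroy(start);
>         cudaEventDestroy(stop);
>         return ms;   // compare against the kernel's elapsed time
>     }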
>
> - Each GPU links its ALUs into fixed-size groups. Problems must be
> mapped onto these groups, even if that isn't natural. A typical size
> is 64 ALUs. Each ALU in a group is either executing the same
> instruction, or is idled.
>
> OpenCL and Cuda help the programmer create doubly-nested loops that
> map well onto this hardware.
>
> Lots of compute-intensive algorithms are not easy to break down into this
> structure.
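>
> Here is a sketch of the mapping for one that IS easy, elementwise
> vector addition (the 256-thread block size is an assumption; it should
> be a multiple of the group size):
>
>     // Inner level: threads within one block (one group of ALUs).
>     // Outer level: blocks within the grid.
>     __global__ void vadd(const float *a, const float *b, float *c, int n) {
>         int i = blockIdx.x * blockDim.x + threadIdx.x;
>         if (i < n)                 // leftover ALUs in the last group idle
>             c[i] = a[i] + b[i];
>     }
>
>     void launch_vadd(const float *a, const float *b, float *c, int n) {
>         int block = 256;                      // threads per block
>         int grid  = (n + block - 1) / block;  // round up to whole blocks
>         vadd<<<grid, block>>>(a, b, c, n);    // the doubly-nested "loop"
>     }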
>
> - GPUs are not very good at conventional control flow, and their
> behaviour differs from what most programmers expect. For example, when
> an "if" is executed, all compute elements in a group are tied up, even
> the ones that are not active. Think how this applies to loops.
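>
> A sketch of the cost, assuming Cuda: if lanes within one group
> disagree about the test, the group executes the "then" side (the
> other lanes idled) and then the "else" side (the first set idled),
> paying for both paths.
>
>     __global__ void clamp_sqrt(float *x, int n) {
>         int i = blockIdx.x * blockDim.x + threadIdx.x;
>         if (i >= n) return;
>         // Divergent branch: both sides run, one after the other,
>         // whenever a group straddles the condition.
>         if (x[i] > 0.0f)
>             x[i] = sqrtf(x[i]);
>         else
>             x[i] = 0.0f;
>     }
>
> A loop whose trip count varies per thread behaves the same way: the
> whole group runs until its slowest member finishes.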
>
> - Each GPU is somewhat different, so it is hard to program generically.
> This is made worse by the fact that Cuda, the most popular language,
> is proprietary to NVidia. Lots of politics here.
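>
> One partial mitigation is to query the hardware at run time rather
> than hard-coding sizes; a sketch with the Cuda runtime:
>
>     #include <cuda_runtime.h>
>     #include <cstdio>
>
>     int main() {
>         cudaDeviceProp prop;
>         cudaGetDeviceProperties(&prop, 0);   // device 0
>         // Sizes that vary across GPUs; use them to pick launch shapes.
>         printf("name:              %s\n", prop.name);
>         printf("group (warp) size: %d lanes\n", prop.warpSize);
>         printf("multiprocessors:   %d\n", prop.multiProcessorCount);
>         printf("shared mem/block:  %zu bytes\n", prop.sharedMemPerBlock);
>         return 0;
>     }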
>
> - GPUs are not easily safe to share amongst multiple processes. This
> is slowly improving.
>
> - New GPUs are getting better, so one should perhaps revisit existing
> programs regularly.
>
> - GPU memories are not virtual. If you hit the limit of memory on a
> card, you've got to change your program.
>
> Worse: within the GPU there is a hierarchy of three or more levels of
> fixed-size memories that needs to be explicitly managed.
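>
> A sketch of managing one of those levels by hand in Cuda: each block
> stages a tile of data into its small, fixed-size __shared__ scratchpad
> and reduces it there (TILE is an illustrative size; launch with TILE
> threads per block):
>
>     #define TILE 256
>
>     __global__ void sum_tiles(const float *in, float *out, int n) {
>         __shared__ float tile[TILE];      // per-block scratchpad
>         int i = blockIdx.x * TILE + threadIdx.x;
>         tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
>         __syncthreads();                  // wait until the tile is full
>
>         // Tree reduction, entirely within the fast shared memory.
>         for (int s = TILE / 2; s > 0; s >>= 1) {
>             if (threadIdx.x < s)
>                 tile[threadIdx.x] += tile[threadIdx.x + s];
>             __syncthreads();
>         }
>         if (threadIdx.x == 0) out[blockIdx.x] = tile[0];
>     }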
>
> - GPU software is oriented to performance. Compile times are long.
> Debugging is hard and different.
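>
> One small aid: runtime calls return error codes that are easy to
> ignore, and kernel failures surface asynchronously, so it is common
> practice to wrap every call. A sketch (the macro name is just a
> convention):
>
>     #include <cuda_runtime.h>
>     #include <cstdio>
>     #include <cstdlib>
>
>     // Abort with file and line on any failed Cuda runtime call.
>     #define CUDA_CHECK(call)                                       \
>         do {                                                       \
>             cudaError_t err = (call);                              \
>             if (err != cudaSuccess) {                              \
>                 fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, \
>                         cudaGetErrorString(err));                  \
>                 exit(1);                                           \
>             }                                                      \
>         } while (0)
>
>     // Usage: CUDA_CHECK(cudaMalloc(&p, bytes));
>     // After a launch: CUDA_CHECK(cudaGetLastError());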
>
> Setting up the hardware and software for GPU computing is stupidly
> challenging. Alex gave a talk to GTALUG (video available) about his
> playing with this. Here's what I remember:
>
> - AMD is mostly open source but not part of most distros (why???).
> You need to use select distros plus out-of-distro software. Support
> for APUs (AMD processor chips with built-in GPUs) is still missing
> (dumb).
>
> - NVidia is closed source. Alex found it easier to get going, though
> it was still work and still requires out-of-distro software.
>
> - He didn't try Intel. Intel GPUs are ubiquitous but not popular for
> GPU computing, since they are all integrated and thus limited in crunch.
>
> Intel, being behind, is the nicest player.