[GTALUG] (question) GPU + Data center = ?

D. Hugh Redelmeier hugh at mimosa.com
Tue Jul 14 10:15:49 EDT 2020


| From: David Mason via talk <talk at gtalug.org>

| The short answer is: Machine Learning (and other data-mining-like applications)

A much LONGER answer:

General-purpose computing on GPUs has been a field of its own for
perhaps a dozen years.  GPUs have evolved to have a LOT of
floating-point units that can act simultaneously, mostly in lock-step.

They are nasty to program: conventional high-level languages and
programmers aren't very good at exploiting GPUs.

NVidia's Cuda (dominant) and the industry-standard OpenCL (struggling)
are used to program the combination of the host CPU and the GPU.

Generally, a set of subroutines is written to exploit a GPU and those
subroutines get called by conventional programs.  Examples of such
libraries: TensorFlow, PyTorch, and cuBLAS.  The first two are for
machine learning.
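
To make the shape of such code concrete, here is a minimal CUDA sketch
of my own (a toy, not taken from any of those libraries).  The
__global__ function is the GPU program, everything in main() is the
host CPU program, and the <<<...>>> launch is where the two meet:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// GPU program: each thread scales one element (runs on the device).
__global__ void scale(float *x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] *= a;
}

// Host program: allocate device memory, copy data over, launch the
// kernel, copy the result back.
int main(void)
{
    const int n = 1 << 20;
    float *h = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; i++)
        h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);

    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("h[0] = %f\n", h[0]);    /* expect 2.0 */

    cudaFree(d);
    free(h);
    return 0;
}

(Compiles with "nvcc scale.cu -o scale" on a machine with NVidia's
toolkit installed.)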

Some challenges GPU programmers face:

- GPUs cannot do everything that programmers are used to.  A program
  using a GPU must be composed of a host CPU program and a GPU
  program.  (Some languages let you do the split within a single
  program, but there still is a split.  The sketch above shows the
  split at its simplest.)

- GPU programming requires a lot of effort designing how data gets
  shuffled in and out of the GPU's dedicated memory.  Without care,
  the time eaten by this can easily overwhelm the time saved by using a
  GPU instead of just the host CPU.

  Like any performance problem, one needs to measure to get an
  accurate understanding.  The result might easily suggest massive
  changes to a program.  (The first sketch after this list times
  exactly this kind of shuffling.)

- Each GPU links its ALUs into fixed-size groups.  Problems must be
  mapped onto these groups, even if that isn't natural.  Typical sizes
  are 32 (NVidia "warps") or 64 (AMD "wavefronts") ALUs.  Each ALU in
  a group is either executing the same instruction, or is idled.

  OpenCL and Cuda help the programmer create doubly-nested loops that
  map well onto this hardware.  (The second sketch after this list
  shows such a mapping.)

  Lots of compute-intensive algorithms are not easy to break down into
  this structure.

- GPUs are not very good at conventional control-flow.  And it is
  different from what most programmers expect.  For example, when an
  "if" is executed and the elements of a group disagree about the
  branch, the whole group steps through both sides, with the elements
  on the untaken side idled.  Think how this applies to loops.  (The
  second sketch after this list marks such a branch in a comment.)

- Each GPU is kind of different, so it is hard to program generically.
  This is made worse by the fact that Cuda, the most popular language,
  is proprietary to NVidia.  Lots of politics here.

- GPUs are not easy to share safely amongst multiple processes.  This
  is slowly improving.

- New GPUs are getting better, so one should perhaps revisit existing
  programs regularly.

- GPU memories are not virtual.  If you hit the limit of memory on a
  card, you've got to change your program.

  Worse: there is a hierarchy of three or more levels of fixed-size
  memories within the GPU that needs to be explicitly managed.  (The
  third sketch after this list uses the small per-group "shared"
  memory explicitly.)

- GPU software is oriented to performance.  Compile times are long.
  Debugging is hard and different.
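
First sketch (for the data-shuffling bullet): a toy timing harness of
my own, assuming an NVidia/CUDA setup, that measures the two copies
against a trivial kernel using CUDA events.  On most cards the copies
dominate, which is the point above:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Trivial kernel: the "compute" being timed.
__global__ void add_one(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] += 1.0f;
}

static float elapsed_ms(cudaEvent_t a, cudaEvent_t b)
{
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, a, b);
    return ms;
}

int main(void)
{
    const int n = 1 << 24;                  /* 64 MB of floats */
    float *h = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; i++)
        h[i] = 0.0f;

    float *d;
    cudaMalloc(&d, n * sizeof(float));

    cudaEvent_t t0, t1, t2, t3;
    cudaEventCreate(&t0); cudaEventCreate(&t1);
    cudaEventCreate(&t2); cudaEventCreate(&t3);

    cudaEventRecord(t0);
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaEventRecord(t1);
    add_one<<<(n + 255) / 256, 256>>>(d, n);
    cudaEventRecord(t2);
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaEventRecord(t3);
    cudaEventSynchronize(t3);

    printf("copy in : %6.2f ms\n", elapsed_ms(t0, t1));
    printf("kernel  : %6.2f ms\n", elapsed_ms(t1, t2));
    printf("copy out: %6.2f ms\n", elapsed_ms(t2, t3));

    cudaFree(d);
    free(h);
    return 0;
}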
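
Second sketch (for the fixed-size-group and control-flow bullets): a
toy of my own that maps a doubly-nested loop onto a 2-D grid of 16x16
blocks and contains a data-dependent branch; elements of a group that
disagree at that branch have to walk through both sides:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// The doubly-nested loop
//     for (row ...) for (col ...) out[row][col] = f(in[row][col]);
// becomes a 2-D grid of fixed-size blocks.
__global__ void threshold(const float *in, float *out, int w, int h)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col >= w || row >= h)           // edge blocks: some lanes idle
        return;

    float v = in[row * w + col];
    if (v > 0.5f)                       // divergence: lanes in a group
        out[row * w + col] = 1.0f;      // that disagree here step
    else                                // through both assignments
        out[row * w + col] = 0.0f;
}

int main(void)
{
    const int w = 1000, h = 700;   /* deliberately not multiples of 16 */
    size_t bytes = (size_t)w * h * sizeof(float);

    float *h_in = (float *)malloc(bytes);
    float *h_out = (float *)malloc(bytes);
    for (int i = 0; i < w * h; i++)
        h_in[i] = (i % 2) ? 0.9f : 0.1f;

    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);                             // 256 threads each
    dim3 grid((w + block.x - 1) / block.x,
              (h + block.y - 1) / block.y);         // rounded up
    threshold<<<grid, block>>>(d_in, d_out, w, h);

    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    printf("out[0] = %.1f, out[1] = %.1f\n", h_out[0], h_out[1]);

    cudaFree(d_in); cudaFree(d_out);
    free(h_in); free(h_out);
    return 0;
}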
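
Third sketch (for the memory-hierarchy bullet): each block sums 256
elements by staging them in the small, fixed-size, explicitly managed
on-chip "shared" memory.  Again a toy of my own, not anyone's library
code:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// One partial sum per block, computed in per-block shared memory.
__global__ void block_sums(const float *in, float *out, int n)
{
    __shared__ float buf[256];          // fixed-size, per-block, on-chip

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    buf[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                    // wait for the whole block

    // Tree reduction within the block, entirely in shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            buf[threadIdx.x] += buf[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        out[blockIdx.x] = buf[0];       // one partial sum per block
}

int main(void)
{
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *h_in = (float *)malloc(n * sizeof(float));
    float *h_out = (float *)malloc(blocks * sizeof(float));
    for (int i = 0; i < n; i++)
        h_in[i] = 1.0f;

    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    block_sums<<<blocks, threads>>>(d_in, d_out, n);

    cudaMemcpy(h_out, d_out, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);
    double total = 0.0;
    for (int b = 0; b < blocks; b++)
        total += h_out[b];
    printf("total = %.0f (expect %d)\n", total, n);

    cudaFree(d_in); cudaFree(d_out);
    free(h_in); free(h_out);
    return 0;
}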

Setting up the hardware and software for GPU computing is stupidly
challenging.  Alex gave a talk to GTALUG (video available) about his
playing with this.  Here's what I remember:

- AMD's compute stack (ROCm) is mostly open source but not part of
  most distros (why???).  You need to use select distros plus
  out-of-distro software.  Support for APUs (AMD processor chips with
  built-in GPUs) is still missing (dumb).

- NVidia's stack is closed source.  Alex found it easier to get going,
  but it is still work and still requires out-of-distro software.

- He didn't try Intel.  Intel GPUs are ubiquitous but not popular for
  GPU computing since all the units are integrated and thus limited in
  crunch.

  Intel, being behind, is the nicest player.
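
Whichever vendor you end up with, a tiny device-query program is a
quick sanity check that the installed stack actually sees the GPU.
This is a CUDA version of my own (AMD's HIP offers a near-identical
call for each of these):

#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("CUDA not usable: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < count; i++) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        printf("device %d: %s, %zu MB, compute %d.%d\n",
               i, p.name, (size_t)(p.totalGlobalMem >> 20),
               p.major, p.minor);
    }
    return 0;
}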

