<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<title></title>

</head>

<body>

<div name="messageBodySection">

<div dir="auto">Thanks, Hugh! Nicely explained.</div>

</div>

<div name="messageSignatureSection"><br />

<div class="matchFont">../Dave</div>

</div>

<div name="messageReplySection">On Jul 14, 2020, 10:15 AM -0400, D. Hugh Redelmeier via talk &lt;talk@gtalug.org&gt;, wrote:<br />

<blockquote type="cite" style="border-left-color: grey; border-left-width: thin; border-left-style: solid; margin: 5px 5px;padding-left: 10px;">| From: David Mason via talk &lt;talk@gtalug.org&gt;<br />

<br />

| The short answer is: Machine Learning (and other data-mining-like applications)<br />

<br />

A much LONGER answer:<br />

<br />

There has been a field of Computing on GPUs for perhaps a dozen years.<br />

GPUs have evolved into having a LOT of Floating Point units that can<br />

act simultaneously, mostly in lock-step.<br />

<br />

They are nasty to program: conventional high-level languages and<br />

programmers aren't very good at exploiting GPUs.<br />

<br />

NVidia's Cuda (dominant) and the industry standard OpenCL (struggling)<br />

are used to program the combination of the host CPU and the GPU.<br />

<br />

Generally, a set of subroutines is written to exploit a GPU and those<br />

subroutines get called by conventional programs. Examples of such<br />

libraries: TensorFlow, PyTorch, and cuBLAS (NVidia's GPU analogue of CPU<br />

BLAS libraries such as OpenBLAS). The first two are for machine learning.<br />

<br />

Some challenges GPU programmers face:<br />

<br />

- GPUs cannot do everything that programmers are used to. A program<br />

using a GPU must be composed of a Host CPU program and a GPU<br />

program. (Some languages let you do the split within a single<br />

program, but there still is a split.)<br />

<br />

- GPU programming requires a lot of effort designing how data gets<br />

shuffled in and out of the GPU's dedicated memory. Without care,<br />

the time eaten by this can easily overwhelm the time saved by using a<br />

GPU instead of just the host CPU.<br />

<br />

Like any performance problem, one needs to measure to get an<br />

accurate understanding. The result might easily suggest massive<br />

changes to a program.<br />

<br />

- Each GPU links its ALUs into fixed-size groups. Problems must be<br />

mapped onto these groups, even if that isn't natural. Typical sizes<br />

are 32 ALUs (NVidia) or 64 ALUs (AMD). Each ALU in a group is either<br />

executing the same instruction as the others, or is idled.<br />

<br />

OpenCL and Cuda help the programmer create doubly-nested loops that<br />

map well onto this hardware.<br />

<br />

Lots of compute-intensive algorithms are not easy to break down into this<br />

structure.<br />

<br />

- GPUs are not very good at conventional control flow, and it behaves<br />

differently from what most programmers expect. For example, when an<br />

"if" is executed, all compute elements in a group are tied up, even<br />

those that do not take the branch. Think how this applies to loops.<br />

<br />

- Each GPU is kind of different, so it is hard to program generically.<br />

This is made worse by the fact that Cuda, the most popular language,<br />

is proprietary to NVidia. Lots of politics here.<br />

<br />

- GPUs are not easy to share safely among multiple processes. This<br />

is slowly improving.<br />

<br />

- New GPUs are getting better, so one should perhaps revisit existing<br />

programs regularly.<br />

<br />

- GPU memories are not virtual. If you hit the limit of memory on a<br />

card, you've got to change your program.<br />

<br />

Worse: there is a three or more level hierarchy of fixed-size<br />

memories within the GPU that needs to be explicitly managed.<br />

<br />

- GPU software is oriented to performance. Compile times are long.<br />

Debugging is hard and different.<br />
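<br />

To make the lock-step and "if" points above concrete, here is a toy<br />

simulation in Python, not real GPU code. The names (GROUP_SIZE,<br />

run_group) and the even/odd branch are made up for illustration, and<br />

the group width of 64 is just the size mentioned above.<br />

```python
# Toy model of lock-step execution; not real GPU code.
GROUP_SIZE = 64  # group width taken from the text; real hardware varies


def run_group(values):
    """Run `if v is even: v // 2 else: 3 * v + 1` on one group in lock-step.

    Returns (results, issue_slots, useful_slots).
    """
    assert len(values) == GROUP_SIZE
    take_then = [v % 2 == 0 for v in values]  # per-lane branch mask
    results = list(values)
    issue_slots = 0   # lane-slots occupied, whether working or idled
    useful_slots = 0  # lane-slots that did real work

    # The group walks each side of the "if" that any lane needs.
    for side in (True, False):
        active = [mask == side for mask in take_then]
        if not any(active):
            continue  # no lane wants this side, so the group skips it
        issue_slots += GROUP_SIZE    # every lane is tied up for this pass
        useful_slots += sum(active)  # only the active lanes compute
        for lane, on in enumerate(active):
            if on:
                v = values[lane]
                results[lane] = v // 2 if side else 3 * v + 1
    return results, issue_slots, useful_slots


uniform = [2 * i for i in range(GROUP_SIZE)]  # all lanes take the same side
mixed = list(range(GROUP_SIZE))               # lanes alternate sides

_, uniform_issue, uniform_useful = run_group(uniform)
_, mixed_issue, mixed_useful = run_group(mixed)
print("uniform:", uniform_useful, "useful of", uniform_issue)
print("mixed:  ", mixed_useful, "useful of", mixed_issue)
```

With these assumptions, the uniform input keeps every lane busy, while<br />

the mixed input occupies the group twice and wastes half the slots.<br />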

<br />
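The data-shuffling point above can be put into a back-of-envelope<br />

model, again in Python. Every number here (working-set size, link<br />

speed, speedup) is a hypothetical placeholder, and the model assumes<br />

transfers are not overlapped with compute; measure your own program.<br />

```python
# Back-of-envelope cost model; every number below is a made-up assumption.

def gpu_total_seconds(n_bytes, cpu_seconds, speedup, bus_bytes_per_second):
    """Estimated wall time when the work is offloaded to a GPU.

    Data goes in and results of equal size come back (an assumption),
    and transfers are not overlapped with compute (also an assumption).
    """
    transfer = 2 * n_bytes / bus_bytes_per_second
    compute = cpu_seconds / speedup
    return transfer + compute


n_bytes = 4 * 10**9  # hypothetical 4 GB working set
bus = 10 * 10**9     # hypothetical 10 GB/s host-to-GPU link
speedup = 20         # hypothetical GPU-vs-CPU compute speedup

# A light job: the CPU alone needs 0.5 s, so the copies dominate.
light = gpu_total_seconds(n_bytes, 0.5, speedup, bus)
# A heavy job: the CPU alone needs 60 s, so the copies are noise.
heavy = gpu_total_seconds(n_bytes, 60.0, speedup, bus)

print(f"light: CPU 0.5 s, GPU path {light:.3f} s")
print(f"heavy: CPU 60.0 s, GPU path {heavy:.3f} s")
```

Under these made-up numbers the light job is slower on the GPU path<br />

than on the CPU alone, while the heavy job wins easily.<br />

<br />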

Setting up the hardware and software for GPU computing is stupidly<br />

challenging. Alex gave a talk to GTALUG (video available) about his<br />

playing with this. Here's what I remember:<br />

<br />

- AMD is mostly open source but not part of most distros (why???).<br />

You need to use select distros plus out-of-distro software. Support<br />

for APUs (AMD processor chips with built-in GPUs) is still missing<br />

(dumb).<br />

<br />

- NVidia is closed source. Alex found it easier to get going. Still<br />

work. Still requires out-of-distro software.<br />

<br />

- He didn't try Intel. Ubiquitous but not popular for GPU computing<br />

since all units are integrated and thus limited in crunch.<br />

<br />

Intel, being behind, is the nicest player.<br />

</blockquote>

</div>

</body>

</html>