[GTALUG] Running Dell branded Nvidia gtx 1060 in non-dell system

Mon Aug 12 22:46:51 EDT 2019

On Mon, Aug 12, 2019 at 9:01 PM D. Hugh Redelmeier via talk
<talk at gtalug.org> wrote:
>
> | From: xerofoify via talk <talk at gtalug.org>
> |
> | On Mon, Aug 12, 2019 at 6:11 PM D. Hugh Redelmeier via talk
> | <talk at gtalug.org> wrote:
> | > <https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-1-manual.pdf>
> | >
> | > "Intel(TM) 64 and IA-32 Architectures Software Developer's Manual"
> | > Volume 1 of 9.
> | >
> | > I don't see PCIe mentioned there.  Nor would I expect it.  There is
> | > mention of PCI in an example of using the MOVNTDQA instruction.
> | >
> | It was odd before but instructions can touch or swap with PCI so that's why. PCI
> | is not like USB or other protocols it requires overhead on the CPU side if that
> | makes sense including lanes/instructions to a lesser degree. It may not be
> | mentioned for assembly manuals directly but in other hardware documentation
> | very likely.
>
> The architecture seen by a program is often separated from bus issues.
> PCI has historically been addressed as part of the memory address
> space (as opposed to the IO address space).
>
> Once caches were introduced, software needed to be able to make sure
> that it didn't cause misbehaviour in PCI bus operations.  When talking
> to a device, you usually (but not always) wish the cache to be
> bypassed.
>
> Historically on x86 (post i486), you did that using the MTR Registers.
> I'm sure that has since been changed since there were too few of
> those.  But the ideas are there.  See 18.3.1 "Memory-Mapped I/O".
>
> If you look at 12.10.3 "Streaming Load Hint Instruction", you will see
> a discussion of this issue and the MOVNTDQA instruction.  That's the
> context of the example referencing PCI.  There is no need for the PCIe
> version to bleed into the abstract X86 architecture.
>
> BTW "WC" means "Write Combining".  Memory so-designated (e.g. by an
> MTRR) is uncached but the processor may combine writes.  This, for
> example, is often used for accessing graphics card buffers.  Without
> write combining, many more writes would be required.
>
> Interestingly, on the machine I'm using to compose this email,
> /proc/mtrr shows 7 registers with write-back and one uncachable.  None
> is write combining.
> ---
Hugh,

That's correct. I wasn't sure whether the manual mentions it directly in
relation to PCI Express, but the correct way of doing this is DMA (direct
memory access). The amount that may be buffered from an I/O range these
days depends on the memory model of the CPU, and that includes the assembly
used to access it. This does matter to the discussion, since Alex is dealing
with GPUs and going across the PCI Express bus is very expensive: it's
basically a cache miss for GPUs.

I'm not sure whether MTRRs, or PAT, which seems to be the current version,
would help here, since those are for mapping a region of memory to the CPU,
not for dealing with PCI bus latency issues. You can attempt to map the
region into memory, but then you need to poll the GPU frequently to get the
data, so it doesn't seem as good as DMA, which does not wait on the
processor and simply fills memory directly.
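
As an aside on the /proc/mtrr check Hugh describes above, here is a rough
Python sketch of how one might tally the memory types listed there. It
assumes the usual Linux line layout ("regNN: base=... size=..., count=N:
type"); the function names are mine, purely for illustration, and on a
PAT-only system the file may be empty or missing.

```python
import re
from collections import Counter

# One /proc/mtrr line is assumed to look like:
# reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
MTRR_LINE = re.compile(
    r"^reg(\d+): base=0x([0-9a-fA-F]+) \(\s*\d+MB\), "
    r"size=\s*(\d+)([KMG]B), count=\d+: (.+)$"
)


def count_mtrr_types(text):
    """Count how many MTRR entries in `text` use each memory type."""
    counts = Counter()
    for line in text.splitlines():
        match = MTRR_LINE.match(line.strip())
        if match:
            counts[match.group(5)] += 1
    return counts


def read_mtrr_types(path="/proc/mtrr"):
    """Read the given file (Linux only) and tally its memory types."""
    with open(path) as handle:
        return count_mtrr_types(handle.read())
```

Calling read_mtrr_types() on a Linux box returns a Counter keyed by type
name, so a missing "write-combining" key means no MTRR is set up that way.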

Maybe I misspoke in stating that PCI was mentioned directly, but the memory
models related to this are, and the amount that may be DMAed depends on the
processor architecture.

Sorry for the misunderstanding,

Nick
