[GTALUG] Coffee Lake GPU support [was: IBM Mainframe and z/OS]

D. Hugh Redelmeier hugh at mimosa.com
Sun Dec 10 00:38:27 EST 2017


| From: Russell via talk <talk at gtalug.org>

| On December 9, 2017 1:58:58 PM EST, "D. Hugh Redelmeier via talk" <talk at gtalug.org> wrote:

| >| Most recently, after trying several kernel taints
| >
| >What does that mean?  As far as I understand, the kernel reports
| >itself tainted if it observes any of a number of distasteful things.
| >Like loading proprietary kernel modules.  The idea is that kernel
| >maintainers will ignore problem reports from folks running such
| >modules.
| 
| I tried some recommendations, adding i915.alpha_support=1 and a few 
| others on boot. I was eventually able to change display resolutions, but 
| this triggered a kernel crash. The system recovered but I didn't even 
| see the ABRT report until I updated F27.

I still don't understand what you are calling "kernel taints".

I recommended i915.alpha_support=1 because I remembered that you got a
Coffee Lake processor but forgot that you also got a discrete card.  You
are not using the Intel GPU, so don't use that parameter.  I do hope it
is harmless.
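
If you want to confirm what the running kernel was actually booted
with, the command line is exposed in /proc/cmdline.  A minimal Python
sketch for pulling the parameters out of it follows.  The function
names are my own invention, and it assumes plain whitespace-separated
parameters (no quoted values containing spaces).

```python
def parse_cmdline(text):
    """Split a kernel command line into a {name: value} dict.

    Bare flags such as "quiet" map to None.  Assumes no quoted values
    containing spaces, which is a simplification.
    """
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return parse_cmdline(f.read())
```

Then read_cmdline().get("i915.alpha_support") tells you whether the
parameter is still set after you edit your boot configuration.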

| Now I'm at 4.13.16-302.fc27.x86_64. running Nvidia using their own 
| driver. I've tainted the kernel by blacklisting Nouveau, which in fact 
| appears to have provided me with more resolution choices. I may revert 
| soon.

No, you have not tainted the kernel by blacklisting Nouveau.  You've
tainted it by loading the Nvidia module.  That matches the meaning of
tainting as I understand it and have explained.
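
The kernel exposes its taint state as a bitmask in
/proc/sys/kernel/tainted.  Here is a rough Python sketch for decoding
it.  The bit table is abridged from the kernel's tainted-kernels
documentation as I remember it; the function names are mine.

```python
# Abridged bit table; letters are the ones the kernel prints in oops reports.
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    1: ("F", "module was force loaded"),
    7: ("D", "kernel died recently (oops or BUG)"),
    8: ("A", "ACPI table overridden by user"),
    9: ("W", "kernel issued a warning"),
    10: ("C", "staging driver was loaded"),
    12: ("O", "out-of-tree module was loaded"),
    13: ("E", "unsigned module was loaded"),
}


def decode_taint(value):
    """Return (letter, description) pairs for each known bit set in value."""
    return [TAINT_FLAGS[bit] for bit in sorted(TAINT_FLAGS) if value & (1 << bit)]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the running kernel's taint bitmask."""
    with open(path) as f:
        return int(f.read().strip())
```

Calling decode_taint(read_taint()) on a machine with the Nvidia module
loaded should include the P flag.  Blacklisting Nouveau sets no bit at
all.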

I too use the Nvidia driver, reluctantly.  My X (or Wayland) craps out
when I use Nouveau.  This has been the case for some years, through
many versions of Fedora.  I'm not happy with this but am unwilling to
spend the time required to provide a decent bug report.  My most
recent experience was better, but no cigar.  I use a GTX 650 to drive
an UltraHD TV set at 30Hz.  I'm guessing that the latest crash was due
to running out of display memory -- it happened when I restarted
Firefox with perhaps hundreds of tabs.

I understand that Nouveau isn't 100% capable on current Nvidia cards
because Nvidia hasn't disclosed newer clocking/power-management
features and they haven't made available the current firmware blob
necessary.  I don't think that these problems apply for my ancient
card.

Unfortunately, AMD cards aren't completely open either.  For example,
the method for getting sound on HDMI was undisclosed.  That may have
been fixed (someone reverse engineered a bit of that).  But it is
indicative of the gotchas.

The best GPU player, generally, has been Intel.  Probably because they
are playing from behind.  They did go through a bad patch where the
drivers were unreliable but that was a few years ago.

| I chose F27 for this build because I was very impressed with Gnome on 
| Wayland on my HP-110. It handles display much better than Mint or Bunsen 
| etc on it.

The proprietary Nvidia driver precludes Wayland.  Some day that may
change but it is out of our control.

| >| on my Intel gpu, which 
| >| is having font rendering issues, I dumped my DSDT table and found 
| >| namespace/pstate conflicts in returning zero as serialized data.
| >
| >I'm not sure what "returning zero as serialized data" means in this
| >context. 
| >
| >There are lots of buggy ACPI tables.  If you are lucky, you can ignore
| >them.
| >
| >You can disassemble ACPI tables and recompile them and get a surprising
| >number of compiler warnings.  I vaguely remember hearing the firmware
| >developers generally use a Microsoft ACPI compiler but we Linux users
| >use an Intel compiler, and it flags more errors (or at least different
| >errors) than the Microsoft one.
| >
| >One of the common errors is ACPI routines falling off the end rather
| >than returning a value.
| >
| >Does ACPI have anything to say about GPUs?
| 
| Not specifically, there are method, object and value references, which 
| I'm currently trying to get an understanding of. Keeping pstate values 
| stable and ordered, for finer grained control over reporting power 
| management, seems to conflict with some methods.

Keep separate problems separate, if possible.

ACPI and your GPU problems are likely separate.

If you are using the Intel GPU after all, then you must have removed your 
Nvidia card.  At that point the kernel parameter I mentioned probably 
becomes relevant.

| This warning, which looks to me like it's saying I'm getting 1 when I 
| expect 0, is related to one ACPI error repeated seven times.
| 
| This is where my base math skills fall down. Warning 4089. Is it a 0 
| object or a zero method?

That doesn't seem like a "math skill" to me.  That terminology just seems 
to cloud the problem solving.  (Note to self: jokes while trying to 
disentangle a confused communication probably make things harder.)

When in doubt, google error messages.  But be prepared to be skeptical
about the answers.

You should also specify what is generating the diagnostics.  The
kernel?  An ACPI compiler?  It looks to me like the latter.  If you
are recompiling ACPI tables, I think that you are going in the wrong
direction for solving GPU problems.

If you think that the firmware is screwed up, look for a newer
version.  Self-help with ACPI tables is a desperation move (and
potentially educational).  The biggest ACPI issues are on notebooks.

I get errors like the following on my desktop machine.  They come from
the kernel, on every boot.  They are apparently due to a kernel bug.
They don't seem to break anything.

[    3.018775] ACPI Error: Field [D128] at bit offset/length 128/1024 exceeds size of target Buffer (160 bits) (20170531/dsopcode-235)
[    3.018830] ACPI Error: Method parse/execution failed \HWMC, AE_AML_BUFFER_LIMIT (20170531/psparse-550)
[    3.018873] ACPI Error: Method parse/execution failed \_SB.WMID.WMAA, AE_AML_BUFFER_LIMIT (20170531/psparse-550)
[    3.018962] ACPI Error: Field [D128] at bit offset/length 128/1024 exceeds size of target Buffer (160 bits) (20170531/dsopcode-235)
[    3.019010] ACPI Error: Method parse/execution failed \HWMC, AE_AML_BUFFER_LIMIT (20170531/psparse-550)
[    3.019050] ACPI Error: Method parse/execution failed \_SB.WMID.WMAA, AE_AML_BUFFER_LIMIT (20170531/psparse-550)
[    3.019137] ACPI Error: Field [D128] at bit offset/length 128/1024 exceeds size of target Buffer (160 bits) (20170531/dsopcode-235)
[    3.019179] ACPI Error: Method parse/execution failed \HWMC, AE_AML_BUFFER_LIMIT (20170531/psparse-550)
[    3.019219] ACPI Error: Method parse/execution failed \_SB.WMID.WMAA, AE_AML_BUFFER_LIMIT (20170531/psparse-550)
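
Note that the nine lines above are really three distinct messages, each
repeated three times.  A small Python sketch that strips the timestamps
and counts the distinct messages, and that does the arithmetic on the
Field error, might look like this.  The regular expressions are fitted
only to the lines shown here, and the function names are mine.

```python
import re
from collections import Counter

TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")
FIELD_RE = re.compile(
    r"ACPI Error: Field \[(\w+)\] at bit offset/length (\d+)/(\d+) "
    r"exceeds size of target Buffer \((\d+) bits\)"
)


def summarize(lines):
    """Count distinct ACPI error messages, ignoring the boot timestamp."""
    counts = Counter()
    for line in lines:
        msg = TIMESTAMP_RE.sub("", line.strip())
        if msg.startswith("ACPI Error:"):
            counts[msg] += 1
    return counts


def field_overrun(line):
    """Return (field name, bits past the end of the buffer), or None."""
    m = FIELD_RE.search(line)
    if m is None:
        return None
    offset, length, size = (int(g) for g in m.group(2, 3, 4))
    return m.group(1), offset + length - size
```

For the D128 line, the field starts at bit 128 and is 1024 bits long, so
it ends at bit 1152, well past the end of a 160-bit buffer.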


| Line 4093 =length appears to exceed line 4091 =range maximum

ACPI is kind of an assembly language.  Use your assembly-language 
intuition for dealing with these diagnostics.  The messages are mostly 
clear but they refer to rules that you must find in the ACPI standards.

