[GTALUG] Coffee Lake GPU support [was: IBM Mainframe and z/OS]

Sun Dec 10 07:19:37 EST 2017

On December 10, 2017 12:38:27 AM EST, "D. Hugh Redelmeier via talk" <talk at gtalug.org> wrote:
>| From: Russell via talk <talk at gtalug.org>
>
>| On December 9, 2017 1:58:58 PM EST, "D. Hugh Redelmeier via talk" <talk at gtalug.org> wrote:
>
>| >| Most recently, after trying several kernel taints
>| >
>| >What does that mean?  As far as I understand, the kernel reports
>| >itself tainted if it observes any of a number of distasteful things.
>| >Like loading proprietary kernel modules.  The idea is that kernel
>| >maintainers will ignore problem reports from folks running such
>| >modules.
>| 
>| I tried some recommendations, adding i915.alpha_support=1 and a few
>| others on boot. I was eventually able to change display resolutions, but
>| this triggered a kernel crash. The system recovered but I didn't even
>| see the ABRT report until I updated F27.
>
>I still don't understand what you are calling "kernel taints".
>
>I recommended i915.alpha_support=1 because I remember that you got a
>Coffee Lake processor but forgot you got a discrete card.  You are not
>using the Intel GPU, so don't use that parameter.  I do hope it is
>harmless.

Thanks for taking the time to go over this with me. 

Actually, I didn't get the Nvidia 1050 Ti until a couple of days ago. The first thing I did on first boot after assembly was enable the alpha support for the Intel GPU. Then I ran dnf update and saw ABRT notifications, which pop up in the calendar widget, start displaying on login. Once I realized that DRM wasn't going to be easily available, even in Rawhide, I bought the Nvidia card and used the vendor's install script.

Excellent step-by-step instructions for installing and uninstalling the Nvidia driver on Fedora can be found here, if anyone else needs them. They were recently updated to use systemd targets for the runlevel changes the procedure needs.

https://www.if-not-true-then-false.com/2015/fedora-nvidia-guide/

>
>| Now I'm at 4.13.16-302.fc27.x86_64, running Nvidia using their own
>| driver. I've tainted the kernel by blacklisting Nouveau, which in fact
>| appears to have provided me with more resolution choices. I may revert
>| soon.

I have just reverted to Nouveau. I guess I could have entered EDID data for the display resolutions which the Nvidia default config left out, but the Nvidia driver somehow drove the card so that its cooling fan ran faster. It's mounted vertically and the fan blades are a little unbalanced, so it squeaks a bit at higher speed.

So far going back to Nouveau has at least stopped the squeaking.

>
>No, you have not tainted the kernel by blacklisting Nouveau.  You've
>tainted it by loading the Nvidia module.  That matches the meaning of
>tainting as I understand it and have explained.

Thanks, I get that now.
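
For anyone following along, the taint state can be checked directly rather than guessed at. A minimal sketch in Python, assuming the bit meanings given in the kernel's tainted-kernels documentation (only a subset is listed, and the names decode_taint and read_taint are made up for illustration):

```python
from pathlib import Path

# Bit positions and letter codes from the kernel's tainted-kernels
# documentation; this is a subset, not the full list.
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    1: ("F", "module was force loaded"),
    7: ("D", "kernel died recently (oops or BUG)"),
    8: ("A", "ACPI table overridden by user"),
    9: ("W", "kernel issued a warning"),
    12: ("O", "out-of-tree module was loaded"),
    13: ("E", "unsigned module was loaded"),
}


def decode_taint(value):
    """Return (letter, description) pairs for each known bit set in value."""
    return [flag for bit, flag in sorted(TAINT_FLAGS.items()) if value & (1 << bit)]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint bitmask; 0 means the kernel is not tainted."""
    return int(Path(path).read_text().strip())
```

Calling decode_taint(read_taint()) lists the reasons the running kernel considers itself tainted. Blacklisting a module does not set any of these bits; loading a proprietary one sets bit 0.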

>
>I too use the Nvidia driver, reluctantly.  My X (or Wayland) craps out
>when I use Nouveau.  This has been the case for some years, through
>many versions of Fedora.  I'm not happy with this but am unwilling to
>spend the time required to provide a decent bug report.  My most
>recent experience was better, but no cigar.  I use a GTX 650 to drive
>an UltraHD TV set at 30Hz.  I'm guessing that the latest crash was due
>to running out of display memory -- it happened when I restarted
>firefox with perhaps hundreds of tabs.
>
>I understand that Nouveau isn't 100% capable on current Nvidia cards
>because Nvidia hasn't disclosed newer clocking/power-management
>features and they haven't made available the current firmware blob
>necessary.  I don't think that these problems apply for my ancient
>card.
>
>Unfortunately, AMD cards aren't completely open either.  For example,
>the method for getting sound on HDMI was undisclosed.  That may have
>been fixed (someone reverse engineered a bit of that).  But it is
>indicative of the gotchas.
>
>The best GPU player, generally, has been Intel.  Probably because they
>are playing from behind.  They did go through a bad patch where the
>drivers were unreliable but that was a few years ago.
>
>| I chose F27 for this build because I was very impressed with Gnome on
>| Wayland on my HP-110. It handles display much better than Mint or Bunsen
>| etc on it.
>
>The proprietary Nvidia driver precludes Wayland.  Some day that may
>change but it is out of our control.
>
>| >| on my Intel gpu, which 
>| >| is having font rendering issues, I dumped my DSDT table and found 
>| >| namespace/pstate conflicts in returning zero as serialized data.
>| >
>| >I'm not sure what "returning zero as serialized data" means in this
>| >context. 
>| >
>| >There are lots of buggy ACPI tables.  If you are lucky, you can ignore
>| >them.
>| >
>| >You can disassemble ACPI tables and recompile them and get a surprising
>| >number of compiler warnings.  I vaguely remember hearing the firmware
>| >developers generally use a Microsoft ACPI compiler but we Linux users
>| >use an Intel compiler, and it flags more errors (or at least different
>| >errors) than the Microsoft one.
>| >
>| >One of the common errors is ACPI routines falling off the end rather
>| >than returning a value.
>| >
>| >Does ACPI have anything to say about GPUs?
>| 
>| Not specifically, there are method, object and value references, which
>| I'm currently trying to get an understanding of. Keeping pstate values
>| stable and ordered, for finer grained control over reporting power
>| management, seems to conflict with some methods.
>
>Keep separate problems separate, if possible.
>
>ACPI and your GPU problems are likely separate.
>
>If you are using the Intel GPU after all, then you must have removed your
>Nvidia card.  At that point the kernel parameter I mentioned probably
>becomes relevant.
>
>| This warning, which looks to me like it's saying I'm getting 1 when I
>| expect 0, is related to one ACPI error repeated seven times.
>| 
>| This is where my base math skills fall down. Warning 4089. Is it a 0 
>| object or a zero method?
>
>That doesn't seem like a "math skill" to me.  That terminology just seems
>to cloud the problem solving.  (Note to self: jokes while trying to
>disentangle a confused communication probably make things harder.)
>
>When in doubt, google error messages.  But be prepared to be skeptical
>about the answers.
>
>You should also specify what is generating the diagnostics.  The
>kernel?  An ACPI compiler?  It looks to me like the latter.  If you
>are recompiling ACPI tables, I think that you are going in the wrong
>direction for solving GPU problems.
>
>If you think that the firmware is screwed up, look for a newer
>version.  Self-help with ACPI tables is a desperation move (and
>potentially educational).  The biggest ACPI issues are on notebooks.
>
>I get errors like the following on my desktop machine.  From the
>kernel, on every boot.  They are apparently due to a kernel bug.  They
>don't seem to break anything.
>
>[    3.018775] ACPI Error: Field [D128] at bit offset/length 128/1024 exceeds size of target Buffer (160 bits) (20170531/dsopcode-235)
>[    3.018830] ACPI Error: Method parse/execution failed \HWMC, AE_AML_BUFFER_LIMIT (20170531/psparse-550)
>[    3.018873] ACPI Error: Method parse/execution failed \_SB.WMID.WMAA, AE_AML_BUFFER_LIMIT (20170531/psparse-550)
>[    3.018962] ACPI Error: Field [D128] at bit offset/length 128/1024 exceeds size of target Buffer (160 bits) (20170531/dsopcode-235)
>[    3.019010] ACPI Error: Method parse/execution failed \HWMC, AE_AML_BUFFER_LIMIT (20170531/psparse-550)
>[    3.019050] ACPI Error: Method parse/execution failed \_SB.WMID.WMAA, AE_AML_BUFFER_LIMIT (20170531/psparse-550)
>[    3.019137] ACPI Error: Field [D128] at bit offset/length 128/1024 exceeds size of target Buffer (160 bits) (20170531/dsopcode-235)
>[    3.019179] ACPI Error: Method parse/execution failed \HWMC, AE_AML_BUFFER_LIMIT (20170531/psparse-550)
>[    3.019219] ACPI Error: Method parse/execution failed \_SB.WMID.WMAA, AE_AML_BUFFER_LIMIT (20170531/psparse-550)
>
>
>| Line 4093 =length appears to exceed line 4091 =range maximum
>
>ACPI is kind of an assembly language.  Use your assembly-language 
>intuition for dealing with these diagnostics.  The messages are mostly 
>clear but they refer to rules that you must find in the ACPI standards.
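
The kernel log excerpt quoted earlier is really three distinct messages, each logged three times, which is easier to see once the timestamps are stripped. A minimal sketch in Python, assuming dmesg-style lines with a leading "[seconds.micros]" timestamp (the function name count_acpi_errors is hypothetical):

```python
import re
from collections import Counter

# Leading "[    3.018775] " style timestamp that dmesg puts on each line.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def count_acpi_errors(lines):
    """Count identical 'ACPI Error:' messages, ignoring their timestamps."""
    counts = Counter()
    for line in lines:
        message = TIMESTAMP.sub("", line.strip())
        if message.startswith("ACPI Error:"):
            counts[message] += 1
    return counts
```

Feeding it the output of dmesg, for example by passing sys.stdin with dmesg piped in, gives one entry per distinct message with its repeat count, which separates "one error repeated" from "many different errors".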

Thanks for the support. One thing I noticed immediately after reverting to Nouveau was that ABRT now prompts me to join Bugzilla and file a report. Before this it would tell me I had the option to email the maintainers privately. It seems that the AI approves of my kernel now.

I've installed a Hauppauge WinTV quad tuner. I haven't watched TV in years, but it registered something like 20 OTA stations, and even the credit card remote shows up under /dev/input/by-path (I remember your tip about that, from some audio stuff a few years ago).

I installed KDE to check out Plasma and Kaffeine for the TV.

Plasma generated an X crash report on KDE so I went back to Nouveau. 

The remote is crap. I can't even tell if it is powered on, so I can't really test lircd. A quick check of the log shows no node at /sys/class/rc/rc*.

I did try a udev rule to link /dev/lirc0 to the input event node but no luck yet.
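
Before more udev tinkering, it may help to see what the kernel has actually registered. A minimal sketch in Python, assuming remote-control receivers appear as rc* entries under /sys/class/rc and input devices as symlinks under /dev/input/by-path (both function names are made up, and the root arguments exist only so the functions can be pointed at a test directory):

```python
from pathlib import Path


def list_rc_devices(sys_root="/sys"):
    """Return the rc* nodes the kernel's remote-control core has registered."""
    rc_dir = Path(sys_root) / "class" / "rc"
    if not rc_dir.is_dir():
        return []
    return sorted(p.name for p in rc_dir.glob("rc*"))


def list_input_by_path(dev_root="/dev"):
    """Map each /dev/input/by-path symlink name to the node it resolves to."""
    by_path = Path(dev_root) / "input" / "by-path"
    if not by_path.is_dir():
        return {}
    return {p.name: str(p.resolve()) for p in sorted(by_path.iterdir())}
```

An empty list from list_rc_devices() matches the "no node" message in the log. A remote that appears only in list_input_by_path() is being presented as an ordinary input device, which may be why linking /dev/lirc0 to the event node has not helped.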

So it's a new battery and more tinkering with stuff I'm a little more familiar with.

Thanks,
>---
>Talk Mailing List
>talk at gtalug.org
>https://gtalug.org/mailman/listinfo/talk

-- 
Russell