RHEL kernel patch backport [was Re: Are you running Linux as your desktop?]

Lennart Sorensen lsorense-1wCw9BSqJbv44Nm34jS7GywD8/FfD2ys at public.gmane.org
Mon Nov 15 23:06:36 UTC 2010


On Mon, Nov 15, 2010 at 02:25:18PM -0500, D. Hugh Redelmeier wrote:
> The story is long and probably boring to most folks.  But sometimes
> one can learn from others' travails, so here's a summary.
> 
> My original summary "half a change" isn't accurate enough.  Even this new 
> summary won't be 100% accurate since I'm basing it on my decaying memory.
> 
> My CentOS box is an old first-generation HP AMD64 box.
> <http://h10025.www1.hp.com/ewfrf/wc/softwareCategory?lc=en&dlc=en&cc=ca&os=228&product=404646>
> 
> Optional boring details about my computer; skip if you feel like it:
> 
>     I bought it in 2004 or 2005 from the precursor of TechSource,
>     debranded and refurbished
>     <http://www.techsourcecanada.ca/store/flyer.html>.  It was probably
>     a customer return from a US retail store, sold off in bulk.
>     "Debranded" meant that the HP branding was covered up or removed
>     and no software was included, not even Windows.  An awesomely
>     inexpensive way for me to get into the AMD64 world.  The
>     motherboard was made for HP by Asus.  It turns out to be a great
>     box: quiet and reliable.  I use it as a server now, hence CentOS.
> 
> The ACPI system allows the BIOS to export functionality to any willing
> OS.  This is a great idea: the original way of exporting
> functionality, code entry-points accessed through INT instructions,
> required that your CPU be in the stupidest mode (i.e. 16-bit mode with
> no memory mapping).  The way ACPI does this is to use a specified
> pseudo-machine code and have each OS include an interpreter for this
> machine code.  Intel even provides and maintains tools for this ACPI
> machine code that work in Linux and Windows (assemblers,
> disassemblers, and interpreters).
> 
> It is up to the machine maker to write/customize/maintain the ACPI
> code itself that is embedded in the BIOS.  Sadly, like all BIOS
> functionality, most manufacturers whack on the code until MS Windows
> seems to work and then never look back.  It turns out that that leaves
> several problems for Linux machines.
> 
> Linux ACPI support has to deal with broken ACPI code: otherwise the
> machines won't be supported.
> 
> My machine's ACPI has a kind of breakage (I think).  There are two
> ways of determining the number of entries in the table that specifies
> how many power states there are.  By one method, all is well.  By
> another, there are bad entries at the end that must be ignored.  No
> problem: the Linux kernel's AMD64 ACPI code ignores bad entries (at
> least the kind I have).
> 
> Apparently the Linux kernel's Intel 64 ACPI code does not ignore bad
> entries.  At some point, some Xeon BIOSes were produced with such bad
> entries.  The Linux kernel hung on those machines.  As a fix, the
> Linux kernel folks put sanity checks in for these table entries,
> common to AMD64 and Intel 64, upstream of where the control diverges.
> This sanity check says: if any entry is bad, consider the whole table
> to be bad.
> 
> With the table ignored, Linux would no longer run my server at less
> than full clock rate.  More precisely, Linux didn't know how to change
> the clock rate so it left it in the initial speed, full.
> 
> In the kernel.org kernel, a second change was made.  Before the sanity
> check was done, the length of the table was calculated to be the
> lesser of the lengths yielded by the two ways of determining table
> size.  Since the bad entries on my machine are beyond one of these
> lengths, my machine would operate as expected if this second change
> were included.
> 
> Unfortunately, RHEL only backported the first change.  So my computer
> does not get properly throttled on RHEL or CentOS.
> 
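As a rough illustration of the two changes described above, here is a toy
model in Python.  It is not the kernel's actual code: the function names,
the frequencies, the rule that a zero frequency marks a bad entry, and the
assumption that the sanity check alone walks the longer of the two lengths
are all made up for the example.

```python
from typing import List


def entry_is_sane(freq_mhz: int) -> bool:
    # Hypothetical validity rule: a usable state has a positive frequency.
    return freq_mhz > 0


def first_change_only(entries: List[int], count: int) -> List[int]:
    """Sanity check alone: a single bad entry discards the whole table."""
    table = entries[:count]
    if not all(entry_is_sane(e) for e in table):
        return []  # table ignored, so no frequency scaling is possible
    return table


def both_changes(entries: List[int], count_a: int, count_b: int) -> List[int]:
    """Clamp to the lesser of the two lengths, then run the same check."""
    return first_change_only(entries, min(count_a, count_b))


# Four good entries followed by two bad ones; one way of counting reports
# 6 entries, the other reports 4 (assumed values).
raw = [2200, 2000, 1800, 1000, 0, 0]

print(first_change_only(raw, 6))
print(both_changes(raw, 6, 4))
```

In this model the first call returns an empty table (the clock stays at
full speed), while the second keeps the four good entries because the bad
ones fall beyond the shorter length.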
> 
> This showed up about a year and a half ago -- 4+ years into the life
> of the computer.  Probably few of them are still in service running
> RHEL/CentOS.
> 
> Googling found me no other reports of this problem.
> 
> Figuring this out took me a long time.  Convincing others took me a
> long time.  I eventually reported it to the kernel.org kernel bugzilla only
> to have the experts come back and point out that it couldn't happen in
> that kernel (due to the second change).  I then took that report back
> to Red Hat and that was enough to get them to finally see that there
> was a problem.
> 
> Reporting this to CentOS was worse than useless.  They will not
> diverge from RHEL (a good thing).  But reporting and discussing did
> take my time (and theirs).  Red Hat seems fairly open to reports of
> bugs from CentOS users.  Wow.
> 
> My CentOS bug report:
> <https://www.centos.org/modules/newbb/viewtopic.php?viewmode=flat&topic_id=22341&forum=44>
> 
> RHEL bz that culminated in the fix that broke my system:
> <https://bugzilla.redhat.com/show_bug.cgi?id=500311>
> 
> My RHEL bz entry:
> <https://bugzilla.redhat.com/show_bug.cgi?id=559357>
> 
> My kernel.org bz entry.  Oops.
> <https://bugzilla.kernel.org/show_bug.cgi?id=15174>
> Note: Zhang Rui and Bob Moore are at Intel and are kernel
> developers.
> 
> Should Red Hat backport the second change?
> 
> - kernel patch backporting can be dangerous.  Heck, my problem is an
>   example of that.  Skilled/experienced kernel folks are already busy
>   so it might fall onto an inexperienced person.
> 
> - the current situation is only known to hurt one non-customer who
>   knows a work-around.  True, others may experience this, but where's
>   the evidence?

Well, if it was an old HP AMD64 (probably with an ATI chipset, as HP
often used that), then yeah, those did tend to be quite a pain and I do
remember tons of kernel patches attempting to fix those types of problems,
usually not very successfully.  Probably not the kind of hardware people
would normally buy for business use.  Fair enough then, I suppose.

I must admit, I am not convinced Red Hat's kernel choice for RHEL is the
right one.  Sticking with one kernel for 5 to 7 years and continuously
backporting new features and drivers to it just doesn't make sense to me.
Either stick with the kernel as is and say this release is for hardware
that was available at release time, and newer hardware will need to use
a newer release or at least a newer kernel, or just upgrade the kernel
once in a while.  After all, those backports can be risky, quite possibly
more so than a complete new kernel would be.  I think they are giving
people a false sense of stability by keeping the old kernel version.

-- 
Len Sorensen
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists




