[GTALUG] Crashes

ac ac at main.me
Tue Jan 31 09:23:44 EST 2017


On Tue, 31 Jan 2017 09:07:35 -0500
Giles Orr via talk <talk at gtalug.org> wrote:

> My primary machine is crashing with increasing frequency.  The
> commonest error I'm seeing in the log looks like this:
> 

My 1c observation (with limited data): check your drive. I had similar
soft lockups just before a head crash (close to the start of the
partition). As I said, YMMV :)
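
For what it's worth, a rough sketch of how one might check the drive,
assuming smartmontools is installed and the disk is /dev/sda (both are
assumptions on my part, nothing in the report says so). It parses the
attribute table that `smartctl -A` prints and flags nonzero raw values
on a few attributes that commonly precede a failure. The function names
and the watched list are only for illustration.

```python
import subprocess

# Attributes whose nonzero raw value commonly hints at a failing drive.
# This short list is an illustrative choice, not an exhaustive one.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def suspicious_attributes(smartctl_output):
    """Return {attribute_name: raw_value} for watched attributes with raw value > 0."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found


def read_smart(device="/dev/sda"):
    """Run `smartctl -A` on the device (usually needs root).

    The default device path is an assumption; adjust it for the machine.
    """
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return result.stdout
```

Calling suspicious_attributes(read_smart()) as root gives back an empty
dict when none of the watched counters has moved. An empty result does
not prove the drive is healthy, it only means these particular counters
are still at zero.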

hth 

Andre

> Jan 29 18:29:39 toshi7 kernel: nouveau 0000:01:00.0: DRM: suspending
> kernel object tree...
> Jan 29 18:30:00 toshi7 kernel: NMI watchdog: BUG: soft lockup - CPU#3
> stuck for 23s! [kscreenlocker_g:19647]
> Jan 29 18:30:00 toshi7 kernel: Modules linked in: fuse uas usb_storage
> rfcomm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set
> nfnetlink ebtable_broute bridge stp llc ebtable_nat ip6table_nat
> nf_conntrack ...
> 
> I realize that I'm probably not giving enough information, but pasting
> large chunks of log files would be just as counterproductive in its
> own way.  I've seen this one A LOT - and sometimes I get it and the
> machine goes hours (but not days) before crashing.  So ... is
> kscreenlocker likely to be the problem here?  When I searched for "BUG
> soft lockup CPU stuck for" on Google, the top result had exactly the
> same number of seconds, and said that replacing the power supply fixed
> the problem.  Which is a step I'd probably be willing to take, but
> this isn't a desktop, it's a laptop.  So I'd want to be very sure as
> the power supply is unique to this machine (if it's available at all)
> and probably quite expensive.
> 
> The processor:
> 
> Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (4594 bogomips)
> current speed: 1274MHz, 4 cores, 8 threads
> 
> While it's not a current gen processor, this is still a good machine
> and I'd rather fix it than toss it.
> 
> Got an immediate crash this morning, and to my surprise the error was
> very different:
> 
> Jan 31 07:56:35 toshi7 kernel: ------------[ cut here ]------------
> Jan 31 07:56:35 toshi7 kernel: kernel BUG at lib/radix-tree.c:769!
> Jan 31 07:56:35 toshi7 kernel: invalid opcode: 0000 [#1] SMP
> Jan 31 07:56:35 toshi7 kernel: Modules linked in: uas usb_storage
> rfcomm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set
> nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat
> nf_conntrack_ipv6 ...
> 
> Finally, I'm also getting this periodically:
> 
> Jan 28 08:49:52 toshi7 kernel: CPU2: Core temperature above threshold,
> cpu clock throttled (total events = 1)
> Jan 28 08:49:52 toshi7 kernel: CPU6: Core temperature above threshold,
> cpu clock throttled (total events = 1)
> Jan 28 08:49:52 toshi7 kernel: CPU7: Package temperature above
> threshold, cpu clock throttled (total events = 1)
> Jan 28 08:49:52 toshi7 kernel: CPU4: Package temperature above
> threshold, cpu clock throttled (total events = 1)
> Jan 28 08:49:52 toshi7 kernel: CPU1: Package temperature above
> threshold, cpu clock throttled (total events = 1)
> Jan 28 08:49:52 toshi7 kernel: CPU5: Package temperature above
> threshold, cpu clock throttled (total events = 1)
> Jan 28 08:49:52 toshi7 kernel: CPU3: Package temperature above
> threshold, cpu clock throttled (total events = 1)
> Jan 28 08:49:52 toshi7 kernel: CPU0: Package temperature above
> threshold, cpu clock throttled (total events = 1)
> Jan 28 08:49:52 toshi7 kernel: CPU6: Package temperature above
> threshold, cpu clock throttled (total events = 1)
> Jan 28 08:49:52 toshi7 kernel: mce: [Hardware Error]: Machine check
> events logged
> Jan 28 08:49:52 toshi7 kernel: CPU2: Package temperature above
> threshold, cpu clock throttled (total events = 1)
> Jan 28 08:49:52 toshi7 kernel: mce: [Hardware Error]: Machine check
> events logged
> Jan 28 08:49:52 toshi7 kernel: CPU6: Core temperature/speed normal
> Jan 28 08:49:52 toshi7 kernel: CPU2: Core temperature/speed normal
> Jan 28 08:49:52 toshi7 kernel: CPU4: Package temperature/speed normal
> Jan 28 08:49:52 toshi7 kernel: CPU5: Package temperature/speed normal
> Jan 28 08:49:52 toshi7 kernel: CPU1: Package temperature/speed normal
> Jan 28 08:49:52 toshi7 kernel: CPU3: Package temperature/speed normal
> Jan 28 08:49:52 toshi7 kernel: CPU7: Package temperature/speed normal
> Jan 28 08:49:52 toshi7 kernel: CPU0: Package temperature/speed normal
> Jan 28 08:49:52 toshi7 kernel: CPU2: Package temperature/speed normal
> Jan 28 08:49:52 toshi7 kernel: CPU6: Package temperature/speed normal
> 
> This suggests that it's overheating, throttling, and recovering pretty
> much instantaneously: my thought is that it's probably not a problem,
> but I thought I should check.
> 
> How should I proceed from here:
> - the processor is going funny, replace it
> - junk the laptop, it's toast
> - debug further (how?)
> - replace the power supply
> - uninstall kscreenlocker and see what happens
> 
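
On "debug further (how?)": one low-effort step is to tally how often
each kind of event shows up per day, to see whether the lockups track
the thermal events or turn up on their own. A minimal sketch, assuming
the kernel log has been saved with one message per line (not wrapped as
in this mail) and that each line starts with a "Mon DD" timestamp like
the excerpts above. The function name and the category labels are mine,
for illustration only.

```python
import re
from collections import Counter

# Patterns taken from the kernel messages quoted in this thread.
PATTERNS = {
    "soft_lockup": re.compile(r"BUG: soft lockup"),
    "kernel_bug": re.compile(r"kernel BUG at"),
    "thermal_throttle": re.compile(r"temperature above threshold"),
    "machine_check": re.compile(r"mce: \[Hardware Error\]"),
}


def tally_events(lines):
    """Count matching kernel messages, keyed by (day, category)."""
    counts = Counter()
    for line in lines:
        # The first two whitespace-separated fields form the day, e.g. "Jan 29".
        day = " ".join(line.split()[:2])
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[(day, label)] += 1
    return counts
```

Feeding it the lines of a saved log, for example the output of
`journalctl -k` redirected to a file (assuming a systemd machine), gives
a Counter that can be sorted by day. If lockups only appear on days with
many throttle events, heat becomes the more likely suspect. If they
appear without them, the drive or memory deserves a closer look.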


