Where's the culprit?

Jarl Stefansson jarl.stefansson-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org
Mon Sep 5 04:43:11 UTC 2011


Looks to me like the first problem is at:

	BUG: unable to handle kernel paging request at ffffdfff

This could indicate a memory problem. I would open the box and make
sure all of your RAM modules are still properly seated, and run a full
memtest if you can.
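
On the "SMP #1" / "SMP #2" reading in the quoted text below: the number
in brackets on the "Oops:" line counts oopses since boot, and "SMP" only
says the kernel was built with SMP support. The four hex digits are the
x86 page fault error code. Here is a minimal sketch of decoding that
line, assuming the format shown in the quoted log; decode_oops is a
name made up for illustration. The first oops header is not in the
quoted log, so only the second one can be decoded this way.

```python
import re

# Matches the oops header as it appears in the quoted log, e.g.
# "Oops: 0002 [#2] SMP". The error code is printed in hexadecimal.
OOPS_RE = re.compile(r"Oops: ([0-9a-fA-F]{4}) \[#(\d+)\]")


def decode_oops(line):
    """Decode an x86 oops header line (hypothetical helper)."""
    m = OOPS_RE.search(line)
    if m is None:
        return None
    code = int(m.group(1), 16)
    return {
        # [#N] is the number of oopses since boot, not a CPU number.
        "count": int(m.group(2)),
        # Bit 0: 0 = page not present, 1 = protection violation.
        "cause": "protection violation" if code & 1 else "page not present",
        # Bit 1: 0 = read access, 1 = write access.
        "access": "write" if code & 2 else "read",
        # Bit 2: 0 = fault in kernel mode, 1 = fault in user mode.
        "mode": "user" if code & 4 else "kernel",
    }


if __name__ == "__main__":
    print(decode_oops("Sep  4 17:42:35 theseus kernel: Oops: 0002 [#2] SMP"))
```

For the 17:42 oops this reports a kernel-mode write to a page that was
not present, and that it was the second oops since boot.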

Jarl

On Sun, Sep 4, 2011 at 10:19 PM, Peter King <peter.king.1-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org> wrote:
> On Sun, Sep 04, 2011 at 09:57:07PM -0400, Lennart Sorensen wrote:
>
>> > Sep  4 15:44:00 theseus kernel: Pid: 16344, comm: sed Not tainted 3.0.3-gentoo #6 System Manufacturer System Name/A7V8X-X
>> > Sep  4 15:44:00 theseus kernel: EIP: 0060:[<c10920fb>] EFLAGS: 00210286 CPU: 0
>> > Sep  4 15:44:00 theseus kernel: EIP is at __destroy_inode+0x29/0x62
>> > Sep  4 15:44:00 theseus kernel: EAX: ffffdfff EBX: db04bb88 ECX: 00000003 EDX: ffffdffe
>> > Sep  4 15:44:00 theseus kernel: ESI: db04bb88 EDI: 00000000 EBP: db04bb88 ESP: ef6e7f44
>> > Sep  4 15:44:00 theseus kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
>> > Sep  4 15:44:00 theseus kernel: Process sed (pid: 16344, ti=ef6e6000 task=f5465be0 task.ti=ef6e6000)
>> > Sep  4 15:44:00 theseus kernel: Stack:
>> > Sep  4 15:44:00 theseus kernel: db04bb88 c109214a db3dfb14 c108f389 db3dfb14 00000000 db04bb88 c10911a2
>> > Sep  4 15:44:00 theseus kernel: f4e4e6c0 00000010 db04bb88 c10831e8 00000020 f4e4e6c8 db3dfb14 f5416620
>> > Sep  4 15:44:00 theseus kernel: f4e4e6c0 00000000 f542d660 ef6e6000 c108091d f542d660 00000000 00000000
>> > Sep  4 15:44:00 theseus kernel: Call Trace:
>> > Sep  4 15:44:00 theseus kernel: [<c109214a>] ? destroy_inode+0x16/0x37
>> > Sep  4 15:44:00 theseus kernel: [<c108f389>] ? d_kill+0x93/0xa5
>> > Sep  4 15:44:00 theseus kernel: [<c10911a2>] ? dput+0xf7/0x100
>> > Sep  4 15:44:00 theseus kernel: [<c10831e8>] ? fput+0x198/0x1b0
>> > Sep  4 15:44:00 theseus kernel: [<c108091d>] ? filp_close+0x54/0x5a
>> > Sep  4 15:44:00 theseus kernel: [<c108097b>] ? sys_close+0x58/0x85
>> > Sep  4 15:44:00 theseus kernel: [<c13bb490>] ? sysenter_do_call+0x12/0x26
>>
>> So this was the interesting bit.  Something called close on a file,
>> which then caused an inode to be removed, and then the kernel blew up.
>>
>> Would have been nice if it had said what pid 16344 was.  I thought it
>> usually did.  Maybe the kernel is missing some config for that.
>
> I take it, then, that it isn't sed (which is mentioned on the top line). It may well be that some
> debugging configuration is absent.
>
>> > Sep  4 17:42:35 theseus kernel: BUG: unable to handle kernel NULL pointer dereference at   (null)
>> > Sep  4 17:42:35 theseus kernel: IP: [<f564c0c0>] 0xf564c0bf
>> > Sep  4 17:42:35 theseus kernel: *pde = 00000000
>> > Sep  4 17:42:35 theseus kernel: Oops: 0002 [#2] SMP
>> > Sep  4 17:42:35 theseus kernel: Modules linked in: e1000
>> > Sep  4 17:42:35 theseus kernel:
>> > Sep  4 17:42:35 theseus kernel: Pid: 17180, comm: emerge Tainted: G      D     3.0.3-gentoo #6 System Manufacturer System Name/A7V8X-X
>> > Sep  4 17:42:35 theseus kernel: EIP: 0060:[<f564c0c0>] EFLAGS: 00210256 CPU: 0
>> > Sep  4 17:42:35 theseus kernel: EIP is at 0xf564c0c0
>> > Sep  4 17:42:35 theseus kernel: EAX: 00000000 EBX: 0000013e ECX: f564c390 EDX: 00200286
>>
>> This one strangely doesn't provide any useful info.
>
> Again, I guess it isn't emerge, although I can attest that emerge was running at the
> time.
>
>> > ...and then silence. The machine has crashed and required a reboot at this
>> > point. Looks to me like there is a problem with a "kernel paging request" in
>> > SMP #1, leading to the first oops at 15:44. Then, amazingly, it chugs along
>> > more or less well until 17:42, when there is a "kernel NULL pointer dereference"
>> > in SMP #2, which brings the whole thing down.
>> >
>> > I can't get a sense from that whether it's the modules linked in (namely e1000),
>> > or syslog-ng, or something else. Anyone?
>>
>> SMP simply means symmetric multiprocessing, which pretty much all systems
>> are these days.
>
> I know about SMP, but I don't know why these are referenced to apparently different
> sources, since the processor is single-core. Perhaps because it's running make -j2.
>
> Well, now I'm less certain than ever where the problem lies. There were a few notes
> on kernel oopses and e1000, but mostly back in 2.4 days (although one in 2.6.26 IIRC).
> Puzzling.
>
> --
> Peter King                              peter.king-H217xnMUJC0sA/PxXw9srA at public.gmane.org
> Department of Philosophy
> 170 St. George Street #521
> The University of Toronto                   (416)-978-4951 ofc
> Toronto, ON  M5R 2M8
>       CANADA
>
> http://individual.utoronto.ca/pking/
>
> =========================================================================
> GPG keyID 0x7587EC42 (2B14 A355 46BC 2A16 D0BC  36F5 1FE6 D32A 7587 EC42)
> gpg --keyserver pgp.mit.edu --recv-keys 7587EC42
>



-- 
Regards,

Jarl Stefansson
jarl.stefansson-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org
+1-647-869-6908
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists




