Help debugging kernel BUG

Lennart Sorensen lsorense-1wCw9BSqJbv44Nm34jS7GywD8/FfD2ys at public.gmane.org
Mon Nov 12 19:07:03 UTC 2007


On Fri, Nov 09, 2007 at 11:00:29AM -0500, Ian Petersen wrote:
> This morning I booted my laptop to a scrolling list of kernel bugs
> that I didn't have last night.  Most of the text zips by too fast to
> read, but, by limiting the boot to single-user, no network, I was able
> to eventually get to a login prompt.  The output from dmesg is the
> same bug report repeated many times.  I've never had to deal with an
> error like this before, so I'm hoping someone can give me some
> direction.
> 
> Here's the output in dmesg (I've typed this by hand, so I hope there
> are no typos):
> 
> kernel BUG at net/core/skbuff.c:95!
> invalid opcode: 0000 [#399]
> PREEMPT SMP
> Modules linked in: msr coretemp hwmon thermal fan button battery ac
> cpufreq_ondemand acpi_cpufreq freq_table processor configs sdhci
> mmc_core sg ehci_hcd rtc soundcore uhci_hcd tg3 usbcore agpgart sr_mod
> cdrom psmouse evdev unix
> CPU:    1
> EIP:    0060:[<c0250fb3>]    Not tainted VLI
> EFLAGS: 00010296   (2.6.22-gentoo-r8 #3)
> EIP is at skb_over_panic+0x59/0x5d
> eax: 00000077   ebx: f6cd0ca8   ecx: c037a600   edx: 000c0b05
> esi: 00000000   edi: 000000a8   ebp: f7b7d980   esp: f7579d64
> ds: 007b   es: 007b   fs: 00d8   gs: 0033   ss: 0068
> Process udevd (pid: 6374, ti=f7578000 task=f798f580 task.ti=f7578000)
> Stack: c02fdaad f883cb9d 000000a8 000000a8 f6cd0ca8 f6cd0c00 f6cd0c00 f6cd0cc0
>        c02ecc7e 00000000 f6cd0c00 f883cba2 f7579dd0 00000000 f7579f44 f7579f60
>        f7579e74 f7710840 f7579ec4 00000022 01cb2f49 000018e6 00000000 00000000
> Call Trace:
>  [<f883cb9d>] unix_dgram_sendmsg+0x1b9/0x42e [unix]
>  [<f883cba2>] unix_dgram_sendmsg+0x1be/0x42e [unix]
>  [<c024d3a9>] sock_sendmsg+0xbc/0xd4
>  [<c0127324>] autoremove_wake_function+0x0/0x35
>  [<c015f758>] dput+0x16/0xe4
>  [<c0157c4c>] __follow_mount+0x1e/0x60
>  [<c0157cdd>] do_lookup+0x4f/0x140
>  [<c0159810>] __link_path_walk+0xa81/0xb5e
>  [<c024d6d2>] sys_sendto+0x118/0x138
>  [<c0142eb4>] __handle_mm_fault+0x35f/0x85b
>  [<c017350e>] inotify_d_instantiate+0x44/0x72
>  [<c016049a>] d_instantiate+0x3f/0x4c
>  [<c024e4c4>] sys_socketcall+0x15e/0x242
>  [<c0102576>] sysenter_past_esp+0x5f/0x85
>  [<c0290000>] fib_get_first+0x3c/0xbb
>  =======================
> Code: 00 00 89 5c 24 14 8b 98 8c 00 00 00 89 54 24 0c 89 5c 24 10 8b
> 40 54 89 4c 24 04 c7 04 24 ad da 24 c0 89 44 24 08 e8 a0 75 ec ff <0f>
> 0b eb fe 56 89 d6 53 89 c3 83 ec 0c 8b 50 54 39 c2 76 04 0f
> EIP: [<c0250fb3>] skb_over_panic+0x59/0x5d SS:ESP 0068:f7579d64
> 
> The previous kernel bug is the same, but with different register
> values, a lower PID, and on the other CPU (it's an Intel Core 2 Duo,
> so I have two cores).
> 
> Some of the bugs (I have numbers 385-399 in dmesg) end with a line like this:
> 
> skb_over_panic: text:f883cb9d len:235 put:235 head:f747d000
> tail:0xf747d000 end:0xf747d100 dev:<NULL>
> 
> The lines like that always have the same text and dev, but the other
> values vary.
> 
> I tried to google the error message, but most of the results seem
> unrelated.  The only one that I thought might be relevant was this:
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2007-08/msg10793.html
> , but I didn't learn anything by reading through it.
> 
> If I boot the machine and press F12, I'm given the option to run a
> diagnostic.  I tried that, thinking I might have some kind of hardware
> problem, but all the tests pass (it checked the RAM, the video, the
> drives, etc.).  The machine is a Dell Precision M90 and it's about a
> year old.  I'm completely clueless about what could be the source of
> the error (besides that it's related to udev), so I'm not sure what
> information to include in this email.  Please ask me for anything that
> might be relevant.

I would not expect any diagnostic tool to find every fault.  A memory
test that fails is a good indication that you have bad memory.  A memory
test that passes just means it didn't find any errors, not that there
aren't any problems with memory.

Now if the error is consistently the same every time, it certainly
starts to sound like it is either a bug under a specific condition, or a
specific piece of hardware that is involved in that situation is broken.
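
To check whether the reports really are identical, a rough Python sketch
like the one below could tally them from saved dmesg output.  The function
name and the regular expressions are my own assumptions, modelled on the
report format quoted above; they are not anything the kernel provides.

```python
import re
from collections import Counter

# Patterns modelled on the oops text quoted in this thread (an assumption,
# other kernel versions and architectures format these lines differently).
BUG_RE = re.compile(r"kernel BUG at (\S+?):(\d+)!")
EIP_RE = re.compile(r"EIP is at ([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def bug_signatures(dmesg_text):
    """Count (source location, faulting symbol) pairs across all reports."""
    counts = Counter()
    location = None
    for line in dmesg_text.splitlines():
        m = BUG_RE.search(line)
        if m:
            # Remember where this report says the BUG was hit.
            location = f"{m.group(1)}:{m.group(2)}"
            continue
        m = EIP_RE.search(line)
        if m and location is not None:
            # Pair the location with the symbol the EIP points into.
            counts[(location, m.group(1))] += 1
            location = None
    return counts


if __name__ == "__main__":
    import sys
    # Usage: dmesg > dmesg.txt; python this_script.py dmesg.txt
    with open(sys.argv[1]) as f:
        for (where, symbol), n in bug_signatures(f.read()).items():
            print(f"{n:4d}  {where}  {symbol}")
```

A single entry in the output would fit the "same error every time" case;
several different entries would point more towards random corruption.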

What type of hardware is in the system?  What is it doing?

It seems to be calling inotify (which tells applications when a file
changes), after which it tries to send a datagram over a Unix domain
socket (unix_dgram_sendmsg in the trace).  The process involved appears
to be udevd, so who knows what it was doing at the time.
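
For reading a trace like the one quoted above, here is a minimal Python
sketch that splits each frame into address, symbol, offset, size and
module.  The function name and the regular expression are assumptions
based on the 2.6.22 i386 format shown in this thread.  Keep in mind that
traces like this can include stale addresses left on the stack, so not
every listed symbol is necessarily part of the live call chain.

```python
import re

# Matches lines such as:
#   [<f883cb9d>] unix_dgram_sendmsg+0x1b9/0x42e [unix]
# The trailing [module] part is optional (absent for built-in code).
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
    r"(?:\s+\[(\w+)\])?"
)


def parse_call_trace(text):
    """Return one dict per trace line that looks like a stack frame."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if not m:
            continue
        addr, symbol, offset, size, module = m.groups()
        frames.append({
            "address": int(addr, 16),
            "symbol": symbol,
            "offset": int(offset, 16),   # offset into the function
            "size": int(size, 16),       # total size of the function
            "module": module or "kernel",
        })
    return frames


if __name__ == "__main__":
    import sys
    # Reads oops text from standard input and lists symbol and module.
    for frame in parse_call_trace(sys.stdin.read()):
        print(f"{frame['module']:10s} {frame['symbol']}")
```

Listing the module next to each symbol makes it easier to see which
frames come from loadable modules and which from the core kernel.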

--
Len Sorensen
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists




