Kernel panic reading - Can you tell what triggered it?

Lennart Sorensen lsorense-1wCw9BSqJbv44Nm34jS7GywD8/FfD2ys at public.gmane.org
Sun Feb 12 17:44:51 UTC 2012


On Sat, Feb 11, 2012 at 06:07:18PM -0500, William Muriithi wrote:
> Hello,
> 
> I am sharing this because I am not sure what triggered it and therefore
> have not figured out how to fix it.  This is the third time the system
> has crashed, and the system is just running Vertica.  Vertica claims
> they do not have any kernel-level module; I feel they are not being
> truthful but have no evidence to prove them wrong.  If they are
> telling the truth, then RHEL6 does have a bug, and I would like to
> post it on the Red Hat Bugzilla.
> 
> The most interesting part:
> 
> BUG: Bad page state in process vertica  pfn:6db6d9adbf6d47af
> page:ffff8801dfe7ae48 flags:000000000000001f count:36561488
> mapcount:-5631 mapping:ffff8801dfe7ae60 index:ffff8801dfe7ae60
> (Tainted: G        W  ----------------  )
> Pid: 682, comm: vertica Tainted: G        W  ----------------
> 2.6.32-220.el6.x86_64 #1
> Call Trace:
>  [<ffffffff811212f7>] ? bad_page+0x107/0x160
> 
> 
> I read this and concluded that Vertica is messing around with the
> kernel.  Vertica pushed back and said it's a Xen problem, and I am
> finding it hard to figure out how they arrived at that conclusion.  Am
> I wrong to assume they are on drugs?  What else can taint a kernel
> other than a kernel module?

An application cannot mess with the kernel, so it isn't Vertica
doing it.
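For what it's worth, the letters after "Tainted:" answer the "what else can taint a kernel" question. Below is a minimal decoder, sketched under an assumption: the letter order is as I recall it from mainline 2.6.32's kernel/panic.c (print_tainted()), and RHEL 6 appends vendor-specific positions (the run of dashes) that this sketch ignores. Read this way, the log goes from "Not tainted" at the first list_del warning to W (a WARN was issued), then B (bad page state found), then D (the kernel oopsed). None of those letters refers to a module, and 'G' in the first position means no proprietary module was loaded.

```python
# Position i of the taint string <-> taint bit i.  Order recalled from
# mainline 2.6.32 kernel/panic.c (assumption); RHEL 6's extra flags,
# printed as the trailing dashes, are deliberately not decoded here.
TAINT_FLAGS = [
    ("P", "proprietary module loaded ('G' here means none was)"),
    ("F", "module was force-loaded"),
    ("S", "SMP kernel running on CPUs not designed for SMP"),
    ("R", "module was force-unloaded"),
    ("M", "machine check exception occurred"),
    ("B", "bad page state / bad page reference detected"),
    ("U", "taint requested from user space"),
    ("D", "kernel died recently (oops or BUG)"),
    ("A", "ACPI table overridden"),
    ("W", "a kernel warning (WARN) was issued"),
    ("C", "staging driver loaded"),
]


def decode_taint_string(text):
    """Return [(letter, meaning), ...] for flags set in an oops 'Tainted:' field."""
    text = text.strip()
    if text.startswith("Not tainted"):
        return []
    if text.startswith("Tainted:"):
        text = text[len("Tainted:"):].lstrip()
    # A flag is set when its position holds its letter (a space means unset).
    return [(letter, meaning)
            for pos, (letter, meaning) in enumerate(TAINT_FLAGS)
            if pos < len(text) and text[pos] == letter]


if __name__ == "__main__":
    for field in ("Not tainted",
                  "Tainted: G        W  ----------------",
                  "Tainted: G    B   W  ----------------",
                  "Tainted: G    B D W  ----------------"):
        print(field, "->", [letter for letter, _ in decode_taint_string(field)])
```

Run on the four fields from the log, in order, it prints [], ['W'], ['B', 'W'] and ['B', 'D', 'W'].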

> The odd part, though, is that "cat /proc/sys/kernel/tainted" returns
> zero, yet the system thinks it is tainted when going down.
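/proc/sys/kernel/tainted reports the same flags as a bitmask, but only for the kernel that is currently running; the mask lives in memory and starts at zero on every boot. In this log the first warning is still printed as "Not tainted", so a zero read before the crash (or after the reboot) is consistent with the taint having been picked up only during the crash itself. A small sketch for converting between the number and the letters, assuming bit n is the nth letter of the mainline 2.6.32 order P F S R M B U D A W C (from memory; RHEL 6's extra high bits are not covered):

```python
# Assumed bit order (mainline 2.6.32): bit 0 = P, bit 1 = F, ... bit 10 = C.
# RHEL 6 defines additional higher bits that this sketch does not know about.
TAINT_LETTERS = "PFSRMBUDAWC"


def taint_mask_to_letters(mask):
    """Letters for the known bits set in a /proc/sys/kernel/tainted value."""
    return "".join(letter for bit, letter in enumerate(TAINT_LETTERS)
                   if mask & (1 << bit))


def letters_to_taint_mask(letters):
    """Value /proc/sys/kernel/tainted would show for the given letters."""
    return sum(1 << TAINT_LETTERS.index(letter) for letter in set(letters))


if __name__ == "__main__":
    try:
        with open("/proc/sys/kernel/tainted") as f:
            current = int(f.read())
        print("running kernel:", current, repr(taint_mask_to_letters(current)))
    except OSError:
        print("no /proc/sys/kernel/tainted on this system")
    print("B+D+W at the panic would read as", letters_to_taint_mask("BDW"))
```

With these assumptions, B, D and W together give 0x2a0 (672), and W alone would be 512.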
> 
> It looks like the iptables module also has issues, but I doubt Red Hat
> would look at it, since the kernel claims to be tainted.
> 
> 
> 
> Full logs below.
> 
> 
> 
> Microcode Update Driver: v2.00 <tigran-ppwZ4lME3+KI6QP4U9MhSdBc4/FLrbF6 at public.gmane.org>, Peter Oruba
> NET: Registered protocol family 10
> lo: Disabled Privacy Extensions
> ip6_tables: (C) 2000-2006 Netfilter Core Team
> nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
> ip_tables: (C) 2000-2006 Netfilter Core Team
> 
> Red Hat Enterprise Linux Server release 6.2 (Santiago)
> Kernel 2.6.32-220.el6.x86_64 on an x86_64
> 
> dev.bigdatalabs login: ip_tables: (C) 2000-2006 Netfilter Core Team
> ------------[ cut here ]------------
> WARNING: at lib/list_debug.c:48 list_del+0x6e/0xa0() (Not tainted)
> list_del corruption. prev->next should be ffffea00027543d0, but was
> ffffea0002724588
> Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter
> ip_tables autofs4 ipt_REJECT ip6t_REJECT nf_conntrack_ipv6
> nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6
> microcode xen_netfront ext4 mbcache jbd2 xen_blkfront dm_mirror
> dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
> Pid: 768, comm: vertica Not tainted 2.6.32-220.el6.x86_64 #1
> Call Trace:
>  [<ffffffff81069b77>] ? warn_slowpath_common+0x87/0xc0
>  [<ffffffff810074fd>] ? xen_force_evtchn_callback+0xd/0x10
>  [<ffffffff81069c66>] ? warn_slowpath_fmt+0x46/0x50
>  [<ffffffffa00481b4>] ? do_get_write_access+0x3b4/0x520 [jbd2]
>  [<ffffffff8127b7ae>] ? list_del+0x6e/0xa0
>  [<ffffffff8112343c>] ? free_pcppages_bulk+0x15c/0x390
>  [<ffffffff810074fd>] ? xen_force_evtchn_callback+0xd/0x10
>  [<ffffffff81007ca2>] ? check_events+0x12/0x20
>  [<ffffffff81007c8f>] ? xen_restore_fl_direct_end+0x0/0x1
>  [<ffffffff81124378>] ? free_hot_cold_page+0x1b8/0x220
>  [<ffffffff81124422>] ? __pagevec_free+0x42/0x90
>  [<ffffffff81127afc>] ? release_pages+0x21c/0x250
>  [<ffffffff81127e77>] ? ____pagevec_lru_add+0x167/0x180
>  [<ffffffff81128026>] ? __pagevec_release+0x26/0x40
>  [<ffffffff8112884b>] ? truncate_inode_pages_range+0x1fb/0x460
>  [<ffffffff8113a5e2>] ? unmap_mapping_range+0x72/0x140
>  [<ffffffff81128ac5>] ? truncate_inode_pages+0x15/0x20
>  [<ffffffff81128b17>] ? truncate_pagecache+0x47/0x70
>  [<ffffffff81128b59>] ? truncate_setsize+0x19/0x20
>  [<ffffffff81128b9e>] ? vmtruncate+0x3e/0x70
>  [<ffffffff811922b0>] ? inode_setattr+0x30/0x60
>  [<ffffffffa007c82c>] ? ext4_setattr+0x10c/0x360 [ext4]
>  [<ffffffff81192698>] ? notify_change+0x168/0x340
>  [<ffffffff8118eab7>] ? __d_lookup+0xa7/0x150
>  [<ffffffff81174de4>] ? do_truncate+0x64/0xa0
>  [<ffffffff8120d52f>] ? security_inode_permission+0x1f/0x30
>  [<ffffffff811876e9>] ? do_filp_open+0x829/0xd60
>  [<ffffffff810074fd>] ? xen_force_evtchn_callback+0xd/0x10
>  [<ffffffff811935e2>] ? alloc_fd+0x92/0x160
>  [<ffffffff81173ba9>] ? do_sys_open+0x69/0x140
>  [<ffffffff81173cc0>] ? sys_open+0x20/0x30
>  [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
> ---[ end trace 2de2020846513d27 ]---
> BUG: Bad page state in process vertica  pfn:6db6d9adbf6d47af
> page:ffff8801dfe7ae48 flags:000000000000001f count:36561488
> mapcount:-5631 mapping:ffff8801dfe7ae60 index:ffff8801dfe7ae60
> (Tainted: G        W  ----------------  )
> Pid: 682, comm: vertica Tainted: G        W  ----------------
> 2.6.32-220.el6.x86_64 #1
> Call Trace:
>  [<ffffffff811212f7>] ? bad_page+0x107/0x160
>  [<ffffffff811226b4>] ? get_page_from_freelist+0x724/0x820
>  [<ffffffff810046b6>] ? xen_mc_flush+0x106/0x250
>  [<ffffffff811238a1>] ? __alloc_pages_nodemask+0x111/0x940
>  [<ffffffff81010b4e>] ? __copy_from_user_inatomic+0xe/0x20
>  [<ffffffff810a22bc>] ? get_futex_value_locked+0x2c/0x50
>  [<ffffffff810a32d1>] ? futex_wait_setup+0x121/0x140
>  [<ffffffff81007ca2>] ? check_events+0x12/0x20
>  [<ffffffff81158c7a>] ? alloc_pages_vma+0x9a/0x150
>  [<ffffffff81007c8f>] ? xen_restore_fl_direct_end+0x0/0x1
>  [<ffffffff8113beeb>] ? handle_pte_fault+0x76b/0xb50
>  [<ffffffff810a2cce>] ? futex_wake+0x10e/0x120
>  [<ffffffff81004a49>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
>  [<ffffffff8113c4b4>] ? handle_mm_fault+0x1e4/0x2b0
>  [<ffffffff810a4c10>] ? do_futex+0x100/0xb00
>  [<ffffffff81042b39>] ? __do_page_fault+0x139/0x480
>  [<ffffffff81007ca2>] ? check_events+0x12/0x20
>  [<ffffffff81007c8f>] ? xen_restore_fl_direct_end+0x0/0x1
>  [<ffffffff814ef38c>] ? _spin_unlock_irqrestore+0x1c/0x20
>  [<ffffffff81038448>] ? pvclock_clocksource_read+0x58/0xd0
>  [<ffffffff814f248e>] ? do_page_fault+0x3e/0xa0
>  [<ffffffff814ef845>] ? page_fault+0x25/0x30
> Disabling lock debugging due to kernel taint
> general protection fault: 0000 [#1] SMP
> last sysfs file: /sys/module/nf_conntrack/initstate
> CPU 0
> Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter
> ip_tables autofs4 ipt_REJECT ip6t_REJECT nf_conntrack_ipv6
> nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6
> microcode xen_netfront ext4 mbcache jbd2 xen_blkfront dm_mirror
> dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
> 
> Pid: 910, comm: vertica Tainted: G    B   W  ----------------
> 2.6.32-220.el6.x86_64 #1
> RIP: e030:[<ffffffff8127b74c>]  [<ffffffff8127b74c>] list_del+0xc/0xa0
> RSP: e02b:ffff88009f6b9a28  EFLAGS: 00010096
> RAX: 0000000000000200 RBX: dead000000100100 RCX: 0000000000012b20
> RDX: 0000000000000030 RSI: ffff8801dfe7ae70 RDI: dead000000100100
> RBP: ffff88009f6b9a38 R08: 0000000000000002 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801dfe7ae40
> R13: ffff8800000106c0 R14: 000000000000073a R15: dead0000001000d8
> FS:  00007f5c92a65700(0000) GS:ffff88002804f000(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f5d19d4c500 CR3: 00000001d74ea000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process vertica (pid: 910, threadinfo ffff88009f6b8000, task ffff8801d9b9e040)
> Stack:
>  ffff88009f6b9a38 0000000000000001 ffff88009f6b9b58 ffffffff81122218
> <0> 0000000000000180 0000000000000180 ffffffff00000002 ffffffff810046b6
> <0> 000000000002be6e 00000040ffffffff 0000000000000000 ffff880000029b08
> Call Trace:
>  [<ffffffff81122218>] get_page_from_freelist+0x288/0x820
>  [<ffffffff810046b6>] ? xen_mc_flush+0x106/0x250
>  [<ffffffff811238a1>] __alloc_pages_nodemask+0x111/0x940
>  [<ffffffff81062375>] ? enqueue_entity+0x125/0x410
> ------------[ cut here ]------------
> WARNING: at lib/list_debug.c:51 list_del+0x8d/0xa0() (Tainted: G    B
>  W  ----------------  )
> list_del corruption. next->prev should be ffffea00005faac0, but was
> ffffea000204b060
> Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter
> ip_tables autofs4 ipt_REJECT ip6t_REJECT nf_conntrack_ipv6
> nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6
> microcode xen_netfront ext4 mbcache jbd2 xen_blkfront dm_mirror
> dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
> Pid: 25265, comm: rs:main Q:Reg Tainted: G    B   W  ----------------
>  2.6.32-220.el6.x86_64 #1
> Call Trace:
>  [<ffffffff81069b77>] ? warn_slowpath_common+0x87/0xc0
>  [<ffffffff81069c66>] ? warn_slowpath_fmt+0x46/0x50
>  [<ffffffff81007ca2>] ? check_events+0x12/0x20
>  [<ffffffff8127b7cd>] ? list_del+0x8d/0xa0
>  [<ffffffff81120823>] ? __rmqueue+0xc3/0x490
>  [<ffffffff81007ca2>] ? check_events+0x12/0x20
>  [<ffffffff81122528>] ? get_page_from_freelist+0x598/0x820
>  [<ffffffff811238a1>] ? __alloc_pages_nodemask+0x111/0x940
>  [<ffffffff81007c8f>] ? xen_restore_fl_direct_end+0x0/0x1
>  [<ffffffff8115fbf4>] ? kmem_cache_free+0xc4/0x2b0
>  [<ffffffff81007c8f>] ? xen_restore_fl_direct_end+0x0/0x1
>  [<ffffffff814ef38c>] ? _spin_unlock_irqrestore+0x1c/0x20
>  [<ffffffff81051813>] ? __wake_up+0x53/0x70
>  [<ffffffff81158b7a>] ? alloc_pages_current+0xaa/0x110
>  [<ffffffff81110e57>] ? __page_cache_alloc+0x87/0x90
>  [<ffffffff81126a7b>] ? __do_page_cache_readahead+0xdb/0x210
>  [<ffffffff81126bd1>] ? ra_submit+0x21/0x30
>  [<ffffffff81112123>] ? filemap_fault+0x4c3/0x500
>  [<ffffffff8113b2c4>] ? __do_fault+0x54/0x510
>  [<ffffffff81112d60>] ? __generic_file_aio_write+0x250/0x480
>  [<ffffffff8113b877>] ? handle_pte_fault+0xf7/0xb50
>  [<ffffffff8111304e>] ? generic_file_aio_write+0xbe/0xe0
>  [<ffffffff810074fd>] ? xen_force_evtchn_callback+0xd/0x10
>  [<ffffffff81007ca2>] ? check_events+0x12/0x20
>  [<ffffffff81004a49>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
>  [<ffffffff8113c4b4>] ? handle_mm_fault+0x1e4/0x2b0
>  [<ffffffff81042b39>] ? __do_page_fault+0x139/0x480
>  [<ffffffff811b50c9>] ? fsnotify_put_event+0x49/0x70
>  [<ffffffff81038448>] ? pvclock_clocksource_read+0x58/0xd0
>  [<ffffffff81007b21>] ? xen_clocksource_read+0x21/0x30
>  [<ffffffff81007c09>] ? xen_clocksource_get_cycles+0x9/0x10
>  [<ffffffff8109b470>] ? getnstimeofday+0x60/0xf0
>  [<ffffffff814f248e>] ? do_page_fault+0x3e/0xa0
>  [<ffffffff814ef845>] ? page_fault+0x25/0x30
> ---[ end trace 2de2020846513d28 ]---
>  [<ffffffff810074fd>] ? xen_force_evtchn_callback+0xd/0x10
>  [<ffffffff81007ca2>] ? check_events+0x12/0x20
>  [<ffffffff81007c8f>] ? xen_restore_fl_direct_end+0x0/0x1
>  [<ffffffff81158c7a>] alloc_pages_vma+0x9a/0x150
>  [<ffffffff8113beeb>] handle_pte_fault+0x76b/0xb50
>  [<ffffffff810a2490>] ? wake_futex+0x40/0x60
>  [<ffffffff810a4590>] ? futex_requeue+0x310/0x890
>  [<ffffffff810a2cce>] ? futex_wake+0x10e/0x120
>  [<ffffffff81004a49>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
>  [<ffffffff8113c4b4>] handle_mm_fault+0x1e4/0x2b0
>  [<ffffffff810a4c10>] ? do_futex+0x100/0xb00
>  [<ffffffff81042b39>] __do_page_fault+0x139/0x480
>  [<ffffffff81007ca2>] ? check_events+0x12/0x20
>  [<ffffffff81090280>] ? register_posix_clock+0x50/0xa0
>  [<ffffffff81090280>] ? register_posix_clock+0x50/0xa0
>  [<ffffffff81007c8f>] ? xen_restore_fl_direct_end+0x0/0x1
>  [<ffffffff814ef38c>] ? _spin_unlock_irqrestore+0x1c/0x20
>  [<ffffffff8105ca46>] ? task_sched_runtime+0x46/0xa0
>  [<ffffffff81090280>] ? register_posix_clock+0x50/0xa0
>  [<ffffffff81092478>] ? sample_to_timespec+0x38/0x50
>  [<ffffffff81092c0c>] ? cpu_clock_sample+0x4c/0x70
>  [<ffffffff814f248e>] do_page_fault+0x3e/0xa0
>  [<ffffffff814ef845>] page_fault+0x25/0x30
> Code: 00 ff ff ff 89 95 fc fe ff ff e9 ab fd ff ff 4c 8b ad e8 fe ff
> ff e9 db fd ff ff 90 90 90 90 55 48 89 e5 53 48 89 fb 48 83 ec 08 <48>
> 8b 47 08 4c 8b 00 4c 39 c7 75 39 48 8b 03 4c 8b 40 08 4c 39
> RIP  [<ffffffff8127b74c>] list_del+0xc/0xa0
>  RSP <ffff88009f6b9a28>
> ---[ end trace 2de2020846513d29 ]---
> Kernel panic - not syncing: Fatal exception
> Pid: 910, comm: vertica Tainted: G    B D W  ----------------
> 2.6.32-220.el6.x86_64 #1
> Call Trace:
>  [<ffffffff814ec341>] ? panic+0x78/0x143
>  [<ffffffff81007c8f>] ? xen_restore_fl_direct_end+0x0/0x1
>  [<ffffffff814ef38c>] ? _spin_unlock_irqrestore+0x1c/0x20
>  [<ffffffff814f04d4>] ? oops_end+0xe4/0x100
>  [<ffffffff8100f26b>] ? die+0x5b/0x90
>  [<ffffffff814f0042>] ? do_general_protection+0x152/0x160
>  [<ffffffff814ef815>] ? general_protection+0x25/0x30
>  [<ffffffff8127b74c>] ? list_del+0xc/0xa0
>  [<ffffffff81122218>] ? get_page_from_freelist+0x288/0x820
>  [<ffffffff810046b6>] ? xen_mc_flush+0x106/0x250
>  [<ffffffff811238a1>] ? __alloc_pages_nodemask+0x111/0x940
>  [<ffffffff81062375>] ? enqueue_entity+0x125/0x410
>  [<ffffffff810074fd>] ? xen_force_evtchn_callback+0xd/0x10
>  [<ffffffff81007ca2>] ? check_events+0x12/0x20
>  [<ffffffff81007c8f>] ? xen_restore_fl_direct_end+0x0/0x1
>  [<ffffffff81158c7a>] ? alloc_pages_vma+0x9a/0x150
>  [<ffffffff8113beeb>] ? handle_pte_fault+0x76b/0xb50
>  [<ffffffff810a2490>] ? wake_futex+0x40/0x60
>  [<ffffffff810a4590>] ? futex_requeue+0x310/0x890
>  [<ffffffff810a2cce>] ? futex_wake+0x10e/0x120
>  [<ffffffff81004a49>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
>  [<ffffffff8113c4b4>] ? handle_mm_fault+0x1e4/0x2b0
>  [<ffffffff810a4c10>] ? do_futex+0x100/0xb00
>  [<ffffffff81042b39>] ? __do_page_fault+0x139/0x480
>  [<ffffffff81007ca2>] ? check_events+0x12/0x20
>  [<ffffffff81090280>] ? register_posix_clock+0x50/0xa0
>  [<ffffffff81090280>] ? register_posix_clock+0x50/0xa0
>  [<ffffffff81007c8f>] ? xen_restore_fl_direct_end+0x0/0x1
>  [<ffffffff814ef38c>] ? _spin_unlock_irqrestore+0x1c/0x20
>  [<ffffffff8105ca46>] ? task_sched_runtime+0x46/0xa0
>  [<ffffffff81090280>] ? register_posix_clock+0x50/0xa0
>  [<ffffffff81092478>] ? sample_to_timespec+0x38/0x50
>  [<ffffffff81092c0c>] ? cpu_clock_sample+0x4c/0x70
>  [<ffffffff814f248e>] ? do_page_fault+0x3e/0xa0
>  [<ffffffff814ef845>] ? page_fault+0x25/0x30
> Xen Minimal OS!
>   start_info: 0x1890000(VA)
>     nr_pages: 0x1e0000
>   shared_inf: 0xbf712000(MA)
>      pt_base: 0x1893000(VA)
> nr_pt_frames: 0x11
>     mfn_list: 0x990000(VA)
>    mod_start: 0x0(VA)
>      mod_len: 0
>        flags: 0x0
>     cmd_line: root=/dev/sda1 ro 4
>   stack:      0x94f860-0x96f860
> MM: Init
>       _text: 0x0(VA)
>      _etext: 0x5ff6d(VA)
>    _erodata: 0x78000(VA)
>      _edata: 0x80b00(VA)
> stack start: 0x94f860(VA)
>        _end: 0x98fe68(VA)
>   start_pfn: 18a7
>     max_pfn: 1e0000
> Mapping memory range 0x1c00000 - 0x1e0000000
> setting 0x0-0x78000 readonly
> skipped 0x1000
> MM: Initialise page allocator for 27a0000(27a0000)-1e0000000(1e0000000)
> MM: done
> Demand map pfns at 1e0001000-21e0001000.
> Heap resides at 21e0002000-41e0002000.
> Initialising timer interface
> Initialising console ... done.
> gnttab_table mapped at 0x1e0001000.
> Initialising scheduler
> Thread "Idle": pointer: 0x21e0002010, stack: 0x36f0000
> Initialising xenbus
> Thread "xenstore": pointer: 0x21e00027c0, stack: 0x3700000
> Dummy main: start_info=0x96f960
> Thread "main": pointer: 0x21e0002f70, stack: 0x3710000
> "main" "root=/dev/sda1" "ro" "4"
> vbd 2049 is hd0
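One detail in the general protection fault is worth decoding. RDI and RBX hold dead000000100100, and the Code: bytes at the faulting RIP are 48 8b 47 08, i.e. mov 0x8(%rdi),%rax, which loads entry->prev at the top of list_del(). That value matches LIST_POISON1 as include/linux/poison.h defines it (0x00100100 plus POISON_POINTER_DELTA, which I believe defaults to 0xdead000000000000 on x86_64 via CONFIG_ILLEGAL_POINTER_VALUE). list_del() writes that poison into an entry after unlinking it, so the page allocator appears to have followed a pointer out of an entry that had already been removed (an inference from the values, not something the log states). Because the address is non-canonical, the CPU raises a general protection fault rather than a page fault. A sketch that classifies raw register values under the constants just assumed:

```python
# Assumption: x86_64 kernel built with POISON_POINTER_DELTA = 0xdead000000000000,
# consistent with the dead0000001001xx values in the register dump.
POISON_POINTER_DELTA = 0xDEAD000000000000
LIST_POISON1 = POISON_POINTER_DELTA + 0x00100100  # stored in ->next by list_del()
LIST_POISON2 = POISON_POINTER_DELTA + 0x00200200  # stored in ->prev by list_del()


def is_canonical(addr):
    """x86_64 with 48-bit virtual addresses: bits 63..47 must all be equal."""
    top = addr >> 47
    return top in (0, 0x1FFFF)


def describe(addr, window=0x100):
    """Name a register value: near a list poison value, or (non-)canonical."""
    for name, poison in (("LIST_POISON1", LIST_POISON1),
                         ("LIST_POISON2", LIST_POISON2)):
        offset = addr - poison
        if -window <= offset <= window:
            return f"{name}{offset:+#x}"
    return "canonical" if is_canonical(addr) else "non-canonical"


if __name__ == "__main__":
    regs = {"RBX": 0xDEAD000000100100, "RDI": 0xDEAD000000100100,
            "R15": 0xDEAD0000001000D8, "R12": 0xFFFF8801DFE7AE40}
    for reg, value in regs.items():
        print(f"{reg} {value:016x} -> {describe(value)}")
```

R15 comes out as LIST_POISON1-0x28, which looks like a container_of() applied to the poisoned pointer; that is again an inference, not verified against this kernel's struct layout.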

Well, I can see why Xen might be suspected.  It needs someone who knows
those parts of the kernel to make sense of it.  It certainly appears
that something corrupts a list structure in the kernel and messes up
other things as a result, and that might even be what set the taint
flag.
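The two "list_del corruption" messages come from the CONFIG_DEBUG_LIST checks in lib/list_debug.c (lines 48 and 51 in the log). As far as I recall, in 2.6.32 those checks only WARN and then unlink the entry anyway before poisoning it, which is how one bad pointer can keep spreading until something dereferences the poison. Below is a toy Python model of that behaviour, not kernel code: the names mirror struct list_head, and a hypothetical Poison object stands in for LIST_POISON1/2 so that following it raises instead of faulting. It reproduces the shape of the log's warnings, not the exact sequence of events on this machine.

```python
class GeneralProtectionFault(Exception):
    """Raised when the model dereferences a poisoned pointer."""


class Poison:
    """Stand-in for LIST_POISON1/LIST_POISON2: reading .next/.prev 'faults'."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, attr):
        # Only called for attributes that do not exist, i.e. next/prev.
        raise GeneralProtectionFault(f"dereferenced {self.name}")


LIST_POISON1 = Poison("LIST_POISON1")
LIST_POISON2 = Poison("LIST_POISON2")


class ListHead:
    """A node in a circular doubly linked list, like struct list_head."""

    def __init__(self, name):
        self.name = name
        self.next = self
        self.prev = self


def list_add(new, head):
    """Insert new right after head."""
    new.next = head.next
    new.prev = head
    head.next.prev = new
    head.next = new


def list_del(entry, warnings):
    """Unlink entry; record (but do not stop on) corruption, then poison it."""
    if entry.prev.next is not entry:
        warnings.append(f"list_del corruption. prev->next should be {entry.name}, "
                        f"but was {entry.prev.next.name}")
    if entry.next.prev is not entry:
        warnings.append(f"list_del corruption. next->prev should be {entry.name}, "
                        f"but was {entry.next.prev.name}")
    # Modelled on 2.6.32: unlink even if a check above failed.
    entry.next.prev = entry.prev
    entry.prev.next = entry.next
    entry.next = LIST_POISON1
    entry.prev = LIST_POISON2


if __name__ == "__main__":
    head, a, b, c = (ListHead(n) for n in ("head", "a", "b", "c"))
    list_add(b, head)
    list_add(a, head)              # head <-> a <-> b
    warnings = []
    b.prev = c                     # simulate a stray write corrupting b->prev
    list_del(a, warnings)
    print(warnings[0])
    try:
        list_del(a, warnings)      # deleting an already-poisoned entry again
    except GeneralProtectionFault as exc:
        print("fault:", exc)
```

The first delete only warns ("next->prev should be a, but was c") and still unlinks a; the second one trips over LIST_POISON2, which is the toy equivalent of the general protection fault in the log.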

-- 
Len Sorensen
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists





More information about the Legacy mailing list