Bugzilla – Bug 560
[Xen-HVM] 32bit dom0 crash during xm-test
Last modified: 2006-03-24 05:29:41 CST
HW: x460
OS: Sles9 sp2 32bit
Test Tool: xm-test
Description: 32bit dom0 will crash during xm-test run. Sometimes it will crash within 1 minute, and sometimes later. This problem is easy to reproduce on my system.

Here is the serial console output:

(XEN) HVM_PIT: guest freq in cycles=3002231
(XEN) ----[ Xen-3.0.0 Not tainted ]----
(XEN) CPU: 15
(XEN) EIP: e008:[<ff13c5a8>] hlt_timer_fn+0x28/0xd0
(XEN) EFLAGS: 00010286 CONTEXT: hypervisor
(XEN) eax: ffd7f000 ebx: ffba1080 ecx: ffbb2000 edx: ffd7f000
(XEN) esi: 00000000 edi: ffba1080 ebp: 00000780 esp: ff1eff28
(XEN) cr0: 8005003b cr3: 00182000
(XEN) ds: e010 es: e010 fs: e010 gs: e010 ss: e010 cs: e008
(XEN) Xen stack trace from esp=ff1eff28:
(XEN) ff13c3d0 ffbdaa00 ff13c580 ff1121e8 ffba1080 00000403 ff13d447 00000001
(XEN) 2f2075a1 00000404 ffba1080 ff189790 00000780 00000003 00000003 ff12cfa0
(XEN) f3944e77 00000403 00000000 ffbdaa00 ffba1d0c 0000000f 00000780 00000780
(XEN) 0000000f ff1f0080 ffba1080 ff111772 00ef0000 ff11e2c9 00000780 ff11e2e5
(XEN) c0443120 00000780 ff1effb4 c0413000 c109e000 c010101a 0009fb00 c0443120
(XEN) 004b8007 00000000 bfd40004 c025deb2 00ef0060 00000000 c1103e70 00000202
(XEN) 0000007b 0000007b 00000000 00000000 0000000f ff1f0080
(XEN) Xen call trace:
(XEN) [<ff13c5a8>] hlt_timer_fn+0x28/0xd0
(XEN) [<ff13c3d0>] pit_timer_fn+0x0/0xf0
(XEN) [<ff13c580>] hlt_timer_fn+0x0/0xd0
(XEN) [<ff1121e8>] timer_softirq_action+0x228/0x390
(XEN) [<ff13d447>] hvm_safe_block+0x47/0xd0
(XEN) [<ff12cfa0>] smp_invalidate_interrupt+0x10/0x50
(XEN) [<ff111772>] do_softirq+0x32/0x50
(XEN) [<ff11e2c9>] idle_loop+0x89/0xb0
(XEN) [<ff11e2e5>] idle_loop+0xa5/0xb0
(XEN)
(XEN) Pagetable walk from ffd7f034:
(XEN) L2 = 001ce063 55555555
(XEN) L1 = 00000000 55555555
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 15:
(XEN) CPU15 FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffd7f034
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
Just one question: is domain 0 UP or SMP? And does the vmx domain run on a separate logical processor, or does it share one with other domains?
dom0 is running as UP. And I am taking the default hvm config file so vmx domains share processors.
Rick,

Could you please supply the usual set of information, including:
1) xend.log
2) dmesg
3) qemu-dm.<PID>.log
4) xmtest.output file

Thanks
I will provide that info.
I accidentally closed it. Problem remains. I verified it in cs9226. I'm getting the files to attach them here.
Created attachment 337 [details] xm-test output file
Created attachment 338 [details] faulty qemu file
Created attachment 339 [details] xend log
Created attachment 340 [details] xend-debug log
The files attached are from the most recent run.

Details:
changeset: 9251:96ba0a2bc9de
tag: tip
user: kaf24@firebug.cl.cam.ac.uk
date: Wed Mar 15 06:35:43 2006 +0100
summary: Remove unnecessary cr4 handling in vmx_set_cr0.

xen dmesg (system crashed with the following message):

(XEN) (GUEST: 59)
(XEN) (GUEST: 59) ata0-0: PCHS=64/8/32 translation=none LCHS=64/8/32
(XEN) (GUEST: 59) ata0 master: QEMU HARDDISK ATA-2 Hard-Disk (8 MBytes)
(XEN) (GUEST: 59) ata0 slave: Unknown device
(XEN) (GUEST: 59) ata1 master: QEMU CD-ROM ATAPI-4 CD-Rom/DVD-Rom
(XEN) (GUEST: 59) ata1 slave: Unknown device
(XEN) (GUEST: 59)
(XEN) (GUEST: 59) Booting from Hard Disk...
(XEN) (GUEST: 59) int13_harddisk: function 15, unmapped device for ELDL=81
(XEN) (GUEST: 59) KBD: unsupported int 16h function 03
(XEN) (GUEST: 59) int13_harddisk: function 15, unmapped device for ELDL=81
(XEN) (GUEST: 59) *** int 15h function AX=E980, BX=E6F5 not yet supported!
(XEN) (GUEST: 59) *** int 15h function AX=5300, BX=0000 not yet supported!
(XEN) (GUEST: 59) int13_harddisk: function 02, unmapped device for ELDL=81
(XEN) (GUEST: 59) int13_harddisk: function 41, unmapped device for ELDL=81
(XEN) HVM_PIT: guest freq in cycles=3002257
(XEN) ----[ Xen-3.0-unstable Not tainted ]----
(XEN) CPU: 13
(XEN) EIP: e008:[<ff1317ae>] hlt_timer_fn+0x1e/0x40
(XEN) EFLAGS: 00010286 CONTEXT: hypervisor
(XEN) eax: fff3b000 ebx: ffb9c080 ecx: ffb8e080 edx: fff3b000
(XEN) esi: ffb8ed10 edi: ff1eb900 ebp: 00000680 esp: ffbc7f48
(XEN) cr0: 8005003b cr3: 00172000
(XEN) ds: e010 es: e010 fs: e010 gs: e010 ss: e010 cs: e008
(XEN) Xen stack trace from esp=ffbc7f48:
(XEN) ffb8ed10 ff1eb900 00000271 ff1107f2 ffb8e080 00000000 0000000d ff123b13
(XEN) ffb8e080 00000680 e28f337c 00000271 ffbc7f7c 0000000d 00000680 00000680
(XEN) 0000000d ffbc8080 ffbc8080 ff10fd92 00ef0000 ff118522 00000680 ff118575
(XEN) c0443120 00000680 ffbc7fb4 c0413000 c1069aa0 c010101a 0009fb00 c0443120
(XEN) 004b8007 00000000 c1720003 01681000 00ef0000 00000000 0000e008 00000202
(XEN) 0000007b 0000007b 00000000 00000000 0000000d ffbc8080
(XEN) Xen call trace:
(XEN) [<ff1317ae>] hlt_timer_fn+0x1e/0x40
(XEN) [<ff1107f2>] timer_softirq_action+0xb2/0x150
(XEN) [<ff123b13>] smp_call_function_interrupt+0x23/0x50
(XEN) [<ff10fd92>] do_softirq+0x32/0x50
(XEN) [<ff118522>] idle_loop+0x52/0xb0
(XEN) [<ff118575>] idle_loop+0xa5/0xb0
(XEN)
(XEN) Pagetable walk from fff3b034:
(XEN) L2 = 001be063 55555555
(XEN) L1 = 00000000 ffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 13:
(XEN) CPU13 FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: fff3b034
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
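For anyone comparing the two hypervisor panics in this report, it may help to check how the faulting linear address relates to the register contents. The sketch below is only an illustration: the helper name `offset_from_registers` is hypothetical, and the values are copied by hand from the two register dumps (Xen-3.0.0 on CPU 15 and Xen-3.0-unstable on CPU 13).

```python
# Values copied by hand from the two panic dumps in this report.
CRASHES = {
    "Xen-3.0.0, CPU 15": {
        "fault": 0xFFD7F034,
        "regs": {
            "eax": 0xFFD7F000, "ebx": 0xFFBA1080, "ecx": 0xFFBB2000,
            "edx": 0xFFD7F000, "esi": 0x00000000, "edi": 0xFFBA1080,
        },
    },
    "Xen-3.0-unstable, CPU 13": {
        "fault": 0xFFF3B034,
        "regs": {
            "eax": 0xFFF3B000, "ebx": 0xFFB9C080, "ecx": 0xFFB8E080,
            "edx": 0xFFF3B000, "esi": 0xFFB8ED10, "edi": 0xFF1EB900,
        },
    },
}

PAGE_SIZE = 0x1000  # 4 KiB pages


def offset_from_registers(fault, regs, limit=PAGE_SIZE):
    """Hypothetical helper: map each register whose value lies fewer than
    `limit` bytes below the faulting address to that byte offset."""
    return {
        name: fault - value
        for name, value in regs.items()
        if 0 <= fault - value < limit
    }


if __name__ == "__main__":
    for label, crash in CRASHES.items():
        hits = offset_from_registers(crash["fault"], crash["regs"])
        print(label, {name: hex(off) for name, off in hits.items()})
```

In both dumps the only registers within a page of the fault are eax and edx, each 0x34 below the faulting address. That would be consistent with a read at offset 0x34 through a pointer into a page that is not mapped (both pagetable walks show L1 = 00000000), but the register dump alone does not prove which access faulted.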
I just ran on cs9325 and I get the following:

xenbr0: port 2(peth0) entering forwarding state
Eeek! page_mapcount(page) went negative! (-1)
page->flags = 14
page->count = 0
page->mapping = 00000000
------------[ cut here ]------------
kernel BUG at mm/rmap.c:560!
invalid opcode: 0000 [#1]
Modules linked in: thermal processor fan button battery ac sworks_agp agpgart
CPU: 0
EIP: 0061:[<c014cc97>] Not tainted VLI
EFLAGS: 00010286 (2.6.16-rc6-xen0 #1)
EIP is at page_remove_rmap+0x97/0xb0
eax: ffffffff ebx: c137c600 ecx: 0000001b edx: fbfa9000
esi: 00000000 edi: 406fa000 ebp: dacd5d60 esp: dacd5d54
ds: 007b es: 007b ss: 0069
Process python (pid: 5081, threadinfo=dacd4000 task=cdd4c030)

I am attaching a file containing more info as well as xend.log for today.
Created attachment 345 [details] dmesg and xend-debug.log output for 03/20/06
Created attachment 346 [details] 03/20/06 xend.log file
Here is the process list after I get the dump:

ps -ef:
root 11754 10826 0 01:27 ? 00:00:00 /opt/kde3/bin/kdm_greet
root 31020 11083 0 09:35 ? 00:00:00 in.telnetd: kaliman.austin.ibm.c
root 31024 31020 0 09:35 ? 00:00:00 login -- root
root 31031 11151 0 09:35 ttyS0 00:00:00 -bash
root 31059 31024 0 09:35 pts/0 00:00:00 -bash
root 31294 1 0 09:40 ? 00:00:01 xenstored --pid-file=/var/run/xe
root 31297 1 0 09:40 ? 00:00:00 python /usr/sbin/xend start
root 31298 31297 0 09:40 ? 00:00:00 python /usr/sbin/xend start
root 31300 1 0 09:40 ? 00:00:00 xenconsoled
root 31301 31300 0 09:40 ? 00:00:00 xenconsoled
root 31302 31298 0 09:40 ? 00:00:00 python /usr/sbin/xend start
root 31303 31301 0 09:40 ? 00:00:00 xenconsoled
root 31304 31302 0 09:40 ? 00:00:00 python /usr/sbin/xend start
root 31305 31302 0 09:40 ? 00:00:00 python /usr/sbin/xend start
root 31306 31302 0 09:40 ? 00:00:00 python /usr/sbin/xend start
root 31307 31302 0 09:40 ? 00:00:00 python /usr/sbin/xend start
root 31524 31302 0 09:40 ? 00:00:02 python /usr/sbin/xend start
root 31637 31059 0 09:42 pts/0 00:00:00 screen
root 31638 31637 0 09:42 ? 00:00:00 SCREEN
root 31639 31638 0 09:42 pts/1 00:00:00 /bin/bash
root 31704 31639 0 09:44 pts/1 00:00:00 /bin/sh ./runtest.sh -d /tmp/xm-
root 32148 31704 0 09:44 pts/1 00:00:00 /bin/sh ./runtest.sh -d /tmp/xm-
root 32149 32148 0 09:44 pts/1 00:00:00 make -k check
root 32150 32149 0 09:44 pts/1 00:00:00 /bin/sh -c set fnord $MAKEFLAGS;
root 32154 32150 0 09:44 pts/1 00:00:00 make check-am
root 32155 32154 0 09:44 pts/1 00:00:00 make check-TESTS
root 32156 32155 0 09:44 pts/1 00:00:00 /bin/sh -c failed=0; all=0; xfai
root 32167 31524 0 09:44 ? 00:00:01 [qemu-dm] <defunct>
root 32378 31524 0 09:45 ? 00:00:00 [qemu-dm] <defunct>
root 32556 31524 0 09:46 ? 00:00:01 [qemu-dm] <defunct>
root 32726 31524 0 09:46 ? 00:00:01 [qemu-dm] <defunct>
root 425 31524 0 09:46 ? 00:00:01 [qemu-dm] <defunct>
root 543 32156 0 09:47 pts/1 00:00:00 /usr/bin/python ./10_create_fast
root 588 31524 0 09:47 ? 00:00:00 [qemu-dm] <defunct>
root 693 31524 0 09:47 ? 00:00:00 [qemu-dm] <defunct>
root 851 31524 0 09:47 ? 00:00:00 [qemu-dm] <defunct>
root 988 31524 0 09:47 ? 00:00:00 [qemu-dm] <defunct>
root 1260 31524 0 09:47 ? 00:00:00 [qemu-dm] <defunct>
root 1960 31524 0 09:47 ? 00:00:00 [qemu-dm] <defunct>
root 2481 31524 0 09:47 ? 00:00:00 [qemu-dm] <defunct>
root 2644 31524 0 09:47 ? 00:00:00 [qemu-dm] <defunct>
root 2781 31524 0 09:47 ? 00:00:00 [qemu-dm] <defunct>
root 3597 31524 0 09:47 ? 00:00:00 [qemu-dm] <defunct>
root 4179 31524 0 09:47 ? 00:00:00 [qemu-dm] <defunct>
root 4570 31524 0 09:47 ? 00:00:00 [qemu-dm] <defunct>
root 4716 31524 0 09:47 ? 00:00:00 [qemu-dm] <defunct>
root 5003 31524 0 09:47 ? 00:00:01 /usr/lib/xen/bin/qemu-dm -d 41 -
root 5029 5003 0 09:47 ? 00:00:00 /usr/lib/xen/bin/qemu-dm -d 41 -
root 5030 5029 0 09:47 ? 00:00:00 /usr/lib/xen/bin/qemu-dm -d 41 -
root 5069 1 0 09:47 ? 00:00:00 [loop0]
root 5076 6 0 09:47 ? 00:00:00 [xvd 41 07:00]
root 5080 543 0 09:47 pts/1 00:00:00 sh -c { xm destroy testdomain; }
root 5081 5080 0 09:47 pts/1 00:00:00 [python]
root 5907 31031 0 10:08 ttyS0 00:00:00 ps -ef

The system has not crashed, but xm-test is no longer running. I get more dumps eventually. I have to reboot the system to get it back.
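The listing shows a number of defunct qemu-dm processes that all share one xend parent (PID 31524). A minimal sketch for tallying defunct processes per parent from `ps -ef` output follows; the function name `defunct_by_parent` is hypothetical, the sample is a handful of lines taken from the listing, and the parsing assumes the standard `ps -ef` column order (UID PID PPID C STIME TTY TIME CMD).

```python
from collections import Counter

# A few lines copied from the ps -ef listing in this report.
SAMPLE = """\
root 31524 31302 0 09:40 ? 00:00:02 python /usr/sbin/xend start
root 32167 31524 0 09:44 ? 00:00:01 [qemu-dm] <defunct>
root 32378 31524 0 09:45 ? 00:00:00 [qemu-dm] <defunct>
root 5003 31524 0 09:47 ? 00:00:01 /usr/lib/xen/bin/qemu-dm -d 41 -
root 5081 5080 0 09:47 pts/1 00:00:00 [python]
"""


def defunct_by_parent(ps_output):
    """Hypothetical helper: count <defunct> entries per parent PID."""
    counts = Counter()
    for line in ps_output.splitlines():
        # Split into the 7 fixed columns plus the command, which may contain spaces.
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[2].isdigit():
            continue  # skip blank lines and the header row
        ppid, cmd = int(fields[2]), fields[7]
        if "<defunct>" in cmd:
            counts[ppid] += 1
    return counts


if __name__ == "__main__":
    for ppid, count in defunct_by_parent(SAMPLE).items():
        print(f"parent {ppid}: {count} defunct children")
```

Feeding the full listing to the same function would give the total for PID 31524; a large count under a single parent suggests that parent is not reaping its exited children, although that alone does not explain the kernel BUG.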
I am closing this bug because I have not seen the CPU Fault problem in some time now.