Bug 560 - [Xen-HVM] 32bit dom0 crash during xm-test
Status: CLOSED FIXED
Product: Xen
Component: Hypervisor
Version: unstable
Platform: x86 Linux-2.6
Importance: P1 blocker
Assigned To: Xen Bug List
Reported: 2006-03-07 20:33 CST by Rick Gonzalez
Modified: 2006-03-24 05:29 CST (History)


Attachments
xm-test output file (34.59 KB, text/plain), 2006-03-16 00:32 CST, Rick Gonzalez
faulty qemu file (35.58 KB, text/plain), 2006-03-16 00:32 CST, Rick Gonzalez
xend log (106.05 KB, text/plain), 2006-03-16 00:33 CST, Rick Gonzalez
xend-debug log (2.39 KB, text/plain), 2006-03-16 00:33 CST, Rick Gonzalez
dmesg and xend-debug.log output for 03/20/06 (4.17 KB, text/plain), 2006-03-20 16:07 CST, Rick Gonzalez
03/20/06 xend.log file (600.23 KB, text/plain), 2006-03-20 16:08 CST, Rick Gonzalez

Description Rick Gonzalez 2006-03-07 20:33:35 CST
HW:
x460

OS:
SLES 9 SP2 32-bit

Test Tool:
xm-test

Description:

The 32-bit dom0 crashes during an xm-test run. Sometimes it crashes within 1
minute, sometimes later. The problem is easy to reproduce on my system.

Here is the serial console output:

(XEN) HVM_PIT: guest freq in cycles=3002231
(XEN) ----[ Xen-3.0.0    Not tainted ]----
(XEN) CPU:    15
(XEN) EIP:    e008:[<ff13c5a8>] hlt_timer_fn+0x28/0xd0
(XEN) EFLAGS: 00010286   CONTEXT: hypervisor
(XEN) eax: ffd7f000   ebx: ffba1080   ecx: ffbb2000   edx: ffd7f000
(XEN) esi: 00000000   edi: ffba1080   ebp: 00000780   esp: ff1eff28
(XEN) cr0: 8005003b   cr3: 00182000
(XEN) ds: e010   es: e010   fs: e010   gs: e010   ss: e010   cs: e008
(XEN) Xen stack trace from esp=ff1eff28:
(XEN)    ff13c3d0 ffbdaa00 ff13c580 ff1121e8 ffba1080 00000403 ff13d447 00000001
(XEN)    2f2075a1 00000404 ffba1080 ff189790 00000780 00000003 00000003 ff12cfa0
(XEN)    f3944e77 00000403 00000000 ffbdaa00 ffba1d0c 0000000f 00000780 00000780
(XEN)    0000000f ff1f0080 ffba1080 ff111772 00ef0000 ff11e2c9 00000780 ff11e2e5
(XEN)    c0443120 00000780 ff1effb4 c0413000 c109e000 c010101a 0009fb00 c0443120
(XEN)    004b8007 00000000 bfd40004 c025deb2 00ef0060 00000000 c1103e70 00000202
(XEN)    0000007b 0000007b 00000000 00000000 0000000f ff1f0080
(XEN) Xen call trace:
(XEN)    [<ff13c5a8>] hlt_timer_fn+0x28/0xd0
(XEN)    [<ff13c3d0>] pit_timer_fn+0x0/0xf0
(XEN)    [<ff13c580>] hlt_timer_fn+0x0/0xd0
(XEN)    [<ff1121e8>] timer_softirq_action+0x228/0x390
(XEN)    [<ff13d447>] hvm_safe_block+0x47/0xd0
(XEN)    [<ff12cfa0>] smp_invalidate_interrupt+0x10/0x50
(XEN)    [<ff111772>] do_softirq+0x32/0x50
(XEN)    [<ff11e2c9>] idle_loop+0x89/0xb0
(XEN)    [<ff11e2e5>] idle_loop+0xa5/0xb0
(XEN)
(XEN) Pagetable walk from ffd7f034:
(XEN)   L2 = 001ce063 55555555
(XEN)    L1 = 00000000 55555555
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 15:
(XEN) CPU15 FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffd7f034
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
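
One way to read the dump above: the faulting linear address sits 0x34 bytes past the page-aligned value held in both eax and edx. That would be consistent with hlt_timer_fn reading a field through a pointer whose page is no longer mapped, though the report itself does not confirm this. A minimal Python sketch of the arithmetic, using values copied from the dump (the helper names are hypothetical):

```python
import re

# Register and fault lines copied from the serial console output above.
DUMP = """\
(XEN) eax: ffd7f000   ebx: ffba1080   ecx: ffbb2000   edx: ffd7f000
(XEN) esi: 00000000   edi: ffba1080   ebp: 00000780   esp: ff1eff28
(XEN) Faulting linear address: ffd7f034
"""

REGISTER_RE = re.compile(r"\b(eax|ebx|ecx|edx|esi|edi|ebp|esp):\s+([0-9a-f]{8})")


def parse_registers(dump):
    """Return {register name: integer value} for the registers in the dump."""
    return {name: int(value, 16) for name, value in REGISTER_RE.findall(dump)}


def fault_address(dump):
    """Return the faulting linear address as an integer."""
    match = re.search(r"Faulting linear address:\s+([0-9a-f]+)", dump)
    return int(match.group(1), 16)


def nearby_registers(dump, window=0x1000):
    """Map each register that points less than `window` bytes below the
    fault address to its offset from that address."""
    fault = fault_address(dump)
    return {
        name: fault - value
        for name, value in parse_registers(dump).items()
        if 0 <= fault - value < window
    }


offsets = nearby_registers(DUMP)
for name, offset in sorted(offsets.items()):
    print(f"{name} + {offset:#x}")
```

Only eax and edx fall within one page of the faulting address; the other registers are either far away or hold small values.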
Comment 1 Yunhong Jiang 2006-03-08 02:50:08 CST
Just one question. Is domain 0 UP or SMP? And does the vmx domain run on a
separate logical processor, or is it shared with other domains?
Comment 2 Rick Gonzalez 2006-03-08 07:04:23 CST
dom0 is running as UP. I am using the default hvm config file, so vmx
domains share processors.
Comment 3 Daniel Stekloff 2006-03-10 19:14:20 CST
Rick,

Could you please supply the usual set of information including:

1) xend.log
2) dmesg
3) qemu-dm.<PID>.log
4) xmtest.output file

Thanks
Comment 4 Rick Gonzalez 2006-03-13 15:52:48 CST
I will provide that info. 
Comment 5 Rick Gonzalez 2006-03-14 22:49:37 CST
I accidentally closed it. Problem remains. I verified it in cs9226. I'm getting
the files to attach them here.
Comment 6 Rick Gonzalez 2006-03-16 00:32:16 CST
Created attachment 337 [details]
xm-test output file
Comment 7 Rick Gonzalez 2006-03-16 00:32:48 CST
Created attachment 338 [details]
faulty qemu file
Comment 8 Rick Gonzalez 2006-03-16 00:33:13 CST
Created attachment 339 [details]
xend log
Comment 9 Rick Gonzalez 2006-03-16 00:33:44 CST
Created attachment 340 [details]
xend-debug log
Comment 10 Rick Gonzalez 2006-03-16 00:39:08 CST
The files attached are from the most recent run:

Details:

changeset:   9251:96ba0a2bc9de
tag:         tip
user:        kaf24@firebug.cl.cam.ac.uk
date:        Wed Mar 15 06:35:43 2006 +0100
summary:     Remove unnecessary cr4 handling in vmx_set_cr0.


xen dmesg (system crashed with the following message):

(XEN) (GUEST: 59)
(XEN) (GUEST: 59) ata0-0: PCHS=64/8/32 translation=none LCHS=64/8/32
(XEN) (GUEST: 59) ata0 master: QEMU HARDDISK ATA-2 Hard-Disk (8 MBytes)
(XEN) (GUEST: 59) ata0  slave: Unknown device
(XEN) (GUEST: 59) ata1 master: QEMU CD-ROM ATAPI-4 CD-Rom/DVD-Rom
(XEN) (GUEST: 59) ata1  slave: Unknown device
(XEN) (GUEST: 59)
(XEN) (GUEST: 59) Booting from Hard Disk...
(XEN) (GUEST: 59) int13_harddisk: function 15, unmapped device for ELDL=81
(XEN) (GUEST: 59) KBD: unsupported int 16h function 03
(XEN) (GUEST: 59) int13_harddisk: function 15, unmapped device for ELDL=81
(XEN) (GUEST: 59) *** int 15h function AX=E980, BX=E6F5 not yet supported!
(XEN) (GUEST: 59) *** int 15h function AX=5300, BX=0000 not yet supported!
(XEN) (GUEST: 59) int13_harddisk: function 02, unmapped device for ELDL=81
(XEN) (GUEST: 59) int13_harddisk: function 41, unmapped device for ELDL=81
(XEN) HVM_PIT: guest freq in cycles=3002257
(XEN) ----[ Xen-3.0-unstable    Not tainted ]----
(XEN) CPU:    13
(XEN) EIP:    e008:[<ff1317ae>] hlt_timer_fn+0x1e/0x40
(XEN) EFLAGS: 00010286   CONTEXT: hypervisor
(XEN) eax: fff3b000   ebx: ffb9c080   ecx: ffb8e080   edx: fff3b000
(XEN) esi: ffb8ed10   edi: ff1eb900   ebp: 00000680   esp: ffbc7f48
(XEN) cr0: 8005003b   cr3: 00172000
(XEN) ds: e010   es: e010   fs: e010   gs: e010   ss: e010   cs: e008
(XEN) Xen stack trace from esp=ffbc7f48:
(XEN)    ffb8ed10 ff1eb900 00000271 ff1107f2 ffb8e080 00000000 0000000d ff123b13
(XEN)    ffb8e080 00000680 e28f337c 00000271 ffbc7f7c 0000000d 00000680 00000680
(XEN)    0000000d ffbc8080 ffbc8080 ff10fd92 00ef0000 ff118522 00000680 ff118575
(XEN)    c0443120 00000680 ffbc7fb4 c0413000 c1069aa0 c010101a 0009fb00 c0443120
(XEN)    004b8007 00000000 c1720003 01681000 00ef0000 00000000 0000e008 00000202
(XEN)    0000007b 0000007b 00000000 00000000 0000000d ffbc8080
(XEN) Xen call trace:
(XEN)    [<ff1317ae>] hlt_timer_fn+0x1e/0x40
(XEN)    [<ff1107f2>] timer_softirq_action+0xb2/0x150
(XEN)    [<ff123b13>] smp_call_function_interrupt+0x23/0x50
(XEN)    [<ff10fd92>] do_softirq+0x32/0x50
(XEN)    [<ff118522>] idle_loop+0x52/0xb0
(XEN)    [<ff118575>] idle_loop+0xa5/0xb0
(XEN)
(XEN) Pagetable walk from fff3b034:
(XEN)   L2 = 001be063 55555555
(XEN)    L1 = 00000000 ffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 13:
(XEN) CPU13 FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: fff3b034
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
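
The error code and pagetable walk in this dump can be cross-checked against the standard x86 encoding: bit 0 of a page-fault error code distinguishes a not-present page from a protection violation, bit 1 distinguishes write from read, and bit 2 distinguishes user from supervisor mode, while bit 0 of a page-table entry is its Present flag. A short Python sketch under those assumptions (function names are hypothetical, and the second column of the pagetable walk is ignored):

```python
def decode_page_fault_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "supervisor",
    }


def pte_present(entry):
    """Return True if bit 0 (Present) is set in a page-table entry."""
    return bool(entry & 0x1)


# Values copied from the dump above.
info = decode_page_fault_error(0x0000)
l2_present = pte_present(0x001BE063)
l1_present = pte_present(0x00000000)

print(info)
print("L2 present:", l2_present)
print("L1 present:", l1_present)
```

Read this way, error_code=0000 describes a supervisor-mode read of a page that is not present, which agrees with the L1 entry being zero while the L2 entry is present.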
Comment 11 Rick Gonzalez 2006-03-20 16:07:12 CST
I just ran on cs9325 and I get the following: 

xenbr0: port 2(peth0) entering forwarding state
Eeek! page_mapcount(page) went negative! (-1)
  page->flags = 14
  page->count = 0
  page->mapping = 00000000
------------[ cut here ]------------
kernel BUG at mm/rmap.c:560!
invalid opcode: 0000 [#1]
Modules linked in: thermal processor fan button battery ac sworks_agp agpgart
CPU:    0
EIP:    0061:[<c014cc97>]    Not tainted VLI
EFLAGS: 00010286   (2.6.16-rc6-xen0 #1)
EIP is at page_remove_rmap+0x97/0xb0
eax: ffffffff   ebx: c137c600   ecx: 0000001b   edx: fbfa9000
esi: 00000000   edi: 406fa000   ebp: dacd5d60   esp: dacd5d54
ds: 007b   es: 007b   ss: 0069
Process python (pid: 5081, threadinfo=dacd4000 task=cdd4c030)


I am attaching a file containing more info as well as xend.log for today.
Comment 12 Rick Gonzalez 2006-03-20 16:07:59 CST
Created attachment 345 [details]
dmesg and xend-debug.log output for 03/20/06
Comment 13 Rick Gonzalez 2006-03-20 16:08:46 CST
Created attachment 346 [details]
03/20/06 xend.log file
Comment 14 Rick Gonzalez 2006-03-20 16:11:46 CST
Here is the process state after I get the dump:

ps -ef:

root     11754 10826  0 01:27 ?        00:00:00 /opt/kde3/bin/kdm_greet
root     31020 11083  0 09:35 ?        00:00:00 in.telnetd: kaliman.austin.ibm.c
root     31024 31020  0 09:35 ?        00:00:00 login -- root                  
root     31031 11151  0 09:35 ttyS0    00:00:00 -bash
root     31059 31024  0 09:35 pts/0    00:00:00 -bash
root     31294     1  0 09:40 ?        00:00:01 xenstored --pid-file=/var/run/xe
root     31297     1  0 09:40 ?        00:00:00 python /usr/sbin/xend start
root     31298 31297  0 09:40 ?        00:00:00 python /usr/sbin/xend start
root     31300     1  0 09:40 ?        00:00:00 xenconsoled
root     31301 31300  0 09:40 ?        00:00:00 xenconsoled
root     31302 31298  0 09:40 ?        00:00:00 python /usr/sbin/xend start
root     31303 31301  0 09:40 ?        00:00:00 xenconsoled
root     31304 31302  0 09:40 ?        00:00:00 python /usr/sbin/xend start
root     31305 31302  0 09:40 ?        00:00:00 python /usr/sbin/xend start
root     31306 31302  0 09:40 ?        00:00:00 python /usr/sbin/xend start
root     31307 31302  0 09:40 ?        00:00:00 python /usr/sbin/xend start
root     31524 31302  0 09:40 ?        00:00:02 python /usr/sbin/xend start
root     31637 31059  0 09:42 pts/0    00:00:00 screen
root     31638 31637  0 09:42 ?        00:00:00 SCREEN
root     31639 31638  0 09:42 pts/1    00:00:00 /bin/bash
root     31704 31639  0 09:44 pts/1    00:00:00 /bin/sh ./runtest.sh -d /tmp/xm-
root     32148 31704  0 09:44 pts/1    00:00:00 /bin/sh ./runtest.sh -d /tmp/xm-
root     32149 32148  0 09:44 pts/1    00:00:00 make -k check
root     32150 32149  0 09:44 pts/1    00:00:00 /bin/sh -c set fnord $MAKEFLAGS;
root     32154 32150  0 09:44 pts/1    00:00:00 make check-am
root     32155 32154  0 09:44 pts/1    00:00:00 make check-TESTS
root     32156 32155  0 09:44 pts/1    00:00:00 /bin/sh -c failed=0; all=0; xfai
root     32167 31524  0 09:44 ?        00:00:01 [qemu-dm] <defunct>
root     32378 31524  0 09:45 ?        00:00:00 [qemu-dm] <defunct>
root     32556 31524  0 09:46 ?        00:00:01 [qemu-dm] <defunct>
root     32726 31524  0 09:46 ?        00:00:01 [qemu-dm] <defunct>
root       425 31524  0 09:46 ?        00:00:01 [qemu-dm] <defunct>
root       543 32156  0 09:47 pts/1    00:00:00 /usr/bin/python ./10_create_fast
root       588 31524  0 09:47 ?        00:00:00 [qemu-dm] <defunct>
root       693 31524  0 09:47 ?        00:00:00 [qemu-dm] <defunct>
root       851 31524  0 09:47 ?        00:00:00 [qemu-dm] <defunct>
root       988 31524  0 09:47 ?        00:00:00 [qemu-dm] <defunct>
root      1260 31524  0 09:47 ?        00:00:00 [qemu-dm] <defunct>
root      1960 31524  0 09:47 ?        00:00:00 [qemu-dm] <defunct>
root      2481 31524  0 09:47 ?        00:00:00 [qemu-dm] <defunct>
root      2644 31524  0 09:47 ?        00:00:00 [qemu-dm] <defunct>
root      2781 31524  0 09:47 ?        00:00:00 [qemu-dm] <defunct>
root      3597 31524  0 09:47 ?        00:00:00 [qemu-dm] <defunct>
root      4179 31524  0 09:47 ?        00:00:00 [qemu-dm] <defunct>
root      4570 31524  0 09:47 ?        00:00:00 [qemu-dm] <defunct>
root      4716 31524  0 09:47 ?        00:00:00 [qemu-dm] <defunct>
root      5003 31524  0 09:47 ?        00:00:01 /usr/lib/xen/bin/qemu-dm -d 41 -
root      5029  5003  0 09:47 ?        00:00:00 /usr/lib/xen/bin/qemu-dm -d 41 -
root      5030  5029  0 09:47 ?        00:00:00 /usr/lib/xen/bin/qemu-dm -d 41 -
root      5069     1  0 09:47 ?        00:00:00 [loop0]
root      5076     6  0 09:47 ?        00:00:00 [xvd 41 07:00]
root      5080   543  0 09:47 pts/1    00:00:00 sh -c { xm destroy testdomain; }
root      5081  5080  0 09:47 pts/1    00:00:00 [python]
root      5907 31031  0 10:08 ttyS0    00:00:00 ps -ef

The system has not crashed, but xm-test is no longer running. I get more dumps
eventually. I have to reboot the system to get it back.
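
In the listing above, 18 qemu-dm entries are marked <defunct>, all with parent PID 31524, which is one of the xend processes. A defunct entry is a process that has exited but has not yet been waited on by its parent. A minimal Python sketch for tallying such entries per parent from `ps -ef` output, assuming the usual eight-column layout (UID PID PPID C STIME TTY TIME CMD); the sample is a short excerpt of the listing and the function name is hypothetical:

```python
from collections import Counter

# Excerpt of the ps -ef listing above.
PS_OUTPUT = """\
root     31524 31302  0 09:40 ?        00:00:02 python /usr/sbin/xend start
root     32167 31524  0 09:44 ?        00:00:01 [qemu-dm] <defunct>
root     32378 31524  0 09:45 ?        00:00:00 [qemu-dm] <defunct>
root       543 32156  0 09:47 pts/1    00:00:00 /usr/bin/python ./10_create_fast
root      5003 31524  0 09:47 ?        00:00:01 /usr/lib/xen/bin/qemu-dm -d 41 -
root      5081  5080  0 09:47 pts/1    00:00:00 [python]
"""


def defunct_by_parent(ps_output):
    """Count <defunct> processes per parent PID in `ps -ef` output."""
    counts = Counter()
    for line in ps_output.splitlines():
        # Split into at most 8 fields so the command keeps its spaces.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        ppid, command = int(fields[2]), fields[7]
        if command.rstrip().endswith("<defunct>"):
            counts[ppid] += 1
    return counts


zombies = defunct_by_parent(PS_OUTPUT)
for ppid, count in zombies.items():
    print(f"parent {ppid}: {count} defunct")
```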
Comment 15 Rick Gonzalez 2006-03-24 05:29:11 CST
I am closing this bug because I have not seen the CPU Fault problem in some
time now.