Bugzilla – Bug 542
qemu-dm segfault with multiple HVM domains
Last modified: 2008-04-01 08:22:37 CDT
Running changeset 8932 (and earlier) of xen-unstable on a large VT machine (4s, 2c, HT), I get qemu-dm segfaults when I try to launch more than one Windows domain at a time. With core dumps turned on, the core files written by qemu-dm appear to be garbage, and the segfault messages on the dom0 console indicate that they tried to jump to a null code page:
    qemu-dm[4961]: segfault at 0000000000000000 rip 0000000000000000 rsp 0000000040800198 error 14
    qemu-dm[4963]: segfault at 0000000000000000 rip 0000000000000000 rsp 0000000040800198 error 14
The rsp value seems to be the same each time. The easiest way to reproduce this is to set up a few Windows domains (I have 4, running Windows 2003) and create them in rapid succession. Within a few minutes, all but one of the qemu-dm processes will have segfaulted. Launching a single domain does not seem to trigger the problem. Marking as critical because you can't run multiple Windows domains on the same machine; please change it if that's too aggressive.
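The kernel segfault lines quoted in this report can be decoded to check the "jump to a null code page" reading. Below is a minimal Python sketch that parses such a line and decodes the x86 page-fault error code. It assumes, as in x86-64 Linux kernels of that era, that the error code is printed in hexadecimal, so "error 14" means 0x14. The helpers `parse_segfault` and `decode_error` are hypothetical and are not part of Xen or qemu-dm.

```python
import re
from typing import List, NamedTuple, Optional

# x86 page-fault error-code bits (architectural definitions).
PF_PRESENT = 0x01  # set: protection violation; clear: page not present
PF_WRITE = 0x02    # the access was a write
PF_USER = 0x04     # the fault happened in user mode
PF_RSVD = 0x08     # a reserved bit was set in a page-table entry
PF_INSTR = 0x10    # the fault was an instruction fetch (needs NX enabled)

# Matches e.g. "qemu-dm[4961]: segfault at 0000000000000000 rip ... rsp ... error 14",
# with or without a syslog prefix in front of it.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[\w.-]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


class Segfault(NamedTuple):
    comm: str
    pid: int
    addr: int
    rip: int
    rsp: int
    error: int


def parse_segfault(line: str) -> Optional[Segfault]:
    """Extract the fields of a kernel segfault message, or None if absent."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    return Segfault(
        comm=m["comm"],
        pid=int(m["pid"]),
        addr=int(m["addr"], 16),
        rip=int(m["rip"], 16),
        rsp=int(m["rsp"], 16),
        # Assumption: the kernel prints the error code in hex.
        error=int(m["err"], 16),
    )


def decode_error(code: int) -> List[str]:
    """Translate a page-fault error code into human-readable flags."""
    flags = ["protection violation" if code & PF_PRESENT else "page not present"]
    for bit, label in ((PF_WRITE, "write access"), (PF_USER, "user mode"),
                       (PF_RSVD, "reserved bit set"), (PF_INSTR, "instruction fetch")):
        if code & bit:
            flags.append(label)
    return flags


if __name__ == "__main__":
    line = ("qemu-dm[4961]: segfault at 0000000000000000 rip 0000000000000000 "
            "rsp 0000000040800198 error 14")
    fault = parse_segfault(line)
    print(hex(fault.rip), decode_error(fault.error))
    # prints: 0x0 ['page not present', 'user mode', 'instruction fetch']
```

Under that assumption, 0x14 decodes to a user-mode instruction fetch from a non-present page. Together with rip 0, this is consistent with a jump or call to address 0 rather than a bad data access.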
I saw this bug too:
    qemu-dm[6841]: segfault at 0000000000000000 rip 0000000000000000 rsp 0000000041001198 error 14
but I have only one win2k3 domain running, so I guess a little bit of I/O load caused it. Any progress on this front? Torsten
I wonder what the progress on this one is as well. I can confirm that this also happens during a Windows installation (Windows 2000 Server) on both xen-3.0.3 and xen-3.0.4-1:
    qemu-dm[24934]: segfault at 0000000000000000 rip 0000000000000000 rsp 0000000041001198 error 14
I was running an apt-get update in dom0 when this happened. Is there anything I can do to make it work?
Same problem here when trying to install Windows from CD. When it starts copying files from the CD after formatting, I get the segfault:
    qemu-dm[15506]: segfault at 0000000000000000 rip 0000000000000000 rsp 00000000410010f8 error 14
I had a similar problem too:
    Aug 3 11:09:23 [kernel] qemu-dm[18802]: segfault at 0000000000000000 rip 0000000000000000 rsp 0000000041000c08 error 14
This happened while dom0 was running ntfsclone from an LVM snapshot. I'm running xen-3.1 on amd64 with Windows 2003 Server (32-bit).
I got a backtrace:
    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 1090525504 (LWP 7085)]
    0x0000000000000000 in ?? ()
    (gdb) bt
    #0  0x0000000000000000 in ?? ()
    #1  0x000000000042b2e3 in dma_thread_func (opaque=<value optimized out>) at /var/tmp/portage/app-emulation/xen-tools-3.1.0/work/xen-3.1.0-src/tools/ioemu/hw/ide.c:2402
    #2  0x00002b044f6e1135 in start_thread () from /lib/libpthread.so.0
    #3  0x00002b044fd1562d in clone () from /lib/libc.so.6
    #4  0x0000000000000000 in ?? ()
I hope this helps. I don't know if this is the same problem as in the original report.
This is related to heavy disk I/O in Dom0. The bug is still present in 3.1.3-rc2. After heavy Dom0 disk I/O, I find this:
    Jan 30 10:25:02 vps2a kernel: qemu-dm[4564]: segfault at 0000000000000000 rip 0000000000000000 rsp 0000000041000c18 error 14
    Jan 30 10:25:02 vps2a kernel: qemu-dm[4295]: segfault at 0000000000000000 rip 0000000000000000 rsp 0000000041000c18 error 14
This is an unbelievably critical bug.
Xen 3.2 doesn't use a DMA progression thread any more and just relies on POSIX aio. Could you try to reproduce this bug?
(In reply to comment #8)
> Xen 3.2 doesn't use a DMA progression thread any more and just relies on POSIX
> aio, could you try to reproduce this bug?

Are you referring to the use of tap:aio via blktapctrl, or to qemu-dm internally using aio in 3.2?
Soubir
No, I am referring to the fact that the source lines of the gdb backtrace do not even exist any more :)