Bug 542 - qemu-dm segfault with multiple HVM domains

Status: NEW
Product: Xen
Component: HVM
Version: unstable
Platform: x86-64 Linux
Importance: P2 critical
Assigned To: Xen Bug List

Reported: 2006-02-22 22:07 CST by John Clemens
Modified: 2008-04-01 08:22 CDT
Description John Clemens 2006-02-22 22:07:17 CST
Running changeset 8932 (and earlier) of xen-unstable, on a large VT machine
(4s, 2c, HT), I get qemu-dm segfaults when I try to launch more than one
Windows domain at a time.

Turning on core dumps, the core files created by qemu-dm appear to be garbage,
and the segfault message on the dom0 console indicates they tried to jump to a
null code page:

qemu-dm[4961]: segfault at 0000000000000000 rip 0000000000000000 rsp
0000000040800198 error 14
qemu-dm[4963]: segfault at 0000000000000000 rip 0000000000000000 rsp
0000000040800198 error 14

rsp seems to be the same each time.  The easiest way to reproduce is to set up
a few Windows domains (I have 4, running Windows 2003) and create them in
rapid succession.  Within a few minutes, almost all of the qemu-dm processes
will have segfaulted except for one.  Launching a single domain does not seem
to trigger the problem.

Marking as critical because you can't run multiple Windows domains on the same
machine; please change if that's too aggressive.
Comment 1 Torsten Krah 2007-01-09 06:27:44 CST
Any progress here?

Saw this bug:

qemu-dm[6841]: segfault at 0000000000000000 rip 0000000000000000 rsp
0000000041001198 error 14

too - but I have only one win2k3 domain running; I guess a bit of I/O load
caused it.

Any progress on this front?

Torsten
Comment 2 Michiel van Baak 2007-01-26 02:57:39 CST
I wonder what the progress on this one is as well.

I can confirm that this also happens during a Windows installation (Windows
2000 Server), on both xen-3.0.3 and xen-3.0.4-1:

qemu-dm[24934]: segfault at 0000000000000000 rip 0000000000000000 rsp
0000000041001198 error 14

I was running an apt-get update on the dom0 when this happened.

Is there anything I can do to make it work?
Comment 3 mario 2007-03-28 04:34:54 CDT
Same problem here, when trying to install windows from CD.

When it starts copying files from the CD after formatting, I get the
segfault:

qemu-dm[15506]: segfault at 0000000000000000 rip 0000000000000000 rsp
00000000410010f8 error 14
Comment 4 spam 2007-08-03 02:56:14 CDT
I had a similar problem too:

Aug  3 11:09:23 [kernel] qemu-dm[18802]: segfault at 0000000000000000 rip
0000000000000000 rsp 0000000041000c08 error 14

This happened while dom0 was running ntfsclone on an LVM snapshot.

I'm running xen-3.1 on amd64 and windows 2003 server (32bit).
Comment 5 spam 2007-08-06 20:06:36 CDT
I got a backtrace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1090525504 (LWP 7085)]
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x000000000042b2e3 in dma_thread_func (opaque=<value optimized out>)
    at
/var/tmp/portage/app-emulation/xen-tools-3.1.0/work/xen-3.1.0-src/tools/ioemu/hw/ide.c:2402
#2  0x00002b044f6e1135 in start_thread () from /lib/libpthread.so.0
#3  0x00002b044fd1562d in clone () from /lib/libc.so.6
#4  0x0000000000000000 in ?? ()

I hope this helps. I don't know if this is the same problem as the initial bug.
Comment 6 Randy McAnally 2008-01-30 07:38:13 CST
This is related to large disk I/O in dom0.

This bug is still present in 3.1.3-rc2.

After heavy dom0 disk I/O, I find this:

Jan 30 10:25:02 vps2a kernel: qemu-dm[4564]: segfault at 0000000000000000 rip
0000000000000000 rsp 0000000041000c18 error 14
Jan 30 10:25:02 vps2a kernel: qemu-dm[4295]: segfault at 0000000000000000 rip
0000000000000000 rsp 0000000041000c18 error 14

This is an unbelievably critical bug.
Comment 7 Samuel Thibault 2008-03-21 10:38:14 CDT
Xen 3.2 doesn't use a DMA progression thread and just relies on POSIX aio;
could you try to reproduce this bug?
Comment 8 Samuel Thibault 2008-03-21 10:38:33 CDT
Xen 3.2 doesn't use a DMA progression thread any more and just relies on POSIX
aio; could you try to reproduce this bug?
Comment 9 Soubir Acharya 2008-03-25 21:17:25 CDT
(In reply to comment #8)
> Xen 3.2 doesn't use a DMA progression thread any more and just relies on POSIX
> aio, could you try to reproduce this bug?
> 

Are you referring to the use of tap:aio via blktapctrl or qemu-dm internally
using aio in 3.2?

Soubir
Comment 10 Samuel Thibault 2008-03-26 03:53:05 CDT
No, I am referring to the fact that the source lines of the gdb backtrace do
not even exist any more :)