Bug 762 - xen_net: Memory squeeze in netback driver.
Status: RESOLVED FIXED
Product: Xen
Component: Guest-OS
Version: 3.0.3
Platform: x86 Linux
Importance: P2 major
Assigned To: Xen Bug List
Reported: 2006-09-11 03:34 CDT by Edouard Bourguignon
Modified: 2010-12-07 02:33 CST

Description Edouard Bourguignon 2006-09-11 03:34:53 CDT
On xen-3.0.2-2, I get a lot of "xen_net: Memory squeeze in netback driver."
messages in /var/log/messages, but networking seems to work on dom0 and domU.

Is this normal?
Comment 1 Edouard Bourguignon 2006-09-11 03:36:53 CDT
Sorry, I forgot to mention that this occurs on a Xen build with PAE support.
Comment 2 Edouard Bourguignon 2006-09-14 07:26:45 CDT
The machine froze with a lot of printk messages saying "network table overflow".

Any ideas?
Comment 3 Sebastian Malcolm 2006-11-08 01:36:40 CST
I'm using the Debian packages of Xen (2.6.17-2 kernel) installed from the Etch
(testing) distribution on a Dell SC430 (dual-core Pentium D with 2 GB RAM).
I have a minimal installation of Debian Etch (amd64) running as the host Dom0.

To provide isolated networks for my DomUs across one dummy Ethernet interface
and two physical Ethernet cards (one add-in 100 Mb/s card for the Internet and
the onboard 1000 Mb/s Intel for the private LAN), I configured the xend startup
to create 3 bridges using brctl, one bridge for each interface (xenbrInet =
ethInet, xenbrDMZ = ethDMZ, xenbrPriv = ethPriv). I have then been able to start
up several DomUs with vifs in 1, 2 or all 3 of these bridges.

I only see this error message after I add "one too many" DomUs (>3?), but I've
not done enough testing yet to be certain which of the following quantities, or
which combination of them, triggers it:
 * The total number of DomUs
 * The amount of memory allocated per DomU
   (e.g. ~4x 256 MB out of 2 GB physical triggered the error, but does 8x 128 MB?)
 * The total number of (virtual) network interfaces
 * The number of virtual network interfaces per domain

I have been able to reproduce this error message using either the Debian
package "xen-hypervisor-3.0-unstable-1-amd64" or the supposedly stable and
reliable package "xen-hypervisor-3.0.2-1-amd64", with the recommended 2.6.16 or
2.6.17 Xen kernel packages. The next scenario I plan to test is reproducing my
current Xen Dom0 configuration with 4 or more DomUs using the "Demo CD Image"
that I have downloaded from XenSource.com. If none of the current stable release
binaries can run >3 DomUs on my hardware without producing errors, I shall
check out the latest source code and compile it in the hope of producing a
working kernel and hypervisor.

Reading through the Xen source code with the source browser on
lxr.xensource.com, the error is probably printed from within
"/linux-2.6-xen-sparse/drivers/xen/netback/netback.c".

The code from xen-3.0.3_0-src.tgz in netback.c:net_rx_action() is:

        if (!xen_feature(XENFEAT_auto_translated_physmap) &&
            check_mfn(nr_frags + 1)) {
            /* Memory squeeze? Back off for an arbitrary while. */
            if ( net_ratelimit() )
                WPRINTK("Memory squeeze in netback "
                    "driver.\n");
            mod_timer(&net_timer, jiffies + HZ);
            skb_queue_head(&rx_queue, skb);
            break;
        }
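
To make the behaviour of that fragment easier to follow, here is a simplified
user-space sketch of the same back-off pattern. It is not the driver code: the
struct names, the fixed-size queue and the free_frames counter are made up for
illustration, the rate limiting of the warning is not modelled, and leaving the
packet at the head of the queue stands in for the skb_queue_head() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 8

/* Hypothetical stand-in for an skb waiting to be passed to a guest. */
struct packet {
    int id;
    int nr_frags;
};

/* Hypothetical stand-in for the driver state; not the real netback layout. */
struct rx_state {
    struct packet queue[QUEUE_CAP]; /* index 0 is the head of the queue */
    size_t queued;
    int free_frames;   /* frames currently available to the backend */
    bool retry_armed;  /* models mod_timer(&net_timer, jiffies + HZ) */
    int warnings;      /* models the WPRINTK (rate limiting not modelled) */
};

/* One pass over the queue; returns how many packets were delivered. */
int rx_pass(struct rx_state *s)
{
    size_t head = 0;

    assert(s->queued <= QUEUE_CAP);
    s->retry_armed = false;

    while (head < s->queued) {
        /* Same sizing as check_mfn(nr_frags + 1) in the fragment above. */
        int needed = s->queue[head].nr_frags + 1;

        if (s->free_frames < needed) {
            /* Memory squeeze: warn, arm the retry, leave the packet
             * at the head of the queue and stop this pass. */
            s->warnings++;
            s->retry_armed = true;
            break;
        }
        s->free_frames -= needed;
        head++;
    }

    /* Drop the delivered packets from the front of the queue. */
    for (size_t i = head; i < s->queued; i++)
        s->queue[i - head] = s->queue[i];
    s->queued -= head;

    return (int)head;
}
```

In this model, as long as the shortage persists every pass stops at the same
packet and re-arms the retry, so nothing queued behind it is delivered. That
would at least be consistent with the reports of guests dropping off the
network, although the sketch proves nothing about the real driver.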

Researching this error message using Google and searching the mailing lists
suggests to me that it has been a problem for a long time (perhaps since the
3.0 release?).

The must-read thread on xen-devel hinting at a solution to this error message
is dated "6 Jun 2006" from Anthony Liguori:
http://lists.xensource.com/archives/cgi-bin/mesg.cgi?a=xen-devel&i=4485F70E.4020602%40us.ibm.com
"Do this mean that the netback driver needs to be able to increase it's
reservation?"
Keir Fraser writes in reply: "The kernel should be able to do it for itself."

Relevant postings to the xen-users mailing list reporting the same or similar
problem are:

http://lists.xensource.com/archives/html/xen-users/2006-08/msg00947.html
30 Aug 2006: "Memory squeeze in netback driver" "...doesn't appear to affect
performance until you add one too many guest domains, then the whole lot drop
off the net. It's not hardware related as I can recreate the issue on 3
completely separate servers with dissimilar hardware."
[Increasing the number of DomUs certainly seems to be a common theme in all
reports of this bug. Does it occur only on 64-bit (amd64 and em64t) systems
running 64-bit kernels?]

http://lists.xensource.com/archives/html/xen-users/2005-11/msg00534.html
22 Nov 2005: "...Xen 3.0 release using the Fedora Core 4 Wiki instructions.."
[Distro used probably isn't a factor.]

http://lists.xensource.com/archives/html/xen-users/2006-02/msg00077.html
2 Feb 2006: "...if dom0_mem set to 196MB then I can start 20 domains using all 
available ram without any problems..."
"Before I rebooted with dom0_mem set to 196MB...
...I also saw:  "xen_net: Memory squeeze in netback driver." in dom0..."
"I have a solution now (using dom0_mem and 3.0.1)..."
[So it seems that the amount of RAM allocated to Dom0 is a factor?]

http://lists.xensource.com/archives/html/xen-users/2006-06/msg00524.html
13 Jun 2006: "> When I start more than about 3 domU's..."
"I'm seeing this error as soon I as start a second domU."
Follow up in the same thread, 14 Jun 2006:
"I had forgotten to copy the modules for the xen linux kernel from dom0
to the new domU. So, I did that, and lowered the memory allocated to
the first domU. The 2nd domU booted without any problems."
And again, 15 Jun 2006:
"...resolved the problem by lowering the memory allocated to the first domU.
After I did that, the 2nd domU started up and the "memory squeeze..." error
stopped."

http://www.linode.com/xen/irc/logs/xen.log-2006-08-15
Possibly relevant info about how setting "maxmem value < memory" might be
related to this error message:
"17:21 <CosmicRay> rharper: oh, so it is a bug in xen 3.0.2 then?
 17:22 <rharper> yes, in 3.0.2 they ignored the maxmem value"

------------------
In summary:

Nothing I've read indicates that the latest version of Xen contains any changes
that prevent the "Memory squeeze in netback driver" error message from being
printk'ed endlessly until the number of DomUs is reduced.
The thread on xen-devel started by Anthony Liguori in response to the detailed
error report by Erik Hensema is the best description I've seen of a workaround
that is stable (but it might require Xen >=3.0.3 if the maxmem parameter is
ignored on all 3.0.2.X versions). I too shall try new configuration options to
see if I am able to run >4 DomUs without this error message being repeated
endlessly in my system logs. I'll report back here with another comment on my
success or failure.
Comment 4 Adam Jacob 2007-04-05 18:26:53 CDT
I fixed this error message by setting (dom0_max_mem 1G) in the
/etc/xen/xend-config.sxp file.  I'm fairly certain this simply keeps the dom0
from ballooning past a reasonable level, which makes the behavior sane again.
Comment 5 Fermín Galán Márquez 2007-04-21 11:52:59 CDT
After unsuccessfully testing maxmem and (dom0_max_mem 1G), I found the solution
in the following thread:
http://lists.xensource.com/archives/html/xen-users/2007-01/msg00428.html

In conclusion, set the dom0_mem hypervisor boot switch to the same value as
dom0-min-mem in /etc/xen/xend-config.sxp (dom0_mem == dom0-min-mem). After
that, the domU networking problems and the error messages in /var/log/messages
disappear.
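
As a rough sketch of what that looks like, assuming a GRUB legacy menu.lst and
a dom0 pinned at 1024 MB (the file paths, the hypervisor file name and the 1024
value are illustrative assumptions, not taken from the linked thread):

```
# /boot/grub/menu.lst (hypervisor line only; the other lines stay as they are)
kernel /boot/xen.gz dom0_mem=1024M

# /etc/xen/xend-config.sxp (value is in MB)
(dom0-min-mem 1024)
```

The point is only that both values describe the same amount of memory, so that
xend never tries to balloon dom0 below what it was booted with.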
Comment 6 Fermín Galán Márquez 2007-04-21 11:54:57 CDT
(In reply to comment #4)
> I fixed this error message by setting (dom0_max_mem 1G) in the
> /etc/xen/xend-config.sxp file.  I'm fairly certain it's simply making the dom0
> not balloon past a reasonable level that makes this behavior become sane again.

Did you really mean 'dom0_max_mem'? It doesn't seem to be a
/etc/xen/xend-config.sxp parameter... (at least in Xen 3.0.3)
Comment 7 Andre Blum 2007-09-03 11:44:31 CDT
(In reply to comment #5)
> After unsuccessfully test maxmem and (dom0_max_mem 1G), I found the solution in
> the following thread:
> http://lists.xensource.com/archives/html/xen-users/2007-01/msg00428.html.
> 
> In conclusion, set the dom0_mem hypervisor boot switch to the same value that
> dom0-mim-mem in /etc/xen/xend-config.sxp (dom0_mem == dom0-mim-mem). After
> that, the domUs networking problems and error messages in /var/log/messages
> disappear.
> 

This does not work for me. On a production machine I want to run 10 user
domains.  I can bring up 8 without problems, and they stay up for months.
Whenever I bring up a ninth I get these errors. dom0_mem in the kernel opts is
1G, and dom0-min-mem is 1024 too.
Whenever the memory squeezes happen, my network functionality really breaks. Is
there a fix that *will* work?
Comment 8 Andre Blum 2007-09-03 11:48:23 CDT
I forgot to add that this is on Xen 3.0.3 on Debian Etch, running on the amd64
architecture. All user domains are connected to the default br-xen bridge
device in the hypervisor.
Comment 9 Tariq Subra 2008-09-24 04:36:17 CDT
On Red Hat release 5.2 I get the same error; the net result is that we lose
the connection to the DomU virtual servers. With up to 7 virtual servers there
was no problem, but when I introduced 2 more virtual servers, I lost the
connection to the domU virtual servers. Dom0 is fine.
Comment 10 Munroe 2009-02-05 20:36:21 CST
I experience this as well on a Debian system (3.0.3, kernel 2.6.18).
Comment 11 John 2010-10-18 08:27:10 CDT
I am getting this on CentOS 5.5 also, running Xen 3.1.2-194.11.4.el5.  This is
causing a big problem because all my VM images reside on an iSCSI NAS device,
and when this error occurs, it knocks the NFS mount offline and causes all the
VMs to crash.  I see several suggestions, and other replies saying they did not
work.  Has anyone come up with a reliable fix for this?
Comment 12 Paolo Bonzini 2010-12-07 02:33:08 CST
This will be fixed in RHEL/CentOS 5.6.  The cause is usually ballooning.

The workaround that you can apply in older guests is to add the rx_copy=1
option to netfront.  Unfortunately that alone is not enough, as you also need
the patch from https://bugzilla.redhat.com/show_bug.cgi?id=653501 in the host.
That patch is also scheduled for 5.6, and it will be backported to 5.5 as well.