Bugzilla – Bug 762
xen_net: Memory squeeze in netback driver.
Last modified: 2010-12-07 02:33:08 CST
On xen-3.0.2-2, I've got a lot of "xen_net: Memory squeeze in netback driver." messages in /var/log/messages, but the network seems to work on dom0 and domU. Is this normal?
Sorry, I forgot to mention that it occurs on Xen with PAE support.
The machine froze with a lot of printk messages saying "network table overflow". Any idea?
I'm using Debian packages of Xen (2.6.17-2 kernel) installed from the Etch (testing) distribution on a Dell SC430 (dual-core Pentium D with 2 GB RAM). I have a minimal installation of Debian Etch (amd64) running as host Dom0.

To provide isolated networks for my DomUs across one dummy Ethernet interface and two physical Ethernet cards (one add-in 100 Mb/s card for the Internet and the onboard 1000 Mb/s Intel for the private LAN), I configured Xend startup to create 3 bridges using brctl, one bridge per interface (xenbrInet = ethInet, xenbrDMZ = ethDMZ, xenbrPriv = ethPriv). I have then been able to start up several DomUs with vifs in 1, 2 or all 3 of these bridges.

I only see this error message after I add "one too many" DomUs (>3?), but I haven't done enough testing yet to be certain which quantity, or combination of the following quantities, triggers it:
* The total number of DomUs
* The amount of memory allocated per DomU (e.g. ~4x 256 MB out of 2 GB physical triggered the error, but does 8x 128 MB?)
* The total number of (virtual) network interfaces
* The number of virtual network interfaces per domain

I have been able to reproduce this error message using either the Debian package "xen-hypervisor-3.0-unstable-1-amd64" or the hopefully stable and reliable "xen-hypervisor-3.0.2-1-amd64", with the recommended 2.6.16 or 2.6.17 Xen kernel packages.

The next scenario I think I must test is reproducing my current Xen Dom0 configuration with 4 or more DomUs using the "Demo CD Image" that I downloaded from XenSource.com. If none of the current stable release binaries can run more than 3 DomUs on my hardware without producing errors, I shall check out the latest source code and compile it, hopefully producing a working kernel and hypervisor.

Reading through the Xen source code with the source browser on lxr.xensource.com shows that the error is probably printed from within "/linux-2.6-xen-sparse/drivers/xen/netback/netback.c".
The code from xen-3.0.3_0-src.tgz in netback.c:net_rx_action() is:

    if (!xen_feature(XENFEAT_auto_translated_physmap) &&
        check_mfn(nr_frags + 1)) {
            /* Memory squeeze? Back off for an arbitrary while. */
            if ( net_ratelimit() )
                    WPRINTK("Memory squeeze in netback "
                            "driver.\n");
            mod_timer(&net_timer, jiffies + HZ);
            skb_queue_head(&rx_queue, skb);
            break;
    }

Researching this error message with Google and the mailing list archives suggests that it has been a problem for a long time (perhaps since the 3.0 release?).

The must-read thread on xen-devel hinting at a solution is dated "6 Jun 2006", from Anthony Liguori:
http://lists.xensource.com/archives/cgi-bin/mesg.cgi?a=xen-devel&i=4485F70E.4020602%40us.ibm.com
"Do this mean that the netback driver needs to be able to increase it's reservation?"
Keir Fraser writes in reply: "The kernel should be able to do it for itself."

Relevant postings to the xen-users mailing list reporting the same or a similar problem:

http://lists.xensource.com/archives/html/xen-users/2006-08/msg00947.html
30 Aug 2006: "Memory squeeze in netback driver"
"...doesn't appear to affect performance until you add one too many guest domains, then the whole lot drop off the net. It's not hardware related as I can recreate the issue on 3 completely separate servers with dissimilar hardware."
[Increasing the number of DomUs certainly seems to be a common theme in all reports of this bug. Does it occur only on 64-bit (amd64 and em64t) systems running 64-bit kernels?]

http://lists.xensource.com/archives/html/xen-users/2005-11/msg00534.html
22 Nov 2005: "...Xen 3.0 release using the Fedora Core 4 Wiki instructions.."
[The distro used probably isn't a factor.]

http://lists.xensource.com/archives/html/xen-users/2006-02/msg00077.html
2 Feb 2006: "...if dom0_mem set to 196MB then I can start 20 domains using all available ram without any problems..."
"Before I rebooted with dom0_mem set to 196MB...
...I also saw: "xen_net: Memory squeeze in netback driver." in dom0..."
"I have a solution now (using dom0_mem and 3.0.1)..."
[So it seems that the amount of RAM allocated to Dom0 is a factor?]

http://lists.xensource.com/archives/html/xen-users/2006-06/msg00524.html
13 Jun 2006: "> When I start more than about 3 domU's..." "I'm seeing this error as soon I as start a second domU."
Follow-up in the same thread, 14 Jun 2006: "I had forgotten to copy the modules for the xen linux kernel from dom0 to the new domU. So, I did that, and lowered the memory allocated to the first domU. The 2nd domU booted without any problems."
And again, 15 Jun 2006: "...resolved the problem by lowering the memory allocated to the first domU. After I did that, the 2nd domU started up and the "memory squeeze..." error stopped."

http://www.linode.com/xen/irc/logs/xen.log-2006-08-15
Possibly relevant information on how setting "maxmem value < memory" may affect this error:

    17:21 <CosmicRay> rharper: oh, so it is a bug in xen 3.0.2 then?
    17:22 <rharper> yes, in 3.0.2 they ignored the maxmem value

------------------

In summary: nothing I've read indicates that the latest version of Xen contains any changes that prevent the "Memory squeeze in netback driver" message from being printk'ed endlessly until the number of DomUs is reduced.

The thread on xen-devel started by Anthony Liguori in response to the detailed error report by Erik Hensema is the best description I've seen of a stable workaround (though it might require Xen >= 3.0.3 if the maxmem parameter is ignored in all 3.0.2.x versions).

I too shall try new configuration options to see whether I can run more than 4 DomUs without this error message being repeated endlessly in my system logs. I'll report back here with another comment on my success or failure.
I fixed this error message by setting (dom0_max_mem 1G) in the /etc/xen/xend-config.sxp file. I'm fairly certain it simply keeps dom0 from ballooning past a reasonable level, which makes this behavior sane again.
After unsuccessfully testing maxmem and (dom0_max_mem 1G), I found the solution in the following thread: http://lists.xensource.com/archives/html/xen-users/2007-01/msg00428.html.

In conclusion, set the dom0_mem hypervisor boot switch to the same value as dom0-min-mem in /etc/xen/xend-config.sxp (dom0_mem == dom0-min-mem). After that, the domU networking problems and the error messages in /var/log/messages disappear.
(In reply to comment #4)
> I fixed this error message by setting (dom0_max_mem 1G) in the
> /etc/xen/xend-config.sxp file. I'm fairly certain it's simply making the dom0
> not balloon past a reasonable level that makes this behavior become sane again.

Did you really mean 'dom0_max_mem'? It doesn't seem to be an /etc/xen/xend-config.sxp parameter (at least in Xen 3.0.3).
(In reply to comment #5)
> After unsuccessfully test maxmem and (dom0_max_mem 1G), I found the solution in
> the following thread:
> http://lists.xensource.com/archives/html/xen-users/2007-01/msg00428.html.
>
> In conclusion, set the dom0_mem hypervisor boot switch to the same value that
> dom0-mim-mem in /etc/xen/xend-config.sxp (dom0_mem == dom0-mim-mem). After
> that, the domUs networking problems and error messages in /var/log/messages
> disappear.

This does not work for me. On a production machine I want to run 10 user domains. I can bring up 8 without problems; they have been up for months. Whenever I bring up a ninth, I get these errors. dom0_mem in the kernel options is 1G, and dom0-min-mem is 1024 too. Whenever the memory squeezes happen, my network functionality really breaks. Is there a fix that *will* work?
I forgot to add that this is on Xen 3.0.3 on Debian Etch, running on the amd64 architecture. All user domains are connected to the default br-xen bridge device in the hypervisor.
On Red Hat release 5.2 we get the same error; the net result is that we lose connection to the DomU virtual servers. With up to 7 virtual servers there was no problem, but when I introduced 2 more virtual servers, I lost connection with the domU virtual servers. Dom0 is fine.
I experience this as well on a Debian system with Xen 3.0.3, kernel 2.6.18.
I am getting this on CentOS 5.5 also, running Xen 3.1.2-194.11.4.el5. This is causing a big problem because all my VM images reside on an iSCSI NAS device, and when this error occurs, it knocks the NFS mount offline and causes all the VMs to crash. I see several suggestions, and other replies saying they did not work. Has anyone come up with a reliable fix for this?
This will be fixed in RHEL/CentOS 5.6. The cause is usually ballooning. The workaround that you can apply in older guests is to add the rx_copy=1 option to netfront. Unfortunately that is not enough on its own, as you also need the patch from https://bugzilla.redhat.com/show_bug.cgi?id=653501 on the host. This one is also scheduled for 5.6, but it will also be backported to 5.5.