/bugzilla3/ Bug 195 – Many Timer ISR: Time went backwards logs
Bug 195 - Many Timer ISR: Time went backwards logs
: Many Timer ISR: Time went backwards logs
Status: NEW
Product: Xen
2.6.18 dom0
: 3.0.2
: x86 Linux-2.6
: P2 major
Assigned To: Xen Bug List
:
:
:
  Show dependency treegraph
 
Reported: 2005-08-28 18:27 CDT by Alexander Junghans
Modified: 2010-05-04 15:02 CDT (History)
23 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Alexander Junghans 2005-08-28 18:27:06 CDT
Hello,
afer months of using stabe 2.0.6 and 2.0.7 i changed the CPU from Athlon 64 to a
Athlon 62 x2 dual core. 
Works fine and fast but the syslog on the UDomains logs many entries like:

Aug 28 20:03:23 music kernel: Timer ISR: Time went backwards: -2129859000
169370000000 141000 171500000000
Aug 28 20:03:23 music kernel: Timer ISR: Time went backwards: -2106071000
169390000000 3929000 171500000000
Aug 28 20:03:23 music kernel: Timer ISR: Time went backwards: -2087345000
169410000000 2655000 171500000000
Aug 28 20:03:23 music kernel: Timer ISR: Time went backwards: -2079980000
169420000000 20000 171500000000
Aug 28 20:03:23 music kernel: Timer ISR: Time went backwards: -2049817000
169450000000 183000 171500000000
Aug 28 20:03:23 music kernel: Timer ISR: Time went backwards: -2005001000
169490000000 4999000 171500000000
Aug 28 20:03:23 music kernel: Timer ISR: Time went backwards: -1994341000
169500000000 5659000 171500000000
Aug 28 20:03:23 music kernel: Timer ISR: Time went backwards: -1983325000
169510000000 6675000 171500000000
Aug 28 20:03:23 music kernel: Timer ISR: Time went backwards: -1970842000
169520000000 9158000 171500000000
Aug 28 20:03:23 music kernel: Timer ISR: Time went backwards: -1969970000
169530000000 30000 171500000000
Aug 28 20:03:23 music kernel: Timer ISR: Time went backwards: -1946520000
169550000000 3481000 171500000000
Aug 28 20:03:23 music kernel: Timer ISR: Time went backwards: -1929923000
169570000000 77000 171500000000
Aug 28 20:03:23 music kernel: Timer ISR: Time went backwards: -1912453000
169580000000 7547000 171500000000
Aug 28 20:03:23 music kernel: Timer ISR: Time went backwards: -1909926000
169590000000 74000 171500000000
Aug 28 20:03:23 music kernel: Timer ISR: Time went backwards: -1887345000
169610000000 2655000 171500000000
Aug 28 20:03:23 music kernel: Timer ISR: Time went backwards: -1863340000
169630000000 6660000 171500000000
Aug 28 20:03:23 music kernel: Timer ISR: Time went backwards: -1859734000
169640000000 266000 171500000000
Aug 28 20:03:23 music kernel: Timer ISR: Time went backwards: -1848726000
169650000000 1274000 171500000000
Aug 28 20:03:23 music kernel: Timer ISR: Time went backwards: -1827531000
169670000000 2469000 171500000000
...

I am afraid of overloading the UDomais with thousands of these entries.

Using a plain 2.0.7 from src on a Asus A8V with APIC enabled + disabled in BIOS
with no change. Changing Preemptible Kernel to yes/no with no change.
APM + Cool & Quiet are disabled.

Please help.
Alle further questions will bw answered.
Thanks.
Comment 1 Alexander Junghans 2005-08-29 06:00:57 CDT
All User Domains are crashed because of 100% filled root partitions, caused by
millions of logfile entries (Timer ISR: Time went backwards). Removed the
dualcore and no backwards logenties occur.
Comment 2 Ian Pratt 2005-08-29 09:52:43 CDT
Please can you try -unstable and verify this is fixed.

Comment 3 Alexander Junghans 2005-08-29 19:03:13 CDT
I made up a clean new System. New mainboard new debian system from source.
(Asus A8V + AMD Athlon X2)
Running 3 User domains sending tcp date to each other. No service or demons
running except inetd.
Same result: No X2 fine, X2 many many log entries.
I pinned all domains to cpu0 and there are only some error lines left a minute.

Then I followed your advice and installed unstabe.
The DOM0 was not able to boot.
XEN stops and displays:

Could not setup DOM0 guest OS
Aieee! CPU0 ist toast.

Memory checked. Total 1GB , DOM0 200M or 100M the same result.

I will double check tomorrow, and try the unstable on an K7.

Any ideas?
Thanks
Comment 4 Alexander Junghans 2005-09-01 06:35:07 CDT
I verified unstable and it seems to be fixed.
(The CPU toased message was caused by my fault, wrong link)

I have tried testing with no success. The DOM0 kernel will always crash.

I also tryed lapic, nolapic on stabe, with no difference.
When using nosmp, to errors occur. (Of cause one CPU is disabled)

Still no chance to run stable on x2 with both CPUs enabeld.
Comment 5 Alexander Junghans 2005-09-04 20:22:26 CDT
In the meantime testing was fixed to run again, and has the same problem as
stable with the AMD x2.
Comment 6 Ryan Harper 2005-09-15 20:15:18 CDT
I'm seeing this in xen-unstable, on a two-way Opteron box.

changeset:   6874:c27431cf81f96a6d98c4e25bd08e6762918019e4
tag:         tip
user:        kaf24@firebug.cl.cam.ac.uk
date:        Thu Sep 15 08:17:24 2005
summary:     Fix the libxc Makefile.

Timer ISR/0: Time went backwards: delta=-10048465 cpu_delta=9951535
shadow=706605145 0: 71640046095
 1: 71660000000
Timer ISR/0: Time went backwards: delta=-10044816 cpu_delta=19955184
shadow=71660475 0: 71660046095
 1: 71690000000
Timer ISR/0: Time went backwards: delta=-6038496 cpu_delta=13961504
shadow=716604705 0: 71670046095
 1: 71700000000
Timer ISR/1: Time went backwards: delta=-580042799 cpu_delta=460003296
shadow=716600 0: 73240046095
 1: 72200000000
Timer ISR/0: Time went backwards: delta=-10039377 cpu_delta=9960623
shadow=732371225 0: 73240046095
 1: 73260000000
Timer ISR/0: Time went backwards: delta=-10040262 cpu_delta=29959738
shadow=73237125 0: 73240046095
 1: 73280000000
Comment 7 Ted K 2005-09-29 13:39:37 CDT
changeset:   7121:f069a06e650f
Dom0 - FC4 - Taym 2462 SMP

I ran a for an extended period of time(over 12 hours) before getting one
Timer ISR/1: Time went backwards: delta=-120006819 cpu_delta=31293235
shadow=34837964135648 off=997157858 process0 0: 34839081300054
 1: 34838930000000

Preceeding this I brought a second brought bridge interface online, I had to
stop xend to get it to allow me to add eth1 to it. At this point their is only a
dom0 running and no domU's. 
Comment 8 Ryan Harper 2005-10-13 16:12:25 CDT
Anyone seeing this issue, can you check which timer is selected?  You can get
this value by running:

% xm dmesg | grep -i "platform timer"
(XEN) Platform timer is 1.193MHz PIT

I've not seen this when either HPET or Cyclone timer is used, only PIT.
Comment 9 Alexander Junghans 2005-10-17 13:13:36 CDT
Hi,

i have no platform timer, but

master1:~# xm dmesg | grep -i "timer"
(XEN) ..TIMER: vector=0x41 pin1=2 pin2=0
(XEN) Using local APIC timer interrupts.
(XEN) Calibrating APIC timer for CPU0...
Comment 10 Ryan Harper 2005-10-17 14:57:04 CDT
Interesting.  So your system is using the local apic for interrupt generation,
but you should still have one of the three platform timers running.  Looking at
xen/arch/x86/time.c, I don't see how you could boot without one of the three
timers running:

static void init_platform_timer(void)
{
    if ( !init_cyclone() && !init_hpet() )
        BUG_ON(!init_pit());
}

What hardware are you using?  SMP? real or just hyperthreads?  You still see
this error on latest unstable?
Comment 11 Alexander Junghans 2005-10-28 12:00:48 CDT
Hi,
today i got a dual xenon to play with.
With same problem and i have no platform timer like you.

xm dmesg

ERROR: cannot use unconfigured serial port COM1
 __  __            ____    ___  _____
 \ \/ /___ _ __   |___ \  / _ \|___  |
  \  // _ \ '_ \    __) || | | |  / /
  /  \  __/ | | |  / __/ | |_| | / /
 /_/\_\___|_| |_| |_____(_)___(_)_/

 http://www.cl.cam.ac.uk/netos/xen
 University of Cambridge Computer Laboratory

 Xen version 2.0.7 (root@leukefeld.com) (gcc-Version 3.3.5 (Debian 1:3.3.5-13))
Mi Okt 19 15:12:53 CEST 2005
 Latest ChangeSet:

(XEN) Physical RAM map:
(XEN)  0000000000000000 - 000000000009dc00 (usable)
(XEN)  000000000009dc00 - 00000000000a0000 (reserved)
(XEN)  00000000000f0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 000000007fee0000 (usable)
(XEN)  000000007fee0000 - 000000007fee3000 (ACPI NVS)
(XEN)  000000007fee3000 - 000000007fef0000 (ACPI data)
(XEN)  000000007fef0000 - 000000007ff00000 (reserved)
(XEN)  00000000fec00000 - 0000000100000000 (reserved)
(XEN) System RAM: 2046MB (2095604kB)
(XEN) Xen heap: 10MB (10732kB)
(XEN) CPU0: Before vendor init, caps: bfebfbff 00000000 00000000, vendor = 0
(XEN) CPU#0: Physical ID: 0, Logical ID: 0
(XEN) CPU caps: bfebfbff 00000000 00000000 00000000
(XEN) found SMP MP-table at 000f56b0
(XEN) ACPI: RSDP (v000 IntelR                                    ) @ 0x000f7700
(XEN) ACPI: RSDT (v001 IntelR AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x7fee3000
(XEN) ACPI: FADT (v001 IntelR AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x7fee3040
(XEN) ACPI: MADT (v001 IntelR AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x7fee7080
(XEN) ACPI: DSDT (v001 INTELR AWRDACPI 0x00001000 MSFT 0x0100000e) @ 0x00000000
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
(XEN) Processor #0 Pentium 4(tm) XEON(tm) APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x06] enabled)
(XEN) Processor #6 Pentium 4(tm) XEON(tm) APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
(XEN) Processor #1 Pentium 4(tm) XEON(tm) APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x07] enabled)
(XEN) Processor #7 Pentium 4(tm) XEON(tm) APIC version 20
(XEN) ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
(XEN) Using ACPI for processor (LAPIC) configuration information
(XEN) Intel MultiProcessor Specification v1.4
(XEN)     Virtual Wire compatibility mode.
(XEN) OEM ID: OEM00000 Product ID: PROD00000000 APIC at: 0xFEE00000
(XEN) I/O APIC #4 Version 17 at 0xFEC00000.
(XEN) I/O APIC #5 Version 17 at 0xFEC10000.
(XEN) Enabling APIC mode: Flat. Using 2 I/O APICs
(XEN) Processors: 4
(XEN) Using scheduler: Borrowed Virtual Time (bvt)
(XEN) Initializing CPU#0
(XEN) Detected 2806.470 MHz processor.
(XEN) CPU0: Before vendor init, caps: bfebfbff 00000000 00000000, vendor = 0
(XEN) CPU#0: Physical ID: 0, Logical ID: 0
(XEN) CPU caps: bfebfbff 00000000 00000000 00000000
(XEN) CPU0 booted
(XEN) enabled ExtINT on CPU#0
(XEN) ESR value before enabling vector: 00000000
(XEN) ESR value after enabling vector: 00000000
(XEN) Booting processor 1/1 eip 90000
(XEN) Initializing CPU#1
(XEN) masked ExtINT on CPU#1
(XEN) ESR value before enabling vector: 00000000
(XEN) ESR value after enabling vector: 00000000
(XEN) CPU1: Before vendor init, caps: bfebfbff 00000000 00000000, vendor = 0
(XEN) CPU#1: Physical ID: 0, Logical ID: 1
(XEN) CPU caps: bfebfbff 00000000 00000000 00000000
(XEN) CPU1 has booted.
(XEN) Booting processor 2/6 eip 90000
(XEN) Initializing CPU#2
(XEN) masked ExtINT on CPU#2
(XEN) ESR value before enabling vector: 00000000
(XEN) ESR value after enabling vector: 00000000
(XEN) CPU2: Before vendor init, caps: bfebfbff 00000000 00000000, vendor = 0
(XEN) CPU#2: Physical ID: 3, Logical ID: 0
(XEN) CPU caps: bfebfbff 00000000 00000000 00000000
(XEN) CPU2 has booted.
(XEN) Booting processor 3/7 eip 90000
(XEN) Initializing CPU#3
(XEN) masked ExtINT on CPU#3
(XEN) ESR value before enabling vector: 00000000
(XEN) ESR value after enabling vector: 00000000
(XEN) CPU3: Before vendor init, caps: bfebfbff 00000000 00000000, vendor = 0
(XEN) CPU#3: Physical ID: 3, Logical ID: 1
(XEN) CPU caps: bfebfbff 00000000 00000000 00000000
(XEN) CPU3 has booted.
(XEN) Total of 4 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN) Setting 4 in the phys_id_present_map
(XEN) ...changing IO-APIC physical APIC ID to 4 ... ok.
(XEN) Setting 5 in the phys_id_present_map
(XEN) ...changing IO-APIC physical APIC ID to 5 ... ok.
(XEN) init IO_APIC IRQs
(XEN) ..TIMER: vector=0x41 pin1=2 pin2=0
(XEN) Using local APIC timer interrupts.
(XEN) Calibrating APIC timer for CPU0...
(XEN) ..... CPU speed is 2806.3682 MHz.
(XEN) ..... Bus speed is 133.6364 MHz.
(XEN) ..... bus_scale = 0x000088DC
(XEN) checking TSC synchronization across CPUs: passed.
(XEN) Time init:
(XEN) .... System Time: 11289036ns
(XEN) .... cpu_freq:    00000000:A7475890
(XEN) .... scale:       00000001:6CDEFE5B
(XEN) .... Wall Clock:  1130503621s 120000us
(XEN) PCI: PCI BIOS revision 2.10 entry at 0xfb890, last bus=3
(XEN) PCI: Using configuration type 1
(XEN) PCI: Probing PCI hardware
(XEN) PCI: Probing PCI hardware (bus 00)
(XEN) PCI: Ignoring BAR0-3 of IDE controller 00:1f.1
(XEN) Transparent bridge - PCI device 8086:244e
(XEN) PCI: Using IRQ router default [8086/25a1] at 00:1f.0
(XEN) PCI->APIC IRQ transform: (B0,I29,P0) -> 16
(XEN) PCI->APIC IRQ transform: (B0,I29,P1) -> 19
(XEN) PCI->APIC IRQ transform: (B0,I29,P3) -> 23
(XEN) PCI->APIC IRQ transform: (B0,I31,P0) -> 16
(XEN) PCI->APIC IRQ transform: (B0,I31,P1) -> 17
(XEN) PCI->APIC IRQ transform: (B1,I1,P0) -> 18
(XEN) PCI->APIC IRQ transform: (B3,I3,P0) -> 20
(XEN) PCI->APIC IRQ transform: (B3,I4,P0) -> 23
(XEN) PCI->APIC IRQ transform: (B3,I5,P0) -> 22
(XEN) mtrr: v2.0 (20020519)
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Xen-ELF header found:
'GUEST_OS=linux,GUEST_VER=2.6,XEN_VER=2.0,VIRT_BASE=0xC0000000,LOADER=generic,PT_MODE_WRITABLE'
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Kernel image:  00c00000->0109a0f0
(XEN)  Initrd image:  00000000->00000000
(XEN)  Dom0 alloc.:   01400000->09100000
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: c0100000->c05c90e8
(XEN)  Init. ramdisk: c05ca000->c05ca000
(XEN)  Phys-Mach map: c05ca000->c05e9400
(XEN)  Page tables:   c05ea000->c05ed000
(XEN)  Start info:    c05ed000->c05ee000
(XEN)  Boot stack:    c05ee000->c05ef000
(XEN)  TOTAL:         c0000000->c0800000
(XEN)  ENTRY ADDRESS: c0100000
(XEN) Scrubbing DOM0 RAM: ..done.
(XEN) Scrubbing Free RAM: .....................done.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen).
Comment 12 Jani Karlsson 2006-01-27 06:59:28 CST
This happened on my Dual Xeon system (64-bit), on Domain0..
Same errormessages and crash soon after. Xen used was 3.0 (bittorrent src
download), compiled with GCC 3.3 on Debian Sarge 64-bit.
Comment 13 Alexander Junghans 2006-02-14 18:49:34 CST
Hi, I updated to 3.0.1 still using the AMD X2.
Sadly the bug is still there.

After the Time went backwards the user domain crashed.
Logfile from user domain.

Disabling the second core with nosmp no problems.

Feb 13 02:09:28 localhost kernel:  0: 100089097992644
Feb 13 02:09:28 localhost kernel: Timer ISR/0: Time went backwards: delta=-26905
cpu_delta=-26905 shadow=100089124849613 off=403116849 processed=1000895279926
44 cpu_processed=100089527992644
Feb 13 02:09:28 localhost kernel:  0: 100089527992644
Feb 13 02:09:28 localhost kernel: Timer ISR/0: Time went backwards: delta=-26839
cpu_delta=-26839 shadow=100089124849613 off=673116896 processed=1000897979926
44 cpu_processed=100089797992644
Feb 13 02:09:28 localhost kernel:  0: 100089797992644
Feb 13 02:14:00 localhost kernel: Timer ISR/0: Time went backwards: delta=-24351
cpu_delta=-24351 shadow=100361125595817 off=722373311 processed=1003618479926
44 cpu_processed=100361847992644
Feb 13 02:14:00 localhost kernel:  0: 100361847992644
Feb 13 02:15:21 localhost kernel: Timer ISR/0: Time went backwards: delta=-26759
cpu_delta=-26759 shadow=100442126075542 off=661890944 processed=1004427879926
44 cpu_processed=100442787992644
Feb 13 02:15:21 localhost kernel:  0: 100442787992644
Feb 13 02:15:35 localhost kernel: Timer ISR/0: Time went backwards: delta=-26965
cpu_delta=-26965 shadow=100456126244826 off=721721498 processed=1004568479926
44 cpu_processed=100456847992644
Feb 13 02:15:35 localhost kernel:  0: 100456847992644
Feb 13 02:16:44 localhost kernel: Timer ISR/0: Time went backwards: delta=-26424
cpu_delta=-26424 shadow=100524127010194 off=920956621 processed=1005250479926
44 cpu_processed=100525047992644
Feb 13
Comment 14 Harald Koenig 2006-02-15 17:12:42 CST
one more data point for this quest:

I see the same "Timer ..." kernel messages on a dual-core dual-Opteron:

Timer ISR/2: Time went backwards: delta=-2665898 cpu_delta=13384384
shadow=446519338822167 off=982562370 processed=446520324050282
cpu_processed=446520308000000
 0: 446520320050282
 1: 446520324000000
 2: 446520308000000
 3: 446520316000000

# dmesg  |grep -c Timer
1023

# grep -c Timer /var/log/warn
14320 

from today 15:00:32 to 19:15:15, so ~1 message per second, but they show up in
long bursts with pretty random "idle" periods of ~30-300 seconds between those
bursts...



kernel being used (SUSE 10.1 SL-OSS-Factory):

 Xen version 3.0_8659-2 (abuild@suse.de) (gcc version 4.1.0 20060123
(prerelease) (SUSE Linux)) Tue Jan 31 06:15:27 UTC 2006
 Latest ChangeSet: 8659

# xm dmesg | grep -i "platform timer"
(XEN) Platform timer is 1.193MHz PIT
Comment 15 David F Barrera 2006-03-01 16:12:31 CST
*** Bug 550 has been marked as a duplicate of this bug. ***
Comment 16 David F Barrera 2006-03-01 16:17:06 CST
I am seeing these messages on an x86_64 (EMT64) machine, SLES 9 SP2 platform:

Timer ISR/0: Time went backwards: delta=10015222 cpu_delta=-19984778
shadow=7912025006818 off=855029678 processed=7912870000000
cpu_processed=7912900000000
 0: 7912900000000
printk: 1 messages suppressed.
Timer ISR/0: Time went backwards: delta=10014636 cpu_delta=-19985364
shadow=7932025007360 off=315033645 processed=7932330000000
cpu_processed=7932360000000
 0: 7932360000000
Timer ISR/0: Time went backwards: delta=10009494 cpu_delta=-9990506
shadow=7932025007360 off=325003314 processed=7932340000000
cpu_processed=7932360000000
 0: 7932360000000
Comment 17 Douglas Willis 2006-03-14 11:18:38 CST
Further details.

I am running Xen on a quad Opteron Sun 40z box with Opteron 850 CG stepping
processors and am getting thse messages also.

Initial OS install was CentOS 4.2 x86_64 then installed Xen 3.0.1-src.tar which
built the 2.6.12.6 kernel.

The timer shown in xm dmesg is 
(XEN) Platform timer is 1.193MHz PIT

The message is infrequent. 

I've put the output from dmesg in Domain0 and  xm dmesg in a zip file on ouf ftp
server at ftp.bas.ac.uk/ddw/xen_dmesg.ZIP
Comment 18 Nivedita Singhvi 2006-03-15 17:40:43 CST
Hello! I believe there is a fix in the tree (xen-unstable) now. Could you
please
retest and validate that is has fixed the problem? If so, can also go ahead and
close this bug.
Comment 19 David F Barrera 2006-03-20 20:14:48 CST
Tested with changeset 9329. Problem is gone. Closing defect.
Comment 20 Alexander Junghans 2006-04-17 23:45:36 CDT
Hi,

running 3.0.2-2 on AMD X2 for 12h.
No Problems!

Thanks!!
Comment 21 Torsten 2006-07-16 13:55:07 CDT
Hello

I using XEN 3.0.2 and Kernel 2.6.16.23 and I got the following message to the
console. Then I type 'dmesg' I got the messages, too.

The messages are show in Dom0 and DomU.

printk: 19 messages suppressed.
Timer ISR/0: Time went backwards: delta=-36773821 delta_cpu=63226179
shadow=76412091215682 off=782267151 processed=76412910083807
cpu_processed=76412810083807
 0: 76412810083807
 1: 76412910083807
printk: 32 messages suppressed.
Timer ISR/1: Time went backwards: delta=-17602207 delta_cpu=52397793
shadow=76417239052095 off=573635376 processed=76417830083807
cpu_processed=76417760083807
 0: 76417830083807
 1: 76417760083807
printk: 23 messages suppressed.
Timer ISR/1: Time went backwards: delta=-59002054 delta_cpu=80997946
shadow=76422453452515 off=337851328 processed=76422850083807
cpu_processed=76422710083807
 0: 76422850083807
 1: 76422710083807
printk: 21 messages suppressed.
Timer ISR/1: Time went backwards: delta=-30093323 delta_cpu=39906677
shadow=76427542969853 off=1047247351 processed=76428620083807
cpu_processed=76428550083807
 0: 76428620083807
 1: 76428550083807

What the reason for this bug? Is there a way to solve the problem?

Torsten
Comment 22 Ingo Reimann 2006-08-10 04:24:08 CDT
(In reply to comment #21)
> Hello
> 
> I using XEN 3.0.2 and Kernel 2.6.16.23 and I got the following message to the
> console. Then I type 'dmesg' I got the messages, too.
> 
> The messages are show in Dom0 and DomU.
> 
> [...]

Hi, same is true for me; i have a dual Pentium III (Coppermine). When i start
Dom0, first all semss fine:

 __  __            _____  ___   ____    ____
 \ \/ /___ _ __   |___ / / _ \ |___ \  |___ \
  \  // _ \ '_ \    |_ \| | | |  __) |__ __) |
  /  \  __/ | | |  ___) | |_| | / __/|__/ __/
 /_/\_\___|_| |_| |____(_)___(_)_____| |_____|

 http://www.cl.cam.ac.uk/netos/xen
 University of Cambridge Computer Laboratory

 Xen version 3.0.2-2 (root@lv-h.de) (gcc-Version 3.3.6 (Debian 1:3.3.6-13)) Mi
Aug  9 16:54:10 CEST 2006
 Latest ChangeSet: Thu Apr 13 15:18:37 2006 +0100 9617:5802713c159b

(XEN) Physical RAM map:
(XEN)  0000000000000000 - 000000000009fc00 (usable)
(XEN)  000000000009fc00 - 00000000000a0000 (reserved)
(XEN)  00000000000f0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 000000007fff0000 (usable)
(XEN)  000000007fff0000 - 000000007fff3000 (ACPI NVS)
(XEN)  000000007fff3000 - 0000000080000000 (ACPI data)
(XEN)  00000000fec00000 - 0000000100000000 (reserved)
(XEN) System RAM: 2047MB (2096700kB)
(XEN) Xen heap: 10MB (10560kB)
(XEN) Using scheduler: Simple EDF Scheduler (sedf)
(XEN) PAE disabled.
(XEN) found SMP MP-table at 000f56b0
(XEN) DMI 2.2 present.
(XEN) Using APIC driver default
(XEN) ACPI: RSDP (v000 VIA694                                ) @ 0x000f6ff0
(XEN) ACPI: RSDT (v001 VIA694 AWRDACPI 0x30302e32 AWRD 0x00000000) @ 0x7fff3000
(XEN) ACPI: FADT (v002 VIA694 AWRDACPI 0x30302e32 AWRD 0x00000000) @ 0x7fff3040
(XEN) ACPI: MADT (v001 VIA694 AWRDACPI 0x30302e32 AWRD 0x00000000) @ 0x7fff5880
(XEN) ACPI: DSDT (v001 VIA694 AWRDACPI 0x00001000 MSFT 0x0100000c) @ 0x00000000
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
(XEN) Processor #0 6:8 APIC version 17
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
(XEN) Processor #1 6:8 APIC version 17
(XEN) ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 dfl dfl)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) Initializing CPU#0
(XEN) Detected 997.181 MHz processor.
(XEN) CPU: L1 I cache: 16K, L1 D cache: 16K
(XEN) CPU: L2 cache: 256K
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#0.
(XEN) CPU0: Intel Pentium III (Coppermine) stepping 0a
(XEN) Booting processor 1/1 eip 90000
(XEN) Initializing CPU#1
(XEN) CPU: L1 I cache: 16K, L1 D cache: 16K
(XEN) CPU: L2 cache: 256K
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#1.
(XEN) CPU1: Intel Pentium III (Coppermine) stepping 0a
(XEN) Total of 2 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN) ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) checking TSC synchronization across 2 CPUs: passed.
(XEN) Platform timer is 1.193MHz PIT
(XEN) Brought up 2 CPUs
(XEN) Machine check exception polling timer started.
(XEN) mtrr: your CPUs had inconsistent variable MTRR settings
(XEN) mtrr: probably your BIOS does not setup all CPUs.
(XEN) mtrr: corrected configuration.
(XEN) Using IPI Shortcut mode
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Domain 0 kernel supports features = { 0000001f }.
(XEN) Domain 0 kernel requires features = { 00000000 }.
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   7c000000->7e000000 (476593 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: c0100000->c0452254
(XEN)  Init. ramdisk: c0453000->c110c800
(XEN)  Phys-Mach map: c110d000->c12e66c4
(XEN)  Start info:    c12e7000->c12e8000
(XEN)  Page tables:   c12e8000->c12ee000
(XEN)  Boot stack:    c12ee000->c12ef000
(XEN)  TOTAL:         c0000000->c1400000
(XEN)  ENTRY ADDRESS: c0100000
(XEN) Dom0 has maximum 2 VCPUs
(XEN) Initrd len 0xcb9800, start at 0xc0453000
(XEN) Scrubbing Free RAM: .....................done.
(XEN) Xen trace buffers: disabled
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to
Xen).

i also was able to start a DomU. Some time later, these errors occure and
starting an additional DomU hangs:
Linux version 2.6.16.13-xen (apeace@paploo) (gcc version 4.0.3 (Debian
4.0.3-1)) #1 SMP Mon May 15 17:26:36 BST 2006
BIOS-provided physical RAM map:
 Xen: 0000000000000000 - 0000000010000000 (usable)
0MB HIGHMEM available.
264MB LOWMEM available.
ACPI in unprivileged domain disabled
IRQ lockup detection disabled
Built 1 zonelists
Kernel command line:  root=/dev/sda1 ro
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 32768 bytes)
Xen reported: 1243.249 MHz processor.
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Software IO TLB disabled
vmalloc area: d1000000-fb7fe000, maxmem 33ffe000
Memory: 242816k/270336k available (2231k kernel code, 19196k reserved, 694k
data, 180k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.

what is it, how may i fix that?
I tested official Kernels from the Live-CD as well as Debian Backports al well
a a selfcompiled.

Best whishes,
Ingo 
Comment 23 Ternes 2006-08-21 04:32:33 CDT
I experience exactly the same problem on a FSC AMILO Pro V3205 with Intel Core
Duo T2050 (OEM). It doesn't matter which OS runs in dom0. I've tried FC5,
OpenSUSE 10.1 and Debian unstable.


Booting of xenU generates nothing more than this:

debian37:~# xm create /etc/xen/vmsid1.cfg -c
Using config file "/etc/xen/vmsid1.cfg".
Started domain vmsid1
Linux version 2.6.16-2-xen-686 (Debian 2.6.16-17) (waldi@debian.org) (gcc
version 4.0.4 20060630 (prerelease) (Debian 4.0.3-4)) #1 SMP Sun Jul 16
01:42:25 UTC 2006
BIOS-provided physical RAM map:
 Xen: 0000000000000000 - 0000000008000000 (usable)
0MB HIGHMEM available.
136MB LOWMEM available.
ACPI in unprivileged domain disabled
IRQ lockup detection disabled
Built 1 zonelists
Kernel command line:  root=/dev/sda1 ro
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 1024 (order: 10, 16384 bytes)
Xen reported: 1596.512 MHz processor.
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 16384 (order: 4, 6556 bytes)
Software IO TLB disabled
vmalloc area: b9000000-fb7fe000, maxmem 43ffe000
Memory: 121696k/139264k available (1507k kernel code, 9232k reserved, 527k
data, 144k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
-------------------------------------------------------------------------------

debian37:~# dmesg
printk: 133 messages suppressed.
Timer ISR/1: Time went backwards: delta=-15995812 delta_cpu=12004188
shadow=918553 off=2713088566475 processed=2713104000000
cpu_processed=2713076000000
 0: 2713100000000
 1: 2713076000000
------------------------------------------------------------------------------ 

debian37:~# xm dmesg
 Xen version 3.0.2-3 (Debian 3.0.2+hg9697-1) (ultrotter@debian.org) (gcc
version 4.0.4 20060507 (prerelease) (Debian 4.0.3-3)) Wed May 31 10:05:36 UTC
2006
(XEN) Command line: /boot/xen-3.0-i386.gz
(XEN) Physical RAM map:
(XEN)  0000000000000000 - 000000000009f800 (usable)
(XEN)  000000000009f800 - 00000000000a0000 (reserved)
(XEN)  00000000000dc000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 000000007f690000 (usable)
(XEN)  000000007f690000 - 000000007f698000 (ACPI data)
(XEN)  000000007f698000 - 000000007f700000 (ACPI NVS)
(XEN)  000000007f700000 - 0000000080000000 (reserved)
(XEN)  00000000e0000000 - 00000000f0000000 (reserved)
(XEN)  00000000fec00000 - 00000000fec10000 (reserved)
(XEN)  00000000fed14000 - 00000000fed1a000 (reserved)
(XEN)  00000000fed1c000 - 00000000fed90000 (reserved)
(XEN)  00000000fee00000 - 00000000fee01000 (reserved)
(XEN)  00000000ff000000 - 0000000100000000 (reserved)
(XEN) System RAM: 2038MB (2087100kB)
(XEN) Xen heap: 10MB (10528kB)
(XEN) Using scheduler: Simple EDF Scheduler (sedf)
(XEN) PAE disabled.
(XEN) found SMP MP-table at 000f76d0
(XEN) DMI present.
(XEN) Using APIC driver default
(XEN) ACPI: RSDP (v000 PTLTD                                 ) @ 0x000f7590
(XEN) ACPI: RSDT (v001 FUJ___ DW1_____ 0x20060608  LTP 0x00000000) @ 0x7f691f1d
(XEN) ACPI: FADT (v001 FUJ___ DW1_____ 0x20060608 LOHR 0x0000005a) @ 0x7f697e20
(XEN) ACPI: MADT (v001 FUJ___ DW1_____ 0x20060608 LOHR 0x0000005a) @ 0x7f697e94
(XEN) ACPI: BOOT (v001 FUJ___ DW1_____ 0x20060608  LTP 0x00000001) @ 0x7f697fd8
(XEN) ACPI: MCFG (v001 FUJ___ DW1_____ 0x20060608 LOHR 0x0000005a) @ 0x7f697f34
(XEN) ACPI: MADT (v001 FUJ___ DW1_____ 0x20060608  LTP 0x00000000) @ 0x7f697f70
(XEN) ACPI: SSDT (v001  PmRef    CpuPm 0x00003000 INTL 0x20050228) @ 0x7f691f59
(XEN) ACPI: DSDT (v001 FUJ___ DW1_____ 0x20060608 MSFT 0x0100000e) @ 0x00000000
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: 2 duplicate APIC table ignored.
(XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
(XEN) Processor #0 6:14 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
(XEN) Processor #1 6:14 APIC version 20
(XEN) ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
(XEN) ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 1 already used, trying 2
(XEN) IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) Initializing CPU#0
(XEN) Detected 1596.500 MHz processor.
(XEN) CPU: L1 I cache: 32K, L1 D cache: 32K
(XEN) CPU: L2 cache: 2048K
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 0
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#0.
(XEN) CPU0: Intel Genuine Intel(R) CPU           T2050  @ 1.60GHz stepping 08
(XEN) Booting processor 1/1 eip 90000
(XEN) Initializing CPU#1
(XEN) CPU: L1 I cache: 32K, L1 D cache: 32K
(XEN) CPU: L2 cache: 2048K
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 1
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#1.
(XEN) CPU1: Intel Genuine Intel(R) CPU           T2050  @ 1.60GHz stepping 08
(XEN) Total of 2 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using new ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) checking TSC synchronization across 2 CPUs: passed.
(XEN) Platform timer is 1.193MHz PIT
(XEN) Brought up 2 CPUs
(XEN) Machine check exception polling timer started.
(XEN) Using IPI Shortcut mode
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Domain 0 kernel supports features = { 0000001f }.
(XEN) Domain 0 kernel requires features = { 00000000 }.
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   7e000000->7f000000 (478453 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: b0100000->b0368dd4
(XEN)  Init. ramdisk: b0369000->b082e000
(XEN)  Phys-Mach map: b082e000->b0a053d4
(XEN)  Start info:    b0a06000->b0a07000
(XEN)  Page tables:   b0a07000->b0a0b000
(XEN)  Boot stack:    b0a0b000->b0a0c000
(XEN)  TOTAL:         b0000000->b0c00000
(XEN)  ENTRY ADDRESS: b0100000
(XEN) Dom0 has maximum 2 VCPUs
(XEN) Initrd len 0x4c5000, start at 0xb0369000
(XEN) Scrubbing Free RAM: .....................done.
(XEN) Xen trace buffers: disabled
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to
Xen).

-----------------------------------------------------------------------------
debian37:~# xm info
host                   : debian37
release                : 2.6.16-2-xen-686
version                : #1 SMP Sun Jul 16 01:42:25 UTC 2006
machine                : i686
nr_cpus                : 2
nr_nodes               : 1
sockets_per_node       : 1
cores_per_socket       : 2
threads_per_core       : 1
cpu_mhz                : 1596
hw_caps                : bfe9fbff:00100000:00000000:00000140:0000c189
total_memory           : 2039
free_memory            : 108
xen_major              : 3
xen_minor              : 0
xen_extra              : .2-3
xen_caps               : xen-3.0-x86_32
platform_params        : virt_start=0xfc000000
xen_changeset          : Tue May 30 18:14:05 2006 +0100
cc_compiler            : gcc version 4.0.4 20060507 (prerelease) (Debian
4.0.3-3)
cc_compile_by          : ultrotter@debia
cc_compile_domain      : [unknown]
cc_compile_date        : Wed May 31 10:05:36 UTC 2006
------------------------------------------------------------------------------

If I shall provide more informations please let me know.
Comment 24 John Little 2006-09-19 10:38:20 CDT
Hi

I've recently installed xen 3.0.2 from Gentoo portage with the same problem on
HP xw9300 with dual AMD Opteron/dual core.  As of now xend won't even start I
believe because of this.

From /var/log/xend.log:

Xend changeset: Sat Apr  8 12:14:27 2006 +0100 9598:1bce05ff1e52.
Comment 25 Andre Pang 2006-10-13 07:05:02 CDT
I'm also seeing the same problem on a 4-processor Xeon, also using a PIT
platform timer.  The machine was running perfectly for several months, and then
we saw a huge number of timer messages.  The network interfaces of the xenUs
then go offline, occasionally followed by a hard crash of the xen0.  I'm not
sure if the network/crash problems are related to the timer problems, though.

Currently running Fedora Core 5 with kernel 2.6.17-1.2187_FC5xen0 and a
xen.i386 package of 3.0.2-3.FC5.
Comment 26 Andrew Hollingsworth 2006-10-17 14:30:30 CDT
Same issue 2x dual core Intel Xeons. FC5 built from SRPMS changed only with :

CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
CONFIG_IP_PNP_BOOTP=y
CONFIG_IP_PNP_RARP=y

xm info
host                   : xxxxx.eyespot.com
release                : 2.6.17-1.2187_FC5xen0
version                : #1 SMP Tue Oct 10 16:28:30 PDT 2006
machine                : x86_64
nr_cpus                : 4
nr_nodes               : 1
sockets_per_node       : 2
cores_per_socket       : 2
threads_per_core       : 1
cpu_mhz                : 2000
hw_caps                :
00000000:00000000:bfebfbff:20000800:00000000:00000140:0004e33d
total_memory           : 4095
free_memory            : 1
xen_major              : 3
xen_minor              : 0
xen_extra              : -unstable
xen_caps               : xen-3.0-x86_64
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable 
cc_compiler            : gcc version 4.1.1 20060525 (Red Hat 4.1.1-1)
cc_compile_by          : root
cc_compile_domain      : [unknown]
cc_compile_date        : Tue Oct 10 15:42:08 PDT 2006

dmesg extract from dom0
Timer ISR/2: Time went backwards: delta=-29419945 delta_cpu=10580055
shadow=77978342665733 off=19914551 processed=77978392000000
cpu_processed=77978352000000
 0: 77978392000000
 1: 77978352000000
 2: 77978352000000
 3: 77978344000000

xm dmesg | grep -i timer
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) Platform timer is 1.193MHz PIT
(XEN) Machine check exception polling timer started.

xm dmesg xdev01 | grep -i timer
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) Platform timer is 1.193MHz PIT
(XEN) Machine check exception polling timer started.
Comment 27 miguel cruz 2007-05-08 04:22:42 CDT
Also seeing this using Xen 2.0.7 with AMD Opteron 1212 (dual core), NetBSD 3.1
Dom0.

All Linux DomU machines report hundreds (in some cases thousands) of these
errors per hour.

I cannot provide the output of 'xm dmesg' because that just returns "Error:
Internal Server Error".
Comment 28 Ian Marlier 2007-10-29 16:10:32 CDT
Additionally, this bug is similar to bug 195:
http://bugzilla.xensource.com0

However, that bug was reported against the 2.0 codebase, and I cannot tell
whether it is actually the same issue.
Comment 29 Charlie Wyse 2007-11-26 10:54:46 CST
I am experiencing the same problem on an HP xw9300 using RHEL5.1 with Xen
3.3.3-41 and the 2.6.18-55.el5xen kernel.  
Comment 30 Charlie Wyse 2007-11-27 12:30:47 CST
Just wanted to report that on a HP wx9400 I do not see this problem.  I used
the exact same repos and kickstart file as the wx9300.  Kinda makes me think
hardware related?

wx9300:/proc/cpuinfo 
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 33
model name      : Dual Core AMD Opteron(tm) Processor 275
stepping        : 2
cpu MHz         : 2194.162
cache size      : 1024 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush
mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni
cmp_legacy
bogomips        : 5486.95
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

wx9400:/proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 2218
stepping        : 2
cpu MHz         : 2600.014
cache size      : 1024 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush
mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni cx16
lahf_lm cmp_legacy svm cr8_legacy
bogomips        : 6503.29
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
Comment 31 Charlie Wyse 2007-11-29 16:49:14 CST
Just a note, I managed to resolve this issue on the xw9300 by updating the bios
to v2.09 from v2.05.  I did that after finding this in the RHEL5 release notes:

Installing the Virtualization feature may cause a time went backwards warning
on HP systems with model numbers xw9300 and xw9400.

To work around this issue for xw9400 machines, configure the BIOS settings to
enable the HPET timer. Note that this option is not available on xw9300
machines.

HP will notify xw9300 and xw9400 users when a new BIOS image is available.
Comment 32 Christian Tramnitz 2008-01-07 12:58:48 CST
Same problem here using a Tyan S2877 board (HP's xw9300 is also Tyan based!)
with 2x Opteron 265 on Xen 3.1.4, Gentoo (64bit) dom0.
Comment 33 Prarit Bhargava 2008-03-03 06:27:02 CST
This is what is happening and I am currently working on a solution that will
use epochs for the shadow data structures.

The current Xen timing mechanism uses a common data area (the timer "shadow"
structure) that is written by HV and is read by dom0.

The HV launches four timers that run independent of dom0 (one percpu) that are
responsible for updating the common data area.

Assume (as there is in the case of amd-tyan-08) that there are four cpus {0, 1,
2, 3}.

A timer interrupt happens on cpu0.  The HV checks to see what timers have
expired and determines that the shadow update timer has expired.  The data is
updated.  Additionally, the HV also "passes" the interrupt down to dom0.

dom0 updates the global processed_system_time via a calculation that depends on
the updated shadow data.

A timer interrupt happens on cpu1.  The HV checks to see what timers have
expired, and in this cpu's case the shadow update timer has NOT expired, but
the
HV determines that the interrupt is to be passed down to dom0.

dom0 updates the global processed_system_time via a calculation that depends on
cpu1's *OLD* shadow data.

This causes a miscalculation of time on cpu1.
(Please note in the console output below that I modified the way the debug data
is output...)

(XEN) PRE0: tsc=3039265399 stime=1000034361 master=1000034361
(XEN) CUR0: tsc=5787569479 stime=2059723792 master=1565395832 -> -494327960
(XEN) PRE1: tsc=3067514902 stime=1010927101 master=1010927101
(XEN) CUR1: tsc=5787569035 stime=2059723804 master=1565404213 -> -494319591
(XEN) ---0: ab5080ff 8cf762d1 -2
(XEN) ---1: ab50bfe0 8bb0695b -2
(XEN) PRE3: tsc=3067585753 stime=1010954758 master=1010954758
(XEN) CUR3: tsc=5787697730 stime=2059775568 master=1565446118 -> -494329450
(XEN) ---3: ab5075cd 8bb05584 -2
(XEN) PRE2: tsc=3067551545 stime=1010941349 master=1010941349
(XEN) CUR2: tsc=5787633515 stime=2059749985 master=1565420975 -> -494329010
(XEN) ---2: ab50791b 8baffb7a -2

>>>> XEN updates shadow data structs for cpu 0, 1, 2, 3

[    2.264103] PCI: Setting latency timer of device 0000:00:08.0 to 64
[    2.264140] PCI: Setting latency timer of device 0000:00:09.0 to 64
[    2.264177] PCI: Setting latency timer of device 0000:00:0a.0 to 64
[    2.264213] PCI: Setting latency timer of device 0000:00:0b.0 to 64
[    2.264251] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[    2.266344] Real Time Clock Driver v1.12ac
[    2.266371] Non-volatile memory driver v1.2
[    2.266372] Linux agpgart interface v0.101 (c) Dave Jones
[    2.266677] RAMDISK driver initialized: 16 RAM disks of 16384K size 4096
blocksize
[    2.266730] Xen virtual console successfully installed as ttyS0
[    2.266759] Event-channel device installed.
[    2.266762] Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
[    2.266763] ide: Assuming 33MHz system bus speed for PIO modes; override
with
idebus=xx
[    2.266774] SvrWks HT1000: IDE controller at PCI slot 0000:00:02.1
[    2.266787] SvrWks HT1000: chipset revision 0
[    2.266787] SvrWks HT1000: not 100% native mode: will probe irqs later
[    2.266795]     ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:pio,
hdb:DMA
[    2.266807] Probing IDE interface ide0...
(XEN) PRE0: tsc=5787569479 stime=2059723792 master=1565395832
(XEN) CUR0: tsc=13051614511 stime=3059710050 master=3871936336 -> 812226286
(XEN) ---0: 00000000 a29314ad -1

>>>> Xen updates shadow data struct for cpu 0, and evaluates
processed_system_time based on new shadow data for cpu 0.

CPU 3, however, has old shadow data and attempts to calculate time.

[    3.055958] Timer ISR/3: Time went backwards
[    3.055960]      now = 13090293974 (in cycles)
[    3.055960]      overflow (0xffffffff) = 585897312
[    3.055961]      overflow (0xffffffff) = 0.585897312
[    3.055962] delta =     0.-812041871
[    3.055963] delta_cpu = 0.59923768
[    3.055964] offset = 0.996189412
[    3.055965] per_cpu shadow structure
[    3.055966] shadow->tsc_timestamp = 5787697730 (in cycles)
[    3.055966] shadow->system_timestamp = 2.59775568
[    3.055968] shadow->tsc_to_nsec_mul = 2343589252
[    3.055968] shadow->tsc_to_usec_mul = 2343589
[    3.055969] shadow->tsc_shift = -2
[    3.055970] shadow->version = 4
[    3.055970] global processed_system_time = 3.868000000
[    3.055971] percpu processed_system_time = 2.996034361
[    3.055973] percpu last_tsc_stamp = 12680092061
[    3.055973] stolen = -186696971
[    3.055974] blocked = 236896752
[    3.055975] other cpus percpu processed_system_time
[    3.055976]  0: 3868000000
[    3.055976]  1: 2536034361
[    3.055977]  2: 2548034361
[    3.055978]  3: 2996034361

>>>> Xen updates cpu 1, 2, 3's shadow data 

(XEN) PRE1: tsc=5787569035 stime=2059723804 master=1565404213
(XEN) CUR1: tsc=13118030068 stime=3059709703 master=3897545212 -> 837835509
(XEN) ---1: 00000000 a2e3bfbc -1
(XEN) PRE3: tsc=5787697730 stime=2059775568 master=1565446118
(XEN) CUR3: tsc=13118076672 stime=3059748102 master=3897562812 -> 837814710
(XEN) ---3: 00000000 a2e3c809 -1
(XEN) PRE2: tsc=5787633515 stime=2059749985 master=1565420975
(XEN) CUR2: tsc=13118197450 stime=3059737919 master=3897609746 -> 837871827
(XEN) ---2: 00000000 a2e4048e -1

>>>> No further errors are seen.
Comment 34 Dominic Battré 2008-05-07 01:27:50 CDT
Is there any workaround for this problem until it is properly fixed? Our DomUs
crash on a daily basis. Thanks for your work!
Comment 35 Stephen Spector 2008-05-07 06:36:14 CDT
Group following this bug:

As this bug was originally created a few years back, I would recommend that you
try and see if it still is occurring with a more current version of Xen. If so,
go ahead and create a new bug in Bugzilla with all the details. I am tracking
all new bugs being reported and will ensure that xen-devel is made aware of the
problem. This report is using an outdated version of Xen so it will likely not
get looked at. Thanks.

Stephen Spector
Xen.org Community Manager
Comment 36 Prarit Bhargava 2008-05-07 06:41:30 CDT
This is _absolutely_ happening on the latest version of Xen.  No changes have
been made to the method of timekeeping in a while.

(I haven't had anytime to look into this further.  Xen timekeeping is horribly
busted IMO -- it's going to take a significant effort to get it right.  Having
said that, there are band-aids we could apply to make the window that this
happens in smaller ...)

P.
Comment 37 Dominic Battré 2008-06-18 08:51:40 CDT
Stephen, can you tell us whether anything happened regarding this issue? I did
not find any thread on xen-devel addressing this. Thanks a lot!
Comment 38 Stephen Spector 2008-06-19 10:50:50 CDT
Updated comment on this bug from Keir:

If this bug is still occurring with up-to-date Xen (basically xen-unstable or a
recent 3.2 release) then please post some specific details about the setup,
workload, and precise symptoms.  

I'll emphasize that we do *not* see this regularly. In fact I don't think we've
seen it in any of our regular tests on a wide range of hardware in a very long
time.  
Comment 39 ps 2008-07-03 20:28:18 CDT
(In reply to comment #38)
> Updated comment on this bug from Keir:
> 
> If this bug is still occurring with up-to-date Xen (basically xen-unstable or a
> recent 3.2 release) then please post some specific details about the setup,
> workload, and precise symptoms.  
> 
> I'll emphasize that we do *not* see this regularly. In fact I don't think we've
> seen it in any of our regular tests on a wide range of hardware in a very long
> time.  
> 

It seems like I have got hit by this bug and I can reproduce it with a
suspend/resume of a domU.
I'm running stock OpenSuse 10.3 so 

# xm dmesg|grep -E "Xen version|timer"
 Xen version 3.1.0_15042-51.3 (abuild@suse.de) (gcc version 4.2.1 (SUSE Linux))
Thu Dec 20 19:57:34 UTC 2007
(XEN) Platform timer is 25.000MHz HPET

The systems in question are two dualcore athlon servers so SMP is involved in
the equation.

I had this issue that the domain locked up before but after changing
clocksource to jiffies at least that problem was resolved.
Now I started to play with xen migration and the showblocker is that after
suspend/resume the clock in domU starts to go all over the place including
backwards. Only way to recover is reboot domU, which defeats the whole purpose
of migration.
Comment 40 The Doctor What 2008-08-08 13:14:25 CDT
This is still a problem as of 2008/8/8:
 * Ubuntu 8.4.1 Hardy Heron
 * Xen 3.2.0
 * Dom0 & DomU kernel: 2.6.24-19-xen #1 SMP Sat Jul 12 00:15:59 UTC 2008 x86_64
GNU/Linux

The workaround for this is to change your setup not to save/restore, but to
instead reboot.
In Ubuntu, this in done by altering dom0:/etc/defaults/xendomains and changing
the value of XENDOMAINS_SAVE to the empty string.

Ciao!
Comment 41 ps 2008-08-08 15:40:40 CDT
(In reply to comment #40)
> This is still a problem as of 2008/8/8:
>  * Ubuntu 8.4.1 Hardy Heron
>  * Xen 3.2.0
>  * Dom0 & DomU kernel: 2.6.24-19-xen #1 SMP Sat Jul 12 00:15:59 UTC 2008 x86_64
> GNU/Linux
> 
> The workaround for this is to change your setup not to save/restore, but to
> instead reboot.
> In Ubuntu, this in done by altering dom0:/etc/defaults/xendomains and changing
> the value of XENDOMAINS_SAVE to the empty string.
> 
> Ciao!
> 

That I can accept when rebooting the host but when I migrate the guest between
hosts it defeat 95% of the migration if I have to restart the guest.
Comment 42 The Doctor What 2008-08-08 18:15:16 CDT
(In reply to comment #41)
> That I can accept when rebooting the host but when I migrate the guest between
> hosts it defeat 95% of the migration if I have to restart the guest.

Yeah, that's true.

I just included a work around because it wasn't mentioned (directly, anyway). 
I figured it out because somebody mentioned that the domU was being suspended
and I had thought that the domU was restarted on reboot of dom0.

I've only had a Xen domain for a few days, so I'm still new. ;-)

Ciao!
Comment 43 Sebastian Muniz 2008-08-28 21:21:12 CDT
Still an issue when rebooting the machine on an Athlon Dual Core.
AMD Athlon(tm) 64 X2 Dual Core Processor 5000+
zillions of log lines on both virtual machines.
root@orion64:/home/seba# xm dmesg
 __  __            _____  ____    _             _
 \ \/ /___ _ __   |___ / |___ \  / |   _ __ ___/ |   _ __  _ __ ___
  \  // _ \ '_ \    |_ \   __) | | |__| '__/ __| |__| '_ \| '__/ _ \
  /  \  __/ | | |  ___) | / __/ _| |__| | | (__| |__| |_) | | |  __/
 /_/\_\___|_| |_| |____(_)_____(_)_|  |_|  \___|_|  | .__/|_|  \___|
                                                    |_|
(XEN) Xen version 3.2.1-rc1-pre (buildd@buildd) (gcc version 4.2.3 (Ubuntu
4.2.3-2ubuntu7)) Fri Apr 11 01:13:51 UTC 2008

Aug 29 03:57:01 mail kernel: [    0.000322] clocksource/0: Time went backwards:
ret=8c54af6f5 delta=-910042363166
6 shadow=89dab1e9d offset=279fd933
Aug 29 03:57:01 mail kernel: [    0.000339] clocksource/0: Time went backwards:
ret=8c54b3b96 delta=-910042361409
7 shadow=89dab1e9d offset=27a01dcc
Aug 29 03:57:01 mail kernel: [    0.000351] clocksource/0: Time went backwards:
ret=8c54b6876 delta=-910042360260
9 shadow=89dab1e9d offset=27a04aae
Aug 29 03:57:01 mail kernel: [    0.001238] Setting mem allocation to 131072
kiB
Aug 29 03:57:01 mail kernel: [    4.301077] printk: 2363 messages suppressed.
Aug 29 03:57:01 mail kernel: [    4.301088] clocksource/0: Time went backwards:
ret=9c7b194cd delta=-909608838485
8 shadow=9c7652874 offset=4c9ff2

Trying the workarround...
Comment 44 Jerry Amundson 2010-05-04 12:41:08 CDT
I've seen this for forever on Dell PE2600 boxen running CentOS 5.x, and note
the hpet_force workaround seems completely ignored. Also, in my current
example, there are no domU's involved whatsoever - just the running dom0 is
enough...

[root@xenC ~]# xm dmesg|grep -E "Xen version|timer"
 Xen version 3.1.2-164.15.1.el5 (mockbuild@centos.org) (gcc version 4.1.2
20080704 (Red Hat 4.1.2-46)) Wed Mar 17 11:29:22 EDT 2010
(XEN) Platform timer overflows in 2 jiffies.
(XEN) Platform timer is 1.193MHz PIT

[root@xenC ~]# xm dmesg
 __  __            _____  _   ____     _  __   _  _    _ ____   _       _ ____  
 \ \/ /___ _ __   |___ / / | |___ \   / |/ /_ | || |  / | ___| / |  ___| | ___| 
  \  // _ \ \047_ \    |_ \ | |   __) |__| | \047_ \| || |_ | |___ \ | | / _ \
|___ \ 
  /  \  __/ | | |  ___) || |_ / __/|__| | (_) |__   _|| |___) || ||  __/ |___)
|
 /_/\_\___|_| |_| |____(_)_(_)_____|  |_|\___/   |_|(_)_|____(_)_(_)___|_|____/ 

 http://www.cl.cam.ac.uk/netos/xen
 University of Cambridge Computer Laboratory

 Xen version 3.1.2-164.15.1.el5 (mockbuild@centos.org) (gcc version 4.1.2
20080704 (Red Hat 4.1.2-46)) Wed Mar 17 11:29:22 EDT 2010
 Latest ChangeSet: unavailable

(XEN) Command line: dom0_mem=288M hpet_force=1
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: V2; EDID transfer time: 2 seconds
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 00000000000a0000 (usable)
(XEN)  0000000000100000 - 00000000f7fd0000 (usable)
(XEN)  00000000f7fd0000 - 00000000f7fdfc00 (ACPI data)
(XEN)  00000000f7fdfc00 - 00000000f7fff000 (reserved)
(XEN)  00000000fec00000 - 00000000fec90000 (reserved)
(XEN)  00000000fee00000 - 00000000fee10000 (reserved)
(XEN)  00000000ffb00000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000108000000 (usable)
(XEN) System RAM: 4095MB (4193728kB)
(XEN) Xen heap: 9MB (9980kB)
(XEN) Domain heap initialised: DMA width 32 bits
(XEN) PAE enabled, limit: 16 GB
(XEN) Processor #0 15:2 APIC version 20
(XEN) Processor #6 15:2 APIC version 20
(XEN) Processor #1 15:2 APIC version 20
(XEN) Processor #7 15:2 APIC version 20
(XEN) IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
(XEN) IOAPIC[1]: apic_id 9, version 32, address 0xfec80000, GSI 24-47
(XEN) IOAPIC[2]: apic_id 10, version 32, address 0xfec81000, GSI 72-95
(XEN) IOAPIC[3]: apic_id 11, version 32, address 0xfec82000, GSI 120-143
(XEN) IOAPIC[4]: apic_id 12, version 32, address 0xfec82800, GSI 144-167
(XEN) Enabling APIC mode:  Flat.  Using 5 I/O APICs
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2791.826 MHz processor.
(XEN) I/O virtualisation disabled
(XEN) CPU0: Intel(R) Xeon(TM) CPU 2.80GHz stepping 07
(XEN) Booting processor 1/6 eip 90000
(XEN) CPU1: Intel(R) Xeon(TM) CPU 2.80GHz stepping 07
(XEN) Booting processor 2/1 eip 90000
(XEN) CPU2: Intel(R) Xeon(TM) CPU 2.80GHz stepping 07
(XEN) Booting processor 3/7 eip 90000
(XEN) CPU3: Intel(R) Xeon(TM) CPU 2.80GHz stepping 07
(XEN) Total of 4 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using new ACK method
(XEN) Platform timer overflows in 2 jiffies.
(XEN) Platform timer is 1.193MHz PIT
(XEN) Brought up 4 CPUs
(XEN) *** LOADING DOMAIN 0 ***
(XEN) elf_parse_binary: phdr: paddr=0xc0400000 memsz=0x282708
(XEN) elf_parse_binary: phdr: paddr=0xc0683000 memsz=0x163000
(XEN) elf_parse_binary: memory: 0xc0400000 -> 0xc07e6000
(XEN) elf_xen_parse_note: GUEST_OS = "linux"
(XEN) elf_xen_parse_note: GUEST_VERSION = "2.6"
(XEN) elf_xen_parse_note: XEN_VERSION = "xen-3.0"
(XEN) elf_xen_parse_note: VIRT_BASE = 0xc0000000
(XEN) elf_xen_parse_note: PADDR_OFFSET = 0xc0000000
(XEN) elf_xen_parse_note: ENTRY = 0xc0400000
(XEN) elf_xen_parse_note: HYPERCALL_PAGE = 0xc0401000
(XEN) elf_xen_parse_note: FEATURES =
"writable_page_tables|writable_descriptor_tables|auto_translated_physmap|pae_pgdir_above_4gb|supervisor_mode_kernel"
(XEN) elf_xen_parse_note: PAE_MODE = "yes"
(XEN) elf_xen_parse_note: LOADER = "generic"
(XEN) elf_xen_addr_calc_check: addresses:
(XEN)     virt_base        = 0xc0000000
(XEN)     elf_paddr_offset = 0xc0000000
(XEN)     virt_offset      = 0x0
(XEN)     virt_kstart      = 0xc0400000
(XEN)     virt_kend        = 0xc07e6000
(XEN)     virt_entry       = 0xc0400000
(XEN)  Xen  kernel: 32-bit, PAE, lsb
(XEN)  Dom0 kernel: 32-bit, PAE, lsb, paddr 0xc0400000 -> 0xc07e6000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   000000003c000000->000000003e000000 (65536 pages to be
allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: c0400000->c07e6000
(XEN)  Init. ramdisk: c07e6000->c0f2c000
(XEN)  Phys-Mach map: c0f2c000->c0f74000
(XEN)  Start info:    c0f74000->c0f7446c
(XEN)  Page tables:   c0f75000->c0f84000
(XEN)  Boot stack:    c0f84000->c0f85000
(XEN)  TOTAL:         c0000000->c1400000
(XEN)  ENTRY ADDRESS: c0400000
(XEN) Dom0 has maximum 4 VCPUs
(XEN) elf_load_binary: phdr 0 at 0xc0400000 -> 0xc0682708
(XEN) elf_load_binary: phdr 1 at 0xc0683000 -> 0xc072dcc4
(XEN) Initrd len 0x746000, start at 0xc07e6000
(XEN) Scrubbing Free RAM: ......................................done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type \047CTRL-a\047 three times to switch input
to Xen).
(XEN) Freed 108kB init memory.