xen-tscmode(7)			      Xen			xen-tscmode(7)

NAME
       xen-tscmode - Xen TSC (time stamp counter) and timekeeping discussion

OVERVIEW
       As of Xen 4.0, a new config option called tsc_mode may be specified for
       each domain.  The default for tsc_mode handles the vast majority of
       hardware and software environments.  This document is intended for Xen
       users and administrators who may need to select a non-default
       tsc_mode.

       Proper selection of tsc_mode depends on an understanding not only of
       the guest operating system (OS), but also of the set of applications
       that will ever run on this guest OS.  This is because tsc_mode applies
       equally to both the OS and ALL apps that are running on this domain,
       now or in the future.

       Key questions to	be answered for	the OS and/or each application are:

       o   Does	the OS/app use the rdtsc instruction at	all?  (We will explain
	   below how to	determine this.)

       o   At what frequency is	the rdtsc instruction executed by either the
	   OS or any running apps?  If the sum exceeds about 10,000 rdtsc
	   instructions	per second per processor, we call this a "high-TSC-
	   frequency" OS/app/environment.  (This is relatively rare, and
	   developers of OS's and apps that are	high-TSC-frequency are usually
	   aware of it.)

       o   If the OS/app does use rdtsc, will it behave	incorrectly if "time
	   goes	backwards" or if the frequency of the TSC suddenly changes?
	   If so, we call this a "TSC-sensitive" app or	OS; otherwise it is
	   "TSC-resilient".

       This last is the	US$64,000 question as it may be	very difficult (or,
       for legacy apps,	even impossible) to predict all	possible failure
       cases.  As a result, unless proven otherwise, any app that uses rdtsc
       must be assumed to be TSC-sensitive and,	as we will see,	this is	the
       default starting	in Xen 4.0.

       Xen's new tsc_mode parameter determines the circumstances under which
       the family of rdtsc instructions is executed "natively" vs emulated.
       Roughly speaking, native means rdtsc is fast but TSC-sensitive apps
       may, under unpredictable circumstances, run incorrectly; emulated means
       there is some performance degradation (unobservable in most cases), but
       TSC-sensitive apps will always run correctly.  Prior to Xen 4.0, all
       rdtsc instructions were native: "fast but potentially incorrect."
       Starting at Xen 4.0, the default is that all rdtsc instructions are
       "correct but potentially slow".  The tsc_mode parameter in 4.0 provides
       an intelligent default but allows system administrators to adjust how
       rdtsc instructions are executed for different domains.

       The non-default choices for tsc_mode are:

       o   tsc_mode=1 (always emulate).

	   All rdtsc instructions are emulated;	this is	the best choice	when
	   TSC-sensitive apps are running and it is necessary to understand
	   worst-case performance degradation for a specific hardware
	   environment.

       o   tsc_mode=2 (never emulate).

	   This	is the same as prior to	Xen 4.0	and is the best	choice if it
	   is certain that all apps running in this VM are TSC-resilient and
	   highest performance is required.

       o   tsc_mode=3 (PVRDTSCP).

	   This	mode has been removed.

       If tsc_mode is left unspecified (or set to tsc_mode=0), a hybrid
       algorithm is utilized to	ensure correctness while providing the best
       performance possible given:

       o   the requirement of correctness,

       o   the underlying hardware, and

       o   whether or not the VM has been saved/restored/migrated

       The rest of this document explains these factors in more detail.

DETERMINING RDTSC FREQUENCY
       To determine the	frequency of rdtsc instructions	that are emulated, an
       "xl" command can	be used	by a privileged	user of	domain0.  The command:

           # xl debug-key s; xl dmesg | tail

       provides	information about TSC usage in each domain where TSC emulation
       is currently enabled.
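
       The threshold of about 10,000 rdtsc instructions per second per
       processor given in the OVERVIEW can be applied to two successive
       samples of a domain's emulated-rdtsc count.  The following is a minimal
       sketch in Python.  The function names are hypothetical, and extracting
       the counts from the command output is deliberately left out, because
       this page does not describe that output format.

```python
# Threshold quoted in the OVERVIEW: about 10,000 rdtsc per second per processor.
HIGH_TSC_THRESHOLD = 10_000


def rdtsc_rate_per_cpu(count_before, count_after, interval_s, n_vcpus):
    """Average emulated rdtsc instructions per second per virtual processor.

    count_before/count_after are two samples of a cumulative counter
    (hypothetical inputs; obtaining them is outside this sketch).
    """
    if interval_s <= 0 or n_vcpus <= 0:
        raise ValueError("interval_s and n_vcpus must be positive")
    return (count_after - count_before) / interval_s / n_vcpus


def is_high_tsc_frequency(rate_per_cpu):
    """True if the rate exceeds the document's 'high-TSC-frequency' threshold."""
    return rate_per_cpu > HIGH_TSC_THRESHOLD


# Example: 600,000 emulated rdtsc in 10 seconds on a 2-vCPU domain.
rate = rdtsc_rate_per_cpu(0, 600_000, 10, 2)
print(rate, is_high_tsc_frequency(rate))  # 30000.0 True
```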

TSC HISTORY
       To understand tsc_mode completely, some background on TSC is required:

       The x86 "timestamp counter", or TSC, is a 64-bit	register on each
       processor that increases	monotonically.	Historically, TSC incremented
       every processor cycle, but on recent processors,	it increases at	a
       constant	rate even if the processor changes frequency (for example, to
       reduce processor	power usage).  TSC is known by x86 programmers as the
       fastest,	highest-precision measurement of the passage of	time so	it is
       often used as a foundation for performance monitoring.  And since it is
       guaranteed to be monotonically increasing and, at 64 bits, is
       guaranteed not to wrap around within 10 years, it is sometimes used as
       a random number or a unique sequence identifier, such as to stamp
       transactions so they can be replayed in a specific order.
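
       The wrap-around claim can be checked with simple arithmetic.  The
       sketch below assumes a counter that starts at zero and ticks at a
       fixed rate; the 3 GHz figure is only an illustrative assumption.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds, ignoring leap years


def years_until_wrap(tsc_hz, bits=64):
    """Years before a counter of the given width overflows at tsc_hz ticks/s."""
    return (2 ** bits) / tsc_hz / SECONDS_PER_YEAR


# At an assumed 3 GHz tick rate a 64-bit counter lasts about 195 years,
# comfortably more than the 10 years quoted above.
print(round(years_until_wrap(3e9)))
```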

       On most older SMP and early multi-core machines,	TSC was	not
       synchronized between processors.	 Thus if an application	were to	read
       the TSC on one processor, then was moved	by the OS to another
       processor, then read TSC	again, it might	appear that "time went
       backwards".  This loss of monotonicity resulted in many obscure
       application bugs	when TSC-sensitive apps	were ported from a
       uniprocessor to an SMP environment; as a	result,	many applications --
       especially in the Windows world -- removed their	dependency on TSC and
       replaced	their timestamp	needs with OS-specific functions, losing both
       performance and precision. On some more recent generations of multi-
       core machines, especially multi-socket multi-core machines, the TSC was
       synchronized but	if one processor were to enter certain low-power
       states, its TSC would stop, destroying the synchrony and	again causing
       obscure bugs.  This reinforced decisions	to avoid use of	TSC
       altogether.  On the most recent generations of multi-core machines,
       however, synchronization is provided across all processors in all power
       states, even on multi-socket machines, and the processors provide a
       flag that indicates that TSC is synchronized and "invariant".  Thus TSC
       is once again useful for applications, and even newer operating systems
       are using and depending upon TSC for critical timekeeping tasks when
       running on these recent machines.

       We will refer to	hardware that ensures TSC is both synchronized and
       invariant as "TSC-safe" and any hardware	on which TSC is	not (or	may
       not remain) synchronized	as "TSC-unsafe".

       As a result of TSC's sordid history, two	classes	of applications	use
       TSC: old	applications designed for single processors, and the most
       recent enterprise applications which require high-frequency high-
       precision timestamping.

       We will refer to apps that might break if running on a TSC-unsafe
       machine as "TSC-sensitive", and to apps that don't use TSC, or that use
       it in a way for which monotonicity and frequency invariance are
       unimportant, as "TSC-resilient".

       The emergence of	virtualization once again complicates the usage	of
       TSC.  When features such	as save/restore	or live	migration are
       employed, a guest OS and	all its	currently running applications may be
       invisibly transported to	an entirely different physical machine.	 While
       TSC may be "safe" on one	machine, it is essentially impossible to
       precisely synchronize TSC across	a data center or even a	pool of
       machines.  As a result, when run	in a virtualized environment, rare and
       obscure "time going backwards" problems might once again	occur for
       those TSC-sensitive applications.  Worse, if a guest OS moves from, for
       example,	a 3GHz machine to a 1.5GHz machine, attempts by	an OS/app to
       measure time intervals with TSC may without notice be incorrect by a
       factor of two.
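
       The factor-of-two error follows directly from dividing a tick count by
       a stale frequency.  In the sketch below, the guest is assumed to have
       calibrated the TSC at 3GHz before an (unnoticed) move to a 1.5GHz
       machine with native rdtsc; the function name is hypothetical.

```python
def measured_seconds(tsc_delta, assumed_hz):
    """Interval an app computes from a TSC delta and the frequency it believes."""
    return tsc_delta / assumed_hz


calibrated_hz = 3_000_000_000  # frequency the guest measured before migration
actual_hz = 1_500_000_000      # frequency of the machine it now runs on

true_elapsed_s = 1.0
ticks = int(actual_hz * true_elapsed_s)  # ticks that really accumulate

reported = measured_seconds(ticks, calibrated_hz)
print(reported)  # 0.5, i.e. half of the true one-second interval
```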

       The rdtsc (read timestamp counter) instruction is used to read the TSC
       register.  The rdtscp instruction is a variant of rdtsc on recent
       processors.  We refer to these together as the rdtsc family of
       instructions, or just "rdtsc".  Instructions in the rdtsc family are
       non-privileged, but privileged software may set a control register bit
       (CR4.TSD) to cause all rdtsc family instructions executed by
       less-privileged code to trap.  This trap can be detected by Xen, which
       can then transparently "emulate" the results of the rdtsc instruction
       and return control to the code following the rdtsc instruction.

       To provide a "safe" TSC,	i.e. to	ensure both TSC	monotonicity and a
       fixed rate, Xen provides	rdtsc emulation	whenever necessary or when
       explicitly specified by a per-VM	configuration option.  TSC emulation
       is relatively slow -- roughly 15-20 times slower	than the rdtsc
       instruction when	executed natively.  However, except when an OS or
       application uses	the rdtsc instruction at a high	frequency (e.g.	more
       than about 10,000 times per second per processor), this performance
       degradation is not noticeable (i.e. <0.3%).  And, TSC emulation is
       nearly always faster than OS-provided alternatives (e.g.	Linux's
       gettimeofday).  For environments where it is certain that all apps are
       TSC-resilient (i.e. "TSC-safeness" is not necessary) and highest
       performance is a requirement, TSC emulation may be entirely disabled
       (tsc_mode==2).
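
       The relationship between rdtsc rate and overhead is linear, which is
       why the 10,000-per-second threshold matters.  The sketch below is a
       back-of-the-envelope model only: the 300 ns extra cost per emulated
       rdtsc is a hypothetical value chosen because it reproduces the 0.3%
       figure at 10,000 instructions per second; it is not a measurement.

```python
def emulation_overhead_fraction(rdtsc_per_sec, extra_ns_per_rdtsc):
    """Fraction of one processor's time spent on the extra cost of emulation."""
    return rdtsc_per_sec * extra_ns_per_rdtsc * 1e-9


ASSUMED_EXTRA_NS = 300  # hypothetical extra cost per emulated rdtsc

# At the threshold rate the modelled overhead is about 0.3%.
print(f"{emulation_overhead_fraction(10_000, ASSUMED_EXTRA_NS):.2%}")

# A high-TSC-frequency workload at 1,000,000 rdtsc/s would lose about 30%.
print(f"{emulation_overhead_fraction(1_000_000, ASSUMED_EXTRA_NS):.2%}")
```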

       The default mode	(tsc_mode==0) checks TSC-safeness of the underlying
       hardware	on which the virtual machine is	launched.  If it is TSC-safe,
       rdtsc will execute at hardware speed; if	it is not, rdtsc will be
       emulated.  Once a virtual machine is save/restored or migrated,
       however,	there are two possibilities: TSC remains native	IF the source
       physical	machine	and target physical machine have the same TSC
       frequency (or, for HVM/PVH guests, if TSC scaling support is
       available); else	TSC is emulated.  Note that, though emulated, the
       "apparent" TSC frequency	will be	the TSC	frequency of the initial
       physical	machine, even after migration.
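
       The mode selection described so far can be summarised as a small
       decision function.  This is one reading of the text above, not Xen's
       implementation: the function and parameter names are hypothetical, and
       it assumes that in the default mode the current host must itself be
       TSC-safe for rdtsc to run natively after a migration.

```python
def rdtsc_is_emulated(tsc_mode, host_tsc_safe,
                      same_freq_as_origin=True,
                      tsc_scaling_available=False):
    """Return True if rdtsc would be emulated on the current host.

    same_freq_as_origin: the current host has the TSC frequency of the
        machine the domain was first launched on (trivially True at launch).
    tsc_scaling_available: HVM/PVH guest on a host with TSC scaling support.
    """
    if tsc_mode == 1:   # always emulate
        return True
    if tsc_mode == 2:   # never emulate
        return False
    if tsc_mode == 0:   # default: native only when it is known to be safe
        native = host_tsc_safe and (same_freq_as_origin
                                    or tsc_scaling_available)
        return not native
    raise ValueError("unsupported tsc_mode: %r" % (tsc_mode,))


# Launched on a TSC-safe host: native.
print(rdtsc_is_emulated(0, host_tsc_safe=True))                  # False
# Migrated to a host with a different TSC frequency, no scaling: emulated.
print(rdtsc_is_emulated(0, True, same_freq_as_origin=False))     # True
```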

       Finally, tsc_mode==1 always enables TSC emulation, regardless of the
       underlying physical hardware.  The "apparent" TSC frequency will be the
       TSC frequency of the initial physical machine, even after migration.
       This mode is useful to measure any performance degradation that might
       be encountered by a tsc_mode==0 domain after migration occurs or,
       historically, by a domain using the since-removed tsc_mode==3 when
       running on TSC-unsafe hardware.

       Note that while Xen ensures that	an emulated TSC	is "safe" across
       migration, it does not ensure that it continues to tick at the same
       rate during the actual migration.  As an	oversimplified example,	if TSC
       is ticking once per second in a guest, and the guest is saved when the
       TSC is 1000, then restored 30 seconds later, TSC	is only	guaranteed to
       be greater than or equal	to 1001, not precisely 1030.  This has some OS
       implications as will be seen in the next	section.

TSC INVARIANT BIT and NO_MIGRATE
       Related to TSC emulation, the "TSC Invariant" bit is architecturally
       defined in a cpuid bit on the most recent x86 processors.  If set, TSC
       invariance ensures that the TSC is "safe", that is it will increment at
       a constant rate regardless of power events, will	be synchronized	across
       all processors, and was properly	initialized to zero on all processors
       at boot-time by system hardware/BIOS.  As long as system	software never
       writes to TSC, TSC will be safe and continuously	incremented at a fixed
       rate and	thus can be used as a system "clocksource".
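
       As an illustration of testing the bit, the sketch below checks bit 8
       of the EDX value returned by cpuid leaf 0x80000007.  That bit position
       comes from the processor vendors' manuals rather than from this page,
       and the function name is hypothetical.  Obtaining the EDX value (and
       whether Xen has filtered it) is outside the sketch.

```python
INVARIANT_TSC_LEAF = 0x80000007   # cpuid leaf holding the bit (vendor manuals)
INVARIANT_TSC_EDX_BIT = 8         # bit position within EDX for that leaf


def has_invariant_tsc(edx):
    """True if the TSC Invariant bit is set in the given EDX value."""
    return bool((edx >> INVARIANT_TSC_EDX_BIT) & 1)


print(has_invariant_tsc(0x00000100))  # True: bit 8 set
print(has_invariant_tsc(0x00000000))  # False: bit 8 clear
```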

       This bit	is used	by some	OS's, and specifically by Linux	starting with
       version 2.6.30(?), to select TSC	as a system clocksource.  Once
       selected, TSC remains the Linux system clocksource unless manually
       overridden.  In a virtualized environment, since	it is not possible to
       synchronize TSC across all the machines in a pool or data center, a
       migration may "break" TSC as a usable clocksource; while	time will not
       go backwards, it	may not	track wallclock	time well enough to avoid
       certain time-sensitive consequences.  As	a result, Xen can only expose
       the TSC Invariant bit to	a guest	OS if it is certain that the domain
       will never migrate.  As of Xen 4.0, the "no_migrate=1" VM configuration
       option may be specified to disable migration.  If no_migrate is
       selected	and the	VM is running on a physical machine with "TSC
       Invariant", Linux 2.6.30+ will safely use TSC as	the system
       clocksource.  But, attempts to migrate or, once saved, restore this
       domain will fail.

       There is	another	cpuid-related complication: The	x86 cpuid instruction
       is non-privileged.  HVM domains are configured to always	trap this
       instruction to Xen, where Xen can "filter" the result.  In a PV OS, all
       cpuid instructions have been replaced by	a paravirtualized equivalent
       of the cpuid instruction	("pvcpuid") and	also trap to Xen.  But apps in
       a PV guest that use a cpuid instruction execute it directly, without a
       trap to Xen.  As	a result, an app may directly examine the physical TSC
       Invariant cpuid bit and make decisions based on that bit.

HARDWARE TSC SCALING
       Intel VMX TSC scaling and AMD SVM TSC ratio allow the guest TSC read by
       guest rdtsc/rdtscp to increase at a frequency different from the host
       TSC frequency.
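
       Conceptually, scaling multiplies host TSC ticks by the ratio of the
       guest frequency to the host frequency.  The sketch below models only
       that idea with integer arithmetic.  Real hardware applies a
       fixed-point ratio whose format differs between vendors, which is not
       modelled here, and the function name is hypothetical.

```python
def scaled_guest_ticks(host_tsc_delta, guest_khz, host_khz):
    """Guest-visible TSC ticks for a given number of host TSC ticks."""
    if host_khz <= 0:
        raise ValueError("host_khz must be positive")
    return host_tsc_delta * guest_khz // host_khz


# A guest first launched on a 3GHz host and now running on a 1.5GHz host:
# one second of host time (1.5e9 host ticks) still appears as 3e9 guest ticks.
print(scaled_guest_ticks(1_500_000_000, 3_000_000, 1_500_000))  # 3000000000
```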

       If an HVM container in default TSC mode (tsc_mode=0) is created on a
       host that provides constant TSC, its guest TSC frequency will be the
       same as the host's.  If it is later migrated to another host that
       provides constant TSC and supports Intel VMX TSC scaling/AMD SVM TSC
       ratio, its guest TSC frequency will be the same before and after
       migration.

       For the HVM container above, if both hosts also support rdtscp, both
       guest rdtsc and rdtscp instructions will be executed natively before
       and after migration.

AUTHORS
       Dan Magenheimer <dan.magenheimer@oracle.com>

4.14.0				  2020-07-23			xen-tscmode(7)
