From: Jamie Hanrahan [jeh@cmkrnl.com]
Sent: Friday, August 13, 1999 10:59 PM
To: Jan Bottorff; ntdev
Subject: RE: [ntdev] Timekeeping in NT (was: DDK documentation 'hole')

> From: owner-ntdev@atria.com [mailto:owner-ntdev@atria.com] On Behalf
> Of Jan Bottorff
> Sent: Wednesday, August 11, 1999 18:05

> My view is KeQuerySystemTime() is an abstract TOD function.  Does
> anybody else see any alternative kernel TOD function?  I should not
> have to know anything about its underlying architecture to get an
> accurate TOD value.  My expectation of a modern TOD function is that
> its absolute error relative to UTC may be not so good, but its short
> term relative measurement should be maybe a few microseconds.  In
> 1982, old 8088 DOS boxes kept short term relative time accurate to
> about 55 milliseconds (18.2 ticks/sec).  In about 1986, the 286-based
> IBM AT improved relative resolution to about 1 microsecond (you could
> read back the timer chip current count).  Are we saying the relative
> time keeping ability of current computers running NT has only
> improved 5x in the last 17 years?

There are basically two ways that NT could get better resolution out
of KeQuerySystemTime.

One:  Run the interval timer faster and add smaller numbers to the
system time on every tick, instead of the usual 100,000 (system time
is kept in 100-nanosecond units, so 100,000 of them make the usual 10
msec tick).  Hey, wait, you can have that today if you want, at least
you can have a factor of 10 improvement.  Just use the "multimedia"
timer support from user mode (timeBeginPeriod and all that),
requesting a 1 msec period.  As of NT5 you'll be able to make the
same tweak from kernel mode by calling ExSetTimerResolution.  Either
way, ALL timed events in the system (except quantum expiration) get
evaluated to the finer resolution from then on (until the next boot).
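A minimal user-mode sketch of that request, assuming you link with
winmm.lib (error handling trimmed to the essentials):

    /* Ask the multimedia timer support for 1 msec resolution.  From
       here until timeEndPeriod (or reboot), timed events system-wide
       get evaluated at the finer rate. */
    #include <windows.h>
    #include <mmsystem.h>
    #include <stdio.h>

    int main(void)
    {
        if (timeBeginPeriod(1) != TIMERR_NOERROR) {
            printf("1 msec period not supported\n");
            return 1;
        }

        Sleep(2);           /* now waits ~2 msec instead of rounding
                               up to a full 10 msec tick */

        timeEndPeriod(1);   /* give the cycles back when done */
        return 0;
    }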

But the cost is 1000 interrupts per second from the clock, instead of
just 100.  Since the clock is serviced at a higher IRQL than any
device, this DOES have a measurable impact on interrupt latency on
other devices, even on modern CPUs.  You see, interrupt latency and
the cache-disruption costs of servicing interrupts just don't scale
linearly with processor speed.  Modern processors are several hundred
times faster than the first PCs.  That doesn't mean they can handle
even ten times the interrupt rates of the first PCs with impunity.

You seem awfully willing to give away my processor cycles for your
convenience.  Experience has shown that this isn't a good tradeoff.


Two:  Provide something like the Pentium cycle counter -- a
fine-grained thing that will tell you where you are "within" an OS
tick -- but make it readable without raising MP synch or other
performance issues.  This would not let you implement timed requests
to any better granularity (the system is still only going to check "is
the earliest event in the timer list due?" on every "tick" interrupt)
but would let GetSystemTime and so on return a higher-res time.

Great plan -- but current PC hardware doesn't have such a beast.


In any case, in a multitasking, interrupt-driven OS it does little
good for a simple returned value of "time" to be expressed with a much
finer resolution than the thread scheduling timeslice.  So the system
fetches a value of "time" that's measured down to the microsecond.  So
what?  By the time you get to look at it, many microseconds... or even
tens of milliseconds...  might have elapsed since the time value was
fetched.  So the value seen by the requester won't be accurate, even
if it was accurate when it was read.


> If I ask my Linux box what it thinks about the TOD (by running
> ntptime), it suggests it knows the relative time to about microsecond
> resolution, and is synced to absolute time (via NTP) with an
> estimated error of 132192 microseconds and a maximum error of 350904
> microseconds.

"Suggests" being the operative word here.

First, the ludicrous "error" figures.  An estimated error of 132192
microseconds.  Gee, is the program sure it isn't really 132193?

If those figures were honest they'd be rounded off to two significant
digits.  At most.  They're just the result of averaging a series of
numbers, none of which had anywhere near six significant digits to
begin with.  An "estimated error" figure quoted to six
apparently-significant digits with a "maximum error" of over twice the
"estimated" value is self-evidently ludicrous; the digits past the
first two or so tell us nothing.

(Well, let me take that back.  They DO tell us that the programmer knew
nothing about "accuracy", "resolution", or significant digits.)

If you like, you can have NT sync its clock to an NTP server as often
as you care to (the tools are in the resource kit).  You can then run
a similar utility on NT and get the same result (bogus "estimated
error" and all).

As for "Knows the relative time to about a microsecond" -- what does
that mean?  Does it mean it can measure elapsed times to the
microsecond?  Well, maybe, but so can NT (KeQueryPerformanceCounter).
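A minimal sketch of that sort of measurement on NT, via the user-mode
face of the same counter:

    /* Measure a short interval with the Win32 performance counter.
       The counter frequency is hardware-dependent; divide by it to
       convert counts to time. */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        LARGE_INTEGER freq, start, stop;

        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&start);

        Sleep(10);              /* something to time */

        QueryPerformanceCounter(&stop);
        printf("elapsed: %.1f usec\n",
               (stop.QuadPart - start.QuadPart) * 1e6 /
               (double)freq.QuadPart);
        return 0;
    }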

But neither result has anything to do with the granularity at which
the operating systems are counting ticks.  Or the value you'll get
from KeQuerySystemTime, or from time() on Linux.

Try doing a "sleep" on your Linux system for 3 microseconds and see
what happens.  Better yet, call time() in a tight loop for a second or
so, recording the results *in memory* (don't write them out anywhere
until the end of the run), and see the "grain" with which the time
advances.

Microseconds?  I don't think so.
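A minimal sketch of that experiment, in ordinary C (the results go
into a static array so that no I/O perturbs the loop):

    #include <stdio.h>
    #include <time.h>

    #define N 1000000L

    static time_t samples[N];

    int main(void)
    {
        long i, advances = 0;

        for (i = 0; i < N; i++)         /* tight loop, memory only */
            samples[i] = time(NULL);

        for (i = 1; i < N; i++)         /* count how often it moved */
            if (samples[i] != samples[i - 1])
                advances++;

        printf("%ld advances in %ld samples\n", advances, N);
        return 0;
    }

Run it and see how many distinct values you get for a run of a second
or so.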

> It sounds like NT can't keep track of absolute or relative time any
> better than 10 milliseconds, even if my hardware can.

As I said, the standard NT timekeeping -- without resort to the
Pentium cycle counter, etc. -- can actually get down to millisecond
resolution if a multimedia timer request has been made that requires
that.  As of NT 5, ExSetTimerResolution is exposed to kernel mode, so
this can be done from k-mode also.
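A minimal kernel-mode sketch of that call, for illustration only
(assumes NT 5 or later; the argument is in 100-nanosecond units):

    #include <ntddk.h>

    NTSTATUS
    DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
    {
        ULONG granted;

        UNREFERENCED_PARAMETER(DriverObject);
        UNREFERENCED_PARAMETER(RegistryPath);

        /* 10000 * 100 ns = 1 msec; the return value is the
           resolution actually granted, in the same units. */
        granted = ExSetTimerResolution(10000, TRUE);
        DbgPrint("timer resolution now %lu (100 ns units)\n", granted);

        /* Release the request at unload time with
           ExSetTimerResolution(0, FALSE). */
        return STATUS_SUCCESS;
    }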

But the cost is, of course, 10x the number of timer interrupts per
second.  There is a very good reason that NT doesn't run this way by
default:  Tests have shown that IO performance suffers; interrupt
latency in particular.

> NT also seems unable to tell me anything about the resolution and
> absolute accuracy of its TOD function.  For the important function of
> time keeping, Linux seems to be far more capable than NT.

Oh?  I quote from the "Linux FUD FAQ"
(http://www1.linkonline.net/rodpad/linux02.html) :

> The basic unit of time in Linux (and most Unix-like systems) is
> time_t.  This format expresses the time as the number of *seconds*
> [my emphasis - jeh] since midnight, 1 Jan, 1970.

In other words, if we ask the Linux kernel for the time, we're going
to get a time_t, which advances every second.

I think you mean "Linux with an external connection to an nntp server"
will let you find out what time it is wth greater accuracy than will
an NT system without such a connection.  Well, no kidding.  NT with
such a connection will give you the same capability.

But in neither case is the progression of the system time counter, nor
the expiration of timed events, handled with any greater resolution
than without such a connection.

> A few moments of web surfing finds RFC 1589
> (http://andrew2.andrew.cmu.edu/rfc/rfc1589.html), which describes
> accurate time keeping on computers in just wonderful detail.

Wherein it states quite clearly that Unix kernels run on interval
timers with pretty much the same resolution as NT (remember, NT will
get down to 1 msec if you ask):

> In order to understand how the new software works, it is useful to
> review how most Unix kernels maintain the system time.  In the Unix
> design a hardware counter interrupts the kernel at a fixed rate: 100
> Hz in the SunOS kernel, 256 Hz in the Ultrix kernel and 1024 Hz in
> the OSF/1 kernel.  Since the Ultrix timer interval (reciprocal of the
> rate) does not evenly divide one second in microseconds, the Ultrix
> kernel adds 64 microseconds once each second, so the timescale
> consists of 255 advances of 3906 us plus one of 3970 us.  Similarly,
> the OSF/1 kernel adds 576 us once each second, so its timescale
> consists of 1023 advances of 976 us plus one of 1552 us.

And please note: this article does not propose running the interval
timers at any higher rates (and so does not propose that time in these
systems progress at any finer resolution).  Rather, it proposes a
mechanism whereby the amount by which system time is advanced after
each clock interrupt -- or the interrupt rate itself -- can be tweaked
so as to achieve long-term accuracy in TOD despite inaccuracies and
instabilities in the rate at which the clock interrupts.
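To make the mechanism concrete, here's a toy model -- not any actual
kernel's code, and the 200 ppm figure is made up for illustration --
of slewing the per-tick addend:

    /* Toy model of disciplining a clock by trimming the per-tick
       increment.  The oscillator still interrupts every ~10 msec;
       only the amount added per tick changes. */
    #include <stdio.h>

    int main(void)
    {
        long long system_time_us = 0;
        long increment_us = 10000;  /* nominal 10 msec per tick */
        long trim_us = -2;          /* say NTP finds us 200 ppm fast */
        int tick;

        for (tick = 0; tick < 100; tick++)  /* one second of ticks */
            system_time_us += increment_us + trim_us;

        printf("after 1 sec of ticks: %lld usec "
               "(1000000 nominal, slewed by %ld)\n",
               system_time_us, 100 * trim_us);
        return 0;
    }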

These techniques are nothing new; VMS has been using them for at least
ten years.  More to the point, they have nothing to do with the
resolution by which the OS reckons time -- only with its long-term
accuracy; the two parameters have very little to do with each other.

	--- Jamie Hanrahan, Kernel Mode Systems  ( http://www.cmkrnl.com/ )

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[ To unsubscribe, send email to ntdev-request@atria.com with body
UNSUBSCRIBE (the subject is ignored). ]
