Skip to content
  • John Stultz's avatar
    NTP: correct inconsistent ntp interval/tick_length usage · bbe4d18a
    John Stultz authored
    
    
    I recently noticed on one of my boxes that when synched with an NTP
    server, the drift value reported for the system was ~283ppm. While in
    some cases, clock hardware can be that bad, it struck me as unusual as
    the system was using the acpi_pm clocksource, which is one of the more
    trustworthy and accurate clocksources on x86 hardware.
    
    I brought up another system and let it sync to the same NTP server, and
    I noticed a similar 280some ppm drift.
    
    In looking at the code, I found that the acpi_pm's constant frequency
    was being computed correctly at boot-up, however once the system was up,
    even without the ntp daemon running, the clocksource's frequency was
    being modified by the clocksource_adjust() function.
    
    Digging deeper, I realized that in the code that keeps track of how much
    the clocksource is skewing from the ntp desired time, we were using
    different lengths to establish how long an time interval was.
    
    The clocksource was being setup with the following interval:
    	NTP_INTERVAL_LENGTH = NSEC_PER_SEC/NTP_INTERVAL_FREQ
    
    While the ntp code was using the tick_length_base value:
    	tick_length_base ~= (tick_usec * NSEC_PER_USEC * USER_HZ)
    					/NTP_INTERVAL_FREQ
    
    The subtle difference is:
    	(tick_usec * NSEC_PER_USEC * USER_HZ) != NSEC_PER_SEC
    
    This difference in calculation was causing the clocksource correction
    code to apply a correction factor to the clocksource so the two
    intervals were the same, however this results in the actual frequency of
    the clocksource to be made incorrect. I believe this difference would
    affect all clocksources, although to differing degrees depending on the
    clocksource resolution.
    
    The issue was introduced when my HZ free ntp patch landed in 2.6.21-rc1,
    so my apologies for the mistake, and for not noticing it until now.
    
    The following patch, corrects the clocksource's initialization code so
    it uses the same interval length as the code in ntp.c. After applying
    this patch, the drift value for the same system went from ~283ppm to
    only 2.635ppm.
    
    I believe this patch to be good, however it does affect all arches and
    I've only tested on x86, so some caution is advised. I do think it would
    be a likely candidate for a stable 2.6.24.x release.
    
    Any thoughts or feedback would be appreciated.
    
    Signed-off-by: default avatarJohn Stultz <johnstul@us.ibm.com>
    Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
    Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
    bbe4d18a