Assuming the source servers, and their upstream NTP servers all the way to the reference clocks, implement the protocol correctly, and that the reference clocks are correctly configured, the root dispersion of your local NTP daemon should give you an error bound. However, in practice, with anonymous servers, the mysterious upstream sources they use, and the ease of misconfiguring a stratum 1 server, there’s not much you can claim beyond what the pool monitoring system shows you.
I’d be pretty confident, pointing an ntpd at time.cloudflare.com alone, that my ntpd’s root dispersion was a good error bound at any given instant. Sadly, ntpd doesn’t offer a way to log the root dispersion over time, though you could do it programmatically using
ntpq -c "rv rootdisp"
in a script loop pretty easily.
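A minimal Python sketch of such a loop. It assumes `ntpq` is on the PATH and that `ntpq -c "rv rootdisp"` prints a line of the form `rootdisp=1.234` (milliseconds); the function names, the 60-second interval and the comma-separated output are illustrative choices, not anything ntpd provides.

```python
import re
import subprocess
import time
from datetime import datetime, timezone

# Assumed ntpq output format: "rootdisp=<value>" with the value in milliseconds.
_ROOTDISP_RE = re.compile(r"rootdisp=([-+]?\d+(?:\.\d+)?)")


def parse_rootdisp(text: str) -> float:
    """Extract the root dispersion (ms) from ntpq output; raise if it is absent."""
    match = _ROOTDISP_RE.search(text)
    if match is None:
        raise ValueError(f"no rootdisp in ntpq output: {text!r}")
    return float(match.group(1))


def query_rootdisp() -> float:
    """Run ntpq once against the local ntpd and return its root dispersion in ms."""
    result = subprocess.run(
        ["ntpq", "-c", "rv rootdisp"],
        capture_output=True, text=True, check=True, timeout=10,
    )
    return parse_rootdisp(result.stdout)


def log_forever(interval_s: float = 60.0) -> None:
    """Print one 'UTC timestamp,rootdisp_ms' line per interval until interrupted."""
    while True:
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        try:
            print(f"{stamp},{query_rootdisp():.3f}", flush=True)
        except (subprocess.SubprocessError, ValueError, OSError) as exc:
            # Keep logging even if ntpq is missing, times out or prints something unexpected.
            print(f"{stamp},error: {exc}", flush=True)
        time.sleep(interval_s)
```

Calling `log_forever()` and redirecting its output to a file gives a simple time series you can graph later.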
I lost some faith in those a while ago after occasionally seeing them off by 30 ms or more, with double-digit jitter, for several hours at a time, and in different regions of the world. I guess that may be because they’re anycast, so the actual instance you reach may keep changing. Or the routing may simply be off or asymmetric for extended periods, regardless of the anycast aspect.
Anyway, such networking aspects would be my biggest concern, besides the server implementation and configuration ones. I’m not sure how much of that would be reflected in metrics such as the dispersion (math isn’t my strong suit; I think I have some intuition, but I could be way off).
Or server overload, which isn’t a concern in my area but, as various threads in this forum attest, is very common in other areas.
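To make the routing point concrete: NTP computes offset and delay from four timestamps (client send t1, server receive t2, server send t3, client receive t4) on the assumption that the outbound and return paths take equal time. The Python sketch below uses hypothetical one-way delays to show that an asymmetric path shifts the measured offset by half the asymmetry, and that this error is bounded by half the round-trip delay. That is why RFC 5905’s root distance adds half the root delay to the root dispersion; dispersion alone does not capture asymmetry.

```python
def ntp_offset_delay(t1: float, t2: float, t3: float, t4: float) -> tuple[float, float]:
    """Standard NTP on-wire calculation; all times in seconds."""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate(true_offset: float, out_delay: float, back_delay: float,
             server_hold: float = 0.001, t1: float = 0.0) -> tuple[float, float]:
    """Build the four timestamps for a server whose clock is true_offset ahead of the client."""
    t2 = t1 + out_delay + true_offset                # server clock when the request arrives
    t3 = t2 + server_hold                            # server clock when the reply leaves
    t4 = t1 + out_delay + server_hold + back_delay   # client clock when the reply arrives
    return ntp_offset_delay(t1, t2, t3, t4)


# Symmetric 10 ms paths: the offset is measured correctly.
print(simulate(0.0, 0.010, 0.010))
# 30 ms out, 10 ms back: a perfectly synced server appears 10 ms off.
print(simulate(0.0, 0.030, 0.010))
```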
Looking at expectations of the pool from another angle, I always think about what triggered its creation, and what is still highlighted on the pool’s documentation pages: that its original (and maybe still main) purpose was to share load.
So I’d indeed think the pool is good enough for getting time for my calendar, my webcam or other IoT/smart home device, or even my PC. But if I had a specific use case with certain requirements, I’m not sure I’d turn to the pool. And that is not so much about accuracy as such, but more about reliability and availability, especially in zones that are not as comfortably equipped with servers as my own “home zone”.
There are some obvious examples at the far end of the spectrum, like ‘high frequency trading’, TDoA (Time Difference of Arrival) in LoRaWAN, and others that require very accurate timestamps. Some academic applications also rely on extremely accurate clocks. But naturally they will not use a service like the NTP pool. Often they don’t even use NTP at all (they use PTP or White Rabbit instead, for instance).
On the other end of the spectrum are applications that need only ‘some understanding’ of the right time, like DNSSEC validation, TOTP and many others. They tolerate errors of seconds, and some of them even minutes.
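As an illustration of how much slack such protocols have, here is a minimal RFC 6238 TOTP sketch in Python (HMAC-SHA1, 30-second steps). The ±1-step acceptance window in `verify` is a common verifier choice, assumed here rather than mandated by the RFC. With these settings a clock 45 seconds off can still be accepted (depending on where the step boundaries fall), while one two minutes off is rejected.

```python
import hashlib
import hmac
import struct


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP with HMAC-SHA1 and dynamic truncation."""
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def totp(key: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP over the number of `step`-second intervals since the epoch."""
    return hotp(key, int(unix_time // step), digits)


def verify(key: bytes, code: str, unix_time: float, step: int = 30,
           window: int = 1, digits: int = 6) -> bool:
    """Accept codes from the current step and up to `window` steps either side."""
    counter = int(unix_time // step)
    return any(
        hmac.compare_digest(hotp(key, c, digits), code)
        for c in range(counter - window, counter + window + 1)
        if c >= 0
    )
```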
So the interesting area, in relation to this topic, lies between these two extremes. Do examples exist there? I think so.
For instance, precise correlation of system logs (for forensic research or debugging) benefits from fairly good time accuracy. Another example is a distributed, fair ‘first come, first served’ application, where many people try to claim something at about the same time.
So yes, I believe these examples exist.
But I haven’t seen many documents that provide guidance here, for example good recommendations for network engineers or SOCs (MiFID II has some hard numbers that I know of, but those are at the high end of the spectrum, not in the ‘grey area’).
Long story short: where in the spectrum are public NTP servers, in particular the NTP pool? What is the recommended or intended scope of application of the pool? Is it safe to use in enterprise networks? Does the pool make any claims? What are the thresholds of the pool’s monitoring system, for instance? Is it fair to complain if the pool is >30 ms off?
Yes: banking, stock markets, etc.
But they use their own clocks, typically far more precise than we need.
There are ham radio digital modes that need ‘high’ accuracy, but we’re talking about 0.1 s accuracy.
These modes are called JT65, FT4, FT8, WSPR, etc.
They need this ‘accuracy’ because those signals start and stop all over the world at the same time. They contain no start/stop bits, so the computer needs to know when each transmission starts and stops.
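A minimal Python sketch of what “knowing when it starts” means: these modes transmit in fixed cycles aligned to UTC, so the software derives slot boundaries from the local clock, and any clock error shifts every slot by the same amount. The cycle lengths below are the commonly used values (assumed here; check your software’s documentation), and the function names are illustrative.

```python
import math

# Commonly used transmit/receive cycle lengths in seconds (assumed values).
CYCLE_S = {"FT8": 15.0, "FT4": 7.5, "JT65": 60.0, "WSPR": 120.0}


def next_slot_start(unix_time: float, mode: str) -> float:
    """Unix time at which the next cycle of `mode` begins (now, if exactly on a boundary)."""
    period = CYCLE_S[mode]
    return math.ceil(unix_time / period) * period


def slot_phase(unix_time: float, mode: str) -> float:
    """Seconds into the current cycle; a clock error shifts this one-for-one."""
    return unix_time % CYCLE_S[mode]
```

A receiver whose clock is 0.5 s fast sees every transmission start 0.5 s “late” relative to its idea of the slot, which eats into the decoder’s timing tolerance.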
With these signals you can cover large distances without much transmit power, WSPR being the extreme case.
To give an idea, on the 7 MHz band you can reach (my record) km with just 1 mW of power.
As you know, typical 2.4 GHz Wi-Fi is already 100 to mW!
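For scale, radio power is usually compared in dBm (decibels relative to 1 mW). A tiny Python sketch of the conversion; it shows that even the lower Wi-Fi figure of 100 mW is 20 dB, a factor of 100, above the 1 mW WSPR example.

```python
import math


def mw_to_dbm(milliwatts: float) -> float:
    """Convert a power in milliwatts to dBm."""
    if milliwatts <= 0:
        raise ValueError("power must be positive")
    return 10 * math.log10(milliwatts)


def ratio_db(p1_mw: float, p2_mw: float) -> float:
    """How many dB p1 is above p2."""
    return mw_to_dbm(p1_mw) - mw_to_dbm(p2_mw)
```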
So yes, ham radio needs ‘high’ accuracy, and what we provide is more than accurate enough.
When they need higher accuracy, they typically use a Bodnar GPS reference to get an accurate frequency and make the receiver accurate and stable, like they do here:
But that is extreme.
In another thread, @stevesommars linked to an interesting paper of his that explored some aspects of long-term monitoring:
One thing I recall noticing is that occasionally some servers gave very incorrect time, off by years.
But I feel like you could, with enough data, probably make a statement like, “The 90th-percentile accuracy of pool servers was within 250ms” or something. (To be clear, that statistic is a completely made-up example.)
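Computing such a statistic is simple once you have offset samples. A minimal Python sketch using the nearest-rank percentile of absolute offsets; where the samples would come from (for example the pool monitoring’s per-server logs) is left open, and the names are illustrative.

```python
import math


def nearest_rank_percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def accuracy_percentile(offsets_ms: list[float], pct: float = 90) -> float:
    """Percentile of |offset|, so servers that are early and late count the same."""
    return nearest_rank_percentile([abs(o) for o in offsets_ms], pct)
```

Using absolute values means a statement like “90% of samples were within X ms” follows directly; a single server that is off by years only moves the top percentiles.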
I wonder if the pool monitoring data used for graphs could give some answers here?
Dave’s comment about root dispersion as a reasonable benchmark, and my side project to get all the stuff I manage into Ansible, led me to grab this data:
% ansible all -m ansible.builtin.shell -a "sudo chronyc tracking | grep dispersion"
nyc-2g | CHANGED | rc=0 >>
Root dispersion : 0. seconds
mia-2g | CHANGED | rc=0 >>
Root dispersion : 0. seconds
las-2g | CHANGED | rc=0 >>
Root dispersion : 0. seconds
lux-2g | CHANGED | rc=0 >>
Root dispersion : 0. seconds
mumbai-1g | CHANGED | rc=0 >>
Root dispersion : 0. seconds
korea-1g | CHANGED | rc=0 >>
Root dispersion : 0. seconds
singapore-ls | CHANGED | rc=0 >>
Root dispersion : 0. seconds
capetown | CHANGED | rc=0 >>
Root dispersion : 0. seconds
malaysia-ptp | CHANGED | rc=0 >>
Root dispersion : 0. seconds
sao-paulo | CHANGED | rc=0 >>
Root dispersion : 0. seconds
bangalore-do | CHANGED | rc=0 >>
Root dispersion : 0. seconds
singapore-do | CHANGED | rc=0 >>
Root dispersion : 0. seconds
With the exception of malaysia-ptp, these are all stratum 2+ servers getting time over the Internet. At the same time, they’re all VMs inside data centers where I’ve taken some care to configure them with several good, nearby NTP sources. Someone using Wi-Fi on a laptop through their cable modem would probably have worse numbers.
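To turn output like the above into numbers you can compare or graph, a small Python parser is enough. It assumes the ad-hoc command format shown (`host | STATUS | rc=N >>` followed by chrony’s `Root dispersion : <value> seconds` line); hosts without such a line are skipped. The function name is illustrative and the values in the test are hypothetical.

```python
import re

HOST_RE = re.compile(r"^(\S+) \| (\w+) \| rc=(\d+) >>$")
DISP_RE = re.compile(r"^Root dispersion\s*:\s*(\d+(?:\.\d*)?) seconds$")


def parse_ansible_rootdisp(text: str) -> dict[str, float]:
    """Map each host to its root dispersion in seconds."""
    result: dict[str, float] = {}
    host = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HOST_RE.match(line)
        if header:
            host = header.group(1)  # start of a new host's output
            continue
        disp = DISP_RE.match(line)
        if disp and host is not None:
            result[host] = float(disp.group(1))
            host = None
    return result
```

Sorting the resulting dict by value (for example `sorted(d.items(), key=lambda kv: kv[1])`) quickly shows which hosts have the loosest error bound.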
The pool monitoring expects the servers to answer mode 3 (client) requests. The mode field was introduced in NTPv2, so that is the minimum version a server needs in order to join the pool.
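The mode lives in the low three bits of the first header byte, next to the version number and leap indicator. A Python sketch that builds a 48-byte client (mode 3) request and checks whether a packet is a server (mode 4) reply; it only handles bytes, sends nothing, and is not how the pool monitor itself is implemented.

```python
import struct


def build_client_request(version: int = 4) -> bytes:
    """48-byte NTP request: leap indicator 0, the given version, mode 3 (client)."""
    first_byte = (0 << 6) | (version << 3) | 3
    return struct.pack("!B47x", first_byte)  # remaining 47 bytes zero-filled


def header_fields(packet: bytes) -> dict[str, int]:
    """Decode LI (2 bits), VN (3 bits) and Mode (3 bits) from the first byte."""
    b = packet[0]
    return {"leap": b >> 6, "version": (b >> 3) & 0x7, "mode": b & 0x7}


def looks_like_server_reply(packet: bytes) -> bool:
    return len(packet) >= 48 and header_fields(packet)["mode"] == 4
```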
Since the information exchanged between server and client does not include the era number (which would disambiguate the 2036 rollover of the 32-bit seconds field), there is nothing that can be done from the server side. Either the client is able to handle the overflow, or it isn't.
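One common client-side approach is to assume the timestamp lies within about 68 years (2^31 seconds) of some reference time the client trusts, such as its build date or last known time, and pick the era accordingly. A minimal Python sketch of that idea; `pivot_unix` is the assumed reference, and 2,208,988,800 is the number of seconds between the NTP epoch (1900) and the Unix epoch (1970).

```python
NTP_UNIX_DELTA = 2_208_988_800  # seconds from 1900-01-01 to 1970-01-01
ERA_SECONDS = 1 << 32           # one NTP era, about 136 years


def ntp_seconds_to_unix(seconds32: int, pivot_unix: int) -> int:
    """Map a 32-bit NTP seconds value to Unix time, choosing the era closest to the pivot."""
    if not 0 <= seconds32 < ERA_SECONDS:
        raise ValueError("seconds32 must fit in 32 bits")
    pivot_ntp = pivot_unix + NTP_UNIX_DELTA
    base_era = pivot_ntp // ERA_SECONDS
    candidates = (era * ERA_SECONDS + seconds32
                  for era in (base_era - 1, base_era, base_era + 1))
    best = min(candidates, key=lambda c: abs(c - pivot_ntp))
    return best - NTP_UNIX_DELTA
```

With a pivot in 2030, a small seconds value such as 28402304 is interpreted as early 2037 (era 1) rather than 1900.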
Version compatibility will probably become an interesting topic when NTPv5 is finished; the current draft contains changes to the message format.
But it basically boils down to: "With at least four upstream servers, one (or more) can be a 'falseticker', or just unreachable, and ntpd will have a sufficient number of sources to choose from."
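The intuition can be sketched with Marzullo's interval-intersection algorithm: each source contributes an interval (offset plus or minus its error bound), the sub-interval agreed on by the most sources wins, and sources that don't overlap it are treated as falsetickers. ntpd's actual selection (the intersection algorithm in RFC 5905, followed by clustering) is a refined variant; the Python below is the textbook version, and the intervals in the test are made up.

```python
def marzullo(intervals: list[tuple[float, float]]):
    """Return (count, lo, hi): the sub-interval agreed on by the most sources."""
    edges = []
    for lo, hi in intervals:
        edges.append((lo, -1))  # interval opens
        edges.append((hi, +1))  # interval closes
    edges.sort()  # on ties, opens (-1) sort before closes (+1)
    best = count = 0
    best_lo = best_hi = None
    for i, (pos, kind) in enumerate(edges):
        count -= kind
        if count > best:
            # An open edge is always followed by at least its own close edge.
            best, best_lo, best_hi = count, pos, edges[i + 1][0]
    return best, best_lo, best_hi


def falsetickers(intervals: list[tuple[float, float]]) -> list[int]:
    """Indices of sources whose interval misses the best-agreed sub-interval."""
    if not intervals:
        return []
    _, lo, hi = marzullo(intervals)
    return [i for i, (a, b) in enumerate(intervals) if b < lo or a > hi]
```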
I've spent a bunch of time sorting out NTP in recent years; chrony does indeed keep much better time and converges more quickly.
I would be careful about the advice around NTP anycasting. While attractive as a solution (one IP!), there are tradeoffs and it would be important to take those into account before going down this path.[2]
[1] https://support.ntp.org/bin/view/Support/SelectingOffsiteNTP....
[2] https://www.rfc-editor.org/rfc/rfc.html#page-17
Well, yes. Most computers have horrible clocks and can drift several seconds per day. That doesn't sound too bad, but the error accumulates until you are minutes, then hours, off from standard time, at which point lots of strange things can start happening, and the symptoms vary depending on whether your clock is drifting into the past or the future. To fix it, you can always reset to the correct time. Or just run an NTP client and never worry about it again.
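The arithmetic behind "several seconds per day": a clock's frequency error, in parts per million (ppm), accumulates linearly as long as nothing corrects it. A Python sketch assuming a constant error (real oscillators also wander with temperature); the 50 ppm figure below is just an example.

```python
SECONDS_PER_DAY = 86_400


def drift_seconds(ppm: float, days: float) -> float:
    """Accumulated error of a free-running clock with a constant frequency error."""
    return ppm * 1e-6 * SECONDS_PER_DAY * days


print(drift_seconds(50, 1))   # roughly 4.3 s after one day
print(drift_seconds(50, 30))  # roughly 130 s, over two minutes, after a month
```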
If you have a cluster of servers doing some specific task (say, a file server or database), having them all synced to the same time (down to a few milliseconds, at least) is usually critical to their operation. They use tokens, cookies, and whatever else to make a staggering number of decisions, comparing timestamps against the system clock so they don't waste resources managing state at every level.
If you're into systems administration or devops, eventually you will have a Very Bad Day when something important blows up and it's your job to find the root cause. That usually means reading a lot of very big logs. If the machines that generated the logs are not perfectly synced (again, to the millisecond), then putting together a timeline of what happened when is basically impossible.
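A minimal Python sketch of why this matters: merging per-host logs into one timeline orders events purely by timestamp, so a host whose clock is a few tens of milliseconds ahead can appear to act after events it actually caused. The `offsets` correction is hypothetical, since in practice you rarely know each host's error after the fact, which is exactly why syncing beforehand is the better option.

```python
import heapq
from datetime import datetime, timedelta


def merge_logs(streams: dict[str, list[tuple[datetime, str]]],
               offsets: dict[str, timedelta]) -> list[tuple[datetime, str, str]]:
    """Merge time-sorted per-host logs into one timeline of (time, host, message).

    offsets[host] is how far that host's clock was ahead of true time (assumed known).
    """
    def corrected(host: str, entries: list[tuple[datetime, str]]):
        skew = offsets.get(host, timedelta(0))
        for ts, msg in entries:
            yield ts - skew, host, msg  # a constant shift keeps each stream sorted

    return list(heapq.merge(*(corrected(h, e) for h, e in streams.items())))
```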
The negativity about using the NTP pool seems unwarranted. The NTP pool monitors all servers in it and will kick out those that are supplying incorrect times or are unavailable. You're also much more likely to easily find a pretty good and pretty local set of NTP servers if you use the NTP pool than if you look up other NTP server addresses yourself. You don't need to use your country code in the FQDN when using the NTP pool; the DNS resolution should do a pretty good job of figuring it out for you and giving you fairly local servers in response.
https://www.ntppool.org/en/use.html
I personally stopped using pool.ntp.org after hitting a production issue where one of the time servers was off by an hour. Leap second issues are another reason to avoid pool.ntp.org, and I now use leap-smearing time services.
I recommend using the Google Public NTP service (time.google.com) [1].
Under AWS I use the AWS Time Sync Service (169.254.169.123) [2]
I guess this could be mitigated by a smarter NTP client that queries multiple servers and filters out invalid time data (e.g. chrony), but I'm not keen on finding out the hard way.
1. https://developers.google.com/time
2. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time...
The recommended way to use NTP is to run a daemon which will continue to run in the background ensuring your clock is accurate, and to have it reference multiple time sources (or a known-good stratum 1 source, i.e. a server disciplined by a reference clock such as an atomic clock, a WWV receiver, or GPS). If you're just syncing your clock once at boot, Google's time servers are probably fine to use; any time server is probably fine to use. Your hardware clock is going to drift over time, but often that's perfectly OK.
A pool server that's off by an hour would get kicked out of the pool within probably 20-40 minutes and not allowed back in until it was reporting the correct time for a reasonable duration. The current pool monitoring seems to go at a cadence of around 20 minutes to cycle through all the servers validating them. You can have a look at the log of the monitoring of one of my NTP pool servers: https://www.ntppool.org/scores/165.227.219.198/log?limit=200
> The recommended way to use NTP is to run a daemon which will continue to run in the background ensuring your clock is accurate
You might not want to run a daemon in the background or install extra software in certain circumstances.
If you're using a lightweight solution such as systemd-timesyncd or ntpdate plus cron, it will only query one server, so if that server is off you end up with the wrong time for ~30 minutes, which can cause major downtime or data corruption in certain circumstances.
Even if you're able to use a better client like chrony, it still doesn't solve the issues with leap seconds without extra configuration: the time can jump or move backwards, which can cause major downtime, e.g.:
https://blog.cloudflare.com/how-and-why-the-leap-second-affe...
https://www.techspot.com/news/-leap-second-bug-amazon-e...
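Leap smearing avoids the jump by spreading the extra second over a long window. Google documents a 24-hour linear smear running from noon to noon UTC around the leap second; the Python sketch below computes how much of an inserted second has been absorbed at a given time under that assumption (the function name and parameters are illustrative). One caveat: mixing smeared and non-smeared servers in one client configuration leaves them disagreeing by a noticeable fraction of a second during the window.

```python
def smear_fraction(unix_time: float, leap_end_unix: float,
                   window_s: float = 86_400.0) -> float:
    """Fraction of an inserted leap second already applied by a linear smear.

    leap_end_unix: Unix time at the midnight UTC that ends the leap second.
    The window is centred on that instant (noon to noon for 24 hours).
    """
    start = leap_end_unix - window_s / 2
    if unix_time <= start:
        return 0.0
    if unix_time >= start + window_s:
        return 1.0
    return (unix_time - start) / window_s
```

For the leap second at the end of 2016, `leap_end_unix` would be 1483228800 (2017-01-01 00:00:00 UTC), and half the second has been absorbed exactly at that instant.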
You generally don't need to use the country code. The pool's DNS setup is usually smart enough to give you back servers that are fairly local to you based on your location, assuming your DNS server is fairly local to you. If you're a vendor of software or a device and want to use the NTP Pool, you're asked to get a "vendor zone", which will have your name in the FQDNs used for accessing the pool. The cost of doing this is fairly reasonable, and much cheaper than running your own NTP infrastructure (I've done it for a former employer).