20150221 – Time in a changing network

One of the most challenging aspects of programming is making programs actually usable for people: the devil is in the details, and the details are hard to get right.

One such detail is finding the IP numbers of the servers we should talk to, and deciding what to do about them.

Let’s start ntimed-client:

ntimed-client ntp-east.example.com ntp-west.example.com xx.pool.ntp.org

ntimed-client calls getaddrinfo(3) to resolve the names and we get:

ntp-east.example.com
        192.168.1.123
        1.2.3.5
        1111:2222:3333::5555
ntp-west.example.com
        192.168.1.123
        1.2.3.4
        1111:2222:3333::4444
xx.pool.ntp.org
        2.0.0.1
        2.0.0.2
        2.0.0.3
        2222:0000:0000::2222
        2222:0000:0000::3333
        2222:0000:0000::4444
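
For reference, a listing like the one above comes out of a straightforward getaddrinfo(3) walk; the following sketch is mine, not actual Ntimed code:

    #include <netdb.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>

    /* Resolve one hostname and print every address it maps to. */
    static void
    resolve(const char *host)
    {
        struct addrinfo hints, *res, *ai;
        char buf[64];

        memset(&hints, 0, sizeof hints);
        hints.ai_family = AF_UNSPEC;     /* both IPv4 and IPv6 */
        hints.ai_socktype = SOCK_DGRAM;  /* NTP runs over UDP */

        if (getaddrinfo(host, "123", &hints, &res) != 0)
            return;                      /* resolver failure */
        printf("%s\n", host);
        for (ai = res; ai != NULL; ai = ai->ai_next)
            if (getnameinfo(ai->ai_addr, ai->ai_addrlen,
                buf, sizeof buf, NULL, 0, NI_NUMERICHOST) == 0)
                printf("\t%s\n", buf);
        freeaddrinfo(res);
    }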

Now, anybody want to hazard a guess at how many NTP servers hide behind those twelve IP numbers? Any answer from six to twelve can make sense.

When we check again later, we will get different DNS answers. How do we integrate them with the previous answers?

Oh, and if this is a laptop or a VM image, its reachability may change dramatically across suspend/resume events, so we should maybe prefer the IPv6 address over the IPv4 address for a particular server.

Now you know why I haven’t posted anything for a couple of weeks.

I think I have come up with a heuristic which will Do The Right Thing without causing undue harm; this blog entry is an attempt to see if I can also explain it.

Current Heuristic

During startup, we perform some simple sanity checks. For instance, it makes no sense to receive the same argument more than once, so we treat that as a startup failure.

Any argument which resolves successfully, but to zero IP addresses, also causes a startup failure.

(I’m not quite decided what to do about resolver failure during startup. When systems boot, this is a common transient state until interfaces and routing stabilize, so some kind of “hang on and try again” behaviour would make sense.)
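
Back to the duplicate check: for illustration, it is just a pairwise comparison over the arguments. This sketch is mine (the function name is hypothetical), and it assumes hostnames compare case-insensitively:

    #include <err.h>
    #include <strings.h>

    /*
     * Startup sanity check: receiving the same argument twice makes
     * no sense, so treat it as fatal.
     */
    static void
    check_duplicate_args(int argc, char **argv)
    {
        int i, j;

        for (i = 1; i < argc; i++)
            for (j = i + 1; j < argc; j++)
                if (strcasecmp(argv[i], argv[j]) == 0)
                    errx(1, "Duplicate argument: %s", argv[i]);
    }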

Next we create an “ntp_group” for each argument, and an “ntp_peer” instance for each of the addresses returned by getaddrinfo(3).

Any ntp_peer that has the same IP number as another peer is marked as a duplicate and put to the side (i.e., 192.168.1.123 in the example above).
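
A sketch of what those two structures might hold; all of the field names here are my guesses, not the actual Ntimed source:

    #include <stdint.h>
    #include <sys/socket.h>

    /* One group per command line argument.  (Hypothetical layout.) */
    struct ntp_group {
        char                    *hostname;    /* the argument */
        struct ntp_peer         *peers;       /* plausible servers */
        struct ntp_peer         *duplicates;  /* same IP as another peer */
    };

    /* One peer per IP number returned by getaddrinfo(3). */
    struct ntp_peer {
        struct sockaddr_storage  addr;        /* IPv4 or IPv6 */
        uint64_t                 present;     /* DNS presence shift register,
                                               * explained below */
        struct ntp_peer         *next;
    };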

Now we have a gross list of plausible servers.

To manage transitions in and out of the groups as DNS changes over time, we give each ntp_peer a shift-register called “present”, and call getaddrinfo(3) again periodically.

If the call to getaddrinfo(3) succeeds, we shift “present” to the left for all peers in the group.

For each IP number getaddrinfo(3) returned, we set the low bit in the “present” register for that peer, and we create new peers for new IP numbers.

Any peer with a zero “present” register gets removed.
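
Using the structures sketched above, the periodic bookkeeping could look roughly like this; peer_find(), peer_new() and peer_del() are hypothetical helpers, not the real Ntimed API:

    #include <netdb.h>

    struct ntp_peer *peer_find(struct ntp_group *, const struct sockaddr *);
    struct ntp_peer *peer_new(struct ntp_group *, const struct sockaddr *);
    void peer_del(struct ntp_group *, struct ntp_peer *);

    /* Called only when the periodic getaddrinfo(3) call succeeded. */
    static void
    refresh_group(struct ntp_group *grp, const struct addrinfo *res)
    {
        struct ntp_peer *p, *pnext;
        const struct addrinfo *ai;

        /* A successful lookup shifts every peer's register left ... */
        for (p = grp->peers; p != NULL; p = p->next)
            p->present <<= 1;

        /* ... and sets the low bit for the peers still in the answer. */
        for (ai = res; ai != NULL; ai = ai->ai_next) {
            p = peer_find(grp, ai->ai_addr);
            if (p == NULL)
                p = peer_new(grp, ai->ai_addr);
            p->present |= 1;
        }

        /* Peers whose register has gone to zero have aged out. */
        for (p = grp->peers; p != NULL; p = pnext) {
            pnext = p->next;
            if (p->present == 0)
                peer_del(grp, p);
        }
    }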

The effect of this is to retain a memory of the old servers for some period of time and to tide us over DNS server trouble, which means less churn in and out of the set of plausible servers.

(There are nitty-gritty details behind this; for instance, if we delete a peer which has displaced other peers because they all have the same IP#, we don’t want to lose the state, but instead move that “good” version of the peer to one of the other groups it appears in.)

Now we poll all the servers and consider the results.

First we look for evidence that different IP numbers reach the same NTP server, by looking for replies which have identical stratum, refid and reference timestamp.

(There is room for false positives here, but only if vendors of stratum 1 reference clocks do something stupid with their reference timestamps, and so far my collected data does not indicate that to be the case.)

If we detect multihomed servers, we keep only the address with the lowest round-trip time, putting the “mirages” off to the side.
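
The comparison itself is simple; this sketch assumes the relevant fields have already been parsed out of the NTP reply packet (the struct and its names are mine):

    #include <stdbool.h>
    #include <stdint.h>

    /* The reply fields we compare.  (Hypothetical struct.) */
    struct ntp_reply {
        uint8_t   stratum;
        uint32_t  refid;
        uint64_t  reftime;   /* reference timestamp */
        double    rtt;       /* measured round-trip time */
    };

    /*
     * Replies with identical stratum, refid and reference timestamp
     * almost certainly came from the same (multihomed) server.
     */
    static bool
    same_server(const struct ntp_reply *a, const struct ntp_reply *b)
    {
        return (a->stratum == b->stratum &&
            a->refid == b->refid &&
            a->reftime == b->reftime);
    }

When same_server() fires, the peer with the lower rtt stays on the list and the other becomes a “mirage”.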

We now have a net list of plausible servers, which we hand over to the NTP filtering and clock combining modules; they will in turn tell us which subset of peers should be continuously polled for active timekeeping.

pool.ntp.org

The NTP Pool is special in that it deliberately and rapidly rotates through the available servers in each pool in its DNS replies, and it may require special casing.

Given the heuristic above, each client would end up keeping track of all the servers in a configured pool, and picking the best one(s) as seen from its network location.

This may not be what we want: any really good and well-connected server in a pool would attract many more clients than the less well-endowed servers, and it would become impossible to balance the load over the servers in the pool.

I don’t know of any other setup like pool.ntp.org, nor any other configuration of NTP servers which might need similar special casing, so I am awfully tempted to simply give all domains which end in “pool.ntp.org” special treatment, but this is still TBD.
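
Should it come to that, the test itself would be a simple suffix match, along these lines (sketch, hypothetical function name):

    #include <stdbool.h>
    #include <string.h>
    #include <strings.h>

    /* Does a hostname end in "pool.ntp.org" ? */
    static bool
    is_ntp_pool(const char *host)
    {
        static const char sfx[] = "pool.ntp.org";
        size_t hl = strlen(host);
        size_t sl = sizeof sfx - 1;

        return (hl >= sl && strcasecmp(host + hl - sl, sfx) == 0);
    }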

phk