NSLU2-Linux

Please note that this document is interesting, but partially out of date. The underlying timing issue has been completely fixed in Openslug 2.6 Beta and higher, worked around in Openslug 2.5 beta, and partially fixed in Unslung 5.5.

Comment [Lee] Which parts of this document still apply to Unslung 5.5 given that the problem has been partially fixed? Which parts have become obsolete?

--Lee

Comment [Bear] Why are you peering with the upstream NTP server? Peering is used by systems at the same stratum, e.g., you're in a large office and each department has its own NTP server fed by a different upstream time server. You could use peering among them to ensure that everyone's clock stays synchronized even if one or more upstream servers go insane.

A slug should probably list its upstream servers with 'server', not 'peer'. Unless you're a stratum-1 server, of course. That may actually be a good project! Stratum-1 servers get their time from GPS units or WWV radio, and the hardware isn't that expensive -- a few hundred bucks.

--Bear

When configured properly, a slug can act as an effective NTP server for time synchronization on your LAN. However, as various people have observed, the ntp package generally does not work as shipped.

The first problem is that ntp will sometimes adjust the clock in the wrong direction: if your clock is 10 seconds off, trying to correct it doubles the error to 20 seconds instead of removing it. This is apparently due to a bug in the floating point emulation code in the kernel. This problem has been fixed in Unslung 3.17 and higher.

The second problem is what several people have observed: no matter how long you leave ntpd running, it never peers with another server. The problem here is that ntpd, by design, cannot correct for a frequency error of more than about 500 parts per million (ppm). The default configuration of the slug has an error of about 10000 ppm, roughly twenty times more than ntpd can correct.

The cause of the slug's terrible timekeeping is a kernel value that controls how many microseconds are added to the clock on every clock interrupt. This value is referred to as tick. The slug boots with a tick of 9903. However, if you measure the clock drift of the slug and calculate the correction, using for example the instructions at this page, you will find that the correct value is 10000. Interestingly enough, 10000 is also the value on every other Linux box I have available to test, so why the slug ships with a value that is so clearly wrong is beyond me. Can anyone else explain this? Why didn't Linksys fix this in their firmware instead of implementing the hwclock kludge? (The answer is simple incompetence: they fixed it, but the wrong way, and so doubled their problem. The problem has, however, been fixed in the latest iterations of the Openslug 2.6-series NSLU2 kernel, included in Openslug 2.6 and higher.)
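As a rough check on these numbers, the arithmetic can be sketched in shell. The function names below are made up for this example, and the 24-hour measurement is a hypothetical figure derived from the stock tick of 9903, not a reading taken from a real slug. A tick of 9903 against a correct value of 10000 works out to 9700 ppm, which is where the "about 10000 ppm" and "twenty times" figures come from.

```shell
#!/bin/sh
# Illustrative helpers; the names are hypothetical.

# tick_error_ppm CURRENT_TICK CORRECT_TICK
# Frequency error, in parts per million, caused by a wrong tick value.
tick_error_ppm() {
    echo $(( ($2 - $1) * 1000000 / $2 ))
}

# corrected_tick CURRENT_TICK REAL_SECONDS SLUG_SECONDS
# Tick value that would remove the drift, given that the slug counted
# SLUG_SECONDS while a reference clock counted REAL_SECONDS.
# The result is rounded to the nearest integer.
corrected_tick() {
    echo $(( ($1 * $2 + $3 / 2) / $3 ))
}

tick_error_ppm 9903 10000        # prints 9700
corrected_tick 9903 86400 85562  # prints 10000
```

The second call assumes the slug counted 85562 seconds over a real day of 86400 seconds, which is what a tick of 9903 would produce.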

In any event, to make this correction permanently, we will edit the /opt/etc/init.d/S77ntp script. You should have the ntp package installed at this point, but not running. If it is running, issue a killall ntpd to get rid of it.

Edit /opt/etc/init.d/S77ntp and add a line to call tickadj with an argument of 10000. For example, my script looks like the following:

 
#!/bin/sh

if [ -n "`pidof ntpd`" ]; then 
    /bin/killall ntpd 2>/dev/null
fi

if [ ! -d /var/spool/ntp ] ;  then
    mkdir -p /var/spool/ntp
fi

# correct the incorrect tick value on the slug before starting ntpd!
/opt/bin/tickadj 10000

/opt/bin/ntpd \
  -c /opt/etc/ntp/ntp.conf \
  -f /var/spool/ntp/ntp.drift \
  -s /var/spool/ntp \
  -k /opt/etc/ntp \
  -l /var/spool/ntp/ntp.log

This will ensure that the tick value is set properly before ntpd loads.

You will probably also want to ensure that time gets synchronized on boot. The best way I have found to do this is to divert rc.rstimezone. This is the diversion script I use:

 
#!/bin/sh
# Diversion script to get control of time

# Extract the GUI timezone from the .conf file
# Copy the corresponding /usr/zoneinfo file over /usr/local/localtime
/usr/sbin/Set_TimeZone >/dev/null

# Do an initial clock set
/opt/bin/ntpd \
  -q \
  -c /opt/etc/ntp/ntp.conf \
  -f /var/spool/ntp/ntp.drift \
  -s /var/spool/ntp \
  -k /opt/etc/ntp \
  -l /var/spool/ntp/ntp.log

# Do not execute the Linksys script
return 0

# EOF - include this line

Essentially, this calls ntpd with exactly the same arguments as the S77ntp startup script, but adds a -q, which tells ntpd to set the time once and then exit. (Note: This may warrant some testing. Are there any ways that this can freeze up the boot process under normal circumstances? If this does mess anyone up, follow the normal procedure to fix the problem: boot your slug with no disks, telnet in, connect the disks, then remove the diversion.)

Note that this requires DNS to be functioning at the time the diversion runs. If, like me, you have your slug running BIND and acting as its own nameserver, you will find that this breaks or freezes, because ntpd can't resolve the names in ntp.conf yet: BIND won't start until later. In that case you may want to skip this step. It should be OK to skip it most of the time, since the hardware clock keeps half-decent time and ntpd will (should?) keep it updated while the slug is running. I'm open to better ways to deal with this, though.
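One way to keep the diversion from hanging, sketched here under assumptions: check that a name resolves, with a bounded number of attempts, before calling ntpd -q, and skip the initial clock set otherwise. The helper name wait_for and the MAX_TRIES and RETRY_DELAY variables are made up for this example, and the use of nslookup assumes it is present on your slug.

```shell
#!/bin/sh
# wait_for COMMAND [ARGS...]
# Run COMMAND until it succeeds, at most MAX_TRIES times (default 5),
# sleeping RETRY_DELAY seconds (default 2) after each failure.
# Returns 0 on success, 1 if every attempt failed.
wait_for() {
    tries=0
    while [ "$tries" -lt "${MAX_TRIES:-5}" ]; do
        "$@" >/dev/null 2>&1 && return 0
        tries=$((tries + 1))
        sleep "${RETRY_DELAY:-2}"
    done
    return 1
}

# Possible use in the diversion script (assumes nslookup is available):
# if wait_for nslookup 0.ca.pool.ntp.org; then
#     : run the ntpd -q command from the diversion script here
# fi
```

With the defaults this delays boot by at most about ten seconds when DNS is not up, which seems a reasonable trade against a frozen boot.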


Note:
The initial clock setting did not work in my case - I ended up with a system time of January 1st, 1970 - which of course is far beyond what can be corrected by the ntpd.
I replaced the line in the diversion script beginning with
/opt/bin/ntpd -q -c /opt/etc...
by
/opt/bin/ntpdate -b 2.de.pool.ntp.org >/dev/null 2>&1
to initially set the system time from my regional timeserver - this seemed to work.

 -- nbehrent
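A likely explanation, offered as an assumption rather than something confirmed on the slug: by default ntpd refuses to correct an offset larger than its panic threshold (about 1000 seconds) unless it is started with -g, so a clock sitting at 1970 is rejected. Below is a minimal sketch of a guard that steps the clock with ntpdate only when the date is obviously wrong. The function name clock_is_plausible and the cutoff year are illustrative choices.

```shell
#!/bin/sh
# clock_is_plausible [YEAR] [MIN_YEAR]
# Succeeds if YEAR (default: the current system year) is at least
# MIN_YEAR (default: 2007, an arbitrary cutoff chosen for this sketch).
clock_is_plausible() {
    year=${1:-$(date +%Y)}
    [ "$year" -ge "${2:-2007}" ]
}

# Possible use in the diversion script:
# clock_is_plausible || /opt/bin/ntpdate -b 2.de.pool.ntp.org >/dev/null 2>&1
```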

You will also want to make sure that you remove anything in /etc/crontab that will affect the time, such as the Linksys kludge to call hwclock. I find that nothing normally touches the /etc/crontab file, so you can probably get away with editing it directly to remove the offending lines. Alternatively, the better way is to edit it from a diversion script, so that the change sticks across a firmware upgrade.

For example, my /etc/crontab looks like this, but your mileage may vary if you have other tasks scheduled. Note that the hwclock line is commented out:

 
SHELL=/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=""
HOME=/
# ---------- ---------- Default is Empty ---------- ---------- #
#0 0-23/8 * * * root /usr/sbin/CheckDiskFull &>/dev/null
0 0 * * * root /usr/sbin/WatchDog &>/dev/null
#1 * * * * root /usr/sbin/hwclock -s &>/dev/null
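The edit described above could be automated along these lines. This is a minimal sketch, not a tested diversion script: the function name and the temporary file path are illustrative. It prefixes every uncommented line that mentions hwclock with # and leaves everything else untouched.

```shell
#!/bin/sh
# comment_out_hwclock: filter stdin to stdout, putting a # in front of
# any line that mentions hwclock and is not already commented out.
comment_out_hwclock() {
    sed '/^#/!s/^.*hwclock/#&/'
}

# Possible use (write to a temporary file first, then replace the original):
# comment_out_hwclock </etc/crontab >/tmp/crontab.new &&
#     cp /tmp/crontab.new /etc/crontab
```

Because lines that are already commented are skipped, running it more than once does no harm.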

As for the default /opt/etc/ntp/ntp.conf file, it is just fine for a simple NTP server, so unless you have a need to add security to the server or to add additional time sources, you should be good to go. One thing you might look into in the long run though is editing the configuration so it uses local servers. For example, I'm in Canada, so I use the following in my ntp.conf:

 
# /opt/etc/ntp/ntp.conf, configuration for ntpd

server 0.ca.pool.ntp.org
server 1.ca.pool.ntp.org
server 0.us.pool.ntp.org
server 1.us.pool.ntp.org

This pulls two servers from Canada and two from the USA. This way I don't get servers from Europe or elsewhere, which would have high network latency and be less useful for synchronization.

The last thing to do before you can start ntpd is to make sure that the clock on your slug has a nearly-correct time. You could set it by hand with the date command, or using the web interface. Perhaps simpler is to issue the command /opt/bin/ntpd -q -c /opt/etc/ntp/ntp.conf -f /var/spool/ntp/ntp.drift -s /var/spool/ntp -k /opt/etc/ntp -l /var/spool/ntp/ntp.log as discussed above. However you do it, you want to make sure it's nearly-correct or else ntpd could take a very long time to amortize the error.

At this point, you can start the server by issuing /opt/etc/init.d/S77ntp

If everything goes well, you should find that after about 15-20 minutes, ntpd has selected a server for peering. You can verify this by running ntpq and typing pe. If a server has been selected for peering, it will have an asterisk next to it. My output, for example, looks like this:

 
ntpq> pe
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+neate.neateroll 199.212.17.15    3 u    7   64  377   70.286   32.691  28.241
+raptor.tera-byt 132.246.168.164  3 u    8   64  377    0.940   34.024  23.233
-ntp1.chorus.net .GPS.            1 u    9   64  377   88.826  -25.300  32.153
*ip-207-145-113- .GPS.            1 u    7   64  377  121.088   35.874  26.715

If it still hasn't peered after a reasonable amount of time, you've either messed up the instructions (or I've messed up the instructions!) or have hit a problem I haven't encountered yet.
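If you would rather check for the asterisk from a script than by eye, something like the following could work. It is a sketch that assumes ntpq -p prints the same peer table as the interactive pe command; the function name is made up for this example.

```shell
#!/bin/sh
# has_selected_peer: read ntpq peer output on stdin and succeed if any
# line starts with an asterisk, which marks the selected server.
has_selected_peer() {
    grep -q '^\*'
}

# Possible use:
# ntpq -p | has_selected_peer && echo "ntpd has selected a server"
```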

Note that these instructions are still rather sketchy as I just set this all up today. Any comments or corrections are appreciated.

To set your timezone when using uClibc, check this link: http://leaf.sourceforge.net/doc/guide/buci-tz.html#id2594640


The stock crontab line with hwclock runs once an hour, at one minute past the hour. It seems that Linksys wanted to sync the system clock to the hardware clock every hour. In fact, when you have an NTP server running, you do not want hwclock to sync the system clock to the hardware clock. You want the opposite: sync the hardware clock to the system clock, which is being kept up to date by the ntp daemon. I also believe that syncing once a day is sufficient. To achieve this, change the hwclock line in the /etc/crontab file to:

0 5 * * * root /usr/sbin/hwclock --systohc &>/dev/null

Now hwclock will run once a day at 5 a.m. and sync the hardware clock to the system clock.

Last edited by Three Stones.
Based on work by Three Stones, Bear, Lee Kimber, blaster8, tman, repvik, nbehrent, and Lem.
Originally by Lem.
Page last modified on April 06, 2007, at 12:36 AM