[Olsr-users] 0.5.6 Routes disappear after 4 mins of uptime, then all OK - suspect clock sync
Eric Malkowski
Tue Sep 2 17:30:57 CEST 2008
Bernd-
Thanks for the info. I've seen the jiffies overflow approach on
buildroot setups I've used in the past to weed out timer overflow
problems early on.
I'm familiar w/ the -ERRNO convention in kernel system call handling too --
it's interesting to see times() get sort of "tainted" by it, returning -1
while the tick count winds through the ERRNO range. That causes the coma in
olsr until its timers appear to have fired again and it keeps going. At
least it's a short outage for people who might run into this w/o a fix.
However, I'm running w/ a 100 Hz tick and a 5 ms configured tick rate, so
if my math is right I'll get the same problem at about 4 minutes of uptime
and then again after 248 days of additional uptime. People running 1000 Hz
will have it every 24 days, etc. Actually, those numbers should be doubled,
since the value is treated as unsigned and is only tainted by the kernel
syscall handling when close to zero -- so every 496 days for me and every
48 days for 1000 Hz setups.
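For anyone who wants to check the arithmetic, here is a rough sketch. The helper names are made up for this example, and it assumes a 32-bit tick counter and an error range of -4095..-1 in the syscall return path.

```c
/* Back-of-the-envelope helpers; the names are made up for this example. */

/* Days until a tick counter of the given width wraps at hz ticks/second. */
double wrap_period_days(unsigned hz, unsigned bits)
{
    double ticks = (double)(1ULL << bits);   /* valid for bits < 64 */
    return ticks / (double)hz / 86400.0;
}

/* Seconds the counter spends in the assumed -4095..-1 range before a wrap. */
double errno_window_seconds(unsigned hz)
{
    return 4095.0 / (double)hz;
}
```

At 100 Hz this gives roughly 248.6 days for a 31-bit range and 497.1 days for the full 32 bits, close to the rounded figures above, and an outage window of about 41 seconds. If the kernel starts its counter so that it wraps 5 minutes after boot (an assumption on my part about the kernel in use), that window would open at roughly 4 min 19 s of uptime.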
I like the sane_times() snippet.
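To make the idea concrete for the archive, here is a minimal sketch of one way such a wrapper could work. This is not the sane_times() snippet from the thread. The names recover_ticks() and ticks_now() are hypothetical, and the sketch assumes the libc wrapper turned a raw return value in -4095..-1 into -1 with errno set to the negated raw value.

```c
#include <errno.h>
#include <sys/times.h>

/* Hypothetical helper: undo the libc wrapper's error mapping.
 * Assumes that a real tick value in -4095..-1 came back as -1 with
 * errno holding the negated value. */
clock_t recover_ticks(clock_t ret, int saved_errno)
{
    if (ret == (clock_t)-1 && saved_errno > 0) {
        return (clock_t)(-(long)saved_errno);
    }
    return ret;   /* ordinary value, or -1 with no errno to recover from */
}

/* Hypothetical wrapper around times() that applies the recovery. */
clock_t ticks_now(void)
{
    struct tms buf;
    clock_t ret;

    errno = 0;            /* so a stale errno is not misread as a tick value */
    ret = times(&buf);
    return recover_ticks(ret, errno);
}
```

Callers would use ticks_now() wherever they call times() today. On a libc that already hides this case the errno check simply never triggers and the value passes through unchanged.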
So this is great -- I'd be happy to test any proposed fixes to the
handful of times() calls -- I've got a setup that will make such testing
easy.
Thanks again to both Bernd and Hannes -- are you guys regular developers
for olsr? I'll go have a peek at the devel mailing list.
It's amazing how quickly a problem can be squashed w/ open source.
-Eric