[Olsr-users] 0.5.6 Routes disappear after 4 mins of uptime, then all OK - suspect clock sync

Eric Malkowski (spam-protected)
Tue Sep 2 17:30:57 CEST 2008


Bernd-

Thanks for the info.  I've seen the jiffies overflow approach on 
buildroot setups I've used in the past to weed out timer overflow 
problems early on.
I'm familiar w/ the -ERRNO stuff in kernel system call handling too -- 
it's interesting to see times() getting "tainted" by it: while the tick 
count winds through the range reserved for the ERRNOs, the call returns 
-1.  That causes the coma in olsr until timers start to appear fired 
again in olsr's code and it keeps going.  At least it's a short outage 
for people who might run into this w/o a fix.  However, I'm running w/ a 
100 Hz tick, and at the 5 ms configured tick rate, if my math is right, 
I'll get the same problem at about 4 minutes of uptime and then again 
after 248 days of additional uptime.  People running 1000 Hz will have it 
every 24 days, etc.  Actually those numbers should be doubled, since the 
value is treated as unsigned and is only tainted by the syscall handling 
when it is about to wrap through zero -- so roughly every 496 days for me 
and 48 days for 1000 Hz setups.
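
To sanity-check the numbers above, here is a rough sketch of the 
arithmetic.  It assumes a 32-bit tick counter that advances at the given 
rate, that Linux reserves the 4095 values just below zero for error 
codes, and that the counter starts 300 seconds short of the wrap at boot.  
The function names are made up for illustration.

```c
#include <stdint.h>

/* Linux treats syscall return values -4095..-1 as error codes. */
#define ERRNO_WINDOW_TICKS 4095.0
/* Assumption: the tick counter starts 300 s before it wraps. */
#define BOOT_OFFSET_SECONDS 300.0

/* How long times() looks "tainted" each time the counter nears the wrap. */
double outage_length_s(unsigned hz)
{
    return ERRNO_WINDOW_TICKS / hz;
}

/* Uptime (seconds) at which the first tainted window opens. */
double first_outage_start_s(unsigned hz)
{
    return BOOT_OFFSET_SECONDS - outage_length_s(hz);
}

/* Days between repeats: one full wrap of an unsigned 32-bit counter. */
double wrap_period_days(unsigned hz)
{
    return 4294967296.0 / hz / 86400.0;
}
```

Under those assumptions a 100 Hz counter gives a window of about 41 
seconds opening a bit past 4 minutes of uptime, repeating roughly every 
497 days; a 1000 Hz counter would repeat about every 50 days.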

I like the sane_times() snippet.
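
For anyone reading along without the earlier mail, here is a minimal 
sketch of that kind of wrapper.  This is not Bernd's actual snippet; the 
names recover_ticks() and times_recovered() are hypothetical.  It assumes 
that a (clock_t)-1 return with errno set really means the kernel handed 
back a tick value in the -4095..-1 range, which libc then turned into an 
error.

```c
#include <errno.h>
#include <sys/times.h>

/*
 * Hypothetical helper: if libc reported an "error", rebuild the tick
 * value the kernel actually returned, which was -errno.
 */
clock_t recover_ticks(clock_t ret, int err)
{
    if (ret == (clock_t)-1 && err > 0 && err <= 4095)
        return (clock_t)(-err);
    return ret;
}

/*
 * Hypothetical wrapper around times() that hides the tainted window
 * and leaves the caller's errno untouched.
 */
clock_t times_recovered(void)
{
    struct tms unused;          /* valid buffer, so EFAULT is not a real error */
    int saved_errno = errno;
    clock_t t;

    errno = 0;
    t = times(&unused);
    t = recover_ticks(t, errno);
    errno = saved_errno;
    return t;
}
```

Callers would still need to compare the results with wraparound-safe 
arithmetic (subtract and look at the signed difference), since the 
counter itself still wraps.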

So this is great -- I'd be happy to test any proposed fixes to the 
handful of times() calls -- I've got a setup that will make such testing 
easy.

Thanks again to both Bernd and Hannes -- are you guys regular developers 
for olsr?  I'll go have a peek at the devel mailing list.

It's amazing how quickly a problem can be squashed w/ opensource.

-Eric




