On Tue, 2008-09-02 at 11:30 -0400, Eric Malkowski wrote:
> Bernd-
>
> Thanks for the info. I've seen the jiffies overflow approach on
> buildroot setups I've used in the past to weed out timer overflow
> problems early on.
> I'm familiar w/ the -ERRNO stuff in kernel system call handling too --
> it's interesting to see times() getting sort of "tainted" by kernel
> system call handling into returning -1 while it winds through the
> ERRNOs. It causes the coma in olsr until timers start to appear fired
> in olsr's code and it keeps going. At least it's a short outage for
> people that might run into this w/o a fix -- however I'm running w/ a
> 100 hz tick and at 5ms configured tick rate, so if my math is right I'll
> get the same problem at about 4 minutes of uptime and then again after
> about 248.5 days of additional uptime. People running 1000 hz will have
> it about every 24.9 days etc. -- actually the numbers should be doubled
> since it's treated like an unsigned and it's only tainted by kernel
> syscall when close to zero -- so about every 497 days for me and about
> every 50 days for 1000 hz setups.

Yes. But the 1000 HZ setups have only 1/10th of the pause (compared to
100 HZ), so most probably won't really face the problem.
And olsrd is not the first app in my life with that "problem" (or did you
think I just looked all that up in the various sources in that short time
alongside my day job? ;-).

> I like the sane_times() snippet.
>
> So this is great -- I'd be happy to test any proposed fixes to the
> handful of times() calls -- I've got a setup that will make such testing
> easy.

I need a few free hours for a patch+review to test ....

> Thanks again both Bernd and Hannes -- are you guys regular developers
> for olsr? I'll go have a peek at the devel mailing list.

/me: Some time ago, yes.

> It's amazing how quickly a problem can be squashed w/ open source.

ATM we have only diagnosed the problem.
It still waits to be squashed ;-)
And IMHO that's mostly independent of OSS vs. a commercial support
contract for proprietary software/systems with guaranteed response times
to a bug report. Other factors are more important:
- Is it an already known problem?
- Does one have an in-depth bug report containing facts like yours, or
  just an "I downloaded the source, compiled it and it doesn't work.",
  which is neither rare nor particularly helpful?
In the OSS world, the latter would probably only have resulted in "send
more details like OS, version, ... And what did you do, what did you
expect, etc.".

Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services
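To make the numbers in this thread concrete, here is a minimal C sketch. It assumes the Linux system call convention in which a raw return value in [-4095, -1] is interpreted by the C library as -errno (which is why times() yields -1 while the tick counter passes through that range), and it borrows the wraparound-safe comparison idea behind the kernel's time_after_eq(). All function and macro names are hypothetical illustrations; this is not olsrd code and not the sane_times() snippet mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers (not from olsrd) illustrating the times()
 * wraparound problem discussed in this thread.
 */

/* Largest errno value reserved by the Linux syscall convention
 * (the kernel calls this MAX_ERRNO). */
#define RAW_ERRNO_MAX 4095L

/* Days until a `bits`-wide tick counter running at `hz` ticks/s wraps. */
static double wrap_interval_days(unsigned hz, unsigned bits)
{
    assert(hz > 0 && bits > 0 && bits < 64);
    return (double)((uint64_t)1 << bits) / hz / 86400.0;
}

/*
 * Nonzero if a raw syscall return value falls in [-4095, -1], the range
 * the C library treats as -errno. A tick count in this range therefore
 * reaches the caller of times() as (clock_t)-1.
 */
static int looks_like_errno(long raw)
{
    return raw >= -RAW_ERRNO_MAX && raw <= -1;
}

/* Length in seconds of that "stuck at -1" window at `hz` ticks/s. */
static double errno_window_seconds(unsigned hz)
{
    assert(hz > 0);
    return (double)RAW_ERRNO_MAX / hz;
}

/* Ticks elapsed between two 32-bit samples; modular unsigned arithmetic
 * keeps this correct across a single wrap of the counter. */
static uint32_t ticks_elapsed(uint32_t now, uint32_t then)
{
    return (uint32_t)(now - then);
}

/*
 * Nonzero if tick `a` is at or after tick `b`, assuming the two samples
 * are less than 2^31 ticks apart. Same idea as the kernel's
 * time_after_eq(), written with unsigned arithmetic only.
 */
static int tick_after_eq(uint32_t a, uint32_t b)
{
    return (uint32_t)(a - b) < UINT32_C(0x80000000);
}
```

For a 100 Hz counter, wrap_interval_days(100, 31) is about 248.6 days and wrap_interval_days(100, 32) about 497.1 days. errno_window_seconds(100) is about 41 s, versus about 4 s at 1000 Hz, which matches the one-tenth pause mentioned above.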