On Tue, 2008-09-02 at 08:42 -0400, Eric Malkowski wrote:
> Hannes-
>
> Thanks for the quick reply.
>
> I had some time to run this in strace this morning and it appears the
> culprit is the fact that at daemon start time, the return values from the
> times() system call are working their way at system boot time on my box
> from -32768 at bootup up towards overflow at 0 and olsrd does his ~40

That's the usual way the Linux kernel tests jiffies (read: timer)
overflow ;-)

> second "coma" close to zero and then times() return values go positive and
> all is well.
>
> See the following strace log which I think tells the whole story:
>
> http://www.malknet.org/strace_olsrd_0.5.6_uclibc.log
>
> Towards the end of the file you can see pretty clearly what is going on --
> a lot of various odd errno values and possibly a bad pointer or address

If you look into the source, the pointer is clearly always valid.

> being passed into times() for a bit - but it recovers.

That's a "feature" of the (ahem) insane definition of the times(2)
syscall in POSIX/SUSv3/... and of its implementation on Linux-x86 (and
possibly a few other archs, but not all):

The kernel returns the number of clock ticks since some arbitrary point
in the past (e.g. boot time, but also 300 seconds before boot time) in a
clock_t, which is AFAIK signed. That's the first bug: the value is
essentially an unsigned number, and "shortly" before the (unsigned)
overflow its signed interpretation is negative.

Inside the Linux kernel, errors are generally returned as -E (and that E
ends up in the "errno" variable in user space). However, at the x86
syscall boundary (in the C library's syscall wrapper), result values
> -1024 (IIRC) and < 0 are interpreted as "error". Thus times(2) returns
-1 and errno gets such strange values (and not one of the defined values
like EFAULT). That's the second bug.

Voilà.
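To make the mechanism concrete, here is a small user-space model of it.
This is only a sketch, not real libc code: model_libc_wrapper() and
model_recover() are hypothetical names, and the error band of 4095 is an
assumption (it is glibc's convention; other C libraries may use a
different bound, such as the -1024 recalled above).

```c
#include <assert.h>
#include <errno.h>

/* ASSUMPTION: the C library's syscall wrapper treats raw kernel return
 * values in [-ASSUMED_MAX_ERRNO, -1] as errors. 4095 is glibc's bound;
 * other C libraries may differ. */
#define ASSUMED_MAX_ERRNO 4095L

/* Hypothetical model of what user space sees when the kernel hands back
 * raw_ticks as the result of times(2). */
static long model_libc_wrapper(long raw_ticks)
{
    assert(ASSUMED_MAX_ERRNO > 0);
    if (raw_ticks < 0 && raw_ticks >= -ASSUMED_MAX_ERRNO) {
        errno = (int)-raw_ticks; /* bogus "error" code, really a tick count */
        return -1;
    }
    return raw_ticks; /* any other value passes through unchanged */
}

/* Hypothetical recovery step: only an exact -1 carries the real
 * (negated) tick count in errno; everything else is already correct. */
static unsigned long model_recover(long ret)
{
    if (ret == -1)
        return (unsigned long)-(long)errno;
    return (unsigned long)ret;
}
```

For example, a raw value of -42 comes back as -1 with errno set to 42,
and model_recover() turns it back into (unsigned long)-42. A raw value of
-5000 lies outside the assumed band and passes through untouched, so
rewriting every negative result from errno (rather than only -1) would
pick up a stale errno.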
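One should probably use something like:

---- snip ----
#include <errno.h>
#include <sys/times.h>

unsigned long sane_times(void)
{
	struct tms tms_buf;
	const clock_t t = times(&tms_buf);

	/* With a valid buffer, (clock_t)-1 means the C library mistook the
	   tick count for an error; the negated value is then in errno. Any
	   other negative value is a genuine tick count and is kept as-is. */
	return t == (clock_t)-1 ? (unsigned long)-errno : (unsigned long)t;
}
---- snip ----

The unused "tms_buf" is there since some BSDs don't like it if one
passes NULL there (IIRC).

Hmm, one should review all the times(2) calls though ....

	Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services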