[OLSR-users] Assertion `metric_counter' failed (IPV4)

Andreas Tønnesen (spam-protected)
Mon May 23 19:51:04 CEST 2005


Philippe, all,

We have tracked down the problem now. It all came down to a combination
of an optimization/hack in the link sensing and a long interval in the
interface change polling. The hack in question is something we have
decided to keep (it does more good than harm :-) ). I won't go into too
much detail here, but the hack is a way to register multihomed nodes
before a MID message is received (since MIDs are often sent at long
intervals). Without this hack a node must in some cases wait for
a MID from a neighbor before the neighbor is properly registered.
The problem occurs when a neighbor changes interface address and
generates a HELLO message before olsrd detects the change. In this case
an erroneous MID entry will be created by the receiver and used in route
calculation.
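
To make the failure mode concrete, here is a minimal sketch in Python
(not olsrd's actual C code). It assumes the hack registers an alias
whenever a HELLO arrives from a source address that differs from the
originator (main) address carried in the message. The names AliasTable,
on_hello and main_address, and the addresses used, are hypothetical.

```python
from dataclasses import dataclass, field

# Validity time of an alias created by the hack, in seconds.
# 10.0 is the value after the fix described in this mail (it was 20.0).
HACK_VTIME = 10.0


@dataclass
class AliasTable:
    """Toy stand-in for a MID set: alias address -> (main address, expiry)."""
    entries: dict = field(default_factory=dict)

    def on_hello(self, source_addr, originator, now):
        # Assumed behaviour of the hack: if the HELLO came from an address
        # other than the originator, treat the source as an alias of it.
        if source_addr != originator:
            self.entries[source_addr] = (originator, now + HACK_VTIME)

    def main_address(self, addr, now):
        # Resolve an address to its main address while the alias is valid.
        entry = self.entries.get(addr)
        if entry is not None and entry[1] > now:
            return entry[0]
        return addr


table = AliasTable()

# The neighbor has moved from 10.0.0.1 to 10.0.0.2, but its olsrd has not
# noticed yet and still puts the old address in the originator field.
table.on_hello(source_addr="10.0.0.2", originator="10.0.0.1", now=0.0)

# While the alias is valid, the new address resolves to the stale one.
wrong_during = table.main_address("10.0.0.2", now=5.0)

# Once the alias has expired, the address resolves to itself again.
after_expiry = table.main_address("10.0.0.2", now=10.0)
```

In this toy model the receiver keeps mapping the new address to the old
one until the alias times out, which is the erroneous state described
above.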
The following has been done to "fix" the problem:

- The interface change poll interval is halved, resulting in quicker
   IP change detection. It now polls every 2.5 seconds. An alternative
   fix would be to check for changes every time a HELLO message is
   generated, but we concluded that this would create too much CPU
   overhead.
   This interval might be configurable in the next version.
- The vtime for the MID entry created by the hack is halved from 20.0
   to 10.0 seconds. This will reduce the time the system is in the
   erroneous state. But it will also reduce the effect of the
   optimization/hack if nodes use a long interval for MID emission.
- olsr_delete_routes_from_kernel has been updated to handle the
   stale routes.
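
As a rough back-of-the-envelope check of the first two changes, the
sketch below assumes that the bad entry can be created or refreshed at
any time until the sender notices its address change (at most one poll
interval), and that it then lives for one full vtime. This model is a
simplification, not a measurement, and the function name is
hypothetical. The previous poll interval of 5.0 seconds follows from the
halving to 2.5 seconds.

```python
def worst_case_window(poll_interval, hack_vtime):
    """Upper bound (seconds) on how long an erroneous alias may persist.

    Simplified model: up to one poll interval until the address change is
    detected, plus one full validity time for the last bad entry.
    """
    return poll_interval + hack_vtime


# Values before the fix: 5.0 s poll interval, 20.0 s vtime.
before = worst_case_window(poll_interval=5.0, hack_vtime=20.0)

# Values after the fix: 2.5 s poll interval, 10.0 s vtime.
after = worst_case_window(poll_interval=2.5, hack_vtime=10.0)

print(f"before: {before} s, after: {after} s")
```

Under these assumptions the window is cut in half, from 25.0 to 12.5
seconds.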

Summa summarum: there should be much less chance of the problem
occurring now, the system will handle the faulty routes, and olsrd will
recover from the erroneous state more quickly.

Philippe: could you please check out a fresh copy from CVS tomorrow and
retest?

Thanks.

- Andreas


Philippe Vanhaesendonck wrote:
> Andreas,
> 
> Here is the outcome!
> 
> It was less easy to reproduce -- I guess we have a race condition
> somewhere and the fact that we are printing more information changes the
> overall behaviour...
> 
> --
> Phil.
> 
> Andreas Tønnesen wrote:
> 
> 
>>Philippe,
>>
>>Thanks (and sorry for the cut'n paste error). It is now clear what
>>causes the problem. The entry:
>>
>>>Stale route to to 10.11.1.197 via 10.11.1.197 by wlan0 hopcount 3!<
>>
>>has a hopcount of 3 yet it has itself as nexthop. So there seems to be
>>a problem with the route calculation at some point...
>>This should not really be all that harmful as long as the route in
>>question is actually deleted, and it would be easy to create a quick
>>fix that way.
>>But I'd like to get to the bottom of this. Would it be possible for you
>>to build yet another version and reproduce the problem? The attached
>>patch will generate routing table output to stdout. Hopefully there will
>>not be enough output to cause trouble. And hopefully there are no cut'n
>>paste bugs ;)
>>Thanks again.
>>
>>- Andreas
> 
> 
> [Deleted...]

-- 
Andreas Tønnesen
http://www.olsr.org


