[Olsr-dev] Bug in timing out stale TC entries

Hannes Gredler (spam-protected)
Sun Feb 24 16:22:24 CET 2008


erik,

removing edge entries from the lsdb is not mandatory as long
as we "do the right thing", which is to ignore this link
during SPF computation.
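
a minimal sketch of what "ignore this link" could look like. only the
flag name OLSR_TC_EDGE_DOWN is taken from olsrd; the bit value, the
struct and the function are made up for illustration, and a plain
Bellman-Ford style relaxation stands in for the real SPF code:

```c
#include <stddef.h>
#include <stdint.h>

#define OLSR_TC_EDGE_DOWN (1u << 0) /* assumed bit value, not olsrd's */
#define SKETCH_INF UINT32_MAX

/* simplified stand-in for a TC edge, not the real olsrd struct */
struct sketch_edge {
    int from;
    int to;
    uint32_t cost;
    uint32_t flags;
};

/*
 * Relax all edges n_nodes - 1 times. Edges flagged down stay in the
 * table but never contribute to a path. Costs are assumed small
 * enough that dist + cost cannot overflow.
 */
static void sketch_spf(const struct sketch_edge *edges, size_t n_edges,
                       uint32_t *dist, int n_nodes, int root)
{
    for (int i = 0; i < n_nodes; i++)
        dist[i] = SKETCH_INF;
    dist[root] = 0;

    for (int pass = 0; pass < n_nodes - 1; pass++) {
        for (size_t e = 0; e < n_edges; e++) {
            const struct sketch_edge *edge = &edges[e];

            if (edge->flags & OLSR_TC_EDGE_DOWN)
                continue; /* still in the lsdb, ignored by SPF */
            if (dist[edge->from] == SKETCH_INF)
                continue; /* source node not reached yet */
            if (dist[edge->from] + edge->cost < dist[edge->to])
                dist[edge->to] = dist[edge->from] + edge->cost;
        }
    }
}
```

with this check in place a stale edge only costs memory; it cannot
attract traffic as long as its flag is set.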

it would be interesting to see if the offending edges (or the inverse
edges) are marked down using the OLSR_TC_EDGE_DOWN flag.
(pls add some instrumentation code to verify).
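
something along these lines would do as instrumentation. it is a
sketch only: struct dbg_edge and dbg_dump_edges() are hypothetical
names, the bit value of OLSR_TC_EDGE_DOWN is assumed, and the real
code would walk the actual tc edge tree instead of a flat array:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define OLSR_TC_EDGE_DOWN (1u << 0) /* assumed bit value, not olsrd's */

/* hypothetical flat view of one topology edge */
struct dbg_edge {
    const char *src;
    const char *dst;
    uint32_t flags;
};

/*
 * Print one line per edge with its up/down state and return the
 * number of edges that carry the down flag.
 */
static size_t dbg_dump_edges(FILE *out, const struct dbg_edge *edges,
                             size_t n_edges)
{
    size_t down = 0;

    for (size_t i = 0; i < n_edges; i++) {
        int is_down = (edges[i].flags & OLSR_TC_EDGE_DOWN) != 0;

        fprintf(out, "%-15s %-15s %s\n", edges[i].src, edges[i].dst,
                is_down ? "DOWN" : "up");
        down += (size_t)is_down;
    }
    return down;
}
```

calling this next to the TOPOLOGY dump would show whether the entries
that look stale are in fact flagged down.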

the reason we are just marking the edge as down and not immediately
removing it is b/c of the TC fragmentation bug, where a TC
message spans 2 packets. if we eliminated all TC
edges immediately you would get a whole lot of malloc() churn,
b/c a couple of ms later the next packet is processed and
the tc_edge is inserted again.
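
to illustrate the idea (not the actual olsrd code): all lazy_* names
below are invented, the bit value of the flag is assumed, and the
separate sweep step is an assumption as well. an edge that shows up
again in the next fragment is revived by clearing the flag, so no
free()/malloc() pair is needed:

```c
#include <stdint.h>
#include <stdlib.h>

#define OLSR_TC_EDGE_DOWN (1u << 0) /* assumed bit value, not olsrd's */

/* hypothetical edge and per-originator TC entry */
struct lazy_edge {
    uint32_t dst;
    uint32_t flags;
    struct lazy_edge *next;
};

struct lazy_tc {
    struct lazy_edge *edges;
    unsigned allocs; /* number of malloc() calls, to show the churn */
};

/* A new TC message arrived: flag every known edge instead of freeing. */
static void lazy_mark_all_down(struct lazy_tc *tc)
{
    for (struct lazy_edge *e = tc->edges; e != NULL; e = e->next)
        e->flags |= OLSR_TC_EDGE_DOWN;
}

/* Revive an existing edge if possible, allocate only for new ones. */
static struct lazy_edge *lazy_add_edge(struct lazy_tc *tc, uint32_t dst)
{
    for (struct lazy_edge *e = tc->edges; e != NULL; e = e->next) {
        if (e->dst == dst) {
            e->flags &= ~OLSR_TC_EDGE_DOWN;
            return e;
        }
    }

    struct lazy_edge *e = malloc(sizeof(*e));
    if (e == NULL)
        return NULL;
    e->dst = dst;
    e->flags = 0;
    e->next = tc->edges;
    tc->edges = e;
    tc->allocs++;
    return e;
}

/* Free edges that are still flagged down; returns how many were freed. */
static unsigned lazy_sweep(struct lazy_tc *tc)
{
    unsigned freed = 0;
    struct lazy_edge **link = &tc->edges;

    while (*link != NULL) {
        struct lazy_edge *e = *link;

        if (e->flags & OLSR_TC_EDGE_DOWN) {
            *link = e->next;
            free(e);
            freed++;
        } else {
            link = &e->next;
        }
    }
    return freed;
}
```

the sweep must only run once all fragments of a message have been
seen, otherwise edges carried in the second packet are freed and
reallocated anyway.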

/hannes

Erik Tromp wrote:
> Hi all,
> 
> I think there is a bug which was introduced in version 0.5.4 and is still present in 0.5.5. The bug was not present in 0.5.3 and
> earlier (those good old times... :-).
> 
> The problem lies in that some stale (old) TC entries take extremely long (5-10 minutes) before they are cleaned up, and sometimes
> are not cleaned up at all.
> 
> I have a small test network of 4 nodes. A picture of it is available at:
> http://home.tiscali.nl/levab001/OLSR-BMF-testnetwork.pdf
> 
> In the beginning, all is well. Node 10.0.4.3 has the following
> state:
> 
> --- 14:26:43.83 ---------------------------------------------------- LINKS
> 
> IP address       hyst   LQ     lost   total  NLQ    ETX
> 10.0.6.6         0.000  1.000  0      5      1.000  1.00
> 10.0.6.5         0.000  1.000  0      5      1.000  1.00
> 10.0.5.5         0.000  1.000  0      5      1.000  1.00
> 10.0.6.4         0.000  1.000  0      5      1.000  1.00
> 
> --- 14:26:43.02831232 ----------------------- TWO-HOP NEIGHBORS
> 
> IP addr (2-hop)  IP addr (1-hop)  TLQ
> 10.0.8.6         10.0.4.4         1.000
>                  10.0.8.5         1.000
> 10.0.8.5         10.0.4.4         1.000
>                  10.0.8.6         1.000
> 10.0.4.4         10.0.8.6         1.000
>                  10.0.8.5         1.000
> 
> --- 14:26:43.83 ------------------------------------------------- TOPOLOGY
> 
> Source IP addr  Dest IP addr    LQ     ILQ    ETX
> 10.0.4.3        10.0.4.4         1.000  1.000  1.00
> 10.0.4.3        10.0.8.5         1.000  1.000  1.00
> 10.0.4.3        10.0.8.6         1.000  1.000  1.00
> 10.0.4.4        10.0.4.3         1.000  1.000  1.00
> 10.0.4.4        10.0.8.5         1.000  1.000  1.00
> 10.0.4.4        10.0.8.6         1.000  1.000  1.00
> 10.0.8.5        10.0.4.3         1.000  1.000  1.00
> 10.0.8.5        10.0.4.4         1.000  1.000  1.00
> 10.0.8.5        10.0.8.6         1.000  1.000  1.00
> 10.0.8.6        10.0.4.3         1.000  1.000  1.00
> 10.0.8.6        10.0.4.4         1.000  1.000  1.00
> 10.0.8.6        10.0.8.5         1.000  1.000  1.00
> 
> 
> This information is consistent with that on node 10.0.8.5:
> 
> --- 14:23:14.96 ---------------------------------------------------- LINKS
> 
> IP address       hyst   LQ     lost   total  NLQ    ETX
> 10.0.5.3         0.000  1.000  0      5      1.000  1.00
> 10.0.6.3         0.000  1.000  0      5      1.000  1.00
> 10.0.6.6         0.000  1.000  0      5      1.000  1.00
> 10.0.2.4         0.000  1.000  0      5      1.000  1.00
> 10.0.6.4         0.000  1.000  0      5      1.000  1.00
> 
> --- 14:23:14.02969528 ----------------------- TWO-HOP NEIGHBORS
> 
> IP addr (2-hop)  IP addr (1-hop)  TLQ
> 10.0.8.6         10.0.4.3         1.000
>                  10.0.4.4         1.000
> 10.0.4.4         10.0.4.3         1.000
>                  10.0.8.6         1.000
> 10.0.4.3         10.0.8.6         1.000
>                  10.0.4.4         1.000
> 
> --- 14:23:14.96 ------------------------------------------------- TOPOLOGY
> 
> Source IP addr  Dest IP addr    LQ     ILQ    ETX
> 10.0.4.3        10.0.4.4         1.000  1.000  1.00
> 10.0.4.3        10.0.8.5         1.000  1.000  1.00
> 10.0.4.3        10.0.8.6         1.000  1.000  1.00
> 10.0.4.4        10.0.4.3         1.000  1.000  1.00
> 10.0.4.4        10.0.8.5         1.000  1.000  1.00
> 10.0.4.4        10.0.8.6         1.000  1.000  1.00
> 10.0.8.5        10.0.4.3         1.000  1.000  1.00
> 10.0.8.5        10.0.4.4         1.000  1.000  1.00
> 10.0.8.5        10.0.8.6         1.000  1.000  1.00
> 10.0.8.6        10.0.4.3         1.000  1.000  1.00
> 10.0.8.6        10.0.4.4         1.000  1.000  1.00
> 10.0.8.6        10.0.8.5         1.000  1.000  1.00
> 
> 
> Now I shut down the interface from node 10.0.4.3 into the 10.0.6.x network. This is visible on 10.0.4.3:
> 
> --- 14:49:29.03 ---------------------------------------------------- LINKS
> 
> IP address       hyst   LQ     lost   total  NLQ    ETX
> 10.0.5.5         0.000  1.000  0      5      1.000  1.00
> 
> --- 14:49:29.0230914 ----------------------- TWO-HOP NEIGHBORS
> 
> IP addr (2-hop)  IP addr (1-hop)  TLQ
> 10.0.8.6         10.0.8.5         1.000
> 10.0.4.4         10.0.8.5         1.000
> 
> --- 14:49:29.03 ------------------------------------------------- TOPOLOGY
> 
> Source IP addr  Dest IP addr    LQ     ILQ    ETX
> 10.0.4.3        10.0.8.5         1.000  1.000  1.00
> 10.0.4.4        10.0.4.3         1.000  0.596  1.68
> 10.0.4.4        10.0.8.5         1.000  1.000  1.00
> 10.0.4.4        10.0.8.6         1.000  1.000  1.00
> 10.0.8.5        10.0.4.3         1.000  1.000  1.00
> 10.0.8.5        10.0.4.4         1.000  1.000  1.00
> 10.0.8.5        10.0.8.6         1.000  1.000  1.00
> 10.0.8.6        10.0.4.4         1.000  1.000  1.00
> 10.0.8.6        10.0.8.5         1.000  1.000  1.00
> 
> The first stale TC entry is visible here:
> 10.0.4.4        10.0.4.3         1.000  0.596  1.68
> which does not seem to get removed.
> 
> 
> Also on node 10.0.8.5 we can see two TC entries that don't go away:
> 
> --- 14:45:23.91 ---------------------------------------------------- LINKS
> 
> IP address       hyst   LQ     lost   total  NLQ    ETX
> 10.0.5.3         0.000  1.000  0      5      1.000  1.00
> 10.0.6.6         0.000  1.000  0      5      1.000  1.00
> 10.0.2.4         0.000  1.000  0      5      1.000  1.00
> 10.0.6.4         0.000  1.000  0      5      1.000  1.00
> 
> --- 14:45:23.02915630 ----------------------- TWO-HOP NEIGHBORS
> 
> IP addr (2-hop)  IP addr (1-hop)  TLQ
> 10.0.8.6         10.0.4.4         1.000
> 10.0.4.4         10.0.8.6         1.000
> 
> --- 14:45:23.91 ------------------------------------------------- TOPOLOGY
> 
> Source IP addr  Dest IP addr    LQ     ILQ    ETX
> 10.0.4.3        10.0.8.5         1.000  1.000  1.00
> 10.0.4.4        10.0.4.3         1.000  0.596  1.68
> 10.0.4.4        10.0.8.5         1.000  1.000  1.00
> 10.0.4.4        10.0.8.6         1.000  1.000  1.00
> 10.0.8.5        10.0.4.3         1.000  1.000  1.00
> 10.0.8.5        10.0.4.4         1.000  1.000  1.00
> 10.0.8.5        10.0.8.6         1.000  1.000  1.00
> 10.0.8.6        10.0.4.3         1.000  0.800  1.25
> 10.0.8.6        10.0.4.4         1.000  1.000  1.00
> 10.0.8.6        10.0.8.5         1.000  1.000  1.00
> 
> From the above list, the following TC entries should have been removed:
> 
> 10.0.4.4        10.0.4.3         1.000  0.596  1.68
> 10.0.8.6        10.0.4.3         1.000  0.800  1.25
> 
> 
> On node 10.0.4.4 the following stale entry can be seen:
> 
> 10.0.8.6        10.0.4.3         1.000  0.800  1.25
> 
> 
> On node 10.0.8.6 there are even four stale entries:
> 
> 10.0.4.4        10.0.8.5         0.400  1.000  2.50
> 10.0.4.4        10.0.8.6         0.400  1.000  2.50
> 10.0.8.5        10.0.4.4         1.000  0.596  1.68
> 10.0.8.6        10.0.4.4         0.400  1.000  2.50
> 
> 
> Which of the stale entries remain in place seems to be rather random: when the experiment is repeated, there may be other entries
> that are not cleaned up.
> 
> One of the results is that BMF does not function properly, since it relies on the information in the TC database.
> 
> Regards,
> Erik
> 
> 



