[Olsr-dev] Triggered LQ Messages

Tue Feb 12 15:07:08 CET 2008

hi markus,

see answers/comments inline.

Markus Kittenberger wrote:
> Hi
> 
> What are the situations that trigger an LQ Hello?
> 
> A change in the neighbour set?

yes

> if these are the only reasons i think they are of no practical use,..
> (pls correct me if this is wrong (-;)

triggered hellos aim to speed up convergence, e.g. when many nodes start
up at once. i.e. they do make sense.

> here are the reasons for above statement:
> 
> when losing connection to a neighbour, olsr usually needs 200 seconds of 
> a dead link before "accepting it"
> so there is really no urgency anymore to trigger an lq-hello,..

you are observing artifacts of a broken implementation. any olsr neighbor
(and its corresponding tc_edge) must be removed once the neighbor hold time
expires; 200s sounds really broken.
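
a minimal sketch of hold-time expiry, to make the point above concrete. the
constants are the RFC 3626 defaults (REFRESH_INTERVAL = 2s, NEIGHB_HOLD_TIME =
3 x REFRESH_INTERVAL); the NeighborTable class and its method names are
hypothetical and not taken from olsrd. as a simplification it uses one fixed
hold time instead of the validity time carried in each received hello.

```python
# RFC 3626 default constants (a real deployment may configure other values).
REFRESH_INTERVAL = 2.0                     # seconds
NEIGHB_HOLD_TIME = 3 * REFRESH_INTERVAL    # 6 seconds


class NeighborTable:
    """Hypothetical neighbor table: an entry lives until its hold time runs out."""

    def __init__(self, hold_time=NEIGHB_HOLD_TIME):
        self.hold_time = hold_time
        self.expires_at = {}   # neighbor id -> absolute expiry time

    def on_hello(self, neighbor, now):
        # Every received hello refreshes the entry's expiry time.
        self.expires_at[neighbor] = now + self.hold_time

    def expire(self, now):
        # Remove and return all neighbors whose hold time has passed.
        dead = [n for n, t in self.expires_at.items() if t <= now]
        for n in dead:
            del self.expires_at[n]
        return dead


table = NeighborTable()
table.on_hello("node-a", now=0.0)
still_alive = table.expire(now=5.0)   # hold time not yet reached
removed = table.expire(now=6.0)       # hold time reached, entry dropped
```

with these defaults a silent neighbor is gone after 6 seconds, which is why a
200 second delay points to a bug rather than to the protocol design.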

> when getting a new neighbour, its lq is low at the beginning, so there is 
> nothing to be proud of (no need for an immediate lq-hello)
> 
> while seeing no benefit in these messages, i dislike the following:
> 
> as far as i know, the logic producing the lq value puts every hello in 
> its window
> and assumes a loss every 3*lq_hello interval,..
> (which i think is far too slow)
> 
> but in fact it may receive 100 triggered hellos in the same time period,..
> (an lq hello frequency of about 5Hz is not unusual near the uplink in vienna)
> 
> and this results in very unusable lq values.

afaik henning is in the process of rewriting this.
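
a rough sketch of the windowing behaviour markus describes, to show why a
sequence-number jump hurts so much. it assumes a window of 100 entries in which
every received hello (triggered or not) is one entry and every skipped sequence
number is one loss entry. the WindowedLq class is hypothetical, not the olsrd
code, and it ignores sequence-number wraparound.

```python
from collections import deque


class WindowedLq:
    """Hypothetical sliding-window link quality estimator."""

    def __init__(self, window_size=100):
        # True = hello received, False = hello assumed lost.
        self.window = deque(maxlen=window_size)
        self.last_seqno = None

    def on_hello(self, seqno):
        if self.last_seqno is not None:
            # Every skipped sequence number is recorded as one loss.
            missed = seqno - self.last_seqno - 1
            for _ in range(max(missed, 0)):
                self.window.append(False)
        self.window.append(True)
        self.last_seqno = seqno

    def lq(self):
        if not self.window:
            return 0.0
        return sum(self.window) / len(self.window)


est = WindowedLq()
for seqno in range(100):          # 100 hellos in a row, e.g. 20s at 5Hz
    est.on_hello(seqno)
lq_before = est.lq()

est.on_hello(190)                 # first hello after the outage, 90 seqnos skipped
lq_after = est.lq()
```

one late packet pushes 90 losses into the window at once, so the estimate
collapses in a single step even though the link is healthy again.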

> take this sample on a link over a 5GHz bridge (having some trouble with 
> trees and wind or similar, or whatever)
> 
> 12:00:00 lq 0.00 - 0 of  0 lost (bridge associates)
> 12:00:20 lq 0.95 - 5 of 100 lost (already 100 lq hellos transmitted)
> 12:01:00 lq 0.96 - 4 of 100 lost (bridge loses association)
> 12:01:20 lq 0.95 - 5 of 100 lost (slowly assuming losses)
> 12:01:21 lq 0.00 - 94 of 100 lost (first packet after link association 
> had a quite high sequence number, resulting in adding 90 losses)
> 12:01:45 lq 0.96 - 4 of 100 lost (everything is now like before the 
> bridge lost association)
> 
> looks fine,..
> but the results are in fact horrific (don't get angry at the bad 
> jokes/thoughts of the imaginary user)
> 
> assuming the default route goes over the bridge at the beginning, the 
> following happens to me (the imaginary user (-;):
> 
> -- 
> 
> everything is fine and fast, i'm surfing around, until the bridge loses 
> association (damn shit, this f*** happens again)
> i sit here and wait for olsrd to take another route (knowing this will 
> take an unbelievable 3 minutes)
> 
> but the bridge comes up again only 20 sec later, hurray!
> now i can immediately continue surfing,..
> 
> but no!
> now olsr decides to switch the default route (i think maybe it would be 
> better to have no alternative routes at all)
> but this route isn't working instantly, so i wait while the information 
> is propagated in the net,..
> (knowing that this all is pure nonsense because the bridge is up again)
> 
> 20 seconds later olsr decides the route over the bridge is better (in 
> fact it is, stupid olsr!!!)
> it switches again, but the wrong link is now propagated in the net,.. 
>  (causing some temporary loops or so somewhere)
> 
> so i wait another 20 
> seconds, (thinking about the internet cafe on the other side of the street,..) 
> 
> 
> ---
> or in numbers, in such a scenario an outage of 10 seconds of a link can 
> result in an ETX change from 1 to 4 (assuming 5 LQ-Hellos per 
> second), causing the loss of full connectivity for about one minute,..
> 
> without so many triggered lq-hellos, chances would be high that after 10 
> seconds the ETX would just rise from 1 to 1.05, having no side effects, 
> so that there is just a 10 second connectivity loss,..
> 
> if other nodes use this node as their main gateway, this unlucky 
> behaviour may affect large regions of a mesh
> 
> ---
> to come to a proposal,..
> 
> what about stopping triggered lq-hellos completely, or at least stop 
> counting them like normal hellos,..

IMO it would be a better start to fix hold-time detection, such that
it behaves symmetrically. afaik henning has an experimental patch that
replaces the whole windowing as it is today with a simple exponential
backoff formula.
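
what such a formula could look like is not spelled out in this thread. the
sketch below is only one possible reading, an exponential moving average
updated once per expected hello interval. the function names and alpha = 0.05
are assumptions for illustration and are not taken from henning's patch. the
etx() helper uses the usual definition ETX = 1 / (LQ * NLQ).

```python
def ema_update(lq, received, alpha=0.05):
    """Blend the outcome of one expected hello interval into the running lq.

    Assumption: called once per hello interval by a timer, not once per
    packet, so extra triggered hellos do not change the aging rate.
    """
    sample = 1.0 if received else 0.0
    return (1.0 - alpha) * lq + alpha * sample


def etx(lq, nlq):
    """ETX = 1 / (LQ * NLQ); infinite when either direction is dead."""
    if lq <= 0.0 or nlq <= 0.0:
        return float("inf")
    return 1.0 / (lq * nlq)


def lq_after_outage(missed_intervals, alpha=0.05, start=1.0):
    """Link quality after a run of consecutive missed hello intervals."""
    lq = start
    for _ in range(missed_intervals):
        lq = ema_update(lq, received=False, alpha=alpha)
    return lq


# A 10s outage with an assumed 2s hello interval means 5 missed intervals.
lq_short_outage = lq_after_outage(5)
etx_short_outage = etx(lq_short_outage, lq_short_outage)
```

with this kind of aging the penalty depends only on how long the link was
silent, not on how many hellos the neighbor happened to send before or after.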

/hannes