[Olsr-dev] Some reflections on link state routing and metrics

Mon Feb 9 12:18:26 CET 2009

Thanks for starting public reflections on this topic, *g*

On Mon, Feb 9, 2009 at 4:00 AM, Harald Geyer <(spam-protected)> wrote:

>
> Of course there are metrics trying to optimize very different aspects
> of connection quality, but for the purpose of this e-mail I'll
> assume that any good metric will need a big range of values to provide
> a proper model for a diverse real-world mesh like freifunk/funkfeuer.
>
> So the question above boils down to "Will OLSR work with wide-range
> metrics?"

It depends on the metric; see below about packet loss on fast links.

Speaking generally: without fundamental changes, IMHO IT WON'T WORK WELL.
But there are already some ideas about changes:

Dynamic TC intervals:
1. longer intervals, to free bandwidth when links do not change their costs
2. significantly shorter intervals when completely new/different values have
to be announced
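
A minimal sketch of such a dynamic TC interval, assuming a hypothetical policy that doubles the interval while the announced costs stay roughly the same and falls back to the minimum as soon as a neighbour appears or disappears or any cost changes by more than a relative threshold. The class name, the 20 % threshold and the 2 s / 60 s bounds are illustrative assumptions, not olsrd parameters.

```python
class AdaptiveTcInterval:
    """Hypothetical adaptive TC emission interval (not an olsrd API)."""

    def __init__(self, min_s=2.0, max_s=60.0, rel_threshold=0.2):
        self.min_s = min_s
        self.max_s = max_s
        self.rel_threshold = rel_threshold
        self.interval = min_s
        self.last_announced = {}  # neighbour -> cost announced in the last TC

    def _significant_change(self, costs):
        if costs.keys() != self.last_announced.keys():
            return True  # a neighbour appeared or disappeared
        for nb, cost in costs.items():
            old = self.last_announced[nb]
            if abs(cost - old) > self.rel_threshold * old:
                return True
        return False

    def next_interval(self, costs):
        """Return the delay before the next TC, given the current link costs."""
        if self._significant_change(costs):
            self.interval = self.min_s  # new values: announce them quickly
        else:
            self.interval = min(self.interval * 2, self.max_s)  # stable: back off
        self.last_announced = dict(costs)
        return self.interval
```

Doubling and resetting is only one possible back-off shape; it keeps the steady-state TC load low while still spreading big changes fast.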

To allow the above, LQ values must have a clever hysteresis (which also
introduces new problems), and they should also have some further
mechanisms, e.g. a longer history (not for averaging, but for analysing LQ
behaviour).
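
One possible shape of such a hysteresis, sketched under assumptions: the value announced to the mesh only moves when the measured LQ leaves a band around the last announced value, and a separate bounded history of raw samples is kept purely for analysis (here: counting direction reversals as a crude "flapping link" indicator). The band width, history length and the flap count are hypothetical choices, not existing olsrd behaviour.

```python
from collections import deque


class HysteresisLq:
    """Hypothetical LQ hysteresis with an analysis-only history."""

    def __init__(self, band=0.1, history_len=32):
        self.band = band          # announce only if |new - announced| > band
        self.announced = None     # value currently announced in TCs
        self.history = deque(maxlen=history_len)  # raw samples, for analysis only

    def update(self, measured):
        """Feed one LQ sample in [0, 1]; return True if the announced value changed."""
        self.history.append(measured)
        if self.announced is None or abs(measured - self.announced) > self.band:
            self.announced = measured
            return True
        return False

    def flaps(self):
        """Count direction reversals in the history (crude 'unstable link' hint)."""
        samples = list(self.history)
        reversals = 0
        prev_delta = 0.0
        for a, b in zip(samples, samples[1:]):
            delta = b - a
            if delta * prev_delta < 0:
                reversals += 1
            if delta != 0:
                prev_delta = delta
        return reversals
```

The history is never averaged into the announced value; it only tells us how the link behaves, e.g. to widen the band on links that flap a lot.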

There are also some ideas about reliable flooding (which would be useful to
make sure significant updates reach every node).
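
A sketch of the bookkeeping reliable flooding would need, assuming a hypothetical per-neighbour acknowledgement for significant TCs (plain OLSR flooding has no acknowledgements): for each message, remember which neighbours have not acknowledged it yet and retransmit to those until a retry limit is hit. Message IDs as (originator, sequence number) pairs and the retry limit are assumptions.

```python
class ReliableFlooder:
    """Hypothetical ack/retransmit bookkeeping for significant TCs."""

    def __init__(self, max_retries=3):
        self.max_retries = max_retries
        self.pending = {}  # msg_id -> {neighbour: retries_left}

    def sent(self, msg_id, neighbours):
        """Record that msg_id was flooded to the given neighbours."""
        self.pending[msg_id] = {nb: self.max_retries for nb in neighbours}

    def acked(self, msg_id, neighbour):
        """Handle a (hypothetical) acknowledgement from one neighbour."""
        waiting = self.pending.get(msg_id)
        if waiting is not None:
            waiting.pop(neighbour, None)
            if not waiting:
                del self.pending[msg_id]

    def due_retransmissions(self):
        """Return [(msg_id, neighbour)] to resend now; give up after max_retries."""
        due = []
        for msg_id in list(self.pending):
            waiting = self.pending[msg_id]
            for nb in list(waiting):
                if waiting[nb] == 0:
                    del waiting[nb]  # give up on this neighbour
                else:
                    waiting[nb] -= 1
                    due.append((msg_id, nb))
            if not waiting:
                del self.pending[msg_id]
        return due
```

Restricting this to "significant" updates keeps the extra ack traffic small.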

And there are some more ideas, but not all of them are easy to implement in
an RFC-compliant way, and most of them are just ideas so far...

There is also the model of virtual nodes, which hides Ethernet links (which
would have significantly lower costs in most wide-range metrics) completely
from the topology.
(The normal usage is to hide the links between several routers on a single
roof (a multi-router mesh node), not to hide links between different nodes
in the mesh.)
This may in fact be a very appropriate solution to introduce zero-cost links
(instead of low-cost links), and maybe even to step back from using a
wide-range metric.

Or it may at least reduce the range of link costs if ALL Ethernet/fibre
links are taken out of the topology...
But as there may still be fast (low-cost) links you want to see in the
topology, virtual nodes will of course not solve all the problems.
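
How a virtual node could be derived from the topology, as a sketch only: routers joined by links marked as hidden (e.g. the Ethernet links on one roof) are merged with a union-find structure, and only the remaining links are kept, now between the merged nodes. Links are treated as undirected; the `hidden` flag and naming the virtual node after its lexicographically smallest router are illustrative assumptions.

```python
def collapse_virtual_nodes(links):
    """links: iterable of (a, b, cost, hidden). Returns {(u, v): cost} between virtual nodes."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra  # smallest name stays the representative

    links = list(links)
    for a, b, _cost, hidden in links:
        if hidden:
            union(a, b)

    topo = {}
    for a, b, cost, hidden in links:
        if hidden:
            continue
        u, v = sorted((find(a), find(b)))
        if u == v:
            continue  # a visible link inside one virtual node is internal too
        # keep the cheapest visible link between two virtual nodes
        topo[(u, v)] = min(cost, topo.get((u, v), cost))
    return topo
```

For example, two roof routers `roof1-a` and `roof1-b` joined by a hidden Ethernet link show up as a single node `roof1-a`, and the cheap Ethernet cost never enters the announced cost range.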

And there are also (again) ideas about detecting loops and triggering a local
resynchronisation (of the links on the route to the target of the loop)
between the nodes which detected a loop between them. Generally we should
avoid loops, I think, but detecting them may be a good measure of how many
loops we actually have, and also a great help in debugging problems.
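
As a debugging aid, a loop on the route to a destination can be found by walking next hops, assuming the next-hop entries of all nodes are available in one place (e.g. collected from each node's routing table; how they are collected is an assumption and not shown).

```python
def find_loop(next_hop, start, dest):
    """Follow next_hop[node][dest] from start.

    Returns the list of nodes forming the loop, or None if dest is reached
    or the route ends at a node without an entry for dest.
    """
    seen = {}  # node -> position on the walked path
    path = []
    node = start
    while node != dest:
        if node in seen:
            return path[seen[node]:]  # the cycle, starting at the repeated node
        seen[node] = len(path)
        path.append(node)
        node = next_hop.get(node, {}).get(dest)
        if node is None:
            return None  # no route: a black hole, but not a loop
    return None
```

Counting how often `find_loop` returns a cycle over all (start, dest) pairs would give the "how many loops do we actually have" measure.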

>
>
> OLSR is designed under the assumption that it takes only a short time
> for the routing to converge. Before the routing has converged, loops
> are likely, and the time to converge is a function of the diameter of
> the mesh and the broadcast packet loss.

Small note on this: fisheye in fact means planned packet loss, and (from this
point of view) slower convergence.

IMHO the flaw of fisheye is that it is done (in fact) entirely by the
originator of the packet, and not by the forwarding nodes.
It would, for example, be much better just to mark TCs as droppable, instead
of giving them a low TTL, and let the forwarders decide:

to NOT forward a TC packet received over a high-cost link (if it was marked
as droppable),
or to NOT forward it to an interface having only high-cost neighbours.

The above proposals assume NO BIG changes of link costs in the TC to be
dropped.

(BTW, I have some more ideas about incremental routing and fisheye, but the
above is easier/faster to implement.)
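
The forwarder-side decision proposed above could look roughly like this, as a sketch: the `droppable` mark is a hypothetical flag (not part of the RFC 3626 TC format), the high-cost threshold is made up, and the usual duplicate/MPR forwarding checks are assumed to have passed already.

```python
HIGH_COST = 10.0  # hypothetical threshold separating "high-cost" links


def forward_interfaces(tc_droppable, in_link_cost, iface_neighbour_costs):
    """Return the interfaces a TC should be forwarded on.

    tc_droppable: the (hypothetical) droppable mark set by the originator
    in_link_cost: cost of the link the TC was received over
    iface_neighbour_costs: {interface: [costs of the neighbours on it]}
    """
    if not tc_droppable:
        return sorted(iface_neighbour_costs)  # unmarked TCs are forwarded as usual
    if in_link_cost >= HIGH_COST:
        return []  # received over a high-cost link: drop it
    # skip interfaces whose neighbours are all high-cost
    return sorted(
        iface
        for iface, costs in iface_neighbour_costs.items()
        if any(c < HIGH_COST for c in costs)
    )
```

Unlike a low TTL, the decision is made per hop with local knowledge, so a droppable TC still crosses any number of cheap links.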

> To keep the chance of loops
> at a minimum, topology information is distributed frequently - faster
> than the link costs can change significantly.

You forgot to mention explicitly one nice feature of ETX:
it has a defined minimum link cost of 1, so every hop has a significant cost,
and therefore there must be a significant desynchronisation between the
topology views of different nodes (in fact, a total difference greater than 1
on a specific route) to get a loop.
But wide-range metrics usually will NOT have a defined minimum...

Mathematically, of course, the above is quite uninteresting, as only the
ratio maximum_cost/minimum_cost matters, but maybe the above clarifies this
in a less mathematical/theoretical way.
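
A toy link-state model makes the point concrete. Assumptions: links A-B (cost c), A-X and B-Y (cost 1), X-D and Y-D; A and B each run Dijkstra on their own, possibly stale, view, and the views differ only in the X-D and Y-D costs (x_A, y_A as seen by A; x_B, y_B as seen by B). A forwards to B iff 1 + x_A > c + 1 + y_A, and B forwards to A iff 1 + y_B > c + 1 + x_B; summing gives (x_A - x_B) + (y_B - y_A) > 2c, so the required disagreement scales with the link cost, not with the route cost. All numbers are made up for illustration.

```python
import heapq


def next_hop(view, src, dst):
    """Run Dijkstra on one node's (possibly stale) view of the topology.

    view: {(u, v): cost}, links treated as undirected.
    Returns the first hop on src's shortest path to dst, or None.
    """
    adj = {}
    for (u, v), cost in view.items():
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))
    heap = [(0, src, None)]  # (distance, node, first hop used to reach node)
    done = set()
    while heap:
        dist, node, first = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == dst:
            return first
        for nb, cost in adj.get(node, []):
            if nb not in done:
                heapq.heappush(heap, (dist + cost, nb, first or nb))
    return None


def two_node_loop(view_a, view_b, dst="D"):
    """True if A forwards to B and B forwards to A, each using its own view."""
    return next_hop(view_a, "A", dst) == "B" and next_hop(view_b, "B", dst) == "A"


def views(c, xd_a, yd_a, xd_b, yd_b):
    """Toy topology; only the X-D and Y-D costs differ between the two views."""
    base = {("A", "B"): c, ("A", "X"): 1, ("B", "Y"): 1}
    view_a = {**base, ("X", "D"): xd_a, ("Y", "D"): yd_a}
    view_b = {**base, ("X", "D"): xd_b, ("Y", "D"): yd_b}
    return view_a, view_b


# ETX-like: each view off by 0.5 (about 17 %) on one link: no loop.
print(two_node_loop(*views(1, 3.5, 3, 3, 3.5)))          # False
# Wide-range: each view off by 2 (0.2 %) on one link: loop.
print(two_node_loop(*views(1, 1002, 1000, 1000, 1002)))  # True
```

With ETX-like costs each view has to be off by more than 1 on a link of cost about 3 before a loop appears; with routes costing about 1000 over a link of cost 1, a 0.2 % error is already enough.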

>
>
> But if we introduce large link costs, we also need to allow for
> fast absolute changes: In a given fixed time interval the link costs
> must be able to change by some relative amount (say 10%), otherwise
> the link costs would lag behind reality too much to be useful.

I would (until we have a better solution) propose to stop links from
recovering fast: maybe only allow them to drop their LQ fast, but not to
raise it fast. But this solves only half of the problems, and even this not
well.
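
A minimal sketch of "drop fast, recover slowly", assuming link quality is tracked as an exponentially weighted moving average in [0, 1]: a worse sample is taken over immediately, while a better one only pulls the estimate up by a small factor. The function name and the factor 0.05 are illustrative, not olsrd's LQ algorithm.

```python
def update_lq(current, sample, rise_alpha=0.05):
    """Asymmetric LQ smoothing: drops are applied at once, rises only slowly."""
    if sample < current:
        return sample  # link got worse: believe it immediately
    return current + rise_alpha * (sample - current)  # link got better: recover slowly


lq = 0.3
for _ in range(10):
    lq = update_lq(lq, 1.0)  # ten perfect samples after a bad phase
# lq is now about 0.58, still far from 1.0
```

This suppresses the fast upward swings that make other nodes' views stale, but fast downward changes still propagate, which is why it only addresses half of the problem.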

> (I hope this point is clear, if not: Please speak up.)
>
> So now if the absolute change of link costs is fast: Will the routing
> ever converge?

It will for sure produce loops very often...
Whether it will ever converge, and whether some (distant) nodes will maybe
never reach each other because there are always loops on their routes, is
just a question of how many links change their costs, how fast, and how much
packet loss (including insufficient bandwidth) stops them from converging.
And lowering the TC intervals (in general) is a solution we surely CANNOT
consider, as we already have too many TCs...

> In the presence of packet loss on low cost links probably
> not. - So I claim as a way out any link state routing protocol that
> wants to use wide-range metrics will need some mechanism to ensure
> good synchronisation on links with low cost.

ACK!
I think it's an absolute catastrophe if we allow low-cost links with packet
loss!!!
I mean, a 10 Gig fibre having significant packet loss MUST NOT have low
costs...

So multiplying bandwidth with packet loss is absolutely no good idea on
such links.
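
To illustrate why, take a hypothetical cost of 1 / (bandwidth × delivery ratio); this formula is an assumption for illustration, not a metric from olsrd. A 10 Gbit/s fibre losing 30 % of its packets still comes out far cheaper than a loss-free 54 Mbit/s radio link, so routing prefers exactly the link on which TCs get lost.

```python
def bw_loss_cost(bandwidth_mbit, delivery_ratio):
    """Hypothetical wide-range cost: inverse of the 'effective' bandwidth."""
    return 1.0 / (bandwidth_mbit * delivery_ratio)


def break_even_delivery(fast_mbit, slow_mbit):
    """Delivery ratio at which a lossy fast link costs as much as a loss-free slow one."""
    return slow_mbit / fast_mbit


lossy_fibre = bw_loss_cost(10_000, 0.7)  # 10 Gbit/s, 30 % loss
clean_radio = bw_loss_cost(54, 1.0)      # 54 Mbit/s, no loss
print(lossy_fibre < clean_radio)         # True: the lossy fibre is preferred
print(break_even_delivery(10_000, 54))   # 0.0054, i.e. more than 99 % loss
```

Under this formula the fibre only becomes as expensive as the radio link once it loses more than 99 % of its packets.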

One option would be to switch to reliable communication on extremely fast
links (e.g. use TCP).
I mean, if a link "wants" to have a low cost, there must be ZERO packet loss,
or there must be (layer 3) error correction enabled which ensures ZERO loss
on TCs (fisheye of course MUST NOT interfere with this, so we have to turn
it off (at least "locally"), or change it; see my fisheye ideas).

Because the lower the cost, the easier it is to loop over this link, and
therefore synchronisation must be much better; but luckily on fast links
this should be achievable without much trouble.
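
The rule proposed here could be expressed as a cost floor, sketched under assumptions: a link may only be announced below some floor cost if it measured zero TC loss, or if TCs on it are carried reliably (e.g. over TCP or with layer-3 error correction); otherwise its cost is clamped up to the floor. The floor value and the `reliable_tc` flag are hypothetical.

```python
LOW_COST_FLOOR = 1.0  # hypothetical: no lossy, unreliable link may be cheaper than this


def announced_cost(raw_cost, tc_loss_ratio, reliable_tc=False):
    """Clamp the cost of links that cannot guarantee loss-free TC delivery."""
    if tc_loss_ratio == 0.0 or reliable_tc:
        return raw_cost  # loss-free (or reliable) links may keep a low cost
    return max(raw_cost, LOW_COST_FLOOR)
```

In effect this reintroduces an ETX-like minimum hop cost exactly for the links that cannot guarantee good synchronisation.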

> Hope you find this interesting.
FULL ACK, I do find this interesting, *g*g*g
Markus
p.s.
Of course there MUST be NO bugs (even more so than is desirable now)
interfering with the announcement/forwarding of topology if we have low-cost
links / wide-range metrics...