[Olsr-dev] OLSR on OLPC?

Thu Jun 5 12:07:46 CEST 2008

C. Scott Ananian wrote:
> To quote a Nortel engineer:
>   "When traffic affects routing you have a feed-back system which is
> not good. This is something we work very hard to avoid with wireline
> routing systems and it's one of the reasons we almost never do
> congestion based shortest path routing. I.e. it would oscillate like
> crazy. [...] It's not hard to see that if traffic interferes with
> routing that in turn affects traffic and hence routing etc... and you
> get a horrible set of feedback oscillations in the network. I've seen
> this many times in routing systems where the control packets are not
> properly isolated from the data packets. The symptoms are usually the
> same (a mess) and the cure is the same
> (proper isolation/prioritization of control over data at every stage
> of handling). Early OSPF networks used to have this problem (hellos
> dropped due to congestion) and were quite spectacular and horrifying
> to watch melt down. More than one very expensive outage (i.e. entire
> countries) happened as a result."

ok here comes the detailed explanation:

the assertion of the nortel engineer is by and large correct, and is accepted
knowledge in the router industry. the practical question is how to provide
that isolation in a software-based router, i.e. does the control-plane daemon
(olsrd) still get scheduled when the kernel (interrupt processing) is swamped
with packets?

what is required is a kernel that can throttle the interface with the
massive input load for 1s, to make sure that the control-plane daemon is
properly scheduled.

within olsrd our code paths are tuned for speed, so that even on a 200MHz
MIPS CPU it can run the SPF calculation on a graph of thousands of nodes and
still generate outgoing hellos in time.

so what is really missing is the throttle-interface ability, which to my
knowledge has not been a big shortcoming, since the CPUs of wireless nodes
by far outpace the maximum bandwidth of wireless networks.
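
as a rough illustration of the throttle idea (a toy model only, not olsrd or
kernel code): count packets per interface and, once the input rate passes a
limit, stop processing that interface for 1s so that the control-plane daemon
gets CPU time. the class name, the packets-per-second limit and the
one-second counting window are assumptions made up for this sketch.

```python
class InterfaceThrottle:
    """Toy model: mute one interface for a fixed time when its input rate is too high."""

    def __init__(self, max_pps, mute_seconds=1.0):
        self.max_pps = max_pps            # hypothetical packets-per-second limit
        self.mute_seconds = mute_seconds  # the "1s" throttle period
        self.window_start = 0.0           # start of the current counting window
        self.count = 0                    # packets seen in the current window
        self.muted_until = 0.0            # packets before this time are dropped

    def accept(self, now):
        """Return True if a packet arriving at time `now` (seconds) is processed."""
        if now < self.muted_until:
            return False                  # interface is throttled: drop early
        if now - self.window_start >= 1.0:
            self.window_start = now       # begin a new one-second window
            self.count = 0
        self.count += 1
        if self.count > self.max_pps:
            # overload detected: mute the interface for the throttle period
            self.muted_until = now + self.mute_seconds
            return False
        return True
```

in a real router this decision would have to sit in the kernel's receive
path, before the expensive per-packet work; the sketch only models the
decision logic.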

>>>  f) your thoughts on the OLSR extensions to the 802.11s standard: are
>>> they worth implementing, or did you need to tweak the standard OLSR
>>> protocol to make it work in real life?
>> any wireless mesh protocol, without the ability to convey a loss or
>> usable bandwidth metric is hopelessly broken.
> 
> I provided some links to the 802.11s specs above; do you feel the
> metrics specified are useful?

vanilla (rfc 3626) OLSR does not have any loss-metric, therefore i think it
is not usable in practice. note that our olsr implementation does add two
new messages (Hello-LQ and TC-LQ) in which olsrd conveys the loss-metric of
a link.
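
to make "loss-metric" concrete, here is a minimal sketch of how such a value
could be derived from received hellos and turned into a link cost. it is
not taken from the olsrd sources: the window size, the names and the
ETX-style combination of both directions are assumptions for illustration.

```python
from collections import deque


class LinkQuality:
    """Track the fraction of expected hellos that actually arrived on one link."""

    def __init__(self, window=10):
        # 1 = hello arrived, 0 = hello missed; old entries fall out of the window
        self.received = deque(maxlen=window)

    def record(self, arrived):
        self.received.append(1 if arrived else 0)

    def ratio(self):
        """Delivery ratio in [0, 1]; 0.0 while nothing has been recorded yet."""
        if not self.received:
            return 0.0
        return sum(self.received) / len(self.received)


def etx(lq, nlq):
    """ETX-style cost: expected transmissions for one acknowledged packet.

    lq  = delivery ratio we measure for packets from the neighbour
    nlq = delivery ratio the neighbour reports for packets from us
    """
    if lq <= 0.0 or nlq <= 0.0:
        return float("inf")               # link unusable in at least one direction
    return 1.0 / (lq * nlq)
```

a route metric is then the sum of the per-link costs, so a short path over
lossy links can lose against a longer path over clean ones.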

>>>  g) the media access protocol in standard 802.11 doesn't seem to scale
>>> much past 30-or-so nodes wanting to talk at the same time.  Have you
>>> run into this problem in dense deployments?
>> i'll defer that question to the berlin guys. they have real-world experience
>> from running olsrd at conventions like the CCC.
> 
> The following academic paper cites several different approaches to
> scaling the 802.11 media access protocol to handle lots of nodes:
>   http://dx.doi.org/10.1109/TMC.2006.124
> 
> At the moment, "We are using regular 802.11 DCF as our Medium access
> protocol. It is now augmented by a dynamic contention window,
> currently under test. A dynamic CW  is not part of the standard, so
> the vendors will implement them in many different ways."
> 
> It's my understanding that a number of different 802.11 chipset
> vendors have implemented some form of extended media access protocol
> for 802.11; I'm curious to know whether that's part of the secret
> sauce folks like those in Berlin need to use in order to make 802.11
> (mesh or not) work in large settings.
>   --scott
>