[Olsr-users] [Olsr-dev] olsrd 0.6.6.1 (and earlier) ipv6 problems

Russell Senior (spam-protected)
Fri Mar 28 10:56:17 CET 2014


Yes, as reported by curl -6 http://localhost:$port/routes | wc -l, if I try
to exceed 180 routes, I see the collapse.  I can turn off what seem like
perfectly working devices, and turn on devices that previously seemed to
cause problems, and as long as I stay under 180 routes, it's all good.  If
I try to add another device, boom, collapse.
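
A minimal sketch of counting only the route entries, since wc -l also counts any table title, column header and blank lines in the output. It assumes the txtinfo /routes output has one route per line with the destination prefix in the first column. The names count_routes, near_limit and ROUTE_LIMIT are made up for illustration, and 180 is only the count observed here, not a documented olsrd limit.

```python
import ipaddress

# Observed threshold from this report; an assumption, not an olsrd constant.
ROUTE_LIMIT = 180


def count_routes(txtinfo_text):
    """Count lines whose first field parses as an IPv4/IPv6 prefix."""
    count = 0
    for line in txtinfo_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank line
        try:
            ipaddress.ip_network(fields[0], strict=False)
        except ValueError:
            continue  # table title or column header, not a route
        count += 1
    return count


def near_limit(txtinfo_text, margin=5):
    """True when the route count is within `margin` of ROUTE_LIMIT."""
    return count_routes(txtinfo_text) >= ROUTE_LIMIT - margin
```

The text passed in would be whatever curl -6 http://localhost:$port/routes returns, saved to a file or captured from a pipe.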


On Fri, Mar 28, 2014 at 2:48 AM, Russell Senior <(spam-protected)> wrote:

> The central server, ::407, is running OpenVPN in server mode.  The "leaf"
> nodes all connect to it via OpenVPN client mode with a tap interface.  We
> statically provision the IPv6 addresses on the vpn.
>
> And yes, the OpenVPN links are still active.  We are running an IPv4
> instance of olsrd (same version) in parallel and those routes (to the very
> same devices) are not affected.
>
> We see the problem when particular (though varying) nodes' olsrd ipv6
> instances are started/stopped.  Sometimes the nodes are running 0.6.6.1,
> and sometimes 0.6.4.  It doesn't seem to be specific.  The central server
> is running 0.6.6.1 now, but we saw the same thing earlier (which is why I
> upgraded) on 0.6.4.
>
> One other potential clue (it doesn't make very much sense, because I know
> there are much bigger networks than ours), I've never seen more than 186
> ipv6 routes on ::407.  We seem to see the problem when we try to exceed
> that.  I'm going to try to confirm that.
>
>
> On Fri, Mar 28, 2014 at 2:34 AM, Henning Rogge <(spam-protected)> wrote:
>
>> Hi,
>>
>> I must admit that I am not convinced that it's an Olsrd bug that we are
>> seeing...
>>
>> If I see it correctly Olsrd is running over the VPN interface
>> connection (interface name "vpn"), right?
>>
>> Is the VPN connection between the nodes still active during the route
>> loss? Most of the nodes seem to have direct connections and the "30
>> seconds until recovery" sounds like an ETX value slowly going down and
>> then dropping the link.
>>
>> Henning
>>
>> On Fri, Mar 28, 2014 at 10:11 AM, Saverio Proto <(spam-protected)> wrote:
>> > Hello Russell,
>> >
>> > looking at this:
>> >   https://personaltelco.net/~russell/olsrd/olsrd-routes-before.txt
>> >   https://personaltelco.net/~russell/olsrd/olsrd-routes-during.txt
>> >   https://personaltelco.net/~russell/olsrd/olsrd-routes-after.txt
>> >
>> > it looks like IPv6 routes are removed from the olsrd database. So it is
>> > actually the olsrd daemon that is involved.
>> >
>> > do you know if there is a previous stable version of olsrd where this
>> > bug/behaviour is not present?
>> >
>> > In my opinion the fastest way to track the bug is to try different
>> > versions of olsrd with the "git bisect" method.
>> >
>> > The first step is to tell us if there is a version of olsrd that is
>> > not affected by this problem.
>> >
>> > thanks
>> >
>> > I cc: olsrd-dev
>> >
>> > Saverio
>> >
>> >
>> > 2014-03-27 10:37 GMT+01:00 Russell Senior <(spam-protected)>:
>> >>>>>>> "Henning" == Henning Rogge <(spam-protected)> writes:
>> >>
>> >> Henning> On 03/26/2014 07:41 PM, Russell Senior wrote:
>> >>>> Anybody get a chance to look at the strace?  I see a:
>> >>
>> >> Henning> strace and packet dumps are much too low-level to directly
>> >> Henning> hunt problems like this. That's why Saverio's question about
>> >> Henning> txtinfo is good, because it gives you a much more high-level
>> >> Henning> view of what is going on.
>> >>
>> >> I had not installed the modules previously, so that interface wasn't
>> >> immediately available.  It is now.
>> >>
>> >> [...]
>> >>
>> >> Henning> Okay, let's get back to the high-level view.
>> >>
>> >> Henning> To interpret the events you described we need a list of
>> >> Henning> nodes, with their interface IPs and the connectivity between
>> >> Henning> them.
>> >>
>> >> Here is the list of neighbors of 2001:470:e962::407.  The addresses
>> >> listed are on the public wifi.  The OpenVPN addresses of each node are
>> >> a permutation, e.g. if the public wifi addr is 2001:470:e962:wxyz::1,
>> >> then the OpenVPN address of the node is 2001:470:e962::wxyz.
>> >>
>> >> None of the nodes connect directly, everything goes through ::407.
>> >>
>> >> From curl -6 http://localhost:$port/neighbors
>> >>
>> >>   https://personaltelco.net/~russell/olsrd/olsrd-neighbors.txt
>> >>
>> >> Henning> I am also a bit worried about your usage of bridges
>> >> Henning> connected to mesh interfaces.  Normally you should not bridge
>> >> Henning> any interface that OLSR uses for meshing.  Mixing routing
>> >> Henning> (L3) and bridging (L2) can go wrong in very creative ways.
>> >>
>> >> I don't understand how the bridges could be a problem in this case.
>> >> This is a hub and spoke topology.  One openvpn server in the middle,
>> >> nodes at the edges.  None of the nodes interconnect otherwise.  Olsr
>> >> is broadcast on the wifi in case there are any olsrd devices nearby,
>> >> but, again, there is no overlap in the wifi coverage (and if there
>> >> were physically, they are on different SSIDs and wouldn't overlap
>> >> logically).
>> >>
>> >> Can you explain more about what in particular would make you worry?
>> >> This configuration has been stable for us on ipv4 for years and also
>> >> on ipv6 until very recently, since late 2012 at least.  So, I suspect
>> >> a bug.  Somewhere.
>> >>
>> >> Henning> Txtinfo output (especially /route) would be good to see...
>> >> Henning> before the problem, during the problem and after the
>> >> Henning> recovery.
>> >>
>> >> I'm using curl -6 http://localhost:$port/routes to get the following
>> >> data, before, during and after turning on an ipv6 olsrd on a
>> >> particular node (2001:470:e962:11c1::1).
>> >>
>> >>   https://personaltelco.net/~russell/olsrd/olsrd-routes-before.txt
>> >>   https://personaltelco.net/~russell/olsrd/olsrd-routes-during.txt
>> >>   https://personaltelco.net/~russell/olsrd/olsrd-routes-after.txt
>> >>
>> >> Henning> It would also help if you can reduce the number of nodes
>> >> Henning> to a minimum while still replicating the problem.
>> >>
>> >> I don't have that level of control, unfortunately.  When I notice that
>> >> the ipv6 routes have collapsed, I pick a likely seeming node (maybe
>> >> because it had been plugged in recently) and turn off ipv6 olsrd, and
>> >> over 30-60 seconds, magically the routes all come back.  My luck in
>> >> guessing the right node to turn off is a little bit "too good", if you
>> >> know what I mean, so that I am not sure there is anything particularly
>> >> unique about the node I choose.  But, nevertheless, turning it off
>> >> seems to help, generally.
>> >>
>> >> FWIW, I'm including olsrd versions here.  The central machine ::407 is
>> >> running 0.6.6.1, compiled from the tarball.  The nodes have the
>> >> following versions, all built from openwrt routing feed sources.
>> >>
>> >>   https://personaltelco.net/~russell/olsrd/olsrd-versions-by-node.txt
>> >>
>> >> Here is a table listing the frequency of each openwrt version:
>> >>
>> >>       1 0.6.3-3
>> >>      33 0.6.4-1
>> >>       1 0.6.5.1-1
>> >>       1 0.6.5.1-2
>> >>       7 0.6.5.2-1
>> >>       1 0.6.5.3-1
>> >>       2 0.6.5.4-1
>> >>       2 0.6.6-2
>> >>       7 0.6.6-3
>> >>      11 0.6.6.1-1
>> >>
>> >>
>> >> --
>> >> Russell Senior, President
>> >> (spam-protected)
>> >>
>> >> --
>> >> Olsr-users mailing list
>> >> (spam-protected)
>> >> https://lists.olsr.org/mailman/listinfo/olsr-users
>> >
>> > --
>> > Olsr-dev mailing list
>> > (spam-protected)
>> > https://lists.olsr.org/mailman/listinfo/olsr-dev
>>
>
>