[Olsr-users] [Olsr-dev] olsrd 0.6.6.1 (and earlier) ipv6 problems
Russell Senior
(spam-protected)
Fri Mar 28 11:02:16 CET 2014
No, the central server (::407) is a beefy 8-core box with lots of RAM. The
nodes are mostly relatively well-endowed ar71xx devices with 128 meg of RAM.
On Fri, Mar 28, 2014 at 2:58 AM, Henning Rogge <(spam-protected)> wrote:
> Hi,
>
> hmm... 180 routes... any evidence that one of the devices is out of
> memory? Shouldn't be, just to be careful.
>
> Could it be that one of the Olsrd (or more) crashes and is restarted
> by a watchdog script?
>
> Henning
>
> On Fri, Mar 28, 2014 at 10:56 AM, Russell Senior
> <(spam-protected)> wrote:
> > Yes, as reported by curl -6 http://localhost:$port/routes | wc -l, if I try
> > to exceed 180 routes, I see the collapse. I can turn off what seem like
> > perfectly working devices, and turn on devices that previously seemed to
> > cause problems, and as long as I stay under 180 routes, it's all good. If I
> > try to add another device, boom, collapse.
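
For anyone who wants to repeat the route-count check quoted above from a script rather than by hand, here is a minimal sketch. It assumes the txtinfo plugin answers on the IPv6 loopback at whatever port you configured; the `[::1]` host, the function names, and `ROUTE_LIMIT` are illustrative assumptions, not anything from olsrd itself. Like `wc -l`, it counts lines, so any header or blank lines in the output are included in the total.

```python
import urllib.request

# Threshold observed in this thread; it is NOT an olsrd constant.
ROUTE_LIMIT = 180


def count_lines(text):
    """Count newline characters, which is what `wc -l` reports."""
    return text.count("\n")


def fetch_route_line_count(port):
    """Fetch /routes from txtinfo over the IPv6 loopback and count lines.

    The [::1] host is an assumed stand-in for
    `curl -6 http://localhost:$port/routes`.
    """
    url = "http://[::1]:%d/routes" % port
    with urllib.request.urlopen(url, timeout=5) as resp:
        return count_lines(resp.read().decode("utf-8", errors="replace"))


def at_or_over_limit(line_count, limit=ROUTE_LIMIT):
    """True once the count reaches the level where the collapse was seen."""
    return line_count >= limit
```

Logging `fetch_route_line_count(port)` every few seconds while nodes are switched on and off would show whether the collapse really coincides with crossing the same count each time.
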
> >
> >
> > On Fri, Mar 28, 2014 at 2:48 AM, Russell Senior <(spam-protected)> wrote:
> >>
> >> The central server, ::407, is running OpenVPN in server mode. The "leaf"
> >> nodes all connect to it via OpenVPN client mode with a tap interface. We
> >> statically provision the IPv6 addresses on the vpn.
> >>
> >> And yes, the OpenVPN links are still active. We are running an IPv4
> >> instance of olsrd (same version) in parallel and those routes (to the very
> >> same devices) are not affected.
> >>
> >> We see the problem when particular (though varying) nodes' olsrd ipv6
> >> instances are started/stopped. Sometimes the nodes are running 0.6.6.1, and
> >> sometimes 0.6.4. It doesn't seem to be specific to a version. The central
> >> server is running 0.6.6.1 now, but we saw the same thing earlier (which is
> >> why I upgraded) on 0.6.4.
> >>
> >> One other potential clue (it doesn't make very much sense, because I know
> >> there are much bigger networks than ours), I've never seen more than 186
> >> ipv6 routes on ::407. We seem to see the problem when we try to exceed
> >> that. I'm going to try to confirm that.
> >>
> >>
> >> On Fri, Mar 28, 2014 at 2:34 AM, Henning Rogge <(spam-protected)> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I must admit that I am not convinced that what we are seeing is an
> >>> Olsrd bug...
> >>>
> >>> If I see it correctly Olsrd is running over the VPN interface
> >>> connection (interface name "vpn"), right?
> >>>
> >>> Is the VPN connection between the nodes still active during the route
> >>> loss? Most of the nodes seem to have direct connections and the "30
> >>> seconds until recovery" sounds like an ETX value slowly going down and
> >>> then dropping the link.
> >>>
> >>> Henning
> >>>
> >>> On Fri, Mar 28, 2014 at 10:11 AM, Saverio Proto <(spam-protected)>
> >>> wrote:
> >>> > Hello Russell,
> >>> >
> >>> > looking at this:
> >>> > https://personaltelco.net/~russell/olsrd/olsrd-routes-before.txt
> >>> > https://personaltelco.net/~russell/olsrd/olsrd-routes-during.txt
> >>> > https://personaltelco.net/~russell/olsrd/olsrd-routes-after.txt
> >>> >
> >>> > it looks like IPv6 routes are removed from the olsrd database. So it
> >>> > is actually the olsrd daemon that is involved.
> >>> >
> >>> > do you know if there is a previous stable version of olsrd where this
> >>> > bug/behaviour is not present?
> >>> >
> >>> > In my opinion the fastest way to track the bug is to try different
> >>> > versions of olsrd with the "git bisect" method.
> >>> >
> >>> > The first step is to tell us if there is a version of olsrd that is
> >>> > not affected by this problem.
> >>> >
> >>> > thanks
> >>> >
> >>> > I cc: olsrd-dev
> >>> >
> >>> > Saverio
> >>> >
> >>> >
> >>> > 2014-03-27 10:37 GMT+01:00 Russell Senior <(spam-protected)>:
> >>> >>>>>>> "Henning" == Henning Rogge <(spam-protected)>
> >>> >>>>>>> writes:
> >>> >>
> >>> >> Henning> On 03/26/2014 07:41 PM, Russell Senior wrote:
> >>> >>>> Anybody get a chance to look at the strace? I see a:
> >>> >>
> >>> >> Henning> strace and packet dumps are much too low-level to directly
> >>> >> Henning> hunt problems like this. That's why Saverio's question about
> >>> >> Henning> txtinfo is good, because it gives you a much more high-level
> >>> >> Henning> view on what is going on.
> >>> >>
> >>> >> I had not installed the modules previously, so that interface wasn't
> >>> >> immediately available. It is now.
> >>> >>
> >>> >> [...]
> >>> >>
> >>> >> Henning> Okay, let's get back to the high-level view.
> >>> >>
> >>> >> Henning> To interpret the events you described we need a list of
> >>> >> Henning> nodes, with their interface IPs and the connectivity between
> >>> >> Henning> them.
> >>> >>
> >>> >> Here is the list of neighbors of 2001:470:e962::407. The addresses
> >>> >> listed are on the public wifi. The OpenVPN addresses of each node are
> >>> >> a permutation, e.g. if the public wifi addr is 2001:470:e962:wxyz::1,
> >>> >> then the OpenVPN address of the node is 2001:470:e962::wxyz.
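
The address permutation described above (public wifi 2001:470:e962:wxyz::1 maps to OpenVPN 2001:470:e962::wxyz) can be sketched with Python's standard `ipaddress` module. This is only an illustration of the rule as stated in the thread; the function name `vpn_address` and the check that the interface ID is exactly ::1 are assumptions made for the example.

```python
import ipaddress


def vpn_address(public_wifi):
    """Map a public wifi address 2001:470:e962:wxyz::1 to the matching
    OpenVPN address 2001:470:e962::wxyz, per the rule quoted above."""
    value = int(ipaddress.IPv6Address(public_wifi))

    # Assumption for this sketch: the low 64 bits must be exactly ::1.
    if value & ((1 << 64) - 1) != 1:
        raise ValueError("expected an address ending in ::1")

    prefix = value >> 80                # top 48 bits, e.g. 2001:470:e962
    node_id = (value >> 64) & 0xFFFF    # fourth 16-bit group, the "wxyz" part

    # Keep the /48 prefix and move the node ID into the last 16 bits.
    return str(ipaddress.IPv6Address((prefix << 80) | node_id))


print(vpn_address("2001:470:e962:11c1::1"))
```

For the node mentioned in this thread, 2001:470:e962:11c1::1, the rule gives 2001:470:e962::11c1.
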
> >>> >>
> >>> >> None of the nodes connect directly, everything goes through ::407.
> >>> >>
> >>> >> From curl -6 http://localhost:$port/neighbors
> >>> >>
> >>> >> https://personaltelco.net/~russell/olsrd/olsrd-neighbors.txt
> >>> >>
> >>> >> Henning> I am also a bit worried about your usage of bridges
> >>> >> Henning> connected to mesh interfaces. Normally you should not bridge
> >>> >> Henning> any interface that OLSR uses for meshing. Mixing routing
> >>> >> Henning> (L3) and bridging (L2) can go wrong in very creative ways.
> >>> >>
> >>> >> I don't understand how the bridges could be a problem in this case.
> >>> >> This is a hub and spoke topology. One openvpn server in the middle,
> >>> >> nodes at the edges. None of the nodes interconnect otherwise. Olsr
> >>> >> is broadcast on the wifi in case there are any olsrd devices nearby,
> >>> >> but, again, there is no overlap in the wifi coverage (and if there
> >>> >> were physically, they are on different SSIDs and wouldn't overlap
> >>> >> logically).
> >>> >>
> >>> >> Can you explain more about what in particular would make you worry?
> >>> >> This configuration has been stable for us on ipv4 for years and also
> >>> >> on ipv6 until very recently, since late 2012 at least. So, I suspect
> >>> >> a bug. Somewhere.
> >>> >>
> >>> >> Henning> Txtinfo output (especially /route) would be good to see...
> >>> >> Henning> before the problem, during the problem and after the
> >>> >> Henning> recovery.
> >>> >>
> >>> >> I'm using curl -6 http://localhost:$port/routes to get the following
> >>> >> data, before, during and after turning on an ipv6 olsrd on a
> >>> >> particular node (2001:470:e962:11c1::1).
> >>> >>
> >>> >> https://personaltelco.net/~russell/olsrd/olsrd-routes-before.txt
> >>> >> https://personaltelco.net/~russell/olsrd/olsrd-routes-during.txt
> >>> >> https://personaltelco.net/~russell/olsrd/olsrd-routes-after.txt
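
To see which entries disappear between the "before" and "during" dumps listed above, a simple set difference is usually enough. The sketch below assumes only that each non-empty line of a dump is one entry (header lines, if any, appear in both dumps and cancel out); the function names and the sample strings are hypothetical placeholders, not real txtinfo output.

```python
def route_lines(dump):
    """Return the set of non-empty, whitespace-stripped lines in a dump."""
    return {line.strip() for line in dump.splitlines() if line.strip()}


def lost_routes(before, during):
    """Lines present in the 'before' dump but missing from 'during'."""
    return sorted(route_lines(before) - route_lines(during))


# Hypothetical placeholder dumps, just to show the call.
before = "route-a\nroute-b\nroute-c\n"
during = "route-a\n"
print(lost_routes(before, during))
```

Running the same comparison between the "during" and "after" dumps would show whether the entries that come back are the same ones that were lost.
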
> >>> >>
> >>> >> Henning> It would also help if you can reduce the number of nodes
> >>> >> Henning> to a minimum while still replicating the problem.
> >>> >>
> >>> >> I don't have that level of control, unfortunately. When I notice that
> >>> >> the ipv6 routes have collapsed, I pick a likely seeming node (maybe
> >>> >> because it had been plugged in recently) and turn off ipv6 olsrd, and
> >>> >> over 30-60 seconds, magically the routes all come back. My luck in
> >>> >> guessing the right node to turn off is a little bit "too good", if you
> >>> >> know what I mean, so that I am not sure there is anything particularly
> >>> >> unique about the node I choose. But, nevertheless, turning it off
> >>> >> seems to help, generally.
> >>> >>
> >>> >> FWIW, I'm including olsrd versions here. The central machine ::407 is
> >>> >> running 0.6.6.1, compiled from the tarball. The nodes have the
> >>> >> following versions, all built from openwrt routing feed sources.
> >>> >>
> >>> >> https://personaltelco.net/~russell/olsrd/olsrd-versions-by-node.txt
> >>> >>
> >>> >> Here is a table listing the frequency of each openwrt version:
> >>> >>
> >>> >> 1 0.6.3-3
> >>> >> 33 0.6.4-1
> >>> >> 1 0.6.5.1-1
> >>> >> 1 0.6.5.1-2
> >>> >> 7 0.6.5.2-1
> >>> >> 1 0.6.5.3-1
> >>> >> 2 0.6.5.4-1
> >>> >> 2 0.6.6-2
> >>> >> 7 0.6.6-3
> >>> >> 11 0.6.6.1-1
> >>> >>
> >>> >>
> >>> >> --
> >>> >> Russell Senior, President
> >>> >> (spam-protected)
> >>> >>
> >>> >> --
> >>> >> Olsr-users mailing list
> >>> >> (spam-protected)
> >>> >> https://lists.olsr.org/mailman/listinfo/olsr-users
> >>> >
> >>> > --
> >>> > Olsr-dev mailing list
> >>> > (spam-protected)
> >>> > https://lists.olsr.org/mailman/listinfo/olsr-dev
> >>
> >>
> >
>