[Olsr-users] [Olsr-dev] olsrd 0.6.6.1 (and earlier) ipv6 problems
Henning Rogge
(spam-protected)
Fri Mar 28 11:09:33 CET 2014
Each leaf should have a /128 route for each other leaf...
Olsrd does NOT do any route aggregation.
Can you show me a routing table of a leaf and the txtinfo output when
everything is fine?
Henning
On Fri, Mar 28, 2014 at 11:06 AM, Russell Senior
<(spam-protected)> wrote:
> FWIW, the ipv6 routing tables on the "leaf" nodes are quite short, with
> mostly just a default route pointing at the central server, when olsrd is
> working. When the central server has the route collapse, the default route
> on the "leaf" nodes disappears.
>
> I am thinking about memory exhaustion, maybe something is helpfully killing
> it off when the size becomes "too large" ... /me goes to look for evidence
> of that.
>
>
> On Fri, Mar 28, 2014 at 3:03 AM, Russell Senior <(spam-protected)>
> wrote:
>>
>> They are single hop from the central server, whose table is the one I've
>> been posting.
>>
>>
>> On Fri, Mar 28, 2014 at 3:01 AM, Henning Rogge <(spam-protected)> wrote:
>>>
>>> What?
>>>
>>> but your routing tables only contain "ETX 1.0" paths... which means
>>> they are single hop!
>>>
>>> Henning
>>>
>>> On Fri, Mar 28, 2014 at 11:00 AM, Russell Senior
>>> <(spam-protected)> wrote:
>>> > Without the ipv6 olsrd, the nodes can't route to each other, it seems. I
>>> > picked two I had turned off, and tried ping6'ing between them and got
>>> > 100% packet loss.
>>> >
>>> >
>>> > On Fri, Mar 28, 2014 at 2:54 AM, Henning Rogge <(spam-protected)>
>>> > wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> as far as I can see each "leaf" node can see each other leaf node over
>>> >> the OpenVPN, right?
>>> >>
>>> >> So you are only using Olsrd to distribute HNAs?
>>> >>
>>> >> Henning Rogge
>>> >>
>>> >> On Fri, Mar 28, 2014 at 10:48 AM, Russell Senior
>>> >> <(spam-protected)> wrote:
>>> >> > The central server, ::407, is running OpenVPN in server mode. The
>>> >> > "leaf" nodes all connect to it via OpenVPN client mode with a tap
>>> >> > interface. We statically provision the IPv6 addresses on the vpn.
>>> >> >
>>> >> > And yes, the OpenVPN links are still active. We are running an IPv4
>>> >> > instance of olsrd (same version) in parallel and those routes (to the
>>> >> > very same devices) are not affected.
>>> >> >
>>> >> > We see the problem when particular (though varying) nodes' olsrd ipv6
>>> >> > instances are started/stopped. Sometimes the nodes are running
>>> >> > 0.6.6.1, and sometimes 0.6.4. It doesn't seem to be version-specific.
>>> >> > The central server is running 0.6.6.1 now, but we saw the same thing
>>> >> > earlier (which is why I upgraded) on 0.6.4.
>>> >> >
>>> >> > One other potential clue (it doesn't make very much sense, because I
>>> >> > know there are much bigger networks than ours): I've never seen more
>>> >> > than 186 ipv6 routes on ::407. We seem to see the problem when we try
>>> >> > to exceed that. I'm going to try to confirm that.
>>> >> >
>>> >> >
>>> >> > On Fri, Mar 28, 2014 at 2:34 AM, Henning Rogge <(spam-protected)>
>>> >> > wrote:
>>> >> >>
>>> >> >> Hi,
>>> >> >>
>>> >> >> I must admit that I am not convinced that what we are seeing is an
>>> >> >> Olsrd bug...
>>> >> >>
>>> >> >> If I see it correctly Olsrd is running over the VPN interface
>>> >> >> connection (interface name "vpn"), right?
>>> >> >>
>>> >> >> Is the VPN connection between the nodes still active during the
>>> >> >> route loss? Most of the nodes seem to have direct connections and
>>> >> >> the "30 seconds until recovery" sounds like an ETX value slowly
>>> >> >> going down and then dropping the link.
>>> >> >>
>>> >> >> Henning
>>> >> >>
>>> >> >> On Fri, Mar 28, 2014 at 10:11 AM, Saverio Proto
>>> >> >> <(spam-protected)>
>>> >> >> wrote:
>>> >> >> > Hello Russell,
>>> >> >> >
>>> >> >> > looking at this:
>>> >> >> >
>>> >> >> > https://personaltelco.net/~russell/olsrd/olsrd-routes-before.txt
>>> >> >> >
>>> >> >> > https://personaltelco.net/~russell/olsrd/olsrd-routes-during.txt
>>> >> >> > https://personaltelco.net/~russell/olsrd/olsrd-routes-after.txt
>>> >> >> >
>>> >> >> > it looks like IPv6 routes are removed from the olsrd database. So
>>> >> >> > it is actually the olsrd daemon that is involved.
>>> >> >> >
>>> >> >> > Do you know if there is a previous stable version of olsrd where
>>> >> >> > this bug/behaviour is not present?
>>> >> >> >
>>> >> >> > In my opinion the fastest way to track down the bug is to try
>>> >> >> > different versions of olsrd using the "git bisect" method.
>>> >> >> >
>>> >> >> > The first step is to tell us if there is a version of olsrd that
>>> >> >> > is not affected by this problem.
>>> >> >> >
>>> >> >> > thanks
>>> >> >> >
>>> >> >> > I cc: olsrd-dev
>>> >> >> >
>>> >> >> > Saverio
>>> >> >> >
>>> >> >> >
>>> >> >> > 2014-03-27 10:37 GMT+01:00 Russell Senior
>>> >> >> > <(spam-protected)>:
>>> >> >> >>>>>>> "Henning" == Henning Rogge
>>> >> >> >>>>>>> <(spam-protected)>
>>> >> >> >>>>>>> writes:
>>> >> >> >>
>>> >> >> >> Henning> On 03/26/2014 07:41 PM, Russell Senior wrote:
>>> >> >> >>>> Anybody get a chance to look at the strace? I see a:
>>> >> >> >>
>>> >> >> >> Henning> strace and packet dumps are much too low-level to
>>> >> >> >> Henning> directly hunt problems like this. That's why Saverio's
>>> >> >> >> Henning> question about txtinfo is good, because it gives you a
>>> >> >> >> Henning> much more high-level view on what is going on.
>>> >> >> >>
>>> >> >> >> I had not installed the modules previously, so that interface
>>> >> >> >> wasn't immediately available. It is now.
>>> >> >> >>
>>> >> >> >> [...]
>>> >> >> >>
>>> >> >> >> Henning> Okay, let's get back to the high-level view.
>>> >> >> >>
>>> >> >> >> Henning> To interpret the events you described we need a list of
>>> >> >> >> Henning> nodes, with their interface IPs and the connectivity
>>> >> >> >> Henning> between them.
>>> >> >> >>
>>> >> >> >> Here is the list of neighbors of 2001:470:e962::407. The
>>> >> >> >> addresses listed are on the public wifi. The OpenVPN address of
>>> >> >> >> each node is a permutation, e.g. if the public wifi addr is
>>> >> >> >> 2001:470:e962:wxyz::1, then the OpenVPN address of the node is
>>> >> >> >> 2001:470:e962::wxyz.
>>> >> >> >>
>>> >> >> >> None of the nodes connect directly; everything goes through
>>> >> >> >> ::407.
>>> >> >> >>
>>> >> >> >> From curl -6 http://localhost:$port/neighbors
>>> >> >> >>
>>> >> >> >> https://personaltelco.net/~russell/olsrd/olsrd-neighbors.txt
>>> >> >> >>
>>> >> >> >> Henning> I am also a bit worried about your usage of bridges
>>> >> >> >> Henning> connected to mesh interfaces. Normally you should not
>>> >> >> >> Henning> bridge any interface that OLSR uses for meshing. Mixing
>>> >> >> >> Henning> routing (L3) and bridging (L2) can go wrong in very
>>> >> >> >> Henning> creative ways.
>>> >> >> >>
>>> >> >> >> I don't understand how the bridges could be a problem in this
>>> >> >> >> case. This is a hub and spoke topology. One openvpn server in
>>> >> >> >> the middle, nodes at the edges. None of the nodes interconnect
>>> >> >> >> otherwise. Olsr is broadcast on the wifi in case there are any
>>> >> >> >> olsrd devices nearby, but, again, there is no overlap in the
>>> >> >> >> wifi coverage (and if there were physically, they are on
>>> >> >> >> different SSIDs and wouldn't overlap logically).
>>> >> >> >>
>>> >> >> >> Can you explain more about what in particular would make you
>>> >> >> >> worry? This configuration has been stable for us on ipv4 for
>>> >> >> >> years and also on ipv6 until very recently, since late 2012 at
>>> >> >> >> least. So, I suspect a bug. Somewhere.
>>> >> >> >>
>>> >> >> >> Henning> Txtinfo output (especially /route) would be good to
>>> >> >> >> Henning> see... before the problem, during the problem and
>>> >> >> >> Henning> after the recovery.
>>> >> >> >>
>>> >> >> >> I'm using curl -6 http://localhost:$port/routes to get the
>>> >> >> >> following data, before, during and after turning on an ipv6
>>> >> >> >> olsrd on a particular node (2001:470:e962:11c1::1).
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> https://personaltelco.net/~russell/olsrd/olsrd-routes-before.txt
>>> >> >> >>
>>> >> >> >> https://personaltelco.net/~russell/olsrd/olsrd-routes-during.txt
>>> >> >> >>
>>> >> >> >> https://personaltelco.net/~russell/olsrd/olsrd-routes-after.txt
>>> >> >> >>
>>> >> >> >> Henning> It would also help if you can reduce the number of
>>> >> >> >> Henning> nodes to a minimum while still replicating the problem.
>>> >> >> >>
>>> >> >> >> I don't have that level of control, unfortunately. When I
>>> >> >> >> notice that the ipv6 routes have collapsed, I pick a likely
>>> >> >> >> seeming node (maybe because it had been plugged in recently)
>>> >> >> >> and turn off ipv6 olsrd, and over 30-60 seconds, magically the
>>> >> >> >> routes all come back. My luck in guessing the right node to
>>> >> >> >> turn off is a little bit "too good", if you know what I mean,
>>> >> >> >> so that I am not sure there is anything particularly unique
>>> >> >> >> about the node I choose. But, nevertheless, turning it off
>>> >> >> >> seems to help, generally.
>>> >> >> >>
>>> >> >> >> FWIW, I'm including olsrd versions here. The central machine
>>> >> >> >> ::407 is running 0.6.6.1, compiled from the tarball. The nodes
>>> >> >> >> have the following versions, all built from openwrt routing
>>> >> >> >> feed sources.
>>> >> >> >>
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> https://personaltelco.net/~russell/olsrd/olsrd-versions-by-node.txt
>>> >> >> >>
>>> >> >> >> Here is a table listing the frequency of each openwrt version:
>>> >> >> >>
>>> >> >> >> 1 0.6.3-3
>>> >> >> >> 33 0.6.4-1
>>> >> >> >> 1 0.6.5.1-1
>>> >> >> >> 1 0.6.5.1-2
>>> >> >> >> 7 0.6.5.2-1
>>> >> >> >> 1 0.6.5.3-1
>>> >> >> >> 2 0.6.5.4-1
>>> >> >> >> 2 0.6.6-2
>>> >> >> >> 7 0.6.6-3
>>> >> >> >> 11 0.6.6.1-1
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> --
>>> >> >> >> Russell Senior, President
>>> >> >> >> (spam-protected)
>>> >> >> >>
>>> >> >> >> --
>>> >> >> >> Olsr-users mailing list
>>> >> >> >> (spam-protected)
>>> >> >> >> https://lists.olsr.org/mailman/listinfo/olsr-users
>>> >> >> >
>>> >> >> > --
>>> >> >> > Olsr-dev mailing list
>>> >> >> > (spam-protected)
>>> >> >> > https://lists.olsr.org/mailman/listinfo/olsr-dev
>>> >> >
>>> >> >
>>> >
>>> >
>>
>>
>
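
The before/during/after /routes dumps linked in the quoted thread can be
compared mechanically rather than by eye. What follows is a minimal Python
sketch, not part of olsrd or the txtinfo plugin: it assumes each dump has
been saved as text and that every route line starts with its destination
prefix. The function names route_destinations and compare_snapshots are
hypothetical, and the sample addresses in any test data would be placeholders.

```python
import ipaddress


def route_destinations(snapshot: str) -> set:
    """Collect destination prefixes from a saved routes dump.

    Assumption: the first whitespace-separated field of each route line
    is the destination (e.g. "2001:db8::2/128"). Lines whose first field
    does not parse as an IP network (headers, blanks) are skipped.
    """
    destinations = set()
    for line in snapshot.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            # strict=False tolerates prefixes written with host bits set.
            network = ipaddress.ip_network(fields[0], strict=False)
        except ValueError:
            continue  # table header or other non-route line
        destinations.add(network)
    return destinations


def compare_snapshots(before: str, during: str):
    """Return (routes lost, routes gained) between two dumps."""
    before_set = route_destinations(before)
    during_set = route_destinations(during)
    return before_set - during_set, during_set - before_set
```

Calling len(route_destinations(text)) on a dump also gives the route count,
which may help when checking the "never more than 186 routes" observation.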