[Olsr-users] [Olsr-dev] olsrd 0.6.6.1 (and earlier) ipv6 problems
Russell Senior
(spam-protected)
Fri Mar 28 12:01:30 CET 2014
I should have done this earlier, but here are my olsrd.conf files. On the
server:
===================================================
IpVersion 6
#Hna4
#{
#}
Hna6
{
0:: 0
}
LinkQualityFishEye 0
LoadPlugin "olsrd_txtinfo.so.0.1"
{
PlParam "port" "7862"
}
#############################################
### OLSRD default interface configuration ###
#############################################
# the default interface section can have the same values as the following
# interface configuration. It will allow you to set common options for all
# interfaces.
InterfaceDefaults {
# Ip4Broadcast 255.255.255.255
}
Interface "ptp" "ptp-udp" "vpn" "iris"
{
# Mode "ether"
}
=====================================================
I am pretty sure that the Hna4 { } part had been there uncommented for a
while. The Mode "ether" was uncommented too. When I comment them out,
as above, and restart, I see the individual routes on the client, as you
would expect. I had noticed the "route aggregation" and been a little
surprised, but having just moved to a newer version, I wasn't too
suspicious.
On the clients:
=====================================================
IpVersion 6
LinkQualityFishEye 0
Hna6
{
2001:470:e962:xxyy:: 64
}
LoadPlugin "olsrd_txtinfo.so.0.1"
{
PlParam "port" "7862"
}
Interface "br-pub" "ptp"
{
}
=====================================================
When it's working, I see 177 olsrd routes (the 180 figure included some
header/footer lines, apparently) on the server and 176 on the client. But
if I add another node, the routes still all collapse. It is confusing,
though. Sometimes I only see two routes, as below, apparently when Mode
"ether" is in force. It's confusing because at other times I was seeing the
more complete client routing table even with Mode "ether".
Table: Routes
Destination             Gateway IP          Metric  ETX    Interface
::/0                    2001:470:e962::407  1       1.000  ptp
2001:470:e962::407/128  2001:470:e962::407  1       1.000  ptp
I am turning Mode "ether" off again, and I seem to get a complete set of
routes (one less than the server) on the clients.
Again, though, if I add one more node, the routes on both the server and
clients collapse. The clients go to zero. The server has routes to one or
sometimes two clients, which vary a little bit.
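For comparing the counts above without the header/footer lines, here is a
minimal sketch (Python) that counts only the route rows in saved txtinfo
/routes output. The function name count_routes and the SAMPLE text are
illustrative assumptions, and it assumes whitespace-separated columns as in
the two-route table above; it is not part of olsrd or the txtinfo plugin.

```python
def count_routes(txtinfo_text):
    """Count route rows under 'Table: Routes', skipping header/footer lines."""
    count = 0
    in_table = False
    for line in txtinfo_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Table: Routes"):
            in_table = True
            continue
        if not in_table or not stripped:
            # Outside the routes table, or a blank separator line.
            continue
        if stripped.startswith("Table:"):
            # A different table follows; stop counting.
            in_table = False
            continue
        fields = stripped.split()
        # Assumption: a route row has five columns and its destination
        # is written as prefix/length (e.g. ::/0), unlike the column header.
        if len(fields) == 5 and "/" in fields[0]:
            count += 1
    return count


# Hypothetical sample, copied from the two-route client table above.
SAMPLE = """Table: Routes
Destination Gateway IP Metric ETX Interface
::/0 2001:470:e962::407 1 1.000 ptp
2001:470:e962::407/128 2001:470:e962::407 1 1.000 ptp
"""

print(count_routes(SAMPLE))  # prints 2
```

Saving the output of curl -6 http://localhost:7862/routes to a file and
passing its contents to count_routes should give a figure that can be
compared directly between the server and the clients.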
On Fri, Mar 28, 2014 at 3:09 AM, Henning Rogge <(spam-protected)> wrote:
> Each leaf should have a /128 route for each other leaf...
>
> Olsrd does NOT do any route aggregation.
>
> Can you show me a routing table of a leaf and the txtinfo output when
> everything is fine?
>
> Henning
>
> On Fri, Mar 28, 2014 at 11:06 AM, Russell Senior
> <(spam-protected)> wrote:
> > FWIW, the ipv6 routing tables on the "leaf" nodes are quite short, with
> > mostly just a default route pointing at the central server, when olsrd is
> > working. When the central server has the route collapse, the default route
> > on the "leaf" nodes disappears.
> >
> > I am thinking about memory exhaustion, maybe something is helpfully killing
> > it off when the size becomes "too large" ... /me goes to look for evidence
> > of that.
> >
> >
> > On Fri, Mar 28, 2014 at 3:03 AM, Russell Senior <(spam-protected)>
> > wrote:
> >>
> >> They are single hop from the central server, which is the table I've been
> >> posting.
> >>
> >>
> >> On Fri, Mar 28, 2014 at 3:01 AM, Henning Rogge <(spam-protected)> wrote:
> >>>
> >>> What?
> >>>
> >>> but your routing tables only contain "ETX 1.0" paths... which means
> >>> they are single hop!
> >>>
> >>> Henning
> >>>
> >>> On Fri, Mar 28, 2014 at 11:00 AM, Russell Senior
> >>> <(spam-protected)> wrote:
> >>> > Without the ipv6 olsrd, the nodes can't route to each other, it seems.
> >>> > I picked two I had turned off, and tried ping6'ing between them and got
> >>> > 100% packet loss.
> >>> >
> >>> >
> >>> > On Fri, Mar 28, 2014 at 2:54 AM, Henning Rogge <(spam-protected)>
> >>> > wrote:
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >> as far as I can see each "leaf" node can see each other leaf node over
> >>> >> the OpenVPN, right?
> >>> >>
> >>> >> So you are only using Olsrd to distribute HNAs?
> >>> >>
> >>> >> Henning Rogge
> >>> >>
> >>> >> On Fri, Mar 28, 2014 at 10:48 AM, Russell Senior
> >>> >> <(spam-protected)> wrote:
> >>> >> > The central server, ::407, is running OpenVPN in server mode. The
> >>> >> > "leaf" nodes all connect to it via OpenVPN client mode with a tap
> >>> >> > interface. We statically provision the IPv6 addresses on the vpn.
> >>> >> >
> >>> >> > And yes, the OpenVPN links are still active. We are running an IPv4
> >>> >> > instance of olsrd (same version) in parallel and those routes (to
> >>> >> > the very same devices) are not affected.
> >>> >> >
> >>> >> > We see the problem when particular (though varying) nodes' olsrd ipv6
> >>> >> > instances are started/stopped. Sometimes the nodes are running
> >>> >> > 0.6.6.1, and sometimes 0.6.4. It doesn't seem to be specific. The
> >>> >> > central server is running 0.6.6.1 now, but we saw the same thing
> >>> >> > earlier (which is why I upgraded) on 0.6.4.
> >>> >> >
> >>> >> > One other potential clue (it doesn't make very much sense, because I
> >>> >> > know there are much bigger networks than ours), I've never seen more
> >>> >> > than 186 ipv6 routes on ::407. We seem to see the problem when we try
> >>> >> > to exceed that. I'm going to try to confirm that.
> >>> >> >
> >>> >> >
> >>> >> > On Fri, Mar 28, 2014 at 2:34 AM, Henning Rogge <(spam-protected)>
> >>> >> > wrote:
> >>> >> >>
> >>> >> >> Hi,
> >>> >> >>
> >>> >> >> I must admit that I am not convinced that what we are seeing is an
> >>> >> >> Olsrd bug...
> >>> >> >>
> >>> >> >> If I see it correctly Olsrd is running over the VPN interface
> >>> >> >> connection (interface name "vpn"), right?
> >>> >> >>
> >>> >> >> Is the VPN connection between the nodes still active during the
> >>> >> >> route loss? Most of the nodes seem to have direct connections and the
> >>> >> >> "30 seconds until recovery" sounds like an ETX value slowly going down
> >>> >> >> and then dropping the link.
> >>> >> >>
> >>> >> >> Henning
> >>> >> >>
> >>> >> >> On Fri, Mar 28, 2014 at 10:11 AM, Saverio Proto
> >>> >> >> <(spam-protected)>
> >>> >> >> wrote:
> >>> >> >> > Hello Russell,
> >>> >> >> >
> >>> >> >> > looking at this:
> >>> >> >> >
> >>> >> >> > https://personaltelco.net/~russell/olsrd/olsrd-routes-before.txt
> >>> >> >> > https://personaltelco.net/~russell/olsrd/olsrd-routes-during.txt
> >>> >> >> > https://personaltelco.net/~russell/olsrd/olsrd-routes-after.txt
> >>> >> >> >
> >>> >> >> > it looks like IPv6 routes are removed from the olsrd database. So it
> >>> >> >> > is actually the olsrd daemon that is involved.
> >>> >> >> >
> >>> >> >> > do you know if there is a previous stable version of olsrd where
> >>> >> >> > this bug/behaviour is not present?
> >>> >> >> >
> >>> >> >> > In my opinion the fastest way to track the bug is to try different
> >>> >> >> > versions of olsrd with the "git bisect" method.
> >>> >> >> >
> >>> >> >> > The first step is to tell us if there is a version of olsrd that is
> >>> >> >> > not affected by this problem.
> >>> >> >> >
> >>> >> >> > thanks
> >>> >> >> >
> >>> >> >> > I cc: olsrd-dev
> >>> >> >> >
> >>> >> >> > Saverio
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > 2014-03-27 10:37 GMT+01:00 Russell Senior
> >>> >> >> > <(spam-protected)>:
> >>> >> >> >>>>>>> "Henning" == Henning Rogge
> >>> >> >> >>>>>>> <(spam-protected)>
> >>> >> >> >>>>>>> writes:
> >>> >> >> >>
> >>> >> >> >> Henning> On 03/26/2014 07:41 PM, Russell Senior wrote:
> >>> >> >> >>>> Anybody get a chance to look at the strace? I see a:
> >>> >> >> >>
> >>> >> >> >> Henning> strace and packet dumps are much too lowlevel to directly
> >>> >> >> >> Henning> hunt problems like this. That's why Saverio's question
> >>> >> >> >> Henning> about txtinfo is good, because it gives you a much more
> >>> >> >> >> Henning> high-level view on what is going on.
> >>> >> >> >>
> >>> >> >> >> I had not installed the modules previously, so that interface
> >>> >> >> >> wasn't immediately available. It is now.
> >>> >> >> >>
> >>> >> >> >> [...]
> >>> >> >> >>
> >>> >> >> >> Henning> Okay, let's get back to the high-level view.
> >>> >> >> >>
> >>> >> >> >> Henning> To interpret the events you described we need a list of
> >>> >> >> >> Henning> nodes, with their interface IPs and the connectivity
> >>> >> >> >> Henning> between them.
> >>> >> >> >>
> >>> >> >> >> Here is the list of neighbors of 2001:470:e962::407. The addresses
> >>> >> >> >> listed are on the public wifi. The OpenVPN addresses of each node
> >>> >> >> >> are a permutation, e.g. if the public wifi addr is
> >>> >> >> >> 2001:470:e962:wxyz::1, then the OpenVPN address of the node is
> >>> >> >> >> 2001:470:e962::wxyz.
> >>> >> >> >>
> >>> >> >> >> None of the nodes connect directly, everything goes through ::407.
> >>> >> >> >>
> >>> >> >> >> From curl -6 http://localhost:$port/neighbors
> >>> >> >> >>
> >>> >> >> >> https://personaltelco.net/~russell/olsrd/olsrd-neighbors.txt
> >>> >> >> >>
> >>> >> >> >> Henning> I am also a bit worried about your usage of bridges
> >>> >> >> >> Henning> connected to mesh interfaces. Normally you should not
> >>> >> >> >> Henning> bridge any interface that OLSR uses for meshing. Mixing
> >>> >> >> >> Henning> routing (L3) and bridging (L2) can go wrong in very
> >>> >> >> >> Henning> creative ways.
> >>> >> >> >>
> >>> >> >> >> I don't understand how the bridges could be a problem in this
> >>> >> >> >> case. This is a hub and spoke topology. One openvpn server in the
> >>> >> >> >> middle, nodes at the edges. None of the nodes interconnect
> >>> >> >> >> otherwise. Olsr is broadcast on the wifi in case there are any
> >>> >> >> >> olsrd devices nearby, but, again, there is no overlap in the wifi
> >>> >> >> >> coverage (and if there were physically, they are on different
> >>> >> >> >> SSIDs and wouldn't overlap logically).
> >>> >> >> >>
> >>> >> >> >> Can you explain more about what in particular would make you
> >>> >> >> >> worry? This configuration has been stable for us on ipv4 for years
> >>> >> >> >> and also on ipv6 until very recently, since late 2012 at least.
> >>> >> >> >> So, I suspect a bug. Somewhere.
> >>> >> >> >>
> >>> >> >> >> Henning> Txtinfo output (especially /route) would be good to
> >>> >> >> >> Henning> see... before the problem, during the problem and
> >>> >> >> >> Henning> after the recovery.
> >>> >> >> >>
> >>> >> >> >> I'm using curl -6 http://localhost:$port/routes to get the
> >>> >> >> >> following data, before, during and after turning on an ipv6 olsrd
> >>> >> >> >> on a particular node (2001:470:e962:11c1::1).
> >>> >> >> >>
> >>> >> >> >> https://personaltelco.net/~russell/olsrd/olsrd-routes-before.txt
> >>> >> >> >> https://personaltelco.net/~russell/olsrd/olsrd-routes-during.txt
> >>> >> >> >> https://personaltelco.net/~russell/olsrd/olsrd-routes-after.txt
> >>> >> >> >>
> >>> >> >> >> Henning> It would also help if you can reduce the number of
> >>> >> >> >> Henning> nodes while still replicating the problem to a minimum.
> >>> >> >> >>
> >>> >> >> >> I don't have that level of control, unfortunately. When I notice
> >>> >> >> >> that the ipv6 routes have collapsed, I pick a likely seeming node
> >>> >> >> >> (maybe because it had been plugged in recently) and turn off ipv6
> >>> >> >> >> olsrd, and over 30-60 seconds, magically the routes all come back.
> >>> >> >> >> My luck in guessing the right node to turn off is a little bit
> >>> >> >> >> "too good", if you know what I mean, so that I am not sure there
> >>> >> >> >> is anything particularly unique about the node I choose. But,
> >>> >> >> >> nevertheless, turning it off seems to help, generally.
> >>> >> >> >>
> >>> >> >> >> FWIW, I'm including olsrd versions here. The central machine
> >>> >> >> >> ::407 is running 0.6.6.1, compiled from the tarball. The nodes
> >>> >> >> >> have the following versions, all built from openwrt routing feed
> >>> >> >> >> sources.
> >>> >> >> >>
> >>> >> >> >> https://personaltelco.net/~russell/olsrd/olsrd-versions-by-node.txt
> >>> >> >> >>
> >>> >> >> >> Here is a table listing the frequency of each openwrt version:
> >>> >> >> >>
> >>> >> >> >> 1 0.6.3-3
> >>> >> >> >> 33 0.6.4-1
> >>> >> >> >> 1 0.6.5.1-1
> >>> >> >> >> 1 0.6.5.1-2
> >>> >> >> >> 7 0.6.5.2-1
> >>> >> >> >> 1 0.6.5.3-1
> >>> >> >> >> 2 0.6.5.4-1
> >>> >> >> >> 2 0.6.6-2
> >>> >> >> >> 7 0.6.6-3
> >>> >> >> >> 11 0.6.6.1-1
> >>> >> >> >>
> >>> >> >> >>
> >>> >> >> >> --
> >>> >> >> >> Russell Senior, President
> >>> >> >> >> (spam-protected)