[Olsr-users] [Olsr-dev] olsrd 0.6.6.1 (and earlier) ipv6 problems

Russell Senior (spam-protected)
Fri Mar 28 12:08:22 CET 2014


Just to clarify: "ptp" is the client's OpenVPN interface, and "vpn" is the
server's OpenVPN interface.  "br-pub" is typically the wifi AP interface on
the node device (sometimes with an ethernet interface as well).


On Fri, Mar 28, 2014 at 4:01 AM, Russell Senior
<(spam-protected)> wrote:

> I should have done this earlier, but here are my olsrd.conf files.  On the
> server:
>
> ===================================================
> IpVersion 6
>
> #Hna4
> #{
> #}
>
> Hna6
> {
>         0::     0
> }
>
> LinkQualityFishEye  0
>
> LoadPlugin "olsrd_txtinfo.so.0.1"
> {
>         PlParam "port" "7862"
> }
>
> #############################################
> ### OLSRD default interface configuration ###
> #############################################
> # the default interface section can have the same values as the following
> # interface configuration. It will allow you to set common options for all
> # interfaces.
>
> InterfaceDefaults {
>         # Ip4Broadcast      255.255.255.255
> }
>
> Interface "ptp" "ptp-udp" "vpn" "iris"
> {
> #       Mode "ether"
> }
> =====================================================
>
> I am pretty sure that Hna4 { } part had been there uncommented for a
> while.  The Mode "ether" was uncommented too.  When I comment them out,
> as above, and restart, I see the individual routes on the client, as you
> would expect.  I had noticed the "route aggregation" and been a little
> surprised, but having just moved to a newer version, I wasn't too
> suspicious.
>
> On the clients:
>
> =====================================================
>
> IpVersion 6
>
> LinkQualityFishEye 0
>
> Hna6
> {
>         2001:470:e962:xxyy::    64
> }
>
> LoadPlugin "olsrd_txtinfo.so.0.1"
> {
>         PlParam "port" "7862"
> }
>
> Interface "br-pub" "ptp"
> {
> }
> =====================================================
>
> When it's working, I see 177 olsrd routes (the 180 figure included some
> header/footer lines, apparently) on the server and 176 on the client.  But
> if I add another node, the routes all collapse still.  It is confusing
> though.  Sometimes, I only see two routes, as below, apparently when Mode
> "ether" is in force.  It's confusing because sometimes I was seeing the
> more complete client routing table even with Mode "ether".
>
> Table: Routes
> Destination     Gateway IP      Metric  ETX     Interface
> ::/0    2001:470:e962::407      1       1.000   ptp
> 2001:470:e962::407/128  2001:470:e962::407      1       1.000   ptp
>
> I am turning Mode "ether" off again, and I seem to get a complete set of
> routes (one less than the server) on the clients.
>
> Again, though, if I add one more node, the routes on both the server and
> clients collapse.  The clients go to zero.  The server has routes to one or
> sometimes two clients, which vary a little bit.
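
As an aside on the route counts above: counting lines of raw txtinfo
output is easy to get wrong, because the table title and the column
header inflate the total (hence 180 versus 177).  Below is a minimal
Python sketch that counts only real route entries.  It assumes the
whitespace-separated "Table: Routes" layout shown in this message; the
names parse_txtinfo_routes and SAMPLE are illustrative only and are not
part of olsrd.

```python
def parse_txtinfo_routes(text):
    """Return route entries from txtinfo 'Table: Routes' output.

    Assumes five whitespace-separated columns per route line:
    destination, gateway, metric, ETX, interface.
    """
    routes = []
    in_routes_table = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("Table:"):
            # Only lines under the Routes table are of interest.
            in_routes_table = (line == "Table: Routes")
            continue
        # Skip blank lines and the column header row.
        if not in_routes_table or not line or line.startswith("Destination"):
            continue
        fields = line.split()
        if len(fields) != 5:
            continue  # not a route line in the assumed layout
        destination, gateway, metric, etx, interface = fields
        routes.append({
            "destination": destination,
            "gateway": gateway,
            "metric": int(metric),
            "etx": etx,  # kept as text; may not always be numeric
            "interface": interface,
        })
    return routes


# The two-route table quoted above, used here as sample input.
SAMPLE = """Table: Routes
Destination\tGateway IP\tMetric\tETX\tInterface
::/0\t2001:470:e962::407\t1\t1.000\tptp
2001:470:e962::407/128\t2001:470:e962::407\t1\t1.000\tptp
"""

if __name__ == "__main__":
    print(len(parse_txtinfo_routes(SAMPLE)), "routes")
```

Feeding it the saved output of the curl command against the txtinfo
port should give a count that excludes the header lines.
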
>
>
>
>
> On Fri, Mar 28, 2014 at 3:09 AM, Henning Rogge <(spam-protected)> wrote:
>
>> Each leaf should have a /128 route for each other leaf...
>>
>> Olsrd does NOT do any route aggregation.
>>
>> Can you show me a routing table of a leaf and the txtinfo output when
>> everything is fine?
>>
>> Henning
>>
>> On Fri, Mar 28, 2014 at 11:06 AM, Russell Senior
>> <(spam-protected)> wrote:
>> > FWIW, the ipv6 routing tables on the "leaf" nodes are quite short, with
>> > mostly just a default route pointing at the central server, when olsrd is
>> > working.  When the central server has the route collapse, the default
>> > route on the "leaf" nodes disappears.
>> >
>> > I am thinking about memory exhaustion, maybe something is helpfully
>> > killing it off when the size becomes "too large" ... /me goes to look for
>> > evidence of that.
>> >
>> >
>> > On Fri, Mar 28, 2014 at 3:03 AM, Russell Senior <(spam-protected)>
>> > wrote:
>> >>
>> >> They are single hop from the central server, which is the table I've
>> >> been posting.
>> >>
>> >>
>> >> On Fri, Mar 28, 2014 at 3:01 AM, Henning Rogge <(spam-protected)> wrote:
>> >>>
>> >>> What?
>> >>>
>> >>> But your routing tables only contain "ETX 1.0" paths... which means
>> >>> they are single hop!
>> >>>
>> >>> Henning
>> >>>
>> >>> On Fri, Mar 28, 2014 at 11:00 AM, Russell Senior
>> >>> <(spam-protected)> wrote:
>> >>> > Without the ipv6 olsrd, the nodes can't route to each other, it seems.
>> >>> > I picked two I had turned off, and tried ping6'ing between them and
>> >>> > got 100% packet loss.
>> >>> >
>> >>> >
>> >>> > On Fri, Mar 28, 2014 at 2:54 AM, Henning Rogge <(spam-protected)>
>> >>> > wrote:
>> >>> >>
>> >>> >> Hi,
>> >>> >>
>> >>> >> as far as I can see each "leaf" node can see each other leaf node
>> >>> >> over the OpenVPN, right?
>> >>> >>
>> >>> >> So you are only using Olsrd to distribute HNAs?
>> >>> >>
>> >>> >> Henning Rogge
>> >>> >>
>> >>> >> On Fri, Mar 28, 2014 at 10:48 AM, Russell Senior
>> >>> >> <(spam-protected)> wrote:
>> >>> >> > The central server, ::407, is running OpenVPN in server mode.  The
>> >>> >> > "leaf" nodes all connect to it via OpenVPN client mode with a tap
>> >>> >> > interface.  We statically provision the IPv6 addresses on the vpn.
>> >>> >> >
>> >>> >> > And yes, the OpenVPN links are still active.  We are running an
>> >>> >> > IPv4 instance of olsrd (same version) in parallel and those routes
>> >>> >> > (to the very same devices) are not affected.
>> >>> >> >
>> >>> >> > We see the problem when particular (though varying) nodes' olsrd
>> >>> >> > ipv6 instances are started/stopped.  Sometimes the nodes are
>> >>> >> > running 0.6.6.1, and sometimes 0.6.4.  It doesn't seem to be
>> >>> >> > version-specific.  The central server is running 0.6.6.1 now, but
>> >>> >> > we saw the same thing earlier (which is why I upgraded) on 0.6.4.
>> >>> >> >
>> >>> >> > One other potential clue (it doesn't make very much sense, because
>> >>> >> > I know there are much bigger networks than ours): I've never seen
>> >>> >> > more than 186 ipv6 routes on ::407.  We seem to see the problem
>> >>> >> > when we try to exceed that.  I'm going to try to confirm that.
>> >>> >> >
>> >>> >> >
>> >>> >> > On Fri, Mar 28, 2014 at 2:34 AM, Henning Rogge <(spam-protected)>
>> >>> >> > wrote:
>> >>> >> >>
>> >>> >> >> Hi,
>> >>> >> >>
>> >>> >> >> I must admit that I am not convinced that what we are seeing is
>> >>> >> >> an Olsrd bug...
>> >>> >> >>
>> >>> >> >> If I see it correctly Olsrd is running over the VPN interface
>> >>> >> >> connection (interface name "vpn"), right?
>> >>> >> >>
>> >>> >> >> Is the VPN connection between the nodes still active during the
>> >>> >> >> route loss? Most of the nodes seem to have direct connections and
>> >>> >> >> the "30 seconds until recovery" sounds like an ETX value slowly
>> >>> >> >> going down and then dropping the link.
>> >>> >> >>
>> >>> >> >> Henning
>> >>> >> >>
>> >>> >> >> On Fri, Mar 28, 2014 at 10:11 AM, Saverio Proto
>> >>> >> >> <(spam-protected)>
>> >>> >> >> wrote:
>> >>> >> >> > Hello Russell,
>> >>> >> >> >
>> >>> >> >> > looking at this:
>> >>> >> >> >
>> >>> >> >> > https://personaltelco.net/~russell/olsrd/olsrd-routes-before.txt
>> >>> >> >> > https://personaltelco.net/~russell/olsrd/olsrd-routes-during.txt
>> >>> >> >> > https://personaltelco.net/~russell/olsrd/olsrd-routes-after.txt
>> >>> >> >> >
>> >>> >> >> > it looks like IPv6 routes are removed from the olsrd database.
>> >>> >> >> > So it is actually the olsrd daemon that is involved.
>> >>> >> >> >
>> >>> >> >> > Do you know if there is a previous stable version of olsrd
>> >>> >> >> > where this bug/behaviour is not present?
>> >>> >> >> >
>> >>> >> >> > In my opinion the fastest way to track the bug is to try
>> >>> >> >> > different versions of olsrd with the "git bisect" method.
>> >>> >> >> >
>> >>> >> >> > The first step is to tell us if there is a version of olsrd
>> >>> >> >> > that is not affected by this problem.
>> >>> >> >> >
>> >>> >> >> > thanks
>> >>> >> >> >
>> >>> >> >> > I cc: olsrd-dev
>> >>> >> >> >
>> >>> >> >> > Saverio
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > 2014-03-27 10:37 GMT+01:00 Russell Senior
>> >>> >> >> > <(spam-protected)>:
>> >>> >> >> >>>>>>> "Henning" == Henning Rogge <(spam-protected)> writes:
>> >>> >> >> >>
>> >>> >> >> >> Henning> On 03/26/2014 07:41 PM, Russell Senior wrote:
>> >>> >> >> >>>> Anybody get a chance to look at the strace?  I see a:
>> >>> >> >> >>
>> >>> >> >> >> Henning> strace and packet dumps are much too low-level to
>> >>> >> >> >> Henning> directly hunt problems like this. That's why
>> >>> >> >> >> Henning> Saverio's question about txtinfo is good, because
>> >>> >> >> >> Henning> it gives you a much more high-level view on what is
>> >>> >> >> >> Henning> going on.
>> >>> >> >> >>
>> >>> >> >> >> I had not installed the modules previously, so that interface
>> >>> >> >> >> wasn't immediately available.  It is now.
>> >>> >> >> >>
>> >>> >> >> >> [...]
>> >>> >> >> >>
>> >>> >> >> >> Henning> Okay, let's get back to the high-level view.
>> >>> >> >> >>
>> >>> >> >> >> Henning> To interpret the events you described we need a list
>> >>> >> >> >> Henning> of nodes, with their interface IPs and the
>> >>> >> >> >> Henning> connectivity between them.
>> >>> >> >> >>
>> >>> >> >> >> Here is the list of neighbors of 2001:470:e962::407.  The
>> >>> >> >> >> addresses listed are on the public wifi.  The OpenVPN addresses
>> >>> >> >> >> of each node are a permutation, e.g. if the public wifi addr is
>> >>> >> >> >> 2001:470:e962:wxyz::1,
>> >>> >> >> >> then the OpenVPN address of the node is 2001:470:e962::wxyz.
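
The address permutation described above can be sketched in Python with
the standard ipaddress module.  This is only an illustration of the
stated rule (the fourth 16-bit group of the wifi address becomes the
last group of the VPN address under the same /48); the function name
vpn_address_for is hypothetical, and the sketch assumes every node
follows the 2001:470:e962:wxyz::1 pattern.

```python
import ipaddress


def vpn_address_for(wifi_addr):
    """Map a wifi address like 2001:470:e962:wxyz::1 to 2001:470:e962::wxyz.

    Assumption: the node identifier is the fourth 16-bit group of the
    wifi address, and the VPN address lives in the same /48.
    """
    value = int(ipaddress.IPv6Address(wifi_addr))
    # Keep the top 48 bits (the shared 2001:470:e962 prefix).
    prefix48 = (value >> 80) << 80
    # Extract the fourth 16-bit group (bits 64..79 from the right).
    node_id = (value >> 64) & 0xFFFF
    # Place the node identifier in the lowest 16 bits.
    return ipaddress.IPv6Address(prefix48 | node_id)


if __name__ == "__main__":
    print(vpn_address_for("2001:470:e962:11c1::1"))
```

A mapping like this makes it easier to match an entry in the neighbors
table against the wifi-side prefix a node announces in its Hna6 block.
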
>> >>> >> >> >>
>> >>> >> >> >> None of the nodes connect directly, everything goes through
>> >>> >> >> >> ::407.
>> >>> >> >> >>
>> >>> >> >> >> From curl -6 http://localhost:$port/neighbors
>> >>> >> >> >>
>> >>> >> >> >> https://personaltelco.net/~russell/olsrd/olsrd-neighbors.txt
>> >>> >> >> >>
>> >>> >> >> >> Henning> I am also a bit worried about your usage of bridges
>> >>> >> >> >> Henning> connected to mesh interfaces.  Normally you should
>> >>> >> >> >> Henning> not bridge any interface that OLSR uses for meshing.
>> >>> >> >> >> Henning> Mixing routing (L3) and bridging (L2) can go wrong
>> >>> >> >> >> Henning> in very creative ways.
>> >>> >> >> >>
>> >>> >> >> >> I don't understand how the bridges could be a problem in this
>> >>> >> >> >> case.  This is a hub and spoke topology.  One openvpn server
>> >>> >> >> >> in the middle, nodes at the edges.  None of the nodes
>> >>> >> >> >> interconnect otherwise.  Olsr is broadcast on the wifi in case
>> >>> >> >> >> there are any olsrd devices nearby, but, again, there is no
>> >>> >> >> >> overlap in the wifi coverage (and if there were physically,
>> >>> >> >> >> they are on different SSIDs and wouldn't overlap logically).
>> >>> >> >> >>
>> >>> >> >> >> Can you explain more about what in particular would make you
>> >>> >> >> >> worry?  This configuration has been stable for us on ipv4 for
>> >>> >> >> >> years and also on ipv6 until very recently, since late 2012 at
>> >>> >> >> >> least.  So, I suspect a bug.  Somewhere.
>> >>> >> >> >>
>> >>> >> >> >> Henning> Txtinfo output (especially /route) would be good to
>> >>> >> >> >> Henning> see...  before the problem, during the problem and
>> >>> >> >> >> Henning> after the recovery.
>> >>> >> >> >>
>> >>> >> >> >> I'm using curl -6 http://localhost:$port/routes to get the
>> >>> >> >> >> following data, before, during and after turning on an ipv6
>> >>> >> >> >> olsrd on a particular node (2001:470:e962:11c1::1).
>> >>> >> >> >>
>> >>> >> >> >> https://personaltelco.net/~russell/olsrd/olsrd-routes-before.txt
>> >>> >> >> >> https://personaltelco.net/~russell/olsrd/olsrd-routes-during.txt
>> >>> >> >> >> https://personaltelco.net/~russell/olsrd/olsrd-routes-after.txt
>> >>> >> >> >>
>> >>> >> >> >> Henning> It would also help if you can reduce the number of
>> >>> >> >> >> Henning> nodes to a minimum while still replicating the
>> >>> >> >> >> Henning> problem.
>> >>> >> >> >>
>> >>> >> >> >> I don't have that level of control, unfortunately.  When I
>> >>> >> >> >> notice that the ipv6 routes have collapsed, I pick a likely
>> >>> >> >> >> seeming node (maybe because it had been plugged in recently)
>> >>> >> >> >> and turn off ipv6 olsrd, and over 30-60 seconds, magically the
>> >>> >> >> >> routes all come back.  My luck in guessing the right node to
>> >>> >> >> >> turn off is a little bit "too good", if you know what I mean,
>> >>> >> >> >> so that I am not sure there is anything particularly unique
>> >>> >> >> >> about the node I choose.  But, nevertheless, turning it off
>> >>> >> >> >> seems to help, generally.
>> >>> >> >> >>
>> >>> >> >> >> FWIW, I'm including olsrd versions here.  The central machine
>> >>> >> >> >> ::407 is running 0.6.6.1, compiled from the tarball.  The
>> >>> >> >> >> nodes have the following versions, all built from openwrt
>> >>> >> >> >> routing feed sources.
>> >>> >> >> >>
>> >>> >> >> >> https://personaltelco.net/~russell/olsrd/olsrd-versions-by-node.txt
>> >>> >> >> >>
>> >>> >> >> >> Here is a table listing the frequency of each openwrt version:
>> >>> >> >> >>
>> >>> >> >> >>       1 0.6.3-3
>> >>> >> >> >>      33 0.6.4-1
>> >>> >> >> >>       1 0.6.5.1-1
>> >>> >> >> >>       1 0.6.5.1-2
>> >>> >> >> >>       7 0.6.5.2-1
>> >>> >> >> >>       1 0.6.5.3-1
>> >>> >> >> >>       2 0.6.5.4-1
>> >>> >> >> >>       2 0.6.6-2
>> >>> >> >> >>       7 0.6.6-3
>> >>> >> >> >>      11 0.6.6.1-1
>> >>> >> >> >>
>> >>> >> >> >>
>> >>> >> >> >> --
>> >>> >> >> >> Russell Senior, President
>> >>> >> >> >> (spam-protected)
>> >>> >> >> >>
>> >>> >> >> >> --
>> >>> >> >> >> Olsr-users mailing list
>> >>> >> >> >> (spam-protected)
>> >>> >> >> >> https://lists.olsr.org/mailman/listinfo/olsr-users
>> >>> >> >> >
>> >>> >> >> > --
>> >>> >> >> > Olsr-dev mailing list
>> >>> >> >> > (spam-protected)
>> >>> >> >> > https://lists.olsr.org/mailman/listinfo/olsr-dev
>> >>> >> >
>> >>> >> >
>> >>> >
>> >>> >
>> >>
>> >>
>> >
>>
>
>