[Olsr-users] [Olsr-dev] olsrd 0.6.6.1 (and earlier) ipv6 problems
Henning Rogge
(spam-protected)
Fri Mar 28 12:23:00 CET 2014
Hi,
Mode "ether" should only be used by a group of OLSRd instances that run on
the same Ethernet switch, which means everyone can see everyone else.
It suppresses some forwarding of incoming OLSR messages because it
assumes that everyone on this interface has already seen the message
anyway.
If the "hub" had Mode "ether" activated, that could be a good explanation
for what happened.
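As a rough sketch only (the interface names are taken from the server
config quoted below; whether this matches the real hub is an assumption),
the hub's interface section would either leave Mode unset or set it to
the default, "mesh":

```
Interface "ptp" "ptp-udp" "vpn" "iris"
{
    # "mesh" is the default. "ether" is only meant for interfaces where
    # every neighbor hears every other neighbor directly.
    Mode "mesh"
}
```

In a hub-and-spoke VPN the leaves only reach each other through the hub,
so the hub has to keep forwarding their messages.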
Henning
On Fri, Mar 28, 2014 at 12:01 PM, Russell Senior
<(spam-protected)> wrote:
> I should have done this earlier, but here are my olsrd.conf files. On the
> server:
>
> ===================================================
> IpVersion 6
>
> #Hna4
> #{
> #}
>
> Hna6
> {
> 0:: 0
> }
>
> LinkQualityFishEye 0
>
> LoadPlugin "olsrd_txtinfo.so.0.1"
> {
> PlParam "port" "7862"
> }
>
> #############################################
> ### OLSRD default interface configuration ###
> #############################################
> # the default interface section can have the same values as the following
> # interface configuration. It will allow you to set common options for all
> # interfaces.
>
> InterfaceDefaults {
> # Ip4Broadcast 255.255.255.255
> }
>
> Interface "ptp" "ptp-udp" "vpn" "iris"
> {
> # Mode "ether"
> }
> =====================================================
>
> I am pretty sure that the Hna4 { } part had been there uncommented for a
> while. The Mode "ether" was uncommented too. When I comment them out, as
> above, and restart, I see the individual routes on the client, as you would
> expect. I had noticed the "route aggregation" and been a little surprised,
> but having just moved to a newer version, I wasn't too suspicious.
>
> On the clients:
>
> =====================================================
>
> IpVersion 6
>
> LinkQualityFishEye 0
>
> Hna6
> {
> 2001:470:e962:xxyy:: 64
> }
>
> LoadPlugin "olsrd_txtinfo.so.0.1"
> {
> PlParam "port" "7862"
> }
>
> Interface "br-pub" "ptp"
> {
> }
> =====================================================
>
> When it's working, I see 177 olsrd routes (the 180 figure included some
> header/footer lines, apparently) on the server and 176 on the client. But
> if I add another node, the routes all still collapse. It is confusing,
> though. Sometimes I only see two routes, as below, apparently when Mode
> "ether" is in force. It's confusing because sometimes I was seeing the more
> complete client routing table even with Mode "ether".
>
> Table: Routes
> Destination             Gateway IP          Metric  ETX    Interface
> ::/0                    2001:470:e962::407  1       1.000  ptp
> 2001:470:e962::407/128  2001:470:e962::407  1       1.000  ptp
>
> I am turning Mode "ether" off again, and I seem to get a complete set of
> routes (one less than the server) on the clients.
>
> Again, though, if I add one more node, the routes on both the server and
> clients collapse. The clients go to zero. The server has routes to one or
> sometimes two clients, which vary a little bit.
>
>
>
>
> On Fri, Mar 28, 2014 at 3:09 AM, Henning Rogge <(spam-protected)> wrote:
>>
>> Each leaf should have a /128 route for each other leaf...
>>
>> Olsrd does NOT do any route aggregation.
>>
>> Can you show me a routing table of a leaf and the txtinfo output when
>> everything is fine?
>>
>> Henning
>>
>> On Fri, Mar 28, 2014 at 11:06 AM, Russell Senior
>> <(spam-protected)> wrote:
>> > FWIW, the ipv6 routing tables on the "leaf" nodes are quite short, with
>> > mostly just a default route pointing at the central server, when olsrd
>> > is working. When the central server has the route collapse, the default
>> > route on the "leaf" nodes disappears.
>> >
>> > I am thinking about memory exhaustion, maybe something is helpfully
>> > killing it off when the size becomes "too large" ... /me goes to look
>> > for evidence of that.
>> >
>> >
>> > On Fri, Mar 28, 2014 at 3:03 AM, Russell Senior
>> > <(spam-protected)>
>> > wrote:
>> >>
>> >> They are a single hop from the central server, which is the table I've
>> >> been posting.
>> >>
>> >>
>> >> On Fri, Mar 28, 2014 at 3:01 AM, Henning Rogge <(spam-protected)>
>> >> wrote:
>> >>>
>> >>> What?
>> >>>
>> >>> But your routing tables only contain "ETX 1.0" paths... which means
>> >>> they are single hop!
>> >>>
>> >>> Henning
>> >>>
>> >>> On Fri, Mar 28, 2014 at 11:00 AM, Russell Senior
>> >>> <(spam-protected)> wrote:
>> >>> > Without the ipv6 olsrd, the nodes can't route to each other, it
>> >>> > seems. I picked two I had turned off, and tried ping6'ing between
>> >>> > them and got 100% packet loss.
>> >>> >
>> >>> >
>> >>> > On Fri, Mar 28, 2014 at 2:54 AM, Henning Rogge <(spam-protected)>
>> >>> > wrote:
>> >>> >>
>> >>> >> Hi,
>> >>> >>
>> >>> >> As far as I can see, each "leaf" node can see each other leaf node
>> >>> >> over the OpenVPN, right?
>> >>> >>
>> >>> >> So you are only using Olsrd to distribute HNAs?
>> >>> >>
>> >>> >> Henning Rogge
>> >>> >>
>> >>> >> On Fri, Mar 28, 2014 at 10:48 AM, Russell Senior
>> >>> >> <(spam-protected)> wrote:
>> >>> >> > The central server, ::407, is running OpenVPN in server mode. The
>> >>> >> > "leaf" nodes all connect to it via OpenVPN client mode with a tap
>> >>> >> > interface. We statically provision the IPv6 addresses on the vpn.
>> >>> >> >
>> >>> >> > And yes, the OpenVPN links are still active. We are running an
>> >>> >> > IPv4 instance of olsrd (same version) in parallel and those routes
>> >>> >> > (to the very same devices) are not affected.
>> >>> >> >
>> >>> >> > We see the problem when particular (though varying) nodes' olsrd
>> >>> >> > ipv6 instances are started/stopped. Sometimes the nodes are running
>> >>> >> > 0.6.6.1, and sometimes 0.6.4. It doesn't seem to be specific to a
>> >>> >> > version. The central server is running 0.6.6.1 now, but we saw the
>> >>> >> > same thing earlier (which is why I upgraded) on 0.6.4.
>> >>> >> >
>> >>> >> > One other potential clue (it doesn't make very much sense, because
>> >>> >> > I know there are much bigger networks than ours): I've never seen
>> >>> >> > more than 186 ipv6 routes on ::407. We seem to see the problem when
>> >>> >> > we try to exceed that. I'm going to try to confirm that.
>> >>> >> >
>> >>> >> >
>> >>> >> > On Fri, Mar 28, 2014 at 2:34 AM, Henning Rogge <(spam-protected)>
>> >>> >> > wrote:
>> >>> >> >>
>> >>> >> >> Hi,
>> >>> >> >>
>> >>> >> >> I must admit that I am not convinced that what we are seeing is
>> >>> >> >> an Olsrd bug...
>> >>> >> >>
>> >>> >> >> If I see it correctly, Olsrd is running over the VPN connection
>> >>> >> >> (interface name "vpn"), right?
>> >>> >> >>
>> >>> >> >> Is the VPN connection between the nodes still active during the
>> >>> >> >> route loss? Most of the nodes seem to have direct connections, and
>> >>> >> >> the "30 seconds until recovery" sounds like an ETX value slowly
>> >>> >> >> going down and then dropping the link.
>> >>> >> >>
>> >>> >> >> Henning
>> >>> >> >>
>> >>> >> >> On Fri, Mar 28, 2014 at 10:11 AM, Saverio Proto
>> >>> >> >> <(spam-protected)>
>> >>> >> >> wrote:
>> >>> >> >> > Hello Russell,
>> >>> >> >> >
>> >>> >> >> > looking at this:
>> >>> >> >> >
>> >>> >> >> > https://personaltelco.net/~russell/olsrd/olsrd-routes-before.txt
>> >>> >> >> > https://personaltelco.net/~russell/olsrd/olsrd-routes-during.txt
>> >>> >> >> > https://personaltelco.net/~russell/olsrd/olsrd-routes-after.txt
>> >>> >> >> >
>> >>> >> >> > it looks like IPv6 routes are removed from the olsrd database. So
>> >>> >> >> > it is actually the olsrd daemon that is involved.
>> >>> >> >> >
>> >>> >> >> > Do you know if there is a previous stable version of olsrd where
>> >>> >> >> > this bug/behaviour is not present?
>> >>> >> >> >
>> >>> >> >> > In my opinion the fastest way to track down the bug is to try
>> >>> >> >> > different versions of olsrd with the "git bisect" method.
>> >>> >> >> >
>> >>> >> >> > The first step is to tell us if there is a version of olsrd that
>> >>> >> >> > is not affected by this problem.
>> >>> >> >> >
>> >>> >> >> > thanks
>> >>> >> >> >
>> >>> >> >> > I cc: olsrd-dev
>> >>> >> >> >
>> >>> >> >> > Saverio
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > 2014-03-27 10:37 GMT+01:00 Russell Senior
>> >>> >> >> > <(spam-protected)>:
>> >>> >> >> >>>>>>> "Henning" == Henning Rogge <(spam-protected)> writes:
>> >>> >> >> >>
>> >>> >> >> >> Henning> On 03/26/2014 07:41 PM, Russell Senior wrote:
>> >>> >> >> >>>> Anybody get a chance to look at the strace? I see a:
>> >>> >> >> >>
>> >>> >> >> >> Henning> strace and packet dumps are much too low-level to
>> >>> >> >> >> Henning> directly hunt problems like this. That's why Saverio's
>> >>> >> >> >> Henning> question about txtinfo is good, because it gives you a
>> >>> >> >> >> Henning> much more high-level view of what is going on.
>> >>> >> >> >>
>> >>> >> >> >> I had not installed the modules previously, so that interface
>> >>> >> >> >> wasn't
>> >>> >> >> >> immediately available. It is now.
>> >>> >> >> >>
>> >>> >> >> >> [...]
>> >>> >> >> >>
>> >>> >> >> >> Henning> Okay, let's get back to the high-level view.
>> >>> >> >> >>
>> >>> >> >> >> Henning> To interpret the events you described we need a list
>> >>> >> >> >> Henning> of nodes, with their interface IPs and the connectivity
>> >>> >> >> >> Henning> between them.
>> >>> >> >> >>
>> >>> >> >> >> Here is the list of neighbors of 2001:470:e962::407. The
>> >>> >> >> >> addresses listed are on the public wifi. The OpenVPN addresses
>> >>> >> >> >> of each node are a permutation, e.g. if the public wifi addr is
>> >>> >> >> >> 2001:470:e962:wxyz::1, then the OpenVPN address of the node is
>> >>> >> >> >> 2001:470:e962::wxyz.
>> >>> >> >> >>
>> >>> >> >> >> None of the nodes connect directly, everything goes through
>> >>> >> >> >> ::407.
>> >>> >> >> >>
>> >>> >> >> >> From curl -6 http://localhost:$port/neighbors
>> >>> >> >> >>
>> >>> >> >> >> https://personaltelco.net/~russell/olsrd/olsrd-neighbors.txt
>> >>> >> >> >>
>> >>> >> >> >> Henning> I am also a bit worried about your usage of bridges
>> >>> >> >> >> Henning> connected to mesh interfaces. Normally you should not
>> >>> >> >> >> Henning> bridge any interface that OLSR uses for meshing. Mixing
>> >>> >> >> >> Henning> routing (L3) and bridging (L2) can go wrong in very
>> >>> >> >> >> Henning> creative ways.
>> >>> >> >> >>
>> >>> >> >> >> I don't understand how the bridges could be a problem in this
>> >>> >> >> >> case. This is a hub and spoke topology. One openvpn server in
>> >>> >> >> >> the middle, nodes at the edges. None of the nodes interconnect
>> >>> >> >> >> otherwise. Olsr is broadcast on the wifi in case there are any
>> >>> >> >> >> olsrd devices nearby, but, again, there is no overlap in the
>> >>> >> >> >> wifi coverage (and if there were physically, they are on
>> >>> >> >> >> different SSIDs and wouldn't overlap logically).
>> >>> >> >> >>
>> >>> >> >> >> Can you explain more about what in particular would make you
>> >>> >> >> >> worry? This configuration has been stable for us on ipv4 for
>> >>> >> >> >> years and also on ipv6 until very recently, since late 2012 at
>> >>> >> >> >> least. So, I suspect a bug. Somewhere.
>> >>> >> >> >>
>> >>> >> >> >> Henning> Txtinfo output (especially /route) would be good to
>> >>> >> >> >> Henning> see... before the problem, during the problem and
>> >>> >> >> >> Henning> after the recovery.
>> >>> >> >> >>
>> >>> >> >> >> I'm using curl -6 http://localhost:$port/routes to get the
>> >>> >> >> >> following data, before, during and after turning on an ipv6
>> >>> >> >> >> olsrd on a particular node (2001:470:e962:11c1::1).
>> >>> >> >> >>
>> >>> >> >> >> https://personaltelco.net/~russell/olsrd/olsrd-routes-before.txt
>> >>> >> >> >> https://personaltelco.net/~russell/olsrd/olsrd-routes-during.txt
>> >>> >> >> >> https://personaltelco.net/~russell/olsrd/olsrd-routes-after.txt
>> >>> >> >> >>
>> >>> >> >> >> Henning> It would also help if you can reduce the number of
>> >>> >> >> >> Henning> nodes to a minimum while still replicating the problem.
>> >>> >> >> >>
>> >>> >> >> >> I don't have that level of control, unfortunately. When I
>> >>> >> >> >> notice that the ipv6 routes have collapsed, I pick a likely
>> >>> >> >> >> seeming node (maybe because it had been plugged in recently)
>> >>> >> >> >> and turn off ipv6 olsrd, and over 30-60 seconds, magically the
>> >>> >> >> >> routes all come back. My luck in guessing the right node to
>> >>> >> >> >> turn off is a little bit "too good", if you know what I mean,
>> >>> >> >> >> so that I am not sure there is anything particularly unique
>> >>> >> >> >> about the node I choose. But, nevertheless, turning it off
>> >>> >> >> >> seems to help, generally.
>> >>> >> >> >>
>> >>> >> >> >> FWIW, I'm including olsrd versions here. The central machine
>> >>> >> >> >> ::407 is running 0.6.6.1, compiled from the tarball. The nodes
>> >>> >> >> >> have the following versions, all built from openwrt routing
>> >>> >> >> >> feed sources.
>> >>> >> >> >>
>> >>> >> >> >> https://personaltelco.net/~russell/olsrd/olsrd-versions-by-node.txt
>> >>> >> >> >>
>> >>> >> >> >> Here is a table listing the frequency of each openwrt
>> >>> >> >> >> version:
>> >>> >> >> >>
>> >>> >> >> >>      1 0.6.3-3
>> >>> >> >> >>     33 0.6.4-1
>> >>> >> >> >>      1 0.6.5.1-1
>> >>> >> >> >>      1 0.6.5.1-2
>> >>> >> >> >>      7 0.6.5.2-1
>> >>> >> >> >>      1 0.6.5.3-1
>> >>> >> >> >>      2 0.6.5.4-1
>> >>> >> >> >>      2 0.6.6-2
>> >>> >> >> >>      7 0.6.6-3
>> >>> >> >> >>     11 0.6.6.1-1
>> >>> >> >> >>
>> >>> >> >> >>
>> >>> >> >> >> --
>> >>> >> >> >> Russell Senior, President
>> >>> >> >> >> (spam-protected)
>> >>> >> >> >>
>> >>> >> >> >> --
>> >>> >> >> >> Olsr-users mailing list
>> >>> >> >> >> (spam-protected)
>> >>> >> >> >> https://lists.olsr.org/mailman/listinfo/olsr-users
>> >>> >> >> >
>> >>> >> >> > --
>> >>> >> >> > Olsr-dev mailing list
>> >>> >> >> > (spam-protected)
>> >>> >> >> > https://lists.olsr.org/mailman/listinfo/olsr-dev
>> >>> >> >
>> >>> >> >
>> >>> >
>> >>> >
>> >>
>> >>
>> >
>
>