<div dir="ltr">Just to clarify: "ptp" is the client's OpenVPN interface, and "vpn" is the server's OpenVPN interface. "br-pub" is typically the wifi AP interface on the node device (sometimes with an ethernet interface as well).<br>
</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Mar 28, 2014 at 4:01 AM, Russell Senior <span dir="ltr"><<a href="mailto:russell@personaltelco.net" target="_blank">russell@personaltelco.net</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div>I should have done this earlier, but here are my olsrd.conf files. On the server:<br>
<br>===================================================<br>IpVersion 6<br><br></div><div>#Hna4<br>#{<br>
#}<br></div><div><br>Hna6<br>{<br> 0:: 0<br>}<br><br>LinkQualityFishEye 0<br><br>LoadPlugin "olsrd_txtinfo.so.0.1"<br>{<br> PlParam "port" "7862"<br>}<br><br>#############################################<br>
### OLSRD default interface configuration ###<br>#############################################<br># the default interface section can have the same values as the following<br># interface configuration. It will allow you to set common options for all<br>
# interfaces.<br><br>InterfaceDefaults {<br> # Ip4Broadcast 255.255.255.255<br>}<br><br>Interface "ptp" "ptp-udp" "vpn" "iris"<br>{<br># Mode "ether"<br>
}<br>=====================================================<br><br></div><div>I am pretty sure that the Hna4 { } part had been there, uncommented, for a while. The Mode "ether" line was uncommented too. When I commented them out, as above, and restarted, I saw the individual routes on the client, as you would expect. I had noticed the "route aggregation" and been a little surprised, but having just moved to a newer version, I wasn't too suspicious.<br>
</div><div><br></div>On the clients:<br><br>=====================================================<div class=""><br>IpVersion 6<br><br>LinkQualityFishEye 0<br><br>Hna6<br>{<br></div> 2001:470:e962:xxyy:: 64<br>}<br>
<br>LoadPlugin "olsrd_txtinfo.so.0.1"<br>
{<br> PlParam "port" "7862"<br>}<br> <br>Interface "br-pub" "ptp"<br>{<br>}<br>=====================================================<br><br></div>When it's working, I see 177 olsrd routes (the 180 figure included some header/footer lines, apparently) on the server and 176 on the client. But if I add another node, the routes still all collapse. It is confusing, though. Sometimes I see only two routes, as below, apparently when Mode "ether" is in force, yet at other times I was seeing the more complete client routing table even with Mode "ether". <br>
<br>Table: Routes<br>Destination Gateway IP Metric ETX Interface<br>::/0 2001:470:e962::407 1 1.000 ptp<br>2001:470:e962::407/128 2001:470:e962::407 1 1.000 ptp<br><br></div>I am turning Mode "ether" off again, and I seem to get a complete set of routes (one less than the server) on the clients.<br>
<br></div>Again, though, if I add one more node, the routes on both the server and clients collapse. The clients go to zero. The server has routes to one or sometimes two clients, which vary a little bit.<br><div><div>
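<div>A minimal sketch of how the route counts discussed here could be checked automatically, assuming the txtinfo routes table is whitespace-separated as in the output quoted above. The helper names parse_routes and summarize are hypothetical and not part of olsrd; the sample text is the two-route client table from this message.<br></div>
<pre>
```python
import ipaddress

# Sample copied from the client "Table: Routes" output quoted in this message.
SAMPLE = (
    "Table: Routes\n"
    "Destination\tGateway IP\tMetric\tETX\tInterface\n"
    "::/0\t2001:470:e962::407\t1\t1.000\tptp\n"
    "2001:470:e962::407/128\t2001:470:e962::407\t1\t1.000\tptp\n"
)


def parse_routes(text):
    """Return (network, gateway, interface) tuples for each route row."""
    routes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # title, header ("Gateway IP" splits in two) or blank line
        try:
            network = ipaddress.ip_network(fields[0], strict=False)
            gateway = ipaddress.ip_address(fields[1])
        except ValueError:
            continue  # not a route row
        routes.append((network, gateway, fields[4]))
    return routes


def summarize(routes):
    """Count routes, full-length host routes, and check for a default route."""
    return {
        "total": len(routes),
        "host_routes": sum(
            1 for net, _, _ in routes if net.prefixlen == net.max_prefixlen
        ),
        "has_default": any(net.prefixlen == 0 for net, _, _ in routes),
    }


if __name__ == "__main__":
    print(summarize(parse_routes(SAMPLE)))
```
</pre>
<div>On a live node the same text could be fetched with curl -6 http://localhost:$port/routes and passed to parse_routes; a sudden drop in "total" between polls would mark the moment the routes collapse.<br></div>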
<br>
<div><div><br></div></div></div></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Mar 28, 2014 at 3:09 AM, Henning Rogge <span dir="ltr"><<a href="mailto:hrogge@gmail.com" target="_blank">hrogge@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Each leaf should have a /128 route for each other leaf...<br>
<br>
Olsrd does NOT do any route aggregation.<br>
<br>
Can you show me a routing table of a leaf and the txtinfo output when<br>
everything is fine?<br>
<br>
Henning<br>
<br>
On Fri, Mar 28, 2014 at 11:06 AM, Russell Senior<br>
<div><div><<a href="mailto:russell@personaltelco.net" target="_blank">russell@personaltelco.net</a>> wrote:<br>
> FWIW, the ipv6 routing tables on the "leaf" nodes are quite short, with<br>
> mostly just a default route pointing at the central server, when olsrd is<br>
> working. When the central server has the route collapse, the default route<br>
> on the "leaf" nodes disappears.<br>
><br>
> I am thinking about memory exhaustion, maybe something is helpfully killing<br>
> it off when the size becomes "too large" ... /me goes to look for evidence<br>
> of that.<br>
><br>
><br>
> On Fri, Mar 28, 2014 at 3:03 AM, Russell Senior <<a href="mailto:russell@personaltelco.net" target="_blank">russell@personaltelco.net</a>><br>
> wrote:<br>
>><br>
>> They are single hop from the central server, whose table is the one I've been<br>
>> posting.<br>
>><br>
>><br>
>> On Fri, Mar 28, 2014 at 3:01 AM, Henning Rogge <<a href="mailto:hrogge@gmail.com" target="_blank">hrogge@gmail.com</a>> wrote:<br>
>>><br>
>>> What?<br>
>>><br>
>>> but your routing tables only contain "ETX 1.0" paths... which means<br>
>>> they are single hop!<br>
>>><br>
>>> Henning<br>
>>><br>
>>> On Fri, Mar 28, 2014 at 11:00 AM, Russell Senior<br>
>>> <<a href="mailto:russell@personaltelco.net" target="_blank">russell@personaltelco.net</a>> wrote:<br>
>>> > Without the ipv6 olsrd, the nodes can't route to each other, it seems.<br>
>>> > I picked two I had turned off, and tried ping6'ing between them and got<br>
>>> > 100% packet loss.<br>
>>> ><br>
>>> ><br>
>>> > On Fri, Mar 28, 2014 at 2:54 AM, Henning Rogge <<a href="mailto:hrogge@gmail.com" target="_blank">hrogge@gmail.com</a>><br>
>>> > wrote:<br>
>>> >><br>
>>> >> Hi,<br>
>>> >><br>
>>> >> as far as I can see each "leaf" node can see each other leaf node over<br>
>>> >> the OpenVPN, right?<br>
>>> >><br>
>>> >> So you are only using Olsrd to distribute HNAs?<br>
>>> >><br>
>>> >> Henning Rogge<br>
>>> >><br>
>>> >> On Fri, Mar 28, 2014 at 10:48 AM, Russell Senior<br>
>>> >> <<a href="mailto:russell@personaltelco.net" target="_blank">russell@personaltelco.net</a>> wrote:<br>
>>> >> > The central server, ::407, is running OpenVPN in server mode. The<br>
>>> >> > "leaf" nodes all connect to it via OpenVPN client mode with a tap<br>
>>> >> > interface. We statically provision the IPv6 addresses on the vpn.<br>
>>> >> ><br>
>>> >> > And yes, the OpenVPN links are still active. We are running an IPv4<br>
>>> >> > instance of olsrd (same version) in parallel and those routes (to the<br>
>>> >> > very same devices) are not affected.<br>
>>> >> ><br>
>>> >> > We see the problem when particular (though varying) nodes' olsrd ipv6<br>
>>> >> > instances are started/stopped. Sometimes the nodes are running 0.6.6.1,<br>
>>> >> > and sometimes 0.6.4. It doesn't seem to be version-specific. The central<br>
>>> >> > server is running 0.6.6.1 now, but we saw the same thing earlier (which<br>
>>> >> > is why I upgraded) on 0.6.4.<br>
>>> >> ><br>
>>> >> > One other potential clue (it doesn't make very much sense, because I<br>
>>> >> > know there are much bigger networks than ours): I've never seen more<br>
>>> >> > than 186 ipv6 routes on ::407. We seem to see the problem when we try<br>
>>> >> > to exceed that. I'm going to try to confirm that.<br>
>>> >> ><br>
>>> >> ><br>
>>> >> > On Fri, Mar 28, 2014 at 2:34 AM, Henning Rogge <<a href="mailto:hrogge@gmail.com" target="_blank">hrogge@gmail.com</a>><br>
>>> >> > wrote:<br>
>>> >> >><br>
>>> >> >> Hi,<br>
>>> >> >><br>
>>> >> >> I must admit that I am not convinced that what we are seeing is an<br>
>>> >> >> Olsrd bug...<br>
>>> >> >><br>
>>> >> >> If I see it correctly Olsrd is running over the VPN interface<br>
>>> >> >> connection (interface name "vpn"), right?<br>
>>> >> >><br>
>>> >> >> Is the VPN connection between the nodes still active during the route<br>
>>> >> >> loss? Most of the nodes seem to have direct connections and the "30<br>
>>> >> >> seconds until recovery" sounds like an ETX value slowly going down and<br>
>>> >> >> then dropping the link.<br>
>>> >> >><br>
>>> >> >> Henning<br>
>>> >> >><br>
>>> >> >> On Fri, Mar 28, 2014 at 10:11 AM, Saverio Proto<br>
>>> >> >> <<a href="mailto:zioproto@gmail.com" target="_blank">zioproto@gmail.com</a>><br>
>>> >> >> wrote:<br>
>>> >> >> > Hello Russell,<br>
>>> >> >> ><br>
>>> >> >> > looking at this:<br>
>>> >> >> ><br>
>>> >> >> > <a href="https://personaltelco.net/~russell/olsrd/olsrd-routes-before.txt" target="_blank">https://personaltelco.net/~russell/olsrd/olsrd-routes-before.txt</a><br>
>>> >> >> ><br>
>>> >> >> > <a href="https://personaltelco.net/~russell/olsrd/olsrd-routes-during.txt" target="_blank">https://personaltelco.net/~russell/olsrd/olsrd-routes-during.txt</a><br>
>>> >> >> > <a href="https://personaltelco.net/~russell/olsrd/olsrd-routes-after.txt" target="_blank">https://personaltelco.net/~russell/olsrd/olsrd-routes-after.txt</a><br>
>>> >> >> ><br>
>>> >> >> > it looks like IPv6 routes are removed from the olsrd database. So it<br>
>>> >> >> > is actually the olsrd daemon that is involved.<br>
>>> >> >> ><br>
>>> >> >> > Do you know if there is a previous stable version of olsrd where this<br>
>>> >> >> > bug/behaviour is not present?<br>
>>> >> >> ><br>
>>> >> >> > In my opinion the fastest way to track down the bug is to try different<br>
>>> >> >> > versions of olsrd with the "git bisect" method.<br>
>>> >> >> ><br>
>>> >> >> > The first step is to tell us if there is a version of olsrd that is<br>
>>> >> >> > not affected by this problem.<br>
>>> >> >> ><br>
>>> >> >> > thanks<br>
>>> >> >> ><br>
>>> >> >> > I cc: olsrd-dev<br>
>>> >> >> ><br>
>>> >> >> > Saverio<br>
>>> >> >> ><br>
>>> >> >> ><br>
>>> >> >> > 2014-03-27 10:37 GMT+01:00 Russell Senior<br>
>>> >> >> > <<a href="mailto:russell@personaltelco.net" target="_blank">russell@personaltelco.net</a>>:<br>
>>> >> >> >>>>>>> "Henning" == Henning Rogge<br>
>>> >> >> >>>>>>> <<a href="mailto:henning.rogge@fkie.fraunhofer.de" target="_blank">henning.rogge@fkie.fraunhofer.de</a>><br>
>>> >> >> >>>>>>> writes:<br>
>>> >> >> >><br>
>>> >> >> >> Henning> On 03/26/2014 07:41 PM, Russell Senior wrote:<br>
>>> >> >> >>>> Anybody get a chance to look at the strace? I see a:<br>
>>> >> >> >><br>
>>> >> >> >> Henning> strace and packet dumps are much too low-level to directly<br>
>>> >> >> >> Henning> hunt problems like this. That's why Saverio's question about<br>
>>> >> >> >> Henning> txtinfo is good, because it gives you a much more high-level<br>
>>> >> >> >> Henning> view of what is going on.<br>
>>> >> >> >><br>
>>> >> >> >> I had not installed the modules previously, so that interface wasn't<br>
>>> >> >> >> immediately available. It is now.<br>
>>> >> >> >><br>
>>> >> >> >> [...]<br>
>>> >> >> >><br>
>>> >> >> >> Henning> Okay, let's get back to the high-level view.<br>
>>> >> >> >><br>
>>> >> >> >> Henning> To interpret the events you described we need a list of<br>
>>> >> >> >> Henning> nodes, with their interface IPs and the connectivity between<br>
>>> >> >> >> Henning> them.<br>
>>> >> >> >><br>
>>> >> >> >> Here is the list of neighbors of 2001:470:e962::407. The addresses<br>
>>> >> >> >> listed are on the public wifi. The OpenVPN addresses of each node are<br>
>>> >> >> >> a permutation, e.g. if the public wifi addr is 2001:470:e962:wxyz::1,<br>
>>> >> >> >> then the OpenVPN address of the node is 2001:470:e962::wxyz.<br>
>>> >> >> >><br>
>>> >> >> >> None of the nodes connect directly, everything goes through ::407.<br>
>>> >> >> >><br>
>>> >> >> >> From curl -6 http://localhost:$port/neighbors<br>
>>> >> >> >><br>
>>> >> >> >> <a href="https://personaltelco.net/~russell/olsrd/olsrd-neighbors.txt" target="_blank">https://personaltelco.net/~russell/olsrd/olsrd-neighbors.txt</a><br>
>>> >> >> >><br>
>>> >> >> >> Henning> I am also a bit worried about your usage of bridges<br>
>>> >> >> >> Henning> connected to mesh interfaces. Normally you should not bridge<br>
>>> >> >> >> Henning> any interface that OLSR uses for meshing. Mixing routing<br>
>>> >> >> >> Henning> (L3) and bridging (L2) can go wrong in very creative ways.<br>
>>> >> >> >><br>
>>> >> >> >> I don't understand how the bridges could be a problem in this case.<br>
>>> >> >> >> This is a hub and spoke topology. One openvpn server in the middle,<br>
>>> >> >> >> nodes at the edges. None of the nodes interconnect otherwise. Olsr<br>
>>> >> >> >> is broadcast on the wifi in case there are any olsrd devices nearby,<br>
>>> >> >> >> but, again, there is no overlap in the wifi coverage (and if there<br>
>>> >> >> >> were physically, they are on different SSIDs and wouldn't overlap<br>
>>> >> >> >> logically).<br>
>>> >> >> >><br>
>>> >> >> >> Can you explain more about what in particular would make you worry?<br>
>>> >> >> >> This configuration has been stable for us on ipv4 for years and also<br>
>>> >> >> >> on ipv6 until very recently, since late 2012 at least. So, I suspect<br>
>>> >> >> >> a bug. Somewhere.<br>
>>> >> >> >><br>
>>> >> >> >> Henning> Txtinfo output (especially /route) would be good to see...<br>
>>> >> >> >> Henning> before the problem, during the problem and after the<br>
>>> >> >> >> Henning> recovery.<br>
>>> >> >> >><br>
>>> >> >> >> I'm using curl -6 http://localhost:$port/routes to get the following<br>
>>> >> >> >> data, before, during and after turning on an ipv6 olsrd on a<br>
>>> >> >> >> particular node (2001:470:e962:11c1::1).<br>
>>> >> >> >><br>
>>> >> >> >><br>
>>> >> >> >> <a href="https://personaltelco.net/~russell/olsrd/olsrd-routes-before.txt" target="_blank">https://personaltelco.net/~russell/olsrd/olsrd-routes-before.txt</a><br>
>>> >> >> >><br>
>>> >> >> >> <a href="https://personaltelco.net/~russell/olsrd/olsrd-routes-during.txt" target="_blank">https://personaltelco.net/~russell/olsrd/olsrd-routes-during.txt</a><br>
>>> >> >> >><br>
>>> >> >> >> <a href="https://personaltelco.net/~russell/olsrd/olsrd-routes-after.txt" target="_blank">https://personaltelco.net/~russell/olsrd/olsrd-routes-after.txt</a><br>
>>> >> >> >><br>
>>> >> >> >> Henning> It would also help if you can reduce the number of nodes<br>
>>> >> >> >> Henning> to a minimum while still replicating the problem.<br>
>>> >> >> >><br>
>>> >> >> >> I don't have that level of control, unfortunately. When I notice that<br>
>>> >> >> >> the ipv6 routes have collapsed, I pick a likely seeming node (maybe<br>
>>> >> >> >> because it had been plugged in recently) and turn off ipv6 olsrd, and<br>
>>> >> >> >> over 30-60 seconds, magically the routes all come back. My luck in<br>
>>> >> >> >> guessing the right node to turn off is a little bit "too good", if you<br>
>>> >> >> >> know what I mean, so that I am not sure there is anything particularly<br>
>>> >> >> >> unique about the node I choose. But, nevertheless, turning it off<br>
>>> >> >> >> seems to help, generally.<br>
>>> >> >> >><br>
>>> >> >> >> FWIW, I'm including olsrd versions here. The central machine ::407 is<br>
>>> >> >> >> running 0.6.6.1, compiled from the tarball. The nodes have the<br>
>>> >> >> >> following versions, all built from openwrt routing feed sources.<br>
>>> >> >> >><br>
>>> >> >> >><br>
>>> >> >> >><br>
>>> >> >> >> <a href="https://personaltelco.net/~russell/olsrd/olsrd-versions-by-node.txt" target="_blank">https://personaltelco.net/~russell/olsrd/olsrd-versions-by-node.txt</a><br>
>>> >> >> >><br>
>>> >> >> >> Here is a table listing the frequency of each openwrt version:<br>
>>> >> >> >><br>
>>> >> >> >> 1 0.6.3-3<br>
>>> >> >> >> 33 0.6.4-1<br>
>>> >> >> >> 1 0.6.5.1-1<br>
>>> >> >> >> 1 0.6.5.1-2<br>
>>> >> >> >> 7 0.6.5.2-1<br>
>>> >> >> >> 1 0.6.5.3-1<br>
>>> >> >> >> 2 0.6.5.4-1<br>
>>> >> >> >> 2 0.6.6-2<br>
>>> >> >> >> 7 0.6.6-3<br>
>>> >> >> >> 11 0.6.6.1-1<br>
>>> >> >> >><br>
>>> >> >> >><br>
>>> >> >> >> --<br>
>>> >> >> >> Russell Senior, President<br>
>>> >> >> >> <a href="mailto:russell@personaltelco.net" target="_blank">russell@personaltelco.net</a><br>
>>> >> >> >><br>
>>> >> >> >> --<br>
>>> >> >> >> Olsr-users mailing list<br>
>>> >> >> >> <a href="mailto:Olsr-users@lists.olsr.org" target="_blank">Olsr-users@lists.olsr.org</a><br>
>>> >> >> >> <a href="https://lists.olsr.org/mailman/listinfo/olsr-users" target="_blank">https://lists.olsr.org/mailman/listinfo/olsr-users</a><br>
>>> >> >> ><br>
>>> >> >> > --<br>
>>> >> >> > Olsr-dev mailing list<br>
>>> >> >> > <a href="mailto:Olsr-dev@lists.olsr.org" target="_blank">Olsr-dev@lists.olsr.org</a><br>
>>> >> >> > <a href="https://lists.olsr.org/mailman/listinfo/olsr-dev" target="_blank">https://lists.olsr.org/mailman/listinfo/olsr-dev</a><br>
>>> >> ><br>
>>> >> ><br>
>>> ><br>
>>> ><br>
>><br>
>><br>
><br>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>