<div dir="ltr">The central server, ::407, is running OpenVPN in server mode. The "leaf" nodes all connect to it via OpenVPN client mode with a tap interface. We statically provision the IPv6 addresses on the vpn.<br>
<div><br>And yes, the OpenVPN links are still active. We are running an IPv4 instance of olsrd (same version) in parallel and those routes (to the very same devices) are not affected.<br><br></div><div>We see the problem when the olsrd IPv6 instances on particular (though varying) nodes are started or stopped. Sometimes the nodes are running 0.6.6.1, and sometimes 0.6.4, so it doesn't seem to be version-specific. The central server is running 0.6.6.1 now, but we saw the same thing earlier on 0.6.4 (which is why I upgraded).<br>
<br></div><div>One other potential clue (it doesn't make much sense, because I know there are much bigger networks than ours): I've never seen more than 186 IPv6 routes on ::407. We seem to see the problem when we try to exceed that number. I'm going to try to confirm that.<br>
</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Mar 28, 2014 at 2:34 AM, Henning Rogge <span dir="ltr"><<a href="mailto:hrogge@gmail.com" target="_blank">hrogge@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
I must admit that I am not convinced that what we are seeing is an Olsrd bug...<br>
<br>
If I see it correctly Olsrd is running over the VPN interface<br>
connection (interface name "vpn"), right?<br>
<br>
Is the VPN connection between the nodes still active during the route<br>
loss? Most of the nodes seem to have direct connections and the "30<br>
seconds until recovery" sounds like an ETX value slowly going down and<br>
then dropping the link.<br>
<br>
Henning<br>
<div class="HOEnZb"><div class="h5"><br>
On Fri, Mar 28, 2014 at 10:11 AM, Saverio Proto <<a href="mailto:zioproto@gmail.com">zioproto@gmail.com</a>> wrote:<br>
> Hello Russell,<br>
><br>
> looking at this:<br>
> <a href="https://personaltelco.net/~russell/olsrd/olsrd-routes-before.txt" target="_blank">https://personaltelco.net/~russell/olsrd/olsrd-routes-before.txt</a><br>
> <a href="https://personaltelco.net/~russell/olsrd/olsrd-routes-during.txt" target="_blank">https://personaltelco.net/~russell/olsrd/olsrd-routes-during.txt</a><br>
> <a href="https://personaltelco.net/~russell/olsrd/olsrd-routes-after.txt" target="_blank">https://personaltelco.net/~russell/olsrd/olsrd-routes-after.txt</a><br>
><br>
> it looks like IPv6 routes are removed from the olsrd database. So it is<br>
> actually the olsrd daemon that is involved.<br>
><br>
> do you know if there is a previous stable version of olsrd where this<br>
> bug/behaviour is not present?<br>
><br>
> In my opinion the fastest way to track the bug is to try different<br>
> versions of olsrd with the "git bisect" method.<br>
><br>
> The first step is to tell us if there is a version of olsrd that is<br>
> not affected by this problem.<br>
><br>
> thanks<br>
><br>
> I cc: olsrd-dev<br>
><br>
> Saverio<br>
><br>
><br>
> 2014-03-27 10:37 GMT+01:00 Russell Senior <<a href="mailto:russell@personaltelco.net">russell@personaltelco.net</a>>:<br>
>>>>>>> "Henning" == Henning Rogge <<a href="mailto:henning.rogge@fkie.fraunhofer.de">henning.rogge@fkie.fraunhofer.de</a>> writes:<br>
>><br>
>> Henning> On 03/26/2014 07:41 PM, Russell Senior wrote:<br>
>>>> Anybody get a chance to look at the strace? I see a:<br>
>><br>
>> Henning> strace and packet dumps are much too low-level to directly<br>
>> Henning> hunt problems like this. That's why Saverio's question about<br>
>> Henning> txtinfo is good, because it gives you a much more high-level<br>
>> Henning> view on what is going on.<br>
>><br>
>> I had not installed the modules previously, so that interface wasn't<br>
>> immediately available. It is now.<br>
>><br>
>> [...]<br>
>><br>
>> Henning> Okay, let's get back to the high-level view.<br>
>><br>
>> Henning> To interpret the events you described we need a list of<br>
>> Henning> nodes, with their interface IPs and the connectivity between<br>
>> Henning> them.<br>
>><br>
>> Here is the list of neighbors of 2001:470:e962::407. The addresses<br>
>> listed are on the public wifi. The OpenVPN addresses of each node are<br>
>> a permutation, e.g. if the public wifi addr is 2001:470:e962:wxyz::1,<br>
>> then the OpenVPN address of the node is 2001:470:e962::wxyz.<br>
>><br>
>> None of the nodes connect directly, everything goes through ::407.<br>
>><br>
>> From curl -6 http://localhost:$port/neighbors<br>
>><br>
>> <a href="https://personaltelco.net/~russell/olsrd/olsrd-neighbors.txt" target="_blank">https://personaltelco.net/~russell/olsrd/olsrd-neighbors.txt</a><br>
>><br>
>> Henning> I am also a bit worried about your usage of bridges<br>
>> Henning> connected to mesh interfaces. Normally you should not bridge<br>
>> Henning> any interface that OLSR uses for meshing. Mixing routing<br>
>> Henning> (L3) and bridging (L2) can go wrong in very creative ways.<br>
>><br>
>> I don't understand how the bridges could be a problem in this case.<br>
>> This is a hub and spoke topology. One openvpn server in the middle,<br>
>> nodes at the edges. None of the nodes interconnect otherwise. Olsr<br>
>> is broadcast on the wifi in case there are any olsrd devices nearby,<br>
>> but, again, there is no overlap in the wifi coverage (and if there<br>
>> were physically, they are on different SSIDs and wouldn't overlap<br>
>> logically).<br>
>><br>
>> Can you explain more about what in particular would make you worry?<br>
>> This configuration has been stable for us on ipv4 for years and also<br>
>> on ipv6 until very recently, since late 2012 at least. So, I suspect<br>
>> a bug. Somewhere.<br>
>><br>
>> Henning> Txtinfo output (especially /route) would be<br>
>> Henning> good to see... before the problem, during the problem and<br>
>> Henning> after the recovery.<br>
>><br>
>> I'm using curl -6 http://localhost:$port/routes to get the following<br>
>> data, before, during and after turning on an ipv6 olsrd on a<br>
>> particular node (2001:470:e962:11c1::1).<br>
>><br>
>> <a href="https://personaltelco.net/~russell/olsrd/olsrd-routes-before.txt" target="_blank">https://personaltelco.net/~russell/olsrd/olsrd-routes-before.txt</a><br>
>> <a href="https://personaltelco.net/~russell/olsrd/olsrd-routes-during.txt" target="_blank">https://personaltelco.net/~russell/olsrd/olsrd-routes-during.txt</a><br>
>> <a href="https://personaltelco.net/~russell/olsrd/olsrd-routes-after.txt" target="_blank">https://personaltelco.net/~russell/olsrd/olsrd-routes-after.txt</a><br>
>><br>
>> Henning> It would also help if you can reduce the number of nodes<br>
>> Henning> to a minimum while still replicating the problem.<br>
>><br>
>> I don't have that level of control, unfortunately. When I notice that<br>
>> the ipv6 routes have collapsed, I pick a likely seeming node (maybe<br>
>> because it had been plugged in recently) and turn off ipv6 olsrd, and<br>
>> over 30-60 seconds, magically the routes all come back. My luck in<br>
>> guessing the right node to turn off is a little bit "too good", if you<br>
>> know what I mean, so that I am not sure there is anything particularly<br>
>> unique about the node I choose. But, nevertheless, turning it off<br>
>> seems to help, generally.<br>
>><br>
>> FWIW, I'm including olsrd versions here. The central machine ::407 is<br>
>> running 0.6.6.1, compiled from the tarball. The nodes have the<br>
>> following versions, all built from openwrt routing feed sources.<br>
>><br>
>> <a href="https://personaltelco.net/~russell/olsrd/olsrd-versions-by-node.txt" target="_blank">https://personaltelco.net/~russell/olsrd/olsrd-versions-by-node.txt</a><br>
>><br>
>> Here is a table listing the frequency of each openwrt version:<br>
>><br>
>> 1 0.6.3-3<br>
>> 33 0.6.4-1<br>
>> 1 0.6.5.1-1<br>
>> 1 0.6.5.1-2<br>
>> 7 0.6.5.2-1<br>
>> 1 0.6.5.3-1<br>
>> 2 0.6.5.4-1<br>
>> 2 0.6.6-2<br>
>> 7 0.6.6-3<br>
>> 11 0.6.6.1-1<br>
>><br>
>><br>
>> --<br>
>> Russell Senior, President<br>
>> <a href="mailto:russell@personaltelco.net">russell@personaltelco.net</a><br>
>><br>
>> --<br>
>> Olsr-users mailing list<br>
>> <a href="mailto:Olsr-users@lists.olsr.org">Olsr-users@lists.olsr.org</a><br>
>> <a href="https://lists.olsr.org/mailman/listinfo/olsr-users" target="_blank">https://lists.olsr.org/mailman/listinfo/olsr-users</a><br>
><br>
</div></div><span class="HOEnZb"><font color="#888888">> --<br>
> Olsr-dev mailing list<br>
> <a href="mailto:Olsr-dev@lists.olsr.org">Olsr-dev@lists.olsr.org</a><br>
> <a href="https://lists.olsr.org/mailman/listinfo/olsr-dev" target="_blank">https://lists.olsr.org/mailman/listinfo/olsr-dev</a><br>
</font></span></blockquote></div><br></div>