[Olsr-users] Route on repeater nodes sometimes break when gateway node reboots

Ben West (spam-protected)
Thu Mar 27 02:47:55 CET 2014


The specific failure mode I'm seeing is that, after the gateway node
undergoes an unscheduled reboot, the affected repeater node is only able to
send out ICMP ping packets beyond the WAN interface of the gateway node.
Other packets, e.g. "nslookup somedomain.com", don't go through.

Again, this failure mode does not occur consistently, and I can't
reproduce it easily.

Since this also fails for new commands entered at the shell prompt, such as
nslookup, I am assuming it is not a problem with existing TCP sessions
becoming stuck or orphaned.
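Because restarting olsrd on the affected repeater consistently clears the problem, one stopgap is a watchdog on each repeater. It would look for exactly this symptom (ICMP beyond the gateway's WAN interface works, name resolution does not) and restart olsrd only after several consecutive bad checks. The Python sketch below is hypothetical and untested on these nodes. The ping target 8.8.8.8, the test name, the 3-check threshold, the 60-second interval, the `ping -c 1 -W 2` invocation and `/etc/init.d/olsrd restart` as the restart command are all assumptions. It only works around the symptom and does not explain why some repeaters keep a stale route after the gateway reboots.

```python
"""Hypothetical repeater-side watchdog: restart olsrd when ICMP still
reaches the WAN but DNS lookups fail for several consecutive checks."""
import socket
import subprocess
import time
from dataclasses import dataclass

# All values below are illustrative assumptions, not olsrd settings.
WAN_PING_TARGET = "8.8.8.8"          # assumed host beyond the gateway's WAN side
DNS_TEST_NAME = "somedomain.com"     # placeholder name, as in the report above
FAILURE_THRESHOLD = 3                # consecutive bad checks before acting
CHECK_INTERVAL_S = 60                # roughly the gateway re-selection delay
RESTART_CMD = ["/etc/init.d/olsrd", "restart"]  # assumed OpenWrt init script


@dataclass
class WatchdogState:
    consecutive_failures: int = 0


def ping_ok(host: str) -> bool:
    """Send one ICMP echo; assumes a ping binary that accepts -c and -W."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "2", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:  # e.g. no ping binary on this system
        return False
    return result.returncode == 0


def dns_ok(name: str) -> bool:
    """Return True if the system resolver can resolve `name`."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def should_restart(state: WatchdogState, icmp_ok: bool, dns_ok_now: bool,
                   threshold: int = FAILURE_THRESHOLD) -> bool:
    """Count checks matching the reported symptom (ICMP works, DNS fails).

    Any other outcome resets the counter: during a full outage a restart
    cannot help, because the gateway itself is still unreachable.
    """
    if icmp_ok and not dns_ok_now:
        state.consecutive_failures += 1
    else:
        state.consecutive_failures = 0
    if state.consecutive_failures >= threshold:
        state.consecutive_failures = 0  # start counting afresh after a restart
        return True
    return False


def main() -> None:
    state = WatchdogState()
    while True:
        if should_restart(state, ping_ok(WAN_PING_TARGET), dns_ok(DNS_TEST_NAME)):
            subprocess.run(RESTART_CMD, check=False)
        time.sleep(CHECK_INTERVAL_S)


if __name__ == "__main__":
    main()
```

Keeping the decision in `should_restart` separate from the probes makes the policy easy to check offline. A local dnsmasq may answer cached names without going upstream, so the probe name should be one that is unlikely to be cached. OpenWrt images of this vintage rarely ship Python, so on the nodes themselves this logic would more likely be ported to a BusyBox shell script.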


On Wed, Mar 26, 2014 at 1:30 PM, Ferry Huberts <(spam-protected)> wrote:

>
>
> On 26/03/14 17:32, Ben West wrote:
>
>> With regard to situations where a gateway node undergoes an uncommanded
>> reboot (e.g. a brief power failure), is there a preferred approach for
>> ensuring that all repeater nodes within that mesh have their NAT states
>> refreshed as needed when SmartGateway is in use?
>>
>> For example, if the repeater nodes detect their selected gateway node
>> rebooting (i.e. becoming unavailable for a few minutes), or even a new
>> gateway node coming online, should they restart their local instance of
>> olsrd as a result?
>>
>> Or would the best approach perhaps be to upgrade to olsrd 0.6.6, using
>> the same config?  I do see these entries in the ChangeLog that might be
>> useful:
>>
>>        kernel_route: olsr_os_inetgw_tunnel_route can now take the table
>>        gateway: let the gateway code determine the tunnel name
>>        gateway: remove the worst gateway before adding new one
>>        gateway: add SmartGatewayUseCount configuration parameter
>>        gateway: use SmartGatewayUseCount setting the the gateway lists
>>        gateway: add SmartGatewayEgressInterfaces configuration parameter
>>        gateway: add SmartGatewayMarkOffset{Egress,Tunnels} configuration
>>           parameters
>>        gateway: add SmartGatewayPolicyRoutingScript configuration
>>           parameter
>>        gateway: initialise a set of fixed tunnel names in/for
>>           multi-gateway mode
>>        gateway: initialise the egress interface names in/for
>>           multi-gateway mode
>>        gateway: use fixed tunnel names in/for multi-gateway mode
>>        gateway: setup and clear table specific default routes in/for
>>           multi-gateway mode
>>        gateway: setup/cleanup multi-gateway mode during startup/shutdown
>>           of olsrd
>>        gateway: introduce and use MULTI_GW_MODE define
>>        gateway: enable multi-gateway mode
>>
>> Besides that, I have now deployed this param to all nodes, and disabled
>> dyn_gw_plain plugin.  However, it looks like a few repeater nodes (not
>> all, mysteriously) still see their route to the WAN, beyond the gateway
>> node, break when the gateway node reboots.
>>
>>
> What do you mean by 'break'?
>
> When the gateway reboots, traffic to the WAN obviously can't proceed.
> What happens between the breakage and the selection of a new gateway
> (at least 1 minute in your version) then depends on the application
> that initiated the connection.
>
>
>> LoadPlugin "olsrd_dyn_gw.so.0.5"
>> {
>>      PlParam "HNA" "0.0.0.0 0.0.0.0"
>> }
>>
>> On Mon, Mar 24, 2014 at 2:55 PM, Teco Boot <(spam-protected)> wrote:
>>
>>     Is the traffic from the same connection?
>>     If so, the NAT state is gone after a reboot and the connection must be
>>     restarted. Smart gateway cannot fix all problems.
>>
>>     Teco
>>
>>
>>     On 24 Mar 2014, at 19:14, Ben West <(spam-protected)> wrote:
>>
>>>     Ferry pointed this out off-list.  I've since removed dyn_gw_plain
>>>     on the nodes where I was testing, and am trying to see if the
>>>     problem can be repeated.
>>>
>>>
>>>     On Mon, Mar 24, 2014 at 1:11 PM, Teco Boot <(spam-protected)> wrote:
>>>
>>>         Regarding the original posting: why use both dyn_gw and dyn_gw_plain?
>>>
>>>         Teco
>>>
>>>
>>>         On 24 Mar 2014, at 02:39, Ben West <(spam-protected)> wrote:
>>>
>>>>         I have seen sporadic instances of certain repeater nodes
>>>>         (not all, generally a small subset of all repeater nodes in a
>>>>         given mesh) breaking their route through the gateway node when
>>>>         the gateway node reboots but the repeater does not.
>>>>
>>>>         That is, the gateway node reboots, and the affected repeater
>>>>         node thereafter appears to correctly re-establish its route
>>>>         through the gateway, but the gateway doesn't actually route the
>>>>         repeater's traffic.  From the affected node, I can ping the
>>>>         gateway's mesh IP and also the gateway's WAN IP, but I can't
>>>>         ping anything beyond the gateway node's WAN interface.
>>>>
>>>>         Restarting olsrd on the repeater node seems to resolve this
>>>>         problem consistently.
>>>>
>>>>         This is occurring on nodes running OpenWRT AA r39154 and
>>>>         OLSRd v6.5-4, using SmartGateway.  I'm quoting my
>>>>         /etc/config/olsrd below, used on all nodes alike.
>>>>
>>>>         Has anyone else observed a similar problem?  Browsing the
>>>>         changelog at http://olsr.org/git/ since v6.5-4 doesn't show
>>>>         any mention of explicit SmartGateway bugfixes, just
>>>>         additional features.
>>>>
>>>>         -----
>>>>         config olsrd
>>>>             # uncomment the following line to use a custom config file instead:
>>>>             #option config_file '/etc/olsrd.conf'
>>>>
>>>>             option 'IpVersion' '4'
>>>>             option 'LinkQualityLevel' '2'
>>>>             option 'LinkQualityAlgorithm' 'etx_ffeth'
>>>>             option 'SmartGateway' 'yes'
>>>>             option 'Pollrate' '0.1'
>>>>             option 'TcRedundancy' '2'
>>>>             option 'MprCoverage' '5'
>>>>
>>>>         config 'LoadPlugin'
>>>>             option 'library' 'olsrd_arprefresh.so.0.1'
>>>>
>>>>         config 'LoadPlugin'
>>>>             option 'library' 'olsrd_dyn_gw.so.0.5'
>>>>
>>>>         config 'LoadPlugin'
>>>>             option 'library' 'olsrd_dyn_gw_plain.so.0.4'
>>>>
>>>>         config 'LoadPlugin'
>>>>             option 'library' 'olsrd_nameservice.so.0.3'
>>>>             #option 'resolv_file' '/tmp/resolv.conf.auto'
>>>>             option 'sighup_pid_file' '/var/run/dnsmasq.pid'
>>>>             option 'suffix' '.mesh'
>>>>
>>>>         config 'LoadPlugin'
>>>>             option 'library' 'olsrd_txtinfo.so.0.1'
>>>>             option 'accept' '0.0.0.0'
>>>>
>>>>         config 'Interface'
>>>>             list 'interface' 'mesh'
>>>>             option 'Ip4Broadcast' '255.255.255.255'
>>>>             option 'Mode' 'mesh'
>>>>         #
>>>>
>>>>
>>>>         --
>>>>         Ben West
>>>>         http://gowasabi.net
>>>>         (spam-protected)
>>>>         314-246-9434
>>>>         --
>>>>         Olsr-users mailing list
>>>>         (spam-protected)
>>>>         https://lists.olsr.org/mailman/listinfo/olsr-users
>>>>
>>>
>>
>>
> --
> Ferry Huberts
>


