[Olsr-users] INFINITE cost in all links

Antonio Anselmi (spam-protected)
Tue Jul 29 14:37:55 CEST 2014


hi Ben, nice to read you again!

Running olsrd with the minimal (default) configuration will be the first
step for debugging, together with compiling olsrd with the flag
NO_DEBUG_MESSAGES=0 (OpenWrt Makefile). This way, as soon as the condition
occurs, we can grab the debug lines for off-line examination against a
well-known configuration.
Unfortunately, we can't reproduce the "ETX=infinite" situation on demand,
and the affected networks are all production networks serving several user
connections.
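
Since we can't trigger the condition ourselves, something has to notice it
when it happens. Below is a minimal sketch of such a check, not something
we run today. It assumes the txtinfo plugin is loaded and answers a "/links"
request on 127.0.0.1:2006 (host, port and request path are assumptions, use
whatever your olsrd.conf says), and that every data row of the links table
starts with an IPv4 address and ends with the cost column. The function
names are hypothetical.

```python
import socket


def fetch_txtinfo(host="127.0.0.1", port=2006, path="/links", timeout=5.0):
    """Read one txtinfo reply. host, port and path are assumptions."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(f"{path}\n".encode())
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:  # plugin closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")


def all_links_infinite(links_text):
    """True if at least one link row exists and every row costs INFINITE."""
    costs = []
    for line in links_text.splitlines():
        fields = line.split()
        # Assumed layout: data rows start with an IPv4 address and the
        # last column is the link cost; headers and blank lines are skipped.
        if len(fields) >= 2 and fields[0][0].isdigit() and fields[0].count(".") == 3:
            costs.append(fields[-1])
    return bool(costs) and all(c == "INFINITE" for c in costs)
```

Calling all_links_infinite(fetch_txtinfo()) every few seconds from a small
loop would tell us when to save the debug output and start tcpdump.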

By the way, we noticed that "ETX=infinite" occurs more frequently when the
olsrd gateway handles about a hundred connections from users associated
with the APs of the mesh nodes (every mesh node runs a second VAP in master
mode), so the rate of the error seems to be related to the
traffic/connection load. The more a node is under load, the higher its
latency; then it begins to lose packets and its LQ decreases, routes change
and TC messages propagate the changes. Maybe incorrect timings could cause
slow refreshes of the routing tables, so that a loop could occur? But on all
the nodes, and all at the same time? Unlikely. Moreover, the gateway node,
as I wrote, continues to receive olsrd packets, a sign that the other nodes
are alive.
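
For reference, this is how a zero NLQ turns into an INFINITE cost no matter
how good LQ is. It is a sketch of the textbook ETX formula, ETX = 1 / (LQ *
NLQ); the real olsrd link-quality plugins may use different arithmetic and
thresholds, so take the function below (a hypothetical name) as an
illustration only.

```python
import math


def etx_cost(lq, nlq):
    """Textbook ETX: 1 / (LQ * NLQ), infinite if either direction is zero."""
    product = lq * nlq
    if product <= 0.0:
        return math.inf
    return 1.0 / product


print(etx_cost(1.0, 1.0))  # perfect link in both directions: 1.0
print(etx_cost(0.5, 0.5))  # half the packets lost each way: 4.0
print(etx_cost(0.9, 0.0))  # neighbor reports hearing nothing: inf
```

So the gateway hearing the other nodes well (LQ > 0) does not help as long
as they report that they hear nothing from it (NLQ = 0).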

Comments are always welcome!

Antonio




2014-07-28 23:27 GMT+02:00 Ben West <(spam-protected)>:

> Hi Antonio!  I recognize the timing / interval values you have for the
> adhoc interface from ROBIN-MESH firmware based on olsrd v0.5.6.  When I
> tried carrying these timing values over to olsrd v0.6+, I experienced
> sporadic problems with isolated repeater nodes mysteriously losing their
> route (while using SmartGateway) in meshes with multiple gateway nodes.
> Removing those timings seemed to make that particular failure mode go away.
>
> Below is the configuration I use (defaults if not specified), and the
> firmware platform versions:
>
> DebugLevel 0
> AllowNoInt yes
> IpVersion 4
> LinkQualityLevel 2
> LinkQualityAlgorithm "etx_ffeth"
> SmartGateway yes
> SmartGatewayUseCount 1
> Pollrate 0.1
>
> LoadPlugin "olsrd_arprefresh.so.0.1"
> {
> }
>
> LoadPlugin "olsrd_dyn_gw.so.0.5"
> {
>     PlParam "HNA" "0.0.0.0 0.0.0.0"
> }
>
> LoadPlugin "olsrd_txtinfo.so.0.1"
> {
>     PlParam "port" "2006"
>     PlParam "Accept" "127.0.0.1"
> }
>
> Interface "eth0"
> {
>     Mode "ether"
> }
>
> Interface "wlan0-1"
> {
>     Mode "mesh"
>     Ip4Broadcast 255.255.255.255
> }
>
> Hardware:
> UBNT Nanostation Loco M2's
>
> Firmware platform:
> OpenWRT AA r41182
> Kernel v3.3.8
> ath9k v3.3.8+2014-05-22-1
> olsrd v0.6.6.1
>
>
>
> On Fri, Jul 25, 2014 at 5:11 AM, Antonio Anselmi <(spam-protected)>
> wrote:
>
>> Hi all,
>> I'm facing an odd problem that recurs randomly, without any user
>> intervention, at different times on a single-gateway network.
>> Suddenly, all the links of the gateway node have their NLQ values equal
>> to zero and the respective LQ values greater than zero, so every link has
>> an INFINITE cost -> olsrd route table is empty -> ip main route table is
>> empty.
>> Because all NLQ values are zero, it seems that the nodes do not
>> receive/hear olsrd packets from the gateway, but, at the same time, the
>> gateway itself continues to receive and process packets sent by the other
>> nodes (LQ values change over time).
>>
>> In this situation, I logged in to the gateway node and ran the command
>> 'tcpdump -vv -ni wlan0-1 port 698' (wlan0-1 is the adhoc interface) to
>> inspect the way olsrd was working.
>> As expected:
>> 1) the gateway had stopped transmitting olsrd packets but continued to
>> receive and process olsrd packets sent by the other nodes
>> 2) olsrd route table and ip main route table were empty and the gateway
>> could not ping any other node
>> 3) the topology seen by the gateway had no reachable node
>> 4) at layer 2 the neighbors were all associated (iw dev wlan0-1 station
>> dump) so adHoc beacons get through
>> Stopping and restarting olsrd does not fix the stalemate, nor does
>> manually populating the ip main route table. Obviously, after rebooting
>> the node everything works again ...until the next olsrd stall.
>>
>> I'm wondering:
>> 1) Why do all links go to INFINITE cost? (A loop caused by the LQ mechanism?)
>> 2) Could it be that the gateway tries to transmit olsrd packets but,
>> since its route table is empty, no packet reaches the adHoc interface
>> (a loop inside the gateway)?
>> 3) Since the TC-LQ packets sent by the nodes (and received by the
>> gateway) do not show the gateway (NLQ = 0), why do these packets reach the
>> gateway? Nodes should have no route to the gateway since they do not hear
>> it!
>> 4) And... is the all_INFINITE_costs situation (as well as the empty routing
>> tables) a "cause" or an "effect"?
>>
>> Do you have any pointers?
>>
>> Antonio
>>
>> For your information:
>> - network is 17 nodes + 1 gateway node, every node has an AP for wireless
>> user connections
>> - openwrt trunk r37737, kernel 3.10.4
>> - ath9k from kmod-mac80211 3.10.4+2013-06-27-1
>> - olsrd.conf file below
>>
>>
>> DebugLevel 0
>> IpVersion 4
>> AllowNoInt yes
>> Pollrate 0.05
>> TcRedundancy 2
>> MprCoverage 7
>> LinkQualityFishEye 1
>> LinkQualityLevel 2
>> UseHysteresis no
>> NatThreshold 0.5
>>
>> Interface "wlan0-1"
>> {
>>     Ip4Broadcast 255.255.255.255
>>     HelloInterval 6.0
>>     HelloValidityTime 108.0
>>     TcInterval 4.0
>>     TcValidityTime 324.0
>>     MidInterval 18.0
>>     MidValidityTime 324.0
>>     HnaInterval 18.0
>>     HnaValidityTime 108.0
>> }
>>
>> LoadPlugin "olsrd_txtinfo.so.0.1"
>> {
>>     PlParam "port" "8090"
>>     PlParam "Host" "127.0.0.1"
>> }
>>
>> LoadPlugin "olsrd_dot_draw.so.0.3"
>> {
>>    PlParam "port" "2004"
>> }
>>
>> LoadPlugin "olsrd_httpinfo.so.0.1"
>> {
>>     PlParam     "port" "8080"
>>     PlParam     "Net" "0.0.0.0 0.0.0.0"
>> }
>>
>>
>> Hna4
>> {
>>     0.0.0.0   0.0.0.0
>> }
>>
>>
>
>
>
> --
> Ben West
> http://gowasabi.net
> (spam-protected)
> 314-246-9434
>