[Olsr-dev] info plugin still send blocking

Ferry Huberts (spam-protected)
Tue Dec 5 12:29:07 CET 2017


Yes, sorry for the slow reply.
That patch should probably work well.
To be fair, this blocking send was already present in the plugins before
the conversion to shared info; I just traced it there.
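
For reference, here is a minimal sketch of the kind of non-blocking send
Joe asks about below. It is not Henning's patch and not olsrd code: the
helper name send_nonblocking is hypothetical, and it assumes a Linux
stream socket where MSG_DONTWAIT and MSG_NOSIGNAL are available. It sends
as much as the kernel will accept right now and returns instead of
waiting when the socket buffer is full.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not part of olsrd): try to send up to len bytes
 * from buf on fd without ever blocking.
 *
 * Returns the number of bytes the kernel accepted (possibly fewer than
 * len, possibly 0 when the socket buffer is full), or -1 on a real error.
 */
static ssize_t send_nonblocking(int fd, const char *buf, size_t len)
{
    size_t sent = 0;

    assert(buf != NULL);

    while (sent < len) {
        /* MSG_DONTWAIT: fail with EAGAIN/EWOULDBLOCK instead of sleeping.
         * MSG_NOSIGNAL: return EPIPE instead of raising SIGPIPE. */
        ssize_t n = send(fd, buf + sent, len - sent,
                         MSG_DONTWAIT | MSG_NOSIGNAL);

        if (n > 0) {
            sent += (size_t)n;      /* partial or full write, keep going */
            continue;
        }
        if (n == 0) {
            break;                  /* nothing accepted, let caller retry */
        }
        if (errno == EINTR) {
            continue;               /* interrupted by a signal, retry now */
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            break;                  /* buffer full, stop without blocking */
        }
        return -1;                  /* real error (EPIPE, ECONNRESET, ...) */
    }

    return (ssize_t)sent;
}
```

With something like this, the caller would keep the unsent remainder
(buf + returned count) in its buffer and try again on the next timer
expiry or once select() reports the socket writable, which is the
behaviour the quoted source comment describes.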

On 05/12/17 08:59, Henning Rogge wrote:
> Hi,
> 
> could you test the following patch?
> 
> Henning
> 
> On Mon, Dec 4, 2017 at 7:53 PM, Joe Ayers <(spam-protected)> wrote:
>> Correction, full strace log file URL is:
>>
>> https://drive.google.com/file/d/1TGW5VFpcKppbd82eT72qf6TqTtgqy-0j/view?usp=sharing
>>
>>
>>
>> On Mon, Dec 4, 2017 at 10:36 AM, Joe Ayers <(spam-protected)> wrote:
>>> In reference to:
>>>
>>> " * A timer was added and each time it expires each non-empty buffer
>>>    * in this structure will try to write data into a "non-blocking"
>>>    * socket until all data is sent, so that no blocking occurs."
>>>
>>> A blocking event can be reliably reproduced in olsr 0.9.6.2 in OpenWRT
>>> Chaos Calmer.  The node drops off the mesh and stops responding (not a
>>> good thing when it's remote on a tower :) ).
>>>
>>> Test scenario:
>>>
>>> - Node_A LAN laptop, "echo /all | nc node_B 9090"  sleeps 2 seconds
>>> and repeats (reproduces more quickly, but could be a single hit)
>>> - Node_A has RF link to Node_B with ~70% LQ/NLQ (maybe marginal LQ/NLQ
>>> is a non-factor?)
>>> - Node_B has "echo /nei | nc localhost 2006" sleeps 5 seconds and repeats
>>>
>>> After a few minutes, Node_B blocks on send, as seen in strace. I then
>>> waited ~10 min and sent SIGTERM (search for SIGTERM in the full log
>>> file at the URL below).
>>>
>>> clock_gettime(CLOCK_MONOTONIC, {281, 819282014}) = 0
>>> accept(13, {sa_family=AF_INET, sin_port=htons(32965),
>>> sin_addr=inet_addr("10.34.163.239")}, [16]) = 17
>>> _newselect(18, [17], NULL, NULL, {0, 20000}) = 1 (in [17], left {0, 19984})
>>> recv(17, "/all\n", 1024, MSG_DONTWAIT)  = 5
>>> time(NULL)                              = 1512345153
>>> clock_gettime(CLOCK_MONOTONIC, {281, 825309152}) = 0
>>> ...
>>> clock_gettime(CLOCK_MONOTONIC, {282, 416388829}) = 0
>>> _newselect(18, NULL, [17], NULL, {0, 0}) = 1 (out [17], left {0, 0})
>>> send(17, "{\"pid\": 1726,\"systemTime\": 15123"..., 386040, 0) = 141904
>>>
>>> OLSR no longer functions from this point on; remaining TCP
>>> connections go to CLOSE_WAIT until the socket listen limit is reached.
>>> Shouldn't there be a MSG_DONTWAIT flag in the send to be non-blocking?
>>>
>>> Full strace log:
>>> https://drive.google.com/file/d/0B2bEy75HhwWhVE1BQ3BUdHY3azg/view?usp=sharing
>>>
>>> Joe AE6XE
>>
>> --
>> Olsr-dev mailing list
>> (spam-protected)
>> https://lists.olsr.org/mailman/listinfo/olsr-dev
>>
>>

-- 
Ferry Huberts


