[Olsr-dev] info plugin still send blocking

Joe Ayers (spam-protected)
Mon Dec 4 19:36:10 CET 2017


In reference to:

" * A timer was added and each time it expires each non-empty buffer
  * in this structure will try to write data into a "non-blocking"
  * socket until all data is sent, so that no blocking occurs."

The blocking can be reliably reproduced with olsr 0.9.6.2 on OpenWRT
Chaos Calmer.  When it happens, the node drops off the mesh and stops
responding (not a good thing when it's remote on a tower :) ).

Test scenario:

- A laptop on Node_A's LAN runs "echo /all | nc node_B 9090", sleeps
2 seconds, and repeats (the loop reproduces the problem more quickly,
but a single request may be enough to trigger it)
- Node_A has an RF link to Node_B with ~70% LQ/NLQ (the marginal
LQ/NLQ may not be a factor)
- Node_B runs "echo /nei | nc localhost 2006", sleeps 5 seconds, and
repeats

After a few minutes, Node_B blocks in send(), as shown by strace.  I
then waited ~10 minutes and sent SIGTERM (search for SIGTERM in the
full log file at the URL below).

clock_gettime(CLOCK_MONOTONIC, {281, 819282014}) = 0
accept(13, {sa_family=AF_INET, sin_port=htons(32965),
sin_addr=inet_addr("10.34.163.239")}, [16]) = 17
_newselect(18, [17], NULL, NULL, {0, 20000}) = 1 (in [17], left {0, 19984})
recv(17, "/all\n", 1024, MSG_DONTWAIT)  = 5
time(NULL)                              = 1512345153
clock_gettime(CLOCK_MONOTONIC, {281, 825309152}) = 0
...
clock_gettime(CLOCK_MONOTONIC, {282, 416388829}) = 0
_newselect(18, NULL, [17], NULL, {0, 0}) = 1 (out [17], left {0, 0})
send(17, "{\"pid\": 1726,\"systemTime\": 15123"..., 386040, 0) = 141904

OLSR no longer functions from this point on; the remaining TCP
connections go to CLOSE_WAIT until the socket listen limit is reached.
The send() in the trace above is called with flags 0.  Shouldn't it
pass the MSG_DONTWAIT flag to be non-blocking?
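
To make the question concrete, here is a minimal sketch of a send path
that never sleeps in send().  It is not olsrd's actual code: the struct
pending_out and the function flush_nonblocking() are hypothetical names
made up for this example, and it assumes Linux, where MSG_DONTWAIT and
MSG_NOSIGNAL are available.  The idea is to pass MSG_DONTWAIT on every
call, remember how far a partial write got, and hand control back to
the scheduler when the kernel reports EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical pending-output buffer; not olsrd's real structure. */
struct pending_out {
  const char *data; /* bytes queued for the client */
  size_t len;       /* total number of queued bytes */
  size_t off;       /* bytes already accepted by the kernel */
};

/*
 * Push as many queued bytes as the socket accepts right now.
 * Returns  1 when the whole buffer has been sent,
 *          0 when the socket is full (retry on the next timer tick),
 *         -1 on any other error (caller should close the socket).
 */
static int flush_nonblocking(int fd, struct pending_out *p)
{
  assert(p != NULL && p->off <= p->len);

  while (p->off < p->len) {
    /* MSG_DONTWAIT makes this one call non-blocking even if the
     * descriptor itself was never switched to O_NONBLOCK. */
    ssize_t n = send(fd, p->data + p->off, p->len - p->off,
                     MSG_DONTWAIT | MSG_NOSIGNAL);
    if (n > 0) {
      p->off += (size_t)n; /* partial writes are normal, keep going */
      continue;
    }
    if (n < 0 && errno == EINTR)
      continue; /* interrupted before any data moved, try again */
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
      return 0; /* send buffer full: give up for now, do not block */
    return -1;
  }
  return 1;
}
```

A timer callback would call flush_nonblocking() for each client with
data still queued, free the buffer on 1, keep it on 0, and drop the
connection on -1.  With a slow or stalled peer, as over the marginal RF
link in the test above, the call returns 0 instead of stalling the
whole daemon.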

Full strace log:
https://drive.google.com/file/d/0B2bEy75HhwWhVE1BQ3BUdHY3azg/view?usp=sharing

Joe AE6XE


