[Olsr-dev] info plugin still send blocking
Joe Ayers
Mon Dec 4 19:36:10 CET 2017
In reference to:
" * A timer was added and each time it expires each non-empty buffer
* in this structure will try to write data into a "non-blocking"
* socket until all data is sent, so that no blocking occurs."
A blocking event can be reliably reproduced with olsr 0.9.6.2 on OpenWRT
Chaos Calmer. The node drops off the mesh and stops responding (not a
good thing when it's remote on a tower :) ).
Test scenario:
- A laptop on Node_A's LAN runs "echo /all | nc node_B 9090" in a loop,
sleeping 2 seconds between runs (the loop reproduces it more quickly, but
a single request could be enough)
- Node_A has an RF link to Node_B with ~70% LQ/NLQ (maybe the marginal
LQ/NLQ is a non-factor?)
- Node_B runs "echo /nei | nc localhost 2006" in a loop, sleeping 5 seconds
between runs
After a few minutes, strace shows Node_B blocking in send(). I then
waited ~10 minutes and sent SIGTERM (search for SIGTERM in the full log
file at the URL below).
clock_gettime(CLOCK_MONOTONIC, {281, 819282014}) = 0
accept(13, {sa_family=AF_INET, sin_port=htons(32965),
sin_addr=inet_addr("10.34.163.239")}, [16]) = 17
_newselect(18, [17], NULL, NULL, {0, 20000}) = 1 (in [17], left {0, 19984})
recv(17, "/all\n", 1024, MSG_DONTWAIT) = 5
time(NULL) = 1512345153
clock_gettime(CLOCK_MONOTONIC, {281, 825309152}) = 0
...
clock_gettime(CLOCK_MONOTONIC, {282, 416388829}) = 0
_newselect(18, NULL, [17], NULL, {0, 0}) = 1 (out [17], left {0, 0})
send(17, "{\"pid\": 1726,\"systemTime\": 15123"..., 386040, 0) = 141904
OLSR no longer functions from this point on. The remaining TCP
connections go to CLOSE_WAIT until the socket listen limit is reached.
Shouldn't there be a MSG_DONTWAIT flag in the send to be non-blocking?
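
To make the question concrete, here is a minimal sketch of what a send that never blocks could look like. This is not the info plugin's actual code: the helper name send_nonblocking() is hypothetical, and it assumes Linux, where MSG_DONTWAIT applies to the single call regardless of whether O_NONBLOCK is set on the socket. The strace above shows send() being called with flags 0; the sketch passes MSG_DONTWAIT instead and stops as soon as the kernel reports EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (an assumption, not taken from olsrd): push as much
 * of buf as the kernel will accept right now, without ever blocking.
 *
 * Returns the number of bytes accepted (possibly 0 when the socket buffer
 * is already full), or -1 on a real error with errno set by send().
 */
static ssize_t send_nonblocking(int fd, const char *buf, size_t len)
{
    size_t sent = 0;

    assert(buf != NULL);

    while (sent < len) {
        /* MSG_DONTWAIT makes this one call non-blocking. */
        ssize_t n = send(fd, buf + sent, len - sent, MSG_DONTWAIT);

        if (n > 0) {
            sent += (size_t)n;      /* partial write: keep going */
        } else if (n == 0) {
            break;                  /* nothing accepted, give up for now */
        } else if (errno == EINTR) {
            continue;               /* interrupted by a signal, retry */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            break;                  /* socket buffer full, try again later */
        } else {
            return -1;              /* real error, e.g. ECONNRESET */
        }
    }
    return (ssize_t)sent;
}
```

With something along these lines, the caller would keep the unsent tail in its buffer and call the helper again the next time the timer expires or select() reports the socket writable, which is how I read the comment quoted at the top.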
Full strace log:
https://drive.google.com/file/d/0B2bEy75HhwWhVE1BQ3BUdHY3azg/view?usp=sharing
Joe AE6XE