[Olsr-dev] info plugin still send blocking

Ferry Huberts (spam-protected)
Wed Dec 6 17:47:40 CET 2017


Thanks guys!

The patch is now on master.

On 06/12/17 16:52, Joe Ayers wrote:
> The tests with this patch are successful.
> 
> Before:   4+ sequential failures reproduced in less than an hour each time
> 
> After:  same test setup, added an additional 15 second repeating on-node 
> "echo /all |  nc localhost 2006" to further stress test.   Ran for over 
> 11 hours with no failures, still running.
> 
> Good to go from my perspective.
> 
> Joe AE6XE
> 
> On Mon, Dec 4, 2017 at 11:59 PM, Henning Rogge <(spam-protected) 
> <mailto:(spam-protected)>> wrote:
> 
>     Hi,
> 
>     could you test the following patch?
> 
>     Henning
> 
>     On Mon, Dec 4, 2017 at 7:53 PM, Joe Ayers <(spam-protected)
>     <mailto:(spam-protected)>> wrote:
>      > Correction, full strace log file URL is:
>      >
>      >
>      > https://drive.google.com/file/d/1TGW5VFpcKppbd82eT72qf6TqTtgqy-0j/view?usp=sharing
>      >
>      >
>      >
>      > On Mon, Dec 4, 2017 at 10:36 AM, Joe Ayers <(spam-protected)
>     <mailto:(spam-protected)>> wrote:
>      >> In reference to:
>      >>
>      >> " * A timer was added and each time it expires each non-empty buffer
>      >>   * in this structure will try to write data into a "non-blocking"
>      >>   * socket until all data is sent, so that no blocking occurs."
>      >>
>      >> A blocking event can be reliably reproduced in olsr 0.9.6.2 in OpenWRT
>      >> Chaos Calmer.  The node drops off the mesh and stops responding (not a
>      >> good thing when it's remote on a tower :) ).
>      >>
>      >> Test scenario:
>      >>
>      >> - Node_A LAN laptop, "echo /all | nc node_B 9090"  sleeps 2 seconds
>      >> and repeats (reproduces more quickly, but could be a single hit)
>      >> - Node_A has RF link to Node_B with ~70% LQ/NLQ (maybe marginal LQ/NLQ
>      >> is a non-factor?)
>      >> - Node_B has "echo /nei | nc localhost 2006" sleeps 5 seconds and repeats
>      >>
>      >> After a few minutes, Node_B blocks on send in strace,  subsequently
>      >> waited ~10 min and SIGTERM'd  (search for SIGTERM to find in full log
>      >> file URL below).
>      >>
>      >> clock_gettime(CLOCK_MONOTONIC, {281, 819282014}) = 0
>      >> accept(13, {sa_family=AF_INET, sin_port=htons(32965),
>      >> sin_addr=inet_addr("10.34.163.239")}, [16]) = 17
>      >> _newselect(18, [17], NULL, NULL, {0, 20000}) = 1 (in [17], left {0, 19984})
>      >> recv(17, "/all\n", 1024, MSG_DONTWAIT)  = 5
>      >> time(NULL)                              = 1512345153
>      >> clock_gettime(CLOCK_MONOTONIC, {281, 825309152}) = 0
>      >> ...
>      >> clock_gettime(CLOCK_MONOTONIC, {282, 416388829}) = 0
>      >> _newselect(18, NULL, [17], NULL, {0, 0}) = 1 (out [17], left {0, 0})
>      >> send(17, "{\"pid\": 1726,\"systemTime\": 15123"..., 386040, 0) = 141904
>      >>
>      >> OLSR no longer functions at this point in time, remaining TCP
>      >> connections go to CLOSE_WAIT until socket listen limit reached.
>      >> Shouldn't there be a MSG_DONTWAIT flag in the send to be non-blocking?
>      >>
>      >> Full strace log:
>      >>
>      >> https://drive.google.com/file/d/0B2bEy75HhwWhVE1BQ3BUdHY3azg/view?usp=sharing
>      >>
>      >> Joe AE6XE
> 
> 
> 
> 
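For reference, the question in the quoted report (whether send() should carry
MSG_DONTWAIT) can be illustrated with a minimal sketch. This is not the patch
that went onto master and not olsrd code: send_nonblocking is a hypothetical
helper, and adding MSG_NOSIGNAL is an assumption made here for Linux. It only
shows how a send on a full socket returns a partial count instead of stalling
the caller, which is what the strace above lacks with flags set to 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (illustration only, not taken from olsrd).
 * Tries to queue up to len bytes on fd without ever blocking.
 * Returns the number of bytes accepted by the kernel (may be less
 * than len, or 0 if the socket buffer is already full), or -1 on a
 * real error.
 */
static ssize_t send_nonblocking(int fd, const char *buf, size_t len)
{
    size_t sent = 0;

    assert(buf != NULL || len == 0);

    while (sent < len) {
        /* MSG_DONTWAIT makes this single call non-blocking even if the
         * socket itself was never switched to O_NONBLOCK. */
        ssize_t n = send(fd, buf + sent, len - sent,
                         MSG_DONTWAIT | MSG_NOSIGNAL);
        if (n > 0) {
            sent += (size_t)n;
            continue;
        }
        if (n == 0) {
            break;              /* nothing accepted, stop for now */
        }
        if (errno == EINTR) {
            continue;           /* interrupted by a signal, retry */
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            break;              /* buffer full: caller keeps the rest */
        }
        return -1;              /* real error, e.g. EPIPE or ECONNRESET */
    }
    return (ssize_t)sent;
}
```

A caller would keep the unsent remainder in its own buffer and try again once
select() reports the socket writable or a timer fires, in the spirit of the
code comment quoted at the start of the report.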

-- 
Ferry Huberts



