[Olsr-dev] info plugin still send blocking
Ferry Huberts
(spam-protected)
Wed Dec 6 17:47:40 CET 2017
Thanks guys!
The patch is now on master.
On 06/12/17 16:52, Joe Ayers wrote:
> The tests with this patch are successful.
>
> Before: 4+ sequential failures reproduced in less than an hour each time
>
> After: same test setup, plus an additional on-node
> "echo /all | nc localhost 2006" repeating every 15 seconds to further
> stress the node. Ran for over 11 hours with no failures; still running.
>
> Good to go from my perspective.
>
> Joe AE6XE
>
> On Mon, Dec 4, 2017 at 11:59 PM, Henning Rogge <(spam-protected)> wrote:
>
> Hi,
>
> could you test the following patch?
>
> Henning
>
> On Mon, Dec 4, 2017 at 7:53 PM, Joe Ayers <(spam-protected)> wrote:
> > Correction, full strace log file URL is:
> >
> >
> > https://drive.google.com/file/d/1TGW5VFpcKppbd82eT72qf6TqTtgqy-0j/view?usp=sharing
> >
> >
> >
> > On Mon, Dec 4, 2017 at 10:36 AM, Joe Ayers <(spam-protected)> wrote:
> >> In reference to:
> >>
> >> " * A timer was added and each time it expires each non-empty buffer
> >> * in this structure will try to write data into a "non-blocking"
> >> * socket until all data is sent, so that no blocking occurs."
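The buffered, timer-driven write described in the quoted comment can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the OLSR code: `struct pending_output` and `flush_pending` are hypothetical names, and the sketch assumes a connected stream socket on a platform that supports the `MSG_DONTWAIT` send flag (Linux and the BSDs do). Each timer tick pushes as much of the remaining data as the kernel accepts right now and remembers the offset, so a partial write never makes the caller wait.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical per-connection output buffer (not OLSR's actual struct). */
struct pending_output {
  int fd;           /* connected stream socket */
  const char *data; /* complete reply, e.g. the JSON for "/all" */
  size_t len;       /* total number of bytes in data */
  size_t sent;      /* bytes already handed to the kernel */
};

/*
 * Meant to be called from a periodic timer. Sends as much of the remaining
 * data as the socket accepts without blocking.
 * Returns 1 when everything has been sent, 0 when data is still pending
 * (retry on the next tick), and -1 on any other socket error.
 */
int flush_pending(struct pending_output *out) {
  while (out->sent < out->len) {
    ssize_t n = send(out->fd, out->data + out->sent,
                     out->len - out->sent, MSG_DONTWAIT);
    if (n > 0) {
      out->sent += (size_t)n; /* partial writes are normal: keep the offset */
      continue;
    }
    if (n < 0 && errno == EINTR)
      continue; /* interrupted by a signal: retry immediately */
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
      return 0; /* send buffer full: leave the rest for the next tick */
    return -1;  /* real error: the caller should close the connection */
  }
  return 1;
}
```

In this sketch a timer callback would call `flush_pending()` for every connection that still has data, free the buffer once it returns 1, and close the connection on -1; at no point does a slow or stalled client hold up the loop.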
> >>
> >> A blocking event can be reliably reproduced in olsr 0.9.6.2 on OpenWRT
> >> Chaos Calmer. The node drops off the mesh and stops responding (not a
> >> good thing when it's remote on a tower :) ).
> >>
> >> Test scenario:
> >>
> >> - A laptop on Node_A's LAN runs "echo /all | nc node_B 9090", sleeps
> >>   2 seconds, and repeats (this reproduces the problem more quickly, but
> >>   a single hit could be enough)
> >> - Node_A has an RF link to Node_B with ~70% LQ/NLQ (the marginal
> >>   LQ/NLQ may be a non-factor)
> >> - Node_B runs "echo /nei | nc localhost 2006", sleeps 5 seconds, and
> >>   repeats
> >>
> >> After a few minutes, strace shows Node_B blocked in send(). I waited
> >> ~10 minutes and then sent SIGTERM (search for SIGTERM to find it in the
> >> full log file linked below).
> >>
> >> clock_gettime(CLOCK_MONOTONIC, {281, 819282014}) = 0
> >> accept(13, {sa_family=AF_INET, sin_port=htons(32965),
> >> sin_addr=inet_addr("10.34.163.239")}, [16]) = 17
> >> _newselect(18, [17], NULL, NULL, {0, 20000}) = 1 (in [17], left {0, 19984})
> >> recv(17, "/all\n", 1024, MSG_DONTWAIT) = 5
> >> time(NULL) = 1512345153
> >> clock_gettime(CLOCK_MONOTONIC, {281, 825309152}) = 0
> >> ...
> >> clock_gettime(CLOCK_MONOTONIC, {282, 416388829}) = 0
> >> _newselect(18, NULL, [17], NULL, {0, 0}) = 1 (out [17], left {0, 0})
> >> send(17, "{\"pid\": 1726,\"systemTime\": 15123"..., 386040, 0) = 141904
> >>
> >> OLSR no longer functions from this point on; the remaining TCP
> >> connections go to CLOSE_WAIT until the socket listen limit is reached.
> >> Shouldn't the send() call pass the MSG_DONTWAIT flag so that it is
> >> non-blocking?
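The strace line above shows `send()` called with flags `0`; it returned 141904 of the 386040 bytes requested, leaving 244136 bytes still to send. If the socket is in blocking mode, such a call sleeps whenever the send buffer is full, for as long as the peer does not read. Two standard ways to avoid that are sketched below, assuming a POSIX system that also provides `MSG_DONTWAIT`: mark the socket itself non-blocking with `fcntl(..., O_NONBLOCK)`, or pass `MSG_DONTWAIT` on each call. The helper names `set_nonblocking` and `queue_until_full` are hypothetical, and which approach the patch mentioned in this thread takes is not shown here.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Option 1: make the socket itself non-blocking; every later send()/recv()
 * on it returns EAGAIN/EWOULDBLOCK instead of sleeping. */
int set_nonblocking(int fd) {
  int flags = fcntl(fd, F_GETFL, 0);
  if (flags < 0)
    return -1;
  return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Option 2: leave the socket in blocking mode but request non-blocking
 * behaviour per call with MSG_DONTWAIT. Keeps sending until the kernel
 * refuses more data, stores the number of bytes queued in *queued, and
 * returns the errno that stopped it (EAGAIN/EWOULDBLOCK means "full, try
 * again later"). With flags 0 instead, the send() would sleep at that point. */
int queue_until_full(int fd, size_t *queued) {
  static const char chunk[4096]; /* zero-filled payload */
  *queued = 0;
  for (;;) {
    ssize_t n = send(fd, chunk, sizeof chunk, MSG_DONTWAIT);
    if (n > 0) {
      *queued += (size_t)n;
      continue;
    }
    if (n < 0 && errno == EINTR)
      continue; /* interrupted by a signal: retry */
    return n < 0 ? errno : 0;
  }
}
```

With either approach, EAGAIN/EWOULDBLOCK is not an error: the caller keeps the unsent bytes and retries later (on a timer tick, or when select() reports the socket writable) instead of stalling the event loop that also has to accept new connections.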
> >>
> >> Full strace log:
> >>
> >> https://drive.google.com/file/d/0B2bEy75HhwWhVE1BQ3BUdHY3azg/view?usp=sharing
> >>
> >> Joe AE6XE
> >
> > --
> > Olsr-dev mailing list
> > (spam-protected)
> > https://lists.olsr.org/mailman/listinfo/olsr-dev
>
--
Ferry Huberts