[Olsr-dev] [PATCH v1 1/1] lq_packet: only report a neighbour once in a hello message

Ferry Huberts (spam-protected)
Thu Jan 19 22:21:21 CET 2017



On 19/01/17 21:15, (spam-protected) wrote:
> Has it been determined why 3d2fd73a5528a9c7cdccd088f2dcca80b37e66b9
> broke this?
>

It dropped the check for UNSPEC

That's why I asked Henning to look into it, since he knows best why he
did that.

> If it is indeed the cause and we are in a rush to solve this why not
> just revert 3d2fd73a5528a9c7cdccd088f2dcca80b37e66b9?
>


That is also a possibility I've considered.
I want to hear from Henning first though.

In the meantime we can continue to investigate alternatives; we now have
2 patches to deal with it on the TX side, plus the revert, so 3 options.

> This is broken but let's not rush fixes to resolve it.
>
> If there is a need for a drastic quick fix revert
> 3d2fd73a5528a9c7cdccd088f2dcca80b37e66b9 that attempt to handle
> fragmented packets and make sure to solve that issue correctly
> without impacting the rest of the code (BTW: the RFC says we should
> [its a should not a must]  avoid having fragments in the first place
> )

Yes, well Henning didn't do his change for no good reason, duh.
If you go over (I think) 62 neighbours then you _will_ have fragmented
HELLOs. No avoiding that at all.


>
> There is no point to push a new release that is still knowingly
> broken, even if this goes out with the TX fix this hour it will not
> fix anyone out there that does not upgrade all the impacted devices
> in the network.

And how is it broken after my fix?
Only when mixing versions, true.

>
> It's taken since 0.6.7 to even find out this exists,  its likely not
> the most frequency occurring problem impacting users. The small

The effects can be invisible in some network configurations.
We only saw it because we were digging in a development setup.
In other setups it can be quite visible. It all depends.


> group it's impacting can surely experiment with a development patch
> manually if they absolutely need it.
>
> There are also other temporary resolutions available (avoiding two
> interfaces on the same L2 broadcast domain, blocking packets, etc)
>


PS. would you mind introducing yourself? I think I've not seen you on
this mailing list before.


> Sent from my iPhone
>
>> On Jan 19, 2017, at 11:46 AM, Ferry Huberts <(spam-protected)>
>> wrote:
>>
>>
>>
>>> On 19/01/17 16:36, Teco Boot wrote: I'm still testing. I want to
>>> see a 100% fix on both TX (eliminate duplicates that are not
>>> correct or relevant) and on RX (don't use LQ where
>>> link-type=UNSPEC. The RFC is my guide.
>>
>> Eliminating (these kinds of) duplicates is NOT ALLOWED per the RFC.
>> The RFC very clearly states that the ENTIRE neighbourhood must be
>> sent, and neighbours not reachable over the sending interface must
>> be UNSPEC. In the example setup that results in one SYM neighbour
>> and one UNSPEC neighbour, which is entirely correct.
>>
>> There is not enough information in the messages to fully fix this,
>> accept it. This can only be fixed if the hellos contain the
>> interface indexes with every record; then these duplicates will
>> cease to be duplicates. That would require an RFC change, and we
>> will not go there.
>>
>>
>>>
>>> I'm facing lots of SPFs where topology is stable. The glitches
>>> bite.
>>>
>>> I tested a bit with fragments. I want to make sure that the RX
>>> patch doesn't break something.
>>
>>
>> There is no RX patch (yet), so what are your talking about?
>>
>>
>>>
>>> I don't see a reason to hurry.
>>
>> There is VERY good reason to hurry: this has been broken since
>> 3d2fd73a5528a9c7cdccd088f2dcca80b37e66b9, which is every release
>> since - and including - v0.6.7.
>>
>> The oneliner patch solves it, or at least mitigates it.
>>
>> We need to make a patch on the RX side as well, as we already
>> agreed and as Henning indicated earlier today.
>>
>>
>> I am going to push my TX fix tomorrow and I will work on the RX
>> patch after next week.
>>
>>
>> I will also put the fix on our release branch, and coordinate with
>> Henning to do a release of that branch.
>>
>>
>> F
>>
>>>
>>> Teco
>>>
>>>
>>>> Op 19 jan. 2017, om 15:00 heeft Henning Rogge
>>>> <(spam-protected)> het volgende geschreven:
>>>>
>>>> On Thu, Jan 19, 2017 at 2:57 PM, Ferry Huberts
>>>> <(spam-protected)> wrote:
>>>>
>>>>
>>>> On 19/01/17 13:33, Henning Rogge wrote: Hi,
>>>>
>>>> back from (two) business trips...
>>>>
>>>> On Mon, Jan 16, 2017 at 6:14 PM, Ferry Huberts
>>>> <(spam-protected) <mailto:(spam-protected)>> wrote:
>>>>
>>>> I think I may have stumbled onto a MUCH easier fix to this
>>>> problem.
>>>>
>>>>
>>>> The actual problem is quite involved (with fragmented hello
>>>> messages etc) but it all boils down to a later UNSPEC link
>>>> overwriting a previous SYM/ASYM/other link on the receiving
>>>> end.
>>>>
>>>> Since the neighbours in a hello message are already ordered
>>>> when the hello message is sent out, the following patch also
>>>> solves it.
>>>>
>>>> Opinions?
>>>>
>>>>
>>>> I like the idea of solve the problem by sorting, but I worry
>>>> that doing so on the TX side is a bit "brittle"... its easy to
>>>> break because there is nothing on the TX side that needs this
>>>> order. And there will always be older Olsrd installations on
>>>> many community networks.
>>>>
>>>> How difficult would it be to do the sorting on the RX side in
>>>> deserialize_hello() (process_package.c: 318..)?
>>>>
>>>> Instead of building them into a single linked list, have an
>>>> array with MAX_LINK lists, one for each link type... and then
>>>> concatenate this lists before parsing in the right order.
>>>>
>>>> This way we also fix the problem appearing again because of an
>>>> outdated Olsrd sending bad data, right?
>>>>
>>>>
>>>>
>>>> If we do that as a follow-up patch? Would that be ok?
>>>>
>>>> Sounds good... quick fix now to solve the problem for the
>>>> current code, cleanup later to make it future proof.
>>>>
>>>> Henning
>>>
>>
>> -- Ferry Huberts
>>
>> -- Olsr-dev mailing list (spam-protected)
>> https://lists.olsr.org/mailman/listinfo/olsr-dev
>
>
> On 19/01/17 16:36, Teco Boot wrote:
>> I'm still testing. I want to see a 100% fix on both TX (eliminate
>> duplicates that are not correct or relevant) and on RX (don't use
>> LQ where link-type=UNSPEC. The RFC is my guide.
>
> Eliminating (these kinds of) duplicates is NOT ALLOWED per the RFC.
> The RFC very clearly states that the ENTIRE neighbourhood must be
> sent, and neighbours not reachable over the sending interface must
> be UNSPEC. In the example setup that results in one SYM neighbour
> and one UNSPEC neighbour, which is entirely correct.
>
> There is not enough information in the messages to fully fix this,
> accept it. This can only be fixed if the hellos contain the
> interface indexes with every record; then these duplicates will cease
> to be duplicates. That would require an RFC change, and we will not
> go there.
>
>
>>
>> I'm facing lots of SPFs where topology is stable. The glitches
>> bite.
>>
>> I tested a bit with fragments. I want to make sure that the RX
>> patch doesn't break something.
>
>
> There is no RX patch (yet), so what are your talking about?
>
>
>>
>> I don't see a reason to hurry.
>
> There is VERY good reason to hurry: this has been broken since
> 3d2fd73a5528a9c7cdccd088f2dcca80b37e66b9, which is every release
> since - and including - v0.6.7.
>
> The oneliner patch solves it, or at least mitigates it.
>
> We need to make a patch on the RX side as well, as we already agreed
> and as Henning indicated earlier today.
>
>
> I am going to push my TX fix tomorrow and I will work on the RX
> patch after next week.
>
>
> I will also put the fix on our release branch, and coordinate with
> Henning to do a release of that branch.
>
>
> F
>
>>
>> Teco
>>
>>
>>> Op 19 jan. 2017, om 15:00 heeft Henning Rogge <(spam-protected)>
>>> het volgende geschreven:
>>>
>>> On Thu, Jan 19, 2017 at 2:57 PM, Ferry Huberts
>>> <(spam-protected)> wrote:
>>>
>>>
>>> On 19/01/17 13:33, Henning Rogge wrote: Hi,
>>>
>>> back from (two) business trips...
>>>
>>> On Mon, Jan 16, 2017 at 6:14 PM, Ferry Huberts
>>> <(spam-protected) <mailto:(spam-protected)>> wrote:
>>>
>>> I think I may have stumbled onto a MUCH easier fix to this
>>> problem.
>>>
>>>
>>> The actual problem is quite involved (with fragmented hello
>>> messages etc) but it all boils down to a later UNSPEC link
>>> overwriting a previous SYM/ASYM/other link on the receiving end.
>>>
>>> Since the neighbours in a hello message are already ordered when
>>> the hello message is sent out, the following patch also solves
>>> it.
>>>
>>> Opinions?
>>>
>>>
>>> I like the idea of solve the problem by sorting, but I worry
>>> that doing so on the TX side is a bit "brittle"... its easy to
>>> break because there is nothing on the TX side that needs this
>>> order. And there will always be older Olsrd installations on
>>> many community networks.
>>>
>>> How difficult would it be to do the sorting on the RX side in
>>> deserialize_hello() (process_package.c: 318..)?
>>>
>>> Instead of building them into a single linked list, have an
>>> array with MAX_LINK lists, one for each link type... and then
>>> concatenate this lists before parsing in the right order.
>>>
>>> This way we also f
> ix the problem appearing again because of an outdated
>>> Olsrd sending bad data, right?
>>>
>>>
>>>
>>> If we do that as a follow-up patch? Would that be ok?
>>>
>>> Sounds good... quick fix now to solve the problem for the
>>> current code, cleanup later to make it future proof.
>>>
>>> Henning
>>
>

-- 
Ferry Huberts



More information about the Olsr-dev mailing list