<br><br><div class="gmail_quote">On Mon, Dec 8, 2008 at 7:51 AM, Sven-Ola Tuecke <span dir="ltr">&lt;<a href="mailto:sven-ola@gmx.de" target="_blank">sven-ola@gmx.de</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Markus,<br>
<br>
the <a href="http://olsr.org" target="_blank">olsr.org</a> routing daemon has issues with ifup/down as well as with any<br>
combination of changing IP, netmask, MTU, queue-config since the beginning<br>
of olsrd. For example: the daemon and the kernel both maintain a table of<br>
routing entries. If some braindead admin or script executes "ip l set dev X</blockquote><div>OK, the script is a bit braindead, and it may look very braindead if you don't know the background for using it, but ... (see below)</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
down;ip l set dev X up", the kernel routes are removed while not being</blockquote><div>As you point out, this should only result in missing kernel routes, but in reality you also get wrong kernel routes.<br>
</div><div>.</div><div>This is reproducible with a braindead ifdownup script, <br>but it also shows up occasionally on routers where nobody has logged in and olsr runs only on quite normal interfaces, e.g. routers running the Freifunk firmware (#)</div>
<div># where nobody ever logged into the shell, and only some wireless settings were changed and ip4broadcast was configured via the webif.</div><div>IMHO it's far less braindead to use such a script to get olsrd to reproduce the effects below than to waste time reading some parts of your email.</div>
<div><br></div><div>1. Missing routes (not the biggest problem, since it is to be expected once they get deleted from the kernel).<br>In fact olsr usually does quite well at inserting them again.<br></div><div>2. Wrong routes: yes, you end up with routes going out on the wrong interface, and they stay in the kernel for quite a long time.<br>
Mostly these are interim routes created while one interface is down, which are never "moved" back to the "correct" interface after it is up again.<br><br></div><div>3. Crashing olsr (if you run an ifupdown script too long against a 0.5.6 olsrd).<br>
This was not the goal of the test script, since a crash prevents testing the effects above, but it exposed bugs in code that is only executed after netlink errors on route updates.<br>
<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
removed in olsrd.</blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Same if you manually fiddle with routing entries<br>
(especially with the default route). This is what I denote "well known".<br>
<br>
With this in mind, you may remove your admin password from devices to prevent<br>
yourself from doing things in an uncontrolled way. As an alternative, replace<br>
the "ip" and "ifconfig" commands with a version that restarts olsrd.</blockquote><div>Even so, olsrd will still crash regularly on about 10 routers in Vienna once some braindead device owner updates their olsrd (or the complete firmware) from 0.5.5, which currently runs there (at least without crashing).<br>
<br></div><div>Why will this happen?</div><div>OpenVPN runs on these routers, and its devices go up/down dynamically. <br>(Why OpenVPN is configured this way is another (long and sad olsr-related) story, but you may also have dynamic WDS interfaces you want olsr to run on, or whatever.)<br>
</div><div><br>So I didn't invent this braindead test script yesterday just for fun; it's just a way to reproduce the cause of the olsrd 0.5.6 crashes on these routers (and of 0.5.5 (and 0.5.6) producing wrong routes).</div>
<div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
<br>
Yes - the situation is worse in 0.5.6, because 0.5.5 has handled this better</blockquote><div>ACK </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
(but not perfectly to my knowledge).</blockquote><div>But good enough to run for weeks without crashing on our VPN server. </div><div>.</div><div>In fact I cannot remember it ever crashing; only very rarely were there some wrong/missing routes.</div>
<div>But this happened as rarely as it does on other routers with a similar number of interfaces and routing changes that do not have "dynamic" interfaces.</div><div>.</div>
<div>0.5.6 may crash within minutes or even seconds there, and with a lot of luck it runs for a few hours, which is unacceptable, so we still run 0.5.5 there.</div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Why? Because there is no easy fix with the current stable code IMO. A<br>
reasonable fix includes changing critical code paths - which should not<br>
happen in a stable branch.</blockquote><div> </div><div>IMO this branch should not be called stable (-;<br>(see the end for why)<br><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Because it will induce new bugs - bugging the<br>
majority which do not change their ifconfig every now and then. Something<br>
like this should take place in the development branch IMO.</blockquote><div> </div><div>Partly ACK, as I also plan better handling for this and related problems. </div><div>At the moment I have a (quite small) patch that makes olsr use its own proto tag in the kernel routing table,</div>
<div>preventing olsrd from deleting routes it never made, and enabling it to safely flush its own routes (or those of a previously crashed instance) at startup/shutdown.</div><div>.</div><div>The bigger goal is to get rid of a well-known olsr problem: inconsistent olsr and kernel routing tables. This unfortunately happens quite often (even on devices no admin ever logs into) and is the main reason for persistent routing problems in our network (which tend to stay for hours/days).<br>
.<br></div><div>I think the key to a fix lies in handling rtnetlink errors properly and reacting appropriately to the replies you get, such as "destination unreachable" or "interface is down".<br>Maybe we should also consider letting the kernel inform olsrd (via rtnetlink messages) about external route updates.<br>
.<br></div>
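<div>A minimal sketch of the proto-tag idea and of a table comparison, written as a pure Python model rather than real rtnetlink code. The names Route, KernelTable, reconcile and OLSR_PROTO, as well as the tag value 100, are assumptions made up for this illustration and are not taken from the actual patch or from olsrd.</div>
<pre>
```python
from collections import namedtuple

# Hypothetical model of one kernel route; "proto" records who installed it.
Route = namedtuple("Route", "dest dev proto")

OLSR_PROTO = 100   # assumed tag value, chosen only for this sketch
STATIC_PROTO = 4   # RTPROT_STATIC in linux/rtnetlink.h


class KernelTable:
    """Toy stand-in for the kernel routing table, keyed by destination."""

    def __init__(self):
        self.routes = {}

    def add(self, dest, dev, proto):
        self.routes[dest] = Route(dest, dev, proto)

    def delete_own(self, dest):
        """Delete a route only if it carries the olsr proto tag."""
        route = self.routes.get(dest)
        if route is None or route.proto != OLSR_PROTO:
            return False  # missing or foreign route: leave it alone
        del self.routes[dest]
        return True

    def flush_own(self):
        """Drop every olsr-tagged route, e.g. leftovers of a crashed instance."""
        stale = [d for d, r in self.routes.items() if r.proto == OLSR_PROTO]
        for dest in stale:
            del self.routes[dest]
        return len(stale)


def reconcile(table, wanted):
    """Return sorted (action, dest, dev) steps that would bring the
    olsr-tagged kernel routes in line with the 'wanted' dest-to-dev map."""
    steps = []
    for dest, dev in wanted.items():
        route = table.routes.get(dest)
        if route is None:
            steps.append(("add", dest, dev))        # missing route
        elif route.proto == OLSR_PROTO and route.dev != dev:
            steps.append(("replace", dest, dev))    # route on the wrong interface
        # a foreign route for the same destination is deliberately left untouched
    for dest, route in table.routes.items():
        if route.proto == OLSR_PROTO and dest not in wanted:
            steps.append(("delete", dest, route.dev))  # stale olsr route
    return sorted(steps)
```
</pre>
<div>In this model a route added by an admin (tagged static) survives both delete_own() and flush_own(), while reconcile() lists the missing, wrong-interface and stale olsr routes in one pass. How a real daemon would obtain the kernel's view (dump request or rtnetlink notifications) is left open here.</div>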
<div>BUT: I still wish for olsr 0.5.6 "stable" releases to run stably on routers where 0.5.5 did!!<br>I hope this "wish" is reasonable.</div><div>.</div><div>Markus</div></div>