[Olsr-users] Interesting syslog msgs when tap0 disappears

Eric Malkowski (spam-protected)
Thu Oct 1 05:52:44 CEST 2009


In testing OLSR across an openvpn tap interface tonight, I killed openvpn and
got some interesting syslog messages.
This was running the tip of the 0.5.6 branch, which I believe is the release
candidate for 0.5.6-r6.

Here's the log snippet:

Jan  1 02:45:16 hocr-node-3 daemon.info olsrd[432]: Removing interface tap0
Jan  1 02:45:16 hocr-node-3 daemon.err olsrd[432]: OLSR: sendto IPv4 No such device
Jan  1 02:45:18 hocr-node-3 daemon.err olsrd[432]: error 'No such process' (3)
del route to 10.5.1.4/32 via 192.168.32.2 dev void
Jan  1 02:45:18 hocr-node-3 daemon.err olsrd[432]: . ignoring 'No such process'
(3) while deleting route!
Jan  1 02:45:18 hocr-node-3 daemon.err olsrd[432]: error 'No such process' (3)
del route to 10.2.4.0/24 via 192.168.32.2 dev void
Jan  1 02:45:18 hocr-node-3 daemon.err olsrd[432]: . ignoring 'No such process'
(3) while deleting route!
Jan  1 02:45:18 hocr-node-3 daemon.err olsrd[432]: error 'No such process' (3)
del route to 192.168.32.2/32 dev void
Jan  1 02:45:18 hocr-node-3 daemon.err olsrd[432]: . ignoring 'No such process'
(3) while deleting route!

I'm wondering if olsr realizes (perhaps via the new netlink notifier code?) that
tap0 went down, but because openvpn also deletes the interface altogether, olsr
ends up trying to do its route cleanup against a tap0 interface that no longer
exists.  In other words, it's a sort of race condition.
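
To make the suspected sequence concrete, here is a rough Python sketch of a
cleanup loop that treats "route already gone" as success.  It is not olsrd's
actual code: delete_route and cleanup_routes are made-up names, and the routing
table is just a dict.  The one real detail is that errno 3 (ESRCH) is what
produces the 'No such process' text, and that Linux normally drops routes
through a device when the device itself is removed.

```python
import errno
import os


def delete_route(table, dest):
    """Hypothetical stand-in for a kernel route delete.

    Raises OSError(ESRCH) when the route is absent, which is the errno
    behind the 'No such process' (3) text in the log above.
    """
    if dest not in table:
        raise OSError(errno.ESRCH, os.strerror(errno.ESRCH))
    del table[dest]


def cleanup_routes(table, dests):
    """Delete each route, treating 'already gone' as success.

    Returns the destinations that had already disappeared.
    """
    already_gone = []
    for dest in dests:
        try:
            delete_route(table, dest)
        except OSError as exc:
            if exc.errno != errno.ESRCH:
                raise  # anything else is a real failure
            already_gone.append(dest)
    return already_gone


# Simulated table: one route still present, two already removed with tap0.
routes = {"10.2.4.0/24": "192.168.32.2"}
gone = cleanup_routes(routes, ["10.5.1.4/32", "10.2.4.0/24", "192.168.32.2/32"])
```

If this is what is happening, the "ignoring 'No such process'" lines would be
the expected outcome of losing the race rather than a sign of real trouble.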

I'm no expert on this stuff, but some years ago I wrote some virtual net
interface drivers for the Linux kernel.  There, when an interface changed state
(e.g. via ifconfig from userland, with the driver getting called through the net
device notifier chain), the kernel code could do whatever it needed to
atomically and return from the notifier callback when done.

Here the notification goes to olsr in userspace, so the openvpn process can
still wipe the interface out before olsr is "done" with whatever it's doing in
response.  It's not a chain of calls in kernel space; the two userspace
processes can step on each other, is what I'm thinking.  It's too bad olsr
can't be notified, block other activity while it cleans up (preventing openvpn
from wiping the interface), and then "return" from the notifier chain and let
openvpn finish its work of removing the device.
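
For what it's worth, a userspace process can at least ask whether the interface
still exists before acting on it.  The sketch below uses Python's
socket.if_nametoindex, which raises OSError for an unknown name;
interface_exists is a hypothetical helper, not anything from olsrd.  The check
is only advisory, since the interface can still disappear right after it
returns, which is exactly the race described above.

```python
import socket


def interface_exists(name):
    """Return True if the kernel currently knows an interface by this name.

    Advisory only: another process may remove the interface immediately
    after this returns, so later operations must still handle errors.
    """
    try:
        socket.if_nametoindex(name)
    except OSError:
        return False
    return True
```

So a check like interface_exists("tap0") can cut down on the noise, but it
can't replace tolerating the errors during cleanup.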

I hope I don't sound too confusing...  and maybe these messages can safely be
ignored, but it looks like there could be some null strings being printed to the
log or something.

-Eric Malkowski




