[Olsr-dev] olsrd-0.5.2 bmf segmentation fault
Erik Tromp
(spam-protected)
Tue Jul 31 18:56:38 CEST 2007
Bernd, Others,
Thanks!
I will do a roundup with patches to BMF in a short time, test everything
against OLSR 0.5.2 and put a new version on sourceforge.
Cheerz,
Erik
-----Original message-----
From: Bernd Petrovitsch [mailto:(spam-protected)]
Sent: Tuesday, 31 July 2007 13:58
To: (spam-protected)
CC: Erik Tromp
Subject: Re: [Olsr-dev] olsrd-0.5.2 bmf segmentation fault
On Mon, 2007-07-30 at 12:28 +0200, Bernd Petrovitsch wrote:
> On Sun, 2007-07-29 at 22:58 +0200, Bernd Petrovitsch wrote:
> [...]
> > Sad news (at least for me): I do not understand why the SIGSEGV above
> > occurs:
> > - We do a dlopen(3) on the .so file of the plugin (in the olsr_load_dl()
> > function). And this succeeds (put printf(3)s in there to verify).
> > - We get the function pointer with dlsym(3) to get the interface
> > version (in the olsr_add_dl() function). This succeeds too and
> > delivers "4" and so we know that it is the "old version". And we
> > return from there "-1" (since we do not support the old version with
> > the original Makefile.inc).
> > - Back in olsr_load_dl(), we see the error and dlclose(3) the shared lib
> > again (since we can't use it).
> > And precisely that dlclose(3) call produces the SIGSEGV (put a
> > printf(3) before and after, look at it with ltrace). But the
> > "dlhandle" (and the pointer to it) there has the correct value (as
> > reported by the dlopen(3)) and I can't find or think of a reason why
> > something could break there with a SIGSEGV.
> > Don't get me wrong, if something is not correct dlclose(3) can (and
> > should) report errors, but simply dying on a SIGSEGV is strange (at
> > best).
> > Any hints anyone?
>
> Thanks for the 1st hint: Linking everything (and not only the bmf
> plugin) against the pthread library doesn't help.
>
> Next try: And not stripping the binary and plugins also didn't help.
>
> And the SIGSEGV occurs with both gcc-3.4.6 and gcc-4.1.1 from
> CentOS-4.5.
After googling http://www.groupsrv.com/linux/about17472.html, I debugged the
thing and found the culprit:
- dlclose(3) calls the shared library's "destructor" - olsr_plugin_exit().
- olsr_plugin_exit() calls CloseBmf().
- CloseBmf() wants to kill a thread which wasn't started before. And it
seems that the pthread library isn't prepared for such an error.
The attached - pretty trivial - patch should fix that (it is actually from
CVS-HEAD but the CloseBmf() function didn't change).
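The patch itself is not shown here. As a rough illustration of that kind of guard (not the actual BMF code), the sketch below remembers whether the thread was ever started and skips the shutdown otherwise. All names in it (worker_thread, worker_started, start_worker(), close_worker()) are hypothetical, and stopping the worker through a flag is an assumption made to keep the example small.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; this is not the BMF plugin's real code. */
static pthread_t worker_thread;
static bool worker_started = false;       /* set only after pthread_create() succeeds */
static atomic_bool stop_requested = false;

static void *worker_main(void *arg)
{
    (void)arg;
    /* Spin politely until the owner asks us to stop. */
    while (!atomic_load(&stop_requested))
        sched_yield();
    return NULL;
}

/* Returns 0 on success (or if already running), -1 if the thread could not be created. */
int start_worker(void)
{
    if (worker_started)
        return 0;
    atomic_store(&stop_requested, false);
    if (pthread_create(&worker_thread, NULL, worker_main, NULL) != 0)
        return -1;
    worker_started = true;
    return 0;
}

/* Safe to call from a plugin "destructor" even if start_worker() never ran.
 * Returns 1 if a thread was stopped and joined, 0 if there was nothing to do. */
int close_worker(void)
{
    if (!worker_started)
        return 0;            /* never touch an uninitialised pthread_t */
    atomic_store(&stop_requested, true);
    pthread_join(worker_thread, NULL);
    worker_started = false;
    return 1;
}
```

With such a guard, a plugin that is loaded, rejected because of its interface version, and immediately unloaded again passes through close_worker() without handing an uninitialised thread handle to the pthread library.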
Erik, you probably want to incorporate that change too?
Bernd
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services