[Olsr-dev] olsrd-0.5.2 bmf segmentation fault

Erik Tromp (spam-protected)
Tue Jul 31 18:56:38 CEST 2007


Bernd, Others,

Thanks!

I will do a roundup with patches to BMF in a short time, test everything
against OLSR 0.5.2 and put a new version on sourceforge.

Cheerz,
Erik




-----Original Message-----
From: Bernd Petrovitsch [mailto:(spam-protected)] 
Sent: Tuesday, July 31, 2007 13:58
To: (spam-protected)
CC: Erik Tromp
Subject: Re: [Olsr-dev] olsrd-0.5.2 bmf segmentation fault

On Mon, 2007-07-30 at 12:28 +0200, Bernd Petrovitsch wrote:
> On Sun, 2007-07-29 at 22:58 +0200, Bernd Petrovitsch wrote:
> [...]
> > Sad news (at least for me): I do not understand why the SIGSEGV above
> > occurs:
> > - We do a dlopen(3) on the .so file of the plugin (in the olsr_load_dl()
> >   function). And this succeeds (put printf(3)s in there to verify).
> > - We get the function pointer with dlsym(3) to get the interface
> >   version (in the olsr_add_dl() function). This succeeds too and
> >   returns "4", so we know that it is the "old version". We then
> >   return "-1" from there (since we do not support the old version with
> >   the original Makefile.inc).
> > - Back in olsr_load_dl(), we see the error and dlclose(3) the shared lib
> >   again (since we can't use it).
> >   And precisely that dlclose(3) call produces the SIGSEGV (put a
> >   printf(3) before and after, look at it with ltrace). But the
> >   "dlhandle" (and the pointer to it) there has the correct value (as
> >   reported by the dlopen(3)) and I can't find or think of a reason why
> >   something could break there with a SIGSEGV.
> >   Don't get me wrong, if something is not correct dlclose(3) can (and
> >   should) report errors, but simply dying on a SIGSEGV is strange (at
> >   best).
> > Any hints anyone?
> 
> Thanks for the 1st hint: Linking everything (and not only the bmf
> plugin) against the pthread library doesn't help.
> 
> Next try: not stripping the binary and plugins didn't help either.
> 
> And the SIGSEGV occurs with both gcc-3.4.6 and gcc-4.1.1 from 
> CentOS-4.5.
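The load-and-reject sequence described in the quoted message (dlopen(), dlsym() for the interface version, dlclose() on a mismatch) can be sketched in C roughly as follows. This is a minimal illustration, not olsrd's actual olsr_load_dl()/olsr_add_dl() code: the function name load_plugin(), the exported symbol name olsrd_plugin_interface_version and the supported_version parameter are assumptions. What it shows is that the dlclose() on the error path can run the library's destructors, so a crash inside plugin cleanup code appears to come from that dlclose() call.

```c
#define _POSIX_C_SOURCE 200809L
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed name of the version function each plugin exports. */
#define VERSION_SYMBOL "olsrd_plugin_interface_version"

typedef int (*version_fn)(void);

/*
 * Hypothetical loader: open the plugin, ask for its interface version and
 * reject it (closing the library again) on a mismatch.
 * Returns 0 and stores the handle on success, -1 on any failure.
 */
int load_plugin(const char *path, int supported_version, void **out_handle)
{
    *out_handle = NULL;

    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s): %s\n", path, dlerror());
        return -1;
    }

    version_fn get_version;
    /* POSIX idiom for turning dlsym()'s void * into a function pointer. */
    *(void **)(&get_version) = dlsym(handle, VERSION_SYMBOL);

    if (get_version == NULL || get_version() != supported_version) {
        fprintf(stderr, "%s: missing or unsupported interface version\n", path);
        /* When the last reference goes away, dlclose() runs the library's
         * destructors; a crash in plugin cleanup code shows up right here. */
        dlclose(handle);
        return -1;
    }

    *out_handle = handle;
    return 0;
}
```

With a nonexistent path the function just reports the dlerror() text and returns -1; the path that matters for this bug is a real plugin whose version check fails and which is then closed again.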

After finding http://www.groupsrv.com/linux/about17472.html via Google, I
debugged the problem and found the culprit:
- dlclose(3) calls the shared lib's "destructor", olsr_plugin_exit().
- olsr_plugin_exit() calls CloseBmf().
- CloseBmf() tries to kill a thread that was never started, and the
  pthread library apparently isn't prepared for such an error.

The attached, pretty trivial, patch should fix that. It is actually from
CVS-HEAD, but the CloseBmf() function didn't change there.
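A minimal sketch of the kind of guard such a fix needs, assuming a plugin that starts one worker thread from its init code and stops it from a destructor. All names here (start_bmf(), close_bmf(), bmf_is_running(), bmf_thread_running, plugin_fini()) are hypothetical and not taken from the BMF source or the attached patch, and the sketch stops the thread cooperatively with a condition variable instead of signalling or cancelling it. The essential part is that close_bmf() returns immediately when no thread was ever created: on glibc a zero-initialized pthread_t that pthread_create() never filled in is effectively a null pointer to a thread descriptor, and passing it to pthread_kill() or pthread_cancel() is undefined behaviour that can end in exactly this kind of SIGSEGV.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical plugin state; not the real BMF variables. */
static pthread_t bmf_thread;               /* only valid while bmf_thread_running */
static bool bmf_thread_running = false;    /* set only after pthread_create() succeeded */

static pthread_mutex_t bmf_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t bmf_cond = PTHREAD_COND_INITIALIZER;
static bool bmf_stop_requested = false;    /* protected by bmf_lock */

/* Worker: waits until asked to stop (stands in for the real packet loop). */
static void *bmf_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&bmf_lock);
    while (!bmf_stop_requested)
        pthread_cond_wait(&bmf_cond, &bmf_lock);
    pthread_mutex_unlock(&bmf_lock);
    return NULL;
}

/* Start the worker once; returns 0 on success, -1 if pthread_create() fails. */
int start_bmf(void)
{
    if (bmf_thread_running)
        return 0;
    pthread_mutex_lock(&bmf_lock);
    bmf_stop_requested = false;
    pthread_mutex_unlock(&bmf_lock);
    if (pthread_create(&bmf_thread, NULL, bmf_worker, NULL) != 0)
        return -1;
    bmf_thread_running = true;
    return 0;
}

/* Safe to call at any time, including when start_bmf() never ran. */
void close_bmf(void)
{
    if (!bmf_thread_running)
        return;                            /* the guard: no thread, nothing to stop */
    pthread_mutex_lock(&bmf_lock);
    bmf_stop_requested = true;
    pthread_cond_signal(&bmf_cond);
    pthread_mutex_unlock(&bmf_lock);
    pthread_join(bmf_thread, NULL);        /* wait for the worker to exit */
    bmf_thread_running = false;
}

bool bmf_is_running(void)
{
    return bmf_thread_running;
}

/* Runs on dlclose() (or process exit), like the plugin's exit hook. */
static void __attribute__((destructor)) plugin_fini(void)
{
    close_bmf();
}
```

The destructor mirrors the dlclose() path: if the plugin is rejected before start_bmf() ever ran, plugin_fini() still calls close_bmf(), which now does nothing instead of touching an invalid thread ID. Note that __attribute__((destructor)) is a GCC/Clang extension.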

Erik, you probably want to incorporate that change too?

	Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services