Date: Tue, 19 Dec 2006 06:26:57 -0500
From: Ezra Peisach <epeisach@MIT.EDU>
To: krb5-bugs@MIT.EDU
Subject: Race condition in utils/support/threads.c if one thread calls exit....
Summary:
----------
If one runs tests/threads/prof1 without the /tmp/foo.conf or
/tmp/foo1.conf file on an SMP machine, one observes

./prof1: prof1: ../../../src/util/support/threads.c:226:
krb5int_getspecific: Assertion `destructors_set[keynum] == 1' failed.
Abort

every once in a while...

The problem:
--------------
One of the threads has called exit() while another thread is in
krb5int_getspecific().

Stack trace: (one thread)

#0 0x005177a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x002a57a5 in raise () from /lib/tls/libc.so.6
#2 0x002a7209 in abort () from /lib/tls/libc.so.6
#3 0x0029ed91 in __assert_fail () from /lib/tls/libc.so.6
#4 0x00578c0a in krb5int_getspecific (keynum=K5_KEY_COM_ERR)
at ../../../src/util/support/threads.c:226
#5 0x003c064b in get_thread_buffer ()
at ../../../src/util/et/error_message.c:98
#6 0x003c06db in error_message (code=2)
at ../../../src/util/et/error_message.c:146
#7 0x003c1a15 in default_com_err_proc (whoami=0xbff23b12 "./prof1", code=2,
fmt=0x8048b03 "calling profile_init(\"%s\")", ap=0xb4dc540c
"ç\212\004\b")
at ../../../src/util/et/com_err.c:75
#8 0x003c1ac5 in com_err_va (whoami=0xbff23b12 "./prof1", code=2,
fmt=0x8048b03 "calling profile_init(\"%s\")", ap=0xb4dc540c
"ç\212\004\b")
at ../../../src/util/et/com_err.c:104
#9 0x003c1d83 in com_err (whoami=0xbff23b12 "./prof1", code=2,
fmt=0x8048b03 "calling profile_init(\"%s\")")
at ../../../src/util/et/com_err.c:131
#10 0x080488f5 in worker (arg=0x0) at ../../../src/tests/threads/prof1.c:37

(another thread):
#0 0x005177a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x00d911de in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#2 0x00d8de3b in _L_mutex_lock_35 () from /lib/tls/libpthread.so.0
#3 0x0057dda0 in flag_pthread_loaded () from
/tmp/b123/lib/libkrb5support.so.0
#4 0x003c3b5c in ?? () from /tmp/b123/lib/libcom_err.so.3
#5 0x003c3c60 in et_list_lock () from /tmp/b123/lib/libcom_err.so.3
#6 0xb7fe22a0 in ?? ()
#7 0xb7fca328 in ?? ()
#8 0x003c0210 in com_err_terminate ()
at ../../../src/util/et/error_message.c:77
#9 0x003c0210 in com_err_terminate ()
at ../../../src/util/et/error_message.c:77
#10 0x003bfcb7 in __do_global_dtors_aux () from
/tmp/b123/lib/libcom_err.so.3
#11 0x003c233a in _fini () from /tmp/b123/lib/libcom_err.so.3
#12 0x00523907 in _dl_fini () from /lib/ld-linux.so.2
#13 0x002a8527 in exit () from /lib/tls/libc.so.6
#14 0x08048901 in worker (arg=Could not find the frame base for "worker".
) at ../../../src/tests/threads/prof1.c:38


What is going on?
------------------

The exit handler in libcom_err calls k5_key_delete before calling
k5_mutex_destroy on com_err_hook.
k5_key_delete sets destructors_set for the key to 0. It does this while
holding the global threads lock.

While this is happening, another thread is still
using the com_err library and calls k5_getspecific, which
contains the line:
assert(destructors_set[keynum] == 1)
and that test is made without any mutex locking.

How to fix the problem?
-------------------------
a) The user should never call exit() in a thread. The thread should
return or call pthread_exit() instead?

b) Remove the assertion on destructors_set...

c) Maybe take a mutex before testing destructors_set, and in
k5_getspecific return NULL if it is not set. That would
indicate that the process is exiting?
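
Option (c) could look roughly like the sketch below. This is not the
threads.c code: every sketch_* name is hypothetical, and a plain array
stands in for the real per-thread storage. It only shows the flag being
tested under the same lock the delete path holds, with NULL returned
instead of an abort.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the state in threads.c (assumed names). */
enum { SKETCH_KEY_MAX = 4 };

static pthread_mutex_t sketch_key_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned char sketch_destructors_set[SKETCH_KEY_MAX];
/* Simplification: one slot per key instead of real per-thread storage. */
static void *sketch_values[SKETCH_KEY_MAX];

/* Register a key and store a value for it. */
static void sketch_register(int keynum, void *value)
{
    pthread_mutex_lock(&sketch_key_lock);
    sketch_destructors_set[keynum] = 1;
    sketch_values[keynum] = value;
    pthread_mutex_unlock(&sketch_key_lock);
}

/* Delete a key, as a library exit handler would. */
static void sketch_delete(int keynum)
{
    pthread_mutex_lock(&sketch_key_lock);
    sketch_destructors_set[keynum] = 0;
    sketch_values[keynum] = NULL;
    pthread_mutex_unlock(&sketch_key_lock);
}

/* Option (c): test the flag under the lock; NULL means "not set". */
static void *sketch_getspecific(int keynum)
{
    void *value = NULL;

    pthread_mutex_lock(&sketch_key_lock);
    if (sketch_destructors_set[keynum] == 1)
        value = sketch_values[keynum];
    pthread_mutex_unlock(&sketch_key_lock);
    return value;
}
```

The cost is a lock acquisition on every lookup, and callers must be
prepared for a NULL return.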

I have another idea...

In addition to destructors_set, add an array destructor_was_set.
When a key is deleted, its was_set entry is set. Then in the
getspecific/setspecific code, the handler can determine whether the shutdown
code is executing: _set will be zero and was_set will be one.

This would indicate that the proper sequence had been followed (the key
was registered and later deleted), so an assertion could still be tested.
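
A minimal sketch of the two-array idea follows. All demo_* names are
hypothetical and nothing here is taken from threads.c. It only shows how
the pair of flags separates "never registered" (a real bug) from
"registered, then deleted" (shutdown in progress).

```c
#include <assert.h>
#include <pthread.h>

enum { DEMO_KEY_MAX = 4 };

/* The three states the two flags can distinguish. */
enum demo_state { DEMO_KEY_NEVER_SET, DEMO_KEY_LIVE, DEMO_KEY_DELETED };

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned char demo_destructors_set[DEMO_KEY_MAX];
static unsigned char demo_destructor_was_set[DEMO_KEY_MAX];

/* Register a key: only the "set" flag goes to 1. */
static void demo_key_register(int keynum)
{
    pthread_mutex_lock(&demo_lock);
    demo_destructors_set[keynum] = 1;
    pthread_mutex_unlock(&demo_lock);
}

/* Delete a key: clear "set" and record that it was once set. */
static void demo_key_delete(int keynum)
{
    pthread_mutex_lock(&demo_lock);
    demo_destructors_set[keynum] = 0;
    demo_destructor_was_set[keynum] = 1;
    pthread_mutex_unlock(&demo_lock);
}

/* Classify a key from the two flags, under the lock. */
static enum demo_state demo_classify_key(int keynum)
{
    enum demo_state state = DEMO_KEY_NEVER_SET;

    pthread_mutex_lock(&demo_lock);
    if (demo_destructors_set[keynum])
        state = DEMO_KEY_LIVE;
    else if (demo_destructor_was_set[keynum])
        state = DEMO_KEY_DELETED;
    pthread_mutex_unlock(&demo_lock);
    return state;
}
```

A getspecific-style caller could then assert that the state is not
DEMO_KEY_NEVER_SET and treat DEMO_KEY_DELETED as "shutting down".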

In addition, instead of a simple assertion made without the global
thread lock, the value could be tested first without the lock. Only if
that test would abort, acquire the mutex and retest the value. This
minimizes the performance hit.
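
The test-then-retest idea might be sketched as below. The dc_* names are
hypothetical, and the use of C11 atomics for the unlocked read is my
assumption (it keeps the fast path well-defined); the existing code may
use plain variables. The common case returns without touching the mutex,
and only a failing test pays for the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

enum { DC_KEY_MAX = 4 };

static pthread_mutex_t dc_lock = PTHREAD_MUTEX_INITIALIZER;
/* Static storage, so every flag starts at 0. */
static atomic_uchar dc_destructors_set[DC_KEY_MAX];

/* Writers always change the flag while holding the lock. */
static void dc_set_flag(int keynum, unsigned char value)
{
    pthread_mutex_lock(&dc_lock);
    atomic_store(&dc_destructors_set[keynum], value);
    pthread_mutex_unlock(&dc_lock);
}

/* Returns 1 if the key is set, 0 otherwise. */
static int dc_key_is_set(int keynum)
{
    int is_set;

    /* Fast path: the flag looks set, so no lock is needed. */
    if (atomic_load(&dc_destructors_set[keynum]) == 1)
        return 1;

    /* Slow path: the test would fail, so retest under the lock
     * before the caller decides whether to abort. */
    pthread_mutex_lock(&dc_lock);
    is_set = (atomic_load(&dc_destructors_set[keynum]) == 1);
    pthread_mutex_unlock(&dc_lock);
    return is_set;
}
```

On its own the locked retest still reports 0 during shutdown, so it
would need to be combined with the was_set check to avoid the abort.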

Ezra