Date: Tue, 19 Dec 2006 06:26:57 -0500
From: Ezra Peisach <epeisach@MIT.EDU>
To: krb5-bugs@MIT.EDU
Subject: Race condition in utils/support/threads.c if one thread calls exit....
Summary:
----------
If one runs tests/threads/prof1 on an SMP machine without the /tmp/foo.conf
or /tmp/foo1.conf file present, one occasionally observes:
./prof1: prof1: ../../../src/util/support/threads.c:226:
krb5int_getspecific: Assertion `destructors_set[keynum] == 1' failed.
Abort
The problem:
--------------
One of the threads has called exit() while another thread is in
krb5int_getspecific().
Stack trace (the thread that hits the assertion):
#0 0x005177a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x002a57a5 in raise () from /lib/tls/libc.so.6
#2 0x002a7209 in abort () from /lib/tls/libc.so.6
#3 0x0029ed91 in __assert_fail () from /lib/tls/libc.so.6
#4 0x00578c0a in krb5int_getspecific (keynum=K5_KEY_COM_ERR)
at ../../../src/util/support/threads.c:226
#5 0x003c064b in get_thread_buffer ()
at ../../../src/util/et/error_message.c:98
#6 0x003c06db in error_message (code=2)
at ../../../src/util/et/error_message.c:146
#7 0x003c1a15 in default_com_err_proc (whoami=0xbff23b12 "./prof1", code=2,
fmt=0x8048b03 "calling profile_init(\"%s\")", ap=0xb4dc540c
"ç\212\004\b")
at ../../../src/util/et/com_err.c:75
#8 0x003c1ac5 in com_err_va (whoami=0xbff23b12 "./prof1", code=2,
fmt=0x8048b03 "calling profile_init(\"%s\")", ap=0xb4dc540c
"ç\212\004\b")
at ../../../src/util/et/com_err.c:104
#9 0x003c1d83 in com_err (whoami=0xbff23b12 "./prof1", code=2,
fmt=0x8048b03 "calling profile_init(\"%s\")")
at ../../../src/util/et/com_err.c:131
#10 0x080488f5 in worker (arg=0x0) at ../../../src/tests/threads/prof1.c:37
Stack trace (the other thread, which called exit()):
#0 0x005177a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x00d911de in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#2 0x00d8de3b in _L_mutex_lock_35 () from /lib/tls/libpthread.so.0
#3 0x0057dda0 in flag_pthread_loaded () from
/tmp/b123/lib/libkrb5support.so.0
#4 0x003c3b5c in ?? () from /tmp/b123/lib/libcom_err.so.3
#5 0x003c3c60 in et_list_lock () from /tmp/b123/lib/libcom_err.so.3
#6 0xb7fe22a0 in ?? ()
#7 0xb7fca328 in ?? ()
#8 0x003c0210 in com_err_terminate ()
at ../../../src/util/et/error_message.c:77
#9 0x003c0210 in com_err_terminate ()
at ../../../src/util/et/error_message.c:77
#10 0x003bfcb7 in __do_global_dtors_aux () from
/tmp/b123/lib/libcom_err.so.3
#11 0x003c233a in _fini () from /tmp/b123/lib/libcom_err.so.3
#12 0x00523907 in _dl_fini () from /lib/ld-linux.so.2
#13 0x002a8527 in exit () from /lib/tls/libc.so.6
#14 0x08048901 in worker (arg=Could not find the frame base for "worker".
) at ../../../src/tests/threads/prof1.c:38
What is going on?
------------------
The exit handler in libcom_err (com_err_terminate) calls k5_key_delete before
calling k5_mutex_destroy on com_err_hook.
k5_key_delete sets destructors_set for the key to 0. It does this while
holding the global threads lock.
While this is happening, another thread is still using the com_err library
and calls k5_getspecific (krb5int_getspecific in the trace above), which
contains the line:
assert(destructors_set[keynum] == 1)
That check is made without taking any mutex.
How to fix the problem?
-------------------------
a) The user should never call exit() from a thread; the thread should return
or call pthread_exit() instead?
b) Remove the assertion on destructors_set.
c) Take a mutex before testing destructors_set, and have k5_getspecific
return NULL if the flag is not set; that would indicate that the process is
exiting?
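
A rough sketch of what option (c) might look like, as a simplified model
rather than the actual threads.c code. Every demo_* name below is
hypothetical, and the model keeps one process-wide value per key instead of
real per-thread storage; it only shows the flag being tested under the same
lock that the delete path holds, with NULL returned once the key is gone.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the key table in threads.c. */
enum { DEMO_KEY_COM_ERR = 0, DEMO_KEY_MAX = 1 };

static pthread_mutex_t demo_key_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned char demo_destructors_set[DEMO_KEY_MAX];
static void *demo_values[DEMO_KEY_MAX]; /* simplification: not per-thread */

/* Mark a key as registered, under the lock. */
static void demo_key_register(int keynum)
{
    assert(keynum >= 0 && keynum < DEMO_KEY_MAX);
    pthread_mutex_lock(&demo_key_lock);
    demo_destructors_set[keynum] = 1;
    pthread_mutex_unlock(&demo_key_lock);
}

/* Store a value; returns 0 on success, -1 if the key is not registered. */
static int demo_setspecific(int keynum, void *value)
{
    int registered;

    assert(keynum >= 0 && keynum < DEMO_KEY_MAX);
    pthread_mutex_lock(&demo_key_lock);
    registered = demo_destructors_set[keynum];
    if (registered)
        demo_values[keynum] = value;
    pthread_mutex_unlock(&demo_key_lock);
    return registered ? 0 : -1;
}

/*
 * Option (c): test the flag while holding the lock and return NULL instead
 * of asserting when the key has already been deleted (process is exiting).
 */
static void *demo_getspecific(int keynum)
{
    void *value = NULL;

    assert(keynum >= 0 && keynum < DEMO_KEY_MAX);
    pthread_mutex_lock(&demo_key_lock);
    if (demo_destructors_set[keynum])
        value = demo_values[keynum];
    pthread_mutex_unlock(&demo_key_lock);
    return value;
}

/* What the exit handler would do: clear the flag under the same lock. */
static void demo_key_delete(int keynum)
{
    assert(keynum >= 0 && keynum < DEMO_KEY_MAX);
    pthread_mutex_lock(&demo_key_lock);
    demo_destructors_set[keynum] = 0;
    demo_values[keynum] = NULL;
    pthread_mutex_unlock(&demo_key_lock);
}
```

With this shape, a caller such as get_thread_buffer would have to treat a
NULL return as "no buffer available" and fall back accordingly. It also adds
a lock acquisition to every lookup, which may or may not be acceptable.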