Subject: Hang in malloc_consolidate() on RedHat 9 running krb5-1.2.x
We have had two reports of this problem on RedHat 9 machines. Since the
stack traces are so similar, we are currently assuming they are the same
problem:

(1)

From Garry Zacheiss <zacheiss@mit.edu> on Tue, 03 Feb 2004 14:32:08 EST:

While sshing to a Linux Athena machine running krb5-1.2.5, the sshd on
the Linux machine hangs at:


#0 0x42074d8c in malloc_consolidate () from /lib/tls/libc.so.6
#1 0x420743c9 in _int_malloc () from /lib/tls/libc.so.6
#2 0x4207378d in malloc () from /lib/tls/libc.so.6
#3 0x420f50b7 in getservbyname () from /lib/tls/libc.so.6
#4 0x0812222a in krb5_locate_srv_conf (context=0x81b5e88, realm=0x81bc8ec,
name=0x817ed1e "kdc", addr_pp=0xbfffd42c, naddrs=0xbfffd428,
get_masters=0)
at locate_kdc.c:168


(2)

From: Ken Weaverling <weave@spamcop.net>
Date: Mon, 02 Feb 2004 09:57:02 -0600
To: kerberos@mit.edu
Subject: malloc hang inside krb5_sendto_kdc

I'm having some weird kerberos authentication issues since upgrading a
Redhat box from 7.3 to RHEL 3. imap authenticates against a windows 2000
kerberos server. That worked under 7.3 for well over a year on a fairly
heavily loaded box (~300 imap connections open, a few new connects a
second).

Since upgrading to RHEL 3, a few times a day an imap process will go
into a CPU loop and consume all resources, and sometimes other processes,
such as our ldap server and apache server, will hang until that imap
process is killed.

Attaching to the processes always indicates the hang is within malloc()
and always being called from krb5_sendto_kdc. The loop is somewhere
within malloc. The function never returns.

A sample backtrace...

#0 0xb735c164 in malloc_consolidate () from /lib/tls/libc.so.6
#1 0xb735b769 in _int_malloc () from /lib/tls/libc.so.6
#2 0xb735ab0d in malloc () from /lib/tls/libc.so.6
#3 0xb75ad622 in krb5_sendto_kdc (context=0x1, message=0x81214a8,
realm=0x1,
reply=0xbfffb510, use_master=1) at sendto_kdc.c:97
#4 0xb75961f3 in send_as_request (context=0x8117ba0, request=
0xbfffb5d0,
time_now=0xbfffb510, ret_err_reply=0xbfffb594, ret_as_reply=
0xbfffb598,
use_master=1) at get_in_tkt.c:117
#5 0xb7597420 in krb5_get_init_creds (context=0x8117ba0, creds=
0x81235dc,
client=0x811b9c8, prompter=0, prompter_data=0x0, start_time=0,
in_tkt_service=0x0, options=0x811b934,
gak_fct=0xb7597e20 <krb5_get_as_key_password>, gak_data=0xbfffc310,
use_master=1, as_reply=0xbfffb6bc) at get_in_tkt.c:946
#6 0xb7598877 in krb5_get_init_creds_password (context=0x8117ba0,
creds=0x81235dc, client=0x811b9c8, password=0x8116b50 "", prompter=
0,
data=0x0, start_time=0, in_tkt_service=0x0, options=0x811b934)
at gic_pwd.c:156
#7 0xb729d557 in pam_sm_authenticate () from /lib/security/pam_krb5.so
#8 0xb75d2c06 in pam_fail_delay () from /lib/libpam.so.0
#9 0xb75d2d81 in _pam_dispatch () from /lib/libpam.so.0
#10 0xb75d4858 in pam_authenticate () from /lib/libpam.so.0
#11 0x08062323 in server_input_wait ()
#12 0x0805bd55 in server_input_wait ()
#13 0x0805bfc6 in server_input_wait ()
#14 0x0805b225 in auth_plain_server ()
#15 0x08072b77 in mail_thread_compare_date ()
#16 0x080500c5 in ?? ()
#17 0x08103f2f in cmdbuf ()
#18 0x08056110 in fetch_rfc822_text ()
#19 0xb72ff748 in __libc_start_main () from /lib/tls/libc.so.6
#20 0x0804c9f1 in ?? ()

Line 97 in sendto_kdc.c is:

 94     for (i = 0; i < naddr; i++)
 95         socklist[i] = INVALID_SOCKET;
 96
 97     if (!(reply->data = malloc(krb5_max_dgram_size))) {
 98         krb5_xfree(addr);
 99         krb5_xfree(socklist);
100         return ENOMEM;
101     }

This is krb5-libs-1.2.7-19 btw...

I do have one of four domain controllers running 2003 server, but
krb5.conf points to the 2000 server. We tried pointing to 2003 server
but it fails at times due to the tcp issue which I've read is fixed in
1.3, which is why we aren't upgrading them all right now.

So .... is this a known bug? I've read some stuff that if a program
clobbers malloc'ed memory it can sometimes exhibit a hang in
_malloc_consolidate.

Any hints on next steps I can take? I have had an open support call with
redhat for the past two weeks but it's not moving very quickly. :(

thx
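
One way to test the suspicion that something is clobbering malloc'ed memory is to wrap suspect allocations with guard bytes and check them later. The following is only a minimal sketch of that idea. The names guarded_malloc, guarded_intact and guarded_free are hypothetical helpers invented for illustration; they are not part of krb5 or glibc, and nothing in this ticket confirms that an overrun is the actual cause of the hang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers for illustration; not part of krb5 or glibc. */
#define GUARD_LEN  8
#define GUARD_BYTE 0xA5

/* Header kept in front of the user pointer. The union keeps the user
   pointer aligned as strictly as malloc's own return value. */
typedef union {
    size_t size;
    max_align_t align;
} guard_header;

/* Allocate n usable bytes followed by GUARD_LEN guard bytes. */
static void *guarded_malloc(size_t n)
{
    guard_header h;
    unsigned char *raw;

    if (n > SIZE_MAX - sizeof h - GUARD_LEN)
        return NULL;                      /* size computation would overflow */
    raw = malloc(sizeof h + n + GUARD_LEN);
    if (raw == NULL)
        return NULL;
    memset(&h, 0, sizeof h);
    h.size = n;
    memcpy(raw, &h, sizeof h);            /* remember the requested size */
    memset(raw + sizeof h + n, GUARD_BYTE, GUARD_LEN);
    return raw + sizeof h;
}

/* Return 1 if the guard bytes after the buffer are untouched, 0 if
   something wrote past the end of the buffer. */
static int guarded_intact(const void *p)
{
    const unsigned char *user = p;
    guard_header h;
    size_t i;

    assert(p != NULL);
    memcpy(&h, user - sizeof h, sizeof h);
    for (i = 0; i < GUARD_LEN; i++)
        if (user[h.size + i] != GUARD_BYTE)
            return 0;
    return 1;
}

/* Release a buffer obtained from guarded_malloc. */
static void guarded_free(void *p)
{
    if (p != NULL)
        free((unsigned char *)p - sizeof(guard_header));
}
```

Calling guarded_intact() just before each guarded_free() would flag a buffer whose owner wrote past its end, which narrows down where the heap is being damaged. It only catches small overruns that land in the guard bytes.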
________________________________________________
Kerberos mailing list Kerberos@mit.edu
https://mailman.mit.edu/mailman/listinfo/kerberos
To: nalin@redhat.com
Cc: rt-comment@krbdev.mit.edu
Subject: [krbdev.mit.edu #2197] New glibc threading interacts badly with Kerberos
From: Sam Hartman <hartmans@mit.edu>
Date: Tue, 03 Feb 2004 14:53:13 -0500
RT-Send-Cc:

Hi. We (the MIT Kerberos team) have been getting several vague reports
of malloc or other related libc hangs in Kerberos applications, sometimes
with raw builds of MIT Kerberos and sometimes with Redhat's
RPMs.

In at least some cases (ssh hangs in malloc) setting LD_ASSUME_KERNEL
to 2.4.1 will make the problem go away. So we suspect potential new
threading model issues.

Do you happen to know what's going on here? If not, do you have any
hints for us in debugging this problem?
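
For anyone wanting to try the LD_ASSUME_KERNEL workaround mentioned above on a single program, a small launcher along these lines could set the variable and then exec the target. This is a sketch under stated assumptions: the function names are hypothetical, only the standard setenv() and execvp() calls are used, and the variable has an effect only on systems whose dynamic linker still honors LD_ASSUME_KERNEL.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: set LD_ASSUME_KERNEL for this process and any
   program it later execs. Returns 0 on success, -1 on failure. */
static int assume_old_kernel(const char *version)
{
    assert(version != NULL);
    return setenv("LD_ASSUME_KERNEL", version, 1);
}

/* Hypothetical helper: replace the current process with argv[0],
   which inherits the modified environment. Returns only on failure. */
static int run_with_assumed_kernel(const char *version, char *const argv[])
{
    if (assume_old_kernel(version) != 0) {
        perror("setenv");
        return -1;
    }
    execvp(argv[0], argv);
    perror("execvp");   /* reached only if the exec failed */
    return -1;
}
```

A wrapper's main() could call run_with_assumed_kernel("2.4.1", argv + 1) so that the program named on the command line starts with the variable set. If the hang disappears under the wrapper, that supports the threading-model suspicion without proving it.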
Date: Thu, 5 Feb 2004 11:48:45 -0500 (EST)
From: Ezra Peisach <epeisach@MIT.EDU>
To: rt-comment@krbdev.mit.edu
Subject: Re: [krbdev.mit.edu #2197] New glibc threading interacts badly with
RT-Send-Cc:

If you are talking about redhat 9 - then you really need to make sure
you are using the latest glibc. If not, do not compile with shared libraries.

I discovered that there were some broken glibc 2.0 fopen interface
compatibility issues in glibc - which I finally got them to patch. The
bug stems from the fact that they decided to have two incompatible file
structures in glibc. It would be ok - except that they decided to keep
an old (smaller) and a new (larger) structure around. The fopen code
would assume a pointer to the new structure was passed in and write past
the end of the old structure. Tickling the bug was sort of
strange. It depended on how you made a shared library, and the fopen had to
be present in the shared library as well.

So - which glibc?

glibc-2.3.2-27.9.7 has the stdio compatibility fixes in it.

Ezra
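
The old/new structure mismatch described above can be illustrated with a toy example. The two layouts below are invented for illustration and are NOT glibc's real FILE structures. The sketch only shows why code that assumes the larger layout ends up writing past the end of an object that has the smaller one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative layouts only; these are not glibc's real structures. */
struct old_file {
    int flags;
    int fd;
};

struct new_file {
    int flags;
    int fd;
    long lock_state;   /* field that exists only in the newer layout */
    long extra[4];
};

/* Return 1 if writing field_size bytes at field_offset would run past
   the end of an object that is object_size bytes long, else 0. */
static int writes_past_end(size_t object_size, size_t field_offset,
                           size_t field_size)
{
    assert(field_offset <= SIZE_MAX - field_size);
    return field_offset + field_size > object_size;
}

/* Code that believes it was handed a struct new_file but actually got
   a struct old_file would store lock_state outside the old object. */
static int new_init_overruns_old(void)
{
    return writes_past_end(sizeof(struct old_file),
                           offsetof(struct new_file, lock_state),
                           sizeof(long));
}
```

Here new_init_overruns_old() returns 1 because lock_state begins at or beyond the end of struct old_file. An out-of-bounds store of that kind can damage whatever sits next to the object, and if that neighbour is allocator bookkeeping the failure may only surface in a later malloc call.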
To: rt-comment@krbdev.mit.edu, epeisach@mit.edu
Subject: Re: [krbdev.mit.edu #2197] New glibc threading interacts badly with
From: Sam Hartman <hartmans@mit.edu>
Date: Thu, 05 Feb 2004 13:58:31 -0500
RT-Send-Cc:
ssh quiche rpm -q glibc
glibc-2.3.2-27.9.7
From: Steven Mohl <steve.mohl@veritas.com>
To: "'krb5-bugs@mit.edu'" <krb5-bugs@mit.edu>
Date: Fri, 29 Oct 2004 14:35:22 -0700
Cc: LIST-AST bmr-dev <bmr-dev@tkg.com>
Subject: [krbdev.mit.edu #2197] Hang in malloc_consolidate() on RedHat 9 running krb5-1.2.x
RT-Send-Cc:
Hello,
    We have just hit this problem and were wondering if a solution exists. The machine exhibiting it is:
 
Linux rh160.lab.tkg.com 2.4.21-4.EL #1 Fri Oct 3 17:52:56 EDT 2003 i686 athlon i386 GNU/Linux
 
The stack backtrace is:
 
(gdb) bt
#0  0xb63eb421 in malloc_consolidate () from /lib/tls/libc.so.6
#1  0xb63eaa59 in _int_malloc () from /lib/tls/libc.so.6
#2  0xb63e9dfd in malloc () from /lib/tls/libc.so.6
#3  0xb659893d in __builtin_new (sz=663) from /usr/openv/lib/libvxstlportST.so
#4  0xb6598a80 in __builtin_vec_new (sz=663)
   from /usr/openv/lib/libvxstlportST.so
#5  0x0805933a in BareMetalRestore::CppStrDup (
    string=0x8c926f0 "Name:/usr/openv/netbackup/baremetal/client/data/savcf1SsNV6\nOS:linux\nOSLevel:3.0\nBMRVersion:6.0Alpha\nCommand:[ -x /sbin/lvdisplay ] && lvdisplay %s 2>/dev/null\nOutputType:48\nKey1:/dev/Volume03/LogVol0"...)
    at ParseCommon.cpp:910
#6  0x08056684 in BareMetalRestore::ParseClientInformation (
    vFileContents=@0xbfffbc80, bmr=@0xbfffbc90)
    at ParseClientInformation.cpp:419
#7  0x08055f56 in bmrmain (argc=91, argv=0xbfffc7c4) at bic.cpp:235
#8  0x0805569b in main (ac=91, av=0xbfffc7c4) at bic.cpp:96
 
Any assistance you can provide is greatly appreciated.
 
Thanks very much,
Steve Mohl - VERITAS Bare Metal Restore development
Probably fixed by one of several memory management bugfixes. Closing.