Subject: krb5-1.7 hangs on Solaris
Date: Tue, 22 Sep 2009 13:41:49 -0400
From: "Arlene Berry" <aberry@likewise.com>
To: <krb5-bugs@mit.edu>

We’re seeing hangs on Solaris because krb5-1.7 uses res_init and res_search from multiple threads. On Solaris, those functions are not thread safe.  To fix it we changed krb5int_dns_init to call res_init only once and added a mutex around res_search:

 

Index: src/lib/krb5/os/dnsglue.c
===================================================================
--- src/lib/krb5/os/dnsglue.c (revision 37274)
+++ src/lib/krb5/os/dnsglue.c (working copy)
@@ -63,6 +63,10 @@
 static int initparse(struct krb5int_dns_state *);
 #endif
 
+#if !USE_RES_NINIT
+static k5_mutex_t dns_res_lock = K5_MUTEX_PARTIAL_INITIALIZER;
+#endif
+
 /*
  * krb5int_dns_init()
  *
@@ -102,12 +106,24 @@
 #if USE_RES_NINIT
     memset(&statbuf, 0, sizeof(statbuf));
     ret = res_ninit(&statbuf);
-#else
-    ret = res_init();
-#endif
     if (ret < 0)
      return -1;
+#else
+    if (!(_res.options & RES_INIT))
+    {
+     ret = res_init();
+     if (ret < 0)
+         return -1;
 
+     ret = k5_mutex_finish_init(&dns_res_lock);
+     if (ret < 0)
+         return ret;
+    }
+    ret = k5_mutex_lock(&dns_res_lock);
+    if (ret < 0)
+     return ret;
+#endif
+
     do {
      p = (ds->ansp == NULL)
          ? malloc(nextincr) : realloc(ds->ansp, nextincr);
@@ -152,6 +168,8 @@
 errout:
 #if USE_RES_NINIT
     res_ndestroy(&statbuf);
+#else
+    k5_mutex_unlock(&dns_res_lock);
 #endif
     if (ret < 0) {
      if (ds->ansp != NULL) {
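
For readers who want the locking idea in isolation, here is a minimal sketch, not the krb5 code itself. It assumes plain POSIX mutexes instead of k5_mutex_t, and legacy_init() and legacy_search() are hypothetical stand-ins for res_init() and res_search() that only count calls. It differs from the patch above in one respect: the "already initialized" flag is tested after the lock is taken, so the flag is never read or written outside the lock.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for res_init() and res_search(); they only count calls. */
int init_calls = 0;
int search_calls = 0;

static int legacy_init(void)
{
    init_calls++;
    return 0;
}

static int legacy_search(const char *name)
{
    search_calls++;
    return name != NULL ? 0 : -1;
}

/* One statically initialized lock guards both the init flag and every lookup. */
static pthread_mutex_t resolver_lock = PTHREAD_MUTEX_INITIALIZER;
static int resolver_ready = 0;

/* Returns 0 on success, -1 on failure. */
int guarded_search(const char *name)
{
    int ret;

    if (pthread_mutex_lock(&resolver_lock) != 0)
        return -1;

    /* The flag is only read and written while the lock is held. */
    if (!resolver_ready) {
        if (legacy_init() < 0) {
            pthread_mutex_unlock(&resolver_lock);
            return -1;
        }
        resolver_ready = 1;
    }

    ret = legacy_search(name);

    /* Every exit path after a successful lock releases it. */
    pthread_mutex_unlock(&resolver_lock);
    return ret;
}
```

Calling guarded_search() any number of times runs the init stand-in once and serializes the lookups. A lock like this only covers callers that go through the wrapper.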

 

 

To: rt@krbdev.MIT.EDU
Subject: Re: [krbdev.mit.edu #6569] krb5-1.7 hangs on Solaris
From: Tom Yu <tlyu@MIT.EDU>
Date: Tue, 22 Sep 2009 16:06:53 -0400
RT-Send-Cc:
"Arlene Berry via RT" <rt-comment@krbdev.mit.edu> writes:

> We're seeing hangs on Solaris because krb5-1.7 uses res_init and
> res_search from multiple threads. On Solaris, those functions are not
> thread safe. To fix it we changed krb5int_dns_init to call res_init
> only once and added a mutex around res_search:

Interesting... what release of Solaris? I thought that the
thread-safe res_ninit() was available on modern Solaris releases.
Subject: [krbdev.mit.edu #6569] krb5-1.7 hangs on Solaris
Date: Tue, 22 Sep 2009 17:42:48 -0400
From: "Arlene Berry" <aberry@likewise.com>
To: <rt@krbdev.mit.edu>
RT-Send-Cc:

We build on Solaris 8 because we support Solaris 8 and up.  I checked and HAVE_RES_NINIT and HAVE_RES_NSEARCH are defined but not HAVE_RES_NDESTROY.  All three are required for dnsglue.c to set USE_RES_NINIT.
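
The effect of the missing macro can be sketched as follows. This is an illustration under assumptions, not a copy of dnsglue.c: the exact form of the preprocessor test is assumed from the description above, the HAVE_* values are hard-coded to mimic this Solaris 8 build, and uses_res_ninit() is a hypothetical helper added only to make the outcome observable.

```c
/* Assumed configure results for this Solaris 8 build, per the message above. */
#define HAVE_RES_NINIT 1
#define HAVE_RES_NSEARCH 1
/* HAVE_RES_NDESTROY is deliberately left undefined. */

/* An undefined macro evaluates to 0 inside #if, so this test fails. */
#if HAVE_RES_NINIT && HAVE_RES_NDESTROY && HAVE_RES_NSEARCH
#define USE_RES_NINIT 1
#endif

/* Hypothetical helper: 1 if the res_ninit path is compiled in, else 0. */
int uses_res_ninit(void)
{
#if USE_RES_NINIT
    return 1;   /* res_ninit / res_nsearch / res_ndestroy */
#else
    return 0;   /* res_init / res_search */
#endif
}
```

With only two of the three macros defined, the helper returns 0, meaning the res_init/res_search path is the one that gets built.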

From: Ken Raeburn <raeburn@MIT.EDU>
To: rt@krbdev.mit.edu
Subject: Re: [krbdev.mit.edu #6569] krb5-1.7 hangs on Solaris
Date: Tue, 22 Sep 2009 19:28:04 -0400
RT-Send-Cc:
Finding a thread-safe way of using the resolver would be better than
locking around it within the Kerberos library, since the application
may use it in other threads at the same time. It looks to me like
there should be a workaround, thanks to Sun, but other OSes still
using such old versions of BIND (are there any such that we still care
about?) may still have problems. I'd consider using Sun's helper
functions, if the early Solaris 8 releases are to be supported...

Some possibly useful references:

http://opensolaris.org/sc/src/iser/iser-on/usr/src/lib/nsswitch/dns/common/dns_mt.c

Several comments discussing thread safety in various versions of BIND
and the Solaris modifications to it, starting with BIND 8.1.2.
Apparently Sun's modifications included functions that "enabled and
disabled MT mode per-thread" for BIND 8.1.2, but weren't needed later
in 8.2.2. I'm not sure if that means the new BIND code was entirely
thread-safe, or just some parts that Sun cared about.

http://docs.sun.com/app/docs/doc/806-7502/6jgce020l?a=view

"Berkeley Internet Name Domain (BIND) has migrated from version 8.1.2
to 8.2.2 in the Solaris 8 4/01 release." So before that they were
using 8.1.2; with the 4/01 release and later, BIND should apparently be
thread-safe.

http://developers.sun.com/solaris/articles/using_etc_release.html

There were a few releases of "Solaris 8" before the 4/01 release, as
well as a few after.

http://www.cert.org/advisories/CA-2001-02.html

The info from Sun in the vendor section suggests that BIND 8.1.2 was
used back in Solaris 7, so I'm guessing it may have been in the
initial Solaris 8 release. I could be wrong, if it was put in as a
later update to both 7 and 8.

http://www.ops.ietf.org/lists/namedroppers/namedroppers.199x/msg03798.html

BIND 8.2 added the "nearly-thread-safe resolver API", so if there are
thread safety issues it seems likely Arlene is using 8.1.2. (And 8.2
is also when getaddrinfo was added, so our fake one is probably being
used still.)

Ken

P.S. There's also the route of just saying, "We assume X, Y, and Z
facilities on your OS are thread-safe; if not, then the Kerberos
libraries will not be thread-safe either." And encourage the use of
OS updates that implement thread safety for routines POSIX says should
be thread-safe. In early NetBSD releases, we knew this was an issue
around getaddrinfo(), for example. In this case, there are OS updates
available that fix the thread safety problem, and updating to Solaris
9 or 10 isn't even required. But it may still be too much of an
imposition....
From: Arlene Berry <aberry@likewise.com>
To: "krb5-bugs@mit.edu" <krb5-bugs@MIT.EDU>
Subject: RE: [krbdev.mit.edu #6569] krb5-1.7 hangs on Solaris
Date: Fri, 22 Jul 2011 17:39:17 +0000
RT-Send-Cc:
Our current patch is attached. We've added a test for res_ninit because we found it crashes on AIX 6.1. Also, Solaris has res_ninit and res_nclose but not res_ndestroy and we've modified it to use either.
Attachment: rt6569.diff (application/octet-stream, 2.4KiB)
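
Since the attached patch is not reproduced here, the following is only a guess at the shape of "use either", not its contents. It assumes a compile-time choice that prefers res_ndestroy when available and falls back to res_nclose. The macro name RES_CLEANUP, the struct, and the fake_* functions are hypothetical stand-ins so the sketch builds without a resolver library.

```c
/* Assumed configure results for Solaris, per the message above. */
#define HAVE_RES_NCLOSE 1
/* HAVE_RES_NDESTROY is left undefined. */

/* Hypothetical stand-ins for the resolver state and its cleanup calls. */
enum { BY_NONE = 0, BY_NDESTROY = 1, BY_NCLOSE = 2 };
struct fake_res_state { int open; int closed_by; };

void fake_res_ndestroy(struct fake_res_state *s)
{
    s->open = 0;
    s->closed_by = BY_NDESTROY;
}

void fake_res_nclose(struct fake_res_state *s)
{
    s->open = 0;
    s->closed_by = BY_NCLOSE;
}

/* Assumption: prefer res_ndestroy, fall back to res_nclose. */
#if HAVE_RES_NDESTROY
#define RES_CLEANUP(s) fake_res_ndestroy(s)
#elif HAVE_RES_NCLOSE
#define RES_CLEANUP(s) fake_res_nclose(s)
#endif

/* Opens a fake state, cleans it up, and reports which call was used. */
int cleanup_demo(void)
{
    struct fake_res_state st = { 1, BY_NONE };
    RES_CLEANUP(&st);
    return st.closed_by;
}
```

With the macros set as above, cleanup_demo() reports that the res_nclose stand-in was used.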