From kenh@cmf.nrl.navy.mil Wed Mar 18 20:00:05 1998
Received: from MIT.EDU (PACIFIC-CARRIER-ANNEX.MIT.EDU [18.69.0.28]) by rt-11.MIT.EDU (8.7.5/8.7.3) with SMTP id UAA24791 for <bugs@RT-11.MIT.EDU>; Wed, 18 Mar 1998 20:00:00 -0500
Received: from ginger.cmf.nrl.navy.mil by MIT.EDU with SMTP
id AA20821; Wed, 18 Mar 98 20:00:32 EST
Received: from elvis.cmf.nrl.navy.mil (kenh@elvis.cmf.nrl.navy.mil [134.207.10.38])
by ginger.cmf.nrl.navy.mil (8.8.5/8.8.5) with ESMTP id TAA28453
for <krb5-bugs@mit.edu>; Wed, 18 Mar 1998 19:59:58 -0500 (EST)
Received: (from kenh@localhost)
by elvis.cmf.nrl.navy.mil (8.8.5/8.8.5) id TAA05879;
Wed, 18 Mar 1998 19:59:55 -0500 (EST)
Message-Id: <199803190059.TAA05879@elvis.cmf.nrl.navy.mil>
Date: Wed, 18 Mar 1998 19:59:55 -0500 (EST)
From: Ken Hornstein <kenh@cmf.nrl.navy.mil>
Reply-To: kenh@cmf.nrl.navy.mil
To: krb5-bugs@MIT.EDU
Subject: Subtle bug in replay cache code
X-Send-Pr-Version: 3.99
System: SunOS elvis 4.1.4 6 sun4c
Architecture: sun4
up on long-running daemons (such as the KDC).
If you check the directory that holds the replay cache for your KDC, you
will probably notice a WHOLE bunch of files named something like:
krb5_RC84act
krb5_RC84acu
krb5_RC84acv
krb5_RC84acw
krb5_RC84acx
krb5_RC84acy
krb5_RC84acz
krb5_RC84ada
And over time, these will eventually fill up the partition holding your
replay cache, and then bad things start to happen.
The bug here is fairly subtle, and was introduced by one of the PR's
that I submitted a while ago (but I can't take all of the blame; my
PR was based on a patch that jik sent to me :-) ). After a while
replay caches would grow really large, and daemons that only did
authentication once (such as telnetd, rlogind, etc etc) would literally
spend minutes trying to recover the cache. However, since recover
discards expired entries, the there never was a case where krb5_rc_store()
would cause the replay cache to be expunged.
The original patch counted the number of expired entries, and if they
grew past some threshold would cause the cache to be recovered. That's
a good thing. _However_, this causes a problem for expunges that
are triggered by krb5_rc_store(). In this case, the first thing that
the krb5_rc_dfl_expunge() does is close the cache and recover it ...
and that recover calls krb5_rc_dfl_expunge() _itself_, and this confuses
things to the point where the temporary file used by the first call
to expunge is left around.
a single argument (such as "test"), and watch the temporary files get
created.
#include <stdio.h>
#include <krb5.h>
#include <com_err.h>
int
main(int argc, char *argv[])
{
krb5_context context;
krb5_error_code code;
krb5_rcache rc;
krb5_deltat d = 5*60;
char buf[500];
krb5_donot_replay rep;
rep.server = "host/foo.bar@BAR";
rep.client = "foo@BAR";
rep.cusec = 0;
rep.ctime = 99999;
krb5_init_context(&context);
rc = (krb5_rcache) malloc(sizeof(*rc));
krb5_rc_resolve_type(context, &rc, krb5_rc_default_type(context));
if ((code = krb5_rc_resolve(context, rc, argv[1])) != 0) {
com_err("rc_resolve", code, "");
exit(1);
}
if ((code = krb5_rc_initialize(context, rc, d)) != 0) {
com_err("rc_initialize", code, "");
exit(1);
}
while ((code = krb5_rc_store(context, rc, &rep)) == 0)
rep.cusec++;
com_err("rc_store", code, "");
exit(0);
}
This seems to solve the problem. Basically, an extra structure field is
introduced (similar to "recovering"), which prevents the second call to
krb5_rc_dfl_expunge().
cvs diff: Diffing .
Index: rc_dfl.c
===================================================================
RCS file: /afs/cmf/project/cvsroot/krb5/lib/krb5/rcache/rc_dfl.c,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 rc_dfl.c
--- rc_dfl.c 1997/06/02 21:57:35 1.1.1.1
+++ rc_dfl.c 1998/03/19 00:22:30
@@ -115,6 +115,7 @@
krb5_rc_iostuff d;
#endif
char recovering;
+ char expunging;
}
;
@@ -287,6 +288,7 @@
t->d.fd = -1;
#endif
t->recovering = 0;
+ t->expunging = 0;
return 0;
cleanup:
@@ -459,7 +461,7 @@
krb5_rc_free_entry(context, &rep);
if (retval)
krb5_rc_io_close(context, &t->d);
- else if (expired_entries > EXCESSREPS)
+ else if (! t->expunging && expired_entries > EXCESSREPS)
retval = krb5_rc_dfl_expunge(context, id);
t->recovering = 0;
return retval;
@@ -578,10 +580,12 @@
free(name);
if (retval)
return retval;
+ ((struct dfl_data *)id->data)->expunging = 1;
retval = krb5_rc_dfl_recover(context, id);
if (retval)
return retval;
t = (struct dfl_data *)id->data; /* point to recovered cache */
+ t->expunging = 0;
}
tmp = (krb5_rcache) malloc(sizeof(*tmp));
State-Changed-From-To: open-closed
State-Changed-By: tlyu
State-Changed-When: Fri Sep 14 19:25:30 2001
State-Changed-Why:
Dealt with some time ago; we've also fixed up the replay cache code a
little since then. Of course it's still an unholy mess, but we'll get
to a real fix at some point.
Received: from MIT.EDU (PACIFIC-CARRIER-ANNEX.MIT.EDU [18.69.0.28]) by rt-11.MIT.EDU (8.7.5/8.7.3) with SMTP id UAA24791 for <bugs@RT-11.MIT.EDU>; Wed, 18 Mar 1998 20:00:00 -0500
Received: from ginger.cmf.nrl.navy.mil by MIT.EDU with SMTP
id AA20821; Wed, 18 Mar 98 20:00:32 EST
Received: from elvis.cmf.nrl.navy.mil (kenh@elvis.cmf.nrl.navy.mil [134.207.10.38])
by ginger.cmf.nrl.navy.mil (8.8.5/8.8.5) with ESMTP id TAA28453
for <krb5-bugs@mit.edu>; Wed, 18 Mar 1998 19:59:58 -0500 (EST)
Received: (from kenh@localhost)
by elvis.cmf.nrl.navy.mil (8.8.5/8.8.5) id TAA05879;
Wed, 18 Mar 1998 19:59:55 -0500 (EST)
Message-Id: <199803190059.TAA05879@elvis.cmf.nrl.navy.mil>
Date: Wed, 18 Mar 1998 19:59:55 -0500 (EST)
From: Ken Hornstein <kenh@cmf.nrl.navy.mil>
Reply-To: kenh@cmf.nrl.navy.mil
To: krb5-bugs@MIT.EDU
Subject: Subtle bug in replay cache code
X-Send-Pr-Version: 3.99
Show quoted text
>Number: 564
>Category: krb5-libs
>Synopsis: replay cache code can call expunge twice, which hoses things up
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: krb5-unassigned
>State: closed
>Class: sw-bug
>Submitter-Id: unknown
>Arrival-Date: Wed Mar 18 20:01:01 EST 1998
>Last-Modified: Fri Sep 14 19:26:17 EDT 2001
>Originator: Ken Hornstein
>Organization:
Navel Research Laboratory>Category: krb5-libs
>Synopsis: replay cache code can call expunge twice, which hoses things up
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: krb5-unassigned
>State: closed
>Class: sw-bug
>Submitter-Id: unknown
>Arrival-Date: Wed Mar 18 20:01:01 EST 1998
>Last-Modified: Fri Sep 14 19:26:17 EDT 2001
>Originator: Ken Hornstein
>Organization:
Show quoted text
>Release: 1.0pl5
>Environment:
>Environment:
System: SunOS elvis 4.1.4 6 sun4c
Architecture: sun4
Show quoted text
>Description:
There is a problem with the replay cache code that only happens to showup on long-running daemons (such as the KDC).
If you check the directory that holds the replay cache for your KDC, you
will probably notice a WHOLE bunch of files named something like:
krb5_RC84act
krb5_RC84acu
krb5_RC84acv
krb5_RC84acw
krb5_RC84acx
krb5_RC84acy
krb5_RC84acz
krb5_RC84ada
And over time, these will eventually fill up the partition holding your
replay cache, and then bad things start to happen.
The bug here is fairly subtle, and was introduced by one of the PR's
that I submitted a while ago (but I can't take all of the blame; my
PR was based on a patch that jik sent to me :-) ). After a while
replay caches would grow really large, and daemons that only did
authentication once (such as telnetd, rlogind, etc etc) would literally
spend minutes trying to recover the cache. However, since recover
discards expired entries, the there never was a case where krb5_rc_store()
would cause the replay cache to be expunged.
The original patch counted the number of expired entries, and if they
grew past some threshold would cause the cache to be recovered. That's
a good thing. _However_, this causes a problem for expunges that
are triggered by krb5_rc_store(). In this case, the first thing that
the krb5_rc_dfl_expunge() does is close the cache and recover it ...
and that recover calls krb5_rc_dfl_expunge() _itself_, and this confuses
things to the point where the temporary file used by the first call
to expunge is left around.
Show quoted text
>How-To-Repeat:
Here's sample code that illustrates this problem. Run this program witha single argument (such as "test"), and watch the temporary files get
created.
#include <stdio.h>
#include <krb5.h>
#include <com_err.h>
int
main(int argc, char *argv[])
{
krb5_context context;
krb5_error_code code;
krb5_rcache rc;
krb5_deltat d = 5*60;
char buf[500];
krb5_donot_replay rep;
rep.server = "host/foo.bar@BAR";
rep.client = "foo@BAR";
rep.cusec = 0;
rep.ctime = 99999;
krb5_init_context(&context);
rc = (krb5_rcache) malloc(sizeof(*rc));
krb5_rc_resolve_type(context, &rc, krb5_rc_default_type(context));
if ((code = krb5_rc_resolve(context, rc, argv[1])) != 0) {
com_err("rc_resolve", code, "");
exit(1);
}
if ((code = krb5_rc_initialize(context, rc, d)) != 0) {
com_err("rc_initialize", code, "");
exit(1);
}
while ((code = krb5_rc_store(context, rc, &rep)) == 0)
rep.cusec++;
com_err("rc_store", code, "");
exit(0);
}
Show quoted text
>Fix:
This seems to solve the problem. Basically, an extra structure field is
introduced (similar to "recovering"), which prevents the second call to
krb5_rc_dfl_expunge().
cvs diff: Diffing .
Index: rc_dfl.c
===================================================================
RCS file: /afs/cmf/project/cvsroot/krb5/lib/krb5/rcache/rc_dfl.c,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 rc_dfl.c
--- rc_dfl.c 1997/06/02 21:57:35 1.1.1.1
+++ rc_dfl.c 1998/03/19 00:22:30
@@ -115,6 +115,7 @@
krb5_rc_iostuff d;
#endif
char recovering;
+ char expunging;
}
;
@@ -287,6 +288,7 @@
t->d.fd = -1;
#endif
t->recovering = 0;
+ t->expunging = 0;
return 0;
cleanup:
@@ -459,7 +461,7 @@
krb5_rc_free_entry(context, &rep);
if (retval)
krb5_rc_io_close(context, &t->d);
- else if (expired_entries > EXCESSREPS)
+ else if (! t->expunging && expired_entries > EXCESSREPS)
retval = krb5_rc_dfl_expunge(context, id);
t->recovering = 0;
return retval;
@@ -578,10 +580,12 @@
free(name);
if (retval)
return retval;
+ ((struct dfl_data *)id->data)->expunging = 1;
retval = krb5_rc_dfl_recover(context, id);
if (retval)
return retval;
t = (struct dfl_data *)id->data; /* point to recovered cache */
+ t->expunging = 0;
}
tmp = (krb5_rcache) malloc(sizeof(*tmp));
Show quoted text
>Audit-Trail:
State-Changed-From-To: open-closed
State-Changed-By: tlyu
State-Changed-When: Fri Sep 14 19:25:30 2001
State-Changed-Why:
Dealt with some time ago; we've also fixed up the replay cache code a
little since then. Of course it's still an unholy mess, but we'll get
to a real fix at some point.
Show quoted text
>Unformatted: