Content-Type: text/plain Content-Disposition: inline Content-Transfer-Encoding: binary MIME-Version: 1.0 X-Mailer: MIME-tools 5.411 (Entity 5.404) X-RT-Original-Encoding: iso-8859-1 Content-Length: 16097 From jik@cam.ov.com Wed Oct 23 13:45:40 1996 Received: from MIT.EDU (SOUTH-STATION-ANNEX.MIT.EDU [18.72.1.2]) by rt-11.MIT.EDU (8.7.5/8.7.3) with SMTP id NAA19461 for ; Wed, 23 Oct 1996 13:45:40 -0400 Received: from pad-thai.cam.ov.com by MIT.EDU with SMTP id AA05828; Wed, 23 Oct 96 13:40:33 EDT Received: from gza-client1.cam.ov.com by pad-thai.cam.ov.com (8.7.5/) with SMTP id ; Wed, 23 Oct 1996 17:40:24 GMT Received: by gza-client1.cam.ov.com (8.6.10/4.7) id NAA22248; Wed, 23 Oct 1996 13:40:23 -0400 Message-Id: <199610231740.NAA22248@gza-client1.cam.ov.com> Date: Wed, 23 Oct 1996 13:40:23 -0400 From: jik@cam.ov.com (Jonathan I. Kamens) To: kenh@cmf.nrl.navy.mil (Ken Hornstein) Cc: kerberos@MIT.EDU, krb5-bugs@MIT.EDU In-Reply-To: <199610231628.MAA01236@ginger.cmf.nrl.navy.mil> Subject: Re: Replay cache hell >Number: 132 >Category: krb5-libs >Synopsis: Re: Replay cache hell >Confidential: yes >Severity: serious >Priority: medium >Responsible: krb5-unassigned >State: closed >Class: sw-bug >Submitter-Id: unknown >Arrival-Date: Wed Oct e 13:46:00 EDT 1996 >Last-Modified: Mon Nov 18 22:30:40 EST 1996 >Originator: >Organization: >Release: >Environment: >Description: >How-To-Repeat: >Fix: >Audit-Trail: Responsible-Changed-From-To: gnats-admin->krb5-unassigned Responsible-Changed-By: bjaspan Responsible-Changed-When: Wed Nov 13 15:56:18 1996 Responsible-Changed-Why: categorizing From: "Barry Jaspan" To: jik@cam.ov.com Cc: kenh@cmf.nrl.navy.mil, kerberos@MIT.EDU, krb5-bugs@MIT.EDU Subject: Re: pending/132: Re: Replay cache hell Date: Wed, 13 Nov 1996 20:55:35 GMT From: Ken Hornstein To: krb5-bugs@MIT.EDU Subject: Fix for infinitely growing replay cache The replay cache code automatically prunes the replay cache if too many entries are in a hash bucket. This works great for long-living daemons like the KDC. However, for daemons that start up once, do one authentication, then exit, the replay cache will grow without bounds. Eventually noticable delays will incur as each invokation of the daemon requires it to read in the entire replay cache. Retrieve your mail via KPOP, and notice over a period of time that it keeps taking longer and longer ... This is a rework of the patch that Jonathan Kamens posted a little while back. His patch was for beta 4 (I think) -- this is done up for beta 7. I've been running with this for a couple of weeks now with no ill effects. I just wanted to resend this so anyone who needed a fix could put it in place now. --- lib/krb5/rcache/rc_dfl.c.orig Mon Nov 27 15:51:53 1995 +++ lib/krb5/rcache/rc_dfl.c Wed Oct 23 15:19:34 1996 @@ -54,7 +54,11 @@ of live krb5_donot_replays by EXCESSREPS. With the defaults here, a typical cache might build up some 10K of expired krb5_donot_replays before an automatic expunge, with the waste basically independent of the number of stores per -minute. */ +minute. + +The rcache will also automatically be expunged when it encounters more +than EXCESSREPS expired entries when recovering a cache in +dfl_recover. */ static int hash(rep, hsize) krb5_donot_replay *rep; @@ -110,6 +114,7 @@ #ifndef NOIOSTUFF krb5_rc_iostuff d; #endif + char recovering; } ; @@ -281,6 +286,7 @@ #ifndef NOIOSTUFF t->d.fd = -1; #endif + t->recovering = 0; return 0; cleanup: @@ -388,12 +394,15 @@ #else struct dfl_data *t = (struct dfl_data *)id->data; - krb5_donot_replay *rep; + krb5_donot_replay *rep = 0; krb5_error_code retval; long max_size; + int expired_entries = 0; if ((retval = krb5_rc_io_open(context, &t->d, t->name))) return retval; + + t->recovering = 1; max_size = krb5_rc_io_size(context, &t->d); @@ -429,6 +438,8 @@ if (rc_store(context, id, rep) == CMP_MALLOC) { retval = KRB5_RC_MALLOC; goto io_fail; } + } else { + expired_entries++; } /* * free fields allocated by rc_io_fetch @@ -448,6 +459,9 @@ krb5_rc_free_entry(context, &rep); if (retval) krb5_rc_io_close(context, &t->d); + else if (expired_entries > EXCESSREPS) + retval = krb5_rc_dfl_expunge(context, id); + t->recovering = 0; return retval; #endif @@ -461,7 +475,7 @@ { int clientlen, serverlen, len; char *buf, *ptr; - unsigned long ret; + krb5_error_code ret; clientlen = strlen (rep->client) + 1; serverlen = strlen (rep->server) + 1; @@ -488,7 +502,7 @@ krb5_rcache id; krb5_donot_replay *rep; { - unsigned long ret; + krb5_error_code ret; struct dfl_data *t = (struct dfl_data *)id->data; switch(rc_store(context, id,rep)) { @@ -556,17 +570,20 @@ krb5_rcache tmp; krb5_deltat lifespan = t->lifespan; /* save original lifespan */ - name = t->name; - t->name = 0; /* Clear name so it isn't freed */ - (void) krb5_rc_dfl_close_no_free(context, id); - retval = krb5_rc_dfl_resolve(context, id, name); - free(name); - if (retval) - return retval; - retval = krb5_rc_dfl_recover(context, id); - if (retval) - return retval; - t = (struct dfl_data *)id->data; /* point to recovered cache */ + if (! t->recovering) { + name = t->name; + t->name = 0; /* Clear name so it isn't freed */ + (void) krb5_rc_dfl_close_no_free(context, id); + retval = krb5_rc_dfl_resolve(context, id, name); + free(name); + if (retval) + return retval; + retval = krb5_rc_dfl_recover(context, id); + if (retval) + return retval; + t = (struct dfl_data *)id->data; /* point to recovered cache */ + } + tmp = (krb5_rcache) malloc(sizeof(*tmp)); if (!tmp) return ENOMEM; From: "Barry Jaspan" To: jik@cam.ov.com Cc: kenh@cmf.nrl.navy.mil, kerberos@MIT.EDU, krb5-bugs@MIT.EDU Subject: Re: krb5-libs/132: Re: Replay cache hell Date: Wed, 13 Nov 1996 21:04:42 GMT [ My previous attempt at appending this info to krb5-libs/132 does not seem to have worked. ] The replay cache code automatically prunes the replay cache if too many entries are in a hash bucket. This works great for long-living daemons like the KDC. However, for daemons that start up once, do one authentication, then exit, the replay cache will grow without bounds. Eventually noticable delays will incur as each invokation of the daemon requires it to read in the entire replay cache. Retrieve your mail via KPOP, and notice over a period of time that it keeps taking longer and longer ... This is a rework of the patch that Jonathan Kamens posted a little while back. His patch was for beta 4 (I think) -- this is done up for beta 7. I've been running with this for a couple of weeks now with no ill effects. I just wanted to resend this so anyone who needed a fix could put it in place now. --- lib/krb5/rcache/rc_dfl.c.orig Mon Nov 27 15:51:53 1995 +++ lib/krb5/rcache/rc_dfl.c Wed Oct 23 15:19:34 1996 @@ -54,7 +54,11 @@ of live krb5_donot_replays by EXCESSREPS. With the defaults here, a typical cache might build up some 10K of expired krb5_donot_replays before an automatic expunge, with the waste basically independent of the number of stores per -minute. */ +minute. + +The rcache will also automatically be expunged when it encounters more +than EXCESSREPS expired entries when recovering a cache in +dfl_recover. */ static int hash(rep, hsize) krb5_donot_replay *rep; @@ -110,6 +114,7 @@ #ifndef NOIOSTUFF krb5_rc_iostuff d; #endif + char recovering; } ; @@ -281,6 +286,7 @@ #ifndef NOIOSTUFF t->d.fd = -1; #endif + t->recovering = 0; return 0; cleanup: @@ -388,12 +394,15 @@ #else struct dfl_data *t = (struct dfl_data *)id->data; - krb5_donot_replay *rep; + krb5_donot_replay *rep = 0; krb5_error_code retval; long max_size; + int expired_entries = 0; if ((retval = krb5_rc_io_open(context, &t->d, t->name))) return retval; + + t->recovering = 1; max_size = krb5_rc_io_size(context, &t->d); @@ -429,6 +438,8 @@ if (rc_store(context, id, rep) == CMP_MALLOC) { retval = KRB5_RC_MALLOC; goto io_fail; } + } else { + expired_entries++; } /* * free fields allocated by rc_io_fetch @@ -448,6 +459,9 @@ krb5_rc_free_entry(context, &rep); if (retval) krb5_rc_io_close(context, &t->d); + else if (expired_entries > EXCESSREPS) + retval = krb5_rc_dfl_expunge(context, id); + t->recovering = 0; return retval; #endif @@ -461,7 +475,7 @@ { int clientlen, serverlen, len; char *buf, *ptr; - unsigned long ret; + krb5_error_code ret; clientlen = strlen (rep->client) + 1; serverlen = strlen (rep->server) + 1; @@ -488,7 +502,7 @@ krb5_rcache id; krb5_donot_replay *rep; { - unsigned long ret; + krb5_error_code ret; struct dfl_data *t = (struct dfl_data *)id->data; switch(rc_store(context, id,rep)) { @@ -556,17 +570,20 @@ krb5_rcache tmp; krb5_deltat lifespan = t->lifespan; /* save original lifespan */ - name = t->name; - t->name = 0; /* Clear name so it isn't freed */ - (void) krb5_rc_dfl_close_no_free(context, id); - retval = krb5_rc_dfl_resolve(context, id, name); - free(name); - if (retval) - return retval; - retval = krb5_rc_dfl_recover(context, id); - if (retval) - return retval; - t = (struct dfl_data *)id->data; /* point to recovered cache */ + if (! t->recovering) { + name = t->name; + t->name = 0; /* Clear name so it isn't freed */ + (void) krb5_rc_dfl_close_no_free(context, id); + retval = krb5_rc_dfl_resolve(context, id, name); + free(name); + if (retval) + return retval; + retval = krb5_rc_dfl_recover(context, id); + if (retval) + return retval; + t = (struct dfl_data *)id->data; /* point to recovered cache */ + } + tmp = (krb5_rcache) malloc(sizeof(*tmp)); if (!tmp) return ENOMEM; State-Changed-From-To: open-analyzed State-Changed-By: epeisach State-Changed-When: Fri Nov 15 20:31:07 1996 State-Changed-Why: The patch against beta 7 looks correct. (I stupidly started with the beta 4 version and worked out the identical beta 7 patch). I believe the code should work as advertized - but I would want to run it by purify or examine it some more to make sure there are no potential memory leaks with this code. This could be deadly for a long running server. State-Changed-From-To: analyzed-closed State-Changed-By: tytso State-Changed-When: Mon Nov 18 22:30:16 1996 State-Changed-Why: Patch from Ken H. applied >Unformatted: >figure out when there are too many expired entires in a replay cache >and expunge them. > > The only time that the replay cache code does an automatic >expunge is when it encounters more than 30 cache entries in a single >hash bucket. This doesn't happen until the replay cache on disk is >very, very large and there are many, many expired entries in it. The >result of this is that applications which use krb5_recvauth will start >out running at an acceptable speed, but will gradually get slower and >slower as they are used more, because their replay caches grow and >aren't expunged and therefore take a long time to recover each time the >application starts up. I observed this behavior with the new popper >which I installed on pad-thai. > > The fix for this is to also do an automatic expunge when >recovering the cache, if more than 30 expired entries are encountered >read in. Here's the patch (against krb5 beta 4; I don't know if it'll still apply cleanly to beta 7, but it should be pretty close) with the changes I made to fix the problem: ---cut here--- --- rc_dfl.c@@/main/1 Mon May 22 23:05:29 1995 +++ rc_dfl.c@@/main/2 Thu Apr 18 03:30:57 1996 @@ -54,8 +54,12 @@ of live krb5_donot_replays by EXCESSREPS. With the defaults here, a typical cache might build up some 10K of expired krb5_donot_replays before an automatic expunge, with the waste basically independent of the number of stores per -minute. */ +minute. +The rcache will also automatically be expunged when it encounters more +than EXCESSREPS expired entries when recovering a cache in +dfl_recover. */ + static int hash(rep, hsize) krb5_donot_replay *rep; int hsize; @@ -109,6 +113,7 @@ #ifndef NOIOSTUFF krb5_rc_iostuff d; #endif + char recovering; } ; @@ -270,6 +275,7 @@ #ifndef NOIOSTUFF t->d.fd = -1; #endif + t->recovering = 0; return 0; cleanup: @@ -374,13 +380,16 @@ #else struct dfl_data *t = (struct dfl_data *)id->data; - krb5_donot_replay *rep; + krb5_donot_replay *rep = 0; krb5_error_code retval; int max_size; + int expired_entries = 0; if (retval = krb5_rc_io_open(&t->d,t->name)) return retval; - + + t->recovering = 1; + max_size = krb5_rc_io_size(&t->d); rep = NULL; @@ -391,18 +400,17 @@ /* now read in each auth_replay and insert into table */ for (;;) { - rep = NULL; if (krb5_rc_io_mark(&t->d)) { retval = KRB5_RC_IO; goto io_fail; } - if (!(rep = (krb5_donot_replay *) malloc(sizeof(krb5_donot_replay)))) { + if (!(rep || + (rep = (krb5_donot_replay *) malloc(sizeof(krb5_donot_replay))))) { retval = KRB5_RC_MALLOC; goto io_fail; } - rep->client = NULL; - rep->server = NULL; + memset(rep, 0, sizeof(*rep)); retval = krb5_rc_io_fetch (t, rep, max_size); @@ -414,6 +422,7 @@ if (alive(rep,t->lifespan) == CMP_EXPIRED) { krb5_rc_free_entry(&rep); + expired_entries++; continue; } @@ -427,7 +436,6 @@ */ FREE(rep->server); FREE(rep->client); - rep = NULL; } retval = 0; krb5_rc_io_unmark(&t->d); @@ -439,6 +447,9 @@ krb5_rc_free_entry(&rep); if (retval) krb5_rc_io_close(&t->d); + else if (expired_entries > EXCESSREPS) + retval = krb5_rc_dfl_expunge(id); + t->recovering = 0; return retval; #endif @@ -450,7 +461,7 @@ { int clientlen, serverlen, len; char *buf, *ptr; - unsigned long ret; + krb5_error_code ret; clientlen = strlen (rep->client) + 1; serverlen = strlen (rep->server) + 1; @@ -476,7 +487,7 @@ krb5_rcache id; krb5_donot_replay *rep; { - unsigned long ret; + krb5_error_code ret; struct dfl_data *t = (struct dfl_data *)id->data; switch(store(id,rep)) { @@ -543,17 +554,20 @@ krb5_rcache tmp; krb5_deltat lifespan = t->lifespan; /* save original lifespan */ - name = t->name; - t->name = 0; /* Clear name so it isn't freed */ - (void) krb5_rc_dfl_close_no_free(id); - retval = krb5_rc_dfl_resolve(id, name); - free(name); - if (retval) + if (! t->recovering) { + name = t->name; + t->name = 0; /* Clear name so it isn't freed */ + (void) krb5_rc_dfl_close_no_free(id); + retval = krb5_rc_dfl_resolve(id, name); + free(name); + if (retval) return retval; - retval = krb5_rc_dfl_recover(id); - if (retval) + retval = krb5_rc_dfl_recover(id); + if (retval) return retval; - t = (struct dfl_data *)id->data; /* point to recovered cache */ + t = (struct dfl_data *)id->data; /* point to recovered cache */ + } + tmp = (krb5_rcache) malloc(sizeof(*tmp)); if (!tmp) return ENOMEM; @@ -574,6 +588,7 @@ return KRB5_RC_IO; if (krb5_rc_io_move(&t->d, &((struct dfl_data *)tmp->data)->d)) return KRB5_RC_IO; + krb5_rc_dfl_close(tmp); #endif return 0; } ---cut here--- It looks like we never submitted this patch to krb5-bugs@mit.edu, so I'm CC'ing this message there. -- Jonathan Kamens | OpenVision Technologies, Inc. | jik@cam.ov.com Here's the description of this bug in our (OV's) bug database: > krb5_recvauth relies on the replay cache code to automatically