Skip Menu |
 

Date: Sun, 18 Aug 2013 07:39:01 -0400
From: Richard Basch <basch@alum.mit.edu>
Subject: krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog being updated
To: krb5-bugs@mit.edu

In my test environment, even with krb5-1.11.3, I noticed a database reload (full resync) may still fail and result in the ulog being updated with the new serial number, resulting in an inconsistent environment.

 

I have another patch available which seems to fix the condition. Specifically, I have seen the condition occur with an accompanying log message:

…/kdb5_util returned a bad exit status (2)

 

krb5-1.11

https://github.com/rbasch/krb5/commit/83c34de8a740711961d05fde150cc8b16e68f9e1

 

krb5-1.10

https://github.com/rbasch/krb5/commit/638b2e299157b1c2e637cb992bc07cf9f5598594

 

Date: Sun, 18 Aug 2013 07:40:12 -0400
From: Richard Basch <basch@alum.mit.edu>
Subject: FW: krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog being updated
To: krb5-bugs@mit.edu, krbdev@mit.edu
CC: richard.basch@gs.com

In a test environment, even with krb5-1.11.3, I noticed a database reload (full resync) may still fail and result in the ulog being updated with the new serial number, resulting in an inconsistent environment.

 

I have another patch available which seems to fix the condition. Specifically, I have seen the condition occur with an accompanying log message:

…/kdb5_util returned a bad exit status (2)

 

krb5-1.11

https://github.com/rbasch/krb5/commit/83c34de8a740711961d05fde150cc8b16e68f9e1

 

krb5-1.10

https://github.com/rbasch/krb5/commit/638b2e299157b1c2e637cb992bc07cf9f5598594

 

Sending to krb5-bugs is sufficient; you do not need to forward to krb5-
bugs and krbdev.

I do not really understand how kdb5_util could exit with status 2
(implying that it got all the way through load_db(), since nothing in
the code calls exit(2)) but not reset the ulog header with the code in
your earlier patch from issue #7588. Also, I would really love to know
in what way kdb5_util load is sometimes failing, since that should be
rare. Unfortunately, kdb5_util was written as a command-line tool and
not a component of a server, so when it fails, it just writes to stderr
instead of logging.

Rather than add a redundant second piece of code to reinitialize the
ulog header on failure, I wonder if we should make load_db() leave the
ulog header alone until after it has called krb5_db_promote(). I don't
see any point in having a window where the ulog header doesn't reflect
the status of the live database.
Date: Thu, 22 Aug 2013 18:01:11 -0400
Subject: RE: [krbdev.mit.edu #7695] krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog being updated
From: Richard Basch <basch@alum.mit.edu>
To: rt-comment@krbdev.mit.edu
RT-Send-Cc:
Download (untitled) / with headers
text/plain 1.4KiB
I have yet to find the kdb5_util bug too... it happens semi-regularly except when I try to reproduce it. And it turns out there is a third problem... I will send another patch shortly which should be safer and which supersedes both patches though the original can stay...


Richard Basch

(Sent via my Samsung GALAXY S™4)



-------- Original message --------
From: Greg Hudson via RT <rt-comment@krbdev.mit.edu>
Date: 08/21/2013 16:12 (GMT-05:00)
To: basch@alum.mit.edu
Subject: [krbdev.mit.edu #7695] krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog being updated


Sending to krb5-bugs is sufficient; you do not need to forward to krb5-
bugs and krbdev.

I do not really understand how kdb5_util could exit with status 2
(implying that it got all the way through load_db(), since nothing in
the code calls exit(2)) but not reset the ulog header with the code in
your earlier patch from issue #7588.  Also, I would really love to know
in what way kdb5_util load is sometimes failing, since that should be
rare.  Unfortunately, kdb5_util was written as a command-line tool and
not a component of a server, so when it fails, it just writes to stderr
instead of logging.

Rather than add a redundant second piece of code to reinitialize the
ulog header on failure, I wonder if we should make load_db() leave the
ulog header alone until after it has called krb5_db_promote().  I don't
see any point in having a window where the ulog header doesn't reflect
the status of the live database.
Date: Sat, 24 Aug 2013 17:11:10 -0400
From: Richard Basch <basch@alum.MIT.EDU>
Subject: RE: [krbdev.mit.edu #7695] AutoReply: krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog being updated
To: krbdev@mit.edu
CC: richard.basch@gs.com, "'Richard Basch'" <basch@alum.MIT.EDU>
RT-Send-Cc:
Download (untitled) / with headers
text/plain 2.3KiB
Instead of the supplied patch, refer to the following patches instead:

1.11:
https://github.com/rbasch/krb5/commit/affc746f296869d25c49ee2eabc843c60470ac
df

1.10:
https://github.com/rbasch/krb5/commit/fc7aabea9bf0abb0712a8509c38d4382474361
c3

I am relatively confident in the above since they move the ulog update to
after the db promotion, ensuring everything is ok first. This should avoid
all the issues which previously existed.

There is a remote chance something might prevent the ulog update from taking
place but the db might be updated. I am not quite sure what the right answer
is in this case, but certainly the other way round of updating the ulog
before the database matches is wrong. There are pros and cons to resetting
the ulog prior to the db load, but in either scenario, the final state
should not be set until after the db is loaded & promoted.


Show quoted text
-----Original Message-----
From: krb5 [mailto:rt@krbdev.mit.edu]
Sent: Sunday, August 18, 2013 6:26 PM
To: basch@alum.mit.edu
Subject: [krbdev.mit.edu #7695] AutoReply: krb5-1.11.3/1.10.6 - full resync
may fail and still result in ulog being updated


Greetings,

This message has been automatically generated in response to the
creation of a trouble ticket regarding:
"krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog
being updated",
a summary of which appears below.

There is no need to reply to this message right now. Your ticket has been
assigned an ID of [krbdev.mit.edu #7695].

Please include the string:

[krbdev.mit.edu #7695]

in the subject line of all future correspondence about this issue. To do so,

you may reply to this message.

Thank you,


-------------------------------------------------------------------------
In my test environment, even with krb5-1.11.3, I noticed a database reload
(full resync) may still fail and result in the ulog being updated with the
new serial number, resulting in an inconsistent environment.



I have another patch available which seems to fix the condition.
Specifically, I have seen the condition occur with an accompanying log
message:

./kdb5_util returned a bad exit status (2)



krb5-1.11

https://github.com/rbasch/krb5/commit/83c34de8a740711961d05fde150cc8b16e68f9
e1



krb5-1.10

https://github.com/rbasch/krb5/commit/638b2e299157b1c2e637cb992bc07cf9f55985
94


From: ghudson@mit.edu
Subject: git commit

Update ulog state after promoting DB when loading

If we are doing a full load, do not touch the ulog header until after
we promote the temporary DB to live. This avoids the same bugs as the
#7588 fix, but more robustly. Based on a patch from Richard Basch.

https://github.com/krb5/krb5/commit/825fa2be6f119677a09acccb109ab976cfc601f8
Author: Greg Hudson <ghudson@mit.edu>
Commit: 825fa2be6f119677a09acccb109ab976cfc601f8
Branch: master
src/kadmin/dbutil/dump.c | 51 +++++++++++++++++++---------------------------
1 files changed, 21 insertions(+), 30 deletions(-)
Date: Mon, 26 Aug 2013 23:16:12 -0400
From: Richard Basch <basch@alum.MIT.EDU>
Subject: RE: [krbdev.mit.edu #7695] AutoReply: krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog being updated
To: "'Richard Basch'" <basch@alum.MIT.EDU>, krbdev@mit.edu
CC: richard.basch@gs.com
RT-Send-Cc:
Download (untitled) / with headers
text/plain 3.7KiB
I believe these are final... They compile and I believe they will fix the
issues (I will be testing for a couple days, but "no news is good news").

1.10:
https://github.com/rbasch/krb5/commit/fc7aabea9bf0abb0712a8509c38d4382474361
c3

1.11:
https://github.com/rbasch/krb5/commit/f6237998bf7b20ea898d8b1ac2b30255caad89
d8
https://github.com/rbasch/krb5/commit/affc746f296869d25c49ee2eabc843c60470ac
df


Show quoted text
-----Original Message-----
From: Richard Basch [mailto:basch@alum.mit.edu]
Sent: Monday, August 26, 2013 10:12 PM
To: 'Richard Basch'; 'krbdev@mit.edu'
Cc: 'richard.basch@gs.com'
Subject: RE: [krbdev.mit.edu #7695] AutoReply: krb5-1.11.3/1.10.6 - full
resync may fail and still result in ulog being updated

There is a second bug... I don't believe the patch I provided is wrong, but
I believe to be effective in my test environment, a second fix is required,
specifically to the dump side of kdb5_util, where it might not properly
determine whether to create a new dump file (the quick serial number check
is flawed).

Stay tuned for another git reference (though this one should only apply to
1.11 since 1.10 doesn't use the same conditional dump optimization logic).


-----Original Message-----
From: Richard Basch [mailto:basch@alum.mit.edu]
Sent: Saturday, August 24, 2013 5:11 PM
To: krbdev@mit.edu
Cc: richard.basch@gs.com; 'Richard Basch'
Subject: RE: [krbdev.mit.edu #7695] AutoReply: krb5-1.11.3/1.10.6 - full
resync may fail and still result in ulog being updated

Instead of the supplied patch, refer to the following patches instead:

1.11:
https://github.com/rbash/krb5/commit/affc746f296869d25c49ee2eabc843c60470ac
df

1.10:
https://github.com/rbasch/krb5/commit/fc7aabea9bf0abb0712a8509c38d4382474361
c3

I am relatively confident in the above since they move the ulog update to
after the db promotion, ensuring everything is ok first. This should avoid
all the issues which previously existed.

There is a remote chance something might prevent the ulog update from taking
place but the db might be updated. I am not quite sure what the right answer
is in this case, but certainly the other way round of updating the ulog
before the database matches is wrong. There are pros and cons to resetting
the ulog prior to the db load, but in either scenario, the final state
should not be set until after the db is loaded & promoted.


-----Original Message-----
From: krb5 [mailto:rt@krbdev.mit.edu]
Sent: Sunday, August 18, 2013 6:26 PM
To: basch@alum.mit.edu
Subject: [krbdev.mit.edu #7695] AutoReply: krb5-1.11.3/1.10.6 - full resync
may fail and still result in ulog being updated


Greetings,

This message has been automatically generated in response to the
creation of a trouble ticket regarding:
"krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog
being updated",
a summary of which appears below.

There is no need to reply to this message right now. Your ticket has been
assigned an ID of [krbdev.mit.edu #7695].

Please include the string:

[krbdev.mit.edu #7695]

in the subject line of all future correspondence about this issue. To do so,

you may reply to this message.

Thank you,


-------------------------------------------------------------------------
In my test environment, even with krb5-1.11.3, I noticed a database reload
(full resync) may still fail and result in the ulog being updated with the
new serial number, resulting in an inconsistent environment.



I have another patch available which seems to fix the condition.
Specifically, I have seen the condition occur with an accompanying log
message:

./kdb5_util returned a bad exit status (2)



krb5-1.11

https://github.com/rbasch/krb5/commit/83c34de8a740711961d05fde150cc8b16e68f9
e1



krb5-1.10

https://github.com/rbasch/krb5/commit/638b2e299157b1c2e637cb992bc07cf9f55985
94


Date: Mon, 26 Aug 2013 22:12:10 -0400
From: Richard Basch <basch@alum.MIT.EDU>
Subject: RE: [krbdev.mit.edu #7695] AutoReply: krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog being updated
To: "'Richard Basch'" <basch@alum.MIT.EDU>, krbdev@mit.edu
CC: richard.basch@gs.com
RT-Send-Cc:
There is a second bug... I don't believe the patch I provided is wrong, but
I believe to be effective in my test environment, a second fix is required,
specifically to the dump side of kdb5_util, where it might not properly
determine whether to create a new dump file (the quick serial number check
is flawed).

Stay tuned for another git reference (though this one should only apply to
1.11 since 1.10 doesn't use the same conditional dump optimization logic).


Show quoted text
-----Original Message-----
From: Richard Basch [mailto:basch@alum.mit.edu]
Sent: Saturday, August 24, 2013 5:11 PM
To: krbdev@mit.edu
Cc: richard.basch@gs.com; 'Richard Basch'
Subject: RE: [krbdev.mit.edu #7695] AutoReply: krb5-1.11.3/1.10.6 - full
resync may fail and still result in ulog being updated

Instead of the supplied patch, refer to the following patches instead:

1.11:
https://github.com/rbasch/krb5/commit/affc746f296869d25c49ee2eabc843c60470ac
df

1.10:
https://github.com/rbasch/krb5/commit/fc7aabea9bf0abb0712a8509c38d4382474361
c3

I am relatively confident in the above since they move the ulog update to
after the db promotion, ensuring everything is ok first. This should avoid
all the issues which previously existed.

There is a remote chance something might prevent the ulog update from taking
place but the db might be updated. I am not quite sure what the right answer
is in this case, but certainly the other way round of updating the ulog
before the database matches is wrong. There are pros and cons to resetting
the ulog prior to the db load, but in either scenario, the final state
should not be set until after the db is loaded & promoted.


-----Original Message-----
From: krb5 [mailto:rt@krbdev.mit.edu]
Sent: Sunday, August 18, 2013 6:26 PM
To: basch@alum.mit.edu
Subject: [krbdev.mit.edu #7695] AutoReply: krb5-1.11.3/1.10.6 - full resync
may fail and still result in ulog being updated


Greetings,

This message has been automatically generated in response to the
creation of a trouble ticket regarding:
"krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog
being updated",
a summary of which appears below.

There is no need to reply to this message right now. Your ticket has been
assigned an ID of [krbdev.mit.edu #7695].

Please include the string:

[krbdev.mit.edu #7695]

in the subject line of all future correspondence about this issue. To do so,

you may reply to this message.

Thank you,


-------------------------------------------------------------------------
In my test environment, even with krb5-1.11.3, I noticed a database reload
(full resync) may still fail and result in the ulog being updated with the
new serial number, resulting in an inconsistent environment.



I have another patch available which seems to fix the condition.
Specifically, I have seen the condition occur with an accompanying log
message:

./kdb5_util returned a bad exit status (2)



krb5-1.11

https://github.com/rbasch/krb5/commit/83c34de8a740711961d05fde150cc8b16e68f9
e1



krb5-1.10

https://github.com/rbasch/krb5/commit/638b2e299157b1c2e637cb992bc07cf9f55985
94


From: "Basch, Richard" <Richard.Basch@gs.com>
To: "'Richard Basch'" <basch@alum.MIT.EDU>, "krbdev@mit.edu" <krbdev@mit.edu>
Date: Mon, 26 Aug 2013 23:38:58 -0400
Subject: RE: [krbdev.mit.edu #7695] AutoReply: krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog being updated
RT-Send-Cc:
Download (untitled) / with headers
text/plain 4.2KiB
There's yet another issue... it may be exacerbated by having a slow machine try to load a large database.
The dump file is now being sent correctly, but there is still at least one more problem in the load.

Show quoted text
-----Original Message-----
From: Richard Basch [mailto:basch@alum.mit.edu]
Sent: Monday, August 26, 2013 11:16 PM
To: 'Richard Basch'; krbdev@mit.edu
Cc: Basch, Richard [Tech]
Subject: RE: [krbdev.mit.edu #7695] AutoReply: krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog being updated

I believe these are final... They compile and I believe they will fix the issues (I will be testing for a couple days, but "no news is good news").

1.10:
https://github.com/rbasch/krb5/commit/fc7aabea9bf0abb0712a8509c38d4382474361
c3

1.11:
https://github.com/rbasch/krb5/commit/f6237998bf7b20ea898d8b1ac2b30255caad89
d8
https://github.com/rbasch/krb5/commit/affc746f296869d25c49ee2eabc843c60470ac
df


-----Original Message-----
From: Richard Basch [mailto:basch@alum.mit.edu]
Sent: Monday, August 26, 2013 10:12 PM
To: 'Richard Basch'; 'krbdev@mit.edu'
Cc: 'richard.basch@gs.com'
Subject: RE: [krbdev.mit.edu #7695] AutoReply: krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog being updated

There is a second bug... I don't believe the patch I provided is wrong, but I believe to be effective in my test environment, a second fix is required, specifically to the dump side of kdb5_util, where it might not properly determine whether to create a new dump file (the quick serial number check is flawed).

Stay tuned for another git reference (though this one should only apply to
1.11 since 1.10 doesn't use the same conditional dump optimization logic).


-----Original Message-----
From: Richard Basch [mailto:basch@alum.mit.edu]
Sent: Saturday, August 24, 2013 5:11 PM
To: krbdev@mit.edu
Cc: richard.basch@gs.com; 'Richard Basch'
Subject: RE: [krbdev.mit.edu #7695] AutoReply: krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog being updated

Instead of the supplied patch, refer to the following patches instead:

1.11:
https://github.com/rbash/krb5/commit/affc746f296869d25c49ee2eabc843c60470ac
df

1.10:
https://github.com/rbasch/krb5/commit/fc7aabea9bf0abb0712a8509c38d4382474361
c3

I am relatively confident in the above since they move the ulog update to after the db promotion, ensuring everything is ok first. This should avoid all the issues which previously existed.

There is a remote chance something might prevent the ulog update from taking place but the db might be updated. I am not quite sure what the right answer is in this case, but certainly the other way round of updating the ulog before the database matches is wrong. There are pros and cons to resetting the ulog prior to the db load, but in either scenario, the final state should not be set until after the db is loaded & promoted.


-----Original Message-----
From: krb5 [mailto:rt@krbdev.mit.edu]
Sent: Sunday, August 18, 2013 6:26 PM
To: basch@alum.mit.edu
Subject: [krbdev.mit.edu #7695] AutoReply: krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog being updated


Greetings,

This message has been automatically generated in response to the
creation of a trouble ticket regarding:
"krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog
being updated",
a summary of which appears below.

There is no need to reply to this message right now. Your ticket has been
assigned an ID of [krbdev.mit.edu #7695].

Please include the string:

[krbdev.mit.edu #7695]

in the subject line of all future correspondence about this issue. To do so,

you may reply to this message.

Thank you,


-------------------------------------------------------------------------
In my test environment, even with krb5-1.11.3, I noticed a database reload
(full resync) may still fail and result in the ulog being updated with the
new serial number, resulting in an inconsistent environment.



I have another patch available which seems to fix the condition.
Specifically, I have seen the condition occur with an accompanying log
message:

./kdb5_util returned a bad exit status (2)



krb5-1.11

https://github.com/rbasch/krb5/commit/83c34de8a740711961d05fde150cc8b16e68f9
e1



krb5-1.10

https://github.com/rbasch/krb5/commit/638b2e299157b1c2e637cb992bc07cf9f55985
94


Date: Wed, 28 Aug 2013 00:25:04 -0400
From: Richard Basch <basch@alum.MIT.EDU>
Subject: RE: [krbdev.mit.edu #7695] AutoReply: krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog being updated
To: "'Richard Basch'" <basch@alum.MIT.EDU>, krbdev@mit.edu
CC: richard.basch@gs.com
RT-Send-Cc:
Download (untitled) / with headers
text/plain 5.6KiB
One more fix for 1.11 related to full resync (and this set of fixes is
"final"; I have confirmed functionality in my test environment).

1.10:
https://github.com/rbasch/krb5/commit/fc7aabea9bf0abb0712a8509c38d4382474361
c3

1.11:
https://github.com/rbasch/krb5/commit/f6237998bf7b20ea898d8b1ac2b30255caad89
d8
https://github.com/rbasch/krb5/commit/affc746f296869d25c49ee2eabc843c60470ac
df
https://github.com/rbasch/krb5/commit/906d18fe56849ee59a114c31e5242a749166bc
f5

Basically, there was a significant performance hit with the 1.11 DB reloads,
which I finally tracked down to the ulog locking for each record being
inserted during the database during a DB reload (causing a timeout even with
a 30 minute timeout; the reload then completed in 2-3 minutes after the data
transfer after making the last change).

In essence, 1.11 required 3 patches... 1.10 only 1 patch.

The other problem with 1.11 was I found it would pick up improper/stale dump
files especially if the serial number had been reset to a lower value. So in
essence, my patch for 1.11 is in 3 parts:
- Reset the ulog to the new information only after the new database is
promoted.
- Perform a more thorough check whether a conditional dump is required (the
quick check was flawed)
- Only perform a ulog_lock in krb5_db_put_principal if it is on the master
KDC (whether you adopt this technique or simply do not lock if iproprole is
a specific value is debatable), but during a full db reload, you should not
perform a ulog_lock for each record being loaded.

This set seems to solve all the problems I have observed.


Show quoted text
-----Original Message-----
From: Richard Basch [mailto:basch@alum.mit.edu]
Sent: Monday, August 26, 2013 11:16 PM
To: 'Richard Basch'; 'krbdev@mit.edu'
Cc: 'richard.basch@gs.com'
Subject: RE: [krbdev.mit.edu #7695] AutoReply: krb5-1.11.3/1.10.6 - full
resync may fail and still result in ulog being updated

I believe these are final... They compile and I believe they will fix the
issues (I will be testing for a couple days, but "no news is good news").

1.10:
https://github.com/rbasch/krb5/commit/fc7aabea9bf0abb0712a8509c38d4382474361
c3

1.11:
https://github.com/rbasch/krb5/commit/f6237998bf7b20ea898d8b1ac2b30255caad89
d8
https://github.com/rbasch/krb5/commit/affc746f296869d25c49ee2eabc843c60470ac
df


-----Original Message-----
From: Richard Basch [mailto:basch@alum.mit.edu]
Sent: Monday, August 26, 2013 10:12 PM
To: 'Richard Basch'; 'krbdev@mit.edu'
Cc: 'richard.basch@gs.com'
Subject: RE: [krbdev.mit.edu #7695] AutoReply: krb5-1.11.3/1.10.6 - full
resync may fail and still result in ulog being updated

There is a second bug... I don't believe the patch I provided is wrong, but
I believe to be effective in my test environment, a second fix is required,
specifically to the dump side of kdb5_util, where it might not properly
determine whether to create a new dump file (the quick serial number check
is flawed).

Stay tuned for another git reference (though this one should only apply to
1.11 since 1.10 doesn't use the same conditional dump optimization logic).


-----Original Message-----
From: Richard Basch [mailto:basch@alum.mit.edu]
Sent: Saturday, August 24, 2013 5:11 PM
To: krbdev@mit.edu
Cc: richard.basch@gs.com; 'Richard Basch'
Subject: RE: [krbdev.mit.edu #7695] AutoReply: krb5-1.11.3/1.10.6 - full
resync may fail and still result in ulog being updated

Instead of the supplied patch, refer to the following patches instead:

1.11:
https://github.com/rbash/krb5/commit/affc746f296869d25c49ee2eabc843c60470ac
df

1.10:
https://github.com/rbasch/krb5/commit/fc7aabea9bf0abb0712a8509c38d4382474361
c3

I am relatively confident in the above since they move the ulog update to
after the db promotion, ensuring everything is ok first. This should avoid
all the issues which previously existed.

There is a remote chance something might prevent the ulog update from taking
place but the db might be updated. I am not quite sure what the right answer
is in this case, but certainly the other way round of updating the ulog
before the database matches is wrong. There are pros and cons to resetting
the ulog prior to the db load, but in either scenario, the final state
should not be set until after the db is loaded & promoted.


-----Original Message-----
From: krb5 [mailto:rt@krbdev.mit.edu]
Sent: Sunday, August 18, 2013 6:26 PM
To: basch@alum.mit.edu
Subject: [krbdev.mit.edu #7695] AutoReply: krb5-1.11.3/1.10.6 - full resync
may fail and still result in ulog being updated


Greetings,

This message has been automatically generated in response to the
creation of a trouble ticket regarding:
"krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog
being updated",
a summary of which appears below.

There is no need to reply to this message right now. Your ticket has been
assigned an ID of [krbdev.mit.edu #7695].

Please include the string:

[krbdev.mit.edu #7695]

in the subject line of all future correspondence about this issue. To do so,

you may reply to this message.

Thank you,


-------------------------------------------------------------------------
In my test environment, even with krb5-1.11.3, I noticed a database reload
(full resync) may still fail and result in the ulog being updated with the
new serial number, resulting in an inconsistent environment.



I have another patch available which seems to fix the condition.
Specifically, I have seen the condition occur with an accompanying log
message:

./kdb5_util returned a bad exit status (2)



krb5-1.11

https://github.com/rbasch/krb5/commit/83c34de8a740711961d05fde150cc8b16e68f9
e1



krb5-1.10

https://github.com/rbasch/krb5/commit/638b2e299157b1c2e637cb992bc07cf9f55985
94


Please send follow-ups by replying (without quoting) to a message you
receive from rt@krbdev.mit.edu, not by forwarding them to
krbdev@mit.edu. We've been scooping these follow-ups out of the krbdev
moderation queue and appending them to the ticket.

I believe the performance problem you noted was actually introduced by
your first patch. If you look at the code you moved in that patch,
you'll see that one of the things it does is set log_ctx->iproprole =
IPROP_NULL before restore_dump(). The other part of the third patch
should be unnecessary, because ulog_lock() does nothing if iproprole is
IPROP_NULL.

Your third patch also seems broken, in that setting log_ctx->iproprole =
IPROP_NULL before restore_dump() will prevent the ulog from being
updated afterwards. On master
(https://github.com/krb5/krb5/commit/825fa2be6f119677a09acccb109ab976cfc
601f8) I worked around this by setting iproprole to IPROP_SLAVE instead
of IPROP_NULL, but that only works because in master, kdb5.c only calls
ulog_lock() if iproprole is IPROP_MASTER.
Date: Wed, 28 Aug 2013 18:48:02 -0400
From: Richard Basch <basch@alum.mit.edu>
Subject: RE: [krbdev.mit.edu #7695] krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog being updated
To: rt-comment@krbdev.mit.edu
RT-Send-Cc:
Download (untitled) / with headers
text/plain 2.9KiB
My concern was within restore_dump()... it calls krb5_db_put_principal()
which in turn calls ulog_lock for each record (this was a major performance
hit especially with a database with about a half-million principals). Under
Kerberos 1.10, I found it only took 2-3 minutes to load the database (and
another 2-3 to send it over the wire), but I was seeing inordinately long
load times which I could trace back to the ulog_lock/unlock for every
record. I will have to review the master branch carefully, but if that code
also exists in the master branch, see how long it takes to do a dump/restore
for a database with about 500,000 principals (and compare it with the time
it took under Kerberos 1.10.x).

And I agree with you that I might have introduced the performance issue with
my first patch (when I moved the iproprole setting, which is why I set it to
IPROP_NULL as it was in the original state before I patched it).

I am confused how you say the third patch is broken, as by this point, the
ulog is mapped and I am setting the ulog at the end of the restore. During
the restore, you do not want to be touching the ulog at all (e.g.
IPROP_NULL) because the database being restored is "on the side". Once it is
promoted, one might argue the first thing should be to set it to
IPROP_SLAVE, then set the information, but that is not what was in the code
originally either.

The goal should be not to allow any ulog processing or locking during the
restore (which is to a temp db anyway), and then set the ulog state after
the restore.

I will play around with the master branch (what I supplied was based on the
1.11 branch and empirical testing, but the two branches have deviated enough
at this point).


Show quoted text
-----Original Message-----
From: Greg Hudson via RT [mailto:rt-comment@krbdev.mit.edu]
Sent: Wednesday, August 28, 2013 6:07 PM
To: basch@alum.mit.edu
Subject: [krbdev.mit.edu #7695] krb5-1.11.3/1.10.6 - full resync may fail
and still result in ulog being updated

Please send follow-ups by replying (without quoting) to a message you
receive from rt@krbdev.mit.edu, not by forwarding them to
krbdev@mit.edu. We've been scooping these follow-ups out of the krbdev
moderation queue and appending them to the ticket.

I believe the performance problem you noted was actually introduced by
your first patch. If you look at the code you moved in that patch,
you'll see that one of the things it does is set log_ctx->iproprole =
IPROP_NULL before restore_dump(). The other part of the third patch
should be unnecessary, because ulog_lock() does nothing if iproprole is
IPROP_NULL.

Your third patch also seems broken, in that setting log_ctx->iproprole =
IPROP_NULL before restore_dump() will prevent the ulog from being
updated afterwards. On master
(https://github.com/krb5/krb5/commit/825fa2be6f119677a09acccb109ab976cfc
601f8) I worked around this by setting iproprole to IPROP_SLAVE instead
of IPROP_NULL, but that only works because in master, kdb5.c only calls
ulog_lock() if iproprole is IPROP_MASTER.
Date: Wed, 28 Aug 2013 18:48:43 -0400
From: Richard Basch <basch@alum.mit.edu>
Subject: RE: [krbdev.mit.edu #7695] krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog being updated
To: 'Richard Basch' <basch@alum.mit.edu>, rt-comment@krbdev.mit.edu
RT-Send-Cc:
Download (untitled) / with headers
text/plain 1.6KiB
My concern was within restore_dump()... it calls krb5_db_put_principal()
which in turn calls ulog_lock for each record (this was a major performance
hit especially with a database with about a half-million principals). Under
Kerberos 1.10, I found it only took 2-3 minutes to load the database (and
another 2-3 to send it over the wire), but I was seeing inordinately long
load times which I could trace back to the ulog_lock/unlock for every
record. I will have to review the master branch carefully, but if that code
also exists in the master branch, see how long it takes to do a dump/restore
for a database with about 500,000 principals (and compare it with the time
it took under Kerberos 1.10.x).

And I agree with you that I might have introduced the performance issue with
my first patch (when I moved the iproprole setting, which is why I set it to
IPROP_NULL as it was in the original state before I patched it).

I am confused how you say the third patch is broken, as by this point, the
ulog is mapped and I am setting the ulog at the end of the restore. During
the restore, you do not want to be touching the ulog at all (e.g.
IPROP_NULL) because the database being restored is "on the side". Once it is
promoted, one might argue the first thing should be to set it to
IPROP_SLAVE, then set the information, but that is not what was in the code
originally either.

The goal should be not to allow any ulog processing or locking during the
restore (which is to a temp db anyway), and then set the ulog state after
the restore.

I will play around with the master branch (what I supplied was based on the
1.11 branch and empirical testing, but the two branches have deviated enough
at this point).
At line 2925, log_ctx->iproprole is set to IPROP_NULL. At line 2970, the
conditional won't fire because log_ctx->iproprole is false, and the ulog
won't be updated.

In 1.11, you do want iproprole to be IPROP_NULL during restore_dump() in
order to avoid locking the ulog (whereas on master, any iproprole other
than IPROP_MASTER will do). However, you'll need to change the
bookkeeping somehow so that the conditional at line 2970 still fires.
Date: Wed, 28 Aug 2013 19:54:47 -0400
From: Richard Basch <basch@alum.mit.edu>
Subject: RE: [krbdev.mit.edu #7695] krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog being updated
To: rt-comment@krbdev.mit.edu
RT-Send-Cc:
Agreed... I'll fix my 1.11 branch.

But also fix the conditional dump, which is seriously broken (this was one
of the 3 GIT commits I supplied). I had a QA environment where I restored a
dump, but the old from_master had a later serial number than what was
present in the new ulog. It doesn't do a proper sanity check and thought the
from_master was ok to use. I fixed the logic to actually search through the
ulog for the corresponding entry to determine if the dump header really does
match a ulog entry. I first perform a quick check if the latest dump matches
the latest ulog header (which dovetails with my hierarchical tree
propagation patches which I submitted earlier, but the check is agnostic as
to whether that patch is applied or not) -- this is the only optimization
which is "safe" and guaranteed. For instance, if a ulog is reinitialized,
the new ulog may not have any entries to match against, but the slaves
should not defer getting a database resync from the master until an update
is performed solely because of this condition (that was the theoretical
driver behind retaining the optimization). I did not confirm whether it
might potentially result in a second full resync after the first update, but
with or without conditional dumps, the issue already would have existed (so
I wasn't worried, and for this section of code, it seemed to be the logical
action and if the theoretical double-sync issue exists, the problem should
be fixed elsewhere). The only other optimization would possibly have been to
match against the first sno entry in the ulog header, since that might be
the only other condition where a dump is not required. The only reason I did
not do that initially is for busy sites, by the time you made the dump and
sent it, the other side may already need another one (it is a fringe case,
where accepting the first entry without a ulog history vs. not accepting it
both have potential consequences of requiring another full resync anyway; I
opted to simply let it do another dump if it wasn't in the actual ulog
history).
Date: Wed, 28 Aug 2013 23:42:52 -0400
From: Richard Basch <basch@alum.mit.edu>
Subject: RE: [krbdev.mit.edu #7695] krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog being updated
To: 'Richard Basch' <basch@alum.mit.edu>, rt-comment@krbdev.mit.edu, ghudson@mit.edu
CC: richard.basch@gs.com
RT-Send-Cc:
Download (untitled) / with headers
text/plain 1.1KiB
Here is my updated krb5-1.11 branch, with the patches in order:

https://github.com/rbasch/krb5/commit/affc746f296869d25c49ee2eabc843c60470ac
df
https://github.com/rbasch/krb5/commit/f6237998bf7b20ea898d8b1ac2b30255caad89
d8
https://github.com/rbasch/krb5/commit/906d18fe56849ee59a114c31e5242a749166bc
f5
https://github.com/rbasch/krb5/commit/9a788de948a73557defd3f520fba7983944f6e
f6

I basically save the iproplog state temporarily so I know I should update
the ulog later.

There are basically two fixes:
1. Update the ulog only after the new db is promoted
2. When using conditional dumps, make sure the dump header is present in the
ulog (the original "guessing" led to too many cases where the old dump was
assumed to be ok when it did not match against any ulog entries).

My patches above include a fix to lib/kdb to also suppress ulog locking when
not in master mode for krb5_db_put_principal, but it might not be required
based on the last patch (the main file which needs to be patched is
kadmin/dbutil/dump.c). If ulog_lock is a no-op when iproprole = IPROP_NULL,
then this can be omitted, otherwise, it is required; I set iproprole =
IPROP_NULL just prior to restore_dump().
In the absence of hierarchical iprop, I can't come up with a scenario
where current_dump_sno_in_ulog() erroneously returns true. Can you
outline a series of operations on the master which would cause it to do
so?
Date: Thu, 29 Aug 2013 09:35:24 -0400
Subject: RE: [krbdev.mit.edu #7695] krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog being updated
From: Richard Basch <basch@alum.mit.edu>
To: rt-comment@krbdev.mit.edu
RT-Send-Cc:
It might be related but I prefer defensive checks and at best it is currently a very spotty check which might be a problem if you run multiple realms on a box. I can think of a few conditions without hierarchical but most would be multiple processes and an inadvertent configuration error.


Richard Basch

(Sent via my Samsung GALAXY S™4)



-------- Original message --------
From: Greg Hudson via RT <rt-comment@krbdev.mit.edu>
Date: 08/29/2013 07:28 (GMT-05:00)
To: basch@alum.mit.edu
Subject: [krbdev.mit.edu #7695] krb5-1.11.3/1.10.6 - full resync may fail and still result in ulog being updated


In the absence of hierarchical iprop, I can't come up with a scenario
where current_dump_sno_in_ulog() erroneously returns true.  Can you
outline a series of operations on the master which would cause it to do
so?