Subject: iprop + lockout + large DB = KDC outage during full resync
When we dump the database for an iprop full resync, kdb5_util dump takes
out a database read lock for the length of the dump operation, to make
sure we are getting a consistent snapshot reflecting a particular serial
number. If the KDC needs to update the account lockout fields during
this time because of a preauth success or failure, it blocks until the
dump completes. With a large database, that wait can be long enough to
amount to a KDC outage.
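A minimal sketch of the contention described above, assuming a simple readers-writer lock as a stand-in for the database lock. `RWLock`, `full_resync_dump`, `kdc_lockout_update` and the principal name are hypothetical illustrations, not krb5 functions. The dump holds a read lock for its whole run, so the KDC's lockout write cannot proceed until the dump releases it.

```python
import threading
import time


class RWLock:
    """Hypothetical readers-writer lock: many readers or one writer at a time."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


def full_resync_dump(lock, started, log, seconds):
    lock.acquire_read()            # held for the entire dump, as in the ticket
    started.set()
    try:
        time.sleep(seconds)        # stand-in for dumping a large database
        log.append("dump finished")
    finally:
        lock.release_read()


def kdc_lockout_update(lock, started, log, principal):
    started.wait()                 # a preauth attempt arrives mid-dump
    t0 = time.monotonic()
    lock.acquire_write()           # blocks until the dump drops its read lock
    try:
        log.append(f"lockout updated for {principal}")
    finally:
        lock.release_write()
    return time.monotonic() - t0   # how long the KDC was stalled


def run_demo(dump_seconds=0.2):
    lock, started, log, waited = RWLock(), threading.Event(), [], []
    dump = threading.Thread(target=full_resync_dump,
                            args=(lock, started, log, dump_seconds))
    kdc = threading.Thread(target=lambda: waited.append(
        kdc_lockout_update(lock, started, log, "alice@EXAMPLE.COM")))
    dump.start(); kdc.start()
    dump.join(); kdc.join()
    return log, waited[0]


if __name__ == "__main__":
    log, waited = run_demo()
    print(log)
    print(f"KDC lockout update waited {waited:.2f}s")
```

The lockout update is always logged after the dump finishes, and the stall grows with the dump time, which is what turns a long dump of a large database into an outage.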
Even without account lockout, the dump delays password changes and
other administrative database updates for its whole duration, which can
be inconvenient with a large principal database.
We probably do not need the full resync snapshot to reflect exactly that
serial number, as long as the slave processes updates for later serial
numbers idempotently (which it does, except for deletes, which currently
fail if the entry has already been deleted). Producing a consistent
snapshot makes the iprop system easier to analyze, but is probably not
necessary.
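A sketch of why a non-exact snapshot can still converge, under stated assumptions: each update carries the full record (so re-applying it is harmless), the snapshot already includes every update through its labeled serial, and deletes are made tolerant of a missing entry. `Update`, `apply_update`, `replay` and the sample data are hypothetical; the real iprop update format may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Update:
    serial: int
    principal: str
    record: Optional[str]  # None means "delete this principal"


class MissingEntry(Exception):
    pass


def apply_update(db, upd, tolerate_missing_delete):
    if upd.record is None:
        if upd.principal not in db:
            if not tolerate_missing_delete:
                # Current behaviour described in the ticket: delete fails.
                raise MissingEntry(upd.principal)
            return  # already gone, so the delete is a no-op
        del db[upd.principal]
    else:
        # Add/modify overwrites the whole record, so replay is idempotent.
        db[upd.principal] = upd.record


def replay(snapshot, log, snapshot_serial, tolerate_missing_delete=True):
    db = dict(snapshot)
    for upd in log:
        if upd.serial > snapshot_serial:
            apply_update(db, upd, tolerate_missing_delete)
    return db


LOG = [
    Update(1, "alice", "v1"),
    Update(2, "bob", "v1"),
    Update(3, "alice", "v2"),
    Update(4, "bob", None),
]
# Dump labeled serial 2, taken without a lock: it read alice after
# serial 3 and read bob's slot after serial 4, so it is "too new".
SNAPSHOT = {"alice": "v2"}

if __name__ == "__main__":
    print(replay(SNAPSHOT, LOG, 2))  # slave state after catching up
    print(replay({}, LOG, 0))        # master state
```

With tolerant deletes, the slave ends up with the same contents as the master; with the current strict delete, replaying serial 4 raises because bob is already absent from the snapshot.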
Alternatively, we could try to store or lock the non-replicated account
lockout fields separately from the rest of the database. This is a more
difficult change (partly because we have to consider more than just
DB2), and wouldn't solve the problem of iprop full resyncs delaying
kadmind.
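A sketch of the alternative, assuming a hypothetical store that keeps the non-replicated lockout counters under their own lock. `SplitStore` and its methods are illustrative only, and each lock is modeled as a plain mutex for brevity. It shows both halves of the trade-off: the KDC's lockout update proceeds during a dump, but a kadmind-style modification of replicated data still has to wait.

```python
import threading
from contextlib import contextmanager


class SplitStore:
    """Hypothetical layout: replicated data and lockout state under separate locks."""

    def __init__(self):
        self.principals = {}   # replicated via iprop
        self.lockout = {}      # principal -> failed preauth count, not replicated
        self.principal_lock = threading.Lock()
        self.lockout_lock = threading.Lock()

    @contextmanager
    def dumping(self):
        # A full resync dump holds only the principal lock.
        with self.principal_lock:
            yield dict(self.principals)

    def record_preauth_failure(self, name, block=True):
        # Returns False if block=False and the update would have had to wait.
        if not self.lockout_lock.acquire(blocking=block):
            return False
        try:
            self.lockout[name] = self.lockout.get(name, 0) + 1
            return True
        finally:
            self.lockout_lock.release()

    def admin_modify(self, name, record, block=True):
        # kadmind-style change to replicated data needs the principal lock.
        if not self.principal_lock.acquire(blocking=block):
            return False
        try:
            self.principals[name] = record
            return True
        finally:
            self.principal_lock.release()


def demo():
    store = SplitStore()
    store.principals["alice"] = "v1"
    with store.dumping() as snapshot:
        kdc_ok = store.record_preauth_failure("alice", block=False)
        admin_ok = store.admin_modify("alice", "v2", block=False)
    return snapshot, kdc_ok, admin_ok, store


if __name__ == "__main__":
    snapshot, kdc_ok, admin_ok, store = demo()
    print("lockout update during dump:", kdc_ok)
    print("admin update during dump:", admin_ok)
```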
More in this email thread:
http://mailman.mit.edu/pipermail/kerberos/2014-April/019798.html