From: John Devitofranceschi <jdvf@optonline.net>
Subject: Issues when rolling the master key online
Date: Sun, 30 Sep 2018 18:23:42 -0400
To: krb5-bugs@mit.edu
Following the instructions as given when using incremental propagation
(https://web.mit.edu/kerberos/krb5-1.12/doc/admin/database.html#updating-the-master-key),
it seems that you can end up with the master KDC in a bad way AND a dead kpropd on the slave.

Also, there's another case where things can go wrong in a different way when the update log rolls over.
The full resync request gets raised but doesn't get fulfilled until daemon processes get restarted. kpropd doesn't crash in that case, though.
There may also be a bad result when the kdb principal encryption incremental update is bundled with the mkey purge.
Let me know if you want the logs from those, too.

Master: Solaris 11 server running MIT 1.15
Slave: Fedora 28 server running MIT 1.16.1 (provided with the distro)

I also tried this with both hosts running 1.13.2 (Solaris 10), with both running 1.15.1 (RHEL 7), and later with both running 1.16.1 on the hosts described above, and got similar results.


Create new mkey, wait for slave update (kdb5_util add_mkey -s)
-------------------------------------------------------------------------------------
MASTER LOG
Sep 29 17:35:48 endless.foonon.com kadmind[18090](Notice): Request: iprop_get_updates_1, UPDATE_OK; Incoming SerialNo=10; Outgoing SerialNo=11, success, client=kiprop/topper28.foonon.com@FOONON.COM, service=kiprop/endless.foonon.com@FOONON.COM, addr=192.168.1.224
SLAVE LOG
Sep 29 17:35:48 topper28 kpropd[27555]: Incremental updates: 1 updates / 7756 us

Use new mkey & update princs: kdb5_util use_mkey 2 ; kdb5_util update_princ_encryption (1132 principals)
-----------------------------------------------------------------------------------------------------------------------------------------------

# kdb5_util -d ./principal -sf ./.k5.FOONON.COM list_mkeys
Master keys for Principal: K/M@FOONON.COM
KVNO: 2, Enctype: aes256-cts-hmac-sha1-96, Active on: Sat Sep 29 17:37:20 EDT 2018 *

MASTER LOG
Sep 29 17:37:48 endless.foonon.com kadmind[18090](Notice): Request: iprop_get_updates_1, UPDATE_ERROR; Incoming SerialNo=11; Outgoing SerialNo=N/A, Update log conversion error, client=kiprop/topper28.foonon.com@FOONON.COM, service=kiprop/endless.foonon.com@FOONON.COM, addr=192.168.1.224
Sep 29 17:37:48 endless.foonon.com kadmind[18090](info): closing down fd 21
SLAVE LOG
Sep 29 17:37:48 topper28 kpropd[27555]: get_updates, error returned from master KDC.
Sep 29 17:37:48 topper28 kpropd[27555]: ERROR returned by master KDC, bailing.
Sep 29 17:37:48 topper28 kpropd[27555]: /usr/sbin/kpropd: Operation not permitted do_iprop failed.

Purge old key: kdb5_util purge_mkeys
---------------------------------------------------
kadmin.local: getprinc K/M
Principal: K/M@FOONON.COM
...
Last modified: Sat Sep 29 17:37:59 EDT 2018 (K/M@FOONON.COM)
...
Number of keys: 1
Key: vno 2, aes256-cts-hmac-sha1-96
MKey: vno 2
...

KDC MASTER LOG
Sep 29 17:39:49 endless.foonon.com krb5kdc[18065](info): AS_REQ (8 etypes {18 17 16 23 20 19 25 26}) 192.168.1.200: DECRYPT_CLIENT_KEY: host/endless.foonon.com@FOONON.COM for krbtgt/FOONON.COM@FOONON.COM, Decrypt integrity check failed
Sep 29 17:43:08 endless.foonon.com krb5kdc[18065](info): AS_REQ (8 etypes {18 17 16 23 20 19 25 26}) 192.168.1.200: DECRYPT_CLIENT_KEY: host/endless.foonon.com@FOONON.COM for krbtgt/FOONON.COM@FOONON.COM, Decrypt integrity check failed


Restart kadmind
----------------------
KDC MASTER LOG
Sep 29 17:44:36 endless.foonon.com krb5kdc[18065](info): AS_REQ (8 etypes {18 17 16 23 20 19 25 26}) 192.168.1.200: DECRYPT_CLIENT_KEY: host/endless.foonon.com@FOONON.COM for krbtgt/FOONON.COM@FOONON.COM, Decrypt integrity check failed

Restart krb5kdc
---------------------
KDC MASTER LOG
Sep 29 17:45:00 endless.foonon.com krb5kdc[18166](info): AS_REQ (8 etypes {18 17 16 23 20 19 25 26}) 192.168.1.200: ISSUE: authtime 1538257500, etypes {rep=18 tkt=18 ses=18}, host/endless.foonon.com@FOONON.COM for krbtgt/FOONON.COM@FOONON.COM

Restart kpropd
--------------------
MASTER LOG
Sep 29 17:47:53 endless.foonon.com kadmind[18160](Notice): Request: iprop_get_updates_1, UPDATE_OK; Incoming SerialNo=11; Outgoing SerialNo=1145, success, client=kiprop/topper28.foonon.com@FOONON.COM, service=kiprop/endless.foonon.com@FOONON.COM, addr=192.168.1.224
SLAVE LOG
Sep 29 17:47:53 topper28 kpropd[27627]: Incremental updates: 1134 updates / 428790 us

Changes included in the incremental update: activating the new master key, the princ enc changes, purging the old K/M key

Normal operation resumes.

I believe I know what went wrong on the master KDC in this scenario.
krb5_dbe_decrypt_key_data() in libkdb5 contains a mechanism for a
running process to continue working across a master key change, which
is:

1. try to decrypt the key with our current master key list
2. (if that fails) try to reread the master key list using our newest
master key
3. try again to decrypt the key with the newly read master key list

Importantly, step 2 does not re-read the stash file; it relies on the
auxiliary data in the K/M principal entry which contains copies of
the newest master key encrypted in the older ones.

Therefore, if there is no KDC activity between
update_princ_encryption and purge_mkeys, step 2 will fail because the
KDC never got a chance to update its master key list before the K/M
auxiliary data was pruned. I can reproduce this symptom
("DECRYPT_CLIENT_KEY: ... Decrypt integrity check failed") by
changing t_mkey.py to do a purge_mkeys and kinit right after an
update_princ_encryption.
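
The three-step fallback and the way it breaks can be illustrated with a toy model. This is only a sketch of the logic described above, not the libkdb5 code: the class and function names (ToyDatabase, ToyDaemon, survives_rollover) are hypothetical, the "cipher" just records which key protects a value, and the purge step is simplified to "drop every master key the single principal no longer uses".

```python
class DecryptError(Exception):
    """Stands in for "Decrypt integrity check failed"."""


def encrypt(key, data):
    # Toy cipher: remember which key protects the data.
    return (key, data)


def decrypt(key, blob):
    protecting_key, data = blob
    if key != protecting_key:
        raise DecryptError("Decrypt integrity check failed")
    return data


class ToyDatabase:
    def __init__(self):
        self.km_keys = {1: "mkey-v1"}  # master keys held in K/M, by kvno
        self.km_aux = {}               # old kvno -> (new kvno, new key encrypted in old key)
        self.princ_mkvno = 1           # master kvno protecting the one principal
        self.princ_blob = encrypt("mkey-v1", "host-long-term-key")

    def add_mkey(self, kvno, key):
        # Keep a copy of the newest key encrypted in each older key.
        for old_kvno, old_key in self.km_keys.items():
            self.km_aux[old_kvno] = (kvno, encrypt(old_key, key))
        self.km_keys[kvno] = key

    def update_princ_encryption(self):
        newest = max(self.km_keys)
        plain = decrypt(self.km_keys[self.princ_mkvno], self.princ_blob)
        self.princ_mkvno = newest
        self.princ_blob = encrypt(self.km_keys[newest], plain)

    def purge_mkeys(self):
        # Simplified: drop every master key the principal no longer uses,
        # along with the auxiliary data encrypted in it.
        for kvno in [k for k in self.km_keys if k != self.princ_mkvno]:
            del self.km_keys[kvno]
            self.km_aux.pop(kvno, None)


class ToyDaemon:
    def __init__(self, db):
        self.db = db
        self.mkey_list = dict(db.km_keys)  # in-memory copy made at startup

    def _try_decrypt(self):
        key = self.mkey_list.get(self.db.princ_mkvno)
        if key is None:
            raise DecryptError("no master key with that kvno in memory")
        return decrypt(key, self.db.princ_blob)

    def decrypt_key_data(self):
        try:
            return self._try_decrypt()  # step 1
        except DecryptError:
            pass
        # Step 2: refresh the list from K/M auxiliary data using the newest
        # key we already hold. The stash file is never consulted.
        newest = max(self.mkey_list)
        aux = self.db.km_aux.get(newest)
        if aux is not None:
            new_kvno, blob = aux
            self.mkey_list[new_kvno] = decrypt(self.mkey_list[newest], blob)
        return self._try_decrypt()  # step 3


def survives_rollover(activity_before_purge):
    db = ToyDatabase()
    daemon = ToyDaemon(db)
    db.add_mkey(2, "mkey-v2")
    db.update_princ_encryption()
    if activity_before_purge:
        daemon.decrypt_key_data()  # daemon learns mkey v2 via step 2
    db.purge_mkeys()
    try:
        daemon.decrypt_key_data()
        return True
    except DecryptError:
        return False
```

In this model survives_rollover(True) succeeds because the daemon picks up the new key while the auxiliary data still exists, and survives_rollover(False) fails in the same way as the DECRYPT_CLIENT_KEY errors in the report.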

The simplest operational workaround is to wait a while before purging
the old master key, and to make sure that the master KDC and kadmind
see some activity during that window.
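
As a rough aid for the waiting period, one could check the kadmind log for a successful iprop_get_updates_1 request at or beyond the serial number of the last re-encryption update before running purge_mkeys. The sketch below assumes the log line format quoted earlier in this ticket; the function names are hypothetical, and it only covers propagation to the replica. It does not prove that krb5kdc or kadmind has refreshed its in-memory master key list.

```python
import re

# Pattern based only on the kadmind log lines quoted in this ticket.
IPROP_RE = re.compile(
    r"Request: iprop_get_updates_1, (?P<status>\w+); "
    r"Incoming SerialNo=(?P<incoming>\d+); "
    r"Outgoing SerialNo=(?P<outgoing>\d+|N/A), "
)


def parse_iprop_line(line):
    """Return status and serial numbers from one kadmind log line, or None."""
    m = IPROP_RE.search(line)
    if m is None:
        return None
    outgoing = m.group("outgoing")
    return {
        "status": m.group("status"),
        "incoming": int(m.group("incoming")),
        "outgoing": None if outgoing == "N/A" else int(outgoing),
    }


def replica_caught_up(lines, min_serial):
    """True if the most recent iprop request succeeded at or past min_serial."""
    last = None
    for line in lines:
        parsed = parse_iprop_line(line)
        if parsed is not None:
            last = parsed
    return (
        last is not None
        and last["status"] == "UPDATE_OK"
        and last["outgoing"] is not None
        and last["outgoing"] >= min_serial
    )
```

Fed the master log lines from the report, replica_caught_up(lines, 1145) would be false after the 17:37:48 UPDATE_ERROR entry and true only after the 17:47:53 UPDATE_OK entry with Outgoing SerialNo=1145.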

At this time I am not sure what the best fix is. We can document the
need for KDC/kadmind activity before purge_mkeys, but that's not
really satisfactory for a couple of reasons (it's hard to be really
sure that kadmind and the master KDC have had activity, and there is
no safety check). We could maybe make step 2 reread the stash file,
but aside from possible implementation difficulties, it's possible to
operate a KDC without a stash file. We could add a signal handler to
the KDC and kadmind which causes a reread of the master key list, but
that's not very elegant either.

I think you may be right that there is another potential issue if
update_princ_encryption and purge_mkeys are propagated to a replica
KDC too quickly, particularly if that happens via full dump, but I
haven't worked out details. Ideally I would like to untangle any
problems there from this issue and address it in a separate ticket.

You implied that you observed a kpropd crash when the master KDC
becomes non-functional. That would be a third potential bug. Can
you confirm that the process actually stopped running, and perhaps
produce a core file with a backtrace? (I can also try to reproduce
that failure myself.)
From: ghudson@mit.edu
Subject: git commit

Document necessary delay in master key rollover

During master key rollover, if the old master key is purged
immediately after updating principal encryption, running processes may
not successfully update their in-memory copies of the master key.
Document that the administrator should delay purging the master key
until after propagation and some daemon activity.

https://github.com/krb5/krb5/commit/24425b730161c3d27d86a7ae0caa2305f70167f6
Author: Greg Hudson <ghudson@mit.edu>
Commit: 24425b730161c3d27d86a7ae0caa2305f70167f6
Branch: master
doc/admin/database.rst | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
From: ghudson@mit.edu
Subject: git commit

Document necessary delay in master key rollover

During master key rollover, if the old master key is purged
immediately after updating principal encryption, running processes may
not successfully update their in-memory copies of the master key.
Document that the administrator should delay purging the master key
until after propagation and some daemon activity.

(cherry picked from commit 24425b730161c3d27d86a7ae0caa2305f70167f6)

https://github.com/krb5/krb5/commit/91f331c507f6d36906b8432485b9b639c31ebff2
Author: Greg Hudson <ghudson@mit.edu>
Commit: 91f331c507f6d36906b8432485b9b639c31ebff2
Branch: krb5-1.17
doc/admin/database.rst | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)