Date: Thu, 11 Nov 2021 18:56:33 +0100
Subject: Race condition in krb5_set_password()
To: krb5-bugs@mit.edu
From: "Sumit Bose" <sbose@redhat.com>
*Problem statement:*

A local program calls krb5_set_password() to change the machine account
password. The password change succeeds in AD, but due to a race
condition in krb5_set_password(), the function returns an error (status
code KRB5_KPASSWD_AUTHERROR).

Because the function returns an error, the local program does not update
the local store with the new password. Thus AD has the new password for the
machine account (with an incremented KVNO) while the local store has the old
password (with the original KVNO).

This occurs a small percentage of the time (when the race condition hits).

*Analysis:*

There is a race during password changes if a client retransmits the request
while the server is still working on the first request but was not able to
process the change before the client's hardcoded 1s timeout. If a client in an
Active Directory domain tries to automatically update its machine account
password and runs into this race condition, it will typically lose access to
the domain: the client receives an error and must assume that the password
change failed, while on the server (DC) side the password was updated. As a
result the client can no longer authenticate itself in the Active Directory
domain, and the machine account password must be reset manually.

https://krbdev.mit.edu/rt/Ticket/Display.html?id=7905 attempted to fix this by
changing the default from UDP to TCP. But there is still a fallback to UDP, and
there will even be retransmits via TCP:

[3504323] 1634978406.103204: Creating authenticator for HOST$@EXAMPLE.COM -> kadmin/changepw@EXAMPLE.COM, seqnum 0, subkey aes256-cts/3997, session key aes256-cts/D6BF
[3504323] 1634978406.103206: Resolving hostname my_ad_dc.example.com
[3504323] 1634978406.103207: Initiating TCP connection to stream 10.10.10.11:464
[3504323] 1634978407.618306: Sending initial UDP request to dgram 10.10.10.11:464
[3504323] 1634978407.618307: Sending TCP request to stream 10.10.10.11:464
[3504323] 1634978407.618308: Received answer (99 bytes) from stream 10.10.10.11:464
[3504323] 1634978407.618309: Terminating TCP connection to stream 10.10.10.11:464


This can also be seen in the network traffic (10.10.10.11 is the KDC and
10.10.20.22 is the client):

83 0.836616 10.10.20.22 → 10.10.10.11 TCP 76 36556 → 464 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2699120574 TSecr=0 WS=128
84 - unrelated -
85 - unrelated -
86 1.837870 10.10.20.22 → 10.10.10.11 IPv4 1516 Fragmented IP protocol (proto=UDP 17, off=0, ID=4ec8)
87 1.867158 10.10.20.22 → 10.10.10.11 TCP 76 [TCP Retransmission] 36556 → 464 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2699121605 TSecr=0 WS=128
88 1.868315 10.10.10.11 → 10.10.20.22 TCP 76 464 → 36556 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1 TSval=1786871194 TSecr=2699121605
89 1.868354 10.10.20.22 → 10.10.10.11 TCP 68 36556 → 464 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=2699121606 TSecr=1786871194
90 1.868484 10.10.20.22 → 10.10.10.11 TCP 1516 [TCP segment of a reassembled PDU]
91 1.868491 10.10.20.22 → 10.10.10.11 KPASSWD 183 Reply
92 1.869813 10.10.10.11 → 10.10.20.22 TCP 68 464 → 36556 [ACK] Seq=1 Ack=1564 Win=263424 Len=0 TSval=1786871196 TSecr=2699121606
93 1.870088 10.10.10.11 → 10.10.20.22 KPASSWD 171 KRB Error: KRB5KRB_AP_ERR_REPEAT
94 1.870098 10.10.20.22 → 10.10.10.11 TCP 68 36556 → 464 [ACK] Seq=1564 Ack=104 Win=29312 Len=0 TSval=2699121608 TSecr=1786871196
95 1.870190 10.10.20.22 → 10.10.10.11 TCP 68 36556 → 464 [FIN, ACK] Seq=1564 Ack=104 Win=29312 Len=0 TSval=2699121608 TSecr=1786871196
96 1.870945 10.10.10.11 → 10.10.20.22 TCP 68 464 → 36556 [ACK] Seq=104 Ack=1565 Win=263424 Len=0 TSval=1786871197 TSecr=2699121608
97 1.870999 10.10.10.11 → 10.10.20.22 TCP 62 464 → 36556 [RST, ACK] Seq=104 Ack=1565 Win=0 Len=0
98 1.910398 10.10.10.11 → 10.10.20.22 IP 217 Bogus IP version (0)

Packet 86 is the UDP request and packet 98 is the corresponding reply; I'm not
sure why Wireshark has trouble decoding them.

krb5_set_password() then returns an error to the caller, but not
KRB5KRB_AP_ERR_REPEAT; instead it reports KRB5_KPASSWD_AUTHERROR, because the
payload in packet 93:

Kerberos
    krb-error
        pvno: 5
        msg-type: krb-error (30)
        stime: 2021-10-23 08:40:07 (UTC)
        susec: 446539
        error-code: eRR-REPEAT (34)
        realm: EXAMPLE.COM
        sname
            name-type: kRB5-NT-SRV-INST (2)
            sname-string: 2 items
                SNameString: kadmin
                SNameString: changepw
        e-data: 0003

has "e-data: 0003", which get_error_edata() retrieves and returns to the
caller as the error code.

The client now has an unclear error code triggered by the replay and has to
assume the password change failed, while the server may still be processing
the initial request and changing the password.

The client could now try to get a TGT with the new key, but this might fail as
well because it is not clear how long the server needs to change the
password.

I can think of multiple ways to solve this:
- do not retry in libkrb5 at all
- ignore KRB5KRB_AP_ERR_REPEAT during krb5_set_password() and wait for other
replies from the server
- close the initial TCP connection if no data was sent before trying to send
the request via UDP
- longer or configurable timeouts

Please let me know if more details are needed.

I might be able to help fix the issue if you let me know the preferred way
to solve it.

bye,
Sumit
> But there is still a fallback to UDP and there will be even retransmits via TCP

Will there actually be retransmits via TCP?  I think our sendto_kdc code only tries TCP once per server, relying on the kernel to do retransmits within the TCP stack.

However, if there are multiple admin servers, it may have one TCP request going to each admin server in parallel, which could lead to a replay error.

> The client calling krb5_set_password() will return with an error, but not with
> KRB5KRB_AP_ERR_REPEAT but with KRB5_KPASSWD_AUTHERROR because [...] has "e-data: 0003" which is retrieved by get_error_edata() as error code
> returned to the caller.

What code is actually returned by krb5_set_password() in this scenario?  KRB5_KPASSWD_AUTHERROR isn't a com_err code, and should only be communicated to the caller via *result_code.

> I can think of multiple ways how to solve it:

I'll have to think on this.  Abandoning UDP for the kpasswd client is attractively simple.  But I don't know if we can safely assume that there are no deployments which admit only UDP kpasswd.  Also, we'd still ideally want to prevent sendto_kdc from trying TCP requests to multiple servers in parallel for kpasswd requests.  (There's an argument for applying this practice to all kinds of requests, but in rare cases a KDC will ignore a Kerberos request in order to make the client use a different KDC.  So we'd want to resume trying other KDCs if we see the TCP connection closed with no reply.  That starts to be complicated.)

Once we do fall back to UDP, we're in a bad situation, because there's no way to recover from a lost UDP reply (ignoring AP_ERR_REPEAT will just lead to a timeout).  This specific scenario can be mitigated by giving up on TCP once we fall back to UDP, but not every scenario.
After some thought, I think a reasonable strategy is to try TCP only, and after that completely fails, try UDP only.  This will have terrible performance if TCP/464 is blackholed, but it will at least work.

I can see two basic implementation directions: we could define a new k5_transport_strategy and handle this within k5_sendto(), or we could make two separate calls to k5_sendto().  The latter option still requires an adjustment to the internal k5_locate/k5_sendto APIs since we can't currently ask for UDP only.

This plan does not rule out potential TCP-only races involving multiple admin servers, but I think we can defer worrying about that until it becomes a real issue.
Subject: git commit
From: ghudson@mit.edu

Try harder to avoid password change replay errors

Commit d7b3018d338fc9c989c3fa17505870f23c3759a8 (ticket 7905) changed
change_set_password() to prefer TCP. However, because UDP_LAST falls
back to UDP after one second, we can still get a replay error due to a
dropped packet, before the TCP layer has a chance to retry.

Instead, try k5_sendto() with NO_UDP, and only fall back to UDP after
TCP fails completely without reaching a server. In sendto_kdc.c,
implement an ONLY_UDP transport strategy to allow the UDP fallback.

https://github.com/krb5/krb5/commit/6297788e24cefa8f3fdd36f514e2e6569fa7b34a
Author: Greg Hudson <ghudson@mit.edu>
Commit: 6297788e24cefa8f3fdd36f514e2e6569fa7b34a
Branch: master
src/lib/krb5/os/changepw.c | 9 ++++++++-
src/lib/krb5/os/os-proto.h | 1 +
src/lib/krb5/os/sendto_kdc.c | 12 ++++++++----
3 files changed, 17 insertions(+), 5 deletions(-)