From raeburn@raeburn.org Mon Aug 25 09:32:17 1997
Received: from MIT.EDU (PACIFIC-CARRIER-ANNEX.MIT.EDU [18.69.0.28]) by rt-11.MIT.EDU (8.7.5/8.7.3) with SMTP id JAA16201 for <bugs@RT-11.MIT.EDU>; Mon, 25 Aug 1997 09:32:16 -0400
Received: from tweedledumb.cygnus.com by MIT.EDU with SMTP
    id AA23141; Mon, 25 Aug 97 09:32:14 EDT
Received: from kr-pc.cygnus.com (kr-pc.cygnus.com [192.80.44.193])
    by tweedledumb.cygnus.com (8.8.5/8.8.5) with ESMTP id JAA23050;
    Mon, 25 Aug 1997 09:31:54 -0400 (EDT)
Received: (from raeburn@localhost) by kr-pc.cygnus.com (8.8.5/8.6.9) id JAA09326; Mon, 25 Aug 1997 09:31:09 -0400 (EDT)
Message-Id: <199708251331.JAA09326@kr-pc.cygnus.com>
Date: Mon, 25 Aug 1997 09:31:09 -0400 (EDT)
From: raeburn@cygnus.com
Reply-To: raeburn@cygnus.com
To: bugs@cygnus.com
Cc: krb5-bugs@MIT.EDU
Subject: Kerberos rshd deadlock
X-Send-Pr-Version: 3.102
>Number: 463
>Category: krb5-appl
>Synopsis: kshd hangs with write deadlock
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: tlyu
>State: open
>Class: sw-bug
>Submitter-Id: unknown
>Arrival-Date: Mon Aug 25 09:33:01 EDT 1997
>Last-Modified: Wed Feb 07 13:07:00 EST 2001
>Originator: Ken Raeburn
>Organization:
Cygnus Solutions
>Release: kerbnet-1.2
>Environment:
System: NetBSD kr-pc.cygnus.com 1.2B NetBSD 1.2B (RAEBURN) #1: Sun Nov 17 01:30:47 EST 1996 root@kr-pc.cygnus.com:/h/NetBSD-build/sys/arch/i386/compile/RAEBURN i386
>Description:
[This was with KerbNet 1.2, but I expect the MIT code suffers the same
lossage. Note I'm sending to both bug addresses. MIT folks, this
should be category "krb5-appl", but that's not on the KerbNet category
list.]
This command (rsh from osf1 to linux) hangs:
    (sleep 15 ; cat /vmunix) \
      | rsh maneki-neko cat /vmlinuz \
      | (sleep 30 ; cat > /dev/null)
The cat process (on the Linux box) stops in "pipe_wait" state. The
kshd process does too. According to netstat, a large amount of data
is in the receive queue; none is in the send queue, or the receive
queue of the client. Even when the client side finishes the 30-second
sleep and starts reading, the server side does not recover.
My guess: kshd got data from the net, found that the pipe to the child
was available, and tried writing to it, but since the child wasn't
reading, less pipe buffer space was available than was needed, so kshd
blocked waiting for the child to read. Then the child blocked because
kshd wasn't reading its output.
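
The pipe-full condition in the guess above can be observed in isolation. What follows is a minimal sketch, assuming Python 3 on a POSIX system; it is not taken from kshd, and `pipe_capacity` is a hypothetical helper name. It writes into a pipe that nobody reads until the kernel refuses more data. With a blocking descriptor, the writer would simply go to sleep at that point, which is the state kshd appears to be stuck in.

```python
import os


def pipe_capacity(chunk_size: int = 4096) -> int:
    """Return how many bytes a fresh pipe accepts while nobody reads from it."""
    read_fd, write_fd = os.pipe()
    # Non-blocking mode turns "sleep until there is room" into an exception,
    # so the limit can be measured instead of hanging the process.
    os.set_blocking(write_fd, False)
    written = 0
    try:
        while True:
            written += os.write(write_fd, b"x" * chunk_size)
    except BlockingIOError:
        # The pipe is full. A blocking writer would be asleep here until
        # the reader drains some data.
        pass
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written


if __name__ == "__main__":
    print("pipe accepted", pipe_capacity(), "bytes before a write would block")
```

The exact figure depends on the operating system (64 KiB is a common default on current Linux), but it is always finite, so any relay that writes with blocking calls can stall once the reader stops consuming.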
I first ran into this with "rsync", running in "push" mode; it keeps
locking up. The server side is probably starting to receive file
contents while still sending checksum data for the hierarchy. Come to
think of it, the perl script I used to use instead of rsync, which did
"rsh host tar -c -f - --files-from - < list-o-files" may have
triggered this bug a few times too. I should see if rdist hangs
too....
All other cases of writes in rsh and kshd ought to be checked to make
sure they can't block the flow of data in another direction.
>How-To-Repeat:
Run
    (sleep 15 ; cat /somebigfile) \
      | rsh somehost cat /somebigfile \
      | (sleep 30 ; cat > /dev/null)
between two machines on a fast network. (Fast enough that the various
kernel buffers can be filled in the specified delays.)
>Fix:
IMNSHO kshd (and rsh) should be using non-blocking I/O for any writes.
If the child isn't reading, let the pipe fill, stop reading from the
net, let that buffer fill, and let the kernel throttle the TCP
transmission. If a deadlock still results, *then* it's an application
bug.
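
The suggestion above can be sketched as a small select() loop. This is a simplified model in Python 3 for POSIX systems, not kshd's actual code: an in-memory buffer stands in for data arriving from the network, `cat` stands in for the remote command, and `pump` and `CHUNK` are hypothetical names. The point it illustrates is that the write side is non-blocking and only attempted when select() reports the pipe writable, so reading the child's output is never held up by a full pipe.

```python
import os
import select
import subprocess

CHUNK = 65536  # assumed transfer size per read/write call


def pump(data: bytes, argv=("cat",)) -> bytes:
    """Feed data to a child and collect its output without blocking on writes."""
    child = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    to_child = child.stdin.fileno()
    from_child = child.stdout.fileno()
    os.set_blocking(to_child, False)  # writes must never put us to sleep

    pending = memoryview(data)  # bytes not yet accepted by the child's pipe
    output = bytearray()
    stdin_open = True
    while True:
        # Ask about writability only while there is still input to deliver.
        want_write = [to_child] if stdin_open else []
        readable, writable, _ = select.select([from_child], want_write, [])
        if from_child in readable:
            chunk = os.read(from_child, CHUNK)
            if not chunk:  # EOF: the child closed its output
                break
            output += chunk
        if to_child in writable:
            try:
                sent = os.write(to_child, pending[:CHUNK])
                pending = pending[sent:]
            except BlockingIOError:
                pass  # pipe filled in the meantime; retry after next select()
            if not pending:
                child.stdin.close()  # signal EOF to the child
                stdin_open = False
    child.stdout.close()
    child.wait()
    return bytes(output)


if __name__ == "__main__":
    payload = os.urandom(1 << 20)  # larger than typical pipe buffers
    assert pump(payload) == payload
    print("relayed", len(payload), "bytes without deadlock")
```

In a real relay the input would come from a socket rather than memory. There, the loop would also stop selecting the socket for reading while `pending` is non-empty, which lets the socket's receive buffer fill and leaves the throttling to TCP flow control. Error handling for a child that exits early (a broken pipe on write) is omitted here.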
>Audit-Trail:
From: Tami King <tking@globe.acs.ohio-state.edu>
To: krb5-bugs@MIT.EDU
Cc:
Subject: Re: pending/463: Kerberos rshd deadlock
Date: Mon, 25 Aug 1997 09:41:11 -0400 (EDT)
I am out of the office until August 25. I will read your email when I
return. If you need assistance before then, please contact Jim Butler
(butler.33@osu.edu) or Vickie Starbuck (starbuck.1@osu.edu).
------------------------------------------------------------------------------
Tamara I. King ---- __o University Technology Services
King.281@osu.edu ---- _`\<,_ The Ohio State University
------------------------------ (*)/ (*) ------------------------------------
Responsible-Changed-From-To: gnats-admin->tlyu
Responsible-Changed-By: tlyu
Responsible-Changed-When: Mon Sep 1 22:09:57 1997
Responsible-Changed-Why:
Refiled
From: Ken Raeburn <raeburn@MIT.EDU>
To: krb5-bugs@MIT.EDU
Cc:
Subject: krb5-appl/463: kshd hangs with write deadlock
Date: Wed, 7 Feb 2001 13:06:17 -0500 (EST)
For the record, this bug is still present in MIT's post-1.2 release.
And it appears to affect rsync in a bad, bad way, such that I pretty
much have to use ssh instead for large rsync jobs.
>Unformatted: