Bug 954 - Sudo causing Ansible connections to hang with RHEL 8
Status: RESOLVED FIXED
Product: Sudo
Classification: Unclassified
Component: Sudo
Version: 1.9.4
Hardware: Other Linux
Importance: low normal
Assigned To: Todd C. Miller
Reported: 2021-01-20 08:41 MST by infrastructuresystems
Modified: 2021-01-26 11:14 MST
Attachments
sudo debug (517.23 KB, application/x-gzip)
2021-01-20 10:08 MST, abliss1

Description infrastructuresystems 2021-01-20 08:41:57 MST
When using Ansible to access RHEL 8 hosts, connections occasionally hang, causing stale ssh processes to peg CPU on target machines.

We believe we have narrowed it down to an issue with sudo.

1. disabling the sudo log plugin avoids the issue
2. disabling pipelining in Ansible avoids the issue
Comment 1 infrastructuresystems 2021-01-20 08:46:14 MST
We cannot reproduce this using normal sudo commands. Meaning:
I can log in to a host and run "sudo echo hello" 1000x and it'll work every time.

But if I run an Ansible playbook with 20-30 steps, I'm almost guaranteed to get a hang before it finishes, though it's intermittent and "random" in where it fails.

A "hung" sudo process eating up CPU and causing the box to run slowly looks like this:

[badams2@mytestbox~]$  ps auxx | grep BECOME
linux_e+ 4163824  0.0  0.0  12696  3016 ?        Ss   09:34   0:00 /bin/sh -c sudo -H -S -n  -u root /bin/sh -c 'echo BECOME-SUCCESS-nhyfxgrmnxwdljipmybwetousznismgr ; /usr/bin/python' && sleep 0
root     4163847 98.9  0.0 124824  7460 ?        R    09:34   3:16 sudo -H -S -n -u root /bin/sh -c echo BECOME-SUCCESS-nhyfxgrmnxwdljipmybwetousznismgr ; /usr/bin/python
badams2  4164562  0.0  0.0  12108  1088 pts/0    R+   09:37   0:00 grep --color=auto BECOME

The sudo process (when Ansible makes it hang) will never exit on its own. If enough processes hang like that, the box becomes exhausted and unreachable, and requires a reboot to resolve.
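
For anyone who needs to spot these processes before a host becomes unreachable, here is a minimal sketch that scans `ps aux`-style output for sudo processes using a lot of CPU. It is not part of sudo or Ansible; the function name `find_spinning_sudo` and the 90% threshold are assumptions chosen for illustration, and it assumes the standard eleven-column `ps aux` layout shown above.

```python
import os


def find_spinning_sudo(ps_output, cpu_threshold=90.0):
    """Return (pid, cpu, command) for sudo processes at or above cpu_threshold.

    ps_output is the text printed by `ps aux`. The threshold is an
    illustrative assumption, not a value taken from sudo or Ansible.
    """
    hits = []
    for line in ps_output.splitlines():
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        try:
            pid = int(fields[1])
            cpu = float(fields[2])
        except ValueError:
            continue  # header line or unexpected format
        command = fields[10]
        # Only match processes whose executable is sudo itself, not a
        # shell wrapper or a grep that merely mentions sudo.
        program = os.path.basename(command.split()[0])
        if program == "sudo" and cpu >= cpu_threshold:
            hits.append((pid, cpu, command))
    return hits
```

Fed the three `ps` lines above, this should report only PID 4163847, since the `/bin/sh -c sudo ...` wrapper and the `grep` are both idle and are not sudo themselves.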
Comment 2 Todd C. Miller 2021-01-20 08:49:55 MST
Some questions:

1) Does your sudoers file set the log_input or log_output flags?  If not, I don't see why disabling the I/O plugin would make a difference.

2) Do you know if Ansible ssh pipelining allocates a pty?

To help debug this you could add a line like the following to /etc/sudo.conf:

Debug sudo /var/log/sudo_debug all@debug

Or, to just do exec debugging (which is probably where this is occurring):

Debug sudo /var/log/sudo_debug exec@debug

If you can attach a sudo debug log that will help greatly in debugging the problem.
Comment 3 abliss1 2021-01-20 10:08:25 MST
Created attachment 548 [details]
sudo debug
Comment 4 abliss1 2021-01-20 10:11:34 MST
Todd,
I've attached a debug log of the issue (I had to compress it due to its file size). The attached log should contain only sudo invocations made by Ansible, including the latest entries within the file.

Best,
Aaron
Comment 5 Todd C. Miller 2021-01-20 10:18:19 MST
Thanks.  Is your sudoers configured to log sessions to sudo_logsrvd?
Comment 6 abliss1 2021-01-20 10:20:50 MST
Todd,
Yes, the following logging-related settings are included in /etc/sudoers:

Defaults    log_servers = fqdnn_of_logserver:30343
Defaults    ignore_iolog_errors

Best,
Aaron
Comment 7 Todd C. Miller 2021-01-20 10:28:05 MST
The high CPU load is related to the connection to the log server.  Something is causing the log server client event to fire continuously and it is not being cleared.  I don't see why that would be but at least I know where to start looking.
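
To illustrate the general mechanism (this is not sudo's event loop, only a sketch of the same class of problem), the snippet below registers one end of a socket pair with Python's `selectors` module and counts how often `select()` reports it ready. A socket with free send-buffer space is always writable, so an event left registered for writes fires on every pass, and a loop built on it spins at full CPU. The helper name `count_wakeups` and its parameters are assumptions made for this example.

```python
import selectors
import socket


def count_wakeups(events, rounds=20, timeout=0.01):
    """Count how many of `rounds` select() calls report the socket ready.

    `events` is selectors.EVENT_READ or selectors.EVENT_WRITE.
    """
    left, right = socket.socketpair()
    sel = selectors.DefaultSelector()
    try:
        left.setblocking(False)
        sel.register(left, events)
        wakeups = 0
        for _ in range(rounds):
            # Nothing is ever sent on the pair, so any wakeup here is
            # caused purely by the registered interest.
            if sel.select(timeout=timeout):
                wakeups += 1
        return wakeups
    finally:
        sel.close()
        left.close()
        right.close()
```

With `selectors.EVENT_WRITE` every round should wake up immediately, while with `selectors.EVENT_READ` and no incoming data none should. An event loop therefore has to drop write interest once there is nothing left to send, or it never sleeps.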
Comment 8 abliss1 2021-01-20 10:32:10 MST
Todd,
Would sending along our log_server config help?

Best,
Aaron
Comment 9 Todd C. Miller 2021-01-20 11:11:40 MST
I don't think I need to see the server logs; this looks like a client-side bug.  TLS is weird in that you may need to do a read as part of a write and vice versa.  There is a bug in the handling of this case.
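
As a rough illustration of the read-during-write case (using Python's `ssl` module rather than the C code in sudo's log client), a non-blocking TLS operation can fail with `ssl.SSLWantReadError` or `ssl.SSLWantWriteError` no matter whether the caller was reading or writing. The caller then has to wait for the direction the TLS layer asked for and retry the same operation. The helper name `interest_after` is hypothetical.

```python
import selectors
import ssl


def interest_after(operation):
    """Run one non-blocking TLS operation and report what to wait for.

    Returns (result, None) if the operation completed, or
    (None, event) where event is the selectors flag to wait on before
    retrying the same operation.
    """
    try:
        return operation(), None
    except ssl.SSLWantReadError:
        # Even a write can land here: TLS needs incoming bytes first.
        return None, selectors.EVENT_READ
    except ssl.SSLWantWriteError:
        # Even a read can land here: TLS needs to flush outgoing bytes.
        return None, selectors.EVENT_WRITE
```

A caller would pass something like `lambda: tls_sock.send(data)`, where `tls_sock` is a non-blocking `ssl.SSLSocket`, and re-register the socket for the returned event. Waiting on the wrong direction, for example staying registered for writes after a write asked for a read, is one way to end up with an event that fires continuously without making progress.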
Comment 10 Todd C. Miller 2021-01-20 11:47:31 MST
I believe this is fixed by the following commit:
https://www.sudo.ws/repos/sudo/rev/e4239bb932aa

However, because it is hard to reproduce I can't say for sure.
Comment 11 abliss1 2021-01-20 12:24:59 MST
Todd,

I've compiled a new RHEL 8 RPM from the 1.9.5-2 source but also included the updated log_client.c from your commit.  We'll test shortly and let you know if this helps.

Best,
Aaron
Comment 12 Todd C. Miller 2021-01-21 13:33:24 MST
This bug may be fixed by https://www.sudo.ws/repos/sudo/rev/b398dcc0933d
Comment 13 Todd C. Miller 2021-01-26 10:17:32 MST
Submitter confirmed that this is fixed by https://www.sudo.ws/repos/sudo/rev/b398dcc0933d.
Comment 14 Todd C. Miller 2021-01-26 11:14:18 MST
Fixed in sudo 1.9.5p2, available now.