Bugzilla – Bug 954
Sudo causing Ansible connections to hang with RHEL 8
Last modified: 2021-01-26 11:14:18 MST
When using Ansible to access RHEL 8 hosts, connections occasionally hang, causing stale ssh processes to peg CPU on target machines. We believe we have narrowed it down to an issue with sudo:
1. disabling the sudo log plugin avoids the issue
2. disabling pipelining in Ansible avoids the issue
We cannot reproduce this using normal sudo commands. That is, I can log in to a host and run "sudo echo hello" 1000 times and it will work every time. But if I run an Ansible playbook with 20-30 steps, I'm almost guaranteed to get a hang before it finishes, though it's intermittent and "random" in where it fails.

An example of a "hung" sudo process eating up CPU and causing the box to run slowly looks like this:

[badams2@mytestbox~]$ ps auxx | grep BECOME
linux_e+ 4163824 0.0 0.0 12696 3016 ? Ss 09:34 0:00 /bin/sh -c sudo -H -S -n -u root /bin/sh -c 'echo BECOME-SUCCESS-nhyfxgrmnxwdljipmybwetousznismgr ; /usr/bin/python' && sleep 0
root 4163847 98.9 0.0 124824 7460 ? R 09:34 3:16 sudo -H -S -n -u root /bin/sh -c echo BECOME-SUCCESS-nhyfxgrmnxwdljipmybwetousznismgr ; /usr/bin/python
badams2 4164562 0.0 0.0 12108 1088 pts/0 R+ 09:37 0:00 grep --color=auto BECOME

The sudo process (when Ansible makes it hang) will never exit on its own. If enough processes hang like that, the box becomes exhausted and unreachable, and requires a reboot to resolve.
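For anyone who wants to spot these processes before the box becomes unreachable, the ps output above can be filtered mechanically. The following is only a rough sketch: the function name `find_spinning_sudo` and the 90% threshold are assumptions made for illustration, and it expects the default `ps aux` column layout (USER, PID, %CPU, ..., COMMAND). Note that the %CPU column in ps is averaged over the process lifetime, so a freshly hung process may take a little while to cross the threshold.

```python
def find_spinning_sudo(ps_output, cpu_threshold=90.0):
    """Return (pid, cpu_percent) for sudo BECOME processes at or above the threshold.

    Assumes `ps aux` style columns:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    hits = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so the full command line stays intact.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        pid, cpu, command = fields[1], fields[2], fields[10]
        try:
            pid_num = int(pid)
            cpu_pct = float(cpu)
        except ValueError:
            continue  # header line or unexpected format
        # Only the sudo process itself, not the /bin/sh wrapper or the grep.
        is_become_sudo = command.startswith("sudo ") and "BECOME-SUCCESS" in command
        if is_become_sudo and cpu_pct >= cpu_threshold:
            hits.append((pid_num, cpu_pct))
    return hits
```

Fed the three lines shown above, this flags only PID 4163847, the sudo process running as root at 98.9% CPU.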
Some questions:
1) Does your sudoers file set the log_input or log_output flags? If not, I don't see why disabling the I/O plugin would make a difference.
2) Do you know if Ansible ssh pipelining allocates a pty?

To help debug this you could add a line like the following to /etc/sudo.conf:

Debug sudo /var/log/sudo_debug all@debug

Or, to just do exec debugging (which is probably where this is occurring):

Debug sudo /var/log/sudo_debug exec@debug

If you can attach a sudo debug log that will help greatly in debugging the problem.
Created attachment 548: sudo debug
Todd, I've attached a debug log of the issue (I had to compress it due to its file size). The attached log should only contain sudo invocations made by Ansible, including the latest entries within the file. Best, Aaron
Thanks. Is your sudoers configured to log sessions to sudo_logsrvd?
Todd, Yes, the following bits regarding logging are included in /etc/sudoers:

Defaults log_servers = fqdnn_of_logserver:30343
Defaults ignore_iolog_errors

Best, Aaron
The high CPU load is related to the connection to the log server. Something is causing the log server client event to fire continuously and it is not being cleared. I don't see why that would be but at least I know where to start looking.
Todd, Would sending along our log_server config help? Best, Aaron
I don't think I need to see the server logs; this looks like a client-side bug. TLS is weird in that you may need to do a read as part of a write and vice versa. There is a bug in the handling of this case.
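To illustrate the general TLS behavior described here (this is not sudo's code, which is written in C), here is a minimal Python sketch of one step of a non-blocking TLS send. It assumes a socket-like object whose `send` raises Python's `ssl.SSLWantReadError` or `ssl.SSLWantWriteError`, as a non-blocking `ssl.SSLSocket` does; the helper name `send_step` is hypothetical. The point is that when a write reports "want read", the event loop has to wait for readability. If it keeps waiting for writability instead, the socket is usually writable right away, so the event can fire over and over without making progress, which is one plausible way to end up with the kind of CPU spin seen in this report.

```python
import selectors
import ssl


def send_step(tls_sock, pending):
    """Attempt one non-blocking TLS send.

    Returns (remaining_bytes, event_to_wait_for). The event is
    selectors.EVENT_READ, selectors.EVENT_WRITE, or None when all
    data has been sent and no further event is needed.
    """
    try:
        sent = tls_sock.send(pending)
    except ssl.SSLWantReadError:
        # TLS needs inbound data before the write can proceed.
        # Re-arming the write event here would fire immediately and spin.
        return pending, selectors.EVENT_READ
    except ssl.SSLWantWriteError:
        # The transport cannot accept more data yet; retry when writable.
        return pending, selectors.EVENT_WRITE
    remaining = pending[sent:]
    # Keep the write event only while there is still data to send.
    return remaining, (selectors.EVENT_WRITE if remaining else None)
```

A caller would register the socket with a selector for whichever event `send_step` returns and unregister it when the event is `None`; the same pattern applies in reverse to reads that report "want write".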
I believe this is fixed by the following commit: https://www.sudo.ws/repos/sudo/rev/e4239bb932aa However, because it is hard to reproduce, I can't say for sure.
Todd, I've compiled a new RHEL 8 RPM from the 1.9.5-2 source but also included the updated log_client.c from your commit. We'll test shortly and let you know if this helps. Best, Aaron
This bug may be fixed by https://www.sudo.ws/repos/sudo/rev/b398dcc0933d
Submitter confirmed that this is fixed by https://www.sudo.ws/repos/sudo/rev/b398dcc0933d.
Fixed in sudo 1.9.5p2, available now.
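To check whether a host reports at least the fixed release, the version string printed by `sudo -V` (first line, e.g. "Sudo version 1.9.5p2") can be compared against 1.9.5p2. This is only a sketch: the function names are made up for illustration, the parser handles only the plain `X.Y.Z` and `X.Y.ZpN` forms, and a distribution package may carry the fix as a backported patch under an older version number, so a "False" result does not by itself mean the host is affected.

```python
import re


def parse_sudo_version(text):
    """Extract (major, minor, patch, plevel) from a string such as 'Sudo version 1.9.5p2'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)(?:p(\d+))?", text)
    if match is None:
        raise ValueError("no sudo version found in: %r" % text)
    major, minor, patch, plevel = match.groups()
    # A missing 'pN' suffix is treated as patch level 0.
    return (int(major), int(minor), int(patch), int(plevel or 0))


def at_least_fixed_release(text, fixed=(1, 9, 5, 2)):
    """True if the reported version is 1.9.5p2 or newer."""
    return parse_sudo_version(text) >= fixed
```

For example, `at_least_fixed_release("Sudo version 1.9.5p1")` is false, while the same call with "Sudo version 1.9.5p2" is true.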