How To Distinguish Between A Crash and A Graceful Reboot in RHEL 7 or RHEL 8

Updated September 1 2020 at 7:19 PM

How can you distinguish between a system crash and a graceful reboot or shutdown in
RHEL 7 or RHEL 8? This article outlines 4 approaches:

1. Inspect wtmp with last -x
2. Inspect auditd logs with ausearch
3. Requires configuration: Create a custom service unit
4. Requires configuration: Inspect previous boots in persistent systemd journal with
   journalctl

(1) Inspect wtmp with last -x

The command last -Fxn2 shutdown reboot reads the system wtmp file and reports the
two most recent shutdown or reboot records. A reboot record denotes the system booting
up, whereas a shutdown record denotes the system going down.

A graceful shutdown shows up as a reboot line followed by a shutdown line, as in
the following example:

~]# last -Fxn2 shutdown reboot
reboot system boot 4.18.0-80.el8.x8 Mon Aug 31 06:33:11 2020 still running
shutdown system down 4.18.0-80.el8.x8 Mon Aug 31 06:33:01 2020 - Mon Aug 31 06:33:11 2020 (00:00)

Note: events from last are printed in descending chronological order, with the most
recent at the top.

An ungraceful shutdown can be inferred from the absence of a shutdown line; instead there
will be either a single reboot line (if the wtmp file had been truncated/rotated prior to the
crash) or 2 reboot lines in a row, as in this example:

~]# last -Fxn2 shutdown reboot
reboot system boot 4.18.0-147.5.1.e Tue Sep 1 07:16:25 2020 still running
reboot system boot 4.18.0-147.5.1.e Mon Aug 3 07:10:56 2020 still running
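
The rule described above (reboot followed by shutdown means graceful; two reboot lines or a lone reboot line means ungraceful) could be scripted roughly as follows. This is only a sketch: the function name classify_last is hypothetical and not part of any RHEL package, and it assumes the first field of each relevant line is exactly "reboot" or "shutdown", as in the examples above.

```shell
#!/usr/bin/env bash
# classify_last (hypothetical helper): reads the output of
# `last -Fxn2 shutdown reboot` on stdin and prints one of
# "graceful", "ungraceful", or "unknown".
classify_last() {
    local records
    # Keep the first field of the first two reboot/shutdown records.
    # last prints newest first, so a graceful cycle reads "reboot shutdown".
    # Other lines (such as the trailing "wtmp begins ..." line) are ignored.
    records=$(awk '$1 == "reboot" || $1 == "shutdown" { print $1 }' \
        | head -n 2 | tr '\n' ' ')
    case "$records" in
        "reboot shutdown ")          echo graceful ;;
        "reboot reboot "|"reboot ")  echo ungraceful ;;
        *)                           echo unknown ;;
    esac
}
```

It would be used as last -Fxn2 shutdown reboot | classify_last. A lone reboot line is reported as ungraceful to match the article's reasoning, although it only shows that no shutdown record survives in wtmp.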

(2) Inspect auditd logs with ausearch

auditd logs many types of events; the full list can be seen by checking ausearch -m.
Relevant to the problem at hand, it logs both system shutdown and system boot events.
The command ausearch -i -m system_boot,system_shutdown | tail -4 reports
the 2 most recent shutdowns or boots. If this reports a SYSTEM_SHUTDOWN followed
by a SYSTEM_BOOT, all is well; however, if it reports 2 SYSTEM_BOOT lines in a row
or only a single SYSTEM_BOOT line, then the system did not shut down gracefully.

Graceful shutdown:

~]# ausearch -i -m system_boot,system_shutdown | tail -4
----
type=SYSTEM_SHUTDOWN msg=audit(08/31/2020 06:33:01.571:595) : pid=27156 uid=root
auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg=' comm=systemd-update-utmp
exe=/usr/lib/systemd/systemd-update-utmp hostname=? addr=? terminal=? res=success'
----
type=SYSTEM_BOOT msg=audit(08/31/2020 06:33:12.838:9) : pid=828 uid=root auid=unset
ses=unset subj=system_u:system_r:init_t:s0 msg=' comm=systemd-update-utmp
exe=/usr/lib/systemd/systemd-update-utmp hostname=? addr=? terminal=? res=success'

Note: as the timestamps should make clear, events from ausearch are printed in
ascending chronological order, with the oldest at the top.

Ungraceful shutdown:

~]# ausearch -i -m system_boot,system_shutdown | tail -4
----
type=SYSTEM_BOOT msg=audit(09/20/2016 01:10:32.392:7) : pid=657 uid=root auid=unset
ses=unset subj=system_u:system_r:init_t:s0 msg=' comm=systemd-update-utmp
exe=/usr/lib/systemd/systemd-update-utmp hostname=? addr=? terminal=? res=success'
----
type=SYSTEM_BOOT msg=audit(09/20/2016 01:11:41.134:7) : pid=656 uid=root auid=unset
ses=unset subj=system_u:system_r:init_t:s0 msg=' comm=systemd-update-utmp
exe=/usr/lib/systemd/systemd-update-utmp hostname=? addr=? terminal=? res=success'

Another ungraceful shutdown:

The presence of only one SYSTEM_BOOT record could be explained by the system being
up for so long prior to the crash that the audit logs of the previous boot had been rotated
out, so that the only result is from when the system was just booted.

~]# ausearch -i -m system_boot,system_shutdown | tail -4
----
type=SYSTEM_BOOT msg=audit(09/01/2020 07:16:27.069:10) : pid=1057 uid=root auid=unset
ses=unset subj=system_u:system_r:init_t:s0 msg=' comm=systemd-update-utmp
exe=/usr/lib/systemd/systemd-update-utmp hostname=? addr=? terminal=? res=success'
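
The three cases above can be told apart mechanically by looking at the last two record types. The following is a minimal sketch, not a definitive implementation: the function name classify_audit is hypothetical, and it assumes each record contains the literal text type=SYSTEM_BOOT or type=SYSTEM_SHUTDOWN as in the example output.

```shell
#!/usr/bin/env bash
# classify_audit (hypothetical helper): reads the output of
# `ausearch -i -m system_boot,system_shutdown` on stdin and prints one of
# "graceful", "ungraceful", or "unknown".
classify_audit() {
    local types
    # ausearch prints oldest first, so the last two record types describe
    # the most recent shutdown/boot sequence.
    types=$(grep -oE 'type=SYSTEM_(BOOT|SHUTDOWN)' | tail -n 2 | tr '\n' ' ')
    case "$types" in
        "type=SYSTEM_SHUTDOWN type=SYSTEM_BOOT ")
            echo graceful ;;
        "type=SYSTEM_BOOT type=SYSTEM_BOOT "|"type=SYSTEM_BOOT ")
            echo ungraceful ;;
        *)
            echo unknown ;;
    esac
}
```

It would be used as ausearch -i -m system_boot,system_shutdown | classify_audit. The tail -4 from the manual command is not needed here because the function keeps only the last two record types itself.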

(3) Create a custom service unit

Note: If you're trying to diagnose a potential crash right now, this will not help. You need
to set it up first.

This approach gives you complete control over how a graceful shutdown is recorded. Here's
an example of how to do it.

1. Create a service that runs only at shutdown
   (Optionally customize the service name and the graceful_shutdown file)

   ~]# cat /etc/systemd/system/set_gracefulshutdown.service
   [Unit]
   Description=Set flag for graceful shutdown
   DefaultDependencies=no
   RefuseManualStart=true
   Before=shutdown.target

   [Service]
   Type=oneshot
   ExecStart=/bin/touch /root/graceful_shutdown

   [Install]
   WantedBy=shutdown.target

   ~]# systemctl daemon-reload
   ~]# systemctl enable set_gracefulshutdown

2. Create a service that runs only at startup and only if the graceful_shutdown file
   created by the above service exists
   (Optionally customize the service name and ensure the graceful_shutdown file
   matches the above service)

   ~]# cat /etc/systemd/system/check_graceful.service
   [Unit]
   Description=Check if previous system shutdown was graceful
   ConditionPathExists=/root/graceful_shutdown
   RefuseManualStart=true
   RefuseManualStop=true

   [Service]
   Type=oneshot
   RemainAfterExit=true
   ExecStart=/bin/rm /root/graceful_shutdown

   [Install]
   WantedBy=multi-user.target

   ~]# systemctl daemon-reload
   ~]# systemctl enable check_graceful

3. At any time after a graceful reboot, systemctl is-active check_graceful confirms
   that the previous shutdown was graceful.
   Example output:

   ~]# systemctl is-active check_graceful && echo GOOD || echo BAD
   active
   GOOD

   ~]# systemctl status check_graceful
   ● check_graceful.service - Check if system booted after a graceful shutdown
      Loaded: loaded (/etc/systemd/system/check_graceful.service; enabled; vendor preset: disabled)
      Active: active (exited) since Tue 2016-09-20 01:10:32 EDT; 20s ago
     Process: 669 ExecStart=/bin/rm /root/graceful_shutdown (code=exited, status=0/SUCCESS)
    Main PID: 669 (code=exited, status=0/SUCCESS)
      CGroup: /system.slice/check_graceful.service

   Sep 20 01:10:32 a72.example.com systemd[1]: Starting Check if system booted after a graceful shutdown...
   Sep 20 01:10:32 a72.example.com systemd[1]: Started Check if system booted after a graceful shutdown.

4. After a crash or otherwise ungraceful shutdown, the following would be seen:

   ~]# systemctl is-active check_graceful && echo GOOD || echo BAD
   inactive
   BAD

   ~]# systemctl status check_graceful
   ● check_graceful.service - Check if system booted after a graceful shutdown
      Loaded: loaded (/etc/systemd/system/check_graceful.service; enabled; vendor preset: disabled)
      Active: inactive (dead)
   Condition: start condition failed at Tue 2016-09-20 01:11:41 EDT; 16s ago
              ConditionPathExists=/root/graceful_shutdown was not met

   Sep 20 01:11:41 a72.example.com systemd[1]: Started Check if system booted after a graceful shutdown.
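
For use in a login script or monitoring check, the GOOD/BAD one-liner above could be wrapped in a small function. This is a sketch under the assumption that the unit is named check_graceful as in the example; the function name describe_previous_shutdown is hypothetical. It takes the state string as an argument so that it can be tried without touching systemd.

```shell
#!/usr/bin/env bash
# describe_previous_shutdown (hypothetical helper): maps the state printed by
# `systemctl is-active check_graceful` to "graceful", "ungraceful", or "unknown".
describe_previous_shutdown() {
    case "$1" in
        # The unit started, so the flag file existed at boot.
        active)   echo graceful ;;
        # The start condition failed, so the flag file was missing at boot.
        # Note: the unit is also inactive if the system has not been
        # rebooted since the two services were enabled.
        inactive) echo ungraceful ;;
        # Any other state (for example "failed") needs a manual look.
        *)        echo unknown ;;
    esac
}
```

It would be called as describe_previous_shutdown "$(systemctl is-active check_graceful)".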

(4) Inspect previous boots in persistent systemd journal with journalctl

Note: If you're trying to diagnose a potential crash right now, this will not help unless
you have previously configured systemd to persist the journal to disk.

1. Configure systemd-journald to keep a persistent journal on disk
   Either update /etc/systemd/journald.conf or create the directory yourself as follows:

   # Create standard log dir and fix ownership/perms
   ~]# mkdir /var/log/journal; systemd-tmpfiles --create --prefix /var/log/journal 2>/dev/null

   # Next: tell systemd to flush the current journal to disk
   ~]# systemctl -s SIGUSR1 kill systemd-journald

   # OPTIONAL: reboot not required other than to give the following commands more than one boot to inspect
   ~]# reboot

2. Optionally use journalctl --list-boots to get a list of boots in ascending
   chronological order
   0 refers to the current runtime logs since the system was booted; -1 covers logs from
   the previous boot; -2 the boot before that, etc.
   Example:

   ~]# journalctl --list-boots
   -2 e1dbd8f133f643d1a816605d96f3ca07 Fri 2020-03-27 22:31:25 UTC—Thu 2020-05-14 01:02:51 UTC
   -1 1969253689e842deaea06ca32f4650c7 Thu 2020-05-14 01:10:00 UTC—Thu 2020-06-04 08:29:42 UTC
    0 26a4a2ff48594778850d917a7e2ad195 Tue 2020-09-01 07:16:20 UTC—Tue 2020-09-01 19:11:20 UTC

3. Use journalctl -b -1 -n to look at the last 10 lines of the previous boot
   The following example output shows that the previous system reboot was
   graceful:

   ~]# journalctl -b -1 -n
   -- Logs begin at Tue 2016-09-20 01:01:15 EDT, end at Tue 2016-09-20 01:21:33 EDT. --
   Sep 20 01:21:19 a72.example.com systemd[1]: Stopped Create Static Device Nodes in /dev.
   Sep 20 01:21:19 a72.example.com systemd[1]: Stopping Create Static Device Nodes in /dev...
   Sep 20 01:21:19 a72.example.com systemd[1]: Reached target Shutdown.
   Sep 20 01:21:19 a72.example.com systemd[1]: Starting Shutdown.
   Sep 20 01:21:19 a72.example.com systemd[1]: Reached target Final Step.
   Sep 20 01:21:19 a72.example.com systemd[1]: Starting Final Step.
   Sep 20 01:21:19 a72.example.com systemd[1]: Starting Reboot...
   Sep 20 01:21:19 a72.example.com systemd[1]: Shutting down.
   Sep 20 01:21:19 a72.example.com systemd-shutdown[1]: Sending SIGTERM to remaining processes...
   Sep 20 01:21:19 a72.example.com systemd-journal[483]: Journal stopped
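
The visual check in the last step could be approximated by searching the tail of the previous boot's journal for the shutdown messages seen in the example. This is a rough sketch: the function name classify_journal_tail is hypothetical, and the message strings are taken from the RHEL 7 sample output above, so they are an assumption that may not hold on other systemd versions.

```shell
#!/usr/bin/env bash
# classify_journal_tail (hypothetical helper): reads the last lines of the
# previous boot's journal on stdin and prints "graceful" if any orderly
# shutdown message is present, otherwise "unknown".
classify_journal_tail() {
    # Patterns copied from the example output; absence of a match is not
    # proof of a crash, so the fallback is "unknown" rather than "ungraceful".
    if grep -Eq 'Journal stopped|Reached target Shutdown|systemd-shutdown\['; then
        echo graceful
    else
        echo unknown
    fi
}
```

It would be used as journalctl -b -1 -n 20 --no-pager | classify_journal_tail. If journalctl -b -1 itself fails, the function prints unknown because it receives no input.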

Note from the author: In my experience troubleshooting RHEL 7 problems for
customers in Red Hat support (in the years leading up to 2016 when I wrote this article),
this was somewhat less reliable than the other methods. When bad things happened, it
was definitely possible for the indexing in journald to get so bad that the journalctl -b
-1 command only gave an error. I'm unsure if this has been improved in later versions
of RHEL 7 and RHEL 8.
