[GTALUG] war story: power failure makes CentOS system unbootable
D. Hugh Redelmeier
hugh at mimosa.com
Sun Aug 8 09:47:42 EDT 2021
I'm replying to a really old posting I made.
[Mystery: I'm replying to the copy I sent. Formatting is screwed up when
I reply to the copy I got back from the list server.]
[I'm top posting because I'm just including it whole, for reference. I'm
sure most of you have deleted it from your mailboxes after 4 years.]
I followed the instructions on how to make /boot/efi mounted on demand,
but I still have problems with my CentOS box not rebooting after a
crash because the ESP is marked as dirty.
"journalctl -b" says that it is mounting /boot/efi because packagekitd
is requesting it:
Aug 07 10:38:35 reddoor.mimosa.com PackageKit[1855]: daemon start
Aug 07 10:38:35 reddoor.mimosa.com systemd[1]: Got automount request for /boot/efi, triggered by 1855 (packagekitd)
Aug 07 10:38:35 reddoor.mimosa.com systemd[1]: Starting File System Check on /dev/disk/by-uuid/A9B4-1C74...
Aug 07 10:38:35 reddoor.mimosa.com systemd-fsck[1866]: fsck.fat 3.0.20 (12 Jun 2013)
Aug 07 10:38:35 reddoor.mimosa.com systemd-fsck[1866]: /dev/sda1: 25 files, 2913/51145 clusters
Aug 07 10:38:35 reddoor.mimosa.com chronyd[724]: Selected source 207.34.48.31
Aug 07 10:38:35 reddoor.mimosa.com systemd[1]: Started File System Check on /dev/disk/by-uuid/A9B4-1C74.
Aug 07 10:38:35 reddoor.mimosa.com systemd[1]: Mounting /boot/efi...
Aug 07 10:38:36 reddoor.mimosa.com spice-vdagent[1942]: Cannot access vdagent virtio channel /dev/virtio-ports/com.redhat.spice.0
Aug 07 10:38:36 reddoor.mimosa.com gnome-session-binary[1460]: WARNING: App 'spice-vdagent.desktop' exited with code 1
Aug 07 10:38:36 reddoor.mimosa.com gnome-session-binary[1460]: Entering running state
Aug 07 10:38:36 reddoor.mimosa.com gnome-session[1460]: gnome-session-binary[1460]: WARNING: App 'spice-vdagent.desktop' exited with code 1
Aug 07 10:38:36 reddoor.mimosa.com systemd[1]: Mounted /boot/efi.
Fix: why the heck am I running packagekit? Stop that.
$ sudo systemctl disable packagekit
$ sudo systemctl stop packagekit
$ sudo umount /boot/efi
See https://bugzilla.redhat.com/show_bug.cgi?id=1991228
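To check later whether something has pulled the ESP back in, a small
shell sketch like the following could be used. The function name
esp_mounted is made up for illustration, and I am assuming that the real
filesystem appears in the mounts table with type vfat while the idle
automount point appears with type autofs. It only reads the mounts
table, so it should not itself trigger the automount.

```shell
# Hypothetical helper: succeeds if a vfat filesystem is currently mounted
# on the given mount point. The optional second argument is the mounts
# table to read (defaults to /proc/self/mounts).
esp_mounted() {
    awk -v mp="$1" '$2 == mp && $3 == "vfat" { found = 1 } END { exit !found }' \
        "${2:-/proc/self/mounts}"
}

if esp_mounted /boot/efi; then
    echo "/boot/efi is mounted"
else
    echo "/boot/efi is not mounted"
fi
```

If this reports the ESP as mounted, "journalctl -b" should show which
process triggered the automount, as in the log above.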
| From: D. Hugh Redelmeier <hugh at mimosa.com>
| To: Toronto Linux Users Group <talk at gtalug.org>
| Date: Thu, 6 Jul 2017 13:12:11 -0400 (EDT)
| Subject: war story: power failure makes CentOS system unbootable
|
| tl;dr / spoiler: change fstab so that /boot/efi is not automatically
| mounted. See the recommendation at the end of this message.
|
| My gateway computer is a little PC running CentOS.
| It does not come back after a power failure.
| The reason (as best I can tell) is interesting and I think that I have a
| fix.
|
| My system is UEFI. It boots from a UEFI system partition, known as
| /boot/efi to Linux. If this gets corrupted, it won't boot. It is a
| VFAT partition.
|
| On my gateway (a Zotac Zbox RI321, a cute little box with two ethernet
| ports), the UEFI firmware apparently won't boot if the dirty bit is
| set in the system partition.
|
| CentOS normally runs with /boot/efi mounted. So when the system isn't
| shut down cleanly, the dirty bit will be on.
|
| Consequence: the system is not going to boot after a power failure.
|
| Odd observation 1: if I put a live Fedora 25 USB stick in the system
| and try to boot in this otherwise-unbootable state, CentOS boots from
| the main disk. So this really looks like a firmware bug/quirk.
|
| Odd observation 2: fsck doesn't seem to automatically fix the system
| partition. Once CentOS 7 booted, I dismounted /boot/efi and
| did an fsck on it.
|
| $ sudo fsck /dev/sda1
| fsck from util-linux 2.23.2
| fsck.fat 3.0.20 (12 Jun 2013)
| 0x25: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
| 1) Remove dirty bit
| 2) No action
| ? 1
| Leaving filesystem unchanged.
| /dev/sda1: 16 files, 2420/51145 clusters
|
| Googling got me to <https://www.centos.org/forums/viewtopic.php?t=50917>
| In particular, this advice seemed quite good:
|
| ==== recommendation ====
|
| Two other changes I recommend for UEFI systems, to each OS's /etc/fstab.
|
| - For the /boot/efi mountpoint, add the mount options
| x-systemd.automount,noauto
|
| - Change fs_passno (last column) to a 1 or 2; the canonical fstab
| instructions suggest 2, but systemd treats 1 and 2 as the same.
|
| The first change means the EFI System partition will not be automatically
| mounted read-write at boot time. It's a bad idea that this is the default,
| because it puts the ESP at risk, especially if there are crashes: FAT has no
| journal and will therefore always be marked dirty in such a case. No other
| UEFI OS mounts the ESP by default. Second, if anything tries to access
| /boot/efi (read or write), systemd will automatically mount it, and
| because of the fs_passno of 1 or 2, it will fsck it first, which fixes
| the filesystem and clears the dirty bit if it is set.
|
| Right now without these changes, it's just a matter of having the right
| number and bad timing of crashes to render the EFI System partition
| corrupt.
|
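The two fstab changes in the recommendation can be sketched as a small
shell function. This is only an illustration, not the forum poster's
method: the name fix_esp_entry is hypothetical, and it assumes the ESP
entry is the non-comment line whose mount point is /boot/efi. It prints
an adjusted copy of the file to stdout and does not modify the input.

```shell
# Hypothetical helper: print the given fstab with the /boot/efi entry
# adjusted as recommended. Adds x-systemd.automount and noauto if they are
# missing, and sets fs_passno to 2 if it is absent or 0.
# Fields on the changed line are re-joined with single spaces.
fix_esp_entry() {
    awk '
        $1 !~ /^#/ && $2 == "/boot/efi" {
            if ($4 !~ /(^|,)x-systemd\.automount(,|$)/) $4 = $4 ",x-systemd.automount"
            if ($4 !~ /(^|,)noauto(,|$)/) $4 = $4 ",noauto"
            if (NF < 5) $5 = 0              # fs_freq defaults to 0
            if (NF < 6 || $6 == 0) $6 = 2   # enable fsck before mounting
        }
        { print }
    ' "$1"
}

# Example: review the proposed change without touching the real file.
# fix_esp_entry /etc/fstab | diff /etc/fstab -
```

Reviewing the diff before copying anything into /etc/fstab seems
prudent, since a broken fstab can itself keep a system from booting.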