[GTALUG] war story: power failure makes CentOS system unbootable

D. Hugh Redelmeier hugh at mimosa.com
Sun Aug 8 09:47:42 EDT 2021


I'm replying to a really old posting I made.

[Mystery: I'm replying to the copy I sent.  Formatting is screwed up when 
I reply to the copy I got back from the list server.]

[I'm top posting because I'm just including it whole, for reference.  I'm 
sure most of you have deleted it from your mailboxes after 4 years.]

I followed the instructions on how to make /boot/efi mount on demand, 
but my CentOS box still fails to reboot after a crash because the ESP 
is marked as dirty.

"journalctl -b" shows that /boot/efi is being mounted because packagekitd
is requesting it:

Aug 07 10:38:35 reddoor.mimosa.com PackageKit[1855]: daemon start
Aug 07 10:38:35 reddoor.mimosa.com systemd[1]: Got automount request for /boot/efi, triggered by 1855 (packagekitd)
Aug 07 10:38:35 reddoor.mimosa.com systemd[1]: Starting File System Check on /dev/disk/by-uuid/A9B4-1C74...
Aug 07 10:38:35 reddoor.mimosa.com systemd-fsck[1866]: fsck.fat 3.0.20 (12 Jun 2013)
Aug 07 10:38:35 reddoor.mimosa.com systemd-fsck[1866]: /dev/sda1: 25 files, 2913/51145 clusters
Aug 07 10:38:35 reddoor.mimosa.com chronyd[724]: Selected source 207.34.48.31
Aug 07 10:38:35 reddoor.mimosa.com systemd[1]: Started File System Check on /dev/disk/by-uuid/A9B4-1C74.
Aug 07 10:38:35 reddoor.mimosa.com systemd[1]: Mounting /boot/efi...
Aug 07 10:38:36 reddoor.mimosa.com spice-vdagent[1942]: Cannot access vdagent virtio channel /dev/virtio-ports/com.redhat.spice.0
Aug 07 10:38:36 reddoor.mimosa.com gnome-session-binary[1460]: WARNING: App 'spice-vdagent.desktop' exited with code 1
Aug 07 10:38:36 reddoor.mimosa.com gnome-session-binary[1460]: Entering running state
Aug 07 10:38:36 reddoor.mimosa.com gnome-session[1460]: gnome-session-binary[1460]: WARNING: App 'spice-vdagent.desktop' exited with code 1
Aug 07 10:38:36 reddoor.mimosa.com systemd[1]: Mounted /boot/efi.
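
To pick the culprit out of a long journal, something like the following 
may help.  It is only a sketch: the function name is made up, and it 
assumes the systemd message has exactly the "Got automount request for 
..., triggered by PID (NAME)" form shown in the log above.

```shell
# Print "PID NAME" for every automount request on the given mount point.
# Reads journal text on stdin.  automount_triggers is a hypothetical name.
automount_triggers() {
    mountpoint="$1"
    # Keep only the automount request lines for this mount point,
    # then pull out the PID and the process name in parentheses.
    grep -F "Got automount request for ${mountpoint}," \
        | sed -n 's/.*triggered by \([0-9][0-9]*\) (\(.*\))$/\1 \2/p'
}

# Usage on a live system (current boot):
#   journalctl -b | automount_triggers /boot/efi
```

Run against the log line above, this prints the PID and name of 
packagekitd.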

Fix: why the heck am I running packagekit?  Stop that.

 $ sudo systemctl disable packagekit
 $ sudo systemctl stop packagekit
 $ sudo umount /boot/efi

See https://bugzilla.redhat.com/show_bug.cgi?id=1991228
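
After the umount it may be worth confirming that the ESP really is 
unmounted.  A minimal sketch, assuming the usual /proc/self/mounts 
format (device, mount point, type, ...); the function name is 
hypothetical.  It reads the mount table from stdin instead of looking at 
/boot/efi, because touching the mount point could itself trigger the 
automount.

```shell
# Succeed if a vfat filesystem is currently mounted on the given mount point.
# Reads /proc/self/mounts-style lines on stdin.  The autofs placeholder that
# systemd keeps on the same path has type "autofs", so it is not counted.
vfat_mounted_on() {
    awk -v mp="$1" '
        $2 == mp && $3 == "vfat" { found = 1 }
        END { exit !found }
    '
}

# Usage:
#   if vfat_mounted_on /boot/efi < /proc/self/mounts; then
#       echo "ESP is still mounted"
#   fi
```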

| From: D. Hugh Redelmeier <hugh at mimosa.com>
| To: Toronto Linux Users Group <talk at gtalug.org>
| Date: Thu, 6 Jul 2017 13:12:11 -0400 (EDT)
| Subject: war story: power failure makes CentOS system unbootable
| 
| tl;dr / spoiler: change fstab so that /boot/efi is not automatically
| mounted.  See the recommendation at the end of this message.
| 
| My gateway computer is a little PC running CentOS.
| It does not come back after a power failure.
| The reason (as best I can tell) is interesting and I think that I have a 
| fix.
| 
| My system is UEFI.  It boots from a UEFI system partition, known as
| /boot/efi to Linux.  If this gets corrupted, it won't boot.  It is a
| VFAT partition.
| 
| On my gateway (a Zotac Zbox RI321, a cute little box with two ethernet
| ports), the UEFI firmware apparently won't boot if the dirty bit is
| set in the system partition.
| 
| CentOS normally runs with /boot/efi mounted.  So when the system isn't 
| shut down cleanly, the dirty bit will be on.
| 
| Consequence: the system is not going to boot after a power failure.
| 
| Odd observation 1: if I put a live Fedora 25 USB stick in the system
| and try to boot in this otherwise-unbootable state, CentOS boots from
| the main disk.  So this really looks like a firmware bug/quirk.
| 
| Odd observation 2: fsck doesn't seem to automatically fix the system 
| partition.  Once CentOS 7 booted, I dismounted /boot/efi and
| did an fsck on it.
| 
|     $ sudo fsck /dev/sda1
|     fsck from util-linux 2.23.2
|     fsck.fat 3.0.20 (12 Jun 2013)
|     0x25: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
|     1) Remove dirty bit
|     2) No action
|     ? 1
|     Leaving filesystem unchanged.
|     /dev/sda1: 16 files, 2420/51145 clusters
| 
| Googling got me to <https://www.centos.org/forums/viewtopic.php?t=50917>
| In particular, this advice seemed quite good:
| 
| ==== recommendation ====
| 
| Two other changes I recommend for UEFI systems, to each OS's /etc/fstab.
| 
| - For the /boot/efi mountpoint, add the mount options 
|   x-systemd.automount,noauto
| 
| - Change fs_passno (last column) to a 1 or 2; the canonical fstab 
|   instructions suggest 2, but systemd treats 1 and 2 as the same.
| 
| The first change means the EFI System partition will not be automatically 
| mounted read-write at boot time. It's a bad idea that this is the default, 
| because it puts the ESP at risk, especially if there are crashes: FAT has no 
| journal and will therefore always be marked dirty in such a case. No other 
| UEFI OS mounts the ESP by default. Second, if anything tries to access 
| /boot/efi (read or write), systemd will automatically mount it, and 
| because of the fs_passno 1 or 2, it will fsck it first, which fixes and 
| clears the dirty bit in case it's set.
| 
| Right now without these changes, it's just a matter of having the right 
| number and bad timing of crashes to render the EFI System partition 
| corrupt.
| 
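
For checking whether an fstab actually follows the recommendation quoted 
above, here is a rough sketch.  The function name is hypothetical, and it 
only looks at the two things the recommendation mentions: the mount 
options and the fs_passno column.

```shell
# Check an fstab (read from stdin) for the recommended ESP settings.
# Prints one line per problem and exits non-zero if anything is wrong.
# check_esp_fstab is a hypothetical name; the mount point defaults to /boot/efi.
check_esp_fstab() {
    awk -v mp="${1:-/boot/efi}" '
        /^[[:space:]]*#/ { next }                 # skip comment lines
        $2 == mp {
            seen = 1
            opts = "," $4 ","                     # wrap so each option is ",name,"
            if (index(opts, ",x-systemd.automount,") == 0) {
                print "missing option: x-systemd.automount"; bad = 1
            }
            if (index(opts, ",noauto,") == 0) {
                print "missing option: noauto"; bad = 1
            }
            if ($6 != 1 && $6 != 2) {
                print "fs_passno should be 1 or 2"; bad = 1
            }
        }
        END {
            if (!seen) { print "no entry for " mp; bad = 1 }
            exit bad + 0
        }
    '
}

# Usage:
#   check_esp_fstab < /etc/fstab
```

An entry that would pass looks roughly like this (the UUID is the one from 
the log above; umask and shortname are assumed typical values, not taken 
from my fstab):

    UUID=A9B4-1C74 /boot/efi vfat umask=0077,shortname=winnt,x-systemd.automount,noauto 0 2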

