[GTALUG] war story: power failure makes CentOS system unbootable

D. Hugh Redelmeier hugh at mimosa.com
Thu Sep 29 12:22:08 EDT 2022


I'm continuing a very old thread of mine.
Because of the age, I'm top-posting.

Summary up until now:

- system fails to reboot after power failure

- problem seems to be firmware distaste for dirty bit on ESP (/boot/efi)

- dirty bit always happens because /boot/efi is always mounted, even 
  though it isn't needed 99% of the time

- mounting-on-demand doesn't fix this because packagekit accesses the ESP 
  almost immediately.  This feels like a bug.

- I disabled packagekit to prevent /boot/efi from being automounted.
  Well, it grew back, somehow.  Updates, I guess.
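  A quick way to check whether it has crept back again (just a sketch):
	systemctl is-enabled packagekit
	systemctl is-active packagekit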

Now:

So now I'll use a different approach.  I'll get automount to also 
autodismount.  Oh, the wonders and mysteries provided by systemd!

1. Modify the /etc/fstab entry for /boot/efi to include 
	"x-systemd.idle-timeout=600"
   This asks for dismounting after 600 seconds of inactivity.  (A sample 
   fstab entry is sketched after step 3.)

2. Get the relevant daemon(s) to pay attention:
	systemctl daemon-reload
	systemctl restart boot-efi.automount
   Note that boot-efi.automount's name is synthesized automatically from 
   the mount point, as is the unit itself (systemd generates both from the 
   fstab entry).  A way to see the synthesized names is sketched below, 
   after the man-page pointers.

3. Hope this works.
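
For reference, here is roughly what the /boot/efi line in /etc/fstab ends up 
looking like (a sketch only: the UUID is the one from the journal excerpt 
quoted below, the noauto,x-systemd.automount options and the fs_passno of 2 
come from the 2017 recommendation at the bottom, and any other mount options 
on a real system may differ):

	UUID=A9B4-1C74  /boot/efi  vfat  noauto,x-systemd.automount,x-systemd.idle-timeout=600  0  2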

See systemd.mount(5).
You can ignore or be confused by systemd.automount(5).
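
If you want to see the synthesized names for yourself, something like this 
should do it (a sketch; none of these commands or their output come from the 
thread below):

	systemd-escape --path --suffix=automount /boot/efi
	systemctl status boot-efi.automount boot-efi.mount
	findmnt /boot/efi

The first prints "boot-efi.automount", the second shows whether the automount 
is armed and the mount active, and the third prints nothing once the idle 
timeout has dismounted /boot/efi.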


| From: D. Hugh Redelmeier <hugh at mimosa.com>
| Date: Sun, 8 Aug 2021 09:47:42 -0400 (EDT)
| Subject: Re: war story: power failure makes CentOS system unbootable
| 
| I'm replying to a really old posting I made.
| 
| [Mystery: I'm replying to the copy I sent.  Formatting is screwed up when 
| I reply to the copy I got back from the list server.]
| 
| [I'm top posting because I'm just including it whole, for reference.  I'm 
| sure most of you have deleted it from your mailboxes after 4 years.]
| 
| I followed the instructions on how to make /boot/efi mounted on demand, 
| but I still have problems with my CentOS box not rebooting after a 
| crash because the ESP is marked as dirty.
| 
| "journalctl -b" says that it is mounting /boot/efi because packagekitd
| is requesting it
| 
| Aug 07 10:38:35 reddoor.mimosa.com PackageKit[1855]: daemon start
| Aug 07 10:38:35 reddoor.mimosa.com systemd[1]: Got automount request for /boot/efi, triggered by 1855 (packagekitd)
| Aug 07 10:38:35 reddoor.mimosa.com systemd[1]: Starting File System Check on /dev/disk/by-uuid/A9B4-1C74...
| Aug 07 10:38:35 reddoor.mimosa.com systemd-fsck[1866]: fsck.fat 3.0.20 (12 Jun 2013)
| Aug 07 10:38:35 reddoor.mimosa.com systemd-fsck[1866]: /dev/sda1: 25 files, 2913/51145 clusters
| Aug 07 10:38:35 reddoor.mimosa.com chronyd[724]: Selected source 207.34.48.31
| Aug 07 10:38:35 reddoor.mimosa.com systemd[1]: Started File System Check on /dev/disk/by-uuid/A9B4-1C74.
| Aug 07 10:38:35 reddoor.mimosa.com systemd[1]: Mounting /boot/efi...
| Aug 07 10:38:36 reddoor.mimosa.com spice-vdagent[1942]: Cannot access vdagent virtio channel /dev/virtio-ports/com.redhat.spice.0
| Aug 07 10:38:36 reddoor.mimosa.com gnome-session-binary[1460]: WARNING: App 'spice-vdagent.desktop' exited with code 1
| Aug 07 10:38:36 reddoor.mimosa.com gnome-session-binary[1460]: Entering running state
| Aug 07 10:38:36 reddoor.mimosa.com gnome-session[1460]: gnome-session-binary[1460]: WARNING: App 'spice-vdagent.desktop' exited with code 1
| Aug 07 10:38:36 reddoor.mimosa.com systemd[1]: Mounted /boot/efi.
| 
| Fix: why the heck am I running packagekit?  Stop that.
| 
|  $ sudo systemctl disable packagekit
|  $ sudo systemctl stop packagekit
|  $ sudo umount /boot/efi
| 
| See https://bugzilla.redhat.com/show_bug.cgi?id=1991228
| 
| | From: D. Hugh Redelmeier <hugh at mimosa.com>
| | To: Toronto Linux Users Group <talk at gtalug.org>
| | Date: Thu, 6 Jul 2017 13:12:11 -0400 (EDT)
| | Subject: war story: power failure makes CentOS system unbootable
| | 
| | tl;dr / spoiler: change fstab so that /boot/efi is not automatically
| | mounted.  See the recommendation at the end of this message.
| | 
| | My gateway computer is a little PC running CentOS.
| | It does not come back after a power failure.
| | The reason (as best I can tell) is interesting and I think that I have a 
| | fix.
| | 
| | My system is UEFI.  It boots from a UEFI system partition, known as
| | /boot/efi to Linux.  If this gets corrupted, it won't boot.  It is a
| | VFAT partition.
| | 
| | On my gateway (a Zotac Zbox RI321, a cute little box with two ethernet
| | ports), the UEFI firmware apparently won't boot if the dirty bit is
| | set in the system partition.
| | 
| | CentOS normally runs with /boot/efi mounted.  So when the system isn't 
| | shut down cleanly, the dirty bit will be on.
| | 
| | Consequence: the system is not going to boot after a power failure.
| | 
| | Odd observation 1: if I put a live Fedora 25 USB stick in the system
| | and try to boot in this otherwise-unbootable state, CentOS boots from
| | the main disk.  So this really looks like a firmware bug/quirk.
| | 
| | Odd observation 2: fsck doesn't seem to automatically fix the system 
| | partition.  Once CentOS 7 booted, I dismounted /boot/efi and
| | did an fsck on it.
| | 
| |     $ sudo fsck /dev/sda1
| |     fsck from util-linux 2.23.2
| |     fsck.fat 3.0.20 (12 Jun 2013)
| |     0x25: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
| |     1) Remove dirty bit
| |     2) No action
| |     ? 1
| |     Leaving filesystem unchanged.
| |     /dev/sda1: 16 files, 2420/51145 clusters
| | 
| | Googling got me to <https://www.centos.org/forums/viewtopic.php?t=50917>
| | In particular, this advice seemed quite good:
| | 
| | ==== recommendation ====
| | 
| | Two other changes I recommend for UEFI systems, to each OS's /etc/fstab.
| | 
| | - For the /boot/efi mountpoint, add the mount options 
| |   x-systemd.automount,noauto
| | 
| | - Change fs_passno (last column) to a 1 or 2; the canonical fstab 
| |   instructions suggest 2, but systemd treats 1 and 2 as the same.
| | 
| | The first change means the EFI System Partition will not be automatically 
| | mounted read-write at boot time.  It's a bad idea that this is the 
| | default, because it puts the ESP at risk if there are crashes: FAT has 
| | no journal and so will always be marked dirty in such a case, and no 
| | other UEFI OS mounts the ESP by default.  Second, if anything tries to 
| | access /boot/efi (read or write), systemd will automatically mount it, 
| | and because of the fs_passno of 1 or 2, it will fsck it first, which 
| | fixes and clears the dirty bit in case it's set.
| | 
| | Right now, without these changes, it's just a matter of enough crashes 
| | with bad enough timing to render the EFI System Partition corrupt.
| | 
| 
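
One more hand-check that ties the quoted pieces together (a sketch only; the 
device /dev/sda1 is the one from the output quoted above, and fsck.fat's -n 
flag reports without changing anything):

	sudo fsck.fat -n /dev/sda1
	ls /boot/efi

The first, run while /boot/efi is not mounted, reports whether the dirty bit 
is set without touching it; the second (or any other access) triggers the 
automount, and with fs_passno set, systemd fscks and cleans the ESP before 
mounting it, as in the journal excerpt above.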

