[GTALUG] btrfs weirdity.

D. Hugh Redelmeier hugh at mimosa.com
Tue Jun 30 04:53:02 EDT 2020


Warning: it is the middle of the night and I'm going to ramble.

| From: Dhaval Giani via talk <talk at gtalug.org>

| People love talking smack about btrfs. Here's some real insight from Josef
| Bacik though at
| https://lwn.net/ml/fedora-devel/03fbbb9a-7e74-fc49-c663-32722d6f75a2@toxicpanda.com/
| and
| https://lwn.net/ml/fedora-devel/cf5b4944-74c4-b4a4-0d65-71d9821a4a71@toxicpanda.com/.
| Read through the chain. For those who don't know him, Josef is a core btrfs
| developer. I also quite like Josef personally, he is a good developer,
| great to work with and is personally invested in helping folks out.
| 
| This is a solid fs,

Interesting.  I've only read a bit so far.

Generally, I don't wish to be a guinea pig when it comes to file
systems.  So I switch very rarely.  The EXT series has been good to
me.

I never switched to Reiser or XFS, mostly due to my conservatism.  If
I remember correctly, inode numbers in Reiser and perhaps XFS didn't
have the properties I expected (i.e. within a file system, there was a
stable isomorphism between the numbers and the files).
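
A toy illustration of the property I mean, in Python (the file names
here are just placeholders):

    import os, tempfile

    # Within one file system, (st_dev, st_ino) should identify a file:
    # every path naming the same file reports the same pair, and the
    # pair stays stable for the life of the file.
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        open(a, "w").close()
        os.link(a, b)                   # a second name for the same file
        sa, sb = os.stat(a), os.stat(b)
        assert (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)
        print("inode", sa.st_ino, "names this file under both paths")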

I was going to consider BTRFS, but then RHEL dropped it.  I took that
as a Bad Sign.  Do you know why they did this?

BTRFS progress has seemed painfully slow.  To me, that's a sign that
it might be too complex.  Complexity is my enemy.

- too many places for bugs

- too much code that isn't tested much

- harder to adjust to changing environments

Why does compression help with write amplification on SSDs?  My
understanding is that SSD firmware compresses already.  Possible
answer: compression of encrypted data is ineffective, so if the drive
only ever sees encrypted blocks (say, from LUKS), any useful
compression has to happen above the encryption layer.  BTRFS itself
doesn't encrypt, but LUKS can sit underneath it, it seems.
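
A quick way to convince oneself of the encryption half of that, using
random bytes as a stand-in for ciphertext:

    import os, zlib

    text = b"the quick brown fox jumps over the lazy dog\n" * 1000
    ciphertext_like = os.urandom(len(text))   # stand-in for encrypted data

    print("plain text:  ", len(text), "->", len(zlib.compress(text)))
    print("'ciphertext':", len(ciphertext_like), "->",
          len(zlib.compress(ciphertext_like)))
    # The repetitive text shrinks dramatically; the random buffer does
    # not (it typically grows slightly), so compression below the
    # encryption layer is wasted effort.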

The following are some random thoughts about filesystems.  I'm
interested in any reactions to these.

The UNIX model of a file as a randomly-accessed array of fixed-size
blocks doesn't fit very well with compression, even though a large
portion of files are in practice accessed purely as a byte stream.
That's perhaps a flaw in UNIX, but it is tough to change.
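
Here is roughly where the friction comes from, as a toy sketch: assume
(purely for illustration) that a file is stored as independently
compressed 128 KiB extents, which is in the spirit of what BTRFS does.

    import zlib

    EXTENT = 128 * 1024   # logical bytes per independently compressed extent

    def write_file(data):
        # Store each logical extent as a separately compressed blob.
        return [zlib.compress(data[i:i + EXTENT])
                for i in range(0, len(data), EXTENT)]

    def read_range(extents, offset, length):
        # Even a tiny random read has to find the covering extent and
        # decompress it whole -- there is no seeking to byte N inside a
        # compressed blob.
        out = b""
        while length > 0:
            idx, skip = divmod(offset, EXTENT)
            chunk = zlib.decompress(extents[idx])[skip:skip + length]
            out += chunk
            offset += len(chunk)
            length -= len(chunk)
        return out

    data = bytes(range(256)) * 2048          # 512 KiB of sample data
    extents = write_file(data)
    assert read_range(extents, 130000, 10) == data[130000:130010]

A one-byte overwrite is worse still: the whole extent must be
decompressed, patched, recompressed, and rewritten somewhere, which is
presumably part of why compression shows up in copy-on-write designs.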

In modern systems, with all kinds of crazy containerization, I guess
de-duplication might be very useful.  As well as COW, I think.  Is
this something for the File System, or a layer below, like LVM?
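
Wherever it lives, the core of block-level de-duplication is small:
hash each fixed-size block and store each distinct block only once.  A
toy sketch (4 KiB blocks and SHA-256 are just assumptions here):

    import hashlib

    BLOCK = 4096
    store = {}              # content hash -> block data, stored once

    def dedup_write(data):
        # Return the list of block references making up this file/image.
        refs = []
        for i in range(0, len(data), BLOCK):
            block = data[i:i + BLOCK]
            h = hashlib.sha256(block).hexdigest()
            store.setdefault(h, block)   # new blocks stored, repeats shared
            refs.append(h)
        return refs

    # Two container images that differ only slightly share almost all
    # of their storage:
    base = b"A" * (BLOCK * 100)
    variant = base + b"one small patch"
    r1, r2 = dedup_write(base), dedup_write(variant)
    print("blocks referenced:", len(r1) + len(r2),
          "distinct blocks stored:", len(store))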

There's something appealing about modularizing the FS code by
composable layers.  But not if the overhead is observable.  Or the
composability leaves rough edges.

Here's a natural order for layers:
	FS (UNIX semantics + ACLs etc., more than just POSIX)
	de-duplication
	compression
	encryption
	aggregation for efficient use of device?

I don't know where to fit in checksums.  Perhaps it's a natural part
of encryption (encryption without integrity checking has interesting
weaknesses).
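
To make the composability idea concrete to myself, here is a toy write
path built from (encode, decode) pairs; the checksum layer just
prepends a SHA-256, and an encryption layer (AEAD, which would subsume
the checksum) could slot in below it.  None of this is how any real FS
lays things out.

    import hashlib, zlib

    def checksum_encode(data):
        return hashlib.sha256(data).digest() + data

    def checksum_decode(blob):
        digest, data = blob[:32], blob[32:]
        if hashlib.sha256(data).digest() != digest:
            raise IOError("checksum mismatch -- corruption detected")
        return data

    # Applied top-down on write, bottom-up on read.
    layers = [
        ("compress", zlib.compress, zlib.decompress),
        ("checksum", checksum_encode, checksum_decode),
    ]

    def write_path(data):
        for _, encode, _ in layers:
            data = encode(data)
        return data                     # what actually hits the device

    def read_path(blob):
        for _, _, decode in reversed(layers):
            blob = decode(blob)
        return blob

    payload = b"some file contents\n" * 100
    assert read_path(write_path(payload)) == payload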

I don't know how to deal with the variable-sized blocks that come out
of compression.  Hardware has co-evolved with file-systems to expect
blocks of 512 or 4096 bytes.  (I remember IBM/360 disk drives which
supported a range of block sizes as if each track was a short piece of
magnetic tape.)
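
The usual dodge, as far as I can tell, is to keep logical blocks fixed
and round the compressed result up to whole sectors, keeping the
compressed form only when it saves at least one full sector:

    import zlib

    SECTOR = 4096

    def sectors(nbytes):                 # bytes -> whole sectors on disk
        return -(-nbytes // SECTOR)      # ceiling division

    def store(extent):
        compressed = zlib.compress(extent)
        if sectors(compressed) < sectors(extent):
            return ("compressed", sectors(compressed))
        return ("raw", sectors(extent))  # no whole sector saved: keep raw

    print(store(b"very repetitive log line\n" * 4000))  # ('compressed', 1)
    print(store(bytes(range(256)) * 16))   # shrinks, but not by a sector: raw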

I don't know how to make file systems better respect the underlying
nature of SSDs and shingled HDDs.

I am also still waiting for translucent mounts, as in Plan 9.

I think that many or most drives do whole-volume encryption invisible
to the OS.  This really isn't useful to the OS since the whole volume
has a single key.

The most secure encryption is end-to-end.  It tends to be less
convenient.  Maybe my placement of encryption near the bottom of the
stack isn't good enough.

I have more questions than answers (or even opinions) and my systems
are modest, so I stay with EXT4 directly on old-fashioned (GUID)
partitions.

| there are times when I see the reactions from users and
| I really wonder why I do open source work anyway.

I'd like to understand this comment better.

I work on open source and don't remember feeling anything like that
from comments in this thread.  Other kinds of comments, perhaps.

| I have lost data on all
| sorts of filesystems, and I have also been at the end of stupid mistakes
| made. Almost everything is recoverable if you stop, and ask for help and
| keep your ego out of it.

More concretely: stop, make an image copy of the disk, and experiment
on that.  One's first actions in a disaster may well compound it.
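
A minimal sketch of that first step, reading the raw device and
skipping unreadable spots (the paths are placeholders, and in real
life a tool like GNU ddrescue does this far better):

    import os

    SRC = "/dev/sdX"        # placeholder: the ailing disk, opened read-only
    DST = "disk.img"        # placeholder: the image to experiment on
    CHUNK = 1024 * 1024

    with open(SRC, "rb", buffering=0) as src, open(DST, "wb") as dst:
        while True:
            try:
                buf = src.read(CHUNK)
            except OSError:
                # Unreadable region: keep offsets lined up and move on.
                src.seek(CHUNK, os.SEEK_CUR)
                dst.write(b"\0" * CHUNK)
                continue
            if not buf:
                break
            dst.write(buf)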

But I find filesystems and editors that lose your data unforgivable.
Sometimes it's just a filesystem that is fragile in the face of
hardware errors.

