[GTALUG] btrfs weirdity.

Alvin Starr alvin at netvel.net
Tue Jun 30 09:41:20 EDT 2020


On 6/30/20 4:53 AM, D. Hugh Redelmeier via talk wrote:
> Warning: it is the middle of the night and I'm going to ramble.
>
[snip]
>
> The following are some random thoughts about filesystems.  I'm
> interested in any reactions to these.
>
> The UNIX model of a file being a randomly-accessed array of fixed-size
> blocks doesn't fit very well with compression.  Even if a large
> portion of files are accessed purely as a byte stream.  That's perhaps
> a flaw in UNIX but it is tough to change.
Fixed-size disk blocks are not just a UNIX thing.
Admittedly I have not seen all the hardware out there, but outside of some 
very old or very new stuff I do not believe there has been a disk drive 
that was not formatted with fixed block sizes.

Think of it from a hardware perspective: if you have random-sized blocks 
you need to manage fragmentation and then likely some form of free-space 
cleanup.
That level of compute power was not available in a disk controller until 
fairly recently, by which time the standard design was so entrenched that 
alternative layouts could not gain enough traction to be worth designing 
and trying to sell.

For example, there are disk drives with Ethernet interfaces and a native 
object store, but the Seagate Kinetic drives never seemed to get beyond 
the sampling phase.

> In modern systems, with all kinds of crazy containerization, I guess
> de-duplication might be very useful.  As well as COW, I think.  Is
> this something for the File System, or a layer below, like LVM?
I have a problem with de-duplication: I am not sure how well it actually 
works in practice.
At the file system level it is just linking the two identical files 
together until one is changed, so you need COW.
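
Something like this toy Python sketch is about all the file-level approach 
amounts to (my own illustration; real dedup tools on btrfs use reflinks 
rather than hard links precisely because hard links give you no COW):

import hashlib, os, sys

def file_digest(path, chunk=1 << 20):
    # Hash the whole file; equal digests mark dedup candidates.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            h.update(buf)
    return h.hexdigest()

def dedup_tree(root):
    seen = {}                           # digest -> first path with that content
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            first = seen.setdefault(file_digest(path), path)
            if first != path:
                # Replace the duplicate with a hard link to the first copy.
                # A hard link is NOT copy-on-write: writing through either
                # name changes both, which is why a real deduplicator needs
                # reflinks/COW underneath.
                os.unlink(path)
                os.link(first, path)

if __name__ == "__main__":
    dedup_tree(sys.argv[1])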

At the block level you have to look at the overhead of the hash function 
and the storage of the hash data.
The size of the hash key relates to the likelihood of a false duplicate 
match, as does the size of the block.
Candidate duplicate blocks still need to be compared byte for byte, 
causing extra reads.
Let's say you use SHA-2 for your hash: you have a key of 32 bytes, and if 
you use 512 bytes for your block size then your hash table is about a 6% 
overhead.
If you go for larger blocks then you will get fewer hits, because 
filesystems want to allocate smaller blocks for small-file efficiency.
If you use LVM extents then the hit rate drops even more.
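
A quick back-of-the-envelope in Python (my own numbers and toy code, not 
any real deduplicator's layout) shows the index overhead shrinking with 
block size, and the verification read that a digest match still requires:

import hashlib

DIGEST_BYTES = 32                       # SHA-256 digest size

# Index overhead, counting only the digest; a real table also needs the
# block address, a refcount and some slack, so treat these as lower bounds.
for bs in (512, 4096, 65536, 4 * 1024 * 1024):   # the last is an LVM-extent-ish size
    print("block size %8d bytes -> index overhead %8.4f%%"
          % (bs, 100.0 * DIGEST_BYTES / bs))

def maybe_dedup(index, block, read_block):
    # Toy write path: a digest hit is only a *candidate*; the stored block
    # still has to be read back and compared byte for byte -- the extra
    # read mentioned above.
    digest = hashlib.sha256(block).digest()
    addr = index.get(digest)
    if addr is not None and read_block(addr) == block:
        return addr                     # reference the existing copy
    return None                         # caller writes the block and indexes it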

It may work well where you have a large number of VMs whose disk images 
tend to start out all the same, and where the OS will tend to stay static, 
leaving large parts of the disk untouched for a long time.
It may also be possible to develop file systems that are amenable to 
de-duplication.


>
> There's something appealing about modularizing the FS code by
> composable layers.  But not if the overhead is observable.  Or the
> composability leaves rough edges.
>
> Here's a natural order for layers:
> 	FS (UNIX semantics + ACLS etc, more than just POSIX)
> 	de-duplication
> 	compression
> 	encryption
> 	aggregation for efficient use of device?
This appears to be what Red Hat is pushing with their VDO (Virtual Data 
Optimizer).
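
Just to make the layering idea concrete, here is a toy Python stack in 
Hugh's order (purely illustrative, nothing to do with how VDO is actually 
built, and the "encryption" here is a placeholder XOR):

import hashlib, zlib

class Layer:
    # Common interface: read/write whole logical blocks addressed by a key.
    def __init__(self, lower):
        self.lower = lower
    def write(self, key, data):
        self.lower.write(key, data)
    def read(self, key):
        return self.lower.read(key)

class Store(Layer):
    # Bottom of the stack: pretend block device.
    def __init__(self):
        self.blocks = {}
    def write(self, key, data):
        self.blocks[key] = data
    def read(self, key):
        return self.blocks[key]

class Compress(Layer):
    def write(self, key, data):
        self.lower.write(key, zlib.compress(data))
    def read(self, key):
        return zlib.decompress(self.lower.read(key))

class Encrypt(Layer):
    # Placeholder XOR "cipher"; stands in for a real encryption layer.
    def __init__(self, lower, k):
        super().__init__(lower)
        self.k = k
    def _x(self, data):
        return bytes(b ^ self.k for b in data)
    def write(self, key, data):
        self.lower.write(key, self._x(data))
    def read(self, key):
        return self._x(self.lower.read(key))

class Dedup(Layer):
    # Content-addressed: identical payloads end up as one block below.
    def __init__(self, lower):
        super().__init__(lower)
        self.map = {}
    def write(self, key, data):
        digest = hashlib.sha256(data).digest()
        self.map[key] = digest
        self.lower.write(digest, data)
    def read(self, key):
        return self.lower.read(self.map[key])

# The FS sits on top; below it: de-duplication, compression, encryption, device.
stack = Dedup(Compress(Encrypt(Store(), 0x5A)))
stack.write("a", b"hello world" * 100)
stack.write("b", b"hello world" * 100)   # identical content, stored once below
assert stack.read("a") == stack.read("b")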
>
> I don't know where to fit in checksums.  Perhaps it's a natural part
> of encryption (encryption without integrity checking has interesting
> weaknesses).
Nothing beats dd if=/dev/zero of=your_secret_file for security ;)

>
> I don't know how to deal with the variable-sized blocks that come out
> of compression.  Hardware has co-evolved with file-systems to expect
> blocks of 512 or 4096 bytes.  (I remember IBM/360 disk drives which
> supported a range of block sizes as if each track was a short piece of
> magnetic tape.)
Move from disks to object stores (key/value).
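
For what a key/value model buys you with compression's variable-sized 
output, here is a toy append-only Python sketch (my own illustration, not 
how Kinetic or any real object store works): the record can be whatever 
size the compressor produced, because the index maps a key to an 
(offset, length) pair rather than to a fixed LBA.

import zlib

class LogObjectStore:
    def __init__(self):
        self.log = bytearray()          # append-only backing store
        self.index = {}                 # key -> (offset, length)

    def put(self, key, block):
        payload = zlib.compress(block)  # variable-sized output is fine here
        self.index[key] = (len(self.log), len(payload))
        self.log += payload

    def get(self, key):
        off, length = self.index[key]
        return zlib.decompress(self.log[off:off + length])

store = LogObjectStore()
store.put(7, b"\x00" * 4096)            # compresses to a few dozen bytes
store.put(8, bytes(range(256)) * 16)    # compresses much less
assert store.get(7) == b"\x00" * 4096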
>
> I don't know how to have file systems more respectfully reflect the
> underlying nature of SSDs and shingled HDDs
>
> I also am still waiting for translucent mounts like Plan 9.
How would translucent mounts compare to overlay mounts?
> I think that many or most drives do whole-volume encryption invisible
> to the OS.  This really isn't useful to the OS since the whole volume
> has a single key.
>
> The most secure encryption is end-to-end.  It tends to be less
> convenient.  Maybe my placement of encryption near the bottom of the
> stack isn't good enough.
I would argue that encryption should be as high in the stack as possible.
Encrypting the disk provides "at rest" security, so when the drives are 
sold off at the bankruptcy sale the buyer cannot use the data.
It does nothing to stop a hacker who has gained access to the running 
system from dumping the database of credit card info.
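
As a sketch of what "high in the stack" means in practice (this assumes 
the third-party cryptography package, and the card/key handling is made up 
purely for illustration): the application encrypts the sensitive field 
before it ever reaches the database, so a dump of the table is useless 
without the application's key.

from cryptography.fernet import Fernet   # pip install cryptography

# In real life the key lives in a KMS/HSM, not next to the data.
key = Fernet.generate_key()
f = Fernet(key)

# Encrypt in the application; the database only ever sees the token.
card_number = b"4111 1111 1111 1111"
row = {"customer": "alice", "card": f.encrypt(card_number)}

# Full-disk ("at rest") encryption would hand an attacker on the running
# system the token just as happily as plaintext; only the holder of the
# application key can reverse it.
assert f.decrypt(row["card"]) == card_number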

[snip]

-- 
Alvin Starr                   ||   land:  (647)478-6285
Netvel Inc.                   ||   Cell:  (416)806-0133
alvin at netvel.net              ||


