[GTALUG] SSD wear leveling [was Re: Build critique request and the story behind it.]
D. Hugh Redelmeier
hugh at mimosa.com
Mon Nov 20 15:44:37 EST 2017
| From: Lennart Sorensen via talk <talk at gtalug.org>
| Wear leveling ought to mean it doesn't matter how you partition though.
| If you want to reduce wear, don't write to it.
True, but it is more complicated.
Underneath the facade of a normal HDD, an SSD does a bunch of tricky
things.
Terminology (mine):
virtual block: what the disk host adapter and OS sees. Just like a
block on an HDD.
real block: a chunk of flash that can hold one virtual block
erase block: the smallest unit of flash that can be erased.
- an erase block contains a lot of real blocks. Think roughly a
megabyte. The collection of real blocks within it is fixed.
- only erased real blocks can be written to, and only once before
they must be erased again.
- real blocks can be in one of three states:
+ free (not representing any virtual block but not erased)
+ erased (not representing any virtual block, erased)
Note: an erased real block is not an erase block (it will
be inside an erase block).
+ in-use (representing a virtual block)
- in the real hardware, you can never update a block in place. So
when a program writes to a virtual block, an erased real block is
written and some book-keeping is done: the virtual block is now
mapped to that real block.
If the write was to a virtual block that was represented by a real
block, that real block becomes free: there is no way for the
computer to reference it, so it need not be preserved.
- the SSD firmware keeps track of erased blocks. When it
runs out, it does a garbage-collection phase to find free blocks. If
it finds that a whole erase block is full of free blocks, it will
erase that erase block and add its real blocks to the erased pool.
But that isn't normal. Normally, an erase block is like Swiss
cheese: the good stuff (its in-use blocks) has to be copied to erased
blocks elsewhere before their former erase block can be erased.
As you can see, a write to a single block might precipitate as much
as 1MiB of actual writes. That's called "write amplification", and it
can wear out SSDs quite seriously. It will also slow things down a
lot.
- how does the drive firmware learn that a real block is free?
+ a block on an SSD is born free
+ a write to a virtual block will cause a write to a newly allocated
real block AND implicitly make the old real block free (but
not erased!)
+ deleting a file on an SSD frees its virtual blocks as far as the
file system is concerned, but the SSD firmware still treats the
corresponding real blocks as in use until a TRIM command tells it
otherwise.
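To make the book-keeping above concrete, here is a toy model in Python. It is a sketch under stated assumptions, not how any real drive works: the class and method names (`ToySSD`, `write`, `trim`, `_garbage_collect`) are hypothetical, the garbage collector greedily picks the erase block with the most free real blocks, and, to keep it short, the in-use blocks of that victim are held in RAM and written back into it after the erase instead of being moved to a separate erased block. It counts host writes and flash writes so write amplification can be read off directly.

```python
from enum import Enum


class State(Enum):
    FREE = "free"      # stale copy: maps to no virtual block, but not erased yet
    ERASED = "erased"  # can be written exactly once
    IN_USE = "in-use"  # currently represents a virtual block


class ToySSD:
    """Hypothetical, simplified flash translation layer (not real firmware)."""

    def __init__(self, erase_blocks, reals_per_erase_block):
        self.erase_blocks = erase_blocks
        self.per = reals_per_erase_block
        n = erase_blocks * reals_per_erase_block
        self.state = [State.ERASED] * n  # assume a new drive ships fully erased
        self.owner = [None] * n          # real block -> virtual block it holds
        self.map = {}                    # virtual block -> real block
        self.host_writes = 0             # writes the OS asked for
        self.flash_writes = 0            # writes actually done to flash

    def _find_erased(self):
        for r, s in enumerate(self.state):
            if s is State.ERASED:
                return r
        return None

    def _release(self, v):
        """Forget v's current real block; that real block becomes FREE."""
        old = self.map.pop(v, None)
        if old is not None:
            self.state[old] = State.FREE
            self.owner[old] = None

    def _place(self, v):
        """Program v into an erased real block (one flash write)."""
        r = self._find_erased()
        self.state[r] = State.IN_USE
        self.owner[r] = v
        self.map[v] = r
        self.flash_writes += 1

    def _garbage_collect(self):
        # Greedy victim choice (an assumption): the erase block with most FREE blocks.
        victim, most_free = None, 0
        for eb in range(self.erase_blocks):
            span = range(eb * self.per, (eb + 1) * self.per)
            free = sum(self.state[r] is State.FREE for r in span)
            if free > most_free:
                victim, most_free = eb, free
        if victim is None:
            raise RuntimeError("every real block is in use")
        span = range(victim * self.per, (victim + 1) * self.per)
        survivors = [self.owner[r] for r in span if self.state[r] is State.IN_USE]
        for r in span:  # the whole erase block is erased at once
            self.state[r] = State.ERASED
            self.owner[r] = None
        # Toy shortcut: survivors were held in RAM and are written back here;
        # each one is an extra flash write the OS never asked for.
        for v in survivors:
            self._place(v)

    def write(self, v):
        self.host_writes += 1
        self._release(v)  # the old copy (if any) can no longer be referenced
        if self._find_erased() is None:
            self._garbage_collect()
        self._place(v)

    def trim(self, v):
        """The OS says virtual block v no longer holds anything useful."""
        self._release(v)

    def write_amplification(self):
        return self.flash_writes / self.host_writes
```

With two erase blocks of four real blocks each, writing virtual blocks 0 to 5 and then rewriting 0, 1 and 2 forces one garbage collection that has to relocate virtual block 3, so 9 host writes cost 10 flash writes. Trimming block 3 first (as if its file had been deleted) lets the collector erase that erase block with nothing to copy, and the flash count stays at 9.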
Consequences:
- having a lot of free (not in-use) real blocks cuts down on write amplification
- the effect is non-linear
- to increase the number of free blocks
+ use TRIM
* fstrim(8)
* the discard option to mount(8)
+ allocate less of the disk drive for OS use (over-provisioning).
But if it isn't a new disk, you have to tell the SSD firmware
that the unallocated space is free. I don't know how to do that.
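The claim that the effect of free blocks is non-linear can be checked with back-of-the-envelope arithmetic. Assume (a simplification) that every erase block the firmware reclaims holds N real blocks, of which a fraction u are still in use. Reclaiming it costs u·N copies and yields (1 − u)·N erased blocks for new data, so each host write costs N / ((1 − u)·N) = 1/(1 − u) flash writes. The function name below is hypothetical, and the u a real drive achieves for a given amount of free space depends on its firmware and the workload.

```python
def gc_write_amplification(victim_in_use_fraction: float) -> float:
    """Flash writes per host write when every reclaimed erase block still has
    this fraction of its real blocks in use (steady state, simplified model)."""
    u = victim_in_use_fraction
    if not 0.0 <= u < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    # Reclaiming N real blocks copies u*N survivors and yields (1 - u)*N erased
    # blocks for new data: total flash writes / host writes = N / ((1 - u) * N).
    return 1.0 / (1.0 - u)


for u in (0.5, 0.75, 0.9, 0.99):
    print(f"victim {u:.0%} in use -> write amplification {gc_write_amplification(u):.0f}")
```

Going from victims that are half in use to ones that are 90% in use raises the cost from 2 to 10 flash writes per host write, and at 99% it is 100. More free real blocks (from TRIM or from leaving space unallocated) let the firmware find emptier victims, which is where the payoff comes from.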