[GTALUG] SSD wear leveling [was Re: Build critique request and the story behind it.]

Jamon Camisso jamon.camisso at utoronto.ca
Mon Nov 20 23:03:34 EST 2017


On 2017-11-20 03:44 PM, D. Hugh Redelmeier via talk wrote:
> True, but it is more complicated.
> 
> Underneath the facade of a normal HDD, an SSD does a bunch of tricky
> things.
> 
> Terminology (mine):
> 
> virtual block: what the disk host adapter and OS sees.  Just like a
> block on an HDD.
> 
> real block: a chunk of flash that can hold one virtual block
> 
> erase block: the smallest unit of flash that can be erased.
> 
> - an erase block contains a lot of real blocks.  Think roughly a
>   megabyte.  The collection of real blocks within it is fixed.
> 
> - only erased real blocks can be written to.  And only once before
>   they are erased again.
> 
> - real blocks can be in one of three states:
> 
>   + free (not representing any virtual block but not erased)
> 
>   + erased (not representing any virtual block, erased)
>     Note: an erased real block is not an erase block (it will
>     be inside an erase block).
> 
>   + in-use (representing a virtual block)
> 
> - in the real hardware, you can never update a block in place.  So
>   when a program writes to a virtual block, a real, erased block is
>   written and some book-keeping is done.
> 
>   If the write was to a virtual block that was represented by a real
>   block, that real block becomes free: there is no way for the
>   computer to reference it, so it need not be preserved.
> 
> - the SSD firmware keeps track of erased blocks.  When it
>   runs out, it does a garbage collection phase to find unused blocks.
>   If it finds that a whole erase block is full of free blocks, it will
>   erase that erase block and add its real blocks to the erased pool.
> 
>   But that isn't normal.  Normally, an erase block is like Swiss
>   cheese, and the in-use blocks have to be copied to erased blocks
>   elsewhere so that their former erase block can be erased.
> 
>   As you can see, a write to a block might precipitate as much as 1MiB
>   of actual writes.  That's called "write amplification" and it
>   can wear out SSDs quite seriously.  And it will slow things down a
>   lot.
> 
> - how does the drive firmware learn that a physical block is free?
> 
>   + a block on an SSD is born free
> 
>   + a write to a virtual block will cause a write to a newly allocated
>     physical block AND implicitly make the old physical block free (but
>     not erased!)
> 
>   + deleting a file on an SSD causes its virtual blocks to be free,
>     but the SSD firmware does not know that until a trim command
>     tells it.
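
To make the bookkeeping above concrete, here is a minimal Python sketch of
the three block states and of how a rewrite or a trim frees a real block.
ToyFTL and everything in it are made-up names for illustration, not any real
firmware interface. It assumes every real block starts out erased, and it
leaves out erase blocks and garbage collection entirely.

```python
from enum import Enum


class State(Enum):
    ERASED = "erased"   # writable, holds no virtual block
    IN_USE = "in-use"   # currently represents a virtual block
    FREE = "free"       # stale: holds no virtual block, but not yet erased


class ToyFTL:
    """Toy mapping layer (hypothetical); no erase blocks, no garbage collection."""

    def __init__(self, n_real):
        self.state = [State.ERASED] * n_real   # assumption: all blocks start erased
        self.mapping = {}                      # virtual block -> real block

    def write(self, vblock):
        # No update in place: always take a fresh erased real block.
        new = self.state.index(State.ERASED)   # raises ValueError if none are left
        self.state[new] = State.IN_USE
        old = self.mapping.get(vblock)
        if old is not None:
            self.state[old] = State.FREE       # the old copy is now unreachable
        self.mapping[vblock] = new

    def trim(self, vblock):
        # The OS tells the drive that this virtual block no longer holds data.
        old = self.mapping.pop(vblock, None)
        if old is not None:
            self.state[old] = State.FREE


ftl = ToyFTL(n_real=4)
ftl.write(0)   # real block 0 becomes in-use
ftl.write(0)   # rewrite: real block 1 in-use, real block 0 free
ftl.write(1)   # real block 2 becomes in-use
ftl.trim(1)    # real block 2 becomes free
print([s.value for s in ftl.state])   # ['free', 'in-use', 'free', 'erased']
```

Without the trim call, real block 2 would stay in-use for as long as the
drive is concerned, even if the filesystem had deleted the file it held.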
> 
> Consequences:
> 
> - having a lot of free physical blocks cuts down on write amplification
> 
> - the effect is non-linear
> 
> - to increase the number of free blocks
> 
>   + use trim
>     * fstrim(8)
>     * discard option to mount(8)
> 
>   + allocate less of the disk drive for OS use.
>     But, if it isn't a new disk, you have to tell the SSD firmware
>     that the free space is free.  I don't know how to do that.
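
A rough way to see why the effect is non-linear, under a simplifying
assumption that is mine and not from the post above: every erase block
picked for garbage collection has the same fraction u of in-use real
blocks. Reclaiming an erase block of N real blocks then costs u*N copy
writes and leaves only (1-u)*N blocks for new data, so each host write
costs about 1/(1-u) flash writes. The function name below is made up for
this sketch.

```python
def write_amplification(valid_fraction):
    """Approximate flash writes per host write.

    Assumes every garbage-collected erase block has the same fraction
    `valid_fraction` of in-use real blocks that must be copied out first.
    """
    if not 0 <= valid_fraction < 1:
        raise ValueError("valid_fraction must be in [0, 1)")
    return 1 / (1 - valid_fraction)


for u in (0.0, 0.5, 0.8, 0.9, 0.95):
    print(f"{u:.2f} in use -> about {write_amplification(u):.1f}x writes")
```

Going from 50% to 90% in-use takes the estimate from 2x to 10x, which is
why leaving spare area (and trimming it) pays off more the fuller the drive
gets.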

One of the better write-ups that I've seen about SSDs in general and
over-provisioning specifically:

https://www.seagate.com/ca/en/tech-insights/ssd-over-provisioning-benefits-master-ti/

Cheers, Jamon

