[GTALUG] SSD wear leveling [was Re: Build critique request and the story behind it.]

D. Hugh Redelmeier hugh at mimosa.com
Mon Nov 20 15:44:37 EST 2017


| From: Lennart Sorensen via talk <talk at gtalug.org>

| Wear leveling ought to mean it doesn't matter how you partition though.
| If you want to reduce wear, don't write to it.

True, but it is more complicated.

Underneath the facade of a normal HDD, an SSD does a bunch of tricky
things.

Terminology (mine):

virtual block: what the disk host adapter and OS see.  Just like a
block on an HDD.

real block: a chunk of flash that can hold one virtual block

erase block: the smallest unit of flash that can be erased.

- an erase block contains a lot of real blocks.  Think roughly a
  megabyte.  The collection of real blocks within it is fixed.

- only erased real blocks can be written to.  And only once before
  they are erased again.

- real blocks can be in one of three states:

  + free (not representing any virtual block but not erased)

  + erased (not representing any virtual block, erased)
    Note: an erased real block is not an erase block (it will
    be inside an erase block).

  + in-use (representing a virtual block)

- in the real hardware, you can never update a block in place.  So
  when a program writes to a virtual block, a real, erased block is
  written and some book-keeping is done.

  If the write was to a virtual block that was represented by a real
  block, that real block becomes free: there is no way for the
  computer to reference it, so it need not be preserved.

- the SSD firmware keeps track of erased blocks.  When it
  runs out, it does a garbage-collection pass to find free blocks.  If
  it finds an erase block made up entirely of free blocks, it will
  erase that erase block and add its real blocks to the erased pool.

  But that isn't normal.  Normally, an erase block is like swiss
  cheese: the in-use blocks have to be copied to erased blocks
  elsewhere before their former erase block can be erased.

  As you can see, a write to a single block might precipitate as much
  as 1MiB of actual writes.  That's called "write amplification" and
  it can wear out SSDs quite seriously.  And it will slow things down
  a lot.

- how does the drive firmware learn that a real block is free?

  + a block on an SSD is born free

  + a write to a virtual block will cause a write to a newly allocated
    real block AND implicitly make the old real block free (but
    not erased!)

  + deleting a file on an SSD causes its virtual blocks to be free,
    but the SSD firmware does not know that until a trim command
    tells it.
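
Here is a toy model of the bookkeeping described above, in Python.  It
is a sketch under my own assumptions, not how any real firmware works:
the class name ToySSD and its fields are made up, every real block
starts out already erased, garbage collection greedily picks the erase
block with the most free blocks, and the live data is held in RAM
while the erase block is erased and then written back.

```python
import random


class ToySSD:
    """Toy model: real blocks are 'erased', 'free' or 'in-use'."""

    def __init__(self, erase_blocks=8, blocks_per_erase=4):
        self.per = blocks_per_erase
        n = erase_blocks * blocks_per_erase
        self.state = ["erased"] * n   # assumption: drive starts fully erased
        self.owner = [None] * n       # virtual block held by each real block
        self.map = {}                 # virtual block -> real block
        self.host_writes = 0          # writes requested by the OS
        self.flash_writes = 0         # writes actually done to flash
        self.erases = 0

    def write(self, v):
        """Host writes virtual block v; never updates flash in place."""
        self.host_writes += 1
        old = self.map.get(v)
        if old is not None:           # old copy becomes free, not erased
            self.state[old] = "free"
            self.owner[old] = None
        if "erased" not in self.state:
            self._garbage_collect()
        self._program(v)

    def trim(self, v):
        """Host says virtual block v no longer holds anything useful."""
        r = self.map.pop(v, None)
        if r is not None:
            self.state[r] = "free"
            self.owner[r] = None

    def _program(self, v):
        r = self.state.index("erased")    # only erased blocks can be written
        self.state[r] = "in-use"
        self.owner[r] = v
        self.map[v] = r
        self.flash_writes += 1

    def _garbage_collect(self):
        def free_count(e):
            return self.state[e * self.per:(e + 1) * self.per].count("free")

        n_erase = len(self.state) // self.per
        victim = max(range(n_erase), key=free_count)
        if free_count(victim) == 0:
            raise RuntimeError("no free real blocks to reclaim")
        span = range(victim * self.per, (victim + 1) * self.per)
        live = [self.owner[r] for r in span if self.state[r] == "in-use"]
        for r in span:                    # erase the whole erase block
            self.state[r] = "erased"
            self.owner[r] = None
        self.erases += 1
        for v in live:                    # the swiss-cheese copy: extra writes
            self._program(v)


ssd = ToySSD(erase_blocks=8, blocks_per_erase=4)    # 32 real blocks
rng = random.Random(0)
for v in range(24):                   # the host only ever uses 24 virtual blocks
    ssd.write(v)
for _ in range(2000):                 # random overwrites
    ssd.write(rng.randrange(24))
print("host writes: ", ssd.host_writes)
print("flash writes:", ssd.flash_writes)
print("write amplification:", ssd.flash_writes / ssd.host_writes)
```

The ratio printed at the end is the write amplification for this
particular workload.  Calling ssd.trim(v) is the only way this model
learns that a virtual block has been dropped without being
overwritten.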

Consequences:

- having a lot of free real blocks cuts down on write amplification

- the effect is non-linear

- to increase the number of free blocks

  + use trim
    * fstrim(8)
    * discard option to mount(8)

  + allocate less of the disk drive for OS use.
    But, if it isn't a new disk, you have to tell the SSD firmware
    that the free space is free.  I don't know how to do that.
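
To get a feel for the first two consequences, here is a rough
simulation, again in Python and again only a sketch.  The function
name write_amplification and all its parameters are hypothetical; it
assumes uniformly random overwrites, a greedy garbage collector, and a
drive of 32 erase blocks of 8 real blocks each.  It measures flash
writes per host write once the drive has been filled, for several
amounts of the drive handed to the OS.

```python
import random


def write_amplification(virtual_blocks, erase_blocks=32, per=8,
                        writes=5000, seed=0):
    """Flash writes per host write under random overwrites (toy model)."""
    n = erase_blocks * per
    if virtual_blocks >= n:
        raise ValueError("need at least one spare real block")
    rng = random.Random(seed)
    erased = [True] * n       # True: real block is erased
    owner = [None] * n        # virtual block held, None if not in use
    where = {}                # virtual block -> real block
    flash = 0

    def program(v):
        nonlocal flash
        r = erased.index(True)
        erased[r] = False
        owner[r] = v
        where[v] = r
        flash += 1

    def collect():
        # a free real block is neither erased nor in use
        def free_count(e):
            return sum(1 for r in range(e * per, (e + 1) * per)
                       if not erased[r] and owner[r] is None)

        victim = max(range(erase_blocks), key=free_count)
        span = range(victim * per, (victim + 1) * per)
        live = [owner[r] for r in span if owner[r] is not None]
        for r in span:
            erased[r] = True
            owner[r] = None
        for v in live:        # relocation: writes the host never asked for
            program(v)

    def write(v):
        old = where.get(v)
        if old is not None:
            owner[old] = None     # old copy becomes free
        if True not in erased:
            collect()
        program(v)

    for v in range(virtual_blocks):   # fill the part the OS uses
        write(v)
    flash = 0                         # count only the overwrite phase
    for _ in range(writes):
        write(rng.randrange(virtual_blocks))
    return flash / writes


for used in (128, 192, 224, 240):     # out of 256 real blocks
    print(used, "of 256 in use ->", write_amplification(used))
```

The fewer real blocks are left spare, the less garbage each collection
finds and the more in-use blocks it has to copy, so the ratio should
climb faster as the drive approaches full.  Real drives and real
workloads will differ from this model.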

