How to replace a hard drive...

D. Hugh Redelmeier hugh-pmF8o41NoarQT0dZR+AlfA at public.gmane.org
Sun May 8 21:59:14 UTC 2011


| From: Giles Orr <gilesorr-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org>
| 
| I have a couple of USB flash drives that give me I/O errors whenever I
| copy stuff to a specific spot on the drive.  ie. when I get to the 6Gb
| mark on one, whatever the file is I'm writing to the drive, it fails.

What is the error?  What does the kernel log say?  (Look at the output
of the dmesg command.)
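
If the dmesg output is long, something like the following could pull
out the relevant lines.  This is only a sketch: the helper names are
made up for illustration, and it assumes the kernel reports the failure
with the phrase "I/O error", which may not cover every driver.

```python
import re
import subprocess

# Assumption: kernel I/O failures are reported with the phrase "I/O error".
IO_ERROR = re.compile(r"I/O error", re.IGNORECASE)


def io_error_lines(log_text):
    """Return the lines of log_text that mention an I/O error."""
    return [line for line in log_text.splitlines() if IO_ERROR.search(line)]


def kernel_log():
    """Return the output of dmesg (may need root on some systems)."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Run right after the failed copy, print("\n".join(io_error_lines(kernel_log())))
should show whatever the kernel complained about.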

I'm not so sure there is such a thing as "a specific spot on the
drive".  Flash memory controllers do something called "wear-leveling".
Usually a write to a block (secretly, behind the OS's back) will free
that bit of silicon and allocate a new one.  The idea is that the
flash memory circuits have a modest life so the controller spreads the
activity around.
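
To make the idea concrete, here is a toy model of that remapping.  It
is purely illustrative: the class and its policy (pick the least-worn
free block) are my own assumptions, and real controllers are far more
complicated and not documented.

```python
class ToyFlash:
    """Toy model of wear-leveling; not how any real controller works."""

    def __init__(self, physical_blocks):
        # Needs more physical blocks than logical blocks in use,
        # otherwise write() runs out of free blocks.
        self.erase_count = [0] * physical_blocks
        self.free = set(range(physical_blocks))
        self.mapping = {}  # logical block number -> physical block number

    def write(self, logical):
        """Write a logical block; return the physical block it landed on."""
        # Pick the least-worn free physical block for the new data.
        new = min(self.free, key=lambda p: (self.erase_count[p], p))
        self.free.remove(new)
        old = self.mapping.get(logical)
        if old is not None:
            # The previously used block is erased and returned to the pool.
            self.erase_count[old] += 1
            self.free.add(old)
        self.mapping[logical] = new
        return new
```

Writing the same logical block twice lands on two different physical
blocks in this model, which is why "a specific spot" as seen by the OS
need not be a fixed piece of silicon.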

| If I run "fsck.vfat" against the drive, it always finds an error and
| fixes it ... and then I have exactly the same problem again.

fsck, in general (excluding the badblock check), fixes the filesystem
structure.  No data blocks need be examined, only blocks representing
the structural information (inodes, directories, indirect blocks,
etc., depending on the particular filesystem).  Similarly, mkfs need
not initialize data blocks, which make up the vast majority of blocks
on a disk.

fsck and mkfs are, in general terms, not responsible for bad block
handling -- modern disk drives hide that stuff from the filesystem.
In the old days (20 years ago?), disks were not smart enough to do
this so filesystems had various ways of handling bad blocks.  It would
not surprise me if that code has suffered bitrot since I imagine that
it is rarely exercised.

|  Now I
| read Lennart's view on fsck in the thread about bad hard drives:
| 
| On 5 May 2011 12:45, Lennart Sorensen <lsorense-1wCw9BSqJbv44Nm34jS7GywD8/FfD2ys at public.gmane.org> wrote:
| > fsck means nothing.  Use mkfs with badblock check.  Unless you low level
| > formatted it, nothing is done about bad sectors.  Of course modern drives
| > don't need that since they can automatically map bad sectors _on_write_
| > (not on read).  Writing to the whole disk should help the drive remap
| > all bad sectors.

Modern drives don't support low-level formatting as far as I know.
All those physical things, including geometry, are secrets from the
computer.  The geometry that is exposed is a fiction for old times'
sake (to keep dumb old software ignorant of its ignorance).

| The man page says "dosfsck - check and repair MS-DOS file systems".
| So it moves stuff around but doesn't tag bad sectors?  That doesn't
| seem like much of a fix.  Does this solution also apply to flash
| drives?  (That is, I should reformat them with a badblock check?)
| These drives don't seem to be remapping on write ...

fsck fixes filesystems that have inconsistent metadata (i.e. NOT
data).  It works by exploiting redundancy and using heuristics.

Your report does not tell us if your problem is in the metadata or the
data.  I suspect metadata because you say that the errors are at the
same specific spot.  In particular, I imagine that the metadata block
gives a read error and that the things you are doing don't result in
it being rewritten (because writing should fix it).  This analysis
strongly depends on your report that the errors are at the exact same
spot on the drive.
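
One way to test the read-error guess is to read the whole device and
note where reads fail.  A rough sketch, assuming a Unix-like system and
read permission on the device; the function name and chunk size are
arbitrary choices of mine.  Note that reads may be satisfied from the
page cache, so it is best run just after plugging the drive in.

```python
import os


def scan_for_read_errors(path, chunk_size=1024 * 1024):
    """Read path sequentially; return offsets of chunks that failed to read."""
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a file or block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            try:
                os.pread(fd, min(chunk_size, size - offset), offset)
            except OSError:
                # Record the start of the chunk that could not be read.
                bad_offsets.append(offset)
            offset += chunk_size
    finally:
        os.close(fd)
    return bad_offsets
```

If two runs report the same offsets, that supports the "same spot"
claim; an empty list suggests the trouble is not a plain read error.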

If you have two drives that get errors at the same spot, then
something more interesting is happening.  For example, you might be
trying to write a file larger than the filesystem will accept.
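
As an illustration of that kind of limit: if the drives are formatted
as FAT32 (an assumption on my part; the report only says fsck.vfat),
a single file cannot exceed 4 GiB minus one byte.  A minimal check,
with a made-up function name:

```python
# FAT32 stores a file's size in a 32-bit field, so the largest file
# is 2**32 - 1 bytes.  Assumes the filesystem really is FAT32.
FAT32_MAX_FILE_SIZE = 2**32 - 1


def exceeds_fat32_limit(size_bytes):
    """Return True if a file of size_bytes is too big for FAT32."""
    return size_bytes > FAT32_MAX_FILE_SIZE
```

Feeding it os.path.getsize() of the file being copied would tell you
whether the failure could be the filesystem refusing the file rather
than the hardware failing.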

