<div dir="ltr"><div dir="ltr">On Mon, 29 Jul 2019 at 13:28, Stewart C. Russell via talk &lt;<a href="mailto:talk@gtalug.org">talk@gtalug.org</a>&gt; wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I'm guessing this is bad, right?<br>
<br>
[Mon Jul 29 12:59:48 2019] print_req_error: critical medium error,<br>
dev nvme0n1, sector 296089600 flags 80700<br>
[Mon Jul 29 12:59:48 2019] print_req_error: critical medium error,<br>
dev nvme0n1, sector 296089744 flags 0<br>
<br>
Is it an oh-shit-get-yerself-a-new-drive-NOW thing, or …?<br>
<br>
Drive is a 2+ year old Intel 512 GB SSD. Not entirely sure what the<br>
right diagnostics are for SSDs. Filesystem is showing clean but touching<br>
certain known-bad files triggers the error in the system log.<br>
<br>
Dunno if these nvme stats are useful:<br>
<br>
Smart Log for NVME device:nvme0 namespace-id:ffffffff<br>
critical_warning : 0<br>
temperature : 25 C<br>
available_spare : 85%<br>
available_spare_threshold : 10%<br>
percentage_used : 1%<br>
data_units_read : 10,349,479<br>
data_units_written : 10,098,299<br>
host_read_commands : 183,018,841<br>
host_write_commands : 136,702,227<br>
controller_busy_time : 1,342<br>
power_cycles : 201<br>
power_on_hours : 15,722<br>
unsafe_shutdowns : 10<br>
media_errors : 803<br>
num_err_log_entries : 844<br>
Warning Temperature Time : 0<br>
Critical Composite Temperature Time : 0<br>
Thermal Management T1 Trans Count : 0<br>
Thermal Management T2 Trans Count : 0<br>
Thermal Management T1 Total Time : 0<br>
Thermal Management T2 Total Time : 0<br>
<br>
Any suggestions, please, for:<br>
<br>
* what I should be looking for in stats (nvme smart-log-add doesn't give<br>
me anything at all, so no wear-levelling stats)<br>
<br>
* a decent brand to replace it with. I'm likely okay with a SATA SSD.<br>
<br>
cheers,<br>
Stewart<br clear="all"></blockquote><div><br></div><div>The stats don't suggest heavy use (percentage_used is only 1%) ... and yet that looks like an "oh-shit-get-yerself-a-new-drive-NOW" error to me. At the very least, stay on top of your backups. As I understand it, when blocks go bad on a solid state drive (hell, even on a spinning disk these days), the drive firmware should quietly remap them to spare blocks and you'd never even know it happened. That the errors are reaching the kernel log, with media_errors at 803, is alarming and suggests a fairly serious malfunction.<br></div><div><br></div><div>But ... I have no expertise with SSD (or NVMe) drives: I have a few, but none have failed, so I haven't had to learn. Ignore this suggestion if you get advice from someone with more knowledge of those drives ...<br></div></div><br>-- <br><div dir="ltr" class="gmail_signature">Giles<br><a href="https://www.gilesorr.com/" target="_blank">https://www.gilesorr.com/</a><br><a href="mailto:gilesorr@gmail.com" target="_blank">gilesorr@gmail.com</a></div></div>