[GTALUG] a solved problem unsolved itself: WordPress, MySQL, UTF-8

Stewart C. Russell scruss at gmail.com
Wed Dec 1 08:05:41 EST 2021


On 2021-11-29 16:25, Jamon Camisso via talk wrote:
> 
> Another thing to try is using mysqli_set_charset("UTF8"); somewhere in 
> your site's code. Substitute in different character sets until you find 
> the correct one ...

Thanks, Jamon, but there isn't a valid encoding for what my database 
seems to be holding. It was UTF-8, and now it's seemingly UTF-8 bytes 
misread as CP1252 characters, then re-encoded as UTF-8 again.

WordPress isn't written in Python, but if my db held the 4-character, 
6-byte UTF-8 string "côté", the equivalent Python code to end up in the 
mess I'm in would be:

     >>> bytes(bytes("côté", encoding='utf-8').decode(encoding='cp1252'), encoding='utf-8')
     b'c\xc3\x83\xc2\xb4t\xc3\x83\xc2\xa9'

or 6 characters / 10 bytes of gibberish ('cÃ´tÃ©').
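
Going the other way, here is a minimal sketch of undoing one round of 
that damage in Python. It assumes the text went through exactly one 
UTF-8 -> CP1252 -> UTF-8 round trip; the function name 
undo_double_encoding is made up for illustration, and nothing here is 
WordPress or MySQL code. Python's cp1252 codec leaves five byte values 
(0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so some strings can't be 
reversed this way; the sketch returns those unchanged.

```python
def undo_double_encoding(text: str) -> str:
    """Reverse one UTF-8 -> CP1252 -> UTF-8 round trip, if possible."""
    try:
        # Turn the mojibake characters back into the bytes they came
        # from, then read those bytes as the UTF-8 they originally were.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not a clean round trip (e.g. already-correct text):
        # leave it untouched rather than guess.
        return text


# Reproduce the damage from the example above, then undo it.
damaged = bytes("côté", encoding="utf-8").decode("cp1252")
print(damaged)                        # cÃ´tÃ©
print(undo_double_encoding(damaged))  # côté
print(undo_double_encoding("côté"))   # côté (already fine, unchanged)
```

Anything like this should only ever be run against a copy of the data, 
since text that was never damaged could in principle also survive the 
round trip and come out changed.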

Since this happened in the last month or so, it's not really a legacy 
encoding issue. Perfectly good UTF-8 got destroyed with no input/changes 
from me.

I'd been fairly careful with backups for the first decade of running 
this blog, but the process got wearing after a while, especially since 
every update went flawlessly so the manual backup process was a waste of 
time. WordPress offers automatic updates without forcing a backup 
checkpoint, which I think is wrong.

cheers,
  Stewart


