[GTALUG] a solved problem unsolved itself: WordPress, MySQL, UTF-8

Stewart C. Russell scruss at gmail.com
Sat Nov 27 14:41:28 EST 2021


I have been running a WordPress blog hosted on a Linux-based shared host 
since WordPress became a thing. It has worked quite well from about 2004 
up until a few weeks ago.

Sadly, *something* recently decided my database encoding was wrong. And 
that something decided to "fix" it. It certainly "fixed it", but not in 
any way I could want. It also did the same for Catherine's blog.

I know I didn't change any part of the config chain. As far as I can see:

* the MySQL database still thinks the text is encoded in UTF-8;

* WordPress thinks the data is in UTF-8;

* the web server is serving UTF-8.

(NB: there's going to be some UTF-8 and hex chars in this message.)

A typical post which shows this problem is
https://scruss.com/blog/2016/02/27/t%c9%92k-b%c9%92ks-a-tiny-hardware-speech-synthesizertts/ 


When I should be seeing something like:

[tɒk bɒks]		5b74 c992 6b20 62c9 926b 735d

I'm seeing this in the page served up (and in the db text itself):

[tÉ’k bÉ’ks]		5b74 c389 e280 996b 2062 c389 e280 996b 735d

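So the phonetic character U+0252 (UTF-8 bytes c9 92) has been mangled 
into U+00C9 + U+2019, which is what those two bytes spell when they are 
read as Windows-1252 and then re-encoded as UTF-8. Every non-ASCII 
character seems to be affected this way.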
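
The mangling described above can be reproduced in a few lines of Python. 
This is only a sketch, and it assumes the intermediate encoding was 
Windows-1252 (a guess based on byte 0x92 turning into U+2019); the 
function names mangle and unmangle are made up for illustration. If that 
assumption holds, the same round trip run backwards recovers the 
original text:

```python
def mangle(text: str) -> str:
    """Misread the UTF-8 bytes of `text` as Windows-1252 (assumed encoding)."""
    return text.encode("utf-8").decode("cp1252")


def unmangle(text: str) -> str:
    """Reverse the step above: back to the raw bytes, then decode as UTF-8."""
    return text.encode("cp1252").decode("utf-8")


original = "[t\u0252k b\u0252ks]"          # [tɒk bɒks]
mangled = mangle(original)

print(mangled)                             # [tÉ’k bÉ’ks]
print(mangled.encode("utf-8").hex())       # 5b74c389e280996b2062c389e280996b735d
print(unmangle(mangled) == original)       # True
```

One caveat: Python's cp1252 codec leaves five bytes undefined (0x81, 
0x8d, 0x8f, 0x90, 0x9d), so a character whose UTF-8 form contains one 
of those would raise an error in this sketch and need separate handling.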

I wasn't expecting to wake up to a UTF-8 encoding problem this decade. 
There is a raft of "how to fix WP encoding issues" pages that show up 
in web searches, but the newest of them is from 2008 or so.

I'm pretty much resigned to going through 16+ years of posts fixing 
this, but can mangled UTF-8 be recovered without rekeying?

cheers,
  Stewart

