[GTALUG] a solved problem unsolved itself: WordPress, MySQL, UTF-8

Jamon Camisso jamonation at gmail.com
Mon Nov 29 16:25:57 EST 2021


On 27/11/2021 14:41, Stewart C. Russell via talk wrote:
> I have been running a WordPress blog hosted on a Linux-based shared host 
> since WordPress became a thing. It has worked quite well from about 2004 
> up until a few weeks ago.
<snip>
> So the phonetic character U+0252 has been mangled into U+00C9 + U+2019. 
> Every UTF-8 character seems to be affected this way.
> 
> I wasn't expecting to wake up to a UTF-8 encoding problem this decade. 
> There are a raft of "how to fix WP encoding issues" pages that show up 
> in web searches, but the newest of them is from 2008 or so.
> 
> I'm pretty much resigned to going through 16+ years of posts fixing 
> this, but can mangled UTF-8 be recovered without rekeying?

Probably. If you've been running it for 10+ years, there is almost 
certainly some latin1 data hanging around that has since been converted 
to UTF-8, or UTF-8 that has been double-encoded somewhere along the line.
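
As a rough illustration of why this is usually recoverable, here is a 
small Python sketch that reproduces the mangling you describe and then 
reverses it. It assumes the bytes were misread as Windows-1252 (which is 
what MySQL calls latin1); if your data went through a different charset 
the codec name would need to change.

```python
# Reproduce the mangling from the report, then undo it.
original = "\u0252"  # the phonetic character U+0252

# Its UTF-8 bytes (0xC9 0x92), misread as Windows-1252, become
# U+00C9 followed by U+2019.
mangled = original.encode("utf-8").decode("cp1252")

# Reversing the two steps recovers the original character.
recovered = mangled.encode("cp1252").decode("utf-8")

print([f"U+{ord(c):04X}" for c in mangled])  # ['U+00C9', 'U+2019']
print(recovered == original)                 # True
```

The same round trip is what a database-level fix has to achieve, just 
applied to whole columns instead of one string.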

This page has a section on the likely character set mismatch and a 
fix: 
https://codex.wordpress.org/Converting_Database_Character_Sets#Variant:_3-step_conversion_when_data_and_table_charset_already_don.27t_match

The rest of the page has a lot of useful information as well that might 
apply to your situation.

Another thing to try is calling mysqli_set_charset($link, "utf8"); 
somewhere in your site's code, where $link stands for the open mysqli 
connection. Substitute in different character sets (latin1, utf8mb4, and 
so on) until you find the one that displays correctly, and then you'll 
be able to work out how to migrate your tables to whatever WordPress 
wants.
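
The same trial-and-error idea can be tested offline on a single mangled 
string copied out of a post, before touching the database. This is only 
a sketch: candidate_repairs is a made-up helper name, and the list of 
codecs is my guess at plausible culprits, not something WordPress or 
MySQL prescribes.

```python
def candidate_repairs(mangled, codecs=("cp1252", "latin-1", "cp1250", "mac_roman")):
    """Return {codec: repaired_text} for each codec that round-trips cleanly.

    For every candidate codec, re-encode the mangled text with it and try
    to decode the resulting bytes as UTF-8. Codecs that cannot represent
    the text, or that yield invalid UTF-8, are skipped.
    """
    repairs = {}
    for codec in codecs:
        try:
            repairs[codec] = mangled.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
    return repairs


# The mangled pair from the original report: U+00C9 followed by U+2019.
results = candidate_repairs("\u00c9\u2019")
for codec, text in results.items():
    print(codec, [f"U+{ord(c):04X}" for c in text])
```

More than one codec can succeed on a short sample, so it is worth 
checking a few different mangled strings before settling on one.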

Cheers, Jamon


