[GTALUG] a solved problem unsolved itself: WordPress, MySQL, UTF-8
Jamon Camisso
jamonation at gmail.com
Mon Nov 29 16:25:57 EST 2021
On 27/11/2021 14:41, Stewart C. Russell via talk wrote:
> I have been running a WordPress blog hosted on a Linux-based shared host
> since WordPress became a thing. It has worked quite well from about 2004
> up until a few weeks ago.
<snip>
> So the phonetic character U+0252 has been mangled into U+00C9 + U+2019.
> Every UTF-8 character seems to be affected this way.
>
> I wasn't expecting to wake up to a UTF-8 encoding problem this decade.
> There are a raft of "how to fix WP encoding issues" pages that show up
> in web searches, but the newest of them is from 2008 or so.
>
> I'm pretty much resigned to going through 16+ years of posts fixing
> this, but can mangled UTF-8 be recovered without rekeying?
Probably. If you've been running it for 10+ years, there almost
certainly is (or was) some latin1 data hanging around that has since
been converted to UTF-8, or UTF-8 that has been double-encoded somewhere
along the line.
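To illustrate the double-encoding case with the character from the
original message, here is a minimal Python sketch. It assumes the damage
came from UTF-8 bytes being read as cp1252 (which is what MySQL calls
latin1). The helper names mangle and unmangle are made up for
illustration; they are not part of WordPress or MySQL.

```python
def mangle(text):
    """Simulate the damage: UTF-8 bytes misread as cp1252 (assumed cause)."""
    return text.encode("utf-8").decode("cp1252")


def unmangle(text):
    """Reverse the damage; leave text alone if it does not round-trip."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text holds characters cp1252 cannot represent, or the
        # resulting bytes are not valid UTF-8: probably not mangled this way.
        return text


original = "\u0252"        # LATIN SMALL LETTER TURNED ALPHA
broken = mangle(original)  # two characters: U+00C9 followed by U+2019
repaired = unmangle(broken)
print(repr(broken), repr(repaired))
```

If this is what happened, the same round trip applied to an exported
copy of the posts should recover the text without rekeying. Python's
cp1252 codec rejects a handful of undefined bytes, so a few characters
may not round-trip this way and would need checking by hand.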
This page has a section on the case where the data and the table
charset already don't match, along with a fix:
https://codex.wordpress.org/Converting_Database_Character_Sets#Variant:_3-step_conversion_when_data_and_table_charset_already_don.27t_match
The rest of the page has a lot of useful information as well that might
apply to your situation.
Another thing to try is calling mysqli_set_charset($link, "utf8");
somewhere in your site's code, where $link stands in for the site's
mysqli connection. Substitute in different character sets until you find
the correct one, and then you'll be able to figure out a way to migrate
your tables to whatever WordPress wants.
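The same trial-and-error can be rehearsed offline before touching the
site. The following is only a sketch in Python, not the mysqli call
itself: it takes one mangled sample whose correct form is known and
reports which candidate charsets undo the damage. The candidate list and
the function name find_source_charsets are assumptions for illustration.

```python
# Assumed shortlist of charsets to try; extend it as needed.
CANDIDATES = ["cp1252", "latin-1", "cp1250", "mac_roman"]


def find_source_charsets(mangled, expected):
    """Return the candidates that turn `mangled` back into `expected`."""
    hits = []
    for name in CANDIDATES:
        try:
            # Re-create the raw bytes under this charset, then read them
            # as the UTF-8 they were originally meant to be.
            if mangled.encode(name).decode("utf-8") == expected:
                hits.append(name)
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this charset cannot explain the sample
    return hits


# Sample from the original message: U+0252 shown as U+00C9 + U+2019.
print(find_source_charsets("\u00c9\u2019", "\u0252"))
```

More than one candidate can match a single short sample, so it is worth
checking a few different mangled strings before settling on a charset.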
Cheers, Jamon