PHP, Doctrine ORM & MySQL – How To Deal With Chinese, Korean, Japanese & Other Non-English UTF-8 Character Encoding

There was a web application project, PHP in nature, that my team and I had developed for the past couple of months and it was time to setup and move all of the application to the production server which is running on Ubuntu Linux. The application data stored in MySQL is a combination of both English and the Chinese Simplified characters (because of bilingualism). Unfortunately, most of our selenium functional test failed right after we ran it and we’d discovered that any form of data gotten from the MySQL database was displayed only with ???? characters for the data in the Chinese language. This caused me to look for a solution and I’m happy to document this in case some poor souls out there encounter the same problem as we did.

Apparently, MySQL running on most linux/unix-es is not kinda snappy when it comes to multilingual support. So, if you happen to be developing PHP web applications with multilingual support using Doctrine ORM and the MySQL DB, here are some pointers to help you out.

Configuring The MySQL DB

If I want my database to be geared towards supporting multilingual character sets, the first thing that we need to do is to make sure the engine runs with the support of UTF-8 character encoding. To have this, all you need to do is to edit the my.cnf file (typically found in /etc/my.cnf or /etc/mysql/my.cnf):

  • Under the [mysqld] section, just add this line:
  • Next, under the [mysql] section, just add this line:

Save the contents and restart the MySQL database service.

How To Create A Database With UTF-8 Support

In order for the database to support UTF-8, the create script used should be like the below. Enter this at the MySQL prompt:

How To Configure Doctrine ORM’s Entity Manager With UTF-8 Support

Configuring the connection options

Before you obtain the Entity Manager, you must include the charset property and the driverOptions property in the connection options. The below is an example it:

We always set the charset value as “utf8” and the driverOptions‘s value with a map of 1002=>”SET NAMES utf8″ (which is not quite documented).

Initializing the Entity Manager

Once you have the connection options defined, you need to add an event hook to the Entity Manager every time it initializes a MySQL connection session. Do this after you create the Entity Manager. For example:

Now, when you perform any query through the entity manager, the return value should not display mere question marks ???? and you should be able to get the proper data with the right character encoding represented.

Hope this helps.

Born and currently resides in Malaysia, a seasoned Java developer whom had held positions as Senior Developer, Consultant and Technical Architect in various enterprise software development companies.