
PHP, Doctrine ORM & MySQL – How To Deal With Chinese, Korean, Japanese & Other Non-English UTF-8 Character Encoding

My team and I had spent the past couple of months developing a PHP web application, and it was time to set it up on the production server, which runs Ubuntu Linux. The application data stored in MySQL is a mix of English and Simplified Chinese, because the application is bilingual. Unfortunately, most of our Selenium functional tests failed as soon as we ran them: any Chinese text read from the MySQL database was displayed as ???? characters. That sent me looking for a solution, and I’m documenting it here in case some poor souls out there run into the same problem.

Apparently, MySQL as installed on most Linux/Unix systems is not set up for multilingual data out of the box. So, if you are developing PHP web applications with multilingual support using Doctrine ORM and MySQL, here are some pointers to help you out.

Configuring The MySQL DB

If you want your database to support multilingual character sets, the first thing to do is make sure the server runs with UTF-8 character encoding. To do this, edit the my.cnf file (typically found at /etc/my.cnf or /etc/mysql/my.cnf):

  • Under the [mysqld] section, just add this line:
    character-set-server=utf8
    
  • Next, under the [mysql] section, just add this line:
    default-character-set=utf8
    

Save the file and restart the MySQL service.
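
After the restart it is worth confirming that the settings actually took effect. The sketch below is not part of our original setup. It assumes you have already fetched the rows of SHOW VARIABLES LIKE 'character_set%' into a name => value array (for example through PDO), and the helper name findNonUtf8Variables is made up for illustration. The sample values are placeholders, not output from a real server.

```php
<?php
/**
 * Hypothetical helper: returns the character-set variables that are not UTF-8.
 * $variables maps variable names to values, in the shape produced by
 * SHOW VARIABLES LIKE 'character_set%'.
 */
function findNonUtf8Variables(array $variables): array
{
    // These two are expected to differ: one is normally "binary", the other is a path.
    $ignored = array('character_set_filesystem', 'character_sets_dir');
    $mismatches = array();

    foreach ($variables as $name => $value) {
        if (in_array($name, $ignored, true)) {
            continue;
        }
        // Accept utf8 and its variants (utf8mb3, utf8mb4).
        if (strpos($value, 'utf8') !== 0) {
            $mismatches[$name] = $value;
        }
    }

    return $mismatches;
}

// Sample data standing in for a real query result (illustrative values only).
$sample = array(
    'character_set_client'     => 'utf8',
    'character_set_connection' => 'utf8',
    'character_set_database'   => 'latin1',
    'character_set_filesystem' => 'binary',
    'character_set_results'    => 'utf8',
    'character_set_server'     => 'latin1',
    'character_set_system'     => 'utf8',
    'character_sets_dir'       => '/usr/share/mysql/charsets/',
);

// Prints the variables that still need attention.
print_r(findNonUtf8Variables($sample));
```

An empty result means every relevant variable reports a UTF-8 character set. Anything left in the list usually points back to a my.cnf section that was missed or a service that was not restarted.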

How To Create A Database With UTF-8 Support

For the database itself to use UTF-8, create it with a statement like the one below. Enter this at the MySQL prompt:

CREATE DATABASE mydb DEFAULT CHARACTER SET 'UTF8' COLLATE 'utf8_general_ci';
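
If you create databases from a PHP deployment script rather than at the prompt, the same statement can be assembled in code. This is only a sketch under my own assumptions: the function name buildCreateDatabaseSql is hypothetical, and because database names cannot be bound as query parameters, it simply rejects anything that is not a plain identifier.

```php
<?php
/**
 * Hypothetical helper: builds a CREATE DATABASE statement with an explicit
 * character set and collation. Only letters, digits and underscores are
 * allowed, since these values are concatenated into the SQL text.
 */
function buildCreateDatabaseSql(
    string $name,
    string $charset = 'utf8',
    string $collation = 'utf8_general_ci'
): string {
    foreach (array($name, $charset, $collation) as $identifier) {
        if (!preg_match('/^[A-Za-z0-9_]+\z/', $identifier)) {
            throw new InvalidArgumentException("Unsafe identifier: $identifier");
        }
    }

    return sprintf(
        'CREATE DATABASE `%s` DEFAULT CHARACTER SET %s COLLATE %s;',
        $name,
        $charset,
        $collation
    );
}

// Prints the statement for the example database used in this post.
echo buildCreateDatabaseSql('mydb'), PHP_EOL;
```

One thing to be aware of: MySQL's utf8 stores at most three bytes per character, so characters outside the Basic Multilingual Plane (some rare CJK ideographs, for instance) need utf8mb4, which is available from MySQL 5.5.3. The helper accepts that through its $charset and $collation arguments.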

How To Configure Doctrine ORM’s Entity Manager With UTF-8 Support

Configuring the connection options

Before you obtain the Entity Manager, you must include the charset property and the driverOptions property in the connection options. Here is an example:

$connectionOptions = array(
    "driver" => "pdo_mysql",
    "user" => "mysql_admin",         // Please change
    "password" => "mysql_password",  // Please change
    "dbname" => "database_name",     // Please change
    "charset" => "utf8",
    // PDO::MYSQL_ATTR_INIT_COMMAND has the integer value 1002;
    // the statement runs each time PDO connects to MySQL.
    "driverOptions" => array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8")
);

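We always set charset to "utf8" and driverOptions to the map 1002 => "SET NAMES utf8". The key is not well documented in this context: 1002 is the integer value of the PDO constant PDO::MYSQL_ATTR_INIT_COMMAND, so the statement is executed each time PDO connects to MySQL.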

Initializing the Entity Manager

Once the connection options are defined, you need to register an event subscriber that runs every time a MySQL connection session is initialized. Do this after you create the Entity Manager. For example:

use Doctrine\DBAL\Event\Listeners\MysqlSessionInit;
use Doctrine\ORM\EntityManager;

/**
 * Assuming the $connectionConfig variable stores the initialized Config object
 * and $connectionOptions looks like the example above...
 */
$entityManager = EntityManager::create($connectionOptions, $connectionConfig);

// Add the event hook: it sets the charset and collation on every new connection
$entityManager->getEventManager()->addEventSubscriber(new MysqlSessionInit("utf8", "utf8_unicode_ci"));
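
As far as I understand the listener, all it does is issue a SET NAMES statement right after the connection opens. The sketch below only mimics the text of that statement so you can see what reaches the server. It does not call Doctrine, and the function name buildSetNamesStatement is hypothetical. Also note that, to my knowledge, newer Doctrine DBAL releases deprecate this listener in favour of the charset connection option, so check the version you are running.

```php
<?php
/**
 * Hypothetical helper: builds the SET NAMES statement for a charset and an
 * optional collation, mirroring what the session-init hook is expected to send.
 */
function buildSetNamesStatement(string $charset, ?string $collation = null): string
{
    $sql = 'SET NAMES ' . $charset;

    // The COLLATE clause is only appended when a collation was given.
    if ($collation !== null && $collation !== '') {
        $sql .= ' COLLATE ' . $collation;
    }

    return $sql;
}

// Same arguments as the MysqlSessionInit call in the example above.
echo buildSetNamesStatement('utf8', 'utf8_unicode_ci'), PHP_EOL;
```

SET NAMES tells the server which character set the client sends and expects back, which is the piece that was missing when our Chinese text came back as question marks.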

Now, when you run any query through the Entity Manager, the results should no longer come back as question marks ????, and the data should be returned in the correct character encoding.
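
To keep this from regressing, a small round-trip check can go into the test suite. The sketch below is an illustration under my own assumptions, not code from our project: simulateLatin1Connection is a rough stand-in for a connection that cannot represent Chinese characters (it is not what MySQL does byte for byte), and both function names are hypothetical. In a real test you would compare the value you saved with the value read back through the Entity Manager.

```php
<?php
/**
 * Rough simulation of a non-UTF-8 connection: every character above U+00FF
 * is replaced with "?". Expects valid UTF-8 input.
 */
function simulateLatin1Connection(string $text): string
{
    // preg_replace returns null on invalid UTF-8; the cast turns that into "".
    return (string) preg_replace('/[^\x00-\xFF]/u', '?', $text);
}

/**
 * True when the received value is valid UTF-8 and identical to what was sent.
 */
function survivedRoundTrip(string $sent, string $received): bool
{
    return preg_match('//u', $received) === 1 && $sent === $received;
}

$original = '你好, world';
$broken = simulateLatin1Connection($original);

var_dump($broken);                                   // string "??, world"
var_dump(survivedRoundTrip($original, $broken));     // bool(false)
var_dump(survivedRoundTrip($original, $original));   // bool(true)
```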

Hope this helps.
