PHP is usually not configured to work well with UTF-8 multilingual content and XHTML out of the box. To get it working correctly, you need to adjust a few settings.

UTF-8

There are two things to think about. First, make sure your PHP files are saved using UTF-8 encoding. Even Notepad can do this from its save dialog. Then set the relevant php.ini settings to UTF-8. This can be done at runtime with the ini_set() function, which has to be called before any text or HTML/XHTML is output in the document.

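// UTF-8 settings
ini_set('mbstring.language',             'Neutral');
// The next three settings are deprecated as of PHP 5.6 in favor of default_charset
ini_set('mbstring.internal_encoding',    'UTF-8');
ini_set('mbstring.http_input',           'UTF-8');
ini_set('mbstring.http_output',          'UTF-8');
// encoding_translation is PHP_INI_PERDIR: ini_set() cannot change it, so set it in php.ini or .htaccess
ini_set('mbstring.encoding_translation', 'On');
ini_set('mbstring.detect_order',         'auto');
ini_set('mbstring.substitute_character', 'long');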
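Because ini_set() returns false when a setting cannot be changed at runtime, it can be worth checking the result instead of assuming every call succeeded. The following is only a minimal sketch; apply_ini_settings() is a hypothetical helper name, not something PHP provides.

```php
<?php
// Hypothetical helper: applies each setting and returns the names of the
// settings that ini_set() refused to change (it returns false on failure).
function apply_ini_settings(array $settings)
{
    $failed = array();
    foreach ($settings as $name => $value) {
        if (ini_set($name, $value) === false) {
            $failed[] = $name;
        }
    }
    return $failed;
}

$failed = apply_ini_settings(array(
    'mbstring.language'             => 'Neutral',
    'mbstring.detect_order'         => 'auto',
    'mbstring.substitute_character' => 'long',
));

// Log the failures rather than printing them, so no output is sent early.
if ($failed) {
    error_log('Could not set: ' . implode(', ', $failed));
}
```

Settings that show up in the returned list have to be changed in php.ini or the web server configuration instead.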

The server often sends a suitable Content-Type header on its own, but sometimes you need to force the UTF-8 charset yourself. Like ini_set(), the header() function has to be called before any text or HTML/XHTML is output in the document.

// Set XHTML content type and character encoding of the document
// You may also use text/html as the MIME type instead of application/xhtml+xml
header('Content-Type: application/xhtml+xml; charset=utf-8');
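Not every browser accepts application/xhtml+xml (Internet Explorer 8 and older do not), so one common approach is to pick the MIME type from the request's Accept header. The sketch below assumes a hypothetical helper named pick_content_type() and deliberately ignores q-values in the Accept header.

```php
<?php
// Hypothetical helper: returns application/xhtml+xml only when the client's
// Accept header mentions it, otherwise falls back to text/html.
// Simplification: q-values in the Accept header are ignored.
function pick_content_type($accept)
{
    if (stripos($accept, 'application/xhtml+xml') !== false) {
        return 'application/xhtml+xml';
    }
    return 'text/html';
}

$accept = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';

// header() only has an effect while no output has been sent yet.
if (!headers_sent()) {
    header('Content-Type: ' . pick_content_type($accept) . '; charset=utf-8');
}
```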

Sometimes you may have problems getting data from MySQL correctly encoded as UTF-8. Most of the time this isn't a problem, but you can force the connection character set if needed. Use mysql_set_charset() if you have PHP 5.2.3 or higher (it also requires MySQL 5.0.7 or higher), and SET NAMES/SET CHARACTER SET if you have an older version of PHP. Note that the mysql extension itself was deprecated in PHP 5.5 and removed in PHP 7.0.

$conn = mysql_connect('localhost', 'user', 'password');

// PHP 5.2.3 and above
mysql_set_charset('utf8', $conn);

// PHP below 5.2.3
mysql_query("SET NAMES utf8");
mysql_query("SET CHARACTER SET utf8");
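A rough equivalent for newer PHP versions can be sketched with the mysqli extension. This swaps in a different extension from the one used above; the host, credentials and database name are placeholders, and utf8mb4 assumes a MySQL server of version 5.5.3 or later (use utf8 to match the example above).

```php
<?php
// Placeholder connection details; replace them with your own.
// Note: on PHP 8.1+ mysqli throws mysqli_sql_exception by default instead of
// returning false, so the checks below mainly matter on older versions.
$conn = mysqli_connect('localhost', 'user', 'password', 'database');

if ($conn === false) {
    die('Connection failed: ' . mysqli_connect_error());
}

// Sets the connection character set, the same job mysql_set_charset() did.
if (!mysqli_set_charset($conn, 'utf8mb4')) {
    die('Could not set character set: ' . mysqli_error($conn));
}
```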

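Finally, do not forget to add the Content-Type <meta> tag in the <head> section of your HTML/XHTML. Browsers fall back on it when no charset arrives in the HTTP header, for example when the page is opened from a saved file, and without it they may display the page incorrectly.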

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

XHTML

To get correct XHTML compatibility you need to adjust PHP's session settings so that session IDs are integrated into your markup in a valid way. The server is often configured to insert session IDs into the URLs on your PHP page, and the default "&" separator is not valid inside XHTML attributes unless it is written as "&amp;".

// XHTML compatibility
ini_set('arg_separator.output', '&amp;');
ini_set('url_rewriter.tags', 'a=href,area=href,frame=src,input=src,form=action,fieldset=');
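The arg_separator.output setting also affects URLs you build yourself with http_build_query(), which uses it whenever no separator argument is passed. A small sketch, with a made-up page name and parameter names:

```php
<?php
ini_set('arg_separator.output', '&amp;');

// The page name and the parameter names are illustrative only.
$query = http_build_query(array('page' => 2, 'lang' => 'sv'));
$link  = '<a href="list.php?' . $query . '">Next page</a>';

echo $link, "\n";
// <a href="list.php?page=2&amp;lang=sv">Next page</a>
```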

For security reasons, I recommend disabling URL rewriting, since it makes it possible to hijack another user's session under certain circumstances. This makes the previous XHTML settings unnecessary, but keep them if you intend to pass session IDs by URL.

// Disable session by URL
ini_set('session.use_only_cookies', '1');
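session.use_only_cookies makes PHP ignore session IDs that arrive in the URL, while the rewriting of outgoing links is controlled separately by session.use_trans_sid. A sketch that sets both, assuming the host allows these settings to be changed at runtime:

```php
<?php
// Both calls have to come before session_start() and before any output.
ini_set('session.use_only_cookies', '1'); // accept session IDs from cookies only
ini_set('session.use_trans_sid', '0');    // do not append session IDs to URLs
```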

Still having problems?

To see your PHP version and how the settings are affected, simply call phpinfo() after changing the settings. Master Value is the original server setting and Local Value is the value after changing it using ini_set().

phpinfo();
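The same master and local values can also be read from code with ini_get_all(), which is handy when you only want to see what has actually changed. A minimal sketch; changed_ini_settings() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: lists the settings whose local value differs from the
// master (global) value, based on the data returned by ini_get_all().
function changed_ini_settings()
{
    $changed = array();
    foreach (ini_get_all() as $name => $info) {
        if ($info['local_value'] !== $info['global_value']) {
            $changed[$name] = $info;
        }
    }
    return $changed;
}

foreach (changed_ini_settings() as $name => $info) {
    printf(
        "%s: master=%s local=%s\n",
        $name,
        var_export($info['global_value'], true),
        var_export($info['local_value'], true)
    );
}
```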

These UTF-8 settings work most of the time, but many servers have other configuration issues. If you still have issues with UTF-8, check the phpinfo() output for any settings set to ISO-8859-1 and change them to UTF-8 using the ini_set() function. If you still have problems, check that your document files really are saved using UTF-8 character encoding.
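Both checks can be roughed out in code. The helper names below are hypothetical, and the file check has a limitation: a file that fails is definitely not UTF-8, but a file that passes (plain ASCII, for instance) is not proven to have been saved as UTF-8.

```php
<?php
// Hypothetical helper: names of the settings whose current value mentions
// ISO-8859-1 (case-insensitive).
function find_latin1_settings()
{
    $found = array();
    foreach (ini_get_all(null, false) as $name => $value) {
        if (is_string($value) && stripos($value, 'ISO-8859-1') !== false) {
            $found[] = $name;
        }
    }
    return $found;
}

// Hypothetical helper: true when the string is valid UTF-8.
// Uses PCRE's /u modifier, which rejects invalid UTF-8 input;
// mb_check_encoding($text, 'UTF-8') is an alternative when mbstring is loaded.
function is_valid_utf8($text)
{
    return preg_match('//u', $text) === 1;
}

print_r(find_latin1_settings());

// __FILE__ is this script itself; point it at any file you want to check.
var_dump(is_valid_utf8(file_get_contents(__FILE__)));
```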