Oracle Performance and Engineering

Random notes on database features, code, practice, optimization and techniques.

Friday, April 01, 2005

 

"My Korean Application Doesn't work even though DB characterset is UTF-8"

Internationalization is a big deal these days.

I received such a support call from Korea yesterday. Apparently, the client believed that setting UTF-8 as the database character set would fix everything once and for all. The following was my response to him. Last I heard, the issue was fixed.

"However, there are two sides to the character set as far as a database application is concerned.

A) Database side character set: UTF-8 supports the Unicode 3.0 standard and should be able to hold characters from most character sets.

(Oracle's newer "AL32UTF8" character set is a superset of UTF-8 and follows the newer Unicode 3.1 standard. Though Oracle recommends it from Oracle 9i onwards, we have not yet done enough testing to officially support it with ITG.)
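One practical difference between the two, as I understand Oracle's documentation, is how characters outside the Basic Multilingual Plane are stored (Unicode 3.1 was the first version to assign such characters, e.g. the CJK Extension B ideographs). AL32UTF8 uses the standard 4-byte UTF-8 form, while the older UTF8 character set stores them CESU-8 style, as two 3-byte surrogate sequences. The Python sketch below shows the byte difference; cesu8_encode is a hypothetical helper, not an Oracle API. Hangul syllables sit inside the BMP, so they come out identical either way.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text CESU-8 style, as Oracle's older UTF8 character set stores it."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Split a supplementary character into a UTF-16 surrogate pair...
            cp -= 0x10000
            high, low = 0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)
            # ...and encode each surrogate as its own 3-byte sequence.
            for unit in (high, low):
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            # BMP characters are encoded exactly as in standard UTF-8.
            out += ch.encode("utf-8")
    return bytes(out)


hangul = "한국"           # BMP characters
ext_b = "\U00020000"      # first CJK Extension B ideograph (Unicode 3.1)

print(cesu8_encode(hangul) == hangul.encode("utf-8"))        # True
print(len(ext_b.encode("utf-8")), len(cesu8_encode(ext_b)))  # 4 6
```

For Korean text stored as Hangul syllables, UTF8 and AL32UTF8 therefore produce the same bytes; the difference only shows up for supplementary characters.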

B) Client side character set: A client-side parameter called NLS_LANG is used to perform character set conversion between server (database) and client.

NLS_LANG should be explicitly set in the following format if you're using UTF-8 as the database side character set -

NLS_LANG=<language>_<territory>.<characterset>

e.g., NLS_LANG could look like "AMERICAN_AMERICA.UTF8"
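The value is just three components joined by "_" and ".". As a rough illustration (a sketch, not Oracle's own parser), the Python below splits an NLS_LANG value into language, territory and character set. parse_nls_lang is a hypothetical helper; it insists on all three parts, as recommended above, and assumes language and territory names contain no "_" or "." (they may contain spaces, e.g. "UNITED KINGDOM").

```python
def parse_nls_lang(value: str) -> tuple[str, str, str]:
    """Split an NLS_LANG value of the form LANGUAGE_TERRITORY.CHARSET."""
    # The character set follows the last ".".
    head, dot, charset = value.rpartition(".")
    # Language and territory are separated by the first "_".
    language, underscore, territory = head.partition("_")
    if not (dot and underscore and language and territory and charset):
        raise ValueError(f"expected LANGUAGE_TERRITORY.CHARSET, got {value!r}")
    return language, territory, charset


print(parse_nls_lang("AMERICAN_AMERICA.UTF8"))  # ('AMERICAN', 'AMERICA', 'UTF8')
```

A value with empty components, such as "_.", is rejected with a ValueError rather than silently accepted.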

* NLS_LANG is set in the environment on Linux/Unix and in the registry on Windows.

* NLS_LANG does NOT change the client machine's default character set.

It lets Oracle know what character set to expect from the client, so that it can do the necessary conversion.
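On Linux/Unix, one way to give a single Oracle client program its NLS_LANG without touching anything else is to launch it with a modified environment. A minimal Python sketch, where client_env is a hypothetical helper and the value is the example used in this post:

```python
import os


def client_env(nls_lang: str) -> dict:
    """Return a copy of the current environment with NLS_LANG set.

    Only a child process started with this dict sees the new value;
    os.environ itself is not modified.
    """
    env = dict(os.environ)
    env["NLS_LANG"] = nls_lang
    return env


env = client_env("AMERICAN_AMERICA.UTF8")
# Hand it to an Oracle client tool, e.g. subprocess.run(["sqlplus", ...], env=env)
```

As the bullet above says, this only tells the Oracle client what to expect; the operating system's own default character set stays whatever it was.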

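So if NLS_LANG is AMERICAN_AMERICA.UTF8 and the database character set is also UTF-8, Oracle does not do *any* conversion; it simply expects UTF-8 characters from the client. If the client actually sends bytes in some other encoding (a Korean code page, say) while NLS_LANG claims UTF8, those bytes are stored as-is and come back "broken".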
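To make the pass-through behaviour concrete, here is a rough Python model of what ends up in a UTF8 database. It is a sketch under stated assumptions, not how Oracle is implemented: simulate_insert and the ORACLE_TO_PYTHON table are hypothetical, and treating Oracle's KO16MSWIN949 as Python's cp949 codec is an approximation.

```python
# Hypothetical, partial mapping from Oracle character set names to Python codecs.
# UTF8 and AL32UTF8 differ only outside the BMP, so both map to "utf-8" here.
ORACLE_TO_PYTHON = {
    "UTF8": "utf-8",
    "AL32UTF8": "utf-8",
    "KO16MSWIN949": "cp949",
    "KO16KSC5601": "euc_kr",
}


def simulate_insert(client_bytes: bytes, nls_lang_charset: str,
                    db_charset: str = "UTF8") -> bytes:
    """Rough model of the bytes stored for a client insert."""
    if nls_lang_charset == db_charset:
        # Same character set declared on both sides: no conversion at all.
        return client_bytes
    # Otherwise interpret the bytes as the declared client character set
    # and re-encode them in the database character set.
    text = client_bytes.decode(ORACLE_TO_PYTHON[nls_lang_charset])
    return text.encode(ORACLE_TO_PYTHON[db_charset])


korean = "한국"
sent = korean.encode("cp949")  # assumed: client running on the Korean Windows code page

wrong = simulate_insert(sent, "UTF8")          # NLS_LANG claims UTF8
right = simulate_insert(sent, "KO16MSWIN949")  # NLS_LANG matches the client

print(wrong == sent)                    # True: stored unconverted
print(right == korean.encode("utf-8"))  # True: converted to UTF-8
```

Reading `wrong` back as UTF-8 raises a decode error (or yields replacement characters with a lenient decoder), which is exactly the "broken" data the client saw.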

The languages, territories and character sets that Oracle supports on the client side, which you need to build a proper NLS_LANG value, are listed in Appendix A of the "Globalization Support Guide", available at:

http://otn.oracle.com/pls/db92/db92.docindex?remark=homepage (for 9i -- needs free login)

http://otn.oracle.com/pls/db10g/portal.portal_demo3?selected=3 (for 10g -- needs free login)

I can see that Korean (language), Korea (territory) and several Korean character sets are supported by Oracle.

NLS_LANG should be set to the character set the client is actually using, so that Oracle can properly convert that data into UTF-8.

I think it will then solve the "broken" data issue.

There are some documents on metalink.oracle.com (needs a support login) describing how to determine the client's actual character set. If you do not have a login, let me know and I will send you the documents as attachments."
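The Metalink notes remain the proper procedure. As a much rougher first check, a Python sketch can ask the client machine which encoding its locale uses and map it to a candidate NLS_LANG. The PYTHON_TO_ORACLE table and suggest_nls_lang are hypothetical and only cover the Korean code pages and UTF-8. Note also that on Windows Python reports the ANSI code page, which is not necessarily the console code page a command-line client uses.

```python
import codecs
import locale

# Hypothetical, partial mapping from normalised Python codec names
# (as returned by codecs.lookup) to Oracle character set names.
PYTHON_TO_ORACLE = {
    "cp949": "KO16MSWIN949",
    "euc_kr": "KO16KSC5601",
    "utf-8": "UTF8",
}


def suggest_nls_lang(encoding=None, language="KOREAN", territory="KOREA"):
    """Guess an NLS_LANG value from the client's encoding.

    With no encoding given, use the one Python reports for the current locale.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    name = codecs.lookup(encoding).name  # normalises aliases like "EUC-KR"
    if name not in PYTHON_TO_ORACLE:
        raise ValueError(f"no Oracle character set known for {encoding!r}")
    return f"{language}_{territory}.{PYTHON_TO_ORACLE[name]}"


print(suggest_nls_lang("cp949"))   # KOREAN_KOREA.KO16MSWIN949
print(suggest_nls_lang("EUC-KR"))  # KOREAN_KOREA.KO16KSC5601
```

Calling suggest_nls_lang() with no argument uses the local machine's locale, so its result (or a ValueError for an unmapped encoding) depends on where it runs.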
