Monday, December 7, 2009

Collator: Locale-Sensitive String Comparison

If we want to compare non-English text, for example to sort it or show it in a table, normal string comparison cannot do the job. It compares the strings character by character using their Unicode values and knows nothing about the ordering rules of a language.

For instance, take the German umlauts (ä, ö, ü, etc.). If we compare them with their usual transliterations (ae, oe, ue), a plain string comparison says they are different. But under some German sorting conventions (phone-book ordering, for example) they are treated as equal (ae == ä). Additionally, in some locales a letter sequence such as 'ae' has to be treated by the comparator as a single character.
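To make the difference concrete, here is a minimal sketch that sorts the same words once with plain String ordering and once with a Collator for Locale.GERMAN. The class name GermanSortDemo, the helper sortForLocale and the sample words are made up for illustration. Note that the default German collator only keeps Ä next to A; it is not assumed here to apply the ae == ä phone-book convention.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class GermanSortDemo {
    // Hypothetical helper: returns a sorted copy using the locale's collation rules.
    static String[] sortForLocale(String[] words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        String[] copy = words.clone();
        Arrays.sort(copy, collator); // Collator implements Comparator<Object>
        return copy;
    }

    public static void main(String[] args) {
        // "\u00C4pfel" is "Äpfel", written with an escape to avoid encoding issues.
        String[] words = {"Zebra", "\u00C4pfel", "Apfel"};

        // Plain ordering compares code points, so U+00C4 (Ä) lands after 'Z'.
        String[] plain = words.clone();
        Arrays.sort(plain);
        System.out.println("String order:   " + Arrays.toString(plain));

        // Locale-sensitive ordering keeps Ä close to A.
        String[] german = sortForLocale(words, Locale.GERMAN);
        System.out.println("Collator order: " + Arrays.toString(german));
    }
}
```

With the plain sort, the word starting with Ä ends up after "Zebra"; with the collator it stays with the other A words.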

The java.text package, part of the JDK since version 1.1, provides the Collator and RuleBasedCollator classes to get this job done.

Example from the Sun Javadoc for RuleBasedCollator:

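import java.text.ParseException;
import java.text.RuleBasedCollator;

// Place inside a method that declares or handles ParseException,
// which the constructor throws if the rule string is malformed.
String norwegian = "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I< j,J" +
        "< k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R< s,S< t,T" +
        "< u,U< v,V< w,W< x,X< y,Y< z,Z" +
        "< \u00E5=a\u030A,\u00C5=A\u030A" +   // a with ring above, also as a + combining ring
        ";aa,AA< \u00E6,\u00C6< \u00F8,\u00D8"; // aa, then ae ligature, then o with stroke
RuleBasedCollator myNorwegian = new RuleBasedCollator(norwegian);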
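
As a rough sketch of how such a collator might be used, the class below builds a RuleBasedCollator from the same rule string and passes it to Arrays.sort as a comparator. The class name NorwegianSortDemo, the helper norwegianCollator and the sample words are assumptions made for this illustration; they are not part of the Javadoc sample.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;

public class NorwegianSortDemo {
    // Rule string copied from the example above.
    static final String NORWEGIAN_RULES =
            "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I< j,J" +
            "< k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R< s,S< t,T" +
            "< u,U< v,V< w,W< x,X< y,Y< z,Z" +
            "< \u00E5=a\u030A,\u00C5=A\u030A" +
            ";aa,AA< \u00E6,\u00C6< \u00F8,\u00D8";

    // Hypothetical helper: builds the collator; the constructor throws
    // ParseException if the rules cannot be parsed.
    static RuleBasedCollator norwegianCollator() throws ParseException {
        return new RuleBasedCollator(NORWEGIAN_RULES);
    }

    public static void main(String[] args) throws ParseException {
        RuleBasedCollator collator = norwegianCollator();

        // "\u00F8l" is "øl"; the rules place ø after z.
        String[] words = {"\u00F8l", "zebra", "Bil", "ananas"};
        Arrays.sort(words, collator);
        System.out.println(Arrays.toString(words));

        // Upper and lower case differ only at the tertiary level in these rules,
        // so "a" sorts before "B", unlike String.compareTo.
        System.out.println(collator.compare("a", "B") < 0);
        System.out.println("a".compareTo("B") < 0);
    }
}
```

Because a Collator is a Comparator, the same object can also be handed to Collections.sort or used as the comparator of a TreeMap.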


For more detail, see the Sun Javadoc for these classes, which is nicely written and includes samples:

java.text.Collator
java.text.RuleBasedCollator
