<?xml version="1.0" encoding="UTF-8" ?>

<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
    <title>Posteet: charset</title> 
    <link>http://www.posteet.com/</link> 
    <description>Recent posteets posted to Posteet</description>
    <ttl>60</ttl>

    
    <item>
        <title>utf8_unicode_ci vs utf8_general_ci</title>
        <link>http://www.posteet.com/view/1340</link>
        <description>
        <![CDATA[<pre>You can check and compare sort orders provided by these two collations here:

http://www.collation-charts.org/mysql60/mysql604.utf8_general_ci.european.html
http://www.collation-charts.org/mysql60/mysql604.utf8_unicode_ci.european.html

utf8_general_ci is a very simple collation. For each character, it:
- removes all accents
- then converts to upper case
and compares using the code of the resulting &quot;base letter&quot;.

For example, these Latin letters: ÀÁÅåāă (and every other Latin letter &quot;a&quot;, with any accent and in either case) all compare as equal to &quot;A&quot;.
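
The two steps above can be sketched in Python. This is only a rough approximation, assuming Unicode NFD decomposition is a fair stand-in for "removing accents"; it is not MySQL's actual weight table, and the name general_ci_key is made up for this example.

```python
import unicodedata

def general_ci_key(text):
    """Approximate utf8_general_ci: exactly one base letter per character."""
    out = []
    for ch in text:
        # Decompose the character (NFD) and keep only its first code point,
        # which drops any combining accents that follow the base letter.
        base = unicodedata.normalize("NFD", ch)[0]
        upper = base.upper()
        # Stay one-to-one: skip case mappings that expand to several characters.
        out.append(upper if len(upper) == 1 else base)
    return "".join(out)

# Every accented "a" from the example collapses to the same base letter.
print({general_ci_key(c) for c in "ÀÁÅåāă"})
```

Two strings compare as equal under this sketch when their keys are identical.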

utf8_unicode_ci uses the Default Unicode Collation Element Table (DUCET).

The main differences are:

1. utf8_unicode_ci supports so-called expansions and ligatures. For example, the German letter ß (U+00DF LATIN SMALL LETTER SHARP S) is sorted near &quot;ss&quot;, and the letter Œ (U+0152 LATIN CAPITAL LIGATURE OE) is sorted near &quot;OE&quot;.

utf8_general_ci does not support expansions or ligatures. It sorts all these letters as single characters, and sometimes in the wrong order.
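
To see why expansions change the order, here is a small Python sketch. The two tables are hand-written and hypothetical, covering only the letters mentioned above; they are not the real DUCET or MySQL weights, and the function names are made up for this example.

```python
# Hypothetical tables for illustration only.
EXPANSIONS = {"ß": "ss", "Œ": "OE"}  # one character becomes two
SINGLE = {"ß": "s"}                  # one character stays one character

def expanding_key(text):
    """Sort key in the spirit of utf8_unicode_ci: expand, then fold case."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in text).upper()

def single_char_key(text):
    """Sort key in the spirit of utf8_general_ci: no expansions."""
    return "".join(SINGLE.get(ch, ch) for ch in text).upper()

words = ["Masse", "Maße", "Masern"]
print(sorted(words, key=single_char_key))  # ['Maße', 'Masern', 'Masse']
print(sorted(words, key=expanding_key))    # ['Masern', 'Masse', 'Maße']
```

Without the expansion, "Maße" is keyed as "MASE" and lands before "Masern". With it, "Maße" is keyed as "MASSE" and sorts next to "Masse".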

2. utf8_unicode_ci is *generally* more accurate for all scripts. For example, in the Cyrillic block, utf8_unicode_ci is fine for all of these languages: Russian, Bulgarian, Belarusian, Macedonian, Serbian, and Ukrainian. utf8_general_ci, by contrast, is fine only for the Russian and Bulgarian subset of Cyrillic. The extra letters used in Belarusian, Macedonian, Serbian, and Ukrainian are not sorted well.

+/- The disadvantage of utf8_unicode_ci is that it is a little bit slower than utf8_general_ci.

So when you need a better sort order, use utf8_unicode_ci, and when performance is your overriding concern, use utf8_general_ci.</pre> <a href="http://www.posteet.com/tags/charset">[charset]</a>  <a href="http://www.posteet.com/tags/collation">[collation]</a>  <a href="http://www.posteet.com/tags/interclassement">[interclassement]</a>  <a href="http://www.posteet.com/tags/mysql">[mysql]</a>  <a href="http://www.posteet.com/tags/unicode">[unicode]</a>  <a href="http://www.posteet.com/tags/utf8">[utf8]</a> ]]>        </description>
        <dc:creator>spirit</dc:creator>
        <pubDate>Tue, 28 Oct 2008 10:15:36 +0100</pubDate>

            <category>charset</category>
            <category>collation</category>
            <category>interclassement</category>
            <category>mysql</category>
            <category>unicode</category>
            <category>utf8</category>
    
    </item>


</channel>
</rss>
