
Mimer SQL Unicode Collation Charts

Languages - Predefined and Downloadable

Below are specifications of the sorting adjustments, so-called tailorings, that various languages need in order to obtain the correct national sort order instead of the Unicode default sort order.
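The following minimal Python sketch shows what a tailoring changes. It uses Swedish as the example, where å, ä and ö are separate letters sorted after z. The functions `default_key` and `swedish_key` are hypothetical and are not how Mimer SQL implements collations:

- `default_key` only approximates the Unicode default order by comparing letters with their accents stripped.
- `swedish_key` handles only the letters a to z plus å, ä and ö.

```python
import unicodedata


def default_key(word: str) -> tuple:
    # Rough stand-in for the Unicode default order: compare base letters
    # (accents stripped via NFD decomposition) first, then the raw string.
    base = "".join(
        c for c in unicodedata.normalize("NFD", word.lower())
        if not unicodedata.combining(c)
    )
    return (base, word)


# Hypothetical Swedish-style tailoring: å, ä, ö are letters of their own,
# placed after z in that order. Assumes input uses only a-z and å, ä, ö.
SWEDISH_AFTER_Z = {"\u00e5": 1, "\u00e4": 2, "\u00f6": 3}


def swedish_key(word: str) -> tuple:
    key = []
    for c in word.lower():
        if c in SWEDISH_AFTER_Z:
            key.append((ord("z"), SWEDISH_AFTER_Z[c]))  # just after z
        else:
            key.append((ord(c), 0))
    return tuple(key)


words = ["zebra", "\u00e5l", "apa", "\u00f6dla", "\u00e4pple"]
print(sorted(words, key=default_key))  # ['ål', 'apa', 'äpple', 'ödla', 'zebra']
print(sorted(words, key=swedish_key))  # ['apa', 'zebra', 'ål', 'äpple', 'ödla']
```

Under the approximate default order, the accented words sort next to a and o. With the tailoring, they move after z, which is the national order Swedish readers expect.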

In the table below, the languages whose names are shown in bold are among the predefined collations included in the current version of Mimer SQL.

For some of the languages that are not shown in bold, the collation definition is available and can easily be reused by copy and paste. Where this applies (see Uyghur, for example), the language's page has a Collation link at the top that leads to the CREATE COLLATION statement used to define the collation.

Afrikaans
Albanian
Amharic
Arabic
Armenian
Arumanian
Assamese
Asturian
Azerbaijani

Basque
Belarusian
Bengali
Bosnian
Breton
Bulgarian

Catalan
Chinese (康熙 KangXi)
Chinese (拼音 PinYin)
Chinese (五笔画 WuBiHua)
Chinese (注音 ZhuYin)
Corsican
Croatian
Czech

Danish
Dari
Dutch
Dzongkha

Edo
Elfdalian
English
Esperanto
Estonian
Ewe

Faroese
Filipino
Finnish
French
Frisian
Friulian

Galician
Georgian
German
German (Phonebook)
Greek
Greek-Latin
Greenlandic
Guarani
Gujarati

Hausa
Hebrew
Hindi
Hungarian

Icelandic
Igbo
Indonesian
Irish Gaelic
Italian

Japanese
Javanese

Kannada
Kashmiri
Kazakh
Khmer
Kirghiz
Konkani
Korean
Kurdish

Lao
Lao (Traditional)
Latin
Latvian
Lithuanian
Luxembourgish

Macedonian
Malay
Malayalam
Maltese
Manipuri
Maori
Marathi
Moldavian
Mongolian
Moore
Myanmar

Ndebele
Nepali
Norwegian

Occitan
Oriya
Oromo

Pashto
Persian
Polish
Portuguese
Punjabi

Quechua

Romanian
Romansch
Russian

Sami
Sanskrit
Scots
Scottish Gaelic
Sepedi
Serbian
Sesotho
Sindhi
Sinhala
Slovak
Slovenian
Somali
Sorani
Sorbian (Lower)
Sorbian (Upper)
Spanish
Spanish (Traditional)
Swahili
Swati
Swedish

Tajik
Tamil
Tatar
Telugu
Thai
Tibetan
Tigrinya
Tongan
Tsonga
Tswana
Turkish
Turkmen

Ukrainian
Urdu
Uyghur
Uzbek

Venda
Vietnamese
Vietnamese (Traditional)

Welsh
Wolof

Xhosa

Yiddish
Yoruba

Zulu

Scripts

In this context, a script is a collection of symbols used to represent textual information. The Unicode Character Database (UCD) provides data that maps each Unicode character to a script name.
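In the UCD, the character-to-script mapping is published in the Scripts.txt data file. Each line gives a code point or a code point range, a semicolon, and a script name, followed by a comment.

The Python sketch below parses lines in that format and looks up the script of a character. `SAMPLE` is a small hand-picked excerpt rather than the full file, and `parse_scripts` and `script_of` are hypothetical helper names. Code points not covered by the excerpt fall back to "Unknown", which is the UCD's default script value.

```python
from bisect import bisect_right

# Small excerpt in Scripts.txt format; the real file covers all of Unicode.
SAMPLE = """\
0020          ; Common # Zs       SPACE
0030..0039    ; Common # Nd  [10] DIGIT ZERO..DIGIT NINE
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
0391..03A1    ; Greek # L&  [17] GREEK CAPITAL LETTER ALPHA..GREEK CAPITAL LETTER RHO
0410..044F    ; Cyrillic # L&  [64] CYRILLIC CAPITAL LETTER A..CYRILLIC SMALL LETTER YA
"""


def parse_scripts(text):
    """Return a sorted list of (first, last, script) code point ranges."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        cps, script = (part.strip() for part in data.split(";"))
        lo, _, hi = cps.partition("..")  # single code point if no ".."
        ranges.append((int(lo, 16), int(hi or lo, 16), script))
    ranges.sort()
    return ranges


def script_of(ch, ranges, default="Unknown"):
    """Binary-search the range list for the script of one character."""
    starts = [r[0] for r in ranges]
    i = bisect_right(starts, ord(ch)) - 1
    if i >= 0 and ranges[i][0] <= ord(ch) <= ranges[i][1]:
        return ranges[i][2]
    return default


RANGES = parse_scripts(SAMPLE)
print(script_of("A", RANGES), script_of("\u0416", RANGES), script_of("7", RANGES))
# Latin Cyrillic Common
```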

European Ordering Rules (EOR) is a standard that defines how the Latin, Greek and Cyrillic scripts should be sorted. It is intended to provide guidance on sorting the European repertoires of Unicode.

ISO/IEC 8859-1 (SQL datatype CHAR)
The following chart for the Latin-1 representation is used with the CHAR datatype in SQL.

Latin-1
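ISO/IEC 8859-1 covers only 256 characters, and they coincide with the first 256 Unicode code points (U+0000 to U+00FF). The minimal Python sketch below checks whether a string can be represented in Latin-1. It assumes, as the heading above states, that CHAR data is held in ISO/IEC 8859-1, so text failing the check would need NCHAR. The name `fits_latin1` is hypothetical.

```python
def fits_latin1(text: str) -> bool:
    """True if every character of text exists in ISO/IEC 8859-1."""
    try:
        text.encode("latin-1")  # Python's codec for ISO/IEC 8859-1
    except UnicodeEncodeError:
        return False
    return True


print(fits_latin1("Caf\u00e9"))            # True: é is U+00E9, inside Latin-1
print(fits_latin1("\u0141\u00f3d\u017a"))  # False: Ł (U+0141) is outside it
print("\u00e9".encode("latin-1"))          # b'\xe9': byte value equals code point
```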

Unicode (SQL datatype NCHAR)
Below are the scripts for the Unicode representation, which is used with the NCHAR datatype in SQL. As stated in the specification for the Unicode Collation Algorithm (UCA), the Default Unicode Collation Element Table (DUCET) is provided in the AllKeys table. This table maps characters to collation elements. The following scripts represent different parts of the table, listed in the order in which they are defined.
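Each AllKeys line lists one or more code points, followed by one or more collation elements of the form `[.PPPP.SSSS.TTTT]`. These hold the primary, secondary and tertiary weights; a `*` instead of `.` marks a variable element. The UCA builds a sort key by concatenating the nonzero primary weights, then the secondary weights, then the tertiary weights, with a 0000 separator between levels.

The Python sketch below follows that scheme under several simplifying assumptions:

- The sample lines use the AllKeys layout, but their weights are illustrative, not copied from any DUCET version.
- Contractions and normalization are not handled.
- Variable elements are treated like any other element, that is, as non-ignorable.

The helper names are hypothetical.

```python
import re

# Illustrative lines in AllKeys format (weights invented for the example).
SAMPLE_ALLKEYS = """\
@version 0.0.0
0020 ; [*0209.0020.0002] # SPACE
0061 ; [.1C47.0020.0002] # LATIN SMALL LETTER A
0041 ; [.1C47.0020.0008] # LATIN CAPITAL LETTER A
0062 ; [.1C60.0020.0002] # LATIN SMALL LETTER B
00E1 ; [.1C47.0020.0002][.0000.0024.0002] # LATIN SMALL LETTER A WITH ACUTE
"""

# One collation element: variable marker, then three 4-digit hex weights.
CE_RE = re.compile(r"\[([.*])([0-9A-F]{4})\.([0-9A-F]{4})\.([0-9A-F]{4})\]")


def parse_allkeys(text):
    """Map each character sequence to its list of (p, s, t) weight triples."""
    table = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()
        if not data or data.startswith("@"):  # skip blanks and @version lines
            continue
        cps, ces = data.split(";", 1)
        key = "".join(chr(int(cp, 16)) for cp in cps.split())
        table[key] = [(int(p, 16), int(s, 16), int(t, 16))
                      for _, p, s, t in CE_RE.findall(ces)]
    return table


def sort_key(s, table):
    """Level-by-level UCA-style sort key (no contractions in this sketch)."""
    ces = [ce for ch in s for ce in table[ch]]
    key = []
    for level in range(3):
        if level:
            key.append(0)  # level separator
        key.extend(ce[level] for ce in ces if ce[level] != 0)
    return tuple(key)


TABLE = parse_allkeys(SAMPLE_ALLKEYS)
words = ["b", "\u00e1", "A", "a"]
print(sorted(words, key=lambda w: sort_key(w, TABLE)))  # ['a', 'A', 'á', 'b']
```

The result shows the level hierarchy: "a" and "A" differ only at the tertiary (case) level. "á" differs from both at the secondary (accent) level, so it sorts after them, and "b" differs at the primary level.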

Ignored  Secondary  Variable  Common 
Latin  Greek  Coptic  Cyrillic  Glagolitic  Old‑Permic 
Georgian  Armenian  Hebrew  Phoenician  Samaritan  Arabic 
Syriac  Mandaic  Thaana  NKo  Tifinagh  Ethiopic 
Devanagari  Bengali  Gurmukhi  Gujarati  Oriya  Tamil 
Telugu  Kannada  Malayalam  Sinhala  Meetei‑Mayek  Syloti‑Nagri 
Saurashtra  Kaithi  Mahajani  Sharada  Khojki  Khudawadi 
Multani  Grantha  Newa  Tirhuta  Siddham  Modi 
Takri  Ahom  Sundanese  Brahmi  Kharoshthi  Bhaiksuki 
Thai  Lao  Tai‑Viet  Tibetan  Marchen  Lepcha 
Phags‑Pa  Limbu  Tagalog  Hanunoo  Buhid  Tagbanwa 
Buginese  Batak  Rejang  Kayah‑Li  Myanmar  Chakma 
Khmer  Tai‑Le  New‑Tai‑Lue  Tai‑Tham  Cham  Balinese 
Javanese  Mongolian  Ol‑Chiki  Cherokee  Osage  Canadian‑Aboriginal 
Ogham  Runic  Old‑Hungarian  Old‑Turkic  Vai  Bamum 
Bassa‑Vah  Mende‑Kikakui  Adlam  Hangul  Hiragana‑Katakana  Bopomofo 
Yi  Lisu  Miao  Warang‑Citi  Pau‑Cin‑Hau  Pahawh‑Hmong 
Lycian  Carian  Lydian  Old‑Italic  Gothic  Deseret 
Shavian  Duployan  Osmanya  Elbasan  Caucasian‑Albanian  Sora‑Sompeng 
Mro  Linear‑B  Linear‑A  Cypriot  Old‑South‑Arabian  Old‑North‑Arabian 
Avestan  Palmyrene  Nabataean  Hatran  Imperial‑Aramaic  Inscriptional‑Parthian 
Inscriptional‑Pahlavi  Psalter‑Pahlavi  Manichaean  Ugaritic  Old‑Persian  Cuneiform 
Egyptian‑Hieroglyphs  Meroitic‑Hieroglyphs  Anatolian‑Hieroglyphs  Tangut  Nushu  CJK 

The Variable script above contains characters that can be made ignorable by means of a collation option; among them are the space character, punctuation marks and most symbols. The Common script above includes digits, currency symbols and similar characters.
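The effect of making variable characters ignorable can be sketched with the UCA's "shifted" approach:

- Variable characters are skipped when comparing letters.
- Their weights are moved to a final tie-breaking level.
- Every other character gets FFFF at that level.

The Python sketch below is a simplified illustration, not Mimer SQL's implementation. The primary weights in `VARIABLE` and `primary` are invented for the example, only the primary level and the tie-breaking level are modeled, and the input is assumed to use only a to z, space and hyphen. All function names are hypothetical.

```python
# Invented primary weights: space and hyphen are "variable" (low weights),
# letters a-z follow. Not DUCET values.
VARIABLE = {" ": 0x0209, "-": 0x020D}


def primary(ch):
    if ch in VARIABLE:
        return VARIABLE[ch]
    return 0x1C00 + (ord(ch) - ord("a"))  # assumes lowercase a-z


def key_non_ignorable(word):
    # Variable characters compare like letters, with their low weights.
    return tuple(primary(ch) for ch in word)


def key_shifted(word):
    # Level 1: variable characters are ignored entirely.
    level1 = [primary(ch) for ch in word if ch not in VARIABLE]
    # Tie-breaking level: variables keep their weight, letters get FFFF.
    last = [VARIABLE.get(ch, 0xFFFF) for ch in word]
    return tuple(level1) + (0,) + tuple(last)


words = ["deluge", "de-luge", "death", "de luge"]
print(sorted(words, key=key_non_ignorable))  # ['de luge', 'de-luge', 'death', 'deluge']
print(sorted(words, key=key_shifted))        # ['death', 'de luge', 'de-luge', 'deluge']
```

With non-ignorable variables, the space and hyphen sort before every letter, so "de luge" lands ahead of "death". When they are shifted, the words compare as "deluge" versus "death" first. The space and hyphen then only break the remaining ties.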



Powered by Mimer SQL