SQL Unicode Collation Charts

Languages – Predefined and Downloadable

Below are specifications of the sorting adjustments, so-called tailorings, that each language needs in order to get the correct national sort order relative to the default Unicode sort order.

In the table below, languages whose names are in bold are among the predefined collations included in the current version of Mimer SQL.

For some of the languages that are not in bold, the collation definition is available and can be used by copy and paste. Where applicable (see Uyghur, for example), the language’s page has a Collation link at the top of the page that leads to the CREATE COLLATION statement used to define the collation.

Languages


Afrikaans	Galician	Maori	Swati
Albanian	Georgian	Marathi	Swedish
Amharic	German	Moldavian	Tajik
Arabic	German (Phonebook)	Mongolian	Tamil
Armenian	Greek	Moore	Tatar
Arumanian	Greek-Latin	Myanmar	Telugu
Assamese	Greenlandic	Ndebele	Thai
Asturian	Guarani	Nepali	Tibetan
Azerbaijani	Gujarati	Norwegian	Tigrinya
Basque	Hausa	Occitan	Tongan
Belarusian	Hebrew	Oriya	Tsonga
Bengali	Hindi	Oromo	Tswana
Bosnian	Hungarian	Pashto	Turkish
Breton	Icelandic	Persian	Turkmen
Bulgarian	Igbo	Polish	Ukrainian
Catalan	Indonesian	Portuguese	Urdu
Chinese (康熙 KangXi)	Irish Gaelic	Punjabi	Uyghur
Chinese (拼音 PinYin)	Italian	Quechua	Uzbek
Chinese (五笔画 WuBiHua)	Japanese	Romanian	Venda
Chinese (注音 ZhuYin)	Javanese	Romansch	Vietnamese
Corsican	Kannada	Russian	Vietnamese (Traditional)
Croatian	Kashmiri	Sami	Welsh
Czech	Kazakh	Sanskrit	Wolof
Danish	Khmer	Scots	Xhosa
Dari	Kirghiz	Scottish Gaelic	Yiddish
Dutch	Konkani	Sepedi	Yoruba
Dzongkha	Korean	Serbian	Zulu
Edo	Kurdish	Sesotho
Elfdalian	Lao	Sindhi
English	Lao (traditional)	Sinhala
Esperanto	Latin	Slovak
Estonian	Latvian	Slovenian
Ewe	Lithuanian	Somali
Faroese	Luxembourgish	Sorani
Filipino	Macedonian	Sorbian (Lower)
Finnish	Malay	Sorbian (Upper)
French	Malayalam	Spanish
Frisian	Maltese	Spanish (Traditional)
Friulian	Manipuri	Swahili

For some language-specific sorting and searching details, see the document Linguistic Sorting and Searching in Mimer SQL (PDF).

Scripts

In this context, a script is a collection of symbols used to represent textual information. The Unicode Character Database (UCD) provides data for mapping Unicode characters to script names.

ISO/IEC 8859-1 (SQL datatype CHAR)

The following script for Latin-1 representation is used with the CHAR datatype in SQL.

Latin-1

Unicode (SQL datatype NCHAR)

Below are scripts for the Unicode representation, used with the NCHAR datatype in SQL. The Default Unicode Collation Element Table (DUCET) is provided in the AllKeys table, as stated in the specification for the Unicode Collation Algorithm (UCA). This table provides a mapping from characters to collation elements.
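As an illustration of the mapping from characters to collation elements, the following minimal Python sketch parses lines written in the layout of the UCA allkeys.txt data file. The function name parse_allkeys_line and the sample weight values are assumptions made for this example; actual weights differ between Unicode versions, and this is not how Mimer SQL itself loads the table.

```python
import re

# One collation element looks like [.1CAD.0020.0002] or [*0209.0020.0002].
# A leading '*' marks a variable element, '.' a non-variable one.
CE_PATTERN = re.compile(r"\[([.*])([0-9A-F]+)\.([0-9A-F]+)\.([0-9A-F]+)\]")


def parse_allkeys_line(line):
    """Parse one allkeys-style line (hypothetical helper for this sketch).

    Returns (characters, elements), where each element is a tuple
    (is_variable, primary, secondary, tertiary), or None for comments,
    blank lines and '@' directives.
    """
    line = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not line or line.startswith("@") or ";" not in line:
        return None
    code_points, _, elements = line.partition(";")
    chars = "".join(chr(int(cp, 16)) for cp in code_points.split())
    parsed = [
        (marker == "*", int(p, 16), int(s, 16), int(t, 16))
        for marker, p, s, t in CE_PATTERN.findall(elements)
    ]
    return chars, parsed


# Sample lines in allkeys layout; the weights are illustrative only.
SAMPLE = [
    "@version 0.0.0",
    "0020  ; [*0209.0020.0002] # SPACE",
    "0061  ; [.1CAD.0020.0002] # LATIN SMALL LETTER A",
]

for raw in SAMPLE:
    entry = parse_allkeys_line(raw)
    if entry is not None:
        print(repr(entry[0]), entry[1])
```

Each parsed entry pairs a character (or character sequence) with its list of collation elements, which is the information the scripts listed below are charted from.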

Two scripts deserve special mention. The first is the Variable script in the list below, which contains characters that can be made ignorable through a collation option, that is, by using one of the weighting methods Shifted or Shift-trimmed. Space, punctuation marks and most symbols are found in this script. For details about weighting, see the Alternate Weighting section of the article Character data, Unicode and Collations. The second is the Common script in the list below, which includes digits, currency symbols, etc.
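The effect of the weighting methods on Variable characters can be sketched in Python. This is a simplified illustration of the variable weighting rules in the Unicode Collation Algorithm, not Mimer SQL's implementation: TOY_TABLE and its weight values are hypothetical, every character has exactly one collation element, and the extra rule for ignorable characters that follow a variable character is left out because the toy table contains none.

```python
# Hypothetical toy table: char -> (is_variable, primary, secondary, tertiary).
# The weights are invented for this sketch; only their relative order matters.
TOY_TABLE = {
    " ": (True, 0x0209, 0x20, 0x02),
    "-": (True, 0x020D, 0x20, 0x02),
    "a": (False, 0x2000, 0x20, 0x02),
    "d": (False, 0x2003, 0x20, 0x02),
    "e": (False, 0x2004, 0x20, 0x02),
    "g": (False, 0x2006, 0x20, 0x02),
    "h": (False, 0x2007, 0x20, 0x02),
    "l": (False, 0x200B, 0x20, 0x02),
    "t": (False, 0x2013, 0x20, 0x02),
    "u": (False, 0x2014, 0x20, 0x02),
}


def sort_key(text, method="non-ignorable"):
    """Build a comparable sort key for text under the given weighting method."""
    shifted = method in ("shifted", "shift-trimmed")
    primary, secondary, tertiary, quaternary = [], [], [], []
    for ch in text:
        variable, p, s, t = TOY_TABLE[ch]
        if shifted and variable:
            # Variable element: levels 1-3 become zero (ignored),
            # and the old primary weight moves to a fourth level.
            p, s, t, q = 0, 0, 0, p
        else:
            q = 0xFFFF
        primary.append(p)
        secondary.append(s)
        tertiary.append(t)
        quaternary.append(q)
    if method == "shift-trimmed":
        # Same as Shifted, but trailing FFFF weights are dropped.
        while quaternary and quaternary[-1] == 0xFFFF:
            quaternary.pop()
    levels = [primary, secondary, tertiary]
    if shifted:
        levels.append(quaternary)
    key = []
    for i, level in enumerate(levels):
        if i:
            key.append(0)  # level separator
        key.extend(w for w in level if w)  # zero weights are skipped
    return tuple(key)


words = ["deluge", "de luge", "death", "de-luge"]
for method in ("non-ignorable", "shifted", "shift-trimmed"):
    print(method, sorted(words, key=lambda w: sort_key(w, method)))
```

With this toy table, the non-ignorable method sorts "de luge" and "de-luge" before "death", because space and hyphen carry low primary weights. Shifted gives death, de luge, de-luge, deluge, and Shift-trimmed gives death, deluge, de luge, de-luge, since the space and hyphen then only matter at the fourth level.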

The following scripts represent different parts of the table, given in the order they are defined:

Ignored
Secondary
Variable
Common
Latin
Greek
Coptic
Cyrillic
Glagolitic
Old‑Permic
Georgian
Armenian
Hebrew
Phoenician
Samaritan
Arabic
Syriac
Mandaic
Thaana
NKo
Tifinagh
Ethiopic
Devanagari
Bengali
Gurmukhi
Gujarati
Oriya
Tamil
Telugu
Kannada
Malayalam
Sinhala
Meetei‑Mayek
Syloti‑Nagri
Saurashtra
Kaithi
Mahajani
Sharada
Khojki
Khudawadi
Multani
Grantha
Newa
Tirhuta
Siddham
Modi
Takri
Nandinagari
Dogra
Ahom
Masaram-Gondi
Gunjala-Gondi
Sundanese
Brahmi
Kharoshthi
Bhaiksuki
Thai
Lao
Tai‑Viet
Tibetan
Zanabazar-Square
Soyombo
Marchen
Lepcha
Phags‑Pa
Limbu
Tagalog
Hanunoo
Buhid
Tagbanwa
Buginese
Makasar
Batak
Rejang
Kayah‑Li
Myanmar
Hanifi-Rohingya
Chakma
Khmer
Tai‑Le
New‑Tai‑Lue
Tai‑Tham
Cham
Balinese
Javanese
Mongolian
Ol‑Chiki
Cherokee
Osage
Canadian‑Aboriginal
Ogham
Runic
Old‑Hungarian
Old‑Turkic
Vai
Bamum
Bassa‑Vah
Mende‑Kikakui
Medefaidrin
Adlam
Hangul
Hiragana‑Katakana
Bopomofo
Yi
Lisu
Miao
Warang‑Citi
Pau‑Cin‑Hau
Pahawh‑Hmong
Nyiakeng-Puachue-Hmong
Wancho
Lycian
Carian
Lydian
Old‑Italic
Gothic
Deseret
Shavian
Duployan
Osmanya
Elbasan
Caucasian‑Albanian
Sora‑Sompeng
Mro
Linear‑B
Linear‑A
Cypriot
Old‑South‑Arabian
Old‑North‑Arabian
Avestan
Palmyrene
Nabataean
Hatran
Imperial‑Aramaic
Inscriptional‑Parthian
Inscriptional‑Pahlavi
Psalter‑Pahlavi
Elymaic
Manichaean
Old-Sogdian
Sogdian
Ugaritic
Old‑Persian
Cuneiform
Egyptian‑Hieroglyphs
Meroitic‑Hieroglyphs
Anatolian‑Hieroglyphs
Tangut
Nushu
CJK
Unassigned