Languages – Predefined and Downloadable

Below are specifications of the sorting adjustments, so-called tailorings, that various languages need in order to get the correct national sort order relative to the Unicode default sort order.

In the table below, languages with their names bolded are among the predefined collations included in the current version of Mimer SQL.

For some of the languages that are not bolded, the collation definition is available and can be used by copy and paste. Where applicable (see Uyghur, for example), the language’s page contains a Collation link at the top of the page that leads to the CREATE COLLATION statement used to define the collation.

Scripts

In this context a script is a collection of symbols used to represent textual information. The Unicode Character Database (UCD) provides data for a mapping from Unicode characters to script names.
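As a rough illustration of such a mapping, the sketch below parses a few lines written in the layout of the UCD file Scripts.txt (code point or range, a semicolon, the script name, then a comment) and looks up the script of a character. The sample lines are hand-written for this example rather than copied from the file, and the helper names parse_script_ranges and script_of are hypothetical.

```python
import bisect

# Hand-written sample in the Scripts.txt layout (not a verbatim excerpt):
# "<code point or first..last> ; <script name> # <comment>"
SAMPLE = """\
0041..005A    ; Latin # LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Latin # FEMININE ORDINAL INDICATOR
03B1..03C1    ; Greek # GREEK SMALL LETTER ALPHA..GREEK SMALL LETTER RHO
0410..044F    ; Cyrillic # CYRILLIC CAPITAL LETTER A..CYRILLIC SMALL LETTER YA
"""


def parse_script_ranges(text):
    """Return a sorted list of (first, last, script) tuples."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        code_field, script = (part.strip() for part in line.split(";", 1))
        first, _, last = code_field.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point
        ranges.append((start, end, script))
    ranges.sort()
    return ranges


def script_of(char, ranges, default="Unknown"):
    """Look up the script of one character by binary search over the ranges."""
    cp = ord(char)
    starts = [r[0] for r in ranges]
    i = bisect.bisect_right(starts, cp) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return default  # code point not covered by the sample


RANGES = parse_script_ranges(SAMPLE)
```

With this sample, script_of("A", RANGES) gives "Latin" and script_of("я", RANGES) gives "Cyrillic". A character outside the sample, such as a digit, falls back to the default only because the sample is incomplete; the full file would need to be loaded for real use.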

ISO/IEC 8859-1 (SQL datatype CHAR)

The following script for Latin-1 representation is used with the CHAR datatype in SQL.

Latin-1

Unicode (SQL datatype NCHAR)

Below are scripts for the Unicode representation, used with the NCHAR datatype in SQL. The Default Unicode Collation Element Table (DUCET) is provided in the AllKeys table, as stated in the specification for the Unicode Collation Algorithm (UCA). This table provides a mapping from characters to collation elements.
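To make the mapping from characters to collation elements more concrete, here is a minimal sketch that reads lines in the AllKeys layout, where each element is written as [.primary.secondary.tertiary] and a leading * instead of . marks a variable element. The primary weights in the sample are illustrative placeholders, not values from any particular DUCET version, and parse_allkeys and CollationElement are names made up for this example.

```python
import re
from typing import NamedTuple


class CollationElement(NamedTuple):
    variable: bool   # True when the element is written with a leading '*'
    primary: int
    secondary: int
    tertiary: int


# Sample lines in the AllKeys layout. The weights are illustrative only.
SAMPLE = """\
@version 0.0.0
0020 ; [*0209.0020.0002] # SPACE
0061 ; [.1C47.0020.0002] # LATIN SMALL LETTER A
0041 ; [.1C47.0020.0008] # LATIN CAPITAL LETTER A
00E4 ; [.1C47.0020.0002][.0000.002B.0002] # LATIN SMALL LETTER A WITH DIAERESIS
"""

ELEMENT = re.compile(r"\[([.*])([0-9A-F]+)\.([0-9A-F]+)\.([0-9A-F]+)\]")


def parse_allkeys(text):
    """Map each character (or character sequence) to its collation elements."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop the trailing comment
        if not line or line.startswith("@"):   # skip blanks and directives
            continue
        chars_field, elements_field = line.split(";", 1)
        key = "".join(chr(int(cp, 16)) for cp in chars_field.split())
        table[key] = [
            CollationElement(flag == "*", int(p, 16), int(s, 16), int(t, 16))
            for flag, p, s, t in ELEMENT.findall(elements_field)
        ]
    return table


TABLE = parse_allkeys(SAMPLE)
```

In this sample, "a" and "A" share a primary weight and differ only at the tertiary level, while "ä" maps to two elements: the base letter followed by an element that carries only a secondary weight for the diaeresis.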

Two scripts deserve special mention. The first is the Variable script below, which includes characters that can be set to Ignorable by a collation option, i.e. by using one of the weighting methods Shifted or Shift-trimmed. This script contains space, punctuation marks and most symbols. See the Alternate Weighting section of the article Character data, Unicode and Collations for details about weighting. The second is the Common script below, which includes digits, currency symbols, etc.
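The effect of the Shifted weighting method can be sketched with a toy example. The code below follows the rule as described in the UCA: a variable element gets zero weights on levels one to three and its old primary weight on a fourth level, ignorable elements directly following a variable element are zeroed, and all other elements get FFFF on the fourth level. The table, its weights and the function names are invented for illustration; this is not how Mimer SQL implements it, and Shift-trimmed is not covered.

```python
# Toy collation elements: (variable, primary, secondary, tertiary).
# The weights are invented for illustration and are not DUCET values.
TOY_TABLE = {
    " ": (True, 0x0209, 0x0020, 0x0002),   # space belongs to the Variable script
    "a": (False, 0x1C47, 0x0020, 0x0002),
    "b": (False, 0x1C60, 0x0020, 0x0002),
    "c": (False, 0x1C7A, 0x0020, 0x0002),
}


def non_ignorable(elements):
    """Variable elements keep their weights; no fourth level is used."""
    return [(p, s, t, 0) for _variable, p, s, t in elements]


def shifted(elements):
    """Shifted weighting: move variable primaries to a fourth level."""
    result = []
    after_variable = False
    for variable, p, s, t in elements:
        if variable:
            result.append((0, 0, 0, p))
            after_variable = True
        elif p == 0 and s == 0 and t == 0:
            result.append((0, 0, 0, 0))       # completely ignorable
        elif p == 0 and after_variable:
            result.append((0, 0, 0, 0))       # ignorable following a variable
        else:
            result.append((p, s, t, 0xFFFF))
            after_variable = False
    return result


def sort_key(text, weighting):
    """Build a key: one list of non-zero weights per level, compared in order."""
    elements = weighting([TOY_TABLE[ch] for ch in text])
    return tuple([e[level] for e in elements if e[level]] for level in range(4))


words = ["ab", "a c"]
plain = sorted(words, key=lambda w: sort_key(w, non_ignorable))
ignoring = sorted(words, key=lambda w: sort_key(w, shifted))
```

With these toy weights, plain is ["a c", "ab"] because the space has a low primary weight, while ignoring is ["ab", "a c"] because the space no longer counts on the first level and only breaks ties on the fourth.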

The following scripts represent different parts of the table, given in the order they are defined:

Ignored 
Secondary 
Variable 
Common
Latin 
Greek 
Coptic 
Cyrillic 
Glagolitic 
Old‑Permic
Georgian 
Armenian 
Hebrew 
Phoenician 
Samaritan 
Arabic
Syriac 
Mandaic 
Thaana 
NKo 
Tifinagh 
Ethiopic
Devanagari 
Bengali 
Gurmukhi 
Gujarati 
Oriya 
Tamil
Telugu 
Kannada 
Malayalam 
Sinhala 
Meetei‑Mayek 
Syloti‑Nagri
Saurashtra 
Kaithi 
Mahajani 
Sharada 
Khojki 
Khudawadi
Multani 
Grantha 
Newa 
Tirhuta 
Siddham 
Modi
Takri 
Nandinagari
Dogra
Ahom 
Masaram-Gondi
Gunjala-Gondi
Sundanese 
Brahmi 
Kharoshthi 
Bhaiksuki
Thai 
Lao 
Tai‑Viet 
Tibetan
Zanabazar-Square
Soyombo
Marchen 
Lepcha
Phags‑Pa 
Limbu 
Tagalog 
Hanunoo 
Buhid 
Tagbanwa
Buginese 
Makasar
Batak 
Rejang 
Kayah‑Li 
Myanmar 
Hanifi-Rohingya
Chakma
Khmer 
Tai‑Le 
New‑Tai‑Lue 
Tai‑Tham 
Cham 
Balinese
Javanese 
Mongolian 
Ol‑Chiki 
Cherokee 
Osage 
Canadian‑Aboriginal
Ogham 
Runic 
Old‑Hungarian 
Old‑Turkic 
Vai 
Bamum
Bassa‑Vah 
Mende‑Kikakui 
Medefaidrin 
Adlam 
Hangul 
Hiragana‑Katakana 
Bopomofo
Yi 
Lisu 
Miao 
Warang‑Citi 
Pau‑Cin‑Hau 
Pahawh‑Hmong
Nyiakeng-Puachue-Hmong
Wancho
Lycian 
Carian 
Lydian 
Old‑Italic 
Gothic 
Deseret
Shavian 
Duployan 
Osmanya 
Elbasan 
Caucasian‑Albanian 
Sora‑Sompeng
Mro 
Linear‑B 
Linear‑A 
Cypriot 
Old‑South‑Arabian 
Old‑North‑Arabian
Avestan 
Palmyrene 
Nabataean 
Hatran 
Imperial‑Aramaic 
Inscriptional‑Parthian
Inscriptional‑Pahlavi 
Psalter‑Pahlavi
Elymaic
Manichaean
Old-Sogdian
Sogdian
Ugaritic 
Old‑Persian 
Cuneiform
Egyptian‑Hieroglyphs 
Meroitic‑Hieroglyphs 
Anatolian‑Hieroglyphs 
Tangut 
Nushu 
CJK
Unassigned