Languages – Predefined and Downloadable
Below are specifications of the sorting adjustments, so-called tailorings, that various languages need in order to get the correct national sort order rather than the Unicode default sort order.
In the table below, languages with their names bolded are among the predefined collations included in the current version of Mimer SQL.
For some of the languages that are not bolded, the collation definition is available and can be used by copy and paste. Where applicable (see Uyghur for an example), the language's page contains a Collation link at the top of the page that leads to the CREATE COLLATION statement used to define the collation.
Languages
For some language-specific sorting and searching details, see the Linguistic Sorting and Searching in Mimer SQL document (pdf).
Scripts
In this context a script is a collection of symbols used to represent textual information. The Unicode Character Database (UCD) provides data for a mapping from Unicode characters to script names.
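As a rough illustration of the character-to-script mapping described above, the following Python sketch parses a few lines written in the style of the UCD Scripts.txt file (a code point or code point range, a semicolon, then a script name) and looks up the script of a character. The sample lines are hand-written for this example and are not a verbatim excerpt of the UCD, and the names `SAMPLE_SCRIPTS`, `load_ranges` and `script_of` are hypothetical helpers, not part of Mimer SQL or any library.

```python
import re

# Hand-written sample in the style of the UCD Scripts.txt file (assumption:
# a real application would read the complete file instead).
SAMPLE_SCRIPTS = """\
0030..0039    ; Common    # digits
0041..005A    ; Latin     # A..Z
0061..007A    ; Latin     # a..z
03B1..03C9    ; Greek     # alpha..omega
0410..044F    ; Cyrillic  # basic Russian alphabet
"""

# One code point, optionally followed by "..last", then "; ScriptName".
LINE = re.compile(r"^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*(\w+)")


def load_ranges(text):
    """Return a list of (first, last, script) tuples parsed from the text."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        match = LINE.match(line)
        if match:
            first = int(match.group(1), 16)
            last = int(match.group(2) or match.group(1), 16)
            ranges.append((first, last, match.group(3)))
    return ranges


def script_of(ch, ranges):
    """Look up the script of a single character; 'Unknown' if not covered."""
    code_point = ord(ch)
    for first, last, script in ranges:
        if first <= code_point <= last:
            return script
    return "Unknown"


RANGES = load_ranges(SAMPLE_SCRIPTS)
for ch in "a7\u03c9\u044f":
    print(f"U+{ord(ch):04X} -> {script_of(ch, RANGES)}")
```

A linear scan is enough for a handful of ranges; with the full data file, a sorted list searched by bisection would be a more natural choice.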
ISO/IEC 8859-1 (SQL datatype CHAR)
The following script for Latin-1 representation is used with the CHAR datatype in SQL.
Unicode (SQL datatype NCHAR)
Below are scripts for the Unicode representation, used with the NCHAR datatype in SQL. The Default Unicode Collation Element Table (DUCET) is provided in the AllKeys table, as stated in the specification for the Unicode Collation Algorithm (UCA). This table provides a mapping from characters to collation elements.
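To make the mapping from characters to collation elements more concrete, here is a minimal Python sketch that parses lines in the layout used by the AllKeys table: one or more code points, a semicolon, then one or more collation elements of the form [.primary.secondary.tertiary], where a leading * instead of . marks a variable element. The weights in the sample are made up for illustration (real weights differ between UCA versions), and `SAMPLE_ALLKEYS` and `parse_allkeys` are hypothetical names introduced only for this example.

```python
import re

# Illustrative lines in the AllKeys layout. The weights are invented for this
# sketch and do not match any particular UCA version.
SAMPLE_ALLKEYS = """\
0020  ; [*0209.0020.0002] # SPACE
0061  ; [.1000.0020.0002] # LATIN SMALL LETTER A
00E4  ; [.1000.0020.0002][.0000.002B.0002] # LATIN SMALL LETTER A WITH DIAERESIS
"""

# "[" + "." or "*" + three hexadecimal weights separated by "." + "]".
ELEMENT = re.compile(r"\[([.*])([0-9A-F]+)\.([0-9A-F]+)\.([0-9A-F]+)\]")


def parse_allkeys(text):
    """Map each character sequence to a list of collation elements.

    Each element is a tuple (is_variable, primary, secondary, tertiary).
    """
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("@"):   # skip blank and directive lines
            continue
        chars, _, elements = line.partition(";")
        key = "".join(chr(int(cp, 16)) for cp in chars.split())
        table[key] = [
            (marker == "*", int(p, 16), int(s, 16), int(t, 16))
            for marker, p, s, t in ELEMENT.findall(elements)
        ]
    return table


TABLE = parse_allkeys(SAMPLE_ALLKEYS)
for key, elements in TABLE.items():
    print(repr(key), elements)
```

In this sample the accented letter maps to two collation elements: the first carries the primary weight of the base letter, and the second carries only a secondary weight for the accent.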
Two scripts deserve special mention. The first is the Variable script below, which contains the characters that may be set to Ignorable through a collation option, i.e. by using one of the weighting methods Shifted or Shift-trimmed. Space, punctuation marks and most symbols are among the characters defined in this script. See the Alternate Weighting section of the article Character data, Unicode and Collations for details about weighting. The second is the Common script below, which includes digits, currency symbols, etc.
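The effect of treating Variable characters as ignorable can be sketched with a toy model in Python. The example below is a simplified illustration of the Shifted idea, not the Mimer SQL implementation and not a complete UCA implementation: under Shifted, the first three weights of a variable character are set to zero and its original primary weight moves to a fourth level, so space and hyphen only act as a tie-breaker. The weight values and the names `TOY_TABLE` and `sort_key` are assumptions made for this example.

```python
# Toy table: character -> (is_variable, primary, secondary, tertiary).
# All weights are invented for this sketch.
TOY_TABLE = {
    " ": (True, 0x0209, 0x20, 0x02),
    "-": (True, 0x020D, 0x20, 0x02),
}
for letter in "abcdefghijklmnopqrstuvwxyz":
    TOY_TABLE[letter] = (False, 0x1000 + ord(letter) - ord("a"), 0x20, 0x02)


def sort_key(text, shifted):
    """Build a simplified sort key.

    shifted=False keeps variable characters at their normal weights on three
    levels. shifted=True zeroes the first three weights of variable characters
    and adds a fourth level holding their original primary weight.
    """
    levels = [[], [], [], []]
    for ch in text:
        variable, primary, secondary, tertiary = TOY_TABLE[ch]
        if shifted and variable:
            weights = (0, 0, 0, primary)
        else:
            weights = (primary, secondary, tertiary, 0xFFFF)
        for level, weight in zip(levels, weights):
            if weight:  # zero weights are ignorable and left out of the key
                level.append(weight)
    key = []
    for level in levels[: 4 if shifted else 3]:
        key.extend(level)
        key.append(0)  # level separator, lower than any real weight
    return tuple(key)


words = ["deluge", "de-luge", "death", "de luge"]
print(sorted(words, key=lambda w: sort_key(w, shifted=False)))
print(sorted(words, key=lambda w: sort_key(w, shifted=True)))
```

Without shifting, the space and the hyphen sort before all letters, so "de luge" and "de-luge" come before "death". With shifting they are ignored on the first three levels, so "death" comes first and the three spellings of "deluge" are separated only on the fourth level.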
The following scripts represent different parts of the table, given in the order they are defined:
Ignored
Secondary
Variable
Common
Latin
Greek
Coptic
Cyrillic
Glagolitic
Old‑Permic
Georgian
Armenian
Hebrew
Phoenician
Samaritan
Arabic
Syriac
Mandaic
Thaana
NKo
Tifinagh
Ethiopic
Devanagari
Bengali
Gurmukhi
Gujarati
Oriya
Tamil
Telugu
Kannada
Malayalam
Sinhala
Meetei‑Mayek
Syloti‑Nagri
Saurashtra
Kaithi
Mahajani
Sharada
Khojki
Khudawadi
Multani
Grantha
Newa
Tirhuta
Siddham
Modi
Takri
Nandinagari
Dogra
Ahom
Masaram-Gondi
Gunjala-Gondi
Sundanese
Brahmi
Kharoshthi
Bhaiksuki
Thai
Lao
Tai‑Viet
Tibetan
Zanabazar-Square
Soyombo
Marchen
Lepcha
Phags‑Pa
Limbu
Tagalog
Hanunoo
Buhid
Tagbanwa
Buginese
Makasar
Batak
Rejang
Kayah‑Li
Myanmar
Hanifi-Rohingya
Chakma
Khmer
Tai‑Le
New‑Tai‑Lue
Tai‑Tham
Cham
Balinese
Javanese
Mongolian
Ol‑Chiki
Cherokee
Osage
Canadian‑Aboriginal
Ogham
Runic
Old‑Hungarian
Old‑Turkic
Vai
Bamum
Bassa‑Vah
Mende‑Kikakui
Medefaidrin
Adlam
Hangul
Hiragana‑Katakana
Bopomofo
Yi
Lisu
Miao
Warang‑Citi
Pau‑Cin‑Hau
Pahawh‑Hmong
Nyiakeng-Puachue-Hmong
Wancho
Lycian
Carian
Lydian
Old‑Italic
Gothic
Deseret
Shavian
Duployan
Osmanya
Elbasan
Caucasian‑Albanian
Sora‑Sompeng
Mro
Linear‑B
Linear‑A
Cypriot
Old‑South‑Arabian
Old‑North‑Arabian
Avestan
Palmyrene
Nabataean
Hatran
Imperial‑Aramaic
Inscriptional‑Parthian
Inscriptional‑Pahlavi
Psalter‑Pahlavi
Elymaic
Manichaean
Old-Sogdian
Sogdian
Ugaritic
Old‑Persian
Cuneiform
Egyptian‑Hieroglyphs
Meroitic‑Hieroglyphs
Anatolian‑Hieroglyphs
Tangut
Nushu
CJK
Unassigned