Languages – Predefined and Downloadable
Below are specifications of the sorting adjustments, so-called tailorings, that various languages need in order to get the correct national sort order relative to the default Unicode sort order.
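As a rough illustration of what a tailoring changes, the Python sketch below sorts a few Swedish words in two ways. Neither function is the Unicode Collation Algorithm or Mimer SQL's implementation. `default_like_key` is a hypothetical stand-in that treats accents as a tie-breaker only, roughly as the default order does, and `swedish_like_key` is a hypothetical stand-in for a tailoring that places å, ä and ö as separate letters after z.

```python
import unicodedata

def default_like_key(word):
    # Stand-in for the default order: compare base letters first and
    # let accents matter only as a tie-breaker.
    lowered = word.lower()
    decomposed = unicodedata.normalize("NFD", lowered)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base, lowered)

# The Swedish alphabet ends with the letters å, ä and ö, after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

def swedish_like_key(word):
    # Stand-in for a tailoring: rank each letter by its position in the
    # Swedish alphabet; unknown characters sort last.
    return [RANK.get(ch, len(RANK)) for ch in word.lower()]

words = ["zebra", "ärlig", "apa", "öl", "ål"]
default_order = sorted(words, key=default_like_key)
swedish_order = sorted(words, key=swedish_like_key)

print(default_order)  # ['ål', 'apa', 'ärlig', 'öl', 'zebra']
print(swedish_order)  # ['apa', 'zebra', 'ål', 'ärlig', 'öl']
```

In the first result the accented letters sort next to their base letters. In the second they sort after z, which is the kind of difference a tailoring expresses.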
In the table below, languages whose names are in bold are among the predefined collations included in the current version of Mimer SQL.
For some of the languages that are not in bold, the collation definition is available and can be used by copy and paste. Where applicable (see Uyghur for an example), the language's page contains a Collation link at the top of the page that leads to the CREATE COLLATION statement used to define the collation.
For details on language-specific sorting and searching, see the Linguistic Sorting and Searching in Mimer SQL document (PDF).
In this context a script is a collection of symbols used to represent textual information. The Unicode Character Database (UCD) provides data for a mapping from Unicode characters to script names.
ISO/IEC 8859-1 (SQL datatype CHAR)
The following script for Latin-1 representation is used with the CHAR datatype in SQL.
Unicode (SQL datatype NCHAR)
Below are scripts for the Unicode representation, used with the NCHAR datatype in SQL. The Default Unicode Collation Element Table (DUCET) is provided in the AllKeys table, as stated in the specification for the Unicode Collation Algorithm (UCA). This table provides a mapping from characters to collation elements. The following scripts represent different parts of the table, given in the order they are defined.
Ignored Secondary Variable Common
Latin Greek Coptic Cyrillic Glagolitic Old‑Permic
Georgian Armenian Hebrew Phoenician Samaritan Arabic
Syriac Mandaic Thaana NKo Tifinagh Ethiopic
Devanagari Bengali Gurmukhi Gujarati Oriya Tamil
Telugu Kannada Malayalam Sinhala Meetei‑Mayek Syloti‑Nagri
Saurashtra Kaithi Mahajani Sharada Khojki Khudawadi
Multani Grantha Newa Tirhuta Siddham Modi
Takri Ahom Sundanese Brahmi Kharoshthi Bhaiksuki
Thai Lao Tai‑Viet Tibetan Marchen Lepcha
Phags‑Pa Limbu Tagalog Hanunoo Buhid Tagbanwa
Buginese Batak Rejang Kayah‑Li Myanmar Chakma
Khmer Tai‑Le New‑Tai‑Lue Tai‑Tham Cham Balinese
Javanese Mongolian Ol‑Chiki Cherokee Osage Canadian‑Aboriginal
Ogham Runic Old‑Hungarian Old‑Turkic Vai Bamum
Bassa‑Vah Mende‑Kikakui Adlam Hangul Hiragana‑Katakana Bopomofo
Yi Lisu Miao Warang‑Citi Pau‑Cin‑Hau Pahawh‑Hmong
Lycian Carian Lydian Old‑Italic Gothic Deseret
Shavian Duployan Osmanya Elbasan Caucasian‑Albanian Sora‑Sompeng
Mro Linear‑B Linear‑A Cypriot Old‑South‑Arabian Old‑North‑Arabian
Avestan Palmyrene Nabataean Hatran Imperial‑Aramaic Inscriptional‑Parthian
Inscriptional‑Pahlavi Psalter‑Pahlavi Manichaean Ugaritic Old‑Persian Cuneiform
Egyptian‑Hieroglyphs Meroitic‑Hieroglyphs Anatolian‑Hieroglyphs Tangut Nushu CJK
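The mapping from characters to collation elements mentioned above can be sketched in Python as follows. This is a minimal sketch that assumes the usual line layout of the DUCET data file (code points, a semicolon, one or more bracketed collation elements, then a comment). The names `CollationElement` and `parse_entry` are hypothetical, and the weights in the sample lines are illustrative rather than copied from a specific Unicode version.

```python
import re
from typing import NamedTuple, Optional

class CollationElement(NamedTuple):
    variable: bool   # True when the element is marked with '*'
    weights: tuple   # (primary, secondary, tertiary, ...)

# Code points, ';', then one or more adjacent [.w.w.w] or [*w.w.w] groups.
ENTRY_RE = re.compile(r"^([0-9A-F ]+?)\s*;\s*((?:\[[.*][0-9A-F.]+\])+)")
ELEMENT_RE = re.compile(r"\[([.*])([0-9A-F.]+)\]")

def parse_entry(line: str) -> Optional[tuple]:
    """Return (characters, collation elements), or None for other lines."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None  # comment, directive or blank line
    chars = "".join(chr(int(cp, 16)) for cp in match.group(1).split())
    elements = [
        CollationElement(marker == "*",
                         tuple(int(w, 16) for w in weights.split(".")))
        for marker, weights in ELEMENT_RE.findall(match.group(2))
    ]
    return chars, elements

# Sample lines in the assumed layout; the weights are illustrative only.
letter = parse_entry("0041  ; [.1C47.0020.0008] # LATIN CAPITAL LETTER A")
space = parse_entry("0020  ; [*0209.0020.0002] # SPACE")
comment = parse_entry("# a comment line")
```

An entry whose element starts with `*` belongs to the Variable part of the table; all others start with `.`.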
The Variable script above contains characters that can be made ignorable through a collation option, i.e. by using one of the weighting methods Shifted or Shift-trimmed. It covers space, punctuation marks and most symbols. See the Alternate Weighting section of the article Character data, Unicode and Collations for details about weighting.
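The effect of the Shifted method can be sketched in Python as follows. This is a simplified illustration based on the rules in the Unicode Collation Algorithm, not Mimer SQL's implementation. All weights are invented for the example, and the helper names (`collation_elements`, `non_ignorable`, `shifted`, `sort_key`) are hypothetical. The input is assumed to consist of lowercase a to z, space and hyphen only.

```python
# Toy weights invented for illustration; real values come from the DUCET.
VARIABLE_PRIMARY = {" ": 0x0209, "-": 0x020D}

def collation_elements(text):
    """Return (is_variable, (l1, l2, l3)) for each character."""
    elements = []
    for ch in text:
        if ch in VARIABLE_PRIMARY:
            elements.append((True, (VARIABLE_PRIMARY[ch], 0x20, 0x02)))
        else:
            primary = 0x1000 + ord(ch) - ord("a")
            elements.append((False, (primary, 0x20, 0x02)))
    return elements

def non_ignorable(elements):
    # Variable characters keep their weights and sort like any other.
    return [weights for _, weights in elements]

def shifted(elements):
    # Variable elements are zeroed on levels 1-3 and their old primary
    # weight moves to a fourth level; other elements get FFFF there.
    result = []
    after_variable = False
    for is_variable, (l1, l2, l3) in elements:
        if is_variable:
            result.append((0, 0, 0, l1))
            after_variable = True
        elif (l1, l2, l3) == (0, 0, 0):
            result.append((0, 0, 0, 0))        # completely ignorable
        elif l1 == 0:
            # Ignorable element: dropped entirely if it follows a variable.
            result.append((0, 0, 0, 0) if after_variable
                          else (0, l2, l3, 0xFFFF))
        else:
            result.append((l1, l2, l3, 0xFFFF))
            after_variable = False
    return result

def sort_key(weight_tuples):
    # Concatenate the non-zero weights level by level, separated by 0.
    key = []
    levels = len(weight_tuples[0]) if weight_tuples else 0
    for level in range(levels):
        if level:
            key.append(0)
        key.extend(w[level] for w in weight_tuples if w[level] != 0)
    return tuple(key)

words = ["deluge", "de luge", "death"]
non_ignorable_order = sorted(
    words, key=lambda w: sort_key(non_ignorable(collation_elements(w))))
shifted_order = sorted(
    words, key=lambda w: sort_key(shifted(collation_elements(w))))

print(non_ignorable_order)  # ['de luge', 'death', 'deluge']
print(shifted_order)        # ['death', 'de luge', 'deluge']
```

Without shifting, the space sorts before every letter, so "de luge" comes first. With Shifted, the space is ignored on the first three levels and only breaks the tie against "deluge" on the fourth.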
The Common script above includes digits, currency symbols, etc.