Mimer SQL Unicode Collation Charts

Languages - Predefined and Downloadable

Below are the sorting adjustments, so-called tailorings, that various languages need in order to get the correct national sort order relative to the default Unicode sort order.
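As a rough illustration of what a tailoring changes, the following Python sketch sorts a few Swedish words in two ways. It is a toy model under stated assumptions, not how Mimer SQL or the Unicode Collation Algorithm is implemented: `default_like_key` is a hypothetical stand-in that simply drops accents, and `swedish_like_key` is a hypothetical tailoring that treats å, ä and ö as separate letters after z.

```python
import unicodedata

# Assumed Swedish alphabet order: the last three letters are å, ä, ö.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def default_like_key(word):
    """Crude stand-in for an untailored primary order: drop combining marks."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def swedish_like_key(word):
    """Toy tailoring: rank each letter by its position in the Swedish alphabet."""
    composed = unicodedata.normalize("NFC", word.lower())
    # Characters outside the toy alphabet sort after all known letters.
    return [SWEDISH_RANK.get(ch, len(SWEDISH_RANK)) for ch in composed]


words = ["zebra", "äpple", "apa", "ö", "orm", "år"]
print(sorted(words, key=default_like_key))
print(sorted(words, key=swedish_like_key))
```

With the accent-dropping key, äpple and år sort among the words beginning with a. With the toy tailoring, they move to the end, after zebra.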

In the table below, the languages whose names are in bold have predefined collations included in the current version of Mimer SQL.

For some of the languages that are not bolded, the collation definition is available and can be used by copy and paste. Where applicable (see Uyghur, for example), the language’s page has a Collation link at the top that leads to the CREATE COLLATION statement used to define the collation.

Scripts

In this context, a script is a collection of symbols used to represent textual information. The Unicode Character Database (UCD) provides the data that maps Unicode characters to script names.
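The character-to-script mapping can be sketched in Python as follows. The sample lines are hand-written in the layout of the UCD file Scripts.txt (a code point or range, a semicolon, a script name, then a comment); they are illustrative, not a verbatim excerpt. The helper names `parse_scripts` and `script_of` are hypothetical.

```python
# Hand-written lines in the Scripts.txt layout:
# "<start>[..<end>] ; <Script> # <comment>"
SAMPLE = """\
# Illustrative data only
0041..005A    ; Latin # LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # LATIN SMALL LETTER A..LATIN SMALL LETTER Z
03B1..03C1    ; Greek # GREEK SMALL LETTER ALPHA..GREEK SMALL LETTER RHO
0410..044F    ; Cyrillic # CYRILLIC CAPITAL LETTER A..CYRILLIC SMALL LETTER YA
05D0..05EA    ; Hebrew # HEBREW LETTER ALEF..HEBREW LETTER TAV
"""


def parse_scripts(text):
    """Return a sorted list of (first_code_point, last_code_point, script)."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue
        code_points, script = (part.strip() for part in data.split(";"))
        start, _, end = code_points.partition("..")
        ranges.append((int(start, 16), int(end or start, 16), script))
    return sorted(ranges)


def script_of(ch, ranges):
    """Look up the script of one character; None if the sample lacks it."""
    cp = ord(ch)
    for start, end, script in ranges:
        if start <= cp <= end:
            return script
    return None


RANGES = parse_scripts(SAMPLE)
for ch in "a\u03bb\u0416\u05d0":
    print(f"U+{ord(ch):04X}", script_of(ch, RANGES))
```

A linear scan is enough for a handful of ranges; for the full file a binary search over the sorted ranges would be the more natural choice.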

ISO/IEC 8859-1 (SQL datatype CHAR)

The following script for Latin-1 representation is used with the CHAR datatype in SQL.

Latin-1
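To see which strings fit in the Latin-1 repertoire, and which need the Unicode representation instead, a minimal Python check could look like this. It relies only on Python's built-in `latin-1` codec; the function name `fits_latin1` is hypothetical, and the sketch says nothing about how Mimer SQL itself validates CHAR data.

```python
def fits_latin1(text):
    """Return True if every character of text is encodable in ISO/IEC 8859-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


for sample in ["sm\u00f6rg\u00e5sbord", "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac", "\u20ac"]:
    print(repr(sample), fits_latin1(sample))
```

The Swedish word is fully covered by Latin-1, while the Greek word and the euro sign are not.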

Unicode (SQL datatype NCHAR)

Below are the scripts for the Unicode representation, used with the NCHAR datatype in SQL. The Default Unicode Collation Element Table (DUCET) is provided in the AllKeys table (the allkeys.txt data file), as described in the specification of the Unicode Collation Algorithm (UCA). This table maps characters to collation elements. The following scripts represent different parts of the table, listed in the order in which they are defined.

Ignored  Secondary  Variable  Common
Latin  Greek  Coptic  Cyrillic  Glagolitic  Old‑Permic
Georgian  Armenian  Hebrew  Phoenician  Samaritan  Arabic
Syriac  Mandaic  Thaana  NKo  Tifinagh  Ethiopic
Devanagari  Bengali  Gurmukhi  Gujarati  Oriya  Tamil
Telugu  Kannada  Malayalam  Sinhala  Meetei‑Mayek  Syloti‑Nagri
Saurashtra  Kaithi  Mahajani  Sharada  Khojki  Khudawadi
Multani  Grantha  Newa  Tirhuta  Siddham  Modi
Takri  Ahom  Sundanese  Brahmi  Kharoshthi  Bhaiksuki
Thai  Lao  Tai‑Viet  Tibetan  Marchen  Lepcha
Phags‑Pa  Limbu  Tagalog  Hanunoo  Buhid  Tagbanwa
Buginese  Batak  Rejang  Kayah‑Li  Myanmar  Chakma
Khmer  Tai‑Le  New‑Tai‑Lue  Tai‑Tham  Cham  Balinese
Javanese  Mongolian  Ol‑Chiki  Cherokee  Osage  Canadian‑Aboriginal
Ogham  Runic  Old‑Hungarian  Old‑Turkic  Vai  Bamum
Bassa‑Vah  Mende‑Kikakui  Adlam  Hangul  Hiragana‑Katakana  Bopomofo
Yi  Lisu  Miao  Warang‑Citi  Pau‑Cin‑Hau  Pahawh‑Hmong
Lycian  Carian  Lydian  Old‑Italic  Gothic  Deseret
Shavian  Duployan  Osmanya  Elbasan  Caucasian‑Albanian  Sora‑Sompeng
Mro  Linear‑B  Linear‑A  Cypriot  Old‑South‑Arabian  Old‑North‑Arabian
Avestan  Palmyrene  Nabataean  Hatran  Imperial‑Aramaic  Inscriptional‑Parthian
Inscriptional‑Pahlavi  Psalter‑Pahlavi  Manichaean  Ugaritic  Old‑Persian  Cuneiform
Egyptian‑Hieroglyphs  Meroitic‑Hieroglyphs  Anatolian‑Hieroglyphs  Tangut  Nushu  CJK
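The mapping from characters to collation elements can be illustrated with a small Python parser for lines in the allkeys.txt layout. The three sample lines follow that layout, but their weights should be read as placeholders, since the actual values differ between Unicode versions. The name `parse_allkeys` is hypothetical.

```python
import re

# Sample lines in the allkeys.txt layout; the weights are illustrative only.
# "*" marks a variable collation element, "." a non-variable one.
SAMPLE = """\
@version 0.0.0 (placeholder)
0020  ; [*0209.0020.0002] # SPACE
0061  ; [.1CAD.0020.0002] # LATIN SMALL LETTER A
00E4  ; [.1CAD.0020.0002][.0000.002B.0002] # LATIN SMALL LETTER A WITH DIAERESIS
"""

CE_PATTERN = re.compile(r"\[([.*])([0-9A-F]{4})\.([0-9A-F]{4})\.([0-9A-F]{4})\]")


def parse_allkeys(text):
    """Map each character sequence to a list of collation elements.

    Each collation element is (is_variable, primary, secondary, tertiary).
    """
    table = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0]  # drop the trailing comment
        if ";" not in data or data.startswith("@"):
            continue  # skip blank lines, comments, and directives
        key_part, ce_part = data.split(";", 1)
        chars = "".join(chr(int(cp, 16)) for cp in key_part.split())
        table[chars] = [
            (marker == "*", int(p, 16), int(s, 16), int(t, 16))
            for marker, p, s, t in CE_PATTERN.findall(ce_part)
        ]
    return table


TABLE = parse_allkeys(SAMPLE)
for chars, elements in TABLE.items():
    print(repr(chars), elements)
```

In this sample, the space is a variable element, and the letter with diaeresis maps to two collation elements: the base letter followed by an element that carries only a secondary weight.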

The Variable script above contains characters that can be made ignorable through a collation option, that is, by using one of the weighting methods Shifted or Shift-trimmed. It includes space, punctuation marks and most symbols. For details about weighting, see the Alternate Weighting section of the article Character data, Unicode and Collations.
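The effect of the Shifted method can be sketched in Python with a toy weight table. All weights below are made up for illustration, the table covers only lowercase letters, space and hyphen, and ignorable characters and Shift-trimmed are not modeled. The sketch follows the general idea in the UCA: a variable element loses its first three weights and keeps its old primary weight on a fourth level, while every other element gets FFFF on that level.

```python
# Toy collation elements: (is_variable, primary, secondary, tertiary).
# These weights are invented; real values come from the DUCET.
TOY_TABLE = {
    " ": (True, 0x0209, 0x0020, 0x0002),
    "-": (True, 0x020D, 0x0020, 0x0002),
}
for offset, letter in enumerate("abcdefghijklmnopqrstuvwxyz"):
    TOY_TABLE[letter] = (False, 0x1000 + offset, 0x0020, 0x0002)


def sort_key(text, shifted):
    """Build a sort key: the non-zero weights of each level, 0 as separator."""
    levels = [[], [], [], []]
    for ch in text:
        variable, primary, secondary, tertiary = TOY_TABLE[ch]
        if shifted and variable:
            # Ignored on levels 1-3; old primary moves to level 4.
            weights = (0, 0, 0, primary)
        elif shifted:
            weights = (primary, secondary, tertiary, 0xFFFF)
        else:
            # Non-ignorable: variable elements sort like any other,
            # and the fourth level stays empty.
            weights = (primary, secondary, tertiary, 0)
        for level, weight in zip(levels, weights):
            if weight:
                level.append(weight)
    key = []
    for level in levels:
        key.extend(level)
        key.append(0)
    return key


words = ["deluge", "de-luge", "death", "demark", "de luge"]
non_ignorable = sorted(words, key=lambda w: sort_key(w, shifted=False))
shifted = sorted(words, key=lambda w: sort_key(w, shifted=True))
print(non_ignorable)
print(shifted)
```

Without shifting, the space and the hyphen sort before all letters, so "de luge" and "de-luge" come before "death". With Shifted, they are ignored on the first three levels, so "death" comes first and the three spellings of "deluge" are separated only on the fourth level.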

The Common script above includes digits, currency symbols, etc.