Introduction
This article presents the Pinyin support provided by Mimer SQL.
The described examples are using the following Chinese names: 李三二 (LI SAN ER), 李三四 (LI SAN SI), 林五 (LIN WU), 赵六 (ZHAO LIU). We first create a table called chinese_name, and insert these names into the table:
SQL>create table chinese_name (name nvarchar(20)); SQL>insert into chinese_name values('李三二'); SQL>insert into chinese_name values('李三四'); SQL>insert into chinese_name values('赵六'); SQL>insert into chinese_name values('林五'); SQL>select * from chinese_name; name ==================== 李三二 李三四 赵六 林五
Searching by Pinyin
To find all names whose family names are 李(LI) or 林(LIN), you can use any of the following queries:
select * from chinese_name where builtin.begins_pinyin(name, 'L'); select * from chinese_name where builtin.begins_pinyin(name, 'LI');
The result is:
name ==================== 李三二 李三四 林五
To find all names whose family names are 林(LIN), you can use the following query:
select * from chinese_name where builtin.begins_pinyin(name, 'LIN');
The result is:
name ==================== 林五
To find all names starting with 李(LI)三(SAN), you can use any of the following queries:
select * from chinese_name where builtin.begins_pinyin(name, 'LS'); select * from chinese_name where builtin.begins_pinyin(name, 'LIS'); select * from chinese_name where builtin.begins_pinyin(name, 'LSA'); select * from chinese_name where builtin.begins_pinyin(name, 'LISA'); select * from chinese_name where builtin.begins_pinyin(name, 'LSAN'); select * from chinese_name where builtin.begins_pinyin(name, 'LISAN');
The result is:
name ==================== 李三二 李三四
To find the exact name 李(LI)三(SAN)四 (SI), you can use any of the following queries:
select * from chinese_name where builtin.begins_pinyin(name, 'LISANS'); select * from chinese_name where builtin.begins_pinyin(name, 'LISANSI');
The result is:
name ==================== 李三四
Indexing
To improve the performance of searching and sorting, one can create a pinyin index like this:
create index pinyin_idx on chinese_name(name for pinyin_start);
Then the records will be indexed by the pinyin.
Sorting by Pinyin
To sort Chinese characters according their Pinyin, one must specify the collation when doing the sorting. Most often, one should use the chinese_pinyin_3 collation like this:
select * from chinese_name order by name collate chinese_pinyin_3;
The result is:
name ==================== 李三二 李三四 林五 赵六
As you see, the names are sorted by their Pinyin.
Actually, for sorting Chinese names we would recommend to use the chinese_pinyin_name_3 collation. Some Chinese characters have special pronunciations when they are used in names (especially family names). For example, 单(DAN) is pronounced “SHAN” when it is a family name. Mimer SQL provides a special collation, chinese_pinyin_name_3, for this situation.
For example, if we insert a name 单晓国(SHAN XIAO GUO) into the chinese_name table:
insert into chinese_name values('单晓国');
We can then sort the table using the chinese_pinyin_name_3 collation like this:
select * from chinese_name order by name collate chinese_pinyin_name_3;
The result is:
name ==================== 李三二 李三四 林五 单晓国 赵六
So far we have introduced the use of a collation in the sorting query. Alternatively, one can specify the collation in the definition of the table. You can, for example, specify chinese_pinyin_3 for an address column, while specifying chinese_pinyin_name_3 for a name column. The specification is like this:
create table chinese_name (name nvarchar(20)
collate chinese_pinyin_name_3
);
Once the collation is specified in the table definition, it will be used automatically when a sorting query is executed. Now, one can sort the names properly like this:
select * from chinese_name order by name;
The result is:
name ==================== 李三二 李三四 林五 单晓国 赵六
More information about collations can be found in the article called Character Data, Unicode and Collations and in the Collations section in the Mimer SQL Documentation Set.
Also see the extensive support for different languages at Mimer SQL Unicode Collation Charts.
Pinyin T9 support in Mimer SQL
Many Contacts or Phone apps use T9 keyboard for searching for Chinese names. In this method, the pinyin of a Chinese name is converted into a series of digits. For instance, the T9 digits of “李(LI)三(SAN)四 (SI)” is “5472674”.
Some Android phone vendors have implemented their own support for T9 search, with very complex implementation, and unsatisfactory performance.
Mimer SQL provides built-in support for T9 search, where all inserting, indexing and searching can be done with simple SQL queries. No extra columns, functions, or implementations are need.
For example, the chinese_name table contains the Chinese name 李三四:
insert into chinese_name values('李三四');
One only needs to create a T9 index on the column name of the chinese_name table as follows:
create index t9_idx on chinese_name(name
for pinyin_start_t9
);
Then search for the contact whose name matches the T9 digits using the builtin.begins_pinyin_t9 function:
select * from chinese_name where builtin.begins_pinyin_t9(name, '5472674');
All the following searches will find 李三四, whose T9 string is 54 726 74:
--- full string select * from chinese_name where builtin.begins_pinyin_t9(name, '5472674'); --- first digits of all letters select * from chinese_name where builtin.begins_pinyin_t9(name, '577'); --- first digit of one letter select * from chinese_name where builtin.begins_pinyin_t9(name, '5'); --- first digit of one letter select * from chinese_name where builtin.begins_pinyin_t9(name, '7'); --- a string that matches the beginning of the T9 string select * from chinese_name where builtin.begins_pinyin_t9(name, '547'); --- a string that matches the beginning of the T9 string select * from chinese_name where builtin.begins_pinyin_t9(name, '5472'); --- a string that matches the beginning of the T9 string select * from chinese_name where builtin.begins_pinyin_t9(name, '54726'); --- a string that matches the beginning of the T9 string select * from chinese_name where builtin.begins_pinyin_t9(name, '547267');