PRI 418: Registration of additional sequences in the MSARG collection


A submission for the “Registration of additional sequences in the MSARG collection” has been received by the IVD Registrar. This submission is currently under review according to the procedures of UTS #37, Unicode Ideographic Variation Database, with an expected close date of 2020-09-11.

At the end of the review period, the submission was incorporated into the 2020-11-06 version of the IVD as 133 additional registered IVSes for the MSARG collection. All comments received were considered and discussed between the IVD Registrar and the registrant, which resulted in adjustments to the representative glyphs for the following four sequence identifiers (the registered IVSes are provided in parentheses): ME_6A0B_001 (<6A0B, E0105>), ME_7D89_001 (<7D89, E0102>), ME_7DAB_001 (<7DAB, E0102>), and ME_9938_001 (<9938, E0101>).
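
For readers unfamiliar with the angle-bracket notation, the following minimal Python sketch shows how such a pair could be turned into the two-code-point string it denotes. The helper name `ivs_to_string` is hypothetical and is not defined by UTS #37; the only fact relied upon is that IVSes use variation selectors in the range U+E0100 through U+E01EF (VS17 through VS256).

```python
import re

# Variation selectors usable in an IVS: VS17..VS256 (U+E0100..U+E01EF).
VS_FIRST, VS_LAST = 0xE0100, 0xE01EF


def ivs_to_string(notation: str) -> str:
    """Convert a notation such as '<6A0B, E0105>' into the string it denotes.

    Hypothetical helper, for illustration only.
    """
    match = re.fullmatch(r"<([0-9A-F]{4,5}),\s*([0-9A-F]{5})>", notation.strip())
    if match is None:
        raise ValueError(f"not an IVS notation: {notation!r}")
    base, selector = (int(part, 16) for part in match.groups())
    if not VS_FIRST <= selector <= VS_LAST:
        raise ValueError(f"U+{selector:04X} is outside the VS17..VS256 range")
    # An IVS is the base character immediately followed by the selector.
    return chr(base) + chr(selector)


# The four IVSes whose representative glyphs were adjusted.
adjusted = ["<6A0B, E0105>", "<7D89, E0102>", "<7DAB, E0102>", "<9938, E0101>"]
for notation in adjusted:
    sequence = ivs_to_string(notation)
    print(notation, [f"U+{ord(ch):04X}" for ch in sequence])
```

Whether the variant glyph is actually displayed depends on the font and rendering system supporting the sequence; the string itself is just two code points.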

Review instructions

Reviewers are encouraged to comment on any aspect of the submissions, but more particularly on:

  • whether the glyphic subset corresponding to a proposed sequence is indeed a glyphic subset of the base character for the sequence
  • whether the proposed sequences are congruent with the scope of their collection, or whether a new collection may be more appropriate

All comments should be sent via the reporting form and will be forwarded to the submitter. The content of the submission may be adjusted during the review period to account for the comments received.

Submission details

The content of this section has been provided by the submitter, but was edited by the IVD Registrar.


  • Name and address of registrant: Public Administration and Civil Service Bureau (SAFP), Macao Special Administrative Region, China, Rua do Campo, no. 162, Edificio Administracao Publica, 21-27 Andares, Macau
  • Names and email addresses of representatives: Mr. Chau Cheuk Kwan, Clement: [email protected] & Ms. Cheang Pui Pui: [email protected]
  • URL of the website describing the collection: http://www.iso10646hk.net/ivd/MSARG/ (NOTE: This is a temporary website, and it will be changed to another URL in the future.)
  • Identifier for the collection: MSARG
  • Pattern for the sequence identifiers (unchanged): M([AB]_[0-9A-F]{4}|C_[0-9]{5}|D_[0-9A-F]{4,5}|E_[0-9A-F]{4,5}_[0-9]{3})
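
As an illustration of the pattern above, the following Python sketch checks candidate identifiers against it. The regular expression is copied verbatim from the submission details; the function name `is_valid_sequence_id` and the two deliberately invalid samples are assumptions made for this example.

```python
import re

# Pattern copied from the submission details. fullmatch() requires the
# whole identifier to match, so no explicit ^ and $ anchors are needed.
SEQUENCE_ID = re.compile(
    r"M([AB]_[0-9A-F]{4}|C_[0-9]{5}|D_[0-9A-F]{4,5}|E_[0-9A-F]{4,5}_[0-9]{3})"
)


def is_valid_sequence_id(identifier: str) -> bool:
    """Return True if the identifier conforms to the MSARG pattern."""
    return SEQUENCE_ID.fullmatch(identifier) is not None


samples = [
    "MB_F4D4",      # Big-5 base character
    "ME_8B67_001",  # variant with a registered IVS
    "MB1_F4D4",     # invalid: MB is not subdivided in sequence identifiers
    "ME-8B67-001",  # invalid: hyphens belong to source references
]
for sample in samples:
    print(sample, is_valid_sequence_id(sample))
```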


The Macao Special Administrative Region Government (MSARG) is in the process of establishing the Macao SAR Information Systems Chinese Character Encoding Scheme (hereinafter referred to as the “Scheme”). The Scheme sets up the exchange framework to define characters used in Macao for information processing and exchange. To address the issue of Macao-specific characters, the Scheme includes a Macao Supplementary Character Set, abbreviated as MSCS. MSCS-2020 will be used as the information exchange encoding standard among all departments of MSARG.

The Scheme includes the use of three character sets: 1) the Big-5 character set; 2) the Hong Kong Supplementary Character Set (HKSCS) - 2008; and 3) the Macao Supplementary Character Set (MSCS). The Big-5 character set has been used in Macao because Macao uses the traditional Chinese system. Due to the close connection with Hong Kong SAR, HKSCS characters are also commonly used in Macao and thus should be supported.

Under the ISO/IEC 10646 international encoding standard, the source references of the Scheme are as follows:

  • MB-hhhh is used to refer to all characters in the Big-5 character set, in which “hhhh” is the hexadecimal Big-5 code. MB0-hhhh, MB1-hhhh, and MB2-hhhh denote symbols, frequently-used ideographs, and less-frequently-used ideographs, respectively, in terms of how they are referenced in ISO/IEC 10646.
  • MA-hhhh is used to refer to all characters already encoded in HKSCS-2008, in which “hhhh” is the corresponding hexadecimal Big-5 code in HKSCS-2008. HKSCS-2008 is the last version of the HKSCS that was published with Big-5 code points.
  • MC-nnnnn is used for characters vertically extended to ISO/IEC 10646, in which “nnnnn” is an MSCS-assigned source reference code between 00001 and 99999, and assigned in sequence.
  • MD-hhhh[h] is used for characters horizontally extended to ISO/IEC 10646, in which “hhhh[h]” is the four- or five-digit hexadecimal code of the character in the ISO/IEC 10646 international standard. For characters in the Basic Multilingual Plane (BMP or Plane 0), four hexadecimal digits are used. For characters in other planes, five hexadecimal digits are used. HKSCS-2016, the latest version of HKSCS, includes 23 ideographs and one symbol horizontally extended to ISO/IEC 10646. Since MSCS also needs to horizontally extend these characters, MDH-hhhh[h] is used as the source reference for these characters to differentiate them from other horizontally-extended characters proposed by MSARG.
  • ME-hhhh[h]-nnn is used for character variants with registered IVSes, in which “hhhh[h]” is the four- or five-digit hexadecimal code of the base character in ISO/IEC 10646, and “nnn” is an MSCS-assigned number between 001 and 999. For variants that share the same base character, “nnn” is assigned in sequence.

MSCS includes the following three parts: 1) MSARG’s Vertical Extension to ISO/IEC 10646 (source reference: MC-nnnnn); 2) MSARG’s Horizontal Extension to ISO/IEC 10646 (source reference: MD-hhhh[h]); and 3) Macao’s variants with registered IVSes (source reference: ME-hhhh[h]-nnn).

The format of the sequence identifiers differs slightly from their source reference: 1) The sequence identifiers use underscores in lieu of hyphens per Section 3 of UTS #37; and 2) for the base characters in Big-5, MB is used, and not further distinguished as MB1 or MB2.
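
The two differences just described can be sketched in Python as follows. The function name `source_ref_to_sequence_id` is hypothetical, the sample inputs are illustrative rather than quoted from the submission, and the sketch covers only the two stated rules (it makes no claim about MB0 or MDH source references, which the text does not address).

```python
import re


def source_ref_to_sequence_id(source_ref: str) -> str:
    """Derive a sequence identifier from a source reference.

    Hypothetical helper applying only the two stated rules:
      1. MB1/MB2 prefixes are not distinguished; both become MB.
      2. Hyphens are replaced by underscores.
    """
    collapsed = re.sub(r"^MB[12]-", "MB-", source_ref)
    return collapsed.replace("-", "_")


# Illustrative inputs (assumed, not taken from the data file).
for ref in ["MB2-F4D4", "ME-8B67-001", "MC-00001"]:
    print(ref, "->", source_ref_to_sequence_id(ref))
```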

MSCS-2020 includes 79 variants, 11 of which were registered in the 2016-08-15 version of the IVD, along with the MSARG IVD collection itself. The 68 remaining variants still need to be registered. Because both the variants and their corresponding base characters need to be registered, there are 65 base characters and 68 variants included in this submission. This submission therefore includes 133 proposed new sequences to be added to the registered MSARG IVD collection. All of the variants are included in MSCS-2020. Some base characters are in MSCS-2020 proper, but some are also in Big-5 and HKSCS.

The IVD Registrar and MSARG kindly request experts to review these proposed sequences and their representative glyphs, and to submit their feedback via the reporting form prior to the end of the review period (2020-09-11).

List of proposed sequences

A data file listing the proposed sequences is available at https://www.unicode.org/ivd/pri/pri418/IVD_Sequences_MSARG_2020.txt & http://www.iso10646hk.net/ivd/MSARG/2020/IVD_Sequences_MSARG_2020.txt.

Representative Glyph Charts

Representative glyphs for the proposed sequences are available in PDF format at https://www.unicode.org/ivd/pri/pri418/Glyphs_List_MSARG_2020.pdf & http://www.iso10646hk.net/ivd/MSARG/2020/Glyphs_List_MSARG_2020.pdf.

Updates and Comments received

2020-06-22 Update: The following two sequences were added to the list of proposed sequences and representative glyph charts based on recent feedback for IRG N2430:

  • 8B67; MSARG; MB_F4D4
  • 8B67; MSARG; ME_8B67_001
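
Entries in this form can be split into their fields with a few lines of Python. The sketch below assumes only the layout visible in the two entries above (base character; collection; sequence identifier); the complete data file may contain comments or additional fields that this sketch does not handle. The names `ProposedSequence` and `parse_proposed_line` are hypothetical.

```python
from typing import NamedTuple


class ProposedSequence(NamedTuple):
    base: int         # code point of the base character
    collection: str   # IVD collection identifier
    identifier: str   # sequence identifier within the collection


def parse_proposed_line(line: str) -> ProposedSequence:
    """Parse one 'base; collection; identifier' entry (assumed layout)."""
    fields = [field.strip() for field in line.split(";")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    base_hex, collection, identifier = fields
    return ProposedSequence(int(base_hex, 16), collection, identifier)


lines = ["8B67; MSARG; MB_F4D4", "8B67; MSARG; ME_8B67_001"]
for entry in map(parse_proposed_line, lines):
    print(f"U+{entry.base:04X}", entry.collection, entry.identifier)
```

Both entries share the base character U+8B67, which matches the submission's statement that a variant and its corresponding base character are registered together.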

The IVD Registrar would like to bring to the attention of the reviewers the following comments:

2020-08-23: Comments received from Jaemin Chung:

① GlyphWiki u826f-01-var-001
② GlyphWiki u7680-08-var-001

U+5ACF 嫏 is ⿰女郎; its middle component is ①, which is a variant form of 良.
The middle component of ME_5ACF_001 is ②, which is a variant form of 皀.
These two are different components. So ME_5ACF_001 should not be under U+5ACF.

2020-08-25: Response from Registrant:

The following example shows that these two components are unified, for your reference:

2020-08-25: Response from IVD Registrar:

UCV (Unifiable Component Variations) #355, excerpted below, explicitly covers this particular unification:
UCV 355

2020-09-11: Comments received from Henry Chan:

2020-09-14: Response from IVD Registrar:

For the record, I agree with Henry's comments.
For #1, in the context of a single IVD collection, the IVD doesn't prioritize or favor one IVS over another when they share the same Base Character. But, because the sequence identifiers effectively specify which IVS represents a so-called default form, this may become a point of confusion for implementers.
For #2, it would also be helpful for implementers if the representative glyphs, both in the IVD charts and in the forthcoming MSCS standard, were to be harmonized in an appropriate way.
For #3, I feel strongly about this issue, particularly because it mixes styles that are generally not mixed within a regional standard or in fonts that support such regional standards. I am particularly troubled by the double-dot version of ⻎, which I consider to be a misinterpretation of ⻍.
For the benefit of implementers, and for the MSCS standard itself, it would be helpful to resolve these issues prior to registering the new sequences in the IVD.