L2/01-321R
Variation Selectors
M. Davis, 2001-08-17
There are 3 variation selectors in Unicode 3.1.1 (180B..180D). 256 others
have been accepted by the UTC, and are being submitted to WG2: FE00..FE0F,
E0110..E01FF (FE00 has already been accepted by WG2).
Here are the properties of the variation selectors:
- General category = Mn (nonspacing mark)
- This assignment percolates into other tied properties, such as BIDI
class, LineBreak, etc.
- These share characteristics of Cf characters, but Mn provides better
backwards compatibility.
- Combining Class = 0
- Joining Class = Transparent
- Default Representative Glyph = zero-width, invisible glyph.
- The chart glyph will have a dotted block.
- As with other zero-width invisible glyphs, implementations may allow the
VS characters to be displayed visibly, such as in a "Show Hidden"
option.
- Special Behavior
- When a specific VS occurs immediately after a specific base character,
as specified in StandardizedVariants.html in the Unicode Character
Database, the base character should be displayed with the variant glyph
specified in that file if possible. If that is not possible, the VS shall
have no effect on the selection of the glyph for that base character.
- If a VS occurs after any other character, it shall have no effect on
the selection of the glyph for that character.
- Policy Invariant: StandardizedVariants.html will not contain
associations between non-base characters and variation selectors.
- Transcoding Implications
- Where legacy standards incorporate glyph variants, the conversion into
Unicode may generate two Unicode code points from one legacy code point,
and the conversion from Unicode may generate one legacy code point from
two Unicode code points.
- Default Collation behavior: completely ignorable: [.0000.0000.0000.0000]
- Note: a contracting sequence of <base, VS> can be tailored for
specific uses, although this is discouraged. One case where such
tailoring might be appropriate is to reflect legacy practice.
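
The Special Behavior and Default Collation rules above can be sketched in
Python. This is an illustration under stated assumptions, not a reference
implementation: the function names, the glyph names, the font_glyphs
parameter, and the in-memory variants table (standing in for
StandardizedVariants.html) are all hypothetical. The code point ranges are
the ones listed in this document.

```python
# Code point ranges of the variation selectors as listed in this document.
VS_RANGES = (
    (0x180B, 0x180D),    # the 3 selectors in Unicode 3.1.1
    (0xFE00, 0xFE0F),    # 16 additional selectors
    (0xE0110, 0xE01FF),  # 240 additional selectors
)


def is_variation_selector(ch):
    """Return True if the single character ch is a variation selector."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in VS_RANGES)


def select_glyphs(text, variants, font_glyphs=None):
    """Return one glyph name per non-VS character in text.

    variants maps (base, vs) pairs to a variant glyph name; it is a
    hypothetical stand-in for StandardizedVariants.html.
    font_glyphs, if given, is the set of variant glyphs the font can
    actually display (an assumption used to model "if possible").
    """
    glyphs = []
    prev = None  # the character immediately before the current one
    for ch in text:
        if is_variation_selector(ch):
            # A VS only matters immediately after a listed base character.
            if prev is not None and not is_variation_selector(prev):
                glyph = variants.get((prev, ch))
                displayable = font_glyphs is None or glyph in font_glyphs
                if glyph is not None and displayable:
                    glyphs[-1] = glyph
            # In every other case the VS has no effect on glyph selection.
        else:
            glyphs.append("default:U+%04X" % ord(ch))
        prev = ch
    return glyphs


def strip_variation_selectors(text):
    """Model 'completely ignorable': drop every VS before comparison."""
    return "".join(ch for ch in text if not is_variation_selector(ch))
```

For example, with the illustrative table
variants = {("\u2229", "\ufe00"): "intersection.serifs"}, the call
select_glyphs("\u2229\ufe00", variants) selects the variant glyph, while
select_glyphs("a\ufe00", variants) leaves the default glyph for "a"
unchanged. Real collation uses weight tables rather than string stripping;
strip_variation_selectors only shows that untailored strings differing
solely in VS characters compare as equal.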