General Punctuation is a Unicode block containing punctuation, spacing, and formatting characters for use with all scripts and writing systems. Included are the defined-width spaces, joining formats, directional formats, smart quotes, archaic and novel punctuation such as the interrobang, and invisible mathematical operators.

General Punctuation
Range: U+2000..U+206F (112 code points)
Plane: BMP
Scripts: Common (109 char.), Inherited (2 char.)
Symbol sets: Punctuation, Spaces, Format controls
Assigned: 111 code points
Unused: 1 reserved code point; 6 deprecated
Unicode version history:
  1.0.0 (1991): 67 (+67)
  1.1 (1993): 76 (+9)
  3.0 (1999): 83 (+7)
  3.2 (2002): 95 (+12)
  4.0 (2003): 97 (+2)
  4.1 (2005): 106 (+9)
  5.1 (2008): 107 (+1)
  6.3 (2013): 111 (+4)
Unicode documentation: Code chart ∣ Web page
Note: [1][2]

Additional punctuation characters are in the Supplemental Punctuation block and scattered across dozens of other Unicode blocks.

Block

General Punctuation[1][2][3]
Official Unicode Consortium code chart (PDF)
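Cells shown with an abbreviation (spaces and format characters), by row:
U+200x: U+2000 NQ SP, U+2001 MQ SP, U+2002 EN SP, U+2003 EM SP, U+2004 3/M SP, U+2005 4/M SP, U+2006 6/M SP, U+2007 F SP, U+2008 P SP, U+2009 TH SP, U+200A H SP, U+200B ZW SP, U+200C ZW NJ, U+200D ZW J, U+200E LRM, U+200F RLM
U+201x: U+2011 NB
U+202x: U+2028 L SEP, U+2029 P SEP, U+202A LRE, U+202B RLE, U+202C PDF, U+202D LRO, U+202E RLO, U+202F NNB SP
U+205x: U+205F MM SP
U+206x: U+2060 WJ, U+2061 ƒ(), U+2062 ×, U+2063 ",", U+2064 +, U+2065 (not assigned), U+2066 LRI, U+2067 RLI, U+2068 FSI, U+2069 PDI, U+206A I SS, U+206B A SS, U+206C I AFS, U+206D A AFS, U+206E NA DS, U+206F NO DS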
Notes
1.^ As of Unicode version 15.1
2.^ Grey area indicates non-assigned code point
3.^ Unicode code points U+206A - U+206F are deprecated as of Unicode version 3.0
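
The counts quoted for the block can be checked against the Unicode data bundled with a language runtime. The following is a minimal sketch in Python, assuming an interpreter whose `unicodedata` module is based on Unicode 6.3 or later (the last version that added characters to this block). The variable names are illustrative only, and the deprecated range is copied from the chart note rather than read from the database, since `unicodedata` does not expose the Deprecated property.

```python
import unicodedata

# Block boundaries as given in the infobox.
BLOCK_START, BLOCK_END = 0x2000, 0x206F
code_points = range(BLOCK_START, BLOCK_END + 1)

# General category "Cn" marks code points that are not assigned
# in the Unicode version shipped with this Python build.
unassigned = [cp for cp in code_points if unicodedata.category(chr(cp)) == "Cn"]
assigned = [cp for cp in code_points if cp not in unassigned]

# Assumption: deprecated range taken from the chart note (U+206A..U+206F).
deprecated = list(range(0x206A, 0x2070))

print("Unicode data version:", unicodedata.unidata_version)
print("code points in block:", len(code_points))
print("assigned:", len(assigned))
print("unassigned:", [f"U+{cp:04X}" for cp in unassigned])
print("deprecated:", len(deprecated))
```

On an older runtime the assigned count would be lower, matching the version history above.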

Several characters in this block are usually not rendered with a directly visible glyph. Ten whitespace characters U+2002 through U+200B (fixed en or 1/2 em, em, 1/3 em, 1/4 em, 1/6 em, figure and punctuation space, variable thin or 1/5 em and hair space, fixed zero-width space) and U+205F (math medium or 2/9 em space) differ by horizontal width, while U+2000 and U+2001 (en and em quad) are effectively aliases of U+2002 and U+2003, respectively; another two, U+202F and U+2060 (ill-termed word joiner) are variants of U+2009 or U+2004 and U+200B that prohibit line-breaks. Three zero-width characters U+200B through U+200D (space, non-joiner and joiner) differ in how they affect ligation and shaping of adjacent letters such as contextual forms in Arabic. Eleven invisible characters U+200E, U+200F (left-to-right and right-to-left mark), U+202A through U+202E (embeds, pops and overrides) and U+2066 through U+2069 (isolates) control the directionality of text unless higher-level markup overrides them. There are explicit line and paragraph separators at U+2028 and U+2029.
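
Because these characters are invisible, it can help to inspect their properties programmatically. The sketch below uses Python's standard `unicodedata` module to print the name, general category and bidirectional class of the code points named above, and shows that `str.splitlines()` treats U+2028 and U+2029 as line boundaries. The helper `describe` and the list names are hypothetical, chosen for this illustration, and the sketch assumes a Python build with Unicode 6.3 or later data so that the isolate controls are known.

```python
import unicodedata

def describe(cp):
    """Return (name, general category, bidi class) for a code point."""
    ch = chr(cp)
    return (
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
    )

# Groups of code points as described in the paragraph above.
SPACES = list(range(0x2000, 0x200B)) + [0x202F, 0x205F]  # spaces with a width
ZERO_WIDTH = [0x200B, 0x200C, 0x200D, 0x2060]             # zero-width formats
BIDI_CONTROLS = [0x200E, 0x200F, *range(0x202A, 0x202F), *range(0x2066, 0x206A)]
SEPARATORS = [0x2028, 0x2029]                             # line and paragraph

for group in (SPACES, ZERO_WIDTH, BIDI_CONTROLS, SEPARATORS):
    for cp in group:
        name, category, bidi = describe(cp)
        print(f"U+{cp:04X}  {category}  {bidi:<3}  {name}")

# The explicit separators act as line boundaries for str.splitlines().
text = "first\u2028second\u2029third"
lines = text.splitlines()
print(lines)
```

The spaces report category `Zs`, while the zero-width and directional characters report `Cf` (format), which is one way to tell the two families apart in code.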

Emoji

The General Punctuation block contains two emoji: U+203C and U+2049.[3][4]

The block has four standardized variants defined to specify emoji-style (U+FE0F VS16) or text presentation (U+FE0E VS15) for the two emoji, both of which default to a text presentation.[5]

Emoji variation sequences
U+ 203C 2049
base code point ‼ ⁉
base+VS15 (text) ‼︎ ⁉︎
base+VS16 (emoji) ‼️ ⁉️
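
The four sequences in the table can be built by appending a variation selector to each base character. The following Python sketch illustrates this; the function name `variation_sequences` and the constants are hypothetical, introduced only for the example. Whether the two forms actually look different depends on the fonts and renderer in use.

```python
import unicodedata

VS15 = "\uFE0E"  # variation selector-15: requests text presentation
VS16 = "\uFE0F"  # variation selector-16: requests emoji presentation

# The two emoji in the General Punctuation block.
EMOJI_BASES = ["\u203C", "\u2049"]

def variation_sequences(base):
    """Return the text-style and emoji-style sequences for one base character."""
    return {"text": base + VS15, "emoji": base + VS16}

all_sequences = []
for base in EMOJI_BASES:
    for style, seq in variation_sequences(base).items():
        all_sequences.append(seq)
        code_points = " ".join(f"U+{ord(c):04X}" for c in seq)
        print(f"{unicodedata.name(base)} ({style}): {code_points}")
```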

History

The following Unicode-related documents record the purpose and process of defining specific characters in the General Punctuation block:

References

  1. ^ "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  2. ^ "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
  3. ^ "UTR #51: Unicode Emoji". Unicode Consortium. 2023-09-05.
  4. ^ "UCD: Emoji Data for UTR #51". Unicode Consortium. 2023-02-01.
  5. ^ "UTS #51 Emoji Variation Sequences". Unicode Consortium.