24 Character entity references in HTML 4

Note by Nick Urbanik. I made this page from http://www.w3.org/TR/1999/REC-html401-19991224/sgml/entities.html and then realised that I should have used Latin-1 characters, Special characters and Symbols instead, probably writing a simple Perl program to parse them into a nice web page.

However, the main difference seems to be with the Euro symbol, "€", according to this, though it says they "have been modified to be valid XML 1.0 entity declarations."

24.1 Introduction to character entity references

A character entity reference is an SGML construct that references a character of the document character set.

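This version of HTML supports several sets of character entity references: ISO 8859-1 (Latin-1) characters; symbols, mathematical symbols, and Greek letters; and markup-significant and internationalization characters.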

The following sections present the complete lists of character entity references. Although, by [ISO10646] convention, the comments following each entry are usually written in uppercase letters, we have converted them to lowercase in this specification for readability.
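
As a rough illustration of what resolving a character entity reference involves, the sketch below replaces named and numeric references in a string with the characters they denote. It assumes Python's standard `html.entities.name2codepoint` table, which covers the HTML 4 entity names; the function name `resolve_references` and the regular expression are hypothetical helpers introduced here, not part of the specification.

```python
import re
from html.entities import name2codepoint

# Matches &name;, &#123; and &#x7B; style references (a simplification).
_REFERENCE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def resolve_references(text):
    """Replace character references in text with the characters they denote.

    Unknown entity names are left untouched. No range checking is done
    on numeric references, so this is a sketch rather than a parser.
    """
    def replace(match):
        body = match.group(1)
        if body.startswith(("#x", "#X")):
            return chr(int(body[2:], 16))   # hexadecimal numeric reference
        if body.startswith("#"):
            return chr(int(body[1:]))       # decimal numeric reference
        codepoint = name2codepoint.get(body)
        return chr(codepoint) if codepoint is not None else match.group(0)

    return _REFERENCE.sub(replace, text)


if __name__ == "__main__":
    print(resolve_references("&copy; &#169; &#xA9; &nosuch;"))
```

For everyday use, the standard library's `html.unescape` does the same job, although it also recognizes the larger HTML5 set of names.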

24.2 Character entity references for ISO 8859-1 characters

The character entity references in this section produce characters whose numeric equivalents should already be supported by conforming HTML 2.0 user agents. Thus, the character entity reference &divide; is a more convenient form than &#247; for obtaining the division sign (÷).

To support these named entities, user agents need only recognize the entity names and convert them to characters that lie within the repertoire of [ISO88591].
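
The relationship between a named entity, its numeric equivalent, and the ISO 8859-1 repertoire can be checked with a short sketch. It assumes Python's `html.entities.name2codepoint` table as a stand-in for the entity list; the helper names `named_and_numeric` and `latin1_entities` are hypothetical.

```python
from html.entities import name2codepoint


def named_and_numeric(name):
    """Return the named and the decimal numeric reference for an entity."""
    codepoint = name2codepoint[name]
    return "&{};".format(name), "&#{};".format(codepoint)


def latin1_entities():
    """Return the entities whose code points lie in 160..255,
    the upper half of ISO 8859-1."""
    return {name: cp for name, cp in name2codepoint.items() if 160 <= cp <= 255}


if __name__ == "__main__":
    print(named_and_numeric("divide"))
    # Every character in this set fits in a single ISO 8859-1 byte.
    for name, cp in sorted(latin1_entities().items(), key=lambda item: item[1]):
        print(name, cp, chr(cp).encode("iso-8859-1"))
```

Because every code point in this set fits in one ISO 8859-1 byte, a user agent limited to that repertoire can still display all of these characters.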

Character 65533 (FFFD hexadecimal) is the last valid character in UCS-2. 65534 (FFFE hexadecimal) is unassigned and reserved as the byte-swapped version of ZERO WIDTH NON-BREAKING SPACE for byte-order detection purposes. 65535 (FFFF hexadecimal) is unassigned.
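
The decimal and hexadecimal values quoted above, and the byte-swapping relationship used for byte-order detection, can be verified with a small sketch. The function names are hypothetical, and the code assumes 16-bit code units.

```python
LAST_VALID_UCS2 = 0xFFFD      # 65533
BYTE_ORDER_MARK = 0xFEFF      # ZERO WIDTH NON-BREAKING SPACE


def byte_swapped(code_unit):
    """Swap the two bytes of a 16-bit code unit (0..0xFFFF)."""
    return ((code_unit & 0xFF) << 8) | (code_unit >> 8)


def detect_byte_order(first_two_bytes):
    """Guess the byte order of UCS-2 data from its first two bytes."""
    value = int.from_bytes(first_two_bytes, "big")
    if value == BYTE_ORDER_MARK:
        return "big-endian"
    if value == byte_swapped(BYTE_ORDER_MARK):   # 0xFFFE, i.e. 65534
        return "little-endian"
    return "unknown"


if __name__ == "__main__":
    print(LAST_VALID_UCS2, byte_swapped(BYTE_ORDER_MARK))
    print(detect_byte_order("\ufeff".encode("utf-16-le")))
```

Since 65534 is never assigned to a character, seeing it at the start of a stream indicates that the bytes are in the opposite order from the one the reader assumed.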

24.2.1 The list of characters

Portions © International Organization for Standardization 1986: Permission to copy in any form is granted for use with conforming SGML systems and applications as defined in ISO 8879, provided this notice is included in all copies.

Character entity set. Typical invocation:

     
     %HTMLlat1;

24.3 Character entity references for symbols, mathematical symbols, and Greek letters

The character entity references in this section produce characters that may be represented by glyphs in the widely available Adobe Symbol font, including Greek characters, various bracketing symbols, and a selection of mathematical operators such as gradient, product, and summation symbols.

To support these entities, user agents may support full [ISO10646] or use other means. Display of glyphs for these characters may be obtained by being able to display the relevant [ISO10646] characters or by other means, such as internally mapping the listed entities, numeric character references, and characters to the appropriate position in some font that contains the requisite glyphs.

When to use Greek entities. This entity set contains all the letters used in modern Greek. However, it does not include Greek punctuation, precomposed accented characters, or the non-spacing accents (tonos, dialytika) required to compose them. There are no archaic letters, Coptic-unique letters, or precomposed letters for Polytonic Greek. The entities defined here are not intended for the representation of modern Greek text and would not be an efficient representation; rather, they are intended for occasional Greek letters used in technical and mathematical works.
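
The limits of the Greek entity set can be seen in a short sketch: plain letters have names, while a precomposed accented letter, a combining accent, and Greek punctuation do not. It assumes Python's `html.entities` tables as a stand-in for the entity list, and the range U+0370 to U+03FF as the Greek block; the helper names are hypothetical.

```python
from html.entities import codepoint2name, name2codepoint


def greek_entities():
    """Return the entity names whose code points fall in U+0370..U+03FF."""
    return sorted(
        name for name, cp in name2codepoint.items() if 0x0370 <= cp <= 0x03FF
    )


def entity_for(char):
    """Return the named reference for a character, or None if it has none."""
    name = codepoint2name.get(ord(char))
    return "&{};".format(name) if name else None


if __name__ == "__main__":
    print(greek_entities())
    print(entity_for("\u03b1"))   # GREEK SMALL LETTER ALPHA
    print(entity_for("\u03ac"))   # GREEK SMALL LETTER ALPHA WITH TONOS
    print(entity_for("\u0301"))   # COMBINING ACUTE ACCENT
    print(entity_for("\u037e"))   # GREEK QUESTION MARK
```

Running text in modern Greek is therefore better written directly in a suitable character encoding than spelled out entity by entity.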

24.3.1 The list of characters

Mathematical, Greek and Symbolic characters for HTML

Character entity set. Typical invocation:

     
     %HTMLsymbol;

Portions © International Organization for Standardization 1986: Permission to copy in any form is granted for use with conforming SGML systems and applications as defined in ISO 8879, provided this notice is included in all copies.

Relevant ISO entity set is given unless names are newly introduced. New names (i.e., not in ISO 8879 list) do not clash with any existing ISO 8879 entity names. ISO 10646 character numbers are given for each character, in hex. CDATA values are decimal conversions of the ISO 10646 values and refer to the document character set. Names are ISO 10646 names.
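
The three pieces of information described above (the hexadecimal ISO 10646 number, its decimal CDATA conversion, and the character name) can be derived for any entity with a small sketch. It assumes Python's `html.entities` and `unicodedata` modules, and that Unicode character names coincide with the ISO 10646 names; the function `describe` and its dictionary keys are hypothetical.

```python
import unicodedata
from html.entities import name2codepoint


def describe(name):
    """Return the decimal CDATA value, hex number and lowercase name
    for an HTML 4 entity."""
    codepoint = name2codepoint[name]
    return {
        "name": name,
        "cdata": "&#{};".format(codepoint),          # decimal conversion
        "hex": "U+{:04X}".format(codepoint),         # ISO 10646 number in hex
        "iso10646_name": unicodedata.name(chr(codepoint)).lower(),
    }


if __name__ == "__main__":
    for entity in ("alpha", "rarr", "sum"):
        print(describe(entity))
```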

Latin Extended-B

Greek

General Punctuation

Letterlike Symbols

Arrows

Mathematical Operators

Miscellaneous Technical

Geometric Shapes

Miscellaneous Symbols
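
The titles above are ISO 10646 block names. Since the entity listings themselves are not reproduced here, the following sketch regroups the entities under those titles by code point range. It assumes Python's `html.entities.name2codepoint` table and the block boundaries given in the `BLOCKS` list, which are taken from the Unicode block definitions rather than from this page. Note that the table mixes all three entity sets, so the General Punctuation group also picks up entities from the special-characters set.

```python
from html.entities import name2codepoint

# (title, first code point, last code point); ranges are an assumption
# based on the Unicode block definitions.
BLOCKS = [
    ("Latin Extended-B", 0x0180, 0x024F),
    ("Greek", 0x0370, 0x03FF),
    ("General Punctuation", 0x2000, 0x206F),
    ("Letterlike Symbols", 0x2100, 0x214F),
    ("Arrows", 0x2190, 0x21FF),
    ("Mathematical Operators", 0x2200, 0x22FF),
    ("Miscellaneous Technical", 0x2300, 0x23FF),
    ("Geometric Shapes", 0x25A0, 0x25FF),
    ("Miscellaneous Symbols", 0x2600, 0x26FF),
]


def entities_by_block():
    """Group entity names by block, in code point order within each block."""
    groups = {title: [] for title, _, _ in BLOCKS}
    for name, cp in sorted(name2codepoint.items(), key=lambda item: item[1]):
        for title, first, last in BLOCKS:
            if first <= cp <= last:
                groups[title].append(name)
                break
    return groups


if __name__ == "__main__":
    for title, names in entities_by_block().items():
        print(title, ":", " ".join(names))
```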

24.4 Character entity references for markup-significant and internationalization characters

The character entity references in this section are for escaping markup-significant characters (these are the same as those in HTML 2.0 and 3.2) and for denoting spaces and dashes. Other characters in this section apply to internationalization issues such as the disambiguation of bidirectional text (see the section on bidirectional text for details).

Entities have also been added for the remaining characters occurring in CP-1252 which do not occur in the HTMLlat1 or HTMLsymbol entity sets. These all occur in the 128 to 159 range within the CP-1252 charset. These entities permit the characters to be denoted in a platform-independent manner.

To support these entities, user agents may support full [ISO10646] or use other means. Display of glyphs for these characters may be obtained by being able to display the relevant [ISO10646] characters or by other means, such as internally mapping the listed entities, numeric character references, and characters to the appropriate position in some font that contains the requisite glyphs.
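
Two of the uses described above can be sketched briefly: escaping markup-significant characters, and naming a CP-1252 byte in the 128 to 159 range in a platform-independent way. The sketch assumes Python's `html.escape`, the `cp1252` codec, and `html.entities.codepoint2name`; the helper names `escape_markup` and `cp1252_entity` are hypothetical.

```python
import html
from html.entities import codepoint2name


def escape_markup(text):
    """Escape &, <, > and quotation marks so text can be placed in markup."""
    return html.escape(text, quote=True)


def cp1252_entity(byte_value):
    """Return the entity name for a CP-1252 byte, or None if the byte is
    unassigned in CP-1252 or the character has no HTML 4 entity."""
    try:
        char = bytes([byte_value]).decode("cp1252")
    except UnicodeDecodeError:
        return None
    return codepoint2name.get(ord(char))


if __name__ == "__main__":
    print(escape_markup('a < b & "c"'))
    for byte_value in range(128, 160):
        print(hex(byte_value), cp1252_entity(byte_value))
```

`html.escape` writes the apostrophe as a numeric reference, since the HTML 4 entity sets define no name for it.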

24.4.1 The list of characters

Special characters for HTML

Character entity set. Typical invocation:

     
     %HTMLspecial;

Portions © International Organization for Standardization 1986: Permission to copy in any form is granted for use with conforming SGML systems and applications as defined in ISO 8879, provided this notice is included in all copies.

Relevant ISO entity set is given unless names are newly introduced. New names (i.e., not in ISO 8879 list) do not clash with any existing ISO 8879 entity names. ISO 10646 character numbers are given for each character, in hex. CDATA values are decimal conversions of the ISO 10646 values and refer to the document character set. Names are ISO 10646 names.

C0 Controls and Basic Latin

Latin Extended-A

Spacing Modifier Letters

General Punctuation