HTML Reference Manual

HTML Character Sets

The character set determines how the bytes representing the text of your HTML document are translated into readable characters. Numeric or hexadecimal character references (such as "&#12345;" or "&#x1234;") are interpreted according to ISO 10646 code points, so their meaning is consistent and independent of the selected character set.
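
The point about character references can be tried out with a short Python sketch. It uses the standard library's `html.unescape` as a stand-in for what a browser does with a reference; the two sample references are illustrative values, not taken from any particular page.

```python
import html

# Decimal and hexadecimal references name a code point directly,
# so they resolve the same way whatever charset the page declares.
decimal_ref = html.unescape("&#12345;")  # decimal code point 12345
hex_ref = html.unescape("&#x1234;")      # hexadecimal code point 0x1234

print(hex(ord(decimal_ref)))  # code point behind the decimal reference
print(hex(ord(hex_ref)))      # code point behind the hexadecimal reference
```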

To display HTML pages correctly, the browser must know which character set to use.
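
A small Python sketch of what goes wrong when the wrong character set is assumed. The sample word and the choice of the two encodings are illustrative assumptions.

```python
# The text "café" as it might be stored on disk, encoded as UTF-8.
page_bytes = "café".encode("utf-8")

# Decoded with the charset the author used, the text comes back intact.
right = page_bytes.decode("utf-8")

# Decoded with a different charset, the same bytes become other characters.
wrong = page_bytes.decode("iso-8859-1")

print(right)
print(wrong)
```

The bytes are identical in both cases; only the assumed character set differs.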

The character set used in the early days of the World Wide Web was ASCII. ASCII supports the digits 0-9, the uppercase and lowercase letters of the English alphabet, and some special characters.
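
As a rough illustration of ASCII's limits, the following Python sketch checks whether a string can be encoded as ASCII. The helper name `fits_in_ascii` and the sample strings are hypothetical.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character of text can be encoded as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

print(fits_in_ascii("Hello, World! 0123456789"))  # English letters, digits, punctuation
print(fits_in_ascii("Grüße"))                     # contains non-ASCII letters
```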

Complete ASCII Reference Manual.

Since many countries use characters that do not belong to ASCII, browsers historically used ISO-8859-1 as their default character set. Current HTML specifications recommend UTF-8 instead.

Complete ISO-8859-1 Reference Manual.

If a web page uses a character set other than ISO-8859-1, it should be specified in the <meta> tag.
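
The sketch below shows two common forms of the <meta> declaration as sample markup, plus a hypothetical Python helper, `declared_charset`, that reads the declared name. The pattern is deliberately simple and is only an approximation of the detection a real browser performs; the fallback value follows the default described above.

```python
import re

# Two common ways a page declares its character set (sample markup).
HTML5_STYLE = b'<head><meta charset="ISO-8859-2"><title>t</title></head>'
HTML4_STYLE = (b'<head><meta http-equiv="Content-Type" '
               b'content="text/html; charset=ISO-8859-7"></head>')

# Hypothetical, simplified pattern: first "charset=" inside a <meta> tag.
META_CHARSET = re.compile(rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_-]+)',
                          re.IGNORECASE)

def declared_charset(raw_html: bytes, default: str = "ISO-8859-1") -> str:
    """Return the charset named in the first <meta> tag, or the default."""
    match = META_CHARSET.search(raw_html)
    return match.group(1).decode("ascii") if match else default

print(declared_charset(HTML5_STYLE))
print(declared_charset(HTML4_STYLE))
print(declared_charset(b"<p>no declaration</p>"))
```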

ISO character sets

ISO character sets are standard character sets defined by the International Organization for Standardization (ISO) for different alphabets and languages.

The following table lists different character sets used worldwide:

Character Set	Description	Scope of use
ISO-8859-1	Latin alphabet part 1	North America, Western Europe, Latin America, Caribbean, Canada, Africa
ISO-8859-2	Latin alphabet part 2	Eastern Europe
ISO-8859-3	Latin alphabet part 3	Southeastern Europe, Esperanto, and miscellaneous other languages
ISO-8859-4	Latin alphabet part 4	Scandinavia and the Baltic states (and other languages not covered by ISO-8859-1)
ISO-8859-5	Latin/Cyrillic part 5	Languages using the Cyrillic alphabet, such as Bulgarian, Belarusian, Russian, and Macedonian
ISO-8859-6	Latin/Arabic part 6	Languages using the Arabic alphabet
ISO-8859-7	Latin/Greek part 7	Modern Greek, as well as mathematical symbols derived from Greek
ISO-8859-8	Latin/Hebrew part 8	Languages using the Hebrew alphabet
ISO-8859-9	Latin 5 part 9	Turkish. Identical to ISO-8859-1 except that the Icelandic letters are replaced by Turkish ones.
ISO-8859-10	Latin 6	Sami (Lappish), Nordic, and Inuit languages
ISO-8859-15	Latin 9 (also known as Latin 0)	Similar to ISO-8859-1, but the euro symbol and some other characters replace some less commonly used symbols
ISO-2022-JP	Latin/Japanese part 1	Japanese
ISO-2022-JP-2	Latin/Japanese part 2	Japanese
ISO-2022-KR	Latin/Korean part 1	Korean
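
To see how the parts in the table differ, the following Python sketch decodes one and the same byte under three of them. The byte value 0xE9 and the three chosen parts are arbitrary picks for illustration.

```python
raw = bytes([0xE9])  # a single byte with the value 233

# The same byte stands for a different character in each part.
decoded = {
    charset: raw.decode(charset)
    for charset in ("iso-8859-1", "iso-8859-5", "iso-8859-7")
}

for charset, char in decoded.items():
    print(charset, char, f"U+{ord(char):04X}")
```

Each part reuses the upper half of the byte range for its own alphabet, which is why a page must state which part it uses.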

Unicode standard

Because all the character sets listed above are limited in size and cannot support multilingual environments, the Unicode Consortium developed the Unicode standard.

The Unicode standard aims to cover all the characters, punctuation marks, and symbols in the world.

Unicode allows text to be processed, stored, and exchanged regardless of platform, program, or language.

Unicode Consortium

The Unicode Consortium develops the Unicode standard. Its goal is to replace the existing character sets with its standard Unicode Transformation Format (UTF).

The Unicode standard has been successful and is implemented in XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, and WML. Unicode is also supported by many operating systems and all modern browsers.

The Unicode Consortium collaborates with leading standards development organizations, such as ISO, W3C, and ECMA.

Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8 and UTF-16:

Character Set	Description
UTF-8	A character in UTF-8 can be 1 to 4 bytes long. UTF-8 can represent any character in the Unicode standard and is backward compatible with ASCII. UTF-8 is the preferred encoding for web pages and email.
UTF-16	The 16-bit Unicode Transformation Format is a variable-length character encoding that can encode the entire Unicode repertoire. UTF-16 is mainly used in operating systems and environments such as Microsoft Windows 2000/XP/2003/Vista/CE, Java, and .NET bytecode environments.
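
The byte lengths described in the table can be checked with a brief Python sketch. The four sample characters are illustrative choices, and `utf-16-le` is used so that no byte order mark is counted.

```python
samples = ["A", "é", "€", "😀"]

# Number of bytes each character occupies in the two encodings.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
utf16_lengths = {ch: len(ch.encode("utf-16-le")) for ch in samples}

for ch in samples:
    print(ch, utf8_lengths[ch], utf16_lengths[ch])

# Backward compatibility: ASCII text has the same bytes in UTF-8.
ascii_compatible = "Hello".encode("utf-8") == "Hello".encode("ascii")
print(ascii_compatible)
```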

Tip: The first 256 characters of Unicode correspond to the 256 characters of ISO-8859-1.
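
This correspondence can be confirmed with a few lines of Python, assuming Python's built-in `iso-8859-1` codec as the reference mapping:

```python
all_bytes = bytes(range(256))

# Decode every possible byte value as ISO-8859-1.
decoded = all_bytes.decode("iso-8859-1")

# Each byte value should equal the Unicode code point it decodes to.
matches = all(ord(ch) == byte for ch, byte in zip(decoded, all_bytes))
print(matches)
```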

Tip: All HTML 4 processors support UTF-8, and all XHTML and XML processors support both UTF-8 and UTF-16.
