HTML Character Sets

The character set determines how the bytes that represent the text of your HTML document are translated into readable characters. Numeric or hexadecimal character references (such as "&#12345;" or "&#x1234;") are interpreted according to ISO 10646 code points; this interpretation is consistent and independent of the selected character set.
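As a small illustration of the paragraph above, the sketch below resolves one decimal and one hexadecimal character reference with `html.unescape` from Python's standard library. It assumes nothing about any particular browser; the variable names are illustrative only.

```python
import html

# Decimal reference: 12345 is the code point number written in base 10.
decimal_char = html.unescape("&#12345;")
# Hexadecimal reference: 1234 is the code point number written in base 16.
hex_char = html.unescape("&#x1234;")

print("U+%04X" % ord(decimal_char))
print("U+%04X" % ord(hex_char))

# A reference names a code point, not a byte sequence, so the resolved
# character is the same even though its encoded bytes differ per encoding.
utf8_bytes = decimal_char.encode("utf-8")
utf16_bytes = decimal_char.encode("utf-16-be")
```

The same reference therefore yields the same character whether the page itself is served as ISO-8859-1, UTF-8, or any other character set.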

HTML Character Sets

To display HTML pages correctly, the browser must know which character set to use.
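To see why the browser needs this information, the following sketch decodes the same byte under three of the ISO character sets described on this page. The byte value 0xE9 is chosen only for illustration, and the codec names are the ones used by Python's standard library.

```python
raw = b"\xe9"  # a single byte, as it might arrive over the network

as_latin1 = raw.decode("iso8859_1")    # Western European reading
as_cyrillic = raw.decode("iso8859_5")  # Cyrillic reading
as_greek = raw.decode("iso8859_7")     # Greek reading

for label, text in [("ISO-8859-1", as_latin1),
                    ("ISO-8859-5", as_cyrillic),
                    ("ISO-8859-7", as_greek)]:
    # Print the code point each character set assigns to the byte.
    print(label, "U+%04X" % ord(text), text)
```

One byte yields three different letters, so a page decoded with the wrong character set shows the wrong text.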

The character set used in the early days of the World Wide Web was ASCII. ASCII supports the digits 0-9, the uppercase and lowercase letters of the English alphabet, and some special characters.

Complete ASCII Reference Manual.

Because many countries use characters that are not part of ASCII, browsers adopted ISO-8859-1 as their default character set.

Complete ISO-8859-1 Reference Manual.

If a web page uses a character set other than ISO-8859-1, that character set should be specified in the <meta> tag.
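The declaration is usually written either as `<meta charset="...">` or, in the older HTML 4 style, as `<meta http-equiv="Content-Type" content="text/html; charset=...">`. The sketch below shows one way such a declaration could be read with Python's standard `html.parser` module. The names `MetaCharsetFinder` and `declared_charset` are hypothetical helpers written for this example, and real browsers follow a more elaborate detection procedure than this.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Hypothetical helper: remembers the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        # Short form: <meta charset="...">
        if attributes.get("charset"):
            self.charset = attributes["charset"].strip().lower()
            return
        # Long form: <meta http-equiv="Content-Type" content="...; charset=...">
        http_equiv = (attributes.get("http-equiv") or "").lower()
        content = (attributes.get("content") or "").lower()
        if http_equiv == "content-type" and "charset=" in content:
            self.charset = content.split("charset=", 1)[1].split(";")[0].strip()


def declared_charset(html_text):
    """Return the lowercase charset named in the markup, or None if absent."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset


short_form = declared_charset('<head><meta charset="ISO-8859-2"></head>')
long_form = declared_charset(
    '<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-5">')
missing = declared_charset("<head><title>No declaration</title></head>")
```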

ISO character sets

ISO character sets are the standard character sets defined by the International Organization for Standardization (ISO) for different alphabets and languages.

The following table lists the character sets used around the world:

Character Set | Description | Scope of use
ISO-8859-1 | Latin alphabet part 1 | North America, Western Europe, Latin America, the Caribbean, Canada, Africa
ISO-8859-2 | Latin alphabet part 2 | Eastern Europe
ISO-8859-3 | Latin alphabet part 3 | Southeast Europe, Esperanto, and miscellaneous others
ISO-8859-4 | Latin alphabet part 4 | Scandinavia and the Baltic states (and other languages not covered by ISO-8859-1)
ISO-8859-5 | Latin/Cyrillic part 5 | Languages that use the Cyrillic alphabet, such as Bulgarian, Belarusian, Russian, and Macedonian
ISO-8859-6 | Latin/Arabic part 6 | Languages that use the Arabic alphabet
ISO-8859-7 | Latin/Greek part 7 | Modern Greek, as well as mathematical symbols derived from Greek
ISO-8859-8 | Latin/Hebrew part 8 | Languages that use the Hebrew alphabet
ISO-8859-9 | Latin 5 part 9 | Turkish. Identical to ISO-8859-1 except that Turkish characters replace the Icelandic letters.
ISO-8859-10 | Latin 6 | Sami (Lappish), Nordic, and Inuit languages
ISO-8859-15 | Latin 9 (also known as Latin 0) | Similar to ISO-8859-1, but the euro sign and a few other characters replace some less commonly used symbols
ISO-2022-JP | Latin/Japanese part 1 | Japanese
ISO-2022-JP-2 | Latin/Japanese part 2 | Japanese
ISO-2022-KR | Latin/Korean part 1 | Korean
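The relationship between ISO-8859-1 and its close variants can be checked directly. The sketch below uses the codecs shipped with Python to compare ISO-8859-1 with ISO-8859-9 (Turkish) and ISO-8859-15 (euro sign). `differing_bytes` is a hypothetical helper written for this example.

```python
def differing_bytes(codec_a, codec_b):
    """Return the byte values that two single-byte codecs decode differently."""
    return [b for b in range(256)
            if bytes([b]).decode(codec_a) != bytes([b]).decode(codec_b)]


# Byte 0xA4: generic currency sign in ISO-8859-1, euro sign in ISO-8859-15.
currency_latin1 = b"\xa4".decode("iso8859_1")
currency_latin9 = b"\xa4".decode("iso8859_15")

# Byte 0xFD: "y with acute" in ISO-8859-1, Turkish dotless i in ISO-8859-9.
letter_latin1 = b"\xfd".decode("iso8859_1")
letter_latin5 = b"\xfd".decode("iso8859_9")

turkish_changes = differing_bytes("iso8859_1", "iso8859_9")
euro_changes = differing_bytes("iso8859_1", "iso8859_15")

print([hex(b) for b in turkish_changes])
print([hex(b) for b in euro_changes])
```

Only a handful of byte values differ in each case, which is why text in one of these character sets often looks almost, but not quite, right when decoded as another.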

Unicode standard

Because every character set listed above is limited in size and cannot support multilingual environments, the Unicode Consortium developed the Unicode standard.

The Unicode standard covers nearly all of the characters, punctuation marks, and symbols in the world.

Unicode allows text to be processed, stored, and exchanged independently of platform, program, or language.

Unicode Consortium

The Unicode Consortium developed the Unicode standard. Its goal is to replace the existing character sets with the standard Unicode Transformation Format (UTF).

The Unicode standard has been a success: it is implemented in XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, and WML. Unicode is also supported by many operating systems and by all modern browsers.

The Unicode Consortium collaborates with the leading standards development organizations, such as ISO, W3C, and ECMA.

Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8 and UTF-16:

Character Set | Description
UTF-8 | A character in UTF-8 can be 1 to 4 bytes long. UTF-8 can represent any character in the Unicode standard. UTF-8 is backward compatible with ASCII. UTF-8 is the preferred encoding for web pages and email.
UTF-16 | The 16-bit Unicode Transformation Format is a variable-length character encoding that can encode the entire Unicode repertoire. UTF-16 is used mainly in operating systems and environments such as Microsoft Windows 2000/XP/2003/Vista/CE, Java, and the .NET bytecode environments.
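The variable lengths described in the table can be observed with a few sample characters. This is a minimal sketch using Python's built-in codecs; the four sample characters are arbitrary choices for illustration.

```python
# Letter A, e with acute, euro sign, and an emoji outside the 16-bit range.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-le")  # explicit byte order, so no BOM is added
    print("U+%04X" % ord(ch), len(utf8), "bytes in UTF-8,",
          len(utf16), "bytes in UTF-16")

utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
utf16_lengths = [len(ch.encode("utf-16-le")) for ch in samples]

# ASCII text is byte-for-byte identical in ASCII and in UTF-8.
ascii_compatible = "Hello, HTML".encode("ascii") == "Hello, HTML".encode("utf-8")
```

UTF-8 grows from one to four bytes as the code point increases, while UTF-16 uses two bytes for most characters and four bytes for those beyond the 16-bit range.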

Tip: The first 256 characters of Unicode correspond to the 256 characters of ISO-8859-1.
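This correspondence is easy to verify. The sketch below decodes every possible byte value as ISO-8859-1 and compares the resulting code points with the byte values themselves; the variable names are illustrative only.

```python
all_bytes = bytes(range(256))

# Decode each byte value 0..255 as ISO-8859-1.
decoded = all_bytes.decode("iso8859_1")
code_points = [ord(ch) for ch in decoded]

# Each byte value maps to the Unicode code point with the same number.
matches = code_points == list(range(256))
print(matches)
```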

Tip: All HTML 4 processors support UTF-8, and all XHTML and XML processors support both UTF-8 and UTF-16.