Decode any character.

The complete reference for Unicode and character encoding. Look up any of 138,571+ characters and see exactly how it's encoded in UTF-8, UTF-16, ASCII, Latin-1, Windows-1252, Shift-JIS, and more.

Common Questions

What is UTF-8?

UTF-8 is a variable-width character encoding that can represent every character in Unicode. It uses 1 to 4 bytes per character and is backwards-compatible with ASCII. It is the dominant encoding on the web, used by over 98% of websites.

UTF-8 reference →
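As a rough illustration of the 1-to-4-byte widths described above, the sketch below encodes one sample character per width using Python's built-in `str.encode`. The helper name `utf8_bytes` and the sample characters are illustrative choices, not part of this reference.

```python
def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 encoding of a single character."""
    return ch.encode("utf-8")


# One sample per byte width: U+0041, U+00E9, U+20AC, U+1F600.
samples = ["A", "é", "€", "😀"]
widths = [len(utf8_bytes(ch)) for ch in samples]

for ch in samples:
    encoded = utf8_bytes(ch)
    # bytes.hex(" ") needs Python 3.8 or later.
    print(f"U+{ord(ch):04X} {ch} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
```

The ASCII letter keeps its single ASCII byte, which is what makes UTF-8 backwards-compatible with ASCII.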

What is mojibake?

Mojibake (文字化け) is garbled text that results from decoding bytes with the wrong character encoding. For example, reading UTF-8 text as Windows-1252 produces sequences like Ã© instead of é.

Fix mojibake →
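The UTF-8-read-as-Windows-1252 case can be reproduced, and often reversed, in a few lines. This is a minimal sketch with hypothetical helper names (`make_mojibake`, `repair_mojibake`). It assumes exactly one wrong decoding step happened. Windows-1252 leaves a few byte values undefined, so some inputs will raise an error instead of round-tripping.

```python
def make_mojibake(text: str) -> str:
    """Encode as UTF-8, then (wrongly) decode the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair_mojibake(garbled: str) -> str:
    """Undo a single UTF-8 -> Windows-1252 misdecoding step."""
    return garbled.encode("cp1252").decode("utf-8")


garbled = make_mojibake("café")
print(garbled)                   # the é shows up as two characters
print(repair_mojibake(garbled))  # back to the original text
```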

UTF-8 vs UTF-16

UTF-8 is variable-width (1–4 bytes) and ASCII-compatible. UTF-16 uses 2 bytes for most characters and 4 for supplementary ones. UTF-8 is preferred for files and web; UTF-16 is used internally by Windows, Java, and JavaScript engines.

UTF-16 reference →
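To make the size trade-off concrete, the following sketch compares encoded lengths for a few characters. The helper `encoded_sizes` is a hypothetical name. It uses the `utf-16-le` codec so that Python does not prepend a BOM to the UTF-16 output.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for the given text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for ch in ["A", "é", "€", "😀"]:
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} byte(s), UTF-16 {utf16_len} byte(s)")
```

ASCII-heavy text comes out smaller in UTF-8. Characters such as € (U+20AC) take 3 bytes in UTF-8 but 2 in UTF-16. Supplementary characters take 4 bytes in both.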

What is a Unicode codepoint?

A codepoint is a number that uniquely identifies a character in the Unicode standard, written as U+XXXX in hexadecimal. For example, U+0041 is the Latin capital letter A. The Unicode codespace spans 17 planes (U+0000 to U+10FFFF), and over 140,000 of its codepoints are assigned to characters.

Browse characters →
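A short sketch of how a codepoint can be inspected in Python: `ord` gives the number, the plane is the codepoint divided by 0x10000 (each plane holds 65,536 codepoints), and the standard `unicodedata` module supplies the character name. The function name `describe` is illustrative only.

```python
import unicodedata


def describe(ch: str) -> tuple[str, int, str]:
    """Return (U+XXXX notation, plane number, Unicode name) for one character."""
    cp = ord(ch)
    notation = f"U+{cp:04X}"   # at least four hex digits, more if needed
    plane = cp >> 16           # same as cp // 0x10000
    name = unicodedata.name(ch, "<unnamed>")
    return notation, plane, name


print(describe("A"))
print(describe("😀"))
```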

What is ASCII?

ASCII (American Standard Code for Information Interchange) is a 7-bit encoding defining 128 characters: English letters, digits, punctuation, and control codes. Every ASCII character has the same byte value in UTF-8, Latin-1, and Windows-1252.

ASCII reference →
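The shared-byte-value claim can be checked directly. The sketch below encodes each of the 128 ASCII characters with Python's `utf-8`, `latin-1`, and `cp1252` codecs and confirms that each produces the single byte equal to the codepoint. The names `ENCODINGS` and `byte_forms` are hypothetical.

```python
ENCODINGS = ("utf-8", "latin-1", "cp1252")


def byte_forms(ch: str) -> dict[str, bytes]:
    """Encode one character in each of the ASCII-compatible encodings."""
    return {enc: ch.encode(enc) for enc in ENCODINGS}


# True when every ASCII codepoint encodes to the same single byte everywhere.
all_ascii_match = all(
    set(byte_forms(chr(cp)).values()) == {bytes([cp])} for cp in range(128)
)
print(all_ascii_match)

# Outside the ASCII range the encodings diverge.
print(byte_forms("é"))
```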

What is a BOM?

A Byte Order Mark (BOM) is the character U+FEFF placed at the start of a file to signal the encoding and byte order. The UTF-8 BOM is EF BB BF; the UTF-16 LE BOM is FF FE. A BOM is optional in UTF-8, where byte order is not an issue, and it often causes problems with tools that do not expect it.

Encode text →
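BOM detection amounts to a prefix check on the first few bytes. The sketch below uses the BOM constants from Python's standard `codecs` module. The function name `sniff_bom` and the returned labels are illustrative assumptions. UTF-32 BOMs are checked first because the UTF-32 LE BOM (FF FE 00 00) begins with the same two bytes as the UTF-16 LE BOM.

```python
import codecs

# Longer BOMs first so UTF-32 LE is not mistaken for UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding label or None, data with any leading BOM removed)."""
    for bom, label in BOMS:
        if data.startswith(bom):
            return label, data[len(bom):]
    return None, data


print(sniff_bom(b"\xef\xbb\xbfhi"))
print(sniff_bom(b"hi"))
```

When reading UTF-8 that may or may not carry a BOM, Python's `utf-8-sig` codec strips the BOM if it is present and otherwise behaves like `utf-8`.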

Character Encodings

All encodings →

A character encoding is the rule that maps a character's number (codepoint) to the bytes stored on disk or sent over a network. Different encodings cover different languages and use different numbers of bytes — choosing the wrong one is the most common cause of garbled text.
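To see the codepoint-to-bytes mapping in practice, the sketch below encodes a single character under several encodings and records the bytes, or `None` where the encoding has no representation for it. The helper name `encode_in` is hypothetical, and the codec names are the ones Python's standard library uses.

```python
def encode_in(ch: str, encodings: list[str]) -> dict:
    """Map each encoding name to the character's bytes as hex, or None if unencodable."""
    result = {}
    for enc in encodings:
        try:
            result[enc] = ch.encode(enc).hex(" ")
        except UnicodeEncodeError:
            # This encoding does not cover the character.
            result[enc] = None
    return result


table = encode_in("é", ["ascii", "latin-1", "cp1252", "utf-8", "utf-16-le", "utf-32-le"])
for enc, hex_bytes in table.items():
    print(f"{enc:10} {hex_bytes}")
```

The same codepoint (U+00E9) yields one, two, or four bytes depending on the encoding, and no bytes at all in ASCII.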

UTF-8 Unicode

The dominant encoding for the web. Variable-width (1–4 bytes). Fully backwards-c...

Since 1993

UTF-16 LE Unicode

Little-endian UTF-16. Used internally by Windows, Java, and .NET. Variable-width...

Since 1996

UTF-16 BE Unicode

Big-endian UTF-16. Network byte-order variant of UTF-16. Used in some network pr...

Since 1996

UTF-32 LE Unicode

Fixed-width encoding using 4 bytes per character. Simple to process but memory-i...

Since 2003

UTF-32 BE Unicode

Fixed-width encoding using 4 bytes per character. Big-endian byte order. Rarely...

Since 2003

ASCII

The original 7-bit character encoding standard. Covers 128 characters: English l...

Since 1963

Latin-1 (ISO-8859-1)

Extends ASCII to 256 characters, covering most Western European languages. The f...

Since 1987

Windows-1252

Microsoft's extension of Latin-1. Assigns printable characters to the C1 control...

Since 1985

ISO-8859-2 (Latin-2)

Covers Central and Eastern European languages using Latin script: Polish, Czech,...

Since 1987

ISO-8859-5 (Cyrillic)

ISO standard for Cyrillic script. Covers Russian, Bulgarian, Serbian, Macedonian...

Since 1988

KOI8-R

Russian character encoding widely used in Unix systems and early internet. Desig...

Since 1993

Shift-JIS

Variable-width encoding for Japanese. Single-byte for ASCII and half-width kana,...

Since 1982

EUC-JP

Extended Unix Code for Japanese. Variable-width encoding common in Unix/Linux Ja...

Since 1991

GBK

Chinese national standard encoding for Simplified Chinese. Superset of GB2312. V...

Since 1993

Big5

Traditional Chinese encoding used in Taiwan, Hong Kong, and Macau. Variable-widt...

Since 1984

Browse by Block

All blocks →

Unicode organises its 138,571+ characters into named blocks — contiguous ranges of codepoints grouped by script or purpose. Select a block to browse every character it contains, or click any character to see how it's encoded across all supported encodings.
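Because blocks are contiguous codepoint ranges, finding a character's block is a simple range lookup. Python's standard library does not expose block names, so the sketch below hard-codes a small illustrative table of four blocks. The names `BLOCKS` and `block_of` are hypothetical, and a real lookup would load the full block list from the Unicode Character Database.

```python
# (first codepoint, last codepoint, block name); a tiny sample, not the full list.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x3040, 0x309F, "Hiragana"),
    (0x1F600, 0x1F64F, "Emoticons"),
]


def block_of(ch: str):
    """Return the block name for a character, or None if it is not in the sample table."""
    cp = ord(ch)
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return None


for ch in ["A", "é", "あ", "😀"]:
    print(f"U+{ord(ch):04X} {ch}: {block_of(ch)}")
```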

Got garbled text?

Paste it into our Mojibake Decoder to diagnose the encoding mismatch.