Unicode
Most Java program text consists of ASCII characters, but Unicode characters can be used as well. Identifier names may use letters and digits from any script, and any Unicode character may appear in comments and in character and string literals. For example, π (U+03C0, the Greek small letter pi) is a valid Java identifier:
Code section 3.100: Pi.
    double π = Math.PI;
π can also appear in a string literal:
Code section 3.101: Pi literal.
    String pi = "π";
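The two snippets above can be combined into a small runnable program. This is a sketch that assumes the source file is saved as UTF-8 (on JDKs before 18, pass `-encoding UTF-8` to javac). The class `UnicodeIdentifiers` and its helper `hasIdentifierChars` are hypothetical; the helper only relies on `Character.isJavaIdentifierStart` and `Character.isJavaIdentifierPart` from `java.lang`, which report whether a character may start or continue a Java identifier.

```java
public class UnicodeIdentifiers {
    // Hypothetical helper: true if every character of name is allowed at its position
    // in a Java identifier. Keywords (e.g. "class") are not checked here.
    static boolean hasIdentifierChars(String name) {
        if (name.isEmpty() || !Character.isJavaIdentifierStart(name.codePointAt(0))) {
            return false;
        }
        for (int i = Character.charCount(name.codePointAt(0)); i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp); // step over surrogate pairs as one character
        }
        return true;
    }

    public static void main(String[] args) {
        double π = Math.PI;   // π is a letter, so it can name a variable
        String pi = "π";      // and it can appear inside a string literal
        System.out.println(π);                         // 3.141592653589793
        System.out.println(pi.length());               // 1
        System.out.println(hasIdentifierChars("π"));   // true
        System.out.println(hasIdentifierChars("2π"));  // false: a digit cannot start an identifier
        System.out.println(hasIdentifierChars("∞"));   // false: a math symbol, not a letter
    }
}
```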
Unicode escape sequences
Unicode characters can also be expressed through Unicode escape sequences. Unicode escape sequences may appear anywhere in a Java source file (including inside identifiers, comments, and string literals), because the compiler translates them into the characters they represent before it does any other processing of the source.
Unicode escape sequences consist of
- a backslash '\' (ASCII character 92, hex 0x5c),
- a 'u' (ASCII 117, hex 0x75),
- optionally one or more additional 'u' characters, and
- four hexadecimal digits (the characters '0' through '9', 'a' through 'f', or 'A' through 'F').
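The grammar in the list above is simple enough to decode by hand. The sketch below mimics the compiler's first translation step; the class `UnicodeEscapeDecoder` and its method `translate` are hypothetical. It also applies one rule from the Java Language Specification that the list does not mention: a backslash that is preceded by an odd number of contiguous backslashes cannot start an escape.

```java
public class UnicodeEscapeDecoder {
    // Hypothetical helper mimicking the compiler's first translation step:
    // an eligible backslash, one or more 'u' characters and four hex digits
    // become the single UTF-16 code unit those digits name.
    static String translate(String raw) {
        StringBuilder out = new StringBuilder();
        int backslashRun = 0; // raw backslashes immediately before position i
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            boolean eligible = c == '\\' && backslashRun % 2 == 0;
            if (eligible && i + 1 < raw.length() && raw.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < raw.length() && raw.charAt(j) == 'u') {
                    j++; // the first 'u' plus any optional extra ones
                }
                int unit = 0;
                for (int k = 0; k < 4; k++) {
                    int v = j + k < raw.length() ? hexValue(raw.charAt(j + k)) : -1;
                    if (v < 0) {
                        throw new IllegalArgumentException("malformed Unicode escape at index " + i);
                    }
                    unit = unit * 16 + v;
                }
                out.append((char) unit);
                i = j + 4;
                backslashRun = 0; // the escape ended with a hex digit, not a backslash
            } else {
                out.append(c);
                backslashRun = (c == '\\') ? backslashRun + 1 : 0;
                i++;
            }
        }
        return out.toString();
    }

    // Only ASCII hex digits count; Character.digit would also accept non-ASCII digits.
    private static int hexValue(char ch) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        return -1;
    }

    public static void main(String[] args) {
        // The literals double their backslashes so that javac itself leaves
        // the escapes alone and translate() receives the raw text.
        System.out.println(translate("\\u0061"));   // a
        System.out.println(translate("\\uuu03C0")); // U+03C0 (extra u characters are allowed)
        System.out.println(translate("\\\\u0061")); // unchanged: the second backslash follows an odd run
    }
}
```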
Each such sequence represents one UTF-16 code unit. For example, 'a' is equivalent to '\u0061'. A single escape therefore cannot express a character beyond U+FFFF; such a character has to be written as a surrogate pair of two escapes, e.g. \uD83D\uDE00 for U+1F600.[1]
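To see the surrogate-pair rule in action, the following sketch prints the escapes needed for a given code point and inspects a string written as a pair. The class `SurrogateEscapes` and method `toEscapes` are hypothetical; the UTF-16 split itself is done by `Character.toChars` from `java.lang`.

```java
public class SurrogateEscapes {
    // Hypothetical helper: the escape text needed to write codePoint in source,
    // one escape inside the Basic Multilingual Plane, a surrogate pair beyond U+FFFF.
    static String toEscapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String smiley = "\uD83D\uDE00"; // U+1F600 written as a surrogate pair
        System.out.println(smiley.length());                            // 2 (UTF-16 code units)
        System.out.println(smiley.codePointCount(0, smiley.length()));  // 1 (one character)
        System.out.println(Integer.toHexString(smiley.codePointAt(0))); // 1f600
        System.out.println(toEscapes(0x1F600)); // two escapes: D83D then DE00
        System.out.println(toEscapes('a'));     // one escape: 0061
    }
}
```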
Every character in a program may be written as a Unicode escape, but such programs are hard for people to read (the compiler has no such trouble) and much longer than their plain equivalents.
One can find a full list of the characters here.
π may also be represented in Java as the Unicode escape sequence \u03C0. Thus, the following is a valid, but not very readable, declaration and assignment:
Code section 3.102: Unicode escape sequences for Pi.
    double \u03C0 = Math.PI;
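Because the escape is translated before the compiler reads identifiers, a variable declared through an escape is the very same variable as one written with the plain character or with a differently cased escape. The class `EscapedIdentifier` and method `circleArea` below are hypothetical; the file is pure ASCII, so no source-encoding assumption is needed.

```java
public class EscapedIdentifier {
    // The escape is translated before tokenizing, so this field is simply named pi (U+03C0).
    static final double \u03C0 = Math.PI;

    static double circleArea(double r) {
        // Lowercase hex digits name the same character, hence the same field.
        return \u03c0 * r * r;
    }

    public static void main(String[] args) {
        System.out.println(\u03C0 == Math.PI); // true
        System.out.println(circleArea(1.0));   // 3.141592653589793
    }
}
```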
The following demonstrates the use of Unicode escape sequences in other Java syntax:
Code section 3.103: Unicode escape sequences in a string literal.
    // Declare Strings pi and quote which contain \u03C0 and \u0027 respectively:
    String pi = "\u03C0";
    String quote = "\u0027";
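An escape inside a string literal leaves no trace in the compiled program: "\u03C0" is a one-character string, and "\u0027" is the same compile-time constant as "'", so both literals even refer to the same interned String object. The class `EscapedLiterals` and its members are hypothetical. For the same reason, the char literal '\u0027' does not compile (it becomes '''); write '\'' instead.

```java
public class EscapedLiterals {
    // Both literals are translated to the same characters before compilation,
    // so they are the same compile-time constant and the same interned object.
    static final String QUOTE_ESCAPED = "\u0027";
    static final String QUOTE_PLAIN = "'";

    static boolean sameObject() {
        return QUOTE_ESCAPED == QUOTE_PLAIN; // reference comparison, deliberately
    }

    public static void main(String[] args) {
        String pi = "\u03C0";
        System.out.println(pi.length());                  // 1: one character, not six
        System.out.println((int) pi.charAt(0) == 0x03C0); // true
        System.out.println(sameObject());                 // true
    }
}
```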
Note that a Unicode escape sequence is translated into the character it represents before the source is split into tokens, so it behaves exactly like that character. For example, \u0022 (double quote, ") must be escaped inside a string literal, just like " itself.
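Code section 3.104: Double quote.
    // Declare Strings doubleQuote1 and doubleQuote2 which both contain " (double quote):
    String doubleQuote1 = "\"";
    String doubleQuote2 = "\u005c\u0022"; // \u005c is a backslash, so this reads as "\""
    // "\u0022" doesn't work since """ doesn't work, and "\\u0022" is the six characters \u0022,
    // because a backslash preceded by an odd number of backslashes cannot start a Unicode escape.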
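The early translation applies inside comments too, which can be surprising. In the sketch below (the class `CommentEscape` is hypothetical), the escape for a line feed ends the single-line comment, so the assignment written after it is compiled as ordinary code and the method returns 2. It is shown as a warning, not as a technique to imitate.

```java
public class CommentEscape {
    static int value() {
        int x = 1;
        // The next escape is a line feed, so what follows it is code, not comment: \u000A x = 2;
        return x;
    }

    public static void main(String[] args) {
        System.out.println(value()); // 2
    }
}
```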
International language support
The language distinguishes between bytes and characters. A char is a 16-bit value: originally it held a UCS-2 character, but as of J2SE 5.0, char and String values hold UTF-16 code units, and characters beyond U+FFFF are represented as surrogate pairs. Java program source may contain any Unicode character.
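The difference between bytes, UTF-16 code units, and characters can be measured directly. The sketch below uses a hypothetical class `CharsVersusBytes` and a sample string of 'a', π, and U+1F600, written with escapes so the file stays ASCII; the encodings come from `java.nio.charset.StandardCharsets`.

```java
import java.nio.charset.StandardCharsets;

public class CharsVersusBytes {
    // 'a' (U+0061), pi (U+03C0) and U+1F600, the last stored as a surrogate pair.
    static final String TEXT = "a\u03C0\uD83D\uDE00";

    public static void main(String[] args) {
        System.out.println(TEXT.length());                                    // 4 UTF-16 code units
        System.out.println(TEXT.codePointCount(0, TEXT.length()));            // 3 characters
        System.out.println(TEXT.getBytes(StandardCharsets.UTF_8).length);     // 7 bytes: 1 + 2 + 4
        System.out.println(TEXT.getBytes(StandardCharsets.UTF_16BE).length);  // 8 bytes: 2 per code unit
        System.out.println(Character.isSurrogate(TEXT.charAt(2)));            // true
    }
}
```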
The following is thus perfectly valid Java code; it contains Chinese characters in the class and variable names as well as in a string literal (哈嘍世界 means "hello world" and 文本 means "text"):
Code listing 3.50: 哈嘍世界.java
    public class 哈嘍世界 {
        private String 文本 = "哈嘍世界";
    }