A character encoding maps a character set to code units of a specific width and defines how those units are serialized to bytes, including their byte order. Many character sets have more than one encoding. For example, Java programs can represent Japanese character sets using the EUC-JP or Shift-JIS encodings, among others. Each encoding has its own rules for representing and serializing a character set.
Java uses two mechanisms to identify supported encodings. The original mechanism was string IDs; Java 1.4 later introduced the type-safe Charset class. Note that the two mechanisms use different canonical names for the same encodings. Java 6 implementations are required to support only six encodings (US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, UTF-16), but in practice they tend to support many more. Every platform has a default encoding, and that default can differ from one machine to the next.
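The two naming mechanisms can be compared with a small sketch. The class and method names below (EncodingNames, nioName, ioName) are made up for illustration. The sketch assumes Java 7 or later for try-with-resources, and the names printed by the older java.io API depend on the JVM in use.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class EncodingNames {
    // Canonical name as reported by the java.nio Charset API.
    // Charset.forName also accepts aliases, so "UTF8" resolves to "UTF-8".
    static String nioName(String id) {
        return Charset.forName(id).name();
    }

    // Name as reported by the older java.io API for a reader created
    // from a string ID; this may be a historical name rather than the
    // java.nio canonical name.
    static String ioName(String id) throws IOException {
        try (InputStreamReader reader =
                 new InputStreamReader(new ByteArrayInputStream(new byte[0]), id)) {
            return reader.getEncoding();
        }
    }

    public static void main(String[] args) throws IOException {
        // The six encodings every implementation must support.
        String[] required = {
            "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
        };
        for (String id : required) {
            System.out.println(id + " -> nio: " + nioName(id) + ", io: " + ioName(id));
        }
    }
}
```

Running main prints both names side by side, which makes any difference between the two mechanisms visible on the JVM at hand.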
If a standard Java API converts between byte and char data and takes no encoding argument, it most likely uses the default encoding. Examples of such calls are as follows:
- String(byte[])
- String.getBytes()
- InputStreamReader(InputStream)
- OutputStreamWriter(OutputStream)
- Anything in FileReader or FileWriter
The problem with using the above calls is that you cannot predict whether data written on one machine will be read correctly on another even if you use the same application. An English Windows PC will use Windows-1252, a Russian Windows PC will use Windows-1251, and an Ubuntu machine will use UTF-8.
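The effect of a writer and a reader disagreeing on the encoding can be reproduced without two machines. The following is a minimal sketch: the class and method names are hypothetical, it assumes Java 7 or later for StandardCharsets, and it stands in UTF-8 and ISO-8859-1 for the platform defaults because both are guaranteed to be available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {
    // Encodes text with one charset and decodes the bytes with another,
    // simulating data written on one machine and read on a different one.
    static String writeThenRead(String text, Charset writer, Charset reader) {
        byte[] bytes = text.getBytes(writer);
        return new String(bytes, reader);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café", escaped to avoid source-encoding issues

        // Same encoding on both sides: the text survives.
        String same = writeThenRead(original,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        // Different encodings: the two UTF-8 bytes of U+00E9 are decoded
        // as two separate ISO-8859-1 characters.
        String mixed = writeThenRead(original,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println("round trip intact: " + original.equals(same));
        System.out.println("mismatch intact:   " + original.equals(mixed));
        System.out.println("length before/after mismatch: "
                + original.length() + "/" + mixed.length());
    }
}
```

No exception is thrown in the mismatched case; the data is silently corrupted, which is why this class of bug is often discovered late.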
Using overloads that accept an encoding, such as OutputStreamWriter(OutputStream, Charset), lets you set the encoding explicitly and avoids any dependence on the platform default.
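As a rough illustration of the explicit approach, the sketch below writes and reads text through in-memory streams with the charset passed on both sides. The helper names (ExplicitEncoding, write, read) are made up for this example, and Java 7 or later is assumed for StandardCharsets and try-with-resources.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitEncoding {
    // Serializes text to bytes using the given charset, not the platform default.
    static byte[] write(String text, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        }
        return out.toByteArray();
    }

    // Decodes bytes back to text using the given charset.
    static String read(byte[] data, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            char[] buffer = new char[1024];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "\u043f\u0440\u0438\u0432\u0435\u0442"; // Russian "privet"
        byte[] data = write(original, StandardCharsets.UTF_8);
        String decoded = read(data, StandardCharsets.UTF_8);
        System.out.println("bytes written: " + data.length);
        System.out.println("round trip intact: " + original.equals(decoded));
    }
}
```

Because both sides name the charset, the result does not change with the machine's locale or the JVM's default encoding.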
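The default encoding is controlled by the file.encoding system property. This property can be set programmatically in a JVM as below, but note that the JVM typically reads it once at startup, so changing it at runtime may not affect the default encoding already in use: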
System.setProperty("file.encoding", "UTF-8");
The alternative way to set the default file.encoding is to specify the encoding name as a runtime argument when starting the JVM, as below:
java -Dfile.encoding=<ENCODING NAME> com.x.Main
To use a specific encoding in a component JVM, specify the file.encoding property in JVM_PARAMS under Runtime Arguments, as shown above. If file.encoding is not provided in the component's runtime arguments, the default file encoding is used, which varies by operating system.
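To check which default a given JVM actually picked up, the property and the effective default charset can be printed at startup. This is a small sketch with a hypothetical class name; it only reads the values and does not attempt to change them.

```java
import java.nio.charset.Charset;

public class DefaultEncodingInfo {
    // The charset the JVM uses when an API is called without an explicit encoding.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    // The raw value of the file.encoding system property, which may be null
    // or use a different spelling than the canonical charset name.
    static String fileEncodingProperty() {
        return System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println("file.encoding property: " + fileEncodingProperty());
        System.out.println("default charset:        " + defaultCharsetName());
    }
}
```

Running this class once with and once without a -Dfile.encoding runtime argument shows whether the argument took effect on the JVM in question.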