Displaying Japanese Text Files with Java

update 16/Jul/1997 [Japanese]

How to display a Japanese text file

(1) java.version=1.1 environment only.
Use InputStreamReader to read text files written in a native Japanese encoding.
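
A minimal sketch of case (1) follows. The class name NativeTextReader is hypothetical, and the encoding name "SJIS" is an assumption (current JDKs also accept "Shift_JIS"). The Japanese characters are written as escapes so the source file itself stays ASCII.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

class NativeTextReader {
    /** Decodes every line of the stream using the named encoding. */
    static String readAll(InputStream in, String encoding) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, encoding));
        StringBuilder text = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            text.append(line).append('\n');
        }
        reader.close();
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // The two kanji U+65E5 U+672C encoded as Shift_JIS bytes, plus a newline.
        byte[] sjis = { (byte) 0x93, (byte) 0xFA, (byte) 0x96, (byte) 0x7B, '\n' };
        // "SJIS" is an assumed encoding name; it requires the JDK's extended charsets.
        String text = readAll(new ByteArrayInputStream(sjis), "SJIS");
        System.out.println(text.equals("\u65e5\u672c\n"));
    }
}
```

For a real file, pass a FileInputStream for test.sj instead of the in-memory byte array.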

(2) 1.1 development environment (JDK1.1.x etc.) and 1.0.2 displaying environment (Netscape, MSIE etc.).
Use Native2Utf.java to convert the native Japanese encoding to UTF-8 (a compact encoding of Unicode).
The tool does not analyze the encoding itself; it relies on InputStreamReader for that.
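
The source of Native2Utf.java is not reproduced here, so the following is only a sketch of what such a converter might look like. It assumes the output holds one writeUTF() record per input line, so that readUTF() can read it back; the class name Native2UtfSketch and the "SJIS" encoding name are assumptions.

```java
import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;

class Native2UtfSketch {
    /** Reads lines in the given native encoding and writes each one with writeUTF(). */
    static void convert(InputStream nativeIn, String encoding, OutputStream utfOut)
            throws IOException {
        // InputStreamReader does all of the decoding work.
        BufferedReader reader = new BufferedReader(new InputStreamReader(nativeIn, encoding));
        DataOutputStream out = new DataOutputStream(utfOut);
        String line;
        while ((line = reader.readLine()) != null) {
            out.writeUTF(line); // assumed layout: one record per line, no line terminator
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical): java Native2UtfSketch test.sj test11x.utf
        InputStream in = new FileInputStream(args[0]);
        OutputStream out = new FileOutputStream(args[1]);
        try {
            convert(in, "SJIS", out);
        } finally {
            in.close();
            out.close();
        }
    }
}
```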

(3) 1.0.2 environment only.
First, use J2Uc (by Yasushi Kuno) to convert the native Japanese encoding to Unicode escapes. J2Uc is almost compatible with the native2ascii tool included with JDK 1.1.x.
Second, use Ue2Utf.java to convert the Unicode escapes to UTF-8.
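
The decoding half of the second step in case (3) can be sketched as follows. This is not the actual Ue2Utf.java; the class name UnicodeEscapeDecoder is hypothetical, and the sketch assumes that literal backslashes in the input were doubled beforehand. It uses no Reader classes, so the same logic would also suit a 1.0.2 environment if StringBuilder were swapped for StringBuffer.

```java
class UnicodeEscapeDecoder {
    /** Parses four hex digits starting at index start; returns -1 if any is not hex. */
    static int hex4(String s, int start) {
        int value = 0;
        for (int k = start; k < start + 4; k++) {
            int digit = Character.digit(s.charAt(k), 16);
            if (digit < 0) {
                return -1;
            }
            value = value * 16 + digit;
        }
        return value;
    }

    /** Replaces Unicode escapes with characters and doubled backslashes with one. */
    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(i + 1);
                if (next == '\\') {
                    // A doubled backslash stands for one literal backslash.
                    out.append('\\');
                    i += 2;
                    continue;
                }
                if (next == 'u' && i + 6 <= s.length()) {
                    int code = hex4(s, i + 2);
                    if (code >= 0) {
                        out.append((char) code);
                        i += 6;
                        continue;
                    }
                }
            }
            out.append(c); // anything else is copied unchanged
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("\\u65e5\\u672c");
        System.out.println(decoded.length()); // two characters after decoding
    }
}
```

The decoded String can then be written out with DataOutputStream.writeUTF() to produce the UTF-8 file.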
In cases (2) and (3), you end up with UTF-8 files.
UTF-8 is a variable-length encoding of Unicode (UCS). For details, see the description of DataInputStream in the JDK 1.1.x API documentation.
The readUTF() method of the DataInputStream class returns Strings in ordinary fixed-width (16-bit) Unicode.
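
To make the variable-length point concrete, the sketch below writes single characters with writeUTF() and counts the encoded bytes. The class name UtfLengthDemo is hypothetical. Note that writeUTF() puts a 2-byte length in front of each string, which is subtracted here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class UtfLengthDemo {
    /** Returns the bytes writeUTF() produces for s, including the 2-byte length prefix. */
    static byte[] encode(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeUTF(s);
        out.flush();
        return buffer.toByteArray();
    }

    /** Reads one writeUTF() record back into a String. */
    static String decode(byte[] data) throws IOException {
        return new DataInputStream(new ByteArrayInputStream(data)).readUTF();
    }

    public static void main(String[] args) throws IOException {
        // An ASCII letter, an accented Latin letter, and a kanji.
        String[] samples = { "A", "\u00e9", "\u65e5" };
        for (int i = 0; i < samples.length; i++) {
            byte[] data = encode(samples[i]);
            System.out.println("U+" + Integer.toHexString(samples[i].charAt(0))
                    + " -> " + (data.length - 2) + " byte(s)");
        }
    }
}
```

The three samples take one, two, and three bytes respectively, while each is a single char once readUTF() has decoded it.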

Display Test Applet with UTF-8 File

test102.utf
generated by Ue2Utf.java.

test11x.utf
generated by Native2Utf.java

ReadTest.java / DataInputStream2.java
Appearances on Sparc and WinNT

The difference between these display results comes from a difference in how backslashes are handled.
There is a bug in DataInputStream#readUTF(). DataInputStream2 is a modified class. (update 30/Jun/1997)
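
As a rough illustration of how a test program might load such a .utf file, the sketch below reads readUTF() records until the stream ends. It is not the actual ReadTest.java: the class name UtfRecordReader is hypothetical, the one-record-per-line layout is an assumption, and it uses the stock DataInputStream rather than DataInputStream2.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class UtfRecordReader {
    /** Reads consecutive readUTF() records until the end of the stream. */
    static List<String> readRecords(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        List<String> lines = new ArrayList<String>();
        try {
            while (true) {
                lines.add(data.readUTF());
            }
        } catch (EOFException endOfStream) {
            // readUTF() reports the end of the stream with EOFException.
            // A file truncated in mid-record ends the loop the same way.
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical): java UtfRecordReader test11x.utf
        InputStream in = new FileInputStream(args[0]);
        try {
            List<String> lines = readRecords(in);
            System.out.println(lines.size() + " record(s) read");
        } finally {
            in.close();
        }
    }
}
```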

File Collection

Ue2Utf.java (Ue2Utf.class)
Unicode-escape to UTF-8 converter for JDK 1.0.2
Replace each backslash "\" in the native file with "\\" before converting to Unicode escapes. Otherwise literal backslash sequences such as "\n" cannot be distinguished from Unicode escapes.
Native2Utf.java (Native2Utf.class)
Japanese native to UTF-8 converter for JDK 1.1.x
No special preprocessing is required.
ReadTest11x.java ( ReadTest11x.class ReadTest11x$1.class )
Display test for JDK 1.1.x.
Perhaps there is no need to read UTF-8 with JDK 1.1.x?
JDK 1.0.2 can't display Japanese.

test.sj Japanese native test file (SJIS).
test.uni Unicode escape file (test.sj + native2ascii).
test102.utf (binary) UTF-8 file (test.uni + Ue2Utf).
test11x.utf (binary) UTF-8 file (test.sj + Native2Utf).