This page contains an archived post to the Java Answers Forum made prior to February 25, 2002.
If you wish to participate in discussions, please visit the new
Artima Forums.
Message:
SHIFT_JIS to EUC_JP conversion does not work for some Kana characters.
Posted by Sanjay Agnani on February 02, 2001 at 1:16 AM
Hi,

I am trying to convert from SHIFT_JIS to EUC_JP in a Servlet on a Windows 2000 PC. The conversion code I am using is as follows:

    try {
        temp = request.getParameter("PrivateArea");
        privateArea = new String(temp.getBytes("8859_1"), "JISAutoDetect");
        System.out.println("PrivateArea :" + privateArea + ":");
        printBytes(privateArea.getBytes(), "PrivateArea");
        System.out.println();
        eucString = new String(privateArea.getBytes("EUC_JP"));
        System.out.println("eucString :" + eucString + ":");
        System.out.println("eucString.getBytes().length :" + eucString.getBytes().length + ":");
        printBytes(eucString.getBytes(), "EUC");
        System.out.println();
        encodeString = URLEncoder.encode(eucString);
        System.out.println("encodeString :" + encodeString + ":");
        System.out.println("encodeString.getBytes().length :" + encodeString.getBytes().length + ":");
    } catch (java.io.UnsupportedEncodingException ex) {
        System.err.println(ex);
    }

******************************************
I am able to convert most characters like あいうえお or アイウエオ.
The output is as follows:

PrivateArea :アイウエオ:
PrivateArea[0] = 0x83
PrivateArea[1] = 0x41
PrivateArea[2] = 0x83
PrivateArea[3] = 0x43
PrivateArea[4] = 0x83
PrivateArea[5] = 0x45
PrivateArea[6] = 0x83
PrivateArea[7] = 0x47
PrivateArea[8] = 0x83
PrivateArea[9] = 0x49

eucString :･｢･､･ｦ･ｨ･ｪ:
eucString.getBytes().length :10:
EUC[0] = 0xa5
EUC[1] = 0xa2
EUC[2] = 0xa5
EUC[3] = 0xa4
EUC[4] = 0xa5
EUC[5] = 0xa6
EUC[6] = 0xa5
EUC[7] = 0xa8
EUC[8] = 0xa5
EUC[9] = 0xaa

encodeString :%A5%A2%A5%A4%A5%A6%A5%A8%A5%AA:
encodeString.getBytes().length :30:
*******************************************************
PrivateArea:あいうえお:
PrivateArea[0] = 0x82
PrivateArea[1] = 0xa0
PrivateArea[2] = 0x82
PrivateArea[3] = 0xa2
PrivateArea[4] = 0x82
PrivateArea[5] = 0xa4
PrivateArea[6] = 0x82
PrivateArea[7] = 0xa6
PrivateArea[8] = 0x82
PrivateArea[9] = 0xa8

eucString :､｢､､､ｦ､ｨ､ｪ:
eucString.getBytes().length :10:
EUC[0] = 0xa4
EUC[1] = 0xa2
EUC[2] = 0xa4
EUC[3] = 0xa4
EUC[4] = 0xa4
EUC[5] = 0xa6
EUC[6] = 0xa4
EUC[7] = 0xa8
EUC[8] = 0xa4
EUC[9] = 0xaa

encodeString :%A4%A2%A4%A4%A4%A6%A4%A8%A4%AA:
encodeString.getBytes().length :30:
*******************************************************
But I am not able to convert some characters like らりるれろ or ラリルレロ.
The output of my Servlet is as follows:

PrivateArea :ラリルレロ:
PrivateArea[0] = 0x83
PrivateArea[1] = 0x89
PrivateArea[2] = 0x83
PrivateArea[3] = 0x8a
PrivateArea[4] = 0x83
PrivateArea[5] = 0x8b
PrivateArea[6] = 0x83
PrivateArea[7] = 0x8c
PrivateArea[8] = 0x83
PrivateArea[9] = 0x8d

eucString ::
eucString.getBytes().length :0:

encodeString ::
encodeString.getBytes().length :0:
*******************************************************
PrivateArea:らりるれろ:
PrivateArea[0] = 0x82
PrivateArea[1] = 0xe7
PrivateArea[2] = 0x82
PrivateArea[3] = 0xe8
PrivateArea[4] = 0x82
PrivateArea[5] = 0xe9
PrivateArea[6] = 0x82
PrivateArea[7] = 0xea
PrivateArea[8] = 0x82
PrivateArea[9] = 0xeb

eucString ::
eucString.getBytes().length :0:

encodeString ::
encodeString.getBytes().length :0:
*******************************************************
What I would like to know is:

1. Am I doing something wrong in my code?
2. If the answer to Q. 1 is no, then is this a bug in Sun's JDK 1.3 for Windows and Solaris? (I tried this on Solaris also and got the same results.)
3. Is there any solution or work-around for this problem that you know of?

Your opinion and help are highly appreciated.

Thank you.
Sanjay.
Replies:
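One thing that may be worth checking in the posted code: new String(byte[]) and getBytes() without arguments both use the platform default encoding, so new String(privateArea.getBytes("EUC_JP")) decodes EUC_JP bytes as if they were in the default charset. That could plausibly lose data for some byte pairs and not others, though it is not confirmed here as the cause. The sketch below is a minimal illustration, not a definitive fix. It keeps the EUC-JP data as a byte[] and names a charset at every step. The class and method names (KanaConversionSketch, rawBytesOf, decodeShiftJis, toEucJp, urlEncodeEucJp, hex) are hypothetical. It assumes a runtime that ships the Shift_JIS and EUC-JP charsets, and URLEncoder.encode(String, String) only exists from Java 1.4 onward, so it is not available on the JDK 1.3 mentioned in the post.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class KanaConversionSketch {

    // Assumption: both charsets are present in the running JDK.
    private static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");
    private static final Charset EUC_JP = Charset.forName("EUC-JP");

    /** Recovers the raw request bytes from a parameter that was decoded as ISO-8859-1. */
    public static byte[] rawBytesOf(String latin1Param) {
        return latin1Param.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Decodes Shift_JIS bytes into an ordinary (Unicode) Java String. */
    public static String decodeShiftJis(byte[] sjisBytes) {
        return new String(sjisBytes, SHIFT_JIS);
    }

    /** Encodes the text as EUC-JP; the result is kept as bytes, never re-decoded. */
    public static byte[] toEucJp(String text) {
        return text.getBytes(EUC_JP);
    }

    /** Percent-encodes the EUC-JP bytes of the text. */
    public static String urlEncodeEucJp(String text) {
        try {
            return URLEncoder.encode(text, "EUC-JP");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("EUC-JP is not supported by this runtime", e);
        }
    }

    /** Lower-case hex dump of a byte array, two digits per byte. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Shift_JIS bytes of the failing katakana case from the post.
        byte[] sjis = {
            (byte) 0x83, (byte) 0x89, (byte) 0x83, (byte) 0x8a, (byte) 0x83,
            (byte) 0x8b, (byte) 0x83, (byte) 0x8c, (byte) 0x83, (byte) 0x8d
        };
        String text = decodeShiftJis(sjis);
        System.out.println(hex(toEucJp(text)));    // a5e9a5eaa5eba5eca5ed
        System.out.println(urlEncodeEucJp(text));  // %A5%E9%A5%EA%A5%EB%A5%EC%A5%ED
    }
}
```

In a servlet, rawBytesOf(request.getParameter("PrivateArea")) would stand in for the temp.getBytes("8859_1") step. The sketch decodes with an explicit Shift_JIS charset rather than JISAutoDetect, which is a simplification that only holds if the form is known to submit Shift_JIS.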