The Artima Developer Community

Java Buzz Forum
CP850 charset - still in use :(

Marc Logemann

Posts: 594
Nickname: loge
Registered: Sep, 2002

Marc Logemann is the founder of www.logentis.de, a Java consultancy.
CP850 charset - still in use :( Posted: Mar 9, 2005 2:43 AM

This post originated from an RSS feed registered with Java Buzz by Marc Logemann.
Original Post: CP850 charset - still in use :(
Feed Title: Marc's Java Blog
Feed URL: http://www.logemann.org/day/index_java.xml
Feed Description: Java related topics for all major areas. So you will see J2ME, J2SE and J2EE issues here.

I am currently developing a program that works with data from Deutsche Post World Net, one of the largest logistics providers worldwide. For this program to work, I have to read in about 500 MB of flat-file data that I got on CD from Deutsche Post. I thought this would be a perfect fit for NIO (I had not used NIO so far and was excited).

So I wrote a small test program to read the data from the file system and wondered why the German umlauts like ö, ä and ü did not come out correctly. A first check with a hex editor showed that the "ü", for example, was stored as hex 81. I was quite sure that in ISO-8859-1 the "ü" is not at 81 (it is at FC). And indeed, it turned out I was dealing with a different charset. After some more investigation I found out that they use CP850, a charset that had its heyday in MS-DOS times. Great.
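The hex editor check can be reproduced in a few lines of Java. This is only a sketch: the class and method names are made up for illustration, and it assumes a runtime that ships the extended charsets, where CP850 is typically registered under the canonical name IBM850.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UmlautByteCheck {
    // Decodes one raw byte value (0..255) under the given charset.
    public static String decodeByte(int value, Charset charset) {
        return new String(new byte[] { (byte) value }, charset);
    }

    // Returns the first encoded byte of the text as two-digit-or-less hex.
    public static String firstByteHex(String text, Charset charset) {
        return Integer.toHexString(text.getBytes(charset)[0] & 0xFF);
    }

    public static void main(String[] args) {
        // Assumption: IBM850 (CP850) is available on this runtime.
        Charset cp850 = Charset.forName("IBM850");
        String uUmlaut = "\u00FC"; // "ü", written as an escape to avoid source-encoding issues

        System.out.println(firstByteHex(uUmlaut, cp850));                       // 81
        System.out.println(firstByteHex(uUmlaut, StandardCharsets.ISO_8859_1)); // fc

        // The byte from the hex editor only becomes "ü" when decoded as CP850.
        System.out.println(decodeByte(0x81, cp850).equals(uUmlaut));
    }
}
```

Decoding the same byte 0x81 as ISO-8859-1 yields a control character instead, which is why the umlauts looked broken.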

I thought I could just switch the encoding in my source code, but then I realized that NIO doesn't support CP850; only plain java.io does. That is the end of the story regarding NIO usage, and it's even more frustrating because reading in 500 MB of flat-file data could use any performance boost I can get, but OK.
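For reference, the plain java.io route might look roughly like the sketch below. The class name and the sample record are hypothetical, and the code assumes the runtime knows the charset under the name "Cp850". Whether the NIO charset registry knows it as well can depend on the JDK and its installed charsets; on runtimes that ship the extended charsets it is usually listed under the canonical name IBM850, so `Charset.isSupported` is worth a quick check.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

class Cp850LineReader {
    // Reads all lines from the stream, decoding the bytes as CP850.
    public static List<String> readLines(InputStream in) {
        // Assumption: "Cp850" resolves on this runtime (extended charsets installed).
        Charset cp850 = Charset.forName("Cp850");
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, cp850))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical sample record: "München" as it would be stored in CP850.
        byte[] sample = { 'M', (byte) 0x81, 'n', 'c', 'h', 'e', 'n', '\n' };
        System.out.println(readLines(new ByteArrayInputStream(sample)));

        // Quick check of what the charset registry on this runtime offers.
        System.out.println("IBM850 supported: " + Charset.isSupported("IBM850"));
    }
}
```

For the real 500 MB file, the `ByteArrayInputStream` would be swapped for a buffered `FileInputStream`; the decoding part stays the same.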

It seems they haven't changed the way they distribute data since the beginning of computing. I recently heard that they offer an alternative way of obtaining the data, perhaps via FTP, and perhaps they can offer their files in a different charset via this route. Let's see. Dealing with encoding issues is always a pleasure, because it's never quick to solve and always involves checking charset tables on some obscure sites on the Internet.


Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use