The Artima Developer Community

Ruby Buzz Forum
Closing in on Unicode with Jcode

Red Handed


Red Handed is a Ruby-focused group blog.
Closing in on Unicode with Jcode Posted: Jun 12, 2005 12:14 AM

This post originated from an RSS feed registered with Ruby Buzz by Red Handed.
Original Post: Closing in on Unicode with Jcode
Feed Title: RedHanded
Feed URL: http://redhanded.hobix.com/index.xml
Feed Description: sneaking Ruby through the system

Patrick Hall has a great article on using the Jcode module for Ruby, which provides more natural support for hacking on Unicode strings. He has a few simple unit tests that illustrate failings in the Jcode library, and he leaves them right there for us to glare at.

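 # Ruby 1.8: jcode only takes effect once $KCODE is set to UTF-8
 $KCODE = 'u'
 require 'jcode'
 require 'test/unit'

 # Test-case wrapper so the tests run; the class name is arbitrary
 class TestJcode < Test::Unit::TestCase
   def test_reverse
     s = "Καλημέρα κόσμε!"
     srev = s.reverse
     # String#reverse works on bytes, scrambling multibyte characters
     assert_equal("!εμσόκ αρέμηλαΚ", srev) # fails
   end

   def test_index
     # String#index isn't Unicode-aware; it counts bytes.
     # There are ways around this, but...
     s = "Καλημέρα κόσμε!"
     assert_equal(0, s.index('Κ')) # passes
     assert_equal(1, s.index('α')) # fails!
     assert_equal(2, s.index('α')) # passes; byte offset 2, the 3rd byte!
   end
 end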
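The test_index comment notes that there are ways around byte counting. One workaround, sketched here as an assumption rather than anything from Patrick's article or from Jcode, is to take the offset String#index returns and count the characters in the prefix before it. The helper name char_index is hypothetical. Under Ruby 1.8 the slice is taken by bytes and the /./mu scan counts UTF-8 characters; on Ruby 1.9 and later String#index already returns a character offset, so the helper just returns that same number.

```ruby
# encoding: utf-8

# Hypothetical helper: turn the position String#index reports into a
# character offset. Assumes UTF-8 text; /./mu matches one character,
# newlines included.
def char_index(str, sub)
  pos = str.index(sub)      # byte offset on Ruby 1.8, char offset on 1.9+
  return nil if pos.nil?
  # Count the characters that come before the match.
  str[0, pos].scan(/./mu).length
end

s = "Καλημέρα κόσμε!"
puts char_index(s, 'α')     # prints 1
```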

Sure, we’ll have all the answers in the future, but for now I’d say some patches to Jcode are in order. Or, to conjure up some Python mimicry:

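 # The /u flag makes the regexps below match UTF-8 characters, not bytes
 class UString < String
   # Show a u-prefix, as Python does for unicode strings
   def inspect; "u#{ super }" end

   # Count multibyte characters (/m lets . match newlines too)
   def length; scan(/./mu).length end

   # Reverse by characters, keeping the result a UString
   def reverse; UString.new(scan(/./mu).reverse.join) end
 end

 module Kernel
   # Wrap str in a UString, expanding U+XXXX escapes into characters
   def u( str )
     UString.new str.gsub(/U\+([0-9a-fA-F]{4})/u){ [$1.hex].pack('U*') }
   end
 end

 str = u"Ruby-語"
 str.length   #=> 6
 str.reverse  #=> u"語-ybuR"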
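The least obvious piece of Kernel#u is the escape handling: each U+XXXX sequence (exactly four hex digits) is turned into a code point with String#hex and then packed into a UTF-8 character with pack('U'). Here it is pulled out on its own as a hypothetical decode_u_escapes helper, a sketch written against a current Ruby rather than the post's exact code.

```ruby
# encoding: utf-8

# Hypothetical standalone version of the escape expansion inside Kernel#u.
# Each "U+XXXX" (exactly four hex digits) becomes the character at that code point.
def decode_u_escapes(str)
  str.gsub(/U\+([0-9a-fA-F]{4})/u) { [$1.hex].pack('U') }
end

s = decode_u_escapes("Ruby-U+8A9E")
puts s                     # prints Ruby-語
puts s.scan(/./mu).length  # prints 6 (characters)
puts s.bytesize            # prints 8 (語 takes 3 bytes in UTF-8)
```

Counting with scan(/./mu) rather than bytes is the same trick UString#length relies on.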

Anyway, Patrick’s blog is a great tour through easily digestible tidbits about Unicode. (Thanks, Jonas!)

Copyright © 1996-2019 Artima, Inc. All Rights Reserved.