Ruby Buzz Forum - Closing in on Unicode with Jcode


Red Handed

Red Handed is a Ruby-focused group blog.

Closing in on Unicode with Jcode

Posted: Jun 12, 2005 12:14 AM

This post originated from an RSS feed registered with Ruby Buzz by Red Handed.
Original Post: Closing in on Unicode with Jcode
Feed Title: RedHanded
Feed URL: http://redhanded.hobix.com/index.xml
Feed Description: sneaking Ruby through the system

Patrick Hall has a great article on using the Jcode module for Ruby, which provides more natural support for hacking on Unicode strings. He has a few simple unit tests that illustrate failings in the Jcode library, and he leaves them right there for us to glare at.

 def test_reverse
   s = "Καλημέρα κόσμε!"
   srev = s.reverse
   assert_equal("!εμσόκ αρέμηλαΚ", srev) # fails: bytes get reversed, not characters
 end

 def test_index
   # String#index isn't Unicode-aware, it's counting bytes
   # there are ways around this, but...
   s = "Καλημέρα κόσμε!"
   assert_equal(0, s.index('Κ')) # passes
   assert_equal(1, s.index('α')) # fails!
   assert_equal(2, s.index('α')) # passes; 3rd byte!
 end
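To see what those tests are complaining about, here is a sketch that puts the character view and the byte view of the same string side by side. It assumes Ruby 2.0 or later, where strings carry an encoding and String#length, #reverse and #index already work on characters; the byte view (via String#b) stands in for how Ruby 1.8 treated the string. byte_to_char_offset is a hypothetical helper illustrating one of the "ways around this" for index; it is not part of jcode.

```ruby
# Contrast character-level and byte-level views of one UTF-8 string.
# Assumes Ruby 2.0+ (String#b, #byteslice, #chars); jcode is not used.
s = "Καλημέρα κόσμε!"

# Character view: what the tests expect.
chars_len = s.length                # => 15
char_rev  = s.chars.reverse.join    # => "!εμσόκ αρέμηλαΚ"
char_idx  = s.index("α")            # => 1

# Byte view: the same bytes tagged ASCII-8BIT, which approximates
# what Ruby 1.8's String#reverse and #index operated on.
raw      = s.b
byte_len = raw.bytesize             # => 28 (each Greek letter is 2 bytes)
byte_idx = raw.index("α".b)         # => 2, i.e. the 3rd byte

# Reversing bytes splits every two-byte Greek letter apart,
# so the result is no longer valid UTF-8.
byte_rev = raw.reverse.force_encoding(Encoding::UTF_8)
byte_rev.valid_encoding?            # => false

# Hypothetical helper: map a byte offset back to a character offset
# by counting the characters that precede it.
def byte_to_char_offset(str, byte_off)
  str.byteslice(0, byte_off).length
end

byte_to_char_offset(s, byte_idx)    # => 1
```

The helper only gives a meaningful answer for offsets that fall on a character boundary. Offsets returned by a byte-level index for a complete UTF-8 character always do, because UTF-8 lead bytes can never be mistaken for continuation bytes.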

Sure, we’ll have all the answers in the future, but for now I’d say some patches to Jcode are in order. Or, to conjure up some Python mimicry:

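 class UString < String
   # Show u-prefix as in Python
   def inspect; "u#{ super }" end

   # Count multibyte characters: /u makes . match UTF-8 characters
   # rather than bytes, /m lets it match newlines too
   def length; scan(/./mu).length end

   # Reverse by characters; wrap the result so it keeps the u-prefix
   def reverse; UString.new(scan(/./mu).reverse.join) end
 end

 module Kernel
   # Build a UString, expanding U+XXXX escapes into UTF-8 characters
   def u( str )
     UString.new str.gsub(/U\+([0-9a-fA-F]{4})/u){ [$1.hex].pack('U*') }
   end
 end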

 str = u"Ruby-語"
 str.length   #=> 6
 str.reverse  #=> u"語-ybuR"
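Besides wrapping the string in a UString, the u() helper expands any U+XXXX escapes into real characters. Here is a small sketch of just that escape step, pulled out on its own so it can run under Ruby 2.0 or later. The name expand_escapes is hypothetical; the regex and the pack('U') call mirror the ones inside u().

```ruby
# Sketch of the escape expansion inside u(), on its own.
# Assumes Ruby 2.0+; expand_escapes is a hypothetical name.
def expand_escapes(str)
  # Replace each "U+" followed by exactly four hex digits with the
  # UTF-8 encoding of that code point; pack('U') does the encoding.
  str.gsub(/U\+([0-9a-fA-F]{4})/) { [$1.hex].pack('U') }
end

word = expand_escapes("U+03BAU+03CCU+03C3U+03BCU+03B5")

word                    # => "κόσμε"
word.scan(/./m).length  # => 5 characters, the scan-based count UString#length uses
word.bytesize           # => 10 bytes, two per Greek letter
```

Because the regex demands exactly four hex digits, escapes can be packed back to back as above; code points beyond U+FFFF cannot be written this way.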

Anyway, Patrick’s blog is a great tour through easily digestible tidbits about Unicode. (Thanks, Jonas!)
