This post originated from an RSS feed registered with Ruby Buzz
by Red Handed.
Original Post: Closing in on Unicode with Jcode
Feed Title: RedHanded
Feed Description: sneaking Ruby through the system
Patrick Hall has a great article on using the Jcode module for Ruby, which provides more natural support for hacking Unicode strings. He has a few simple unit tests that illustrate failings in the Jcode library, and he leaves them right there for us to glare at.
def test_reverse
  s = "Καλημέρα κόσμε!"
  srev = s.reverse
  assert_equal(s,srev) # fails
end

def test_index
  # String#index isn't Unicode-aware, it's counting bytes
  # there are ways around this, but...
  s = "Καλημέρα κόσμε!"
  assert_equal(0, s.index('Κ')) # passes
  assert_equal(1, s.index('α')) # fails!
  assert_equal(2, s.index('α')) # passes; byte offset 2, i.e. the 3rd byte!
end
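One of the "ways around this" can be sketched by unpacking the string into code points first. This is only a rough illustration, assuming UTF-8 input; `char_index` and `char_reverse` are made-up helper names, not part of Jcode or of Patrick's tests.

```ruby
# Hypothetical helpers (not part of Jcode): work on code points, not bytes.

# Position of the first occurrence of char, counted in characters (nil if absent).
def char_index(str, char)
  codepoints = str.unpack('U*')        # UTF-8 string -> array of code points
  target     = char.unpack('U*').first # code point of the character we want
  codepoints.index(target)
end

# Reverse character by character instead of byte by byte.
def char_reverse(str)
  str.unpack('U*').reverse.pack('U*')
end

s = "Καλημέρα κόσμε!"
char_index(s, 'Κ')  #=> 0
char_index(s, 'α')  #=> 1
char_reverse(s)     #=> "!εμσόκ αρέμηλαΚ"
```

Going through `unpack('U*')` sidesteps the byte-counting behaviour entirely, at the cost of building an intermediate array for every call.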
Sure, we’ll have all the answers in the future, but, for now, I’d say some patches to Jcode are in order. Or, to spirit up some Python mimicry:
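class UString < String
  # Show u-prefix as in Python
  def inspect; "u#{ super }" end
  # Count multibyte characters (the /u flag makes . match whole UTF-8 characters)
  def length; self.scan(/./u).length end
  # Reverse the string, character by character, keeping it a UString
  def reverse; UString.new(self.scan(/./u).reverse.join) end
end

module Kernel
  # Build a UString, expanding U+XXXX escapes into UTF-8 characters
  def u( str )
    UString.new str.gsub(/U\+([0-9a-fA-F]{4,4})/u){["#$1".hex ].pack('U*')}
  end
end

str = u"Ruby-語"
str.length #=> 6
str.reverse #=> u"語-ybuR"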
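The `U+XXXX` expansion inside `u` is easier to see in isolation. Here is a minimal sketch of just that step, under the assumption that escapes are always exactly four hex digits; `expand_codepoints` is a hypothetical name used only for this illustration.

```ruby
# Hypothetical standalone version of the escape expansion done by u().
# Assumes every escape is written as U+ followed by exactly four hex digits.
def expand_codepoints(str)
  str.gsub(/U\+([0-9a-fA-F]{4})/) { [$1.hex].pack('U') }
end

expanded = expand_codepoints("Ruby-U+8A9E")
expanded.unpack('U*').length  #=> 6
```

Writing the character as `U+8A9E` keeps the source file plain ASCII, which is handy when an editor or a feed mangles multibyte literals.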
Anyway, Patrick’s blog is a great tour through easily digestible tidbits about Unicode. (Thanks, Jonas!)