This post originated from an RSS feed registered with Ruby Buzz by Daniel Berger.
Original Post: Windows, Unicode and C programming tips
Feed Title: Testing 1,2,3...
Feed URL: http://djberg96.livejournal.com/data/rss
Feed Description: A blog on Ruby and other stuff.
I've recently been going back through my C extensions for Windows and updating them to be Unicode friendly. This was partly inspired by a post from Austin Ziegler, in which he rightly points out that several of the current core classes choke when they come across a file whose name isn't ASCII.
Austin's vision (I think; correct me if I'm wrong, Austin) is that extensions would be written to use either the ANSI ("A") or the Wide ("W") versions of functions, based on a command line option, say a non-existent "-U". So you, as an extension writer, would be expected to write your (pseudo)code like so:
if(unicode_option){ // hypothetical flag, set by the (non-existent) "-U" switch
   SomeFuncW();     // Wide character version
}
else{
   SomeFuncA();     // ANSI (narrow) version
}
This would work, but I have a problem with it. First, it's a pain in the arse to write code this way: it makes my code longer. Second, we would have to rewrite a *ton* of code (which we'll have to do anyway) and enforce this style on third-party developers. Lastly, there is no "-U" option. We'd have to add one to the core, or rely on something that already exists, such as -Ku, though the -K switch is normally used to select a Japanese (Kanji) encoding.
Instead, my recent solution has been an "always on" approach: I define the UNICODE macro in my C extensions. Since the wide character versions of functions still work just fine with plain ASCII, I don't see a downside. I'm sure someone will jump in here and scold me for this, so I've worn my flame-retardant underpants today, just in case.
Whether or not you agree with my approach, there are a couple of things you'll always want to do in your C extensions for Windows:
Always use TCHAR, not char
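Wrap your string literals in the TEXT macro, and convert Ruby strings to wide strings at run time
TCHAR and TEXT both depend on whether the UNICODE macro is defined, and do the right thing either way. Note that TEXT is a compile-time macro that puts an L prefix on a string literal, so it cannot be applied to the char* returned by StringValuePtr(). With UNICODE defined, and assuming the Ruby string holds UTF-8, your Ruby extension should look something like this:
static VALUE some_func(VALUE self, VALUE rbString){
   TCHAR buf[MAX_PATH]; /* TCHAR is WCHAR here, because UNICODE is defined */
   /* TEXT() only works on literals, so convert the Ruby string's bytes instead */
   if(!MultiByteToWideChar(CP_UTF8, 0, StringValuePtr(rbString), -1, buf, MAX_PATH))
      rb_raise(rb_eArgError, "could not convert string to UTF-16");
   ...
}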
That's about it, really, but a little can go a long way. :)