This post originated from an RSS feed registered with Ruby Buzz
by Ryan Davis.
Original Post: Space vs Time
Feed Title: Polishing Ruby
Feed URL: http://blog.zenspider.com/index.rdf
Feed Description: Musings on Ruby and the Ruby Community...
Everyone knows that ruby isn't fast. The real irony is that our profiler is dreadfully slow, which makes profiling a task that some people don't want to deal with. Shugo answered this concern recently by writing a new profiler that is implemented in C. It is much, much faster. Running 50,000 iterations of my simple factorial benchmark, you get:
Not bad, eh? While Shugo's is faster, it is also rather... shall we say, inelegant? The stock pure-ruby profiler clocks in at an elegant 65 lines of fairly readable code. That is actually what makes it so slow: it sacrifices speed for simplicity. Shugo's code is a total of 701 lines of C and ruby. With a roughly 10x increase in size but a 20x increase in performance (for my very pathological example), Shugo's profiler sacrifices simplicity for speed. When you need the absolute fastest profiler out there, Shugo's profiler is the way to go, but I wouldn't want to maintain it.
I wanted to experiment with this thought: why can't you have both, or at least sacrifice a little of both for an overall bigger gain? I started by trying to port Shugo's code back to ruby. Turns out that Shugo, being a ruby-internals guru, makes such heavy use of ruby's deep innards that I couldn't fully port it back with my feeble ruby-internals skills. I got a fair portion of it done, but didn't want to attempt to pull in some of the internal data structures that he was using. By the time I gave up, I had dropped the size of the code by about one third.
OK. I'm no master like Shugo. I can live with that. No really... I'm not feeling the least bit self-conscious about it. :)
So, I went the other route, using what I now knew of Shugo's code as a guide. I started with the 65-line pure-ruby profiler and began porting it forward to C. I used RubyInline for this. Turns out all I had to port was the proc that you register with set_trace_func, and the code is fairly simple. As a result I have fairly readable code clocking in at 182 lines and a time of 6.441s. That is a trade-off of some simplicity for some speed. I think I can live with that.
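To give a feel for what that proc does, here is a minimal sketch of a set_trace_func-based profiler in plain ruby. It is not the code from the stock profiler, from Shugo's, or from my port. TinyProfiler, Bench and this particular factorial are made-up names for illustration, and the bookkeeping is deliberately cruder than a real profiler's. The only real API it leans on is Kernel#set_trace_func, which hands the proc six arguments: event, file, line, method id, binding and class.

```ruby
# Sketch only: a tiny call-count/time profiler built on Kernel#set_trace_func.
# TinyProfiler, Bench and factorial are hypothetical names for illustration.
class TinyProfiler
  attr_reader :calls, :totals

  def initialize
    @calls  = Hash.new(0)    # "Class#method" => number of calls
    @totals = Hash.new(0.0)  # "Class#method" => seconds, callees included
    @stack  = []             # [key, start time] for methods still running
  end

  # Profiles the given block, then always uninstalls the hook.
  def profile
    set_trace_func(handler)
    yield
  ensure
    set_trace_func(nil)
  end

  private

  # This proc is the hot path: the interpreter runs it for every traced event.
  def handler
    proc do |event, _file, _line, id, _binding, klass|
      next if id == :set_trace_func # skip the hook's own install/uninstall
      key = "#{klass}##{id}"
      case event
      when "call", "c-call"
        @calls[key] += 1
        @stack.push([key, now])
      when "return", "c-return"
        open_key, started = @stack.last
        next unless open_key == key # return without a matching recorded call
        @stack.pop
        @totals[key] += now - started
      end
    end
  end

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

module Bench
  # Assumed stand-in for the "simple factorial benchmark" mentioned earlier.
  def self.factorial(n)
    n <= 1 ? 1 : n * factorial(n - 1)
  end
end

profiler = TinyProfiler.new
profiler.profile { 100.times { Bench.factorial(10) } }

# Print the three most frequently called methods.
profiler.calls.sort_by { |_, count| -count }.first(3).each do |key, count|
  printf("%-20s %6d calls %9.6fs\n", key, count, profiler.totals[key])
end
```

Every method call and return in the profiled block goes through that one proc, which is why it is the natural piece to move into C while the rest stays in ruby. Note that the totals here include time spent in callees, so a recursive method like this one gets counted several times over.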
You can even see the difference, and to some extent, the trade-offs made (the data is a few days old):
I'm in the process of packaging it up with a bunch of my other smaller hacks and will be publishing it soon. I hope to get more people looking at it and giving me feedback. Thanks.