This post originated from an RSS feed registered with Ruby Buzz
by Patrick Lenz.
Original Post: Slow queries: Rails habtm gotcha
Feed Title: poocs.net
Feed URL: http://feeds.feedburner.com/poocsnet
Feed Description: Personal weblog about free and open source software, personal development projects and random geek buzz.
Back in the old days when some brave souls started with Rails, the framework would automatically supply helper methods to check the availability of associated records for every has_many or has_and_belongs_to_many association in the form of has_foos?.
For example, given a declaration like this:
class User
has_and_belongs_to_many :interests
end
would get you User#has_interests? back in the days.
Fast forward to the present, these methods have long been removed from Rails core. There’s no 1:1 replacement, really. When you want to check a collection in a conditional for whether or not it’s empty, there are several things you can do.
Association Proxies, a quick review
Basically, every collection is a derivative of the Array class, as such most of Arrays methods have been (re-)implemented for the respective collection proxies.
First of all, there’s collection#length, which is the most brutal of them all because it simply loads the whole collection into memory and runs #size on it.
Next up is collection#count, which uses an SQL COUNT() statement to just fetch the number of associated records and is usually the quickest way at that.
Finally, there’s collection#size, which is basically a combination of both length and count since it behaves differently depending on whether or not the collection has already been loaded at the time it is being called. Given the collection has already been loaded (by a prior statement in your code, what have you) it’s identical to collection#length. If it hasn’t been loaded, its behavior is identical to collection#count.
Best practice
Judging from the rundown above, it’s usually best practice to simply use collection#size and trust it to do the right thing. In case you’re planning to use a collection as soon as you found out there are records associated with it, use collection#length to load the connection right away and then loop over it. Beware, however, that this is advisable only for those cases that don’t use pagination. (You’re using will_paginate for that, aren’t you?)
In a conditional, it makes most sense to use collection#empty?, which uses size#zero? internally.
unless user.interests.empty?
# ...
end
The problem
99% of what I said above applies to both has_many and has_and_belongs_to_many associations. Admittedly, poor little habtm has had its momentum and seems to be on the way out of the cool kids’ block. Still, there are a few cases where it’s still the best fit (such as the User#interests example from first section of this article) and some people *do* have Rails code that’s been in existence for a few years. (Well, maybe only I do.)
So, the 1% that behaves differently is that size on a habtm collection is not as intelligent as its little brother from the has_many camp. In fact, it’s pretty dumb because it’s in every case behaving identical to length meaning it’s loading the whole collection. Ouch.
>> user.interests.size
Join Table Columns (0.001284) SHOW FIELDS FROM interests_users
Interest Load (0.001263) SELECT * FROM interests
INNER JOIN interests_users
ON interests.id = interests_users.interest_id
WHERE (interests_users.user_id = 142841 )
The solution
You have to fill in the gaps of habtm#size’s lack of intelligence yourself. If you trap over habtm queries that take way too long for your taste, it might be about time to swap out that size call for an always speedy count call.
>> u.interests.count
Interest Columns (0.001006) SHOW FIELDS FROM interests
SQL (0.009749) SELECT count(*) AS count_all FROM interests
INNER JOIN interests_users
ON interests.id = interests_users.interest_id
WHERE (interests_users.user_id = 142841 )
You would then also avoid using collection#empty? in your conditionals, since that would then resort back to using size internally. Instead, use the count#zero? form or define your own instance methods on the association.
if user.interests.count.zero?
# ..
end
Postscript
To actually find out what Rails is transforming your method calls into (in terms of generated SQL) I prefer to use this technique by the always-wise Jamis Buck. Redirecting the ActiveRecord logger to the shell’s standard output will intermix the SQL statements with the output of your method calls, making everything relevant visible at first glance.