ActiveRecord model sorting
Sep 24, 2018 · 3 minute read · ruby-on-rails, ruby
Ruby on Rails comes with ActiveRecord for data persistence. It provides a powerful querying interface that every developer working with Rails is familiar with. The methods are chainable, and the most common queries can be expressed in a pretty readable way. For the others, you can drop one level lower and write queries in Arel, the library powering ActiveRecord under the hood.
Basic sorting
One of the tips for performance when fetching data is to rely on the database to do the sorting and not do it in Ruby. So while
User.where('created_at > ?', 6.months.ago).order(:created_at)
results in a query
SELECT "users".*
FROM "users"
WHERE (created_at > '2018-03-21 17:18:31.541280')
ORDER BY "users"."created_at" ASC
returning the user records sorted so that they can be just converted to instances of ActiveRecord models,
User.where('created_at > ?', 6.months.ago).sort_by(&:created_at)
will return identical data, but the underlying query will be only
SELECT "users".*
FROM "users"
WHERE (created_at > '2018-03-21 17:18:31.541280')
and the sorting will be performed in Ruby. Depending on the number of records returned the performance difference might be noticeable.
Unusual sorting
The fact that the database is better at sorting doesn’t necessarily mean that every collection of ActiveRecord model instances will be sorted by the database. There are situations when the instances come from different sources and can be sorted only in Ruby. This happens for us with certain types of charts where a set of model instances is collected and then sorted for display.
Recently, we touched this code and, while testing it, discovered a surprising property of sorting. Let’s see an example:
u1 = User.new id: 5, name: 'Jiří'
u2 = User.new id: 7, name: 'Marie'
u3 = User.new id: 3, name: 'Petr'
[u1, u2, u3].shuffle.sort.map(&:name)
# => ["Petr", "Jiří", "Marie"]
That’s pretty reasonable. sort relies on <=> implemented by the sorted items. ActiveRecord models implement this method by delegating to the primary key column, which is sensible behaviour. So what happens when the model instances don’t have IDs, e.g. in tests which try not to touch the database if possible:
u1 = User.new name: 'Jiří'
u2 = User.new name: 'Marie'
u3 = User.new name: 'Petr'
[u1, u2, u3].shuffle.sort.map(&:name)
The order is random. More precisely, the order is random because of the shuffle call. Leave it out and the result will in practice be ["Jiří", "Marie", "Petr"], i.e. the original order. The reason is that <=> ends up comparing nils against each other, which yields 0, so sort treats the instances as equal and has no reason to reorder them. OK, that makes sense; it’s maybe a bit strange that it doesn’t fail in any way, but alright.
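The delegation described above can be sketched in plain Ruby without a database. FakeUser below is a hypothetical stand-in, not the actual Rails source; it only assumes that a record's comparison key exists once the record has an ID. Note that in plain Ruby nil <=> nil evaluates to 0, which is why sorting records without IDs does not raise.

```ruby
# Hypothetical stand-in for an ActiveRecord model; not the real Rails source.
FakeUser = Struct.new(:id, :name, keyword_init: true) do
  # Assumption: the comparison key is present only when the record has an ID.
  def to_key
    id ? [id] : nil
  end

  # Delegate comparison to the key; records of another class are incomparable.
  def <=>(other)
    return nil unless other.is_a?(self.class)

    to_key <=> other.to_key
  end
end

with_ids = [
  FakeUser.new(id: 5, name: 'Jiří'),
  FakeUser.new(id: 7, name: 'Marie'),
  FakeUser.new(id: 3, name: 'Petr')
]
# Keys [3], [5], [7] are compared as arrays, so the result is ordered by ID.
sorted_names = with_ids.shuffle.sort.map(&:name)

without_ids = %w[Jiří Marie Petr].map { |name| FakeUser.new(name: name) }
# Both keys are nil, and nil <=> nil is 0: the records count as equal.
comparison = without_ids[0] <=> without_ids[1]
```

Here sorted_names comes out as Petr, Jiří, Marie, and comparison is 0 rather than nil.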
Let’s up the ante and mix instances with and without ID. This scenario seems very unlikely but since we’re already playing with the language…
u1 = User.new name: 'Jiří'
u2 = User.new id: 7, name: 'Marie'
u3 = User.new id: 3, name: 'Petr'
[u1, u2, u3].shuffle.sort.map(&:name)
# => ArgumentError: comparison of User with User failed
Well, this is surprising! As long as you compare only instances with an ID, it works; if you compare only instances without an ID, it’s a silent no-op; but mix the two together and it blows up. I find it rather confusing.
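One possible way to sidestep the error, assuming records without an ID should simply sort last, is to hand sort_by an explicit key instead of relying on <=>. This is a sketch rather than a recommendation from the original code; Record is a hypothetical stand-in for a model instance, and the "no ID goes last" rule is an assumption.

```ruby
# Hypothetical stand-in for model instances; only id and name matter here.
Record = Struct.new(:id, :name, keyword_init: true)

records = [
  Record.new(name: 'Jiří'),
  Record.new(id: 7, name: 'Marie'),
  Record.new(id: 3, name: 'Petr')
]

# Build a key that is always comparable: records with an ID come first
# (ordered by ID), records without one are grouped at the end.
sorted = records.shuffle.sort_by { |r| r.id ? [0, r.id] : [1, 0] }
names = sorted.map(&:name)
```

Because every key is an array of two integers, the comparison never returns nil, whatever mix of saved and unsaved records is passed in.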
To be fair, the issue is not specific to ActiveRecord, it is a Ruby thing:
[nil, nil, nil].sort
# => [nil, nil, nil]
[5, 7, 3].sort
# => [3, 5, 7]
[5, 7, nil].sort
# => ArgumentError: comparison of Integer with nil failed
The difference here is the error message. Integers and nils are not comparable, which makes sense. But intuitively, a User should be comparable with another User.
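The three outcomes follow from what <=> returns in each case, which can be checked directly in plain Ruby. The variable names below are only illustrative.

```ruby
# nil compared with itself: Object#<=> reports equality for the same object.
nil_vs_nil = nil <=> nil

# No ordering is defined between an Integer and nil, so the result is nil.
int_vs_nil = 5 <=> nil

# The same holds for an array key compared with a missing key.
key_vs_nil = [7] <=> nil

# sort raises as soon as one of its comparisons comes back as nil.
error_message =
  begin
    [5, 7, nil].sort
  rescue ArgumentError => e
    e.message
  end
```

A result of 0 means "equal", while nil means "cannot be ordered"; sort accepts the former and raises ArgumentError on the latter.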
Takeaway
The takeaway is: if you sort in Ruby, be prepared for some surprising behaviour here and there. Luckily, it is not likely to come up frequently.