ActiveRecord model sorting

Ruby on Rails ships with ActiveRecord for data persistence. It provides a powerful querying interface that every developer working with Rails is familiar with. The methods are chainable, and the most common queries can be expressed in a readable way. For the rest, you can drop one level lower and write queries in Arel, the library that powers ActiveRecord under the hood.

Basic sorting

A common performance tip when fetching data is to let the database do the sorting instead of doing it in Ruby. So while

User.where('created_at > ?', 6.months.ago).order(:created_at)

results in a query

SELECT "users".* 
  FROM "users" 
  WHERE (created_at > '2018-03-21 17:18:31.541280')  
  ORDER BY "users"."created_at" ASC

returning the user records already sorted, so they only need to be converted into instances of ActiveRecord models,

User.where('created_at > ?', 6.months.ago).sort_by(&:created_at)

will return identical data, but the underlying query will be only

SELECT "users".* 
  FROM "users" 
  WHERE (created_at > '2018-03-21 17:18:31.541280')  

and the sorting will be performed in Ruby. Depending on the number of records returned, the performance difference can be noticeable.

Unusual sorting

The database being better at sorting doesn’t necessarily mean that every collection of ActiveRecord model instances can be sorted by it. Sometimes the instances come from different sources and can only be sorted in Ruby. This happens for us with certain types of charts, where a set of model instances is collected and then sorted for display.

Recently, we touched this code and, while testing it, discovered a surprising property of sorting. Let’s look at an example:

u1 = User.new id: 5, name: 'Jiří'
u2 = User.new id: 7, name: 'Marie'
u3 = User.new id: 3, name: 'Petr'
[u1, u2, u3].shuffle.sort.map(&:name)
# => ["Petr", "Jiří", "Marie"]

That’s pretty reasonable. sort relies on the <=> method implemented by the items being sorted. ActiveRecord models implement it by comparing their primary keys, which is sensible behaviour. So what happens when the model instances don’t have IDs, e.g. in tests that try to avoid touching the database:

u1 = User.new name: 'Jiří'
u2 = User.new name: 'Marie'
u3 = User.new name: 'Petr'
[u1, u2, u3].shuffle.sort.map(&:name)

The order is random, but only because of the shuffle call. Leaving it out, the result is ["Jiří", "Marie", "Petr"], i.e. the original order. The reason is that <=> ends up comparing nil with nil, which yields 0, so sort treats all the instances as equal and has no reason to reorder them. (Ruby’s sort is not guaranteed to be stable, though, so the original order should not be relied on.) OK, that makes sense; it’s maybe a bit strange that it doesn’t fail in any way, but alright.
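The behaviour of <=> around nil can be checked directly in plain Ruby, without Rails. The helper name below is made up for this illustration; it only wraps Array#sort to report whether it raises.

```ruby
# Comparing nil with itself reports equality.
nil_vs_nil = nil <=> nil # 0

# Comparing nil with an Integer is undefined in either direction.
nil_vs_int = nil <=> 5   # nil
int_vs_nil = 5 <=> nil   # nil

# Hypothetical helper: true when the array sorts without an ArgumentError.
def sortable?(array)
  array.sort
  true
rescue ArgumentError
  false
end

sortable?([nil, nil, nil]) # true, every comparison returns 0
sortable?([5, 7, nil])     # false, some comparison returns nil
```

So sort only raises when some pair of elements compares to nil; a comparison result of 0 is perfectly acceptable to it.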

Let’s up the ante and mix instances with and without ID. This scenario seems very unlikely but since we’re already playing with the language…

u1 = User.new name: 'Jiří'
u2 = User.new id: 7, name: 'Marie'
u3 = User.new id: 3, name: 'Petr'
[u1, u2, u3].shuffle.sort.map(&:name)
# => ArgumentError: comparison of User with User failed

Well, this is surprising! As long as you compare only instances with an ID, it works; if you compare only instances without an ID, it’s a silent no-op; but mix the two together and it blows up. I find it rather confusing.

To be fair, the issue is not specific to ActiveRecord; it is a Ruby thing:

[nil, nil, nil].sort
# => [nil, nil, nil]

[5, 7, 3].sort
# => [3, 5, 7]

[5, 7, nil].sort
# => ArgumentError: comparison of Integer with nil failed 

The difference here is the error message. That Integers and nils are not comparable makes sense. But intuitively a User should be comparable with another User, and the message gives no hint that a missing ID is what makes the comparison fail.
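All three cases can be reproduced without Rails or a database. The sketch below uses a hypothetical FakeUser struct whose <=> compares a to_key value; this is an assumption about how ActiveRecord compares records (by a key that is nil when the ID is missing), not a copy of the Rails source.

```ruby
# Hypothetical stand-in for an ActiveRecord model; this is not the Rails source.
FakeUser = Struct.new(:id, :name, keyword_init: true) do
  # Assumption: mirrors ActiveRecord's key, [id] when present, nil otherwise.
  def to_key
    id ? [id] : nil
  end

  # Compare two records through their keys.
  def <=>(other)
    other.is_a?(self.class) ? to_key <=> other.to_key : nil
  end
end

# Sorts the records and returns their names, or the error message if sort raises.
def sorted_names_or_error(records)
  records.sort.map(&:name)
rescue ArgumentError => e
  e.message
end

with_ids = [FakeUser.new(id: 5, name: 'Jiří'),
            FakeUser.new(id: 7, name: 'Marie'),
            FakeUser.new(id: 3, name: 'Petr')]
mixed = [FakeUser.new(name: 'Jiří'),
         FakeUser.new(id: 7, name: 'Marie'),
         FakeUser.new(id: 3, name: 'Petr')]

sorted_names_or_error(with_ids) # ["Petr", "Jiří", "Marie"]
sorted_names_or_error(mixed)    # an ArgumentError message instead of names
```

In the mixed case, one key is nil and the other is an array, so <=> returns nil and sort gives up, which matches the three behaviours described above.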

Takeaway

The takeaway is: if you sort in Ruby, be prepared for some surprising behaviour here and there. Luckily, it is not likely to come up frequently.
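One way to avoid the surprise is to sort by an explicit key rather than relying on the <=> of the instances. The following is a minimal sketch, not something taken from our code base: ChartItem and sort_by_id_unsaved_last are hypothetical names, and putting records without an ID last is an arbitrary choice.

```ruby
# Hypothetical plain-Ruby record used only for this illustration.
ChartItem = Struct.new(:id, :name)

# Sort by id, placing records without an id at the end.
# Every key is a two-element Integer array, so any two keys are comparable.
def sort_by_id_unsaved_last(records)
  records.sort_by { |record| record.id.nil? ? [1, 0] : [0, record.id] }
end

items = [ChartItem.new(nil, 'Jiří'),
         ChartItem.new(7, 'Marie'),
         ChartItem.new(3, 'Petr')]

names = sort_by_id_unsaved_last(items).map(&:name)
# names == ["Petr", "Marie", "Jiří"]
```

Because the sort key never contains nil, the comparison cannot fail, whatever mix of saved and unsaved instances ends up in the collection.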

