Adding rubocop-rspec to a legacy codebase

In a recent article I described how to use rspec --bisect to fix flaky tests. Apart from the test failures described in that post, we’ve also encountered another problem:

#<Double (anonymous)> was originally created in one example but has leaked into another example and can no longer be used. rspec-mocks' doubles are designed to only last for one example, and you need to create a new one in each example you wish to use it for.

This warning is thrown by RSpec itself. Since I didn’t have a recent occurrence of this problem to track down, I decided to at least improve the test suite by installing the RuboCop RSpec extension. Enectiva is a not-small code base with a long history, so adding a linter takes effort. Below are my comments from the process.


First, I installed the gem and disabled all new cops in .rubocop-todo.yml by:

RSpec/Capybara:
  Enabled: false

RSpec/FactoryBot:
  Enabled: false

RSpec/Rails:
  Enabled: false

RSpec:
  Enabled: false

Commit by commit, I enabled the Capybara, FactoryBot and Rails groups of cops with a few updates to the code base.

Then the fun part started. I disabled all RSpec cops individually and started reviewing them one by one. While this process is tiresome, it gave me an opportunity to document each decision clearly and individually. Also, some cops lead to mass updates, and separating them into commits keeps each change set reasonable-ish. Having a giant ball of changes would be madness. Gradual enabling also allows running the cops individually with:

rubocop --auto-correct --only RSpec/... spec/

which makes each run significantly faster.

I ended up enabling most of the cops. If a cop supported multiple styles, I preferred the one creating the smaller diff, and I put it up to a vote in close cases or when I had a negative gut reaction to the enforced style.

The cops can be divided into five rough groups:

  1. Already satisfied ones - I just enabled them :tada:
  2. Not satisfied but auto-correctable - Rubocop did all the work for me :rocket:
  3. Not satisfied and not auto-correctable - depending on the complexity, they could take anything from a minute to hours to fix
  4. Not satisfied and not auto-correctable and with too many violations - I didn’t enable them because it would take forever to manually fix all the violations
  5. The rest

What follows are my comments mostly about categories 2-5.


This was the motivating cop and it was rough. It uncovered:

  1. Use of double instead of instance_double or class_double
  2. Instance and class methods being mocked at the same time, typically an ActiveRecord model
  3. Incorrectly named mocks, e.g. double('ClassThatHasBeenRenamedToSomethingElse')
  4. Dead code
  5. Mocked lambdas
  6. Abstract methods not implemented in the parent class:

```ruby
class Greeter
  def initialize(name)
    @name = name
  end
end

class FormalGreeter < Greeter
  def hello
    "Good morning, #{@name}"
  end
end

instance_double(Greeter, hello: 'fixed string')
```

  7. Message chains mocked as a single object, relying on `as_null_object` behaviour. This often appeared when a method on a class returns a dumb data object: the original class is mocked as a null object, and getter methods on the result object are mocked as well
  8. Deeply nested `context` blocks which define a double at the top and then add `allow`s in `before` blocks on deeper levels. A big red flag with “bad test design” printed across it
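One possible fix for that last abstract-method case, sketched under the assumption that `Greeter` is meant to be an abstract base class: declare the method on the parent (raising `NotImplementedError`), so it becomes part of the interface that verified doubles check against.

```ruby
class Greeter
  def initialize(name)
    @name = name
  end

  # Declaring the abstract method on the parent makes it part of the
  # verified interface, so instance_double(Greeter, hello: ...) passes.
  def hello
    raise NotImplementedError, "#{self.class} must implement #hello"
  end
end

class FormalGreeter < Greeter
  def hello
    "Good morning, #{@name}"
  end
end

FormalGreeter.new("Ada").hello   # => "Good morning, Ada"
Greeter.method_defined?(:hello)  # => true
```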

It took a long time, and I ended up disabling the cop for a bunch of files with too many offenses, but it still improved the test suite.


Surprisingly, there were some instances. They probably started out as message chains and then got refactored into a single method.


There were so many instances of scattered `let`s; all of them were auto-corrected.


We had a pretty even split between two formats: `allow(SomeClass).to receive(:method).and_return(5)` and `allow(SomeClass).to receive(:method) { 5 }`. After a discussion, we ended up unifying on the `and_return` syntax.


There were 14 offenses. Some of them were copy & paste with the actual body different (just the name was incorrect), but there were a few completely duplicated tests.


This cop uncovered several cases which were actually missed. Clearly, they were created by copy & paste but then never changed and remained duplicated. There was one case of a combinatoric test which defines several contexts and then combines them to test the outcome. While the form of the test is not ideal, I think it’s still a test worth keeping, so I ignored this file+cop combination.


Uncovered only two pairs of cheap tests, but it’s still a win.


Found two dozen cases that could be replaced by `have_key`, `be_[something]`, or `be_a(Type)`.
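For context, RSpec’s predicate matchers are thin wrappers over plain Ruby predicate methods; a quick sketch of the mapping (the example values are assumed): `have_key(:a)` delegates to `has_key?(:a)`, `be_empty` to `empty?`, and `be_a(Type)` corresponds to `is_a?(Type)`.

```ruby
# Plain-Ruby view of what the predicate matchers delegate to.
h = { a: 1 }

h.has_key?(:a)       # what expect(h).to have_key(:a) checks
[].empty?            # what expect([]).to be_empty checks
"text".is_a?(String) # what expect("text").to be_a(String) checks
```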


This cop is a friction point on our team. I’d been a user of nested and deeply nested contexts for years, but having worked on the same code base for long enough, the practice has come back to bite me. Our test suite has literally thousands of tests with deep nesting; most of them are unreadable, and changing anything in them is impossible without breaking other tests. I clearly remember the value the tests brought when they were written, but they are no longer beneficial. Unfortunately for me, other colleagues don’t share this view and keep introducing deeply nested contexts into the test suite.

This cop came up with 1018 offenses with nesting deeper than three levels, maxing out at 8! It’s unrealistic to resolve everything in a timely manner, so I had to keep it disabled.


Oh boy! This returned ~1400 offenses. They overlap heavily with the RSpec/NestedGroups offenses but mostly come from our lack of knowledge that `subject` can be named. Disabled.


It’s the exact same story: deeply nested contexts breed `let` blocks, and they just multiply…


Another flame-war-inducing cop. I didn’t enable it. We had around 2.5K offenses, most of them with two expectations per test. Some of them could surely be rewritten using `have_attributes`, but others not so much. I agree with the rule saying to test only one thing in a test, but there’s a subtlety: testing one thing doesn’t necessarily imply having a single expectation. We could narrow the divide between the two by writing loads of custom RSpec matchers. I have to admit, it’s not my favourite activity. Writing a custom matcher that’s really useful takes time, and if it gets used once or twice and doesn’t make the test significantly easier to read, it’s just not worth it.

One place where I find pragmatic reasons for multiple expectations is feature tests that walk through a sequence of actions, testing along the way. A typical test will be: go to the index action, see there’s the “no data” placeholder, click on a button to add a record, fill in the form, verify that it’s listed on the index page, click on edit, save, verify, click on delete, verify it’s gone. A purist would split this into a bunch of tests where the final state of one is the setup of another and say that this makes it clear which step fails, and they would be right! Except the cost of the setup tends to be very high in a feature test, so I’d rather write a single long multi-step test and sprinkle it with expectations in between.


Another cop with too many offenses. It points to bad test design, but there are cases when avoiding it is very hard, namely when interfacing with a third-party library. Disabled.


This produced over 60 offenses. Some of them were fixable, but the majority were the result of ActiveRecord’s fluent interface, e.g. `allow(User).to receive_message_chain(:where, :order, :limit) { ... }`. I excluded the files to make the cop pass but not to spread the test pattern further. The proper solution is to declare a scope or a class method on the model.
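A minimal plain-Ruby sketch of that solution (`FakeRelation`, `User` and `recent_admins` are hypothetical stand-ins for ActiveRecord classes): wrapping the fluent chain behind one named class method means a spec can stub that single method instead of reaching for `receive_message_chain`.

```ruby
# Stand-in for an ActiveRecord relation: each query method returns a new
# relation, so calls chain just like where(...).order(...).limit(...).
class FakeRelation
  def initialize(rows)
    @rows = rows
  end

  def where(conditions)
    FakeRelation.new(@rows.select { |r| conditions.all? { |k, v| r[k] == v } })
  end

  def order(key)
    FakeRelation.new(@rows.sort_by { |r| r[key] })
  end

  def limit(n)
    FakeRelation.new(@rows.first(n))
  end

  def to_a
    @rows
  end
end

class User
  ROWS = [
    { name: "a", admin: true },
    { name: "b", admin: false },
    { name: "c", admin: true }
  ].freeze

  def self.all
    FakeRelation.new(ROWS)
  end

  # One named entry point hides the chain; a spec stubs only this method.
  def self.recent_admins
    all.where(admin: true).order(:name).limit(10)
  end
end

User.recent_admins.to_a.map { |u| u[:name] }  # => ["a", "c"]
```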


Another good cop with an embarrassing number of offenses.


Around a dozen cases which might be a problem or a contributor to flaky tests. It took a bit of effort to manually fix all of them.


Had many offenses but all were auto-corrected.


This uncovered a handful of inconsistent test files. As I fixed them, more and more offenses turned up. I also had to add a transformation rule for an unusually named module.


This cop shows another sneaky side effect of nested contexts. Disabled for now due to the number of offenses.


It yielded three offenses. One case was a test making sure a block was not being executed by running `expect(true).to eq false` inside it; I converted it to `flunk`. The other two are legitimate, yet still weird, tests of YAML configuration files making sure a specific literal appears in a collection.


I love that this cop fixed ~900 old tests :heart:.


Yeah, no :rolling_on_the_floor_laughing:. This cop has nearly 2K offenses. It promotes the use of `let` blocks or helper methods. `let` blocks lead to bad test design, so the only option would be helper methods. That’s a good idea for simple unit tests, but feature or request tests tend to have more complex setup. Shuffling it off to a helper function would require a very long descriptive name, and Rubocop would still complain about method length. Disabled.
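For the simple-unit-test case, the helper-method option can look something like this hedged sketch (`build_order` and its fields are hypothetical): unlike a `let`, an ordinary method takes parameters, so each test can vary the setup explicitly.

```ruby
# An ordinary helper method instead of a pile of `let` blocks; parameters
# make the variation between tests visible at the call site.
def build_order(items:, discount: 0)
  subtotal = items.sum { |i| i[:price] * i[:qty] }
  { items: items, total: subtotal - discount }
end

order = build_order(items: [{ price: 10, qty: 2 }], discount: 5)
order[:total]  # => 15
```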


I managed to enable this cop, but with a bit of fiddling to make it accept groups which use a helper method to generate boilerplate tests.


Again, too many offenses to fix in one go, for very, very weird reasons.


Enabled with the explicit option, which makes individual tests easier to read (and makes looking up usage simpler as well).


Disabled because there are so many exceptions (feature, request, controller, view tests) that it doesn’t add enough value to justify its execution.


There are too many offenses, and I don’t agree with Rubocop about a significant portion of them. I try to go for the wording which makes the most sense in the documentation output format, so if it reads well I’m not including extra verbiage just to please Rubocop.

This still leaves quite a few cops partially or completely disabled, but it was still a good effort. Obviously, the state of your codebase and the sensibilities of your team will be different, but I hope this gives you an idea of what you might encounter if you decide to install rubocop-rspec.
