Good test, bad test

I ❤ tests! They give me confidence that the code does what it’s supposed to do, they document behaviour, and they let me discover edge cases while writing the code in the first place. Not all tests are created equal, though, and I don’t love them all the same.

I’d like to put into words the properties which make tests good but… it depends. It depends on many factors: language, type of test, trust in cooperating components, criticality of the code under test, cost of running a test, time pressure and many more. There are definitely attributes which transcend all of these (e.g. clarity, focus) but they might manifest differently based on the circumstances of the test. What I can do is present two versions of a specific test and describe what I find valuable.

The test is fictional but heavily inspired by a test I discussed with a colleague during a review a few weeks back. It’s in Ruby, using RSpec as the testing framework and FactoryBot for fixtures.

Bad test

describe PlayerCreator do
  describe 'validations' do
    before(:each) do
      @c = create :customer
      @s = create :sport, customer: @c
      @t = create :team, customer: @c
      @current_user = build_stubbed(:admin)
      @params = {
        player: {
          sport_ids: [@s.id],
          team_id: @t.id,
          nick: 'Alice',
        }
      }
    end

    it 'should allow leading and trailing whitespace in nick' do
      @params[:player][:nick] = "     Alice\n"
      result = PlayerCreator.(
        current_user: @current_user, params: @params
      )
      expect(result).to be_success
    end

    it 'should prohibit whitespace within nick value' do
      @params[:player][:nick] = "Al       ice"
      result = PlayerCreator.(
        current_user: @current_user, params: @params
      )
      expect_error(
        result,
        :nick,
        'Spaces, new lines and other whitespace characters are prohibited'
      )
    end
  end
end

Good test

describe PlayerCreator do
  it 'strips off leading and trailing whitespace(s) from the nick and succeeds' do
    result = PlayerCreator.(
      current_user: build_stubbed(:admin),
      params: player_attributes(nick: "     Alice\n"),
    )

    expect(result).to be_success
    expect(result[:model].nick).to eq 'Alice'
  end

  it 'prohibits whitespace within the nick' do
    result = PlayerCreator.(
      current_user: build_stubbed(:admin),
      params: player_attributes(nick: "Al  ice"),
    )

    expect_error(
      result,
      :nick,
      'Spaces, new lines and other whitespace characters are prohibited'
    )
  end

  def player_attributes(**fields)
    c = create :customer
    s = create :sport, customer: c
    t = create :team, customer: c

    attrs = attributes_for(:player, **fields)
    attrs.merge!(team_id: t.id, sport_ids: [s.id])
    { player: attrs }
  end
end

The differences

Test case names

RSpec has a built-in documentation output format which prints the full test case names and “paths”:

# bad test
PlayerCreator
    validations
        should allow leading and trailing whitespace in nick
        should prohibit whitespace within nick value

# good test
PlayerCreator
    strips off leading and trailing whitespace(s) from the nick and succeeds
    prohibits whitespace within the nick

This format is great for understanding code you’re not familiar with. Ideally, it should tell you what the code under test does. While it’s possible to discern that from the bad test’s output, the second example is more comprehensible in my opinion.
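In case it’s unfamiliar, the documentation formatter can be enabled per run with a flag, or made the project default via the `.rspec` file:

```shell
# One-off run with the documentation formatter:
rspec --format documentation

# Or make it the project default by adding the flag to .rspec:
echo '--format documentation' >> .rspec
```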

It is also in line with the BDD (behaviour driven development) nature of RSpec - to test (and describe) behaviour rather than implementation. It might be hard to accomplish in unit tests of a low-level component, but I strive for as much documentational value as possible.

A related point, reflected in RSpec’s preferred matcher style, is to write test names as statements of fact instead of wishes, e.g. prohibits whitespace within the nick instead of should prohibit whitespace within nick value.

Nesting

The good test doesn’t wrap the test cases in a describe/context block labelling them as validation test cases. My stance here is weaker, more nuanced and less dogmatic.

Often, the nesting wrapper breaks the flow of test case names discussed in the previous section. This needs to be balanced against the benefit of grouping the tests. This test file has only five test cases in total, so I lean towards not including it. If the file had thirty cases which fell into four groups, I’d be much more likely to use nesting.

Nesting contexts often leads to nested lets which override each other on every level and make it hard to figure out what the setup of a particular test is. It doesn’t have to happen, but it’s oh so tempting. I know I’ve written test files nested eight (maybe more) levels deep trying to test all combinations of inputs. They were extremely useful when untangling existing spaghetti code with low test coverage, but touching them months later feels like a nightmare.

Less nesting has one weird technical benefit - it leaves you more characters per line for the test code. Most projects have some limit on line length. The lower it is, the more you feel every level of indentation you sacrifice to nesting.

Setup data

The tests need associated records. The bad test sets up this data in a before block and stores it in a bunch of instance variables referenced by the tests themselves. While instance variables are an extreme approach which might outrage many Rubyists, using the aforementioned lets might fly under the radar:

let(:c) { create :customer }
let(:s) { create :sport, customer: c }
let(:t) { create :team, customer: c }
let(:current_user) { build_stubbed(:admin) }
let(:params) do
  {
    player: {
      sport_ids: [s.id],
      team_id: t.id,
      nick: 'Alice',
    }
  }
end

It is basically the same thing as instance variables, just with extra visual clutter, slightly worse support in IDEs and editors (e.g. jump to definition) and a promise that “it won’t even get run unless it’s needed”.

Tangent 1: As mentioned in the previous section, lets and nested contexts tend to make tests harder to read. This example is short and not deeply nested, so it’s just more typing and visual clutter, but I still prefer not having to deal with it.

Tangent 2: The promise that lets are evaluated only when used by the actual test is used as an argument for lets being cheaper than a helper function: if a test no longer needs a value, that test gets faster. That might be true, but as a reader of the test, I’m going to wonder why it’s there. What needs it? It leads to dead code which IDEs have a much harder time pointing out for you because it’s a DSL, not plain Ruby.

What the bad test achieves is separating the test setup from the code execution and assertions. That’s great! As long as there are only a handful of tests sharing the setup. When you need to understand test number five or beyond, you have to figure out that the setup lives outside of the test and then scroll up and down between the shared setup and the test itself to grasp the interaction.

The good test moves the setup into a helper method. The method lives within the test file and is intended to be very specific to that file. Its advantage is that it’s a normal method call which can be followed with any editor. If it becomes unused, the tooling might highlight it. It also doesn’t change with every layer of nesting - you simply get what you see.

Setup, execute, assert

A commonly advised test structure splits a test into three steps: setup, execute, assert. The bad test follows the order but could make it clearer with blank lines between the steps. The good test kinda merges setup and execution. Personally, I don’t mind because it’s so simple. However, a more complex test might benefit from being a bit more explicit:

player_params = player_attributes(nick: "     Alice\n")

result = PlayerCreator.(
  current_user: build_stubbed(:admin),
  params: player_params,
)

expect(result).to be_success
expect(result[:model].nick).to eq 'Alice'

Assertions

The first bad test case is called should allow leading and trailing whitespace in nick and then asserts expect(result).to be_success. Apart from the wording issues described earlier, the test case name and the assertion match. However, the code actually strips the whitespace before creating the player. It’s a kinda trivial transformation, but why not test and document it?

The good test case is called strips off leading and trailing whitespace(s) from the nick and succeeds and adds an assertion expect(result[:model].nick).to eq 'Alice'. It describes the transformation and tests it as well.
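The transformation itself is likely trivial. Assuming PlayerCreator normalizes the nick before validating it, the core could be nothing more than String#strip - a hypothetical sketch, not the actual implementation:

```ruby
# Hypothetical sketch of the normalization step inside PlayerCreator
# (the real implementation isn't shown here):
def normalize_nick(nick)
  nick.strip # String#strip removes leading/trailing spaces, tabs and newlines
end

normalize_nick("     Alice\n") # => "Alice"
```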

Most developers who have heard about testing have heard about “a single assertion per test” and might start splitting the test in two in order to have a single expect call per test. They can knock themselves out, as long as we don’t work in the same codebase. I’m not saying “stuff as many assertions into a single test as you can”, but I see several legitimate reasons to have more than one expect call per test.

A pragmatic reason is performance. Each test adds to the suite execution time and slow suites are a drag. Complex software needs large suites with many tests, but let’s not add unnecessary tests because of a misunderstanding of a recommendation.

Another reason is the overlooked distinction between a single assertion and a single expect call. Often, people take them to mean the same thing but consider a built-in RSpec matcher have_attributes:

expect(person).to have_attributes(name: 'Tiff', age: 32)

It is a single expect call but technically, it compares both name and age. I have never heard anyone object to a test using have_attributes and try to rewrite it into two (or more) tests. Let’s test that the attributes have changed and consider it an atomic operation for the purposes of testing. 1
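The same distinction can be seen outside RSpec. In plain Ruby, comparing several fields in one go is still a single logical comparison - the Person struct below is made up for the illustration:

```ruby
# Made-up struct mirroring the have_attributes example
Person = Struct.new(:name, :age, keyword_init: true)

person = Person.new(name: 'Tiff', age: 32)

# One logical assertion, even though two fields take part in it:
person.to_h == { name: 'Tiff', age: 32 } # => true
```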

The example test expects

expect(result).to be_success
expect(result[:model].nick).to eq 'Alice'

One can argue that success and an attribute of a model are distinct enough not to fit under the umbrella of a single assertion.2 This is actually where some familiarity with other languages and paradigms comes in handy. Let’s imagine an equivalent expectation in Elm:

playerParams [ ( Name, "   Alice   " ) ]
    |> createPlayer
    |> Result.map (.model >> .nick)
    |> Expect.equal (Ok "Alice")

The result of the Elm function equivalent to PlayerCreator would be a Result which can be one of the values Ok or Err, each carrying data relevant only for that result. It clearly is a single expectation which encompasses both pass/fail and the actual value/error. Ruby doesn’t allow us to express the result via types in a similar way but the underlying behaviour is the same.3
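The same shape can be approximated in plain Ruby. The sketch below is hand-rolled purely for illustration; the dry-monads gem mentioned in footnote 3 provides the real thing:

```ruby
# Hand-rolled Result for illustration only; dry-monads (see footnote 3)
# implements this idea properly.
Result = Struct.new(:ok, :value, keyword_init: true) do
  def success?
    ok
  end

  # Like Elm's Result.map: transform the value only on success,
  # pass the failure through untouched.
  def map
    success? ? Result.new(ok: true, value: yield(value)) : self
  end
end

created = Result.new(ok: true, value: "     Alice\n".strip)
created.map(&:upcase).value # => "ALICE"

failed = Result.new(ok: false, value: nil)
failed.map(&:upcase).success? # => false
```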

Focus

A rule of thumb for structuring code that I like is: each function should have a single level of abstraction. A measurable approximation of this abstract concept is the number of nesting levels: one layer of if/case/switch/loop per function. Both versions of the test are similar in this regard - they don’t have any conditionals (as most tests don’t) or loops. However, the good test has a slightly more complex call of the code under test:

result = PlayerCreator.(
  current_user: build_stubbed(:admin),
  params: player_attributes(nick: "Al  ice"),
)

It’s not terrible and pretty standard RSpec so I consider it good enough. In a more complex test though, I might write another helper to get to:

params = player_attributes nick: 'Al  ice'

result = create_player params

expect_error(
  result,
  :nick,
  'Spaces, new lines and other whitespace characters are prohibited'
)

This matches the setup-execute-assert structure with three function calls. I don’t consider it a necessity, but the advantage is that the reader’s focus is pulled directly towards the important things in the test:

Even the most tired reader will grasp what is important in this test without being bogged down in the ceremony of calling a method with many or complex arguments and processing complex types. If they need more detail, they can navigate to the part they need to dig into. The longer you stick with a codebase, the more you appreciate its readability and the readability of its test suite.


These are by no means all the properties of good tests as I value them. Being human, my opinions are shaped by my experiences and keep evolving; this post is just a snapshot. You might have a different perspective or different reasons for the same approach - I’m not saying my way is the only right way and I’m happy to hear about your testing philosophy and the reasoning behind it. By explaining my approach, I hope to help developers at an earlier point on their path and save them from running in some unpleasant directions.


  1. There’s a non-trivial number of developers who might try to reconcile the difference by using or writing complex matchers. I don’t find the RSpec API for complex matchers easy to navigate but if it’s your jam, go ahead. ↩︎

  2. You could get down to a single expect call if you added some implicit assumptions about the code: expect(Player.find_by(name: 'Alice')).to exist. That code assumes that (a) there is no other record matching the criteria and (b) the record being persisted implies success. ↩︎

  3. There’s the dry-monads gem implementing precisely this behaviour. ↩︎

