In my day job, I spend quite a lot of time writing – and reading – automated tests written in RSpec. At some point, I realised that I have Opinions™ about automated testing and how to do it better, and a few personal bugbears. It’s worth noting that I can’t claim vast experience in this area, but we have a proud industry tradition of not letting things like that stop us.
So, any_instance. This feature (properly of rspec-mocks) occupies the rather
interesting position of being advised against in its own documentation. That
never suggests good things to start with, but across my team I probably have
the most visceral reaction to it. any_instance worries me: not because I think
it's unreasonable (sure, there are some odd semantics), but because of what it
suggests about the code being tested.
Partial Mocks
The obvious issue I take with any_instance and friends is that they construct
partial mocks of objects that are – typically – generated during the course of
the test. The remainder of the object is left intact, and this does my head in:
part of the object has just been arbitrarily rewritten, and the remainder may be
completely incompatible with the particular stub. Generally, I see this when
trying to understand some strange action-at-a-distance phenomenon that has left
apparently unrelated tests in tatters.
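To make the incompatibility concrete, here is a sketch in plain Ruby of what a partial stub amounts to, using a hypothetical Order class (the names and the singleton-method mechanism are illustrative, not what rspec-mocks literally does):

```ruby
# A hypothetical Order class, standing in for any object whose
# methods depend on one another.
class Order
  def items
    [10, 20, 30]
  end

  def total
    items.sum
  end
end

order = Order.new

# Roughly what a partial stub does: one method on the instance is
# arbitrarily rewritten, while every other method is left intact.
order.define_singleton_method(:total) { 1 }

order.items # => [10, 20, 30]  (the real method)
order.total # => 1             (the stub, now inconsistent with items)
```

Any other code that consults both items and total now sees an object that contradicts itself, which is precisely the action-at-a-distance that is so hard to debug.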
It would be nice to think that whenever any_instance is deployed, the test
cases in which it is implicated would be carefully examined for other calls
that the override could turn into liars; my view is that the ease of partial
mocks of semideterminate objects makes this unlikely. This is the sort of
“just be disciplined” approach that, although correct, is better handled by
tooling.
Systematic Truncation
Every now and then I find myself considering the value of simply overriding specific methods on a class by default. To be more concrete, imagine a system that maintains an audit trail, and that this incurs a significant overhead on every action in a particular domain (we might, for instance, invoke an external dependency as a transaction precondition). That system is generally not the entity under test, so the risks of omitting the code path are possibly low, and the benefits (especially if the test instance, if it even exists, is aggressively rate-limited) can be significant.
This is the sort of situation where I really feel that what I want to do is
provide a complete fake class. But if an existing codebase does rely on the
specific outcomes of methods on the real object, you have something of a
problem. Given this, using any_instance for truncation suggests to me
overcoupling in a particular manner (although I acknowledge that my perspective
on this is likely to be fairly idiosyncratic).
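As a sketch of what such a complete fake might look like, assume a hypothetical AuditTrail class whose record! method hits the external dependency (both names are invented for illustration):

```ruby
# AuditTrail is a hypothetical production class; record! stands in
# for the expensive call to the external dependency.
class AuditTrail
  def record!(action)
    # ... invoke the external dependency ...
  end
end

# A complete fake: the same interface, no external dependency, and
# the recorded entries are inspectable by the test.
class FakeAuditTrail
  attr_reader :entries

  def initialize
    @entries = []
  end

  def record!(action)
    entries << action
  end
end

trail = FakeAuditTrail.new
trail.record!(:created)
trail.entries # => [:created]
```

Unlike an any_instance stub, the fake is a whole object: every method on it was written together, so it cannot be internally inconsistent in the way a partial mock can.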
Aesthetics
In a lot of scenarios I’ve seen, an invocation of any_instance such as:
allow_any_instance_of(Foo).to receive(:bar).and_return(1)
can be replaced as follows:
allow(Foo).to receive(:new).and_wrap_original do |m, *args|
  original = m.call(*args)
  allow(original).to receive(:bar).and_return(1)
  original
end
This is almost painfully unpleasant (although if you’re overriding several methods, you at least only pay this once). If you have an effectively total override, then the same any_instance call:
allow_any_instance_of(Foo).to receive(:bar).and_return(1)
can instead give way to the following form:
allow(Foo).to receive(:new) { instance_double(Foo, bar: 1) }
This form looks a bit less painful at first glance, but is of course solving a different problem – it just might happen that that problem is the one that would actually be preferable to solve.
The Core Problem
So, I think there’s something central about any_instance that bothers me, and
I alluded to it above. any_instance serves a clear (and potentially useful!)
role, but it is an enabler: by providing a mechanism by which the developer can
easily apply arbitrary variations to entities in a fundamentally ad hoc manner,
it makes understanding how various tests interlock difficult.
The absence of a canonical, uniform mechanism to handle this (à la “the audit fake”) makes it difficult to understand which parts of the existing code are causing test difficulty; it fragments solutions to what could be a meaningful set of problems; it makes changes to the overridden methods fraught (consider a clarifying rename of a method that, when not stubbed, and in one test, breaks on Tuesday); it requires the changing developer (and reviewer) to think about the various options.
I’m unsure if constructing canonical fakes will actually make matters better – my experience here is really in cases where the alternative was clearly going to be a technical disaster – but it feels like a strong possibility. Then again, I certainly tend to solve a lot of these problems by imposing structure on them, and maybe that’s not always compatible with other people’s approaches to design. If nothing else, there’s a chance to experiment here.
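One way to make a canonical fake genuinely canonical is to wire the substitution up once, in shared test configuration, rather than per test. A sketch (AuditTrail and FakeAuditTrail are hypothetical names standing in for a real collaborator and a hand-rolled fake honouring its interface):

```ruby
# spec_helper.rb (sketch): substitute one shared fake for the real
# collaborator everywhere, instead of ad hoc any_instance stubs.
RSpec.configure do |config|
  config.before do
    # Every construction of the real class yields the fake instead.
    allow(AuditTrail).to receive(:new) { FakeAuditTrail.new }
  end
end
```

Individual tests then never mention the override at all, and a rename or interface change has exactly one place to break.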