System tests haven't failed

by Jared Norman
published June 18, 2024

The topic of system tests (a.k.a. end-to-end tests or feature tests) is making the rounds right now, and I’m a bit confused by it. They’ve always been a bit of a contentious topic. I’ve seen people say they only write system tests. I’ve seen people say they don’t write any system tests. I don’t want to get called a “centrist”, but the best approach is definitely somewhere in between those two poles.

System tests have failed

When we introduced a default setup for system tests in Rails 5.1 back in 2016, I had high hopes. In theory, system tests, which drive a headless browser through your actual interface, offer greater confidence that the entire machine is working as it ought. And because it runs in a black-box fashion, it should be more resilient to imple...

DHH started the conversation by announcing that system tests have “failed”. He claims that he has seen very little benefit to having a large suite of system tests. He reports having spent too much time getting such tests to work for the minimal benefit he has seen from them.

I don’t doubt any of that. System tests are harder to maintain than any other kind of test. He’s entirely right that these tests are prone to false negatives and browser timing issues unless they are very carefully written. I agree that debugging these tests is much more difficult than debugging other kinds of tests.

For once, I’m mostly on the same page as David. In fact, partway through the article, we find this statement:

System tests work well for the top-level smoke test.

This is true. System tests work best when they’re just making sure that things vaguely work. You don’t want to be testing the details of anything unless you’ve got critical business logic that can’t be tested at other levels of the system. This all comes back to the testing pyramid.

The Practical Test Pyramid

Find out what kinds of automated tests you should implement for your application and learn by examples what these tests could look like.

martinfowler.com

The higher-level (and slower) a test is, the fewer of that kind of test you should have and the more general that test should be. The fact that “HEY today has some 300-odd system tests” is a red flag. That’s not the most ridiculous number of system tests I’ve encountered, but my gut says that’s too many for that size of application. These are the tests you should be using the most sparingly.

Especially when you have a good test harness in place, it can be easy to reach for system tests when adding new functionality. You’re adding a variation of some existing feature, so you encode the new functionality in a variation of an existing test. This gives you immediate, automated feedback as you work through building your feature.

The problem is that you don’t want every variation tested at that level. Instead you have two options.

You can simply not write the system test. Instead, home in on where you can exercise this functionality in more localized (and faster) tests. This works best if you know the system well.

Alternatively, you can write the system test and not commit it. It is totally fine to write a system test to aid you in developing a feature and then throw that test away. Whether or not a test is useful to you now is not the same as whether the test is valuable to the project forever.

This is the same idea behind why REPLs are great. It’s why prototypes are great. It’s why iterating on design is great. There are thousands of reasons to write all kinds of code that is useful to you in the moment, but ultimately doesn’t need to be carried forward and maintained. That goes for tests too.
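To make the first of those two options concrete, here’s a minimal sketch of pushing variations down into a fast, localized test instead of adding another system test per variation. The `ShippingCalculator` class, its constants, and the pricing rules are all hypothetical, invented purely for illustration; the only real dependency is Minitest, which ships with Ruby.

```ruby
require "minitest/autorun"

# Hypothetical domain object, invented for this illustration.
# Amounts are integer cents to avoid floating-point rounding.
class ShippingCalculator
  FREE_SHIPPING_THRESHOLD_CENTS = 50_00
  FLAT_RATE_CENTS = 5_00
  EXPRESS_SURCHARGE_CENTS = 10_00

  def cost_cents(subtotal_cents:, express: false)
    base = subtotal_cents >= FREE_SHIPPING_THRESHOLD_CENTS ? 0 : FLAT_RATE_CENTS
    express ? base + EXPRESS_SURCHARGE_CENTS : base
  end
end

# Each variation gets its own fast test here, with no browser involved.
class ShippingCalculatorTest < Minitest::Test
  def setup
    @calculator = ShippingCalculator.new
  end

  def test_flat_rate_below_threshold
    assert_equal 5_00, @calculator.cost_cents(subtotal_cents: 49_99)
  end

  def test_free_shipping_at_threshold
    assert_equal 0, @calculator.cost_cents(subtotal_cents: 50_00)
  end

  def test_express_surcharge_applies_on_top_of_free_shipping
    assert_equal 10_00, @calculator.cost_cents(subtotal_cents: 80_00, express: true)
  end
end
```

A single system test can still confirm that a shipping cost shows up somewhere in the flow. The individual pricing rules don’t need a browser to be verified.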

So, if you’ve got too many system tests, what do you do? You can delete them, but part of the hypothetical benefit of these tests is that they give you confidence that you haven’t broken anything. If you wholesale delete them without examining the lower-level tests for the logic they exercise, you may be making your system more brittle. We want to keep the confidence without keeping the slow tests.

Replacing system tests with unit tests | Everyday Rails

Aaron Sumner argues that you should make sure you keep the tests that provide you the most value, and that means the ones that test the most important parts of your application. I work in eCommerce, so my system tests are focused around the core purchase flow and the checkout. We do test variations in there, because we need to know that all of those variations work every time we make a change to the site.

Aaron also points out that code coverage tools can be useful for finding the gaps that removing feature tests leaves behind. This is a great idea and one of the few places that I’ve seen code coverage used to solve a real, practical problem.
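As a rough sketch of that idea, the snippet below uses Ruby’s standard-library `Coverage` module to list the lines that the remaining tests never execute. This is an assumption-laden toy, not the workflow from Aaron’s article: the `checkout_total` method, the `uncovered_lines` helper, and the temp-file setup are all hypothetical, and on a real project you would more likely point a dedicated coverage tool at your actual suite.

```ruby
require "coverage"
require "tempfile"

# Hypothetical application code. It is written to a temp file because
# Coverage only tracks files loaded after Coverage.start.
app_code = <<~RUBY
  def checkout_total(subtotal, coupon: nil)
    if coupon == "HALF"
      subtotal / 2
    else
      subtotal
    end
  end
RUBY

# Hypothetical helper: turns per-line execution counts into the 1-based
# line numbers that were executable but never run (count of zero).
def uncovered_lines(line_counts)
  line_counts.each_with_index.filter_map { |count, index| index + 1 if count == 0 }
end

file = Tempfile.new(["checkout", ".rb"])
file.write(app_code)
file.close

Coverage.start
load file.path

# Stand-in for the lower-level tests that remain once a system test is
# deleted. Note that it never exercises the coupon branch.
raise "unexpected total" unless checkout_total(100) == 100

# Coverage.result maps each loaded file to an array of execution counts,
# with nil for lines that are not executable.
_, line_counts = Coverage.result.find do |path, _|
  File.basename(path) == File.basename(file.path)
end

puts "Lines not exercised by the remaining tests: #{uncovered_lines(line_counts).inspect}"
```

Any line reported here was only ever exercised by the test you removed, which tells you where a lower-level test might be worth adding before the deletion lands.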

The article is a little reductive, though. I agree that when testing ActiveRecord models, the database becomes part of the unit, but the testing world is richer than the article makes it seem. You shouldn’t necessarily be replacing your system tests specifically with unit tests. Instead, you should use a mix of integration and unit tests, depending on your needs.

System tests haven’t failed; they’re just overused. Writing maintainable, reliable system tests is an admittedly difficult skill, but luckily you don’t need many of them, so a little investment goes a long way. Put some effort into the most important ones to make sure they are reliable. Go ahead and axe the variations and flaky system tests. I’m sure you’ve picked up a bunch of slow tests that aren’t providing you any value. Never keep tests that aren’t providing value.