
The Testing Trophy and Testing Classifications

June 3rd, 2021 7 min read

by Fauzan Saari

Allow me to indulge in a little personal history. If you're unfamiliar with the testing trophy, here it is:

Illustration of a trophy separated into 4 sections labeled from top to bottom: End to End, Integration, Unit, Static

I initially introduced this in a tweet with a quick drawing I made with Google Drive:

I came up with this idea after publishing a blog post titled "Write tests. Not too many. Mostly integration.":

Which was my take on Guillermo Rauch's tweet from about a year earlier:

I can't speak for Guillermo, but I agreed so strongly with what he said because of my experience as a UI engineer and how I personally had come to understand the term "integration" in this context.

Especially at that time in my career, almost all the code I wrote either ran directly in a browser or was intended for a tool that would help me run code in a browser. So, naturally, I viewed the terms "unit", "integration", and "end-to-end" through the lens of that experience. In fact, I added "static" to the trophy because in the world of JavaScript static checking is not a given, as it is in the languages that were predominant when the testing pyramid was introduced.

The reason I explain this background is to help you understand the way the Testing Trophy is intended to be interpreted. I never considered whether it applied to microservices or even backend services at all. I considered my codebase in isolation and attempted to categorize the types of tests I could write within the confines of my own code ownership. I always thought of end-to-end tests as the place where you attempt to validate that things work without any (or more practically "as little as possible") mocking in place.

So that left me with categorizing tests on my own code into either "unit" or "integration". I consider a "unit" to be a single function, class, or object that contains logic. So here's how I decided to (loosely) categorize them:

  • Unit tests are those that test units that either have no dependencies (collaborators) or have those dependencies mocked for the test.
  • Integration tests are those which test multiple units integrating with one another.
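
To make the distinction concrete, here is a minimal sketch of both kinds of test in TypeScript. Everything in it is hypothetical and exists only for illustration: the `formatPrice`, `totalWithTax`, and `checkoutLabel` units, the tax rates, and the tiny `expectEqual` helper that stands in for a real test runner's assertions.

```typescript
// Hypothetical units for illustration only; none of these names come from a real library.

// A unit with no collaborators: pure logic.
function formatPrice(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

// A unit with one collaborator (a tax-rate lookup), passed in so a test can replace it.
type TaxRateLookup = (region: string) => number;

function totalWithTax(cents: number, region: string, lookup: TaxRateLookup): number {
  return Math.round(cents * (1 + lookup(region)));
}

// The real collaborator used by the application code (made-up rates).
const taxRates: Record<string, number> = { UT: 0.05, CA: 0.1 };
const lookupTaxRate: TaxRateLookup = (region) => taxRates[region] ?? 0;

// A function that wires the units together.
function checkoutLabel(cents: number, region: string): string {
  return formatPrice(totalWithTax(cents, region, lookupTaxRate));
}

// Stand-in for a test runner's assertion: throws when the values differ.
function expectEqual<T>(actual: T, expected: T): void {
  if (actual !== expected) {
    throw new Error(`expected ${String(expected)}, got ${String(actual)}`);
  }
}

// Unit test: the unit has no collaborators.
expectEqual(formatPrice(1999), "$19.99");

// Unit test: the collaborator is mocked with a fake lookup.
const fakeLookup: TaxRateLookup = () => 0.5;
expectEqual(totalWithTax(1000, "anywhere", fakeLookup), 1500);

// Integration test: the real units run together, nothing mocked.
expectEqual(checkoutLabel(2000, "CA"), "$22.00");
```

Under these definitions, the first two checks count as unit tests and the last as an integration test, even though all three live in the same file and run the same way.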

Eventually, I created Testing Library to encourage the kinds of testing practices that worked best for me:

By my own definition, Testing Library can be used to test individual React components (unit tests), entire pages with HTTP requests mocked via MSW (integration tests), the full app with very few mocks (end-to-end tests), and even individual React hooks if necessary (lower level unit tests). And Testing Library is now the most popular and de facto standard... er... testing library for React apps and increasingly the same is happening wherever the DOM can be found. In May 2020, Testing Library received the "Adopt" distinction on the ThoughtWorks Technology Radar.

I expect some will reply to this blog post with: "Why did you have to make up your own definitions in the first place? Just use the ones that exist." So I'll respond before you ask: "Which of the two dozen different definitions would you like me to have chosen for my own definition?" 😂 😭 In his post about test shapes, Martin Fowler approximates a quote of a "test expert" who was asked in the 1990s how they define "unit test":

“in the first morning of my training course I cover 24 different definitions of unit test”.

This is a sad state of affairs, and unfortunately it's been that way since the 90s. It is what it is. I had to choose something that made sense for me, and as an educator I had to choose something that would make the most sense for the people I'm teaching. Judging by the response from people who have implemented my recommendations, my decision was a good one.

When discussing whether you can prove that testing is effective, Tim Bray, in his article Testing in the Twenties, correctly says:

let's not kid ourselves that our software-testing tenets constitute scientific knowledge.

I would say this applies to everything about testing, not just whether it's effective (it can be). Any attempt to come to a single definition for all these terms is a futile endeavor. I remember speaking at Assert(JS) (where I gave my talk Write Tests. Not too many. Mostly Integration.) and observing how wildly the talks differed in their recommendations on testing. But as I think about it now, much of that difference could be attributed to how we each defined the terms of testing rather than to how we strive to achieve confidence.

Justin Searls (who incidentally also spoke at Assert(JS) that year) said it best when he tweeted:

Classification is important so we can have conversations about this. It's unfortunate that you pretty much need to come to a consensus on how you define these terms before having a productive conversation. But ultimately it really doesn't matter. As Justin says, it's a distraction. Especially when so many codebases are living life on the edge without an automated way to have confidence their changes are safe to deploy.


Anyway, hopefully this helps to clear things up a bit. To sum up: when trying to apply the testing trophy to your situation, think of it within the scope of an individual codebase. It definitely has applicability in backends, but I've only considered it for monoliths, not microservices or even serverless functions (and I agree with Tim: most of us should probably be writing monoliths if we can).

The testing trophy (when understood) has given me (and countless others) clarity on where to focus testing efforts. When properly interpreted, it helps me keep this critical principle in mind:

This is the guiding principle for Testing Library and it's how I think about every testing problem I face.

Remember, it's all about getting a good return on your investment, where "return" is "confidence" and "investment" is "time." If we had unlimited time, then trying to classify things wouldn't be necessary; we'd just write tests forever! But we don't, so I hope this helps you when trying to decide where to put your efforts.

P.S. If you'd like more of my thoughts on testing, I have a lot of posts on the subject on my blog. Here are a few specific articles I recommend you read next:

Written by Kent C. Dodds

Kent C. Dodds is a JavaScript software engineer and teacher. He's Co-Founder and Director of Developer Experience at Remix! Kent's taught hundreds of thousands of people how to make the world a better place with quality software development tools and practices. He lives with his wife and four kids in Utah.
