The What, When, Why, and How of Testing (Part 2)

(Author’s Note: The following is an extract from the Advanced Unit Testing Techniques chapter of my upcoming book: Beyond Effective Go – Part 2 – Striving for High-Quality Code)

In the previous post, we examined the Why and When of testing. In this post, we will build on that foundation and look at How much we should be testing.

How much should we test?

We should test just enough and no more. While we write tests so that we can work faster, writing and maintaining tests has a cost. If we have too many tests, their costs can outweigh the value they bring.

If you pushed me for a test coverage number, I’d say anything over 70%. 

There are two reasons for this: 

First, test coverage is measured in lines of code, but it does not matter how many lines of code we have or how many of those lines run during our tests; what matters is behavior coverage. For each of our code’s behaviors, we should have one or more tests that confirm it.

Second, there are often times when we have code that cannot reasonably be tested, and any attempt to do so would damage the quality of the code and result in test-induced damage. Consider the following code:

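import (
	"encoding/json"
	"net/http"
)

func GetUserAPI(resp http.ResponseWriter, req *http.Request) {
	ID := getID(req)
	user := loadUser(ID)

	payload, err := json.Marshal(user)
	if err != nil {
		resp.WriteHeader(http.StatusInternalServerError)
		return
	}

	_, err = resp.Write(payload)
	if err != nil {
		return // the client has most likely gone away; nothing more to do
	}
}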

Testing every single line of this code is somewhere between extremely hard and impossible. The json.Marshal() and http.ResponseWriter.Write() functions both return errors, but these errors should never happen. We could pass in a mock implementation of http.ResponseWriter that returns an error, but what would we be testing? We’d be testing the mock and the fact that we handled the error. Both fall into the “too simple to get wrong” category. Additionally, we would end up with a low-value test and a mock, both of which we now have to maintain. This is a form of test-induced damage that we will examine more at the end of this chapter.

Returning to our question of how much we should test, we should also acknowledge that tests come in many forms and consider how much time we should devote to each form. The most common forms of tests are Unit Tests, User Acceptance Tests (UAT), and End-to-End (E2E) tests. Each of these test forms has different goals, strengths, and weaknesses that we need to be mindful of when using them.

Unit Tests

Unit tests aim to confirm the existence of a particular behavior in the unit-under-test. Please note that I am choosing my words very carefully here. A unit is not necessarily a single function; it is not necessarily a single struct; a unit can be these things, but it can also be a collection of structs and functions that collaborate to add a behavior to a module or package.

Let’s explore an example. Assume we have a bank package responsible for interacting with an API provided by an external company. Inside our bank package is a struct called Account with the following method:

func (a *Account) Transfer(amount int, to string) error

Without looking at the implementation, we can define our expected behaviors as follows:

  • When we try to transfer a negative amount, we should receive an error.
  • When the API is down, we should receive an error.
  • When the API returns an unexpected or garbled response, we should receive an error.
  • When we make a valid request and the API works, we should not receive an error.

We should have at least one unit test for each of these behaviors. By doing so, we ensure that all our intended behaviors are present and document these behaviors for posterity.
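One way to express these behaviors is a table-driven test with one row per behavior. This is a minimal sketch, assuming the external API accepts a JSON POST to a hypothetical /transfer path and replies with {"status":"ok"}; the Account fields, the two handlers, and the transferAgainst helper are illustrative, and httptest.NewServer stands in for the external company’s API. It is shown as package main only so it compiles on its own.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"testing"
)

// Account is a hypothetical client for the external bank API. The /transfer
// path and the JSON shapes are assumptions made for this sketch.
type Account struct {
	ID      string
	BaseURL string
	Client  *http.Client
}

func (a *Account) Transfer(amount int, to string) error {
	if amount < 0 {
		return errors.New("amount must not be negative")
	}
	body, err := json.Marshal(struct {
		From   string `json:"from"`
		To     string `json:"to"`
		Amount int    `json:"amount"`
	}{a.ID, to, amount})
	if err != nil {
		return err
	}
	resp, err := a.Client.Post(a.BaseURL+"/transfer", "application/json", bytes.NewReader(body))
	if err != nil {
		return fmt.Errorf("bank API unavailable: %w", err)
	}
	defer resp.Body.Close()

	var result struct {
		Status string `json:"status"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil || resp.StatusCode != http.StatusOK || result.Status != "ok" {
		return fmt.Errorf("unexpected response from bank API (HTTP %d)", resp.StatusCode)
	}
	return nil
}

func okHandler(w http.ResponseWriter, _ *http.Request) { _, _ = w.Write([]byte(`{"status":"ok"}`)) }

func garbledHandler(w http.ResponseWriter, _ *http.Request) { _, _ = w.Write([]byte(`<html>oops`)) }

// transferAgainst calls Transfer against a local stand-in for the bank API.
// A nil handler simulates the API being down by closing the server first.
func transferAgainst(h http.HandlerFunc, amount int) error {
	down := h == nil
	if down {
		h = okHandler
	}
	server := httptest.NewServer(h)
	acct := &Account{ID: "acc-1", BaseURL: server.URL, Client: server.Client()}
	if down {
		server.Close() // the port now refuses connections
	} else {
		defer server.Close()
	}
	return acct.Transfer(amount, "acc-2")
}

func TestAccount_Transfer(t *testing.T) {
	scenarios := []struct {
		desc      string
		amount    int
		handler   http.HandlerFunc // nil simulates the API being down
		expectErr bool
	}{
		{desc: "negative amount", amount: -10, handler: okHandler, expectErr: true},
		{desc: "API is down", amount: 10, handler: nil, expectErr: true},
		{desc: "garbled response", amount: 10, handler: garbledHandler, expectErr: true},
		{desc: "valid request", amount: 10, handler: okHandler, expectErr: false},
	}
	for _, s := range scenarios {
		t.Run(s.desc, func(t *testing.T) {
			err := transferAgainst(s.handler, s.amount)
			if (err != nil) != s.expectErr {
				t.Errorf("expectErr=%v, got err=%v", s.expectErr, err)
			}
		})
	}
}
```

Because each row maps to one bullet in the list of behaviors, the test doubles as the documentation of those behaviors.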

Returning to our definition of “units”, if all of the code required for interacting with our external API existed within a single struct, our unit tests would only involve this single struct. But what happens if our implementation is split across three collaborating structs: Account, requestEncoder, and requestDecoder?

If we defined our units as structs, we would have to test Account, requestEncoder, and requestDecoder separately. This is unnecessary and wastes time and effort. Take another look at our desired behaviors above; request encoding or decoding is not mentioned. This is not because it is not essential; it is because it is a small part of the broader behavior. By acknowledging that our three structs would not exist without each other and they must collaborate to achieve our goal, we should treat them as a single unit.

With this definition of unit, we can see both the strengths and weaknesses of unit tests. The main strength and weakness of unit tests is their small scope. Because the scope is small, the tests are easy to write and understand. These unit tests will serve to document and enforce the author’s intentions on a small scale. Also, because of this small scope, these tests are fast to execute. Conversely, the tests’ small scope is also a weakness, as they confirm a unit’s behavior and not the behavior (or features) of the system as a whole. For this, we need to take a system-level perspective.

User Acceptance Tests

User acceptance tests focus on confirming that the system behaves as we expect it to. The main difference between unit tests and UAT tests is the scope of the tests. In our unit tests, we focused on an individual unit of the code, perhaps introducing mocks or stubs to isolate that unit from the others. In UAT tests, we are testing most (or all) of the codebase. 

Test scope and isolation are still vital considerations for UAT tests. It is important to remember that we are testing our system in isolation, so our tests should not rely on any external systems. I should note that I do not mean these tests must mock databases, filesystems, caches, or any other resources that can reasonably be expected to exist in a development environment or on a CI build agent; I mean any third-party systems.

You can mock the database and caches, but doing so often has a terrible cost-to-value ratio.

UAT scenarios should be constructed from the perspective of the system’s users with minimal understanding of the implementation details. For example, if we had a login API, the scenarios might be:

  • When the database is running correctly, and the username and password are correct, we should receive success.
  • When the database is down, then we should receive an error.
  • When the database is running correctly, and the username or password is missing, we should receive an error.
  • When the database is running correctly, and the username or password is wrong, we should receive an error.

As you can see, the only implementation detail in these scenarios is the existence of the database. As written, our scenarios are somewhat generic; we could tighten them up to match our API contract and include aspects like expected response codes. Doing this would enforce and document our API contract more thoroughly. However, please don’t take this too far by enforcing every minute detail of the responses; doing so would make our tests brittle and more troublesome to maintain. How you strike this balance between scenario strictness and brittleness will vary with the project and with personal preference. Try to find the minimum you can get away with, and then become stricter when bugs or deficiencies are discovered or as the risk and cost of mistakes increase.

When considering the coverage for UAT tests, it is essential to note that we are not looking from a lines-of-code perspective but rather from a scenario or use-case perspective.

Moving on to the strengths and weaknesses of UATs: their main strength is that they confirm that the system does what the user expects.

We have constructed our UATs to be independent of external systems, which is both a strength and a weakness. It is a strength because our tests are completely independent of external resources and are, therefore, under our control and completely reliable. It is a weakness because we are testing against mocks and not the actual external dependencies, so there is a risk that our mocks and the external dependencies behave differently. We can address this weakness with End-to-End tests, as we will see in a moment.

However, the main weakness of UATs is the scope of the tests. Because the scope is broad, it can be time-consuming to locate the underlying cause when there are problems.

End-to-End Tests

End-to-End (E2E) tests are essentially UATs performed against all of the real external dependencies. These tests build on the behavior confirmed by the UATs and verify that the system still has the desired behaviors when all of the external dependencies are involved.

When constructing our E2E scenarios, we only need to look as far as our UAT scenarios. If we are time-constrained, we can reduce the scenarios to only those that involve these external dependencies.

The strength of E2E tests is also their weakness: they involve external dependencies. This means they confirm that our system performs as expected in a production-like environment, but because they rely on these external resources, they will only be as reliable as those resources and the test environment. Therefore, we may see test failures that are not caused by our code.

The Test Pyramid

Now that we have defined the different types of tests and examined their strengths and weaknesses, we can return to our original question: How much should we test? As an industry, we are always time-constrained, and as such, we need to spend our time as efficiently as possible.

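The Test Pyramid is a handy visual mnemonic, first introduced by Mike Cohn and often discussed by Martin Fowler and others, that reminds us where and how we should spend our testing effort. This is my version of the Test Pyramid:

[Figure: the Test Pyramid, with unit tests (70% of testing effort) at the base, UAT tests (28%) in the middle, and E2E tests (2%) at the top]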

From this diagram, you can see that I recommend spending most (70%) of our testing effort on unit testing. This is because adding behaviors to our units of code is where we spend most of our programming effort, so unit tests support our primary task. That these tests are also the easiest to understand and the cheapest to write and run is a pleasant additional benefit.

I recommend spending most of the remaining effort (28%) on UAT tests, which verify that the system has the behaviors our users expect. At the end of the day, adding user-focused behavior is what we are being paid for. Compared to unit tests, UATs tend to have a higher cost of construction, debugging, and maintenance. This is why, despite the fact that they are aligned with user value, we should spend less effort on UATs than on unit tests.

Allocating only the remaining 2% of our effort to E2E tests may be shocking, but these tests have very high construction and maintenance costs. Additionally, they have a poor signal-to-noise ratio, given that they will often fail for reasons unrelated to our code. In my experience, when a system has sufficient unit and UAT tests, the majority of issues not caught by these tests are configuration or intra-team issues, which E2E tests do not address well. Testing how our system responds to missing or inappropriate configuration should be done at the unit or UAT level; I prefer to test with unit tests and then have the rest of the codebase assume that the config is sane. This results in less code and a faster system overall. As for intra-team issues, I am afraid I don’t have an automated test for that.

When issues are caused by external systems not performing as expected, this is not a reason to add more E2E tests. Instead, we should add these failures as mocked responses in our UAT tests. This way, we ensure that our system can account for this unexpected behavior and respond predictably.

The primary thing to remember about the Test Pyramid is that it is a mnemonic. While I have given you concrete numbers of 70, 28, and 2, the actual percentage of effort depends on your application, your team, and the deployment environment. It is possible to have a successful application without any E2E tests; I have seen many successful applications like this. If you currently have no tests and want to start, start with unit tests. After you are comfortable with testing, adding some UAT tests will be significantly easier and further improve your confidence in your application.

You can check out the continuation of this topic in this follow-up post.

If you like this content and would like to be notified when there are new posts, or would like to be kept informed about the upcoming book launch, please join my Google Group (very low traffic and no spam).