The What, When, Why, and How of Testing (Part 3)


(Authors Note: The following is an extract from the Advanced Unit Testing Techniques chapter of my upcoming book: Beyond Effective Go – Part 2 – Striving for High-Quality Code)

In the previous posts, we examined the Why, When, and How Much of testing. In this post, we will complete the thread with an examination of What we should be testing and What we should not be testing.

What should we be testing?

After choosing whether or not to test, choosing what to test is the most impactful decision. It might surprise you, but I am not talking about a choice between unit and user acceptance tests but rather the tests’ entry points. Consider the following struct:
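A minimal sketch of such a struct might look like the following. Process(), validate(), and chargeCustomer() are named in the surrounding discussion; the emailReceipt() method, the Order fields, and the stubbed method bodies are assumptions for illustration:

```go
package main

import "fmt"

type Order struct {
	CustomerName    string
	ShippingAddress string
}

type OrderManager struct{}

// Process is the single public method: the API through which the rest of
// the system interacts with this unit.
func (o *OrderManager) Process(order Order) error {
	if err := o.validate(order); err != nil {
		return err
	}
	if err := o.chargeCustomer(order); err != nil {
		return err
	}
	return o.emailReceipt(order)
}

// The three private methods below are implementation details.
// Their bodies are stubbed out here for brevity.
func (o *OrderManager) validate(order Order) error       { return nil }
func (o *OrderManager) chargeCustomer(order Order) error { return nil }
func (o *OrderManager) emailReceipt(order Order) error   { return nil }

func main() {
	om := &OrderManager{}
	fmt.Println(om.Process(Order{CustomerName: "Jane", ShippingAddress: "123 Sesame Street"})) // prints: <nil>
}
```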

You will notice that this struct has four methods: one public and three private. The public method forms the API, or entry point, through which other parts of the system interact with it; the three private methods are implementation details.

If our goals were test isolation and writing the smallest, simplest possible tests, we could test all four of these methods individually. With the exception of the tests for Process(), all of our tests would have a small scope and be easy to understand. This sounds great, right?

Consider what happens when we want to make changes. We may decide to offer multiple payment options. At a minimum, the tests for chargeCustomer() would need to be drastically changed, perhaps even discarded. However, if we had written all our tests against the Process() method, only a few (preferably none) of the existing tests would break as a result of adding this new feature. When tests break, there are two possible causes. The first is regression. Highlighting and preventing regression is one of the primary purposes of the tests.

The second occurs when the test implementation is too tightly coupled with the implementation of the code under test. This form of test breakage indicates an issue with the tests rather than the code. In this case, the problem is our choice of “unit” for our unit tests. The existence of chargeCustomer() is an implementation detail, not a behavior. Because it is an implementation detail, it has a high potential to change. Any tests relying on the current implementation are likely to break during refactoring and, therefore, introduce resistance to this change. As a result, we may become reluctant to refactor our code, leading to technical debt. If we perform the refactoring, we will incur additional work to maintain the tests.

To avoid these problems, we can take our unit as the object or objects that process an order and then test the unit from its entry point. With this approach, our tests will be more stable and less a reflection of the current implementation. In this example, the unit is the entire OrderManager struct, and the entry point is the Process() method. Additionally, as Process() is the API contract between this unit and our other units, it is naturally resistant to change. 
If you are familiar with the TDD or BDD ideologies, then you will notice some parallels here; we are attempting to verify the behavior of our code without any coupling to the implementation details. This focus on behavior and avoidance of implementation details will ensure that our tests are resilient to refactoring and, therefore, low cost. This is best demonstrated with some examples. Let’s assume the validate() method of our OrderManager looks like this:

func (o *OrderManager) validate(order Order) error {
	if order.CustomerName == "" {
		return errors.New("customer name cannot be empty")
	}

	if order.ShippingAddress == "" {
		return errors.New("shipping address cannot be empty")
	}

	return nil
}

Now, consider the following test scenario for name validation:

func TestOrderManager_Process_sadPath_nameValidation(t *testing.T) {
	// inputs
	order := Order{
		CustomerName: "",
	}
	expectedErr := errors.New("customer name cannot be empty")

	// call object under test
	orderManager := &OrderManager{}
	resultErr := orderManager.Process(order)

	// validation
	require.Equal(t, expectedErr, resultErr, "expected error")
}

At first glance, this looks fine, but there are two issues. Firstly, this test only passes due to its intimate knowledge of the implementation details. If we were to change the validation order, this test would break and need to be changed. The second issue is that the test validates the exact error returned. This is a prevalent mistake; if the error message changes for any reason, this test will break.

This is a simple example, but hopefully, you can see that these tests will likely break when we refactor our OrderManager implementation. Tests suffering from this problem are brittle: they break easily and require continued attention. Maintaining brittle tests upsets the cost/benefit ratio of our testing and refactoring efforts, which, in turn, discourages us from testing and refactoring. Fortunately, the fix for these issues is simple; compare the following code with the previous implementation:

func TestOrderManager_Process_sadPath_nameValidationImproved(t *testing.T) {
	// inputs
	order := Order{
		CustomerName:    "",
		ShippingAddress: "123 Sesame Street",
	}
	expectedErr := true

	// call object under test
	orderManager := &OrderManager{}
	resultErr := orderManager.Process(order)

	// validation
	require.Equal(t, expectedErr, resultErr != nil, "expected error. err: %s", resultErr)
}

You can see that our order input is now completely valid, except for the condition we are testing, and instead of checking for a specific error, we are checking only that an error was returned. Additionally, our test now focuses solely on the behavior described as: “When the customer name is missing from the order, we should return an error.”

This same approach can and should be applied to the configuration of mocks. The following are some tips relating to the use of mocks:

  1. Configure mocks even when we know they will not be called by the current implementation; the goal is to be resilient to refactoring.
  2. We should only verify that a mock was called or not called when the call itself is significant. For example, suppose our OrderManager had a mock for charging the customer. In that case, we might want to verify that it was not called when validation failed to ensure we were not charging customers inappropriately.
  3. In most cases, we should only verify that the mock was called and not check every individual parameter sent to it. Again, the yardstick is the significance of the field and the likelihood of mistakes. Even when we do verify a parameter’s value, we should avoid doing so in every test scenario, as this adds construction and maintenance costs to the tests. For example, if our OrderManager had a mock for emailing the receipt, we might decide to verify that the email address was passed. However, we should avoid validating the email body, as this is likely to change, and failing to pass it is an unlikely mistake.
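Tip 2 can be sketched with a hand-rolled spy, used here instead of a mocking library to keep the example self-contained. The charger interface and the dependency field on OrderManager are assumptions for illustration:

```go
package main

import (
	"errors"
	"fmt"
)

type Order struct {
	CustomerName    string
	ShippingAddress string
}

// charger abstracts the payment dependency (an assumed interface for illustration).
type charger interface {
	Charge(order Order) error
}

// OrderManager receives the charger as a dependency so tests can substitute a spy.
type OrderManager struct {
	charger charger
}

func (o *OrderManager) Process(order Order) error {
	if order.CustomerName == "" {
		return errors.New("customer name cannot be empty")
	}
	return o.charger.Charge(order)
}

// spyCharger is a hand-rolled mock that records whether it was called.
type spyCharger struct {
	called bool
}

func (s *spyCharger) Charge(order Order) error {
	s.called = true
	return nil
}

func main() {
	spy := &spyCharger{}
	om := &OrderManager{charger: spy}

	err := om.Process(Order{}) // invalid: customer name missing

	// Charging a customer after failed validation would be a significant bug,
	// so this is a call worth verifying did NOT happen.
	fmt.Println(err != nil, spy.called) // prints: true false
}
```

Note that the spy verifies only the significant fact (Charge was not called); it does not inspect every parameter.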

The line between behavior and implementation detail is tough to determine, but I offer three additional tips to address this:

  1. It does get easier with experience.
  2. When in doubt, remember that our goal is to write tests that confirm behavior but are reasonably resilient to refactoring.
  3. It is more efficient to default to loosely coupled tests and only add specificity when necessary, with “necessary” determined by experience, business requirements, or bug reports.

When we focus on testing behavior and construct our tests to be as ignorant of the implementation as possible, the result will be cheap, resilient, and valuable tests.

What should we not be testing?

After reading the previous section, you may have thought, “If we are testing behaviors, then why don’t we skip unit tests and only write UATs?”. This is not an irrational thought. However, as we have discussed, the weakness of UATs is their broad scope, which makes for larger, more complicated tests and slower debugging. In addition, UATs confirm the behavior of the system as a whole and not of a particular unit (struct or module). Without unit tests to verify the behaviors of our units, code reuse would be risky.

Given that I have spent most of this series of posts so far trying to convince you to test and discussing the best options for writing your tests, it might seem strange to assert that there are things we should not be writing tests for. But here goes. The following is a list of aspects of our system we should not test or not test directly:

  1. Non-API methods and functions – Any methods or functions not directly called by other parts of the code should not be tested directly. They are implementation details and likely to change. We will test this code indirectly via the tests of our unit’s entry points. There is no need to be too dogmatic about this point: if you have a particularly complex function that you want to test in isolation, applying tests directly to it can be efficient.
  2. Anything too simple for us to get wrong – Writing a test for code that we cannot possibly have got wrong is just a waste of effort. Once in a while, we might have to write a test to appease the company’s unit-test coverage requirements, but if we can, we should save time.
  3. Other people’s code – Similarly to the previous point, writing and maintaining tests for code we do not own is a waste of time. We should trust the libraries and modules provided by others. If there is ever a case where we don’t trust the code and need to test it, it is most effective to open a pull request contributing tests to the external project. There is an exception to this point: when dealing with unknown or undocumented code, writing tests to explore and confirm its behavior can be helpful. That said, these are tests we want to either not check into our project or skip running most of the time.
  4. Behavior covered by other unit tests – When we consider different parts of the code as different units, each unit should be tested independently. This is often where mocks and stubs come in. We should trust that the other unit tests are sufficient and not waste effort by testing this more than once.
  5. Generated code – Extending from the previous point, when we generate code, we should consider the result “other people’s code”. It is the responsibility of the code generation tool’s owner to ensure that the generated code works as expected, not ours. Yes, it is possible to also generate tests for generated code, but most of the time, this wastes time and effort.

Hopefully, throughout this series of posts, you have seen the underlying threads of productivity and value. The what, why, when, and how of testing presented here is all about engineering a situation where we spend the minimum possible effort writing and maintaining tests but still extract maximum value from this effort.


If you have enjoyed this content and would like more, please pick up a copy of Beyond Effective Go – Part 1 – Achieving High-Performance Code, and don’t forget to also check out the book’s companion source code.

Also, if you would like to be notified when there are new posts or to be kept informed regarding the upcoming book launch please join my Google Group (very low traffic and no spam).