First Steps in Python Testing
Programming is a craft, and in data science we often spend countless hours coding. There isn’t a magic shortcut to improving your programming skills. But, like any craft, improvement comes from practice: challenging yourself, exploring related skills, learning from others, and teaching.
Testing code using automated tools is common throughout the software development industry. This technique can improve the quality of the code you write as a data scientist. Testing helps refine your code, supports redesign, prevents errors, and makes it harder to write single-use code.
Here, we introduce the pytest framework and show how it can be used to test Python functions. If you don’t use a testing framework as part of your daily workflow, try experimenting with the techniques presented here the next time you write or extend a function.
About pytest
pytest is a software testing framework, it is a command-line tool that automatically finds tests you’ve written, runs the tests, and reports the results. In general, pytest is known for its simplicity, scalability, and powerful features such as fixture support and parameterization, it has a concise syntax and a rich plugin ecosystem compared to python standard libraries.
Getting started with pytest
Before we start writing tests, it’s important to set up a clean, isolated environment where we can install and manage packages. This is done using a virtual environment.
We first navigate to the project directory and then create a virtual environment for our project. Then we activate the virtual environment as in the second row of the code, and install pytest.
$ python3 -m venv venv
$ source venv/bin/activate
(venv) $ pip install pytest
We have everything set up to use pytest in our project. When we are done working in the virtual environment, we can deactivate it by simply running:
$ deactivate
Now that your environment is set up, let’s explore the basics of pytest.
What is a test?
A test is a small piece of code (usually a function) that checks whether another piece of code is working as expected. For example, imagine you wrote a function to calculate the mean of a list of numbers. A test would check if the function correctly computes the mean for different inputs.
Let’s create a simple function that calculates the mean of a list of numbers x
and save it in my_functions.py
:
# ./my_functions.py
def calculate_mean(x):
return sum(x) / len(x)
A very nice property of pytest is something called test discovery, a series of naming conventions
that tell pytest how to go and search for tests and execute them.
Any file that contains test functions should start with test_
and also the tests functions in this
file should be named in the same way.
Then, pytest will automatically search and find these functions and run them.
Now, let us write a test for this function using pytest.
Create a file named test_my_functions.py
:
# ./test_my_functions.py
from my_functions import calculate_mean
def test_calculate_mean():
x = [1, 2, 3, 4, 5]
result = calculate_mean(x)
expected = 3.0
assert result == expected
In this example, test_calculate_mean()
is a test function.
It checks if calculate_mean([1, 2, 3, 4, 5])
returns 3.0
.
When we run pytest, it will check if the assert
statement holds true.
$ pytest test_my_functions.py
============================= test session starts ==============================
test_my_functions.py . [100%]
============================== 1 passed in 0.01s ===============================
We can see that the test has successfully passed. In the output, the dot $(.)$ after
test_my_functions.py
indicates that the test has passed.
Now, let’s have a look at an example of a failing test.
Consider the following test function which is in the file test_failing.py
.
# ./test_failing.py
def test_addition():
result = 2 + 2
expected = 5
assert result == expected
We run pytest from the command-line and investigate the output.
$ pytest -v test_failing.py
============================= test session starts ==============================
test_failing.py::test_addition FAILED [100%]
=================================== FAILURES ===================================
________________________________ test_addition _________________________________
def test_addition():
result = 2 + 2
expected = 5
> assert result == expected
E assert 4 == 5
test_failing.py:4: AssertionError
=========================== short test summary info ============================
FAILED test_failing.py::test_addition - assert 4 == 5
============================== 1 failed in 0.03s ===============================
This time pytest provides us with a message giving information on the error and also highlights any
reasons that have caused the test to fail. The -v
or --verbose
command-line flag is used to
reveal more verbose output.
The assert statement
The assert
statement is used to verify that a given condition is True
.
If the condition is False
, the test fails.
In our first example the statement, assert result == expected
asserts that the result from calculate_mean(x)
should equal 3.0
. If the assert statement is not true, pytest reports a failure.
Pytest fixtures
Suppose you had written several functions that all work on some non-trivial dataset, and you want to write a test-function for each. In each test-function, you would have to create a dataset of the required form, pass it into the function-under-test, and then compare the output to some expected value. The code for creating a test-dataset may get duplicated between the different test-functions.
Fixtures in pytest are helper functions which are used to set up conditions that we want to be available for multiple tests. This might involve putting together some test data, or preparing some other state before a test runs (connecting to a database, creating a temporary file). Fixtures are run before (and sometimes after) the actual test functions. The @pytest.fixture decorator is used to tell pytest that a function is a fixture. Fixtures can perform actions (like setting up a database connection), and can inject data into a test function.
To illustrate let us consider a fixture that provides us with a list of numbers
in our test file test_my_functions.py
:
# ./test_my_functions.py
import pytest
from my_functions import calculate_mean
@pytest.fixture
def sample_numbers():
return [1, 2, 3, 4, 5]
def test_calculate_mean(sample_numbers):
result = calculate_mean(sample_numbers)
expected = 3.0
assert result == expected
By using @pytest.fixture, we have defined a sample_numbers
fixture that returns the list
[1, 2, 3, 4, 5]
.
This fixture can be used in any test function by adding it as an argument. Fixtures are especially
useful when you need to set up more complex objects that multiple tests will use.
The test output would be:
$ pytest -vv test_my_functions.py
============================= test session starts ==============================
test_my_functions.py::test_calculate_mean PASSED [100%]
============================== 1 passed in 0.00s ===============================
Parametrization
Parametrization is an important feature of pytest which allows us to run a test with multiple sets of parameters. This is helpful when we want to check the same logic under different conditions without writing separate test functions.
Here is how we can test calculate_mean
from the test_my_functions.py
file, by considering
multiple inputs using parametrization:
# ./test_my_functions.py
import pytest
from my_functions import calculate_mean
@pytest.mark.parametrize("numbers, expected", [
([1, 2, 3, 4, 5], 3.0),
([10, 20, 30], 20.0),
([7, 14, 21], 14.0),
([5, 5, 5, 5], 5.0),
])
def test_calculate_mean_parametrized(numbers, expected):
result = calculate_mean(numbers)
assert result == expected
In this example, @pytest.mark.parametrize allows us to test calculate_mean
with four different
lists.
Each tuple in the list passed to parametrize represents a different test case with its own numbers
and expected values.
Then to run the test we use:
$ pytest -v test_my_functions.py
========================================= test session starts =========================================
test_my_functions.py::test_calculate_mean_parametrized[numbers0-3.0] PASSED [ 25%]
test_my_functions.py::test_calculate_mean_parametrized[numbers1-20.0] PASSED [ 50%]
test_my_functions.py::test_calculate_mean_parametrized[numbers2-14.0] PASSED [ 75%]
test_my_functions.py::test_calculate_mean_parametrized[numbers3-5.0] PASSED [100%]
========================================== 4 passed in 0.01s ==========================================
The output is slightly different here because we are testing for different scenarios and the result is given for each of them.
Test organization
In the above, our test scripts (test_my_functions.py
and test_failing.py
) and python modules
(my_functions.py
) were all in the same directory. We used this approach for simplicity (as our
focus was on how to write and run tests). In a larger project you may have many test scripts and
python modules, and this approach will quickly become difficult to manage.
To keep your project organised, it’s a good practice to place all tests in a tests/
directory.
This way, when we run pytest we receive a summary of all the project’s tests. On making this change,
the file structure for the above example is:
./intro-to-python/
├── my_functions.py
├── tests/
│ ├── test_failing.py
│ └── test_my_functions.py
└── venv/
However, there is a small problem here. The my_functions.py
module must be imported by the
test_my_functions.py
test script. But if we call pytest tests/
from the project root,
my_functions.py
isn’t automatically included in the python search path (a collection of
directories from which packages and modules can be imported by the running python session) so it
can’t be imported by test_my_functions.py
.
A simple solution for this is to use the following command instead of pytest tests/
:
$ python -m pytest tests/
When we call python
directly, any python modules in the current directory are made available on
the python search path.
A more robust solution (and one we would recommend for larger projects) is to place your python modules in a package structure, though that is beyond the scope of this introduction to pytest.
Ready to start testing your code? Enjoy your journey into Python testing, and happy coding!