Software Maintenance
ISIA
2h

Testing and Test-Driven Development

Testing is the primary safety net for software under change. In maintenance contexts — where you modify existing code without always understanding it fully — automated tests are what allow you to move with confidence.

Motivation: Errors, Defects, and Failures

Error: An inappropriate or erroneous decision made by a developer that introduces a defect. Errors are human mistakes: a misunderstood requirement, a logic slip, an off-by-one in reasoning. Errors live in the developer’s mind; they cannot be directly detected by any tool.

Defect: An imperfection in the system that may contribute to one or more failures. A defect is the artifact left in the code by an error. Note that sometimes several defects must combine to trigger a failure — a single defect may be dormant for years.

Failure: An unacceptable behaviour observed during execution. The frequency of failures reflects system reliability. A failure is what users and operators actually experience.

Understanding the chain Error → Defect → Failure is key: testing cannot find errors (human intent), but it can detect defects by triggering failures.
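To make the chain concrete, here is a hedged Java sketch (all names are hypothetical): a developer's error in reading an inclusive bound leaves an off-by-one defect in the code, and the defect stays dormant until a particular execution turns it into an observable failure.

```java
// Hypothetical example: the spec asks for the sum 1..n inclusive. The
// developer's error (misreading the bound) leaves a defect: '<' instead of '<='.
public class SumUpTo {
    public static int sum(int n) {
        int total = 0;
        for (int i = 1; i < n; i++) { // defect: should be i <= n
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        // The defect is dormant for n <= 0: the loop never runs and 0 is correct.
        System.out.println(sum(0)); // 0 -- correct, no failure observed
        // This execution exposes the failure: the spec expects 15.
        System.out.println(sum(5)); // 10 -- the observable failure
    }
}
```

A test can only observe the failure (10 instead of 15); from there a developer traces back to the defect, and only then to the error behind it.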

What is Testing?

The IEEE-STD 729 standard (1983) defines testing as follows:

Testing is a manual or automated process that aims to check that a system satisfies properties requested by its specifications, or to detect differences between results produced by the system and those expected by the specifications. — IEEE-STD 729, 1983

Testing is fundamentally about two complementary activities: verification — does the system do what it should? — and detection — does it behave differently from what was expected? These two goals drive every testing strategy, from the simplest unit test to the most elaborate acceptance campaign.

What Are We Testing?

Properties a system may need to satisfy:

  • Functionality: does it do what it is supposed to do?
  • Security and integrity: is data safe from corruption or unauthorized access?
  • Usability: can users interact with it effectively?
  • Robustness: does it handle unexpected inputs gracefully?
  • Maintainability: is the code structured well enough to be modified safely?
  • Efficiency: does it use resources (CPU, memory, network) appropriately?
  • Coherence: are internal data states always consistent?

Static vs. Dynamic Testing

There are two broad approaches to testing, and a rigorous process uses both.

Static testing examines the code without executing it:

  • Code reviews and inspections
  • Automated rule checkers (style, security patterns)
  • Formal analysis tools
  • Advantage: catches issues before the program runs, at no execution cost

Dynamic testing runs the program with specific inputs and observes outputs:

  • Requires executable code
  • Can detect runtime failures that static analysis misses
  • The focus of most automated test frameworks (JUnit, pytest, etc.)

Neither approach is sufficient alone. Static testing cannot observe runtime behaviour; dynamic testing cannot guarantee coverage of all possible paths.
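The split can be illustrated with a small Java sketch (hypothetical names): one defect is visible to a static rule checker without ever running the code, while the other is well-typed and rule-clean, and only dynamic testing with the right inputs reveals it.

```java
public class StaticVsDynamic {

    // A static analyzer can flag this without executing it: 'greeting' is
    // dereferenced on a path where it is provably null.
    public static int greetingLength(boolean polite) {
        String greeting = polite ? "hello" : null;
        return greeting.length(); // static warning: possible NullPointerException
    }

    // No generic static rule flags this: the code compiles cleanly and is
    // null-safe, but the average uses integer division, a defect that only a
    // dynamic test with an odd sum reveals.
    public static double average(int a, int b) {
        return (a + b) / 2; // defect: integer division truncates (1, 2) to 1.0
    }
}
```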

Black Box vs. White Box Testing

Black box (functional) testing is based on the specification, not the implementation. The tester knows what the system should do, but not how it does it. Inputs and expected outputs are derived from requirements documents, user stories, or contracts. This approach is well-suited to acceptance tests and system-level tests, and can be conducted without access to the source code.

White box (structural) testing is based on the internal structure of the program. The tester has access to the source code and designs tests to exercise specific code paths, branches, and conditions. The goal is to ensure that every meaningful path through the logic is executed at least once. White box testing is the basis of code coverage analysis and is used heavily in unit testing.
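The contrast can be sketched on a single hypothetical function. A black-box tester picks inputs from the specification alone; a white-box tester reads the code and aims one input at each branch, including the boundary values where branches meet.

```java
// Hypothetical function under test: classify a temperature reading.
public class Thermostat {
    public static String classify(int celsius) {
        if (celsius < 0) {
            return "freezing";
        } else if (celsius < 25) {
            return "comfortable";
        }
        return "hot";
    }

    public static void main(String[] args) {
        // Black-box view: input chosen from the spec ("below zero is freezing"),
        // with no knowledge of the branch structure.
        System.out.println(classify(-10)); // freezing

        // White-box view: one input per branch, chosen by reading the code,
        // plus the boundary values 0 and 25 that sit on the branch edges.
        System.out.println(classify(0));   // comfortable (first boundary)
        System.out.println(classify(24));  // comfortable
        System.out.println(classify(25));  // hot (second boundary)
    }
}
```

Boundary values like 0 and 25 are where off-by-one defects hide, which is why white-box test design targets them explicitly.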

Test Hierarchy

Different test levels validate different stages of development. In practice, test execution flows bottom-up: you validate units first, then their integration, then the full system.

Level             | What is Validated                | Corresponds to
------------------|----------------------------------|--------------------------
Unit tests        | Individual classes and methods   | Detailed design
Integration tests | Interactions between modules     | Global design
System tests      | The full system as a whole       | Technical specifications
Acceptance tests  | User requirements and use cases  | Requirement definition

Types of Testing

Unit Testing validates individual methods or classes in isolation. Unit tests are white-box, typically written by the same developer who wrote the code. They are the most granular level and the fastest to run, making them the foundation of any continuous integration pipeline.

Integration Testing validates the interactions between modules. Finding the right testing order matters — if dependencies form a tree, test from leaves up to the root. Cycles in dependencies require stubs or mocks to break them artificially. Integration defects are often invisible to unit tests because they live in the interfaces, not the internals.
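A stub can be sketched in a few lines of Java (all names here are hypothetical): the module under test depends on a component that is not yet testable, or that sits in a dependency cycle, so a fixed-behaviour stand-in takes its place.

```java
// The dependency is expressed as an interface so a stub can replace it.
interface TimeSource {
    String today();
}

// Module under test: depends on TimeSource, not on any concrete clock.
class ReportService {
    private final TimeSource time;

    ReportService(TimeSource time) {
        this.time = time;
    }

    String header() {
        return "Report generated on " + time.today();
    }
}

public class StubDemo {
    public static void main(String[] args) {
        // The stub returns a fixed, predictable value, so ReportService can be
        // exercised before the real time source is integrated.
        TimeSource stub = () -> "2024-01-01";
        ReportService service = new ReportService(stub);
        System.out.println(service.header()); // Report generated on 2024-01-01
    }
}
```

The same shape serves two purposes: ordering integration (test ReportService before its real dependency exists) and breaking cycles (each side of a cycle is tested against a stub of the other).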

System Testing validates the full system end-to-end, including GUI, performance, and security. It is typically black-box — testers exercise the system as a user would, without knowledge of internal structure.

Non-Regression Testing ensures that after any change — a bug fix, a refactoring, a new feature — previously working behaviour still works. This is especially critical in maintenance: every change risks breaking something else. A comprehensive non-regression suite is the single most important tool for safe maintenance.

Stress / Load / Performance Testing answers the question: how many users, transactions, or events can the system handle before degrading? These tests expose scalability limits and are essential before major deployments or architectural changes.

Acceptance Testing validates that the system does what the customer actually wants. It is conducted by end-users, not developers, and it is the final gate before release.

Test-Driven Development

The TDD Cycle

TDD is built around one short loop, repeated continuously:

  1. Red — Write a failing test. The test must fail because the feature does not exist yet.
  2. Green — Write the minimum code necessary to make the test pass. No more.
  3. Refactor — Clean up both production code and test code without changing behaviour.
TDD is about design, not testing

“TDD is the craft of producing automated tests for production code, and using that process to drive design and programming. For every tiny bit of functionality in the production code, you first develop a test that specifies and validates what the code will do.”

Automated tests are a valuable side-effect — not the primary goal.

Advantages of TDD

  • Writing the test first means the program is used (called) before it exists — this forces good API design
  • Keeps design decisions small and reversible — you only build what the test demands
  • Builds a non-regression suite automatically as a by-product
  • Increases confidence when refactoring: if tests still pass, behaviour is preserved
  • Provides a measurable velocity indicator: passing tests = done features

What TDD Is NOT

Common misconceptions:

  • Not “write all tests first, then build the system” — tests and code must alternate, one increment at a time
  • Not “do automated testing” — automated tests can exist without TDD; TDD is a design discipline
  • Not a process (like Scrum or Waterfall) — it is a practice (like pair programming or code reviews)
  • Not about writing lots of tests — it is about writing the right test at the right time

TDD Example: PasswordValidator (Java + JUnit 5)

Requirements: Passwords must be 6–10 characters long, contain at least one digit, and contain at least one uppercase letter.

Step 1 — Write the Failing Test (Red)

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;

class PasswordValidatorTest {

    @Test
    void validPasswordShouldPass() {
        assertTrue(PasswordValidator.isValid("Abc123"));
    }
}

This does not compile yet — PasswordValidator does not exist. That is intentional: the test defines the interface before the implementation.

Step 2 — Write Minimal Code (Green)

public class PasswordValidator {
    public static boolean isValid(String password) {
        return true; // stub — just enough to compile and make the test pass
    }
}

The test now passes. But one passing test is not a specification.

Step 3 — Add More Tests, Trigger Red Again

Add these tests inside the existing PasswordValidatorTest class:

@Test
void tooShortPasswordShouldFail() {
    assertFalse(PasswordValidator.isValid("Ab1"));
}

@Test
void tooLongPasswordShouldFail() {
    assertFalse(PasswordValidator.isValid("Abc123456789"));
}

@Test
void noDigitShouldFail() {
    assertFalse(PasswordValidator.isValid("Abcdef"));
}

@Test
void noUppercaseShouldFail() {
    assertFalse(PasswordValidator.isValid("abc123"));
}

The stub’s return true now fails all four of these. Back to Red.

Step 4 — Implement Properly (Green)

import java.util.regex.Pattern;

public class PasswordValidator {

    private static final int MIN_LENGTH = 6;
    private static final int MAX_LENGTH = 10;

    private static boolean isValidLength(String password) {
        return password.length() >= MIN_LENGTH && password.length() <= MAX_LENGTH;
    }

    private static final Pattern DIGIT_PATTERN    = Pattern.compile(".*\\p{Digit}.*");
    private static final Pattern UPPERCASE_PATTERN = Pattern.compile(".*\\p{Upper}.*");

    private static boolean containsDigit(String password) {
        return DIGIT_PATTERN.matcher(password).matches();
    }

    private static boolean containsUppercase(String password) {
        return UPPERCASE_PATTERN.matcher(password).matches();
    }

    public static boolean isValid(String password) {
        return isValidLength(password)
            && containsDigit(password)
            && containsUppercase(password);
    }
}

All tests pass. Note the naming: each private method expresses a single rule, making isValid read like a specification.

Step 5 — Refactor

The magic numbers 6 and 10 are now named constants. The method chain in isValid is readable. Could we simplify further? Yes — but YAGNI applies: refactor only what improves clarity, not speculatively.
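Test code deserves refactoring too. The four rejection tests share one shape; JUnit 5 expresses this with @ParameterizedTest and @ValueSource, and the underlying idea can be sketched in plain, self-contained Java as a table-driven loop (the compact isValid below is an inlined copy of the validator, included only to keep the sketch runnable):

```java
import java.util.List;

public class PasswordTableTest {

    // Compact copy of the validator, inlined so this sketch is self-contained.
    static boolean isValid(String p) {
        return p.length() >= 6 && p.length() <= 10
            && p.chars().anyMatch(Character::isDigit)
            && p.chars().anyMatch(Character::isUpperCase);
    }

    public static void main(String[] args) {
        // One table entry per rejection rule, replacing four near-identical tests:
        // too short, too long, no digit, no uppercase.
        List<String> invalid = List.of("Ab1", "Abc123456789", "Abcdef", "abc123");
        for (String password : invalid) {
            if (isValid(password)) {
                throw new AssertionError("should have been rejected: " + password);
            }
        }
        System.out.println("all invalid passwords rejected");
    }
}
```

In JUnit 5 the loop disappears: annotate the test with @ParameterizedTest and @ValueSource(strings = {...}), take the password as a parameter, and the framework runs one test per entry with individual reporting.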

Step 6 — Integration Tests

Unit tests validate PasswordValidator in isolation. But in a real application, password validation is used by a UserRegistrationService — and that service interacts with a user repository, sends confirmation emails, and checks for duplicate accounts. An integration test validates those component interactions together:

The following example assumes a UserRegistrationService backed by a real database and a RegistrationResult value object — the exact implementation is intentionally left abstract; the test structure is what matters.

class UserRegistrationServiceTest {

    private final UserRegistrationService service = new UserRegistrationService();

    @Test
    void registeringWithValidPasswordSucceeds() {
        RegistrationResult result = service.register("alice@example.com", "Secure1x");
        assertEquals(RegistrationStatus.SUCCESS, result.status());
    }

    @Test
    void registeringWithWeakPasswordFails() {
        RegistrationResult result = service.register("alice@example.com", "weak");
        assertEquals(RegistrationStatus.INVALID_PASSWORD, result.status());
        assertEquals("Password does not meet requirements", result.message());
    }

    @Test
    void registeringWithDuplicateEmailFails() {
        service.register("alice@example.com", "Secure1x");
        RegistrationResult result = service.register("alice@example.com", "Secure2y");
        assertEquals(RegistrationStatus.EMAIL_ALREADY_EXISTS, result.status());
    }
}

Key distinctions from unit tests:

  • Integration tests exercise real component interactions — not stubs
  • They may involve a database, file system, or HTTP layer
  • They are slower and more brittle than unit tests, but catch integration bugs that unit tests cannot
  • In TDD, they are written at a higher level and serve as the acceptance criteria for a feature
The Test Pyramid

A healthy test suite follows the pyramid model: many unit tests (fast, cheap, isolated), fewer integration tests, and even fewer end-to-end tests. Inverting this — many slow integration tests, few unit tests — leads to slow feedback and fragile test suites.

JUnit 5 Reference

Annotations

Annotation          | Description
--------------------|----------------------------------------------------------
@Test               | Marks a method as a test case
@BeforeEach         | Runs before each test method
@AfterEach          | Runs after each test method
@BeforeAll          | Runs once before all tests in the class (must be static)
@AfterAll           | Runs once after all tests in the class (must be static)
@Disabled           | Skips the test (with optional reason)
@DisplayName("...") | Sets a human-readable test name
@ParameterizedTest  | Runs the test with multiple sets of arguments

Assertions

Method                                | Purpose
--------------------------------------|--------------------------------------------------
assertEquals(expected, actual)        | Checks equality
assertNotEquals(a, b)                 | Checks inequality
assertTrue(condition)                 | Checks the condition is true
assertFalse(condition)                | Checks the condition is false
assertNull(object)                    | Checks the reference is null
assertNotNull(object)                 | Checks the reference is not null
assertThrows(ExType.class, () -> ...) | Expects an exception of the given type
assertAll(executables...)             | Groups assertions — all are checked even if one fails
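To see what assertThrows does, the pattern behind it can be sketched in plain Java (expectThrows is our own hypothetical helper, not the JUnit implementation): run the code, catch the expected exception type, and fail both when nothing is thrown and when the wrong thing is thrown.

```java
public class ThrowsCheck {

    // Sketch of the assertThrows pattern: returns the caught exception so the
    // caller can make further assertions on it (message, cause, ...).
    static <T extends Throwable> T expectThrows(Class<T> type, Runnable action) {
        try {
            action.run();
        } catch (Throwable t) {
            if (type.isInstance(t)) {
                return type.cast(t); // the expected exception was thrown
            }
            throw new AssertionError("wrong exception type: " + t.getClass(), t);
        }
        throw new AssertionError("expected " + type.getSimpleName()
                + " but nothing was thrown");
    }

    public static void main(String[] args) {
        ArithmeticException ex =
            expectThrows(ArithmeticException.class, () -> { int x = 1 / 0; });
        System.out.println(ex.getMessage());
    }
}
```

Note the two distinct failure modes: a missing exception and a wrong-typed exception are reported with different messages, exactly as JUnit does.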