Stabilizing Continuous Delivery by Eliminating Flaky UI Tests
To meet market and economic demands, the software delivery process is constantly evolving. New technologies enable organizations to create software more quickly, which necessitates equal improvements in other areas of delivery such as testing to prevent unnecessary gaps in quality or velocity.
Top-performing software teams are taking a modern approach to this challenge by incorporating testing into short incremental development iterations rather than treating software quality as a separate phase of the delivery cycle. With shorter timelines, testing must be quick, adaptable, and highly reliable in both implementation and operation.
Organizations are increasingly incorporating automated testing earlier in their development process as a major element of their continuous delivery workflow, to reduce the downstream impact of defects and the higher cost of fixing them in later stages of delivery.
The challenge with flaky tests
Flaky automated tests cause chaos in the software delivery process. Tests that fail inconsistently make it hard to integrate changes into a build that stays green, which prevents software teams from deploying on demand. In this sense, test flakiness is a tangible roadblock between development and downstream deployment activities. There are a few common causes of test flakiness:
Test dependencies: Chaining tests together can look attractive: if one test fails, the tests that depend on it fail too, which seems to show you where to begin your investigation. However, if your tests are overly reliant on one another, a single failure cascades through the suite and your automation may become unstable.
Unstable third-party APIs: Reduced control over your test environment increases the risk of unpredictable test results. When your test suite depends on unreliable third-party APIs or on functionality managed by another team, flaky tests can occur. These tests may fail intermittently due to third-party system errors, unreliable network connectivity, or changes to the third party's API contract.
UI tests: UI tests validate visual logic, functionality, graphics, and so on. Because they run at the browser level, they can be flaky for many reasons, ranging from missing HTML elements and cookie changes to genuine system problems. If you picture your test suite as a pyramid, UI tests sit at the top. They should make up only a small portion of your test library, since they are usually the most unstable tests in any automated framework, expensive and hard to maintain, and slow to run.
Test order dependency: A test may fail because of another test that runs before or after it. This occurs when several tests share data, such as state variables, inputs, and dependencies. To increase reliability and decrease flakiness, eliminate or reduce the coupling between these tests. Use stubs and mocks whenever your test relies on another module. Stubs are objects that respond to calls with predetermined replies. Mocks are stand-ins that mimic the interface of the real object and also record how they were called, so the test can verify the interaction; they resemble the real implementation but do not reproduce it. Mocking and stubbing produce tests that run independently.
Using hardcoded test data: Automating your tests has numerous advantages, but hardcoded data can erode them. Hardcoded data is written directly into the code of the automated test instead of being supplied by an external source. This can lead to inconsistent outcomes, for example when a hardcoded record already exists, or has been changed, in the environment the test runs against.
Inconsistent assertion timing: When the state of your application varies between test runs, expect/assert statements fail at random. Write tests that wait for the application to reach a consistent state before asserting. This does not mean fixed "wait" or sleep statements. Instead, use predicates that poll the application state until it reaches a known good state from which you can assert.
Lack of communication: Changes to the code are not reflected in the test design. Workflow changes are not communicated among team members, and development and QA release activities do not follow the same schedule.
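As a rough illustration of the stub and mock advice under "Test order dependency", here is a minimal Python sketch using the standard library's unittest.mock. The function price_in_eur, the class StubRatesClient, and the get_rate method are hypothetical names invented for this example; they stand in for any code under test that relies on another module.

```python
from unittest.mock import Mock


def price_in_eur(amount_usd, rates_client):
    """Hypothetical code under test: converts USD to EUR via an injected client."""
    rate = rates_client.get_rate("USD", "EUR")
    return round(amount_usd * rate, 2)


class StubRatesClient:
    """Stub: always answers with the same predetermined reply."""

    def get_rate(self, base, quote):
        return 0.5


def test_with_stub():
    # The stub removes any dependency on a live rates service or shared state.
    assert price_in_eur(10, StubRatesClient()) == 5.0


def test_with_mock():
    # The mock returns a canned value and also records how it was called.
    mock_client = Mock()
    mock_client.get_rate.return_value = 0.5
    assert price_in_eur(10, mock_client) == 5.0
    mock_client.get_rate.assert_called_once_with("USD", "EUR")


# Each test builds its own double, so the two can run in either order.
test_with_stub()
test_with_mock()
```

Because neither test touches shared data, running them in a different order, or in isolation, should give the same result.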
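The polling predicate described under "Inconsistent assertion timing" could look something like the following Python sketch. The helper wait_until and the FakeApp class are assumptions made for illustration; many UI testing tools ship their own explicit-wait helpers, which would normally be preferred over a hand-rolled loop.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.05):
    """Poll predicate() until it returns a truthy value or the timeout expires.

    Returns True if the predicate succeeded in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeApp:
    """Hypothetical application that only becomes ready after a short delay."""

    def __init__(self, ready_after):
        self._ready_at = time.monotonic() + ready_after

    def is_ready(self):
        return time.monotonic() >= self._ready_at


app = FakeApp(ready_after=0.2)
# Assert only once the application has reached a known good state.
assert wait_until(app.is_ready, timeout=2.0), "app never reached a ready state"
```

Unlike a fixed sleep, the loop returns as soon as the condition holds and fails with a clear message when it never does.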
A feedback system that provides inconsistent or incorrect information is damaging, especially in rapid development cycles. Not only must you halt development, but you must also determine whether the failure was caused by the code under test or by the test itself. The more frequently a test fails incorrectly, the less trust there is in the validation system.
To maintain team velocity, development teams often either flag and disable flaky tests or simply revert to manual testing. Both of these anti-patterns increase production risk by disregarding early warning signs of larger quality problems. Furthermore, a rise in defects undermines developer productivity, requiring teams to devote more time to troubleshooting and less to planned work.
Empathy is the foundation of a stable environment
Developers find it easier to write code that meets real-world requirements when they understand how the code is tested and how closely the test environment resembles production. Similarly, no one likes it when their test fails because a previous script execution or a coworker left a resource in an improper state. Make sure that tests follow proper set-up and tear-down procedures that leave shared resources in a clean, known state.
Whenever possible, delete temporary files, roll back incomplete database transactions, and close network connections to maintain a stable testing environment. Because the shared tenancy of a test environment can also cause unanticipated issues in test cycles, many organizations are moving to automate the management of all environments through container, deployment, and artifact management technologies. Changes to network configuration and domain management can likewise disrupt test execution, so it is essential to actively manage these changes and buffer your development cycle from them until they can be fully addressed.
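The clean-up steps above (deleting temporary files, rolling back incomplete transactions, closing connections) can be sketched with Python's unittest set-up and tear-down hooks. The test class, the table, and the file name are hypothetical, and SQLite is used only because it ships with Python; the same pattern should apply to whatever resources your tests share.

```python
import os
import shutil
import sqlite3
import tempfile
import unittest


class OrderStorageTest(unittest.TestCase):
    """Hypothetical test case that owns and cleans up its own resources."""

    def setUp(self):
        # Give each test a private scratch directory and database file.
        self.workdir = tempfile.mkdtemp()
        self.conn = sqlite3.connect(os.path.join(self.workdir, "test.db"))
        self.conn.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)"
        )
        self.conn.commit()

    def tearDown(self):
        # Roll back anything left uncommitted, close the connection,
        # then delete the temporary files.
        self.conn.rollback()
        self.conn.close()
        shutil.rmtree(self.workdir, ignore_errors=True)

    def test_insert_is_visible_inside_transaction(self):
        self.conn.execute("INSERT INTO orders (item) VALUES ('book')")
        count = self.conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
        self.assertEqual(count, 1)
```

Saved in a file, this can be run with python -m unittest. tearDown runs whether the test passes or fails, so the next test, or the next coworker, starts from a clean state.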
Consider testing to be a component of development
To keep reducing test flakiness, coding and testing activities must be properly synchronized using specific practices. The smaller the gap between coding and testing activities, the more visible ownership of quality becomes in the work performed.
The symbiosis between testing and coding
To meet the complexity and speed requirements of application development, automation has driven testing to become code-oriented. Fortunately, you can improve the reliability of your tests by applying a handful of well-known coding practices. Treat your tests as first-class citizens of the codebase. This means letting your test assets inherit the benefits of primary project assets, such as versioning, traceability, and ease of delivery. Using a version control system like Git to manage your test scripts puts them on the same level as the rest of the app. Making code and tests equal citizens is a meaningful step toward building quality into development.
Close technical gaps between the source code for the app and the test code. Writing tests in the same language as the app and referencing the app's resource dictionaries in your test scripts promotes a healthy symbiosis between these assets. Failures at compile time are much easier to understand than failures during code check-in or later-stage regression cycles.
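One way to read the advice about sharing resource dictionaries is sketched below in Python. The MESSAGES dictionary, the login function, and the credentials are all hypothetical; in a real project the dictionary would live in the app's own module and the test would import it.

```python
# Hypothetical app code: user-facing strings live in one shared dictionary.
MESSAGES = {
    "login_failed": "Invalid username or password.",
}


def login(username, password):
    """Hypothetical login routine that returns a user-facing message."""
    if (username, password) != ("demo", "s3cret"):
        return MESSAGES["login_failed"]
    return "Welcome, " + username


def test_login_failure_message():
    # Reference the shared key instead of duplicating the literal text,
    # so a wording change in the app does not break the test.
    assert login("demo", "wrong") == MESSAGES["login_failed"]


test_login_failure_message()
```

In a compiled language, a renamed or removed resource key would typically surface as a compile-time failure. In Python, it would surface as a KeyError the first time the test runs, which is still earlier and clearer than a mismatch found in a late regression cycle.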