The Democratisation of (Test) Data
A glance at industry research from recent years shows that test data remains one of the major bottlenecks to fix in DevOps and CI/CD:
Today, more than 50% of organisations are using full-size copies of production data in database development and testing [1], while 54% of test teams still depend on production data copies [2]. This use of low-variety production data undermines test coverage, along with the software quality that depends on it.
Too often, test data practices overlook questions of test coverage. Yet, achieving the right coverage is paramount to successful testing. This is because test coverage focuses on mitigating the risk of costly bugs, by testing the system’s logic as rigorously as needed in-sprint.
Poor test coverage, by contrast, increases the risk of defects getting past testing and into production. This in turn increases the time and cost to fix the bugs, as they are detected too late in the software delivery lifecycle.
This blog will explore common causes of low test data coverage, before offering 5 techniques for overcoming these issues. These techniques have been chosen to help you consider a new and transformative approach to test data.
This blog is part 3/4 in a series focusing on test data modernization.
Testers and developers manage test data using a range of techniques, including generation, masking and subsetting. However, many legacy TDM practices persist across the industry. These hinder test coverage. Four such practices are summarised below:
Copying raw or masked production data is simply not good enough for rigorous testing. This is because production data rarely covers negative scenarios, edge cases, or data to test new functionality. By contrast, rigorous testing requires a spectrum of data combinations with which to execute each test:
Low-variety production data copies rarely contain the data combinations needed for rigorous testing.
Manually copying complex data across environments and systems is slow and error-prone, often breaking relationships in the data. Furthermore, databases are likely to change during refreshes, which causes data sets to become unaligned.
Testing with out-of-date and misaligned data in turn undermines test coverage and causes time-consuming test failures. In fact, 61% of respondents in the latest World Quality Report cite “maintaining test data consistency across different systems under test” as a test data challenge [2].
Subsetting test data is valuable for lowering storage costs, data provisioning time, and the time required to execute tests. However, simplistic subsetting techniques can damage both the relationships and coverage of data.
For instance, simply taking the first 1000 rows of each table will not respect the relationships between data that exists across tables. Nor will it typically provide the data needed to execute every test in a suite.
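The difference can be sketched in a few lines. This is a minimal illustration using two hypothetical in-memory "tables" (lists of dicts) linked by a `customer_id` foreign key; the table and column names are assumptions, not a real schema:

```python
# Two hypothetical tables linked by a customer_id foreign key.
customers = [{"customer_id": i, "country": "UK" if i % 2 else "DE"} for i in range(1, 6)]
orders = [{"order_id": n, "customer_id": (n % 5) + 1} for n in range(1, 21)]

def naive_subset(rows, n):
    """First-n-rows subsetting: ignores relationships entirely."""
    return rows[:n]

def relational_subset(orders, customers, n):
    """Take n orders, then pull every parent customer they reference,
    keeping the foreign-key relationship intact."""
    picked = orders[:n]
    needed = {o["customer_id"] for o in picked}
    parents = [c for c in customers if c["customer_id"] in needed]
    return picked, parents

picked, parents = relational_subset(orders, customers, 4)
# Every order in the subset still has its parent customer row:
assert all(any(c["customer_id"] == o["customer_id"] for c in parents)
           for o in picked)
```

A real subsetting tool walks the full foreign-key graph across many tables, but the principle is the same: a subset is only useful if the rows it contains remain consistent with one another.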
To boost test coverage, testers are often required to manually create the complex data needed to fulfil their test cases. However, manual data creation is slow and error-prone, often producing inconsistent or incorrect data that causes avoidable test failures.
These outdated TDM practices hold both testers and test coverage back. They call for new, structured and efficient techniques for test data generation, maintenance, and management.
Five different techniques for boosting test data coverage are set out below. You can see some of these techniques live in our webinar with Windocks, Turn Your Production Systems into Test-Ready Data!
Synthetic test data is artificially created data used to develop and test applications, and is typically key to enhancing overall test coverage. A modern synthetic test data generation solution can create missing combinations of test data on demand. This means testers no longer need to create data manually, nor rely on potentially sensitive and incomplete production data.
Testers can use synthetic test data to fill the gaps in data not found in existing production data, including negative scenarios and edge cases needed for rigorous testing. Synthetic data can be created algorithmically, using coverage analysis to find and fill gaps.
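The coverage-driven approach can be sketched simply: enumerate the combinations a rigorous test suite requires, subtract the combinations already present in production data, and synthesise only the gaps. The field names and values below are hypothetical:

```python
import itertools

# Hypothetical test-relevant fields and their possible values.
account_types = ["current", "savings"]
statuses = ["active", "frozen", "closed"]

# A low-variety "production" sample rarely covers every combination:
production = [("current", "active"), ("savings", "active")]

# Full coverage is the cartesian product of the fields' values:
required = set(itertools.product(account_types, statuses))
missing = required - set(production)

# Generate synthetic rows only for the combinations production lacks:
synthetic = [{"account_type": a, "status": s} for a, s in sorted(missing)]
```

Here production data covers 2 of the 6 required combinations; the remaining 4, including the "frozen" and "closed" negative scenarios, are created synthetically.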
Though synthetic data creation is a powerful tool for driving higher test coverage, the latest World Quality Report found that only around half of test teams create and maintain synthetic data for testing [2].
Data analysis and comparisons give test teams the ability to measure coverage and compare it across different environments, identifying gaps in data density and variety before filling them with synthetic test data generation.
Automated data analysis has compared data across two environments, identifying missing values in each.
Using data coverage analysis tools can automatically identify gaps in existing test data, ensuring that test data can fulfil every scenario needed for rigorous test coverage. This might be performed, for example, by linking test cases to data and performing data lookups based on those tests.
Automated analysis today can therefore help identify the missing data needed to produce complete test data, before using data generation to improve test coverage.
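At its simplest, such a comparison reduces to set differences over the distinct values found in each environment. The sketch below assumes two hypothetical environments summarised as column-to-distinct-values mappings:

```python
# Hypothetical summaries: distinct values per column in two environments.
env_a = {"payment_method": {"card", "cash", "voucher"}}
env_b = {"payment_method": {"card", "transfer"}}

def value_gaps(a, b):
    """For each shared column, report values present in one environment
    but missing from the other."""
    gaps = {}
    for col in a.keys() & b.keys():
        gaps[col] = {
            "missing_in_b": a[col] - b[col],
            "missing_in_a": b[col] - a[col],
        }
    return gaps

gaps = value_gaps(env_a, env_b)
```

The output of a comparison like this becomes the input to generation: each "missing" value identifies a synthetic row that should be created before the test suite runs in that environment.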
With on-the-fly test data "find and makes", parallel teams and frameworks can create data automatically as tests run. Finds look for data based on the test case's requirements, while makes use integrated test data generation. This creates the missing combinations needed in testing, improving overall test coverage.
Integrating the automated find and makes with test automation frameworks and CI/CD pipelines lets tests self-provision the data they need on-the-fly, rapidly running the rigorous and targeted tests needed for optimal in-sprint coverage.
Techniques used today for finding data can be standardised and automated, rapidly building a catalogue of reusable data “finds”. Manual or automated tests can then parameterise and reuse these automated finds whenever they need data, with integrated data generation to create missing combinations on-the-fly:
On-the-fly “find and makes” ensure that every tester, developer and automated test comes equipped with the data they need.
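The find-and-make pattern can be sketched as a single lookup-or-generate function. Everything here, from the field names to the id scheme, is a hypothetical illustration of the pattern rather than a real framework API:

```python
import itertools

_ids = itertools.count(1000)  # hypothetical unique-id source

# A hypothetical pool of existing test data.
test_data = [{"id": 1, "country": "UK", "status": "active"}]

def find_or_make(criteria):
    """Return an existing row matching the test's criteria ("find"),
    or synthesise one on-the-fly ("make")."""
    for row in test_data:
        if all(row.get(k) == v for k, v in criteria.items()):
            return row
    made = {"id": next(_ids), **criteria}  # integrated data generation
    test_data.append(made)                 # reusable by later tests
    return made

# No matching row exists, so one is made as the test runs:
row = find_or_make({"country": "DE", "status": "frozen"})
```

Parameterised like this, the same "find" can be catalogued and reused by manual testers, automation frameworks and CI/CD pipelines alike, with generation quietly covering any gaps.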
You can watch an overview of automated test data “find and makes” in this recent video from Curiosity’s Managing Director, Huw Price:
Data cloning is another technique for boosting test coverage.
Data combination cloning creates multiple sets of a given combination, assigning unique identifiers to each clone. It duplicates data with the same characteristics, allowing parallel testers and tests to work without using up or editing one another’s data.
Data cloning ensures that all your tests can run in parallel and without failures, as it multiplies the data needed for test scenarios that require the same or similar data combinations. Cloning is particularly useful for automated testing that burns rapidly through data, as it ensures that new data is always readily available. This boosts in-sprint test coverage, as every test in a suite runs with the data it needs.
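The cloning step itself is conceptually simple: duplicate a combination once per parallel consumer, giving every copy a fresh identifier. The sketch below uses hypothetical field names and a counter as the id source:

```python
import itertools

_ids = itertools.count(5000)  # hypothetical unique-id source

def clone_combination(template, copies):
    """Duplicate one data combination, assigning each clone a unique id
    so parallel tests never consume or edit one another's data."""
    clones = []
    for _ in range(copies):
        clone = dict(template)    # same characteristics...
        clone["id"] = next(_ids)  # ...but a unique identifier
        clones.append(clone)
    return clones

# Three independent rows, identical apart from their unique ids:
clones = clone_combination({"country": "UK", "status": "frozen"}, 3)
```

In a real database the "unique identifier" spans every key column in the cloned rows, but the effect is the same: three tests that need the same combination each get their own copy.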
Test data subsetting, performed correctly, extracts compact, consistent, and intact data sets. “Covered” subsetting is further designed to retain coverage, reducing the volume of data copies while retaining data variety.
Extracting “covered” subsets provisions complete copies of data to multiple teams and frameworks. This avoids the delays caused by cross-team constraints, while reducing the cost of maintaining multiple data copies. Maintaining the variety and relationships of data further means that every test runs smoothly using consistent data, unlocking optimal coverage levels.
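The "covered" idea can be illustrated by keeping one representative row per distinct combination of test-relevant columns, shrinking volume while preserving variety. The rows and column names below are hypothetical:

```python
# A hypothetical source table with redundant combinations.
rows = [
    {"id": 1, "country": "UK", "tier": "gold"},
    {"id": 2, "country": "UK", "tier": "gold"},
    {"id": 3, "country": "UK", "tier": "silver"},
    {"id": 4, "country": "DE", "tier": "gold"},
]

def covered_subset(rows, key_cols):
    """Keep the first row covering each distinct combination of the
    test-relevant columns, discarding redundant duplicates."""
    seen, subset = set(), []
    for row in rows:
        combo = tuple(row[c] for c in key_cols)
        if combo not in seen:
            seen.add(combo)
            subset.append(row)
    return subset

subset = covered_subset(rows, ["country", "tier"])
# 4 rows reduce to 3: one per distinct (country, tier) combination.
```

Scaled up, the same principle turns millions of largely duplicated production rows into a compact subset that still exercises every combination the tests care about.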
Using Test Data Automation, covered test data subsetting can be integrated with the different techniques set out in this article. Each technique is furthermore reusable on-the-fly, automatically allocating coverage-optimised data to parallel teams and frameworks:
Integrated test data technologies can be reused on-the-fly to ensure that every tester and test is equipped with the data they need.
The automated test data techniques outlined in this article enable organisations to create and allocate the data they need for every test scenario, boosting test coverage drastically. Furthermore, these techniques form part of an integrated and automated test data suite, Curiosity’s Test Data Automation.
Test Data Automation combines all the techniques covered in this article and more, enabling parallel teams and frameworks to stream the data they need, when and where they need it. Rather than a blocker to speed and test coverage, test data instead becomes available on demand, at all times, across the whole SDLC.
Want to see these techniques live? Watch our free webinar with Windocks, Turn Your Production Systems into Test-Ready Data! This webinar sets out how production databases can be made “test ready” and delivered on demand, enabling rapid, rigorous and compliant testing.
[1] Redgate (2021), The 2021 State of Database DevOps Report. Retrieved from https://www.red-gate.com/solutions/database-devops/report-2021
[2] Capgemini, Sogeti (2021), World Quality Report 2021-22. Retrieved from https://www.capgemini.com/gb-en/research/world-quality-report-wqr-2021-22/