4 essential components of a modern test data strategy

Huw Price
|

4 mins read


Data has become the heartbeat of modern testing. Every test case, every scenario, every AI request and every automation run depends on having the correct data, consistent with any dependent systems.

Many teams still struggle with their test data. I’ve seen countless projects delayed because test data was missing, incomplete or simply outdated. In fact, 44% of testing time is still spent waiting for, manually creating or finding test data [source]. That’s nearly half of your testing effort lost to data bottlenecks.

Over the last 40 years, I’ve helped some of the world’s largest enterprises eliminate data bottlenecks and transform their test data management practices. From that experience, I’ve distilled four essential components of a modern test data strategy:

  1. Understanding your data landscape
  2. Designing and creating test data
  3. Ensuring complete data coverage
  4. Delivering and managing test data

Let me walk you through how I approach each one.

1. Understanding your data landscape

Before effective testing can begin, you need a clear understanding of your data landscape. What data do you have? How good is it? How is it related across systems? And how does it travel through your application landscape?

Deep data discovery

First, run a series of deep analysis jobs. Start with the basic structures and versions of every data asset in your organisation, from databases to mainframes to Kafka messages to API payloads. Once the basic metadata is captured, you can scan the data further, looking for patterns, ranges, enumerations and similarities.

This process will discover hidden business relationships and identify any invalid data. Any commercially sensitive or private data can be tagged and mapped to data masking routines. The discovery builds you a map of how data flows through your application and can be run continuously, validating and verifying your data.
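To make the idea concrete, here is a minimal, illustrative sketch of what one such profiling job might do for a single column: count nulls, detect low-cardinality enumerations and infer a numeric range. The function name and thresholds are assumptions for illustration, not part of any real product.

```python
import re

# Hypothetical sketch: profile a column of raw values to infer an
# enumeration, a numeric range and basic quality stats, as a
# deep-discovery job might. Thresholds are illustrative assumptions.
def profile_column(values):
    non_null = [v for v in values if v not in (None, "")]
    distinct = set(non_null)
    profile = {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(distinct),
    }
    # Low cardinality suggests an enumeration worth cataloguing.
    if 0 < len(distinct) <= 10:
        profile["enumeration"] = sorted(distinct)
    # Columns where every value is an integer get a numeric range.
    if non_null and all(re.fullmatch(r"-?\d+", str(v)) for v in non_null):
        nums = [int(v) for v in non_null]
        profile["range"] = (min(nums), max(nums))
    return profile

statuses = profile_column(["OPEN", "CLOSED", "OPEN", None, "PENDING"])
ages = profile_column(["34", "51", "28"])
```

Run continuously over every asset, even a profiler this simple starts to surface the enumerations, ranges and null-heavy columns that feed the data catalogue.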

Building a centralised data catalogue

The next step is to build a centralised data catalogue: a living, central reference for data definitions, documentation, relationships, dependencies and formats. It acts as the single source of truth for how data behaves across systems, tracking valid values, structures and rules. Maintaining this catalogue helps prevent test failures caused by data mismatches or missing information, ensuring consistency and reliability across every environment.
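As a rough sketch of the idea (the class and field names below are illustrative assumptions, not a real product API), a catalogue entry records which system owns a field and what its rules are, and a lookup can then spot where two systems disagree about the same column:

```python
from dataclasses import dataclass, field

# Illustrative sketch of a catalogue entry and a cross-system
# consistency check. Names and structure are assumptions.
@dataclass
class CatalogueEntry:
    system: str
    table: str
    column: str
    data_type: str
    rules: dict = field(default_factory=dict)

class DataCatalogue:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[(entry.system, entry.table, entry.column)] = entry

    def find_type_mismatches(self, table, column):
        """Return {system: type} when systems disagree on a shared column."""
        types = {e.system: e.data_type
                 for (s, t, c), e in self._entries.items()
                 if t == table and c == column}
        return types if len(set(types.values())) > 1 else {}

cat = DataCatalogue()
cat.register(CatalogueEntry("crm", "customer", "postcode", "varchar(8)"))
cat.register(CatalogueEntry("billing", "customer", "postcode", "varchar(10)"))
```

A mismatch like the one above (the same postcode column defined with two different lengths) is exactly the kind of inconsistency that causes data-driven test failures later.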

2. Designing and creating test data

Once you understand your data, you need tools and a strategy for designing and creating it to meet testing and development needs safely, efficiently and at scale.

Generating synthetic data

Production data often falls short: it may contain sensitive information, lack edge cases or be too uniform to exercise new features. Synthetic data generation is essential to any modern test data strategy. It overcomes the limitations of production data by generating realistic, artificial datasets that cover every user story, test case and business requirement, delivered on demand across your development landscape. Most companies already create data manually to support development and testing; the trick is to commoditise that effort.
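A minimal, stdlib-only sketch of rule-driven synthetic generation is below. The field names, value pools and formats are illustrative assumptions, not a production schema; the point is that a seeded generator gives realistic, repeatable records on demand:

```python
import random
import string

# Illustrative sketch of synthetic record generation; field names,
# value pools and formats are assumptions for demonstration only.
def synth_customer(rng):
    first = rng.choice(["Alice", "Bob", "Carol", "Dev"])
    last = rng.choice(["Jones", "Price", "Singh", "Okafor"])
    return {
        "customer_id": "C" + "".join(rng.choices(string.digits, k=6)),
        "name": f"{first} {last}",
        "email": f"{first.lower()}.{last.lower()}@example.test",
        "credit_limit": rng.randrange(500, 10001, 500),
    }

rng = random.Random(42)  # seeded, so the same dataset is reproducible
customers = [synth_customer(rng) for _ in range(100)]
```

Seeding the generator matters: two test runs (or two environments) can rebuild an identical dataset without ever sharing files.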

Cloning and subsetting data

Cloning multiplies existing records with small variations, adjusting keys, dates or attributes to broaden coverage. Subsetting extracts smaller, representative portions of production data that preserve referential integrity while reducing storage and processing overhead. Combined with masking or synthetic data, subsets stay secure, manageable and complete.
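The referential-integrity point is worth making concrete. In this hypothetical sketch (table and key names are assumptions), a subset of orders pulls in only the customers those orders reference, so every foreign key in the subset still resolves:

```python
# Hypothetical sketch: extract a subset of orders plus only the
# customers they reference, preserving referential integrity.
def subset_with_integrity(customers, orders, order_ids):
    kept_orders = [o for o in orders if o["order_id"] in order_ids]
    needed = {o["customer_id"] for o in kept_orders}
    kept_customers = [c for c in customers if c["customer_id"] in needed]
    return kept_customers, kept_orders

customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 3}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},
    {"order_id": 12, "customer_id": 3},
]
subset_customers, subset_orders = subset_with_integrity(
    customers, orders, {10, 11}
)
```

Real subsetting tools do this across dozens of linked tables at once, but the principle is the same: follow the keys, keep what they point at, drop the rest.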

Masking sensitive data

When using production-like data, masking is essential to protect privacy and compliance. It anonymises or pseudonymises sensitive values such as PII while preserving structure and integrity. Masking scales efficiently for large datasets, but unlike synthetic data, it’s limited by existing structures and offers less flexibility for complex or edge-case scenarios. Therefore, synthetic data in combination with masking is crucial for scalability, compliance and data coverage.
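One common masking approach, sketched below as an assumption rather than any particular product's method, is deterministic masking: the same real value always maps to the same fake value, so joins between systems still line up while the original PII never reaches the test environment.

```python
import hashlib

# Illustrative sketch of deterministic masking. The same input always
# yields the same masked value, preserving cross-system joins. The
# secret and output format are assumptions for demonstration.
def mask_email(email, secret="not-a-real-secret"):
    local, _, domain = email.partition("@")
    digest = hashlib.sha256((secret + local).encode()).hexdigest()[:10]
    return f"user_{digest}@masked.example"

a = mask_email("huw.price@example.com")
b = mask_email("huw.price@example.com")
c = mask_email("someone.else@example.com")
```

Note the trade-off the section describes: this preserves structure and consistency, but it can only reshape values that already exist, which is why it pairs with synthetic generation for edge cases.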

3. Ensuring complete data coverage

A robust test data strategy isn’t just about volume; it’s about coverage. How well does your data represent all possible combinations and scenarios? Complete coverage means your data represents every relevant combination and condition: the key to uncovering hidden defects.

Data design and modelling

Data design and modelling provide the blueprint for comprehensive coverage, defining how information is structured, related and flows between systems. Together, they create a clear map of entities, attributes and dependencies, helping teams visualise how real-world business processes are reflected in data and where gaps or inconsistencies may exist.

This blueprint also guides which data to create, clone or synthesise, targeting missing or underrepresented scenarios. The result is higher test quality, as data modelling transforms test data creation from guesswork into a deliberate, coverage-driven process that improves accuracy, completeness and confidence in test outcomes.
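As a small, illustrative example of coverage-driven design (the equivalence classes below are assumptions), a model of contact-detail conditions can be expanded mechanically into every combination, turning "which data do we need?" from guesswork into enumeration:

```python
from itertools import product

# Illustrative model of contact-detail equivalence classes; the
# dimensions and values are assumptions for demonstration.
model = {
    "country": ["UK", "US"],
    "phone": ["valid", "missing", "malformed"],
    "email": ["valid", "missing"],
}

# Expand the model into every combination of conditions (2 x 3 x 2 = 12),
# each of which can then drive a generated or cloned test record.
combinations = [
    dict(zip(model, values)) for values in product(*model.values())
]
```

In practice, models grow too large for full cartesian expansion and teams switch to pairwise or constraint-based selection, but the principle holds: the model tells you which records are missing.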

A simple model for testing and generating contact details.

4. Delivering and managing test data

Even the best data is useless if it isn’t available when and where it’s needed. Test data delivery and orchestration ensure that high-quality, relevant data reaches the right environments at the right time, enabling faster, more reliable testing.

Efficient data provisioning

Agile and DevOps teams can’t afford to wait days for data preparation. Efficient, API-driven provisioning gives testers and developers rapid, self-service access to the datasets they need, tailored to specific scenarios, environments or users.

In continuous testing environments, automated provisioning enables data to be requested and delivered dynamically during test execution, with no manual setup required. Integrated directly into CI/CD pipelines, it ensures that every automated test run has access to fresh, valid and contextually relevant data. In turn, robust provisioning leads to accelerated test execution, reduced dependency on data teams and seamless alignment between test data, automation and delivery workflows.
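A simplified sketch of what self-service provisioning looks like from the pipeline's side is shown below. Everything here is an illustrative assumption: in a real pipeline the function would call a test data platform's API rather than an in-memory registry, but the shape is the same, since a test run names its scenario and receives a fresh, run-scoped dataset handle.

```python
# Hypothetical sketch of self-service provisioning. The registry,
# scenario names and returned handle format are all assumptions.
DATASETS = {
    "checkout-happy-path": {"rows": 50, "masked": True},
    "checkout-declined-card": {"rows": 5, "masked": True},
}

def provision(scenario, run_id):
    """Return a run-scoped dataset handle for a named test scenario."""
    spec = DATASETS.get(scenario)
    if spec is None:
        raise KeyError(f"no dataset registered for scenario {scenario!r}")
    # In a real pipeline this would call the platform's provisioning
    # API; here we return a handle naming an isolated copy for this run.
    return {"handle": f"{scenario}--{run_id}", **spec}

ds = provision("checkout-declined-card", run_id="build-1841")
```

Scoping the handle to the run ID is the key design choice: parallel pipelines get isolated copies instead of fighting over one shared dataset.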

Curiosity’s Enterprise Test Data platform

Modern testing demands fast, safe and complete access to high-quality data. Curiosity’s Enterprise Test Data® platform delivers exactly that, combining AI, automation and self-service to transform how teams discover, design and deliver test data.

Enterprise Test Data® goes beyond simple discovery to give teams instant insight into their data ecosystem. AI-powered deep scanning and mapping uncover relationships and dependencies across systems. Built-in data design and modelling tools visualise data combinations, highlight gaps and guide the creation of missing records, giving teams the confidence to move faster without compromising quality.

Once you understand your data, you can take control of it. Enterprise Test Data’s synthetic data generation capabilities enable the creation and provisioning of realistic, consistent data, while scalable masking replaces sensitive values to maintain compliance, perfect for fast, repeatable test environments that mirror production with none of the risk.

In addition, continuous governance features keep data quality high and compliance effortless. Enterprise Test Data® leverages deep scanning and AI-driven monitoring to help ensure development data stays secure and compliant across every environment. With Enterprise Test Data®, you can manage, generate and deliver the right data at the right time, improving coverage, compliance and confidence across every release.

Turning test data into a competitive advantage

Test data may once have been an afterthought, but in today’s fast-moving delivery pipelines it has become a strategic asset: the foundation of reliable, scalable and compliant testing.

With the right practices and technology, I believe development teams can transform how they work with data. Through clear understanding, intelligent generation, complete coverage and efficient provisioning, you can eliminate one of the biggest bottlenecks in testing.

Curiosity’s Enterprise Test Data® platform brings this strategy to life. It’s how data-rich organisations are turning data chaos into controlled, compliant and continuously available test data that powers modern testing for modern delivery. The result is not just better data, but better software, delivered on time, every time.

