4 min read
James Walker 27 June 2023 14:00:00 BST
Preventing Production Data Leaks in Test and Development Environments.
Last week, I was lucky enough to attend the European InfoSec conference in London. The event hosted a rich mix of start-ups, enterprises and insightful talks on information security. It was a feast of knowledge on the latest security standards.
At the conference, I was particularly struck by two things:
The market is awash with threat detection tools.
There's a common understanding of the risks posed by employees to an organisation’s infrastructure.
This appreciation of employee threat applied particularly to test data. It seemed almost every software solution showcased at the conference had capabilities for detecting the misuse of production data in development and test environments.
It was fantastic seeing organisations take the security risks associated with non-production data seriously. But what are the alternatives to using production data in testing and development?
As the adage goes, “a chain is only as strong as its weakest link.”
This applies particularly to your organisation’s data security: it's only as strong as your least informed, least careful employee. In fact, 74% of data breaches involve the “human element”,[1] while the average data breach costs $4.35 million.[2]
A single mishap can accordingly result in grave consequences, including when sensitive production data is spread to less-secure test environments.
Yet, organisations still routinely copy sensitive production data to less-secure test environments. This is an avoidable process which extends the attack surface of your data. So, why do organisations still do it?
The misuse of production data in test environments often stems from innocent-enough intentions. Developers frequently resort to using real data to test new features or troubleshoot issues, as it simulates real-world scenarios.
While this may seem beneficial from a testing perspective, it's a precarious practice for security. Companies may invest millions in securing production databases and associated infrastructure, implementing a myriad of guardrails, firewalls, and scanners. However, once that data moves to a less secure environment, such as a test or dev environment, it is protected by significantly weaker controls.
Real production data often encompasses sensitive information, including customer names, addresses, and financial details. Mishandling this data can lead to breaches that not only tarnish your company's reputation, but also entail severe legal and financial consequences.
The boardroom might take note of one €1.2 billion fine that’s been levied under the EU General Data Protection Regulation (GDPR), along with the 1,691 other fines that have been imposed since it was introduced in 2018.[3]
When I asked the vendors at European InfoSec what they would recommend when production data is found in non-production environments, the common response was that it shouldn't be happening, and that access should be revoked.
This is a valid point, but it does not address the root cause. There are instances where developers or testers need to replicate specific scenarios in the application. If there isn’t a solid test data solution in place for provisioning and creating data, then pulling production data is the only option. So, the rules get bent, leading to exceptions that create unacceptable risk.
Instead, we should be thinking about a holistic test data strategy that empowers developers, testers and CI/CD tooling to create and provision the data they need securely and conveniently. Providing a faster, more robust alternative to copying production data will motivate testers and developers away from the practice; merely imposing stringent access control to production will lead to workarounds.
Historically, a common approach to protecting sensitive data in test environments has been to mask, anonymise or obfuscate the data. This technique replaces sensitive data with fictitious yet realistic information, allowing developers to work with data that behaves similarly to the original.
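As a rough illustration of the masking technique described above, the sketch below replaces names with fictitious values and hides most of a card number while preserving its format. It is a minimal example under stated assumptions, not a description of any particular masking product: the field names, the lookup lists and the `mask_customer` helper are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical lookup tables of fictitious values (illustrative only).
FAKE_FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor", "Morgan", "Casey"]
FAKE_SURNAMES = ["Reed", "Hale", "Frost", "Lane", "Marsh", "Wells"]


def _pick(secret: bytes, value: str, options: list[str]) -> str:
    """Deterministically map a real value to one fictitious option."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    return options[int.from_bytes(digest[:4], "big") % len(options)]


def mask_customer(row: dict, secret: bytes) -> dict:
    """Return a copy of the row with the name and card number masked."""
    masked = dict(row)  # the original row is left untouched
    masked["first_name"] = _pick(secret, row["first_name"], FAKE_FIRST_NAMES)
    masked["surname"] = _pick(secret, row["surname"], FAKE_SURNAMES)
    # Keep the length and last four digits so the value still "behaves".
    card = row["card_number"]
    masked["card_number"] = "*" * (len(card) - 4) + card[-4:]
    return masked


if __name__ == "__main__":
    # Made-up record and key, for demonstration only.
    customer = {
        "first_name": "Maria",
        "surname": "Lopez",
        "card_number": "4111111111111111",
        "postcode": "SW1A",
    }
    print(mask_customer(customer, secret=b"demo-key"))
```

The same input always maps to the same fictitious value, which keeps joins between tables consistent. Note that fields the function does not touch, such as the postcode here, pass through unchanged.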
Though popular, this method has its shortcomings. One significant challenge with data masking is that it typically generates high-volume, low-variety data. In other words, it creates large quantities of data that lack the diversity and unpredictability needed in testing.
Masked data mirrors past production usage, during which most users use the system as expected. It therefore lacks the negative scenarios needed for rigorous testing, along with data for testing new functionality.
Masked data will typically satisfy just a fraction of the scenarios needed in testing and development.
Masking alone limits the scope and effectiveness of testing, potentially allowing costly defects to slip through to production. Moreover, even though masking alters the data, it retains the original data's structure and distribution. A skilled individual can therefore often reverse-engineer sensitive data from the masked data set.
Researchers have identified 99.98% of individuals from anonymous data sets, using just 15 demographic attributes. Another study identified 90% of shoppers from credit card metadata, using just 4 random transactions per individual.[4]
The point is: you need to remove a lot of data before a data set is truly “anonymous”, including metadata and time series data. Yet, the data must still resemble the original.
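One way to see why removing names alone is not enough is to count how many records are still unique on the attributes that remain. The sketch below is a simplified check of that kind, not the method used in the studies cited above; the sample rows and the `unique_fraction` helper are made up for illustration.

```python
from collections import Counter


def unique_fraction(rows, quasi_identifiers):
    """Fraction of rows whose quasi-identifier combination appears exactly once."""
    if not rows:
        return 0.0
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(rows)


# Made-up "masked" rows: names are fictitious, other attributes are untouched.
masked_rows = [
    {"name": "Alex Reed", "postcode": "SW1A", "birth_year": 1980, "gender": "F"},
    {"name": "Sam Hale", "postcode": "SW1A", "birth_year": 1981, "gender": "F"},
    {"name": "Jordan Frost", "postcode": "M1", "birth_year": 1975, "gender": "M"},
    {"name": "Casey Lane", "postcode": "EH1", "birth_year": 1992, "gender": "F"},
]

print(unique_fraction(masked_rows, ["postcode"]))                          # 0.5
print(unique_fraction(masked_rows, ["postcode", "birth_year", "gender"]))  # 1.0
```

In this toy data set, half the rows are unique by postcode alone, and every row becomes unique once birth year and gender are added. A record that is unique on such attributes can be linked back to a person by anyone who knows those attributes from another source.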
To provide both privacy and testing rigour, a modern and fit-for-purpose approach to test data creation has been gaining traction: the use of synthetic data. Synthetic data refers to data that's artificially generated, rather than derived from actual events. Unlike masked data, synthetic data is not derived directly from real data, meaning it carries no risk of exposing sensitive information.
Instead, algorithms create synthetic data based on the scenarios and business logic underpinning an application. This means that a rich, varied data set can be created, giving full data coverage for testing and development.
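To make the idea of generating data from business logic more concrete, here is a small sketch. It assumes a hypothetical loan application rule (approve adults requesting between 1 and 50,000) and builds one record for every combination of boundary and negative values, each tagged with its expected outcome. The rule, field names and `generate_applications` function are illustrative assumptions, not part of any specific tool.

```python
import itertools
import random


def expected_outcome(age: int, amount: int) -> str:
    """Hypothetical business rule for an illustrative loan application."""
    if age < 18:
        return "reject_underage"
    if amount <= 0:
        return "reject_invalid_amount"
    if amount > 50_000:
        return "reject_over_limit"
    return "approve"


# Boundary and negative values for each input, chosen from the rule itself.
AGES = [17, 18, 65]
AMOUNTS = [-1, 0, 1, 50_000, 50_001]


def generate_applications(seed: int = 0) -> list[dict]:
    """Build one synthetic record per combination of input values."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    rows = []
    for i, (age, amount) in enumerate(itertools.product(AGES, AMOUNTS), start=1):
        rows.append({
            "applicant_id": f"SYN-{i:04d}",
            "name": f"Test User {rng.randint(1000, 9999)}",
            "age": age,
            "amount": amount,
            "expected": expected_outcome(age, amount),
        })
    return rows


if __name__ == "__main__":
    for row in generate_applications():
        print(row)
```

Because the values come from the rule rather than from past usage, the negative scenarios (underage applicants, zero or negative amounts, over-limit requests) are covered by construction, and no record corresponds to a real person.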
The use of synthetic data enables comprehensive and realistic testing, reducing the risk of costly bugs and security incidents. It can also generate accurate data on demand, sidestepping the massive amount of development time wasted waiting for, finding, or making data.
With synthetic data, developers and testers can conduct their work efficiently and effectively, without exposing the organisation to the risks associated with using real production data.
The strength of your data security strategy hinges on its weakest link. If you’re using live production data in non-production environments, this will likely represent one of the weakest links in your chain.
By implementing robust test data management practices, you can better fortify your organisation's data against breaches. It's imperative to equip your workforce with the tools and knowledge they need to navigate the complex world of data security confidently and effectively. Synthetic test data generation provides a secure solution that accelerates and optimises testing and development.
Want to boost the security, efficiency and quality of your software delivery? Book a meeting to talk to us about Test Data Automation.
Footnotes:
[1] Verizon (2023), 2023 Data Breach Investigation Report. Retrieved from https://www.verizon.com/business/en-gb/resources/reports/dbir/ on 22/06/2023.
[2] Ponemon, IBM (2022), Cost of a Data Breach Report 2022. Retrieved from https://www.ibm.com/downloads/cas/3R8N1DZJ on 22/06/2023.
[3] Enforcement Tracker, “Statistics: Fines imposed over time”. Retrieved from https://www.enforcementtracker.com/?insights on 22/06/2023.
[4] Cited in Natasha Lomas (TechCrunch: 2019), “Researchers spotlight the lie of ‘anonymous’ data”. Retrieved from https://techcrunch.com/2019/07/24/researchers-spotlight-the-lie-of-anonymous-data/ on 22/06/2023.