Test data management (TDM) is a crucial practice for keeping test data compliant and consistent. Just as testing environments and data models continuously evolve, TDM practices require ongoing revision.
The use of personal data in development and testing environments is a persistent concern for software engineering leaders and organisations. This applies particularly in view of regulatory policies, such as the General Data Protection Regulation (GDPR) and the California Privacy Rights Act (CPRA).
On top of that, poor TDM practices create productivity bottlenecks and erode software engineers' confidence in the quality of their products. These hurdles make improving TDM practices all the more important.
Gartner recommends using tools that support synthetic data generation and automated test data, combined with test data practices that address technical challenges and ensure product teams have the data they need to test their software engineering projects.
To improve TDM practices, software engineering leaders should follow these recommendations to lessen the burden on product teams, many of which are currently experiencing challenges in development and testing environments.
Application testing challenges
Product teams need to run tests to make informed decisions about potential risks and to build confidence in the quality of their products. However, if the teams involved in testing do not trust their test environments or the provisioned test data, testing is less valuable and test results are more likely to be met with scepticism. Overall, poor test environments and poor TDM practices reduce a team’s enthusiasm when it comes to testing activities.
According to a 2023 Gartner survey of software engineering leaders, the hiring, development and retention of talent ranks as the top challenge they currently face. From the developer perspective, Stack Overflow conducted a survey of more than 500 software developers to discover the factors that drive talent retention and what attracts technical talent to an organisation. The top concern for over 53% of the developers surveyed is the prioritisation of developer experience at work. What these statistics illustrate is that establishing good TDM practices is not solely about compliance.
While a lack of compliance is a sound basis for funding a TDM initiative, poor TDM practices also have expensive and far-reaching implications for software engineering leaders. They affect leaders' ability to maintain happy and effective teams that take pride in the way they work. Hence, software engineering leaders should view TDM as an opportunity to improve tester and developer experience.
Constraints associated with utilising production data:
- Production datasets are frequently large. This can increase the cost and time required for exercises conducted in development and testing environments.
- The permissions and other security controls required to access and navigate the production data environment produce delays for both internal and contracted team members.
- Production data frequently contains confidential and private information that may be subject to regulation.
- Although production data reflects the organisation's current business and operations, it may still be missing the characteristics needed for net new application features or for the development of machine learning models.
- Production data may contain biases, such as the shape and distribution of data regarding current customers, thus not fully representing the market of potential buyers.
- Production data models may have complex relationships, both within and across models. This makes subsetting problematic, because applications running on top of the subset may depend on related records that were left out.
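The subsetting problem in the last point can be illustrated with a minimal Python sketch. The tables, column names and the helper `subset_with_integrity` are hypothetical and invented for illustration. A real schema would have many more relationships, some of them implied only by application code.

```python
import random

# Hypothetical in-memory "production" tables; names and columns are illustrative only.
customers = [{"id": i, "region": "EU" if i % 2 else "US"} for i in range(1, 101)]
orders = [{"id": n, "customer_id": (n % 100) + 1, "total": n * 1.5} for n in range(1, 501)]


def naive_subset(rows, fraction, seed=0):
    """Sample one table on its own, ignoring relationships to other tables."""
    rng = random.Random(seed)
    return rng.sample(rows, max(1, int(len(rows) * fraction)))


def subset_with_integrity(parents, children, fk, fraction, seed=0):
    """Sample parent rows first, then keep only the children that reference them."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    kept_parents = rng.sample(parents, max(1, int(len(parents) * fraction)))
    kept_ids = {row["id"] for row in kept_parents}
    kept_children = [row for row in children if row[fk] in kept_ids]
    return kept_parents, kept_children


def count_orphans(parents, children, fk):
    """Count child rows whose foreign key points at a parent missing from the subset."""
    parent_ids = {row["id"] for row in parents}
    return sum(1 for row in children if row[fk] not in parent_ids)


# Sampling each table independently tends to leave orders without a customer.
naive_customers = naive_subset(customers, 0.1)
naive_orders = naive_subset(orders, 0.1, seed=1)
naive_orphans = count_orphans(naive_customers, naive_orders, "customer_id")

# Sampling parents first keeps every retained order attached to a retained customer.
small_customers, small_orders = subset_with_integrity(customers, orders, "customer_id", 0.1)
safe_orphans = count_orphans(small_customers, small_orders, "customer_id")
```

Even in this two-table case, the subset is only safe because the parent table is sampled first and the foreign key is known in advance. With relationships that span several data models, discovering that ordering is where most of the effort tends to go.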
TDM offers IT leaders a way to protect sensitive data and prevent it from being misused in less secure environments. It makes software engineering frictionless by providing a set of tools and processes that simplify the work and make test data available to product teams. TDM also mitigates concerns regarding compliance in the test environment.
Invest in ongoing test data management
Software-based products, source code and data models are continuously evolving. Data relationships not only live in databases, but also exist in the implementation of source code. As a consequence, software engineering leaders should not implement a one-time approach to TDM. Rather, Gartner recommends building TDM into the current software development and testing disciplines.
Due to the ongoing nature of TDM, product teams will need support from other teams, such as platform or enabling teams. In Gartner’s 2020 Achieve business agility with DevOps and automation study, 82% of survey respondents indicated they were using platform teams. These types of teams aid product teams by supporting the implementation of technologies and practices, and by accelerating innovation when a team hits a roadblock in a software engineering project.
Product teams can be supported by a platform team that offers coaching and engineering activities, and that may be able to implement workarounds for specific bottlenecks. The platform team may also support software quality and testing tools that assist distributed testers. Gartner advises platform teams to adopt a product owner role that empathises with and understands the needs of developers and testers.
Implement synthetic data generation
Software engineering leaders should influence their teams and product engineering stakeholders to avoid a production-data-first mindset when it comes to the development, triaging and testing of features. To reduce the complexity of testing applications with production datasets (see box: Constraints associated with utilising production data), Gartner also recommends that software engineering leaders promote tools that generate synthetic data.
Synthetic data is artificially generated rather than copied, subsetted or masked from production data sources. It is used to support systems where real data is expensive, unavailable, imbalanced or unusable due to privacy regulations. Methods offered by suppliers to generate synthetic data include rule-based logic, generative adversarial networks and statistical sampling.
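As a rough illustration of two of these methods, the Python sketch below combines rule-based logic with statistical sampling using only the standard library. The field names, segment weights, distribution parameters and credit limit rule are all assumptions made up for the example. They are not taken from any supplier's tool, and a real generator would derive them from the application's data model.

```python
import random
import string

SEGMENTS = ["consumer", "small_business", "enterprise"]
SEGMENT_WEIGHTS = [70, 25, 5]  # assumed mix, not measured from any real customer base


def synthetic_customer(rng, customer_id):
    """Build one artificial customer record; no production data is read."""
    # Rule-based: the email is derived from a generated name and a reserved test domain.
    name = "".join(rng.choices(string.ascii_lowercase, k=8))
    email = f"{name}@example.test"

    # Statistical sampling: draw the segment from an assumed categorical distribution.
    segment = rng.choices(SEGMENTS, weights=SEGMENT_WEIGHTS, k=1)[0]

    # Statistical sampling: spend follows an assumed log-normal shape (always positive).
    monthly_spend = round(rng.lognormvariate(4.0, 0.6), 2)

    # Rule-based: a hypothetical business rule ties the credit limit to the segment.
    credit_limit = 50000 if segment == "enterprise" else 5000

    return {
        "id": customer_id,
        "name": name,
        "email": email,
        "segment": segment,
        "monthly_spend": monthly_spend,
        "credit_limit": credit_limit,
    }


def generate_customers(n, seed=42):
    """Generate n records; the same seed always yields the same dataset."""
    rng = random.Random(seed)
    return [synthetic_customer(rng, i) for i in range(1, n + 1)]


sample = generate_customers(5)
```

Seeding the generator makes a failing test reproducible, and adjusting the weights is one way to over-represent rare cases that production data may barely contain.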
The main benefit of synthetic data generation to software engineering teams is that developers and testers get access to relevant data without requiring access to production or going through a lengthy process of masking sensitive production data.
Using synthetic data challenges the view that production data should be the starting point for a TDM initiative. Promoting synthetic data generation also raises awareness among product teams of its limits. For instance, there are times when production data does not exist, or is not diverse enough to represent the needs of features in development and testing.
Gartner’s three steps to test data management start with using TDM to improve the tester and developer experience, reduce bottlenecks and increase compliance. The next step is using a platform team that offers software engineering teams guidance and technical assistance to work around roadblocks that prevent a project from progressing smoothly. Finally, as a third step, Gartner urges software engineering teams to test their projects using tools that generate synthetic data, to avoid the risks of using production datasets that are often large, contain confidential data and may exhibit inherent biases.
This article is based on the Gartner report, 3 steps to improve test data management for software engineering. Alys Woodward is a senior director analyst covering data and analytics for mid-size enterprises, along with synthetic data and artificial intelligence.