A metadata-based approach for the generation of performance test data

Performance testing is an important means of ensuring high software quality. The software is observed while it processes a large or specially structured input data set, and the speed and stability of the processing are assessed. Finding useful test data for performance testing is a major challenge. Randomly generated input data is often insufficient because it ignores characteristics that occur in real-world data. In many cases, the rules of the software's domain, particular database contents, constraints, and relations have to be recreated to produce realistic test data, which is often very difficult. Data mining provides techniques for extracting knowledge from large data sets. One of these techniques, clustering, detects groups of data samples that belong together. The open question is whether the results of a clustering analysis performed on the metadata of real-world customer databases make it possible to generate better performance test data.

The general goal of this thesis is to analyze techniques for generating test data that is more realistic and that reveals actual performance issues. Within the scope of this thesis, data mining is applied to the metadata of actual customer databases to find clusters. In a second step, the thesis investigates whether, and to what extent, the characteristics and clusters found can be used to produce test data of higher quality.
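
To make the idea of clustering database metadata more concrete, the following is a minimal sketch and not the method used in the thesis. It assumes k-means as the clustering algorithm (the text above does not name one), and it assumes that each table is described by two hypothetical metadata features: the base-10 logarithm of its row count and its number of columns. The table names and numbers are invented for illustration only.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on tuples of floats. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct input points chosen at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, labels


# Hypothetical table metadata: name -> (row count, column count).
tables = {
    "country_codes": (250, 3),
    "currencies": (180, 4),
    "status_types": (12, 2),
    "orders": (2_500_000, 18),
    "order_items": (9_000_000, 12),
    "audit_log": (40_000_000, 15),
}

# Row counts span several orders of magnitude, so use their logarithm.
features = [(math.log10(rows), float(cols)) for rows, cols in tables.values()]

centroids, labels = kmeans(features, k=2)

for name, label in zip(tables, labels):
    print(f"{name}: cluster {label}")
```

In this toy data set the small lookup tables and the large transactional tables end up in separate clusters. A test data generator could then use the centroid of each cluster as a template for the size and shape of the tables it creates, which is one possible reading of the second step described above.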

Project information



Thesis for degree:



Martin Kühn