Validation Towards Realistic Synthetic Datasets in Production Planning (Conference Proceedings)

Open Access

  • false

Peer Reviewed

  • false

Abstract

  • For large-scale simulations, a sufficient amount of data is required. Despite increasing data availability, it remains challenging to gather large-scale datasets that are comprehensive, correct, accessible, and realistic for validating new algorithms and models. An alternative is the use of synthetic data. We therefore propose a novel methodology to generate realistic datasets. Based on the statistical properties of real-world data, synthetic datasets are generated by ML models and filtered for anomalous values. The generated datasets are then compared to find the most suitable one. For this validation procedure, a modified Hopfield neural network model is extended to enable an analysis of sequences and to derive a comparison metric. The method's applicability is demonstrated through an in-depth comparison of all tested data generators using a real-world dataset from a mid-size manufacturing company, in which transformer-based generators proved most suitable. More diverse use cases should be evaluated in future research.