Probabilistic Generative Modeling for Synthesizing High-Coverage Test Data in Safety-Critical Software Applications
Abstract
Testing safety-critical software systems demands rigorous input-generation strategies capable of uncovering subtle, low-frequency faults that could have serious consequences. Conventional test data generation techniques often struggle to achieve meaningful coverage, especially in complex systems where exhaustive path exploration is computationally prohibitive. This paper introduces a probabilistic framework for test data synthesis that combines Variational Autoencoders (VAEs) with Probabilistic Context-Free Grammars (PCFGs) to generate inputs tailored for high structural and semantic coverage. Rather than relying on brute-force or random sampling, our method learns the statistical structure of valid input domains and uses it to guide test generation toward regions of the code that standard tools leave underexplored. We incorporate domain-specific metrics to align generated inputs with safety-relevant execution paths. In empirical evaluations across representative domains such as avionics and medical software, our approach outperforms traditional fuzzers and symbolic execution engines in both branch and path coverage. These results suggest that probabilistic generative modeling, when applied thoughtfully, can support more effective and principled verification in high-assurance software development.
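To make the grammar-guided part of this idea concrete, the following is a minimal sketch, not the authors' framework. It covers only PCFG sampling plus a simple coverage-feedback loop and does not model the VAE component. The toy altitude-command grammar, the `altitude_handler` function under test, its branch labels, and the multiplicative `boost` reweighting heuristic are all hypothetical illustrations chosen for this sketch.

```python
import random

# Hypothetical toy grammar for an altitude-command input format.
# Each nonterminal maps to a list of expansions; any symbol that is not
# a key in GRAMMAR is treated as a terminal string.
GRAMMAR = {
    "CMD": [["SET_ALT", " ", "ALT"], ["HOLD"]],
    "ALT": [["DIGIT", "DIGIT", "DIGIT", "DIGIT"], ["-", "DIGIT"]],
    "DIGIT": [[d] for d in "0123456789"],
}


def sample(symbol, weights, rng, trace):
    """Expand `symbol`, picking productions in proportion to `weights`.

    `trace` records (nonterminal, production index) pairs so coverage
    feedback can be attributed to the productions that were used.
    """
    if symbol not in GRAMMAR:
        return symbol  # terminal
    options = GRAMMAR[symbol]
    idx = rng.choices(range(len(options)), weights=weights[symbol])[0]
    trace.append((symbol, idx))
    return "".join(sample(s, weights, rng, trace) for s in options[idx])


def altitude_handler(cmd):
    """Hypothetical system under test; returns the set of branch IDs taken."""
    if cmd == "HOLD":
        return {"hold"}
    value = int(cmd.split(" ", 1)[1])
    if value < 0:
        return {"negative"}
    if value > 4500:
        return {"above_ceiling"}
    return {"nominal"}


def generate(n_inputs, boost=2.0, seed=0):
    """Sample inputs; keep those that reach new branches and up-weight
    (once per input) every production used to derive them."""
    rng = random.Random(seed)
    weights = {nt: [1.0] * len(prods) for nt, prods in GRAMMAR.items()}
    covered, suite = set(), []
    for _ in range(n_inputs):
        trace = []
        cmd = sample("CMD", weights, rng, trace)
        new_branches = altitude_handler(cmd) - covered
        if new_branches:
            suite.append(cmd)
            covered |= new_branches
            for nt, idx in set(trace):
                weights[nt][idx] *= boost
    return suite, covered
```

Calling `generate(500)` returns a small suite of inputs, each of which reached at least one branch not hit by earlier inputs. Because every sampled string is derived from the grammar, each input is syntactically valid by construction, which is the property that distinguishes grammar-based generation from byte-level random fuzzing.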