Securing AI Systems Against Adversarial Attacks: A Framework for Building Robust and Trustworthy Machine Learning Models


Hamza Afzal, Malik Huzaifa

Abstract

Securing AI systems against adversarial attacks has become an issue of urgent concern as machine learning models are increasingly deployed in high-stakes settings. This research proposes a comprehensive adversarial defense framework that combines secure data preprocessing, CNN-based feature extraction, and iterative adversarial retraining to improve model robustness and reliability. The framework is evaluated on the MNIST dataset using adversarial samples generated with FGSM, PGD, BIM, and C&W. The retraining cycle repeatedly exposes the model to evolving perturbations, allowing the network to learn more stable decision boundaries and mitigating weaknesses under both white-box and gray-box threat conditions. Experimental results show a substantial increase in robustness, with accuracy improvements ranging from 7% to over 97% after retraining across all attack types. The model achieves a clean accuracy of 99.1% and outperforms existing methods, including conventional adversarial training and AEDPL-DL. A comparative evaluation confirms that iterative adversarial retraining is more resilient to data poisoning, evasion, and gradient-based attacks. The proposed solution offers a promising avenue for building secure, attack-resistant, and trustworthy machine learning systems that are applicable in real-world settings.
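
To illustrate the kind of iterative adversarial retraining loop the abstract describes, the sketch below trains a small MNIST CNN on a mix of clean and FGSM-perturbed batches in PyTorch. The architecture, epsilon value, loss weighting, and number of retraining rounds are assumptions for illustration only, not the authors' exact pipeline (which also uses PGD, BIM, and C&W attacks).

```python
# Minimal sketch of FGSM-based iterative adversarial retraining on MNIST.
# NOT the authors' exact configuration; architecture and hyperparameters are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class SmallCNN(nn.Module):
    """A minimal CNN feature extractor + classifier for 28x28 MNIST digits."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        return self.fc2(F.relu(self.fc1(x.flatten(1))))

def fgsm_attack(model, x, y, eps=0.1):
    """Craft FGSM adversarial examples: x_adv = clip(x + eps * sign(grad_x loss))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

def adversarial_retraining_round(model, loader, optimizer, eps=0.1, device="cpu"):
    """One retraining pass over a mix of clean and FGSM-perturbed batches."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = fgsm_attack(model, x, y, eps)            # perturb the current batch
        optimizer.zero_grad()
        loss = 0.5 * F.cross_entropy(model(x), y) \
             + 0.5 * F.cross_entropy(model(x_adv), y)    # clean + adversarial loss
        loss.backward()
        optimizer.step()

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    train_set = datasets.MNIST(".", train=True, download=True,
                               transform=transforms.ToTensor())
    loader = DataLoader(train_set, batch_size=128, shuffle=True)
    model = SmallCNN().to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(3):   # iterative retraining: regenerate attacks against the updated model
        adversarial_retraining_round(model, loader, optimizer, eps=0.1, device=device)
```

Because the adversarial examples are regenerated in every round against the current model parameters, each pass exposes the network to perturbations tuned to its latest decision boundaries, which is the intuition behind the iterative retraining cycle described above.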
