Date on Master's Thesis/Doctoral Dissertation
5-2024
Document Type
Doctoral Dissertation
Degree Name
Ph.D.
Department
Computer Engineering and Computer Science
Degree Program
Computer Science and Engineering, PhD
Committee Chair
Nasraoui, Olfa
Committee Co-Chair (if applicable)
Frigui, Hichem
Committee Member
Popa, Dan
Committee Member
Altiparmak, Nihat
Committee Member
Baidya, Sabur
Author's Keywords
AI; XAI; machine learning; explainability in machine learning
Abstract
Despite ongoing efforts to make black-box machine learning models more explainable, transparent, and trustworthy, there is growing advocacy for using only inherently interpretable models for high-stakes decision-making. Post-hoc explanations have recently been criticized for learning surrogate models that may not accurately reflect the actual mechanisms of the original model and for adding computational burden at prediction time. We propose two novel explainability approaches to address these limitations: pre-hoc explainability and co-hoc explainability. These approaches integrate explanations derived from an inherently interpretable white-box model into the learning stage of the black-box model without compromising accuracy. Unlike post-hoc methods, our approach relies neither on random input perturbation nor on training that occurs only after the black-box model is fixed. We extend our pre-hoc and co-hoc frameworks to generate instance-specific explanations by incorporating the Jensen-Shannon divergence as a regularization term while capturing the local behavior of the black-box model. This extension allows our methods to provide local explanations that are faithful to the model's behavior and consistent with the explanations generated by the global explainer model. We introduce a two-phase approach in which the first phase trains the models for fidelity and the second phase generates local explanations by fine-tuning the explainer model within the neighborhood of the instance being explained. Experiments on three benchmark datasets from different domains (credit risk scoring, movie recommendations, and robotic grasper failure detection) demonstrate the advantages of our techniques in terms of global and local fidelity without compromising accuracy. Our methods avoid the pitfalls of surrogate modeling, making them more scalable, robust, and reliable than post-hoc techniques such as LIME. Moreover, our co-hoc learning framework enhances the accuracy of the white-box model that is learned to explain the black-box predictor. The white-box model achieves significantly higher prediction accuracy after the co-hoc learning process, highlighting the potential of the co-hoc in-training approach to improve the performance of white-box models, which are required in high-risk and regulated application domains such as healthcare and legal decision-making. Our approaches provide more faithful and consistent explanations at a lower computational cost than LIME. Our theoretically derived methods are further shown empirically to balance accuracy and interpretability through regularized learning. The proposed frameworks offer a promising direction for making machine learning models more transparent and trustworthy while maintaining high prediction accuracy.
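To make the in-training fidelity regularization concrete, the following is a minimal sketch, assuming PyTorch and using hypothetical function and tensor names, of a joint co-hoc objective that couples the black-box and white-box predictive distributions through a Jensen-Shannon divergence term. It illustrates the idea described in the abstract, not the dissertation's exact formulation or hyperparameters.

import torch
import torch.nn.functional as F

def js_divergence(p, q, eps=1e-8):
    # Jensen-Shannon divergence between two batches of categorical distributions.
    m = 0.5 * (p + q)
    kl_pm = torch.sum(p * (torch.log(p + eps) - torch.log(m + eps)), dim=-1)
    kl_qm = torch.sum(q * (torch.log(q + eps) - torch.log(m + eps)), dim=-1)
    return 0.5 * (kl_pm + kl_qm).mean()

def cohoc_loss(black_logits, white_logits, targets, lam=0.5):
    # Accuracy terms for both models plus a JS fidelity regularizer that
    # pushes the black-box and white-box (explainer) predictions to agree.
    ce_black = F.cross_entropy(black_logits, targets)       # black-box accuracy
    ce_white = F.cross_entropy(white_logits, targets)       # white-box (explainer) accuracy
    fidelity = js_divergence(F.softmax(black_logits, dim=-1),
                             F.softmax(white_logits, dim=-1))
    return ce_black + ce_white + lam * fidelity

In this sketch, lam trades off accuracy against fidelity; a local variant would apply the same objective while fine-tuning the explainer on samples drawn from the neighborhood of the instance being explained.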
Recommended Citation
Acun, Asuman Cagla, "In-training explainability frameworks to make black-box machine learning models more explainable." (2024). Electronic Theses and Dissertations. Paper 4375.
https://doi.org/10.18297/etd/4375