Date on Master's Thesis/Doctoral Dissertation

5-2024

Document Type

Doctoral Dissertation

Degree Name

Ph.D.

Department

Computer Engineering and Computer Science

Degree Program

Computer Science and Engineering, PhD

Committee Chair

Nasraoui, Olfa

Committee Co-Chair (if applicable)

Frigui, Hichem

Committee Member

Popa, Dan

Committee Member

Altiparmak, Nihat

Committee Member

Baidya, Sabur

Author's Keywords

AI; XAI; machine learning; explainability in machine learning

Abstract

Despite ongoing efforts to make black-box machine learning models more explainable, transparent, and trustworthy, there is growing advocacy for using only inherently interpretable models for high-stakes decision-making. Post-hoc explanations have recently been criticized for learning surrogate models that may not accurately reflect the actual mechanisms of the original model and for adding computational burden at prediction time. We propose two novel explainability approaches to address these limitations: pre-hoc explainability and co-hoc explainability. These approaches integrate explanations derived from an inherently interpretable white-box model into the learning stage of the black-box model without compromising accuracy. Unlike post-hoc methods, our approach relies on neither random input perturbation nor training performed only after the fact. We extend our pre-hoc and co-hoc frameworks to generate instance-specific explanations by incorporating the Jensen-Shannon divergence as a regularization term while capturing the local behavior of the black-box model. This extension allows our methods to provide local explanations that are faithful to the model's behavior and consistent with the explanations generated by the global explainer model. We introduce a two-phase approach: the first phase trains the models for fidelity, and the second phase generates local explanations by fine-tuning the explainer model within the neighborhood of the instance being explained. Experiments on three benchmark datasets from different domains (credit risk scoring, movie recommendations, and robotic grasper failure detection) demonstrate the advantages of our techniques in terms of global and local fidelity without compromising accuracy. Our methods avoid the pitfalls of surrogate modeling, making them more scalable, robust, and reliable than post-hoc techniques such as LIME. Moreover, our co-hoc learning framework enhances the accuracy of the white-box models that are learned to explain the black-box predictor: the white-box model achieves significantly higher prediction accuracy after co-hoc learning, highlighting the potential of the co-hoc in-training approach to improve the performance of white-box models, which are essential in high-risk and regulated application domains such as healthcare and legal decision-making. Our approaches provide more faithful and consistent explanations at a lower computational cost than LIME. Our theoretically derived methods are further shown, empirically, to balance accuracy and interpretability through regularized learning. The proposed frameworks offer a promising direction for making machine learning models more transparent and trustworthy while maintaining high prediction accuracy.
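To make the regularized, fidelity-aware training objective described above concrete, the following is a minimal sketch (not code from the dissertation) of a co-hoc style joint loss in PyTorch: both the black-box model and the white-box explainer are optimized for prediction accuracy, and a Jensen-Shannon divergence term between their output distributions serves as the fidelity regularizer. The function names, the weighting factor lam, and the use of softmax outputs are illustrative assumptions.

```python
# Minimal sketch of a co-hoc style joint objective (illustrative assumptions only).
import torch
import torch.nn.functional as F

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two batches of probability vectors."""
    m = 0.5 * (p + q)
    kl_pm = (p * (torch.log(p + eps) - torch.log(m + eps))).sum(dim=1)
    kl_qm = (q * (torch.log(q + eps) - torch.log(m + eps))).sum(dim=1)
    return 0.5 * (kl_pm + kl_qm).mean()

def cohoc_loss(black_box_logits, white_box_logits, targets, lam=0.5):
    """Accuracy terms for both models plus a JS-divergence fidelity regularizer."""
    ce_black = F.cross_entropy(black_box_logits, targets)   # black-box prediction loss
    ce_white = F.cross_entropy(white_box_logits, targets)   # white-box (explainer) prediction loss
    p = F.softmax(black_box_logits, dim=1)
    q = F.softmax(white_box_logits, dim=1)
    fidelity = js_divergence(p, q)                           # agreement between the two models
    return ce_black + ce_white + lam * fidelity
```

In this sketch, increasing lam trades some prediction accuracy for closer agreement (fidelity) between the black-box predictions and the white-box explanations; the dissertation's two-phase local variant would further fine-tune the explainer within the neighborhood of the instance being explained.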
