Date on Master's Thesis/Doctoral Dissertation
5-2024
Document Type
Doctoral Dissertation
Degree Name
Ph.D.
Department
Computer Engineering and Computer Science
Degree Program
Computer Science and Engineering, PhD
Committee Chair
Nasraoui, Olfa
Committee Co-Chair (if applicable)
Frigui, Hichem
Committee Member
Popa, Dan
Committee Member
Altiparmak, Nihat
Committee Member
Baidya, Sabur
Author's Keywords
AI; XAI; machine learning; explainability in machine learning
Abstract
Despite ongoing efforts to make black-box machine learning models more explainable, transparent, and trustworthy, there is growing advocacy for using only inherently interpretable models for high-stakes decision-making. Post-hoc explanations have recently been criticized for learning surrogate models that may not accurately reflect the actual mechanisms of the original model and for adding computational burden at prediction time. We propose two novel explainability approaches to address these limitations: pre-hoc explainability and co-hoc explainability. These approaches integrate explanations derived from an inherently interpretable white-box model into the learning stage of the black-box model without compromising accuracy. Unlike post-hoc methods, our approach relies neither on random input perturbation nor on training that occurs only after the black-box model is fixed. We extend our pre-hoc and co-hoc frameworks to generate instance-specific explanations by incorporating the Jensen-Shannon divergence as a regularization term while capturing the local behavior of the black-box model. This extension allows our methods to provide local explanations that are faithful to the model's behavior and consistent with the explanations generated by the global explainer model. We introduce a two-phase approach in which the first phase trains the models for fidelity and the second phase generates local explanations by fine-tuning the explainer model within the neighborhood of the instance being explained. Experiments on three benchmark datasets from different domains (credit risk scoring, movie recommendations, and robotic grasper failure detection) demonstrate the advantages of our techniques in terms of global and local fidelity without compromising accuracy. Our methods avoid the pitfalls of surrogate modeling, making them more scalable, robust, and reliable than post-hoc techniques such as LIME. Moreover, our co-hoc learning framework enhances the accuracy of the white-box model that is learned to explain the black-box predictor. The white-box model achieves significantly higher prediction accuracy after the co-hoc learning process, highlighting the potential of the co-hoc in-training approach to improve the performance of white-box models, which are required in high-risk and regulated application domains such as healthcare and legal decision-making. Our approaches provide more faithful and consistent explanations at a lower computational cost than LIME. Our theoretically derived methods are further shown empirically to balance accuracy and interpretability through regularized learning. The proposed frameworks offer a promising direction for making machine learning models more transparent and trustworthy while maintaining high prediction accuracy.
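To make the in-training fidelity regularization concrete, the following is a minimal sketch, assuming PyTorch and using hypothetical function and tensor names, of a joint co-hoc objective that couples the black-box and white-box predictive distributions through a Jensen-Shannon divergence term. It illustrates the idea described in the abstract, not the dissertation's exact formulation or hyperparameters.

import torch
import torch.nn.functional as F

def js_divergence(p, q, eps=1e-8):
    # Jensen-Shannon divergence between two batches of categorical distributions.
    m = 0.5 * (p + q)
    kl_pm = torch.sum(p * (torch.log(p + eps) - torch.log(m + eps)), dim=-1)
    kl_qm = torch.sum(q * (torch.log(q + eps) - torch.log(m + eps)), dim=-1)
    return 0.5 * (kl_pm + kl_qm).mean()

def cohoc_loss(black_logits, white_logits, targets, lam=0.5):
    # Accuracy terms for both models plus a JS fidelity regularizer that
    # pushes the black-box and white-box (explainer) predictions to agree.
    ce_black = F.cross_entropy(black_logits, targets)       # black-box accuracy
    ce_white = F.cross_entropy(white_logits, targets)       # white-box (explainer) accuracy
    fidelity = js_divergence(F.softmax(black_logits, dim=-1),
                             F.softmax(white_logits, dim=-1))
    return ce_black + ce_white + lam * fidelity

In this sketch, lam trades off accuracy against fidelity; a local variant would apply the same objective while fine-tuning the explainer on samples drawn from the neighborhood of the instance being explained.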
Recommended Citation
Acun, Asuman Cagla, "In-training explainability frameworks to make black-box machine learning models more explainable." (2024). Electronic Theses and Dissertations. Paper 4375.
https://doi.org/10.18297/etd/4375