Date on Master's Thesis/Doctoral Dissertation


Document Type

Master's Thesis

Degree Name



Computer Engineering and Computer Science

Degree Program

Computer Science, MS

Committee Chair

Frigui, Hichem

Committee Co-Chair (if applicable)

Sunkara, Mahendra

Committee Member

Sunkara, Mahendra

Committee Member

Nasraoui, Olfa

Author's Keywords

machine learning; materials science; bandgap; chalcopyrites; defect-induced magnetism; materials genome initiative


The rapid pace of today's industrial evolution is creating an urgent need to design new cost-efficient materials that can satisfy both current and future demands. However, as the structural and functional complexity of materials increases, rationally designing new materials with a precise set of properties has become increasingly challenging. This observation has triggered the idea of applying machine learning techniques in the field, which was further encouraged by the launch of the Materials Genome Initiative (MGI) by the US government in 2011. In this work, we present a novel approach to applying machine learning techniques to materials science applications. Guided by knowledge from domain experts, our approach focuses on machine learning to accelerate data-driven discovery of materials properties. Our objectives are twofold: (i) identify the optimal set of features that best describes a given predicted variable, and (ii) boost prediction accuracy by applying various regression algorithms. Ordinary Least Squares, Partial Least Squares, and Lasso regressions, combined with well-adjusted feature selection techniques, are applied and tested to predict key properties of semiconductors for two types of applications. First, we propose to build a more robust prediction model for the band-gap energy (BG-E) of chalcopyrites, which are commonly used in the solar cell industry. Compared to the results reported in [1-3], our approach shows that learning and using only a subset of relevant features can improve the prediction accuracy by about 40%. For the second application, we propose to determine the underlying factors responsible for Defect-Induced Magnetism (DIM) in Dilute Magnetic Semiconductors (DMS) through the analysis of a set of 30 features for different DMS systems. We show that 8 of these features are more likely to contribute to this property. Using only these features to predict the total magnetic moment of new candidate DMSs reduces the mean squared error by about 90% compared to models trained using the whole set of features. Given the scarcity of available data sets for similar applications, this work aims not only to build robust models but also to establish a collaborative platform for future research.
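As a rough illustration of the feature selection idea described in the abstract, the following minimal sketch uses a Lasso penalty to pick a small subset out of 30 candidate features. It is not the thesis implementation: the data are synthetic, the function names, the penalty value `lam=0.1`, and the choice of cyclic coordinate descent as the solver are all assumptions made for this example.

```python
import random


def soft_threshold(rho, lam):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def center(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def lasso_coordinate_descent(columns, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    `columns` is a list of centred feature columns; `y` is the centred target.
    """
    n, p = len(y), len(columns)
    w = [0.0] * p
    col_sq = [sum(v * v for v in col) / n for col in columns]
    residual = list(y)  # equals y - Xw because w starts at zero
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            if col_sq[j] == 0.0:
                continue  # a constant feature carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * r for c, r in zip(col, residual)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                w[j] = new_w
    return w


# Synthetic stand-in data (NOT the thesis data): 30 candidate features,
# of which only features 0, 1 and 2 actually drive the target.
rng = random.Random(0)
n_samples, n_features = 60, 30
columns = [
    center([rng.gauss(0.0, 1.0) for _ in range(n_samples)])
    for _ in range(n_features)
]
true_weights = {0: 3.0, 1: -2.0, 2: 1.5}  # hypothetical ground truth
y = center([
    sum(wt * columns[j][i] for j, wt in true_weights.items()) + rng.gauss(0.0, 0.1)
    for i in range(n_samples)
])

weights = lasso_coordinate_descent(columns, y, lam=0.1)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print("selected feature indices:", selected)
```

The L1 penalty drives the coefficients of uninformative features to exactly zero, so the indices left in `selected` form a candidate feature subset. In a workflow like the one the abstract outlines, such a subset could then be passed to a second regression model and compared against a model trained on all features.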