Date on Master's Thesis/Doctoral Dissertation
12-2025
Document Type
Doctoral Dissertation
Degree Name
Ph.D.
Department
Computer Engineering and Computer Science
Degree Program
Computer Science and Engineering, PhD
Committee Chair
Elmaghraby, Adel
Committee Member
Sierra-Sosa, Daniel
Committee Member
Lauf, Adrian
Committee Member
Imam, Ibrahim
Committee Member
Losavio, Michael
Author's Keywords
A.I.; heterogeneous; crime data
Abstract
This dissertation examines the problem of fragmented data in the context of crime and cybercrime analysis. While both domains generate substantial public data, these sources vary widely in format, structure, and coverage, making it difficult to perform consistent inference or develop reliable predictive models. To address these limitations, this work proposes a layered integration framework grounded in ontology-driven modeling and data fusion. The approach is applied to two distinct but related case studies, each focused on constructing a coherent representation of criminal activity from heterogeneous sources.
The first case study explores the integration of drug and human trafficking data from law enforcement agencies in Kentucky. Records from the FBI’s National Incident-Based Reporting System, Louisville Metro Police, and the state’s corrections department are merged through a staged process involving text normalization, entity resolution, and semantic de-duplication. The resulting dataset supports a spatio-temporal classification task, identifying periods and regions where human trafficking is likely to occur given preceding drug activity. The integrated model achieves strong predictive performance, with a Matthews correlation coefficient of 0.86.
The second case study turns to the cybercrime domain, focusing on the relationship between known software vulnerabilities (CVEs) and the adversary tactics, techniques, and procedures (TTPs) cataloged in the MITRE ATT&CK framework. An ontology is constructed by aligning data from the National Vulnerability Database (NVD), the CISA Known Exploited Vulnerabilities (KEV) catalog, and ATT&CK, enriched with synonym mappings and structural metadata. Features derived from this ontology graph support a supervised classification model that predicts ATT&CK sub-techniques from CVE descriptions. The model demonstrates improved accuracy and interpretability, especially for high-profile threats such as Log4Shell.
Together, these studies show how semantic integration and graph-based reasoning can mitigate the effects of data fragmentation in criminal domains. The dissertation contributes a practical methodology for constructing structured, interoperable representations of crime data and illustrates how such representations can support both descriptive and predictive tasks.
Recommended Citation
Ahmed, Sadaf, "Explainable A.I. for analysis and interpretation of heterogeneous crime data." (2025). Electronic Theses and Dissertations. Paper 4698.
Retrieved from https://ir.library.louisville.edu/etd/4698