Date on Master's Thesis/Doctoral Dissertation

12-2025

Document Type

Doctoral Dissertation

Degree Name

Ph. D.

Department

Computer Engineering and Computer Science

Degree Program

Computer Science and Engineering, PhD

Committee Chair

Elmaghraby, Adel

Committee Member

Sierra-Sosa, Daniel

Committee Member

Lauf, Adrian

Committee Member

Imam, Ibrahim

Committee Member

Losavio, Michael

Author's Keywords

A.I., heterogeneous; crime data

Abstract

This dissertation examines the problem of fragmented data in the context of crime and cybercrime analysis. While both domains generate substantial public data, these sources vary widely in format, structure, and coverage, making it difficult to perform consistent inference or develop reliable predictive models. To address these limitations, this work proposes a layered integration framework grounded in ontology-driven modeling and data fusion. The approach is applied to two distinct but related case studies, each focused on constructing a coherent representation of criminal activity from heterogeneous sources.
The first case study explores the integration of drug and human trafficking data from law enforcement agencies in Kentucky. Records from the FBI’s National Incident-Based Reporting System, Louisville Metro Police, and the state’s corrections department are merged through a staged process involving text normalization, entity resolution, and semantic de-duplication. The resulting dataset supports a spatio-temporal classification task that identifies periods and regions where human trafficking is likely to occur given preceding drug activity. The integrated model achieves strong predictive performance, with a Matthews correlation coefficient of 0.86.
The second case study turns to the cybercrime domain, focusing on the relationship between known software vulnerabilities (Common Vulnerabilities and Exposures, CVEs) and the adversary tactics, techniques, and procedures (TTPs) defined by the MITRE ATT&CK framework. An ontology is constructed by aligning data from the National Vulnerability Database (NVD), the CISA Known Exploited Vulnerabilities (KEV) catalog, and ATT&CK, enriched with synonym mappings and structural metadata. Features derived from the resulting graph support a supervised classification model that predicts ATT&CK sub-techniques from CVE descriptions. The model demonstrates improved accuracy and interpretability, especially for high-profile threats such as Log4Shell.
Together, these studies show how semantic integration and graph-based reasoning can mitigate the effects of data fragmentation in criminal domains. The dissertation contributes a practical methodology for constructing structured, interoperable representations of crime data and illustrates how such representations can support both descriptive and predictive tasks.
