Date on Master's Thesis/Doctoral Dissertation

5-2008

Document Type

Doctoral Dissertation

Degree Name

Ph. D.

Department

Industrial Engineering

Committee Chair

Alexander, Suraj Mammen

Author's Keywords

Patient records matching; Data cleaning; Data modeling; Machine learning; Fuzzy logic; Health care administration; Medical databases

Subject

Medical records--Data processing; Data mining; Databases--Design

Abstract

A major problem in integrating information from multiple databases is that the same data objects can appear in inconsistent formats and with many attribute variations across databases, which makes it difficult to identify matching objects using exact string matching. In this research, a variety of models and methods have been developed and tested to alleviate this problem. A major motivation for this research is that health care providers still lack efficient tools for patient record matching. This research focuses on the approximate matching of patient records with third-party payer databases. This is a major need for all medical treatment facilities and hospitals that try to match patient treatment records with the records of insurance companies, Medicare, Medicaid, and the Veterans Administration. The main objectives of this research are therefore to provide an approximate matching framework that can draw upon multiple input service databases, construct an identity, and match it to third-party payers with the highest possible accuracy in object identification and minimal user interaction. This research describes the object identification system framework, developed from a hybridization of several technologies, which compares objects' shared attributes in order to identify matching objects. Methodologies and techniques from other fields, such as information retrieval, text correction, and data mining, are integrated to develop a framework that addresses the patient record matching problem. This research defines the quality of a match across multiple databases using quality metrics such as precision, recall, and F-measure, which are commonly used in information retrieval. The performance of the resulting decision models is evaluated through extensive experiments, and the models are found to perform very well.
The matching quality performance metrics, such as precision, recall, F-measure, and accuracy, are over 99%, the ROC index is over 99.50%, and mismatching rates are less than 0.18% for each model generated from the different data sets. This research also includes a discussion of the problems in patient record matching, an overview of the relevant literature on the record matching problem, and an extensive experimental evaluation of the methodologies utilized, such as string similarity functions and machine learning. Finally, potential improvements and extensions to this work are presented.
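
As a rough illustration of the ideas named in the abstract (approximate matching on shared attributes, scored with precision, recall, and F-measure), a minimal Python sketch follows. It is not the dissertation's method: the abstract does not specify which string similarity functions or decision models were used, so the standard-library `difflib.SequenceMatcher` ratio and a simple mean-score threshold stand in for them, and the names `similarity`, `is_match`, and `match_quality` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two lightly normalized strings.

    Assumption: difflib's ratio is used here only as a stand-in for whatever
    string similarity functions the dissertation actually evaluates.
    """
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_match(rec_a: dict, rec_b: dict, fields: tuple, threshold: float) -> bool:
    """Declare a match when the mean per-field similarity reaches the threshold.

    Assumption: a fixed threshold on the mean score replaces the learned
    decision models described in the abstract.
    """
    scores = [similarity(rec_a[f], rec_b[f]) for f in fields]
    return sum(scores) / len(scores) >= threshold


def match_quality(predicted: set, actual: set) -> dict:
    """Compute precision, recall, and F-measure over sets of matched pair ids."""
    true_pos = len(predicted & actual)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}
```

In this sketch, `predicted` would hold the record pairs the matcher declares to be the same patient and `actual` the pairs known to be true matches; the threshold and the list of compared fields are free parameters that would need tuning on real data.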

Recommended Citation

Wang, Xiaoyi 1962-, "Matching records in multiple databases using a hybridization of several technologies." (2008). Electronic Theses and Dissertations. Paper 1511.
https://doi.org/10.18297/etd/1511
