Date on Master's Thesis/Doctoral Dissertation
5-2024
Document Type
Doctoral Dissertation
Degree Name
Ph. D.
Department
Computer Engineering and Computer Science
Degree Program
Computer Science and Engineering, PhD
Committee Chair
Chang, Dar-jen
Committee Co-Chair (if applicable)
Kantardzic, Mehmed M.
Committee Member
Kantardzic, Mehmed M.
Committee Member
Imam, Ibrahim N.
Committee Member
Elmaghraby, Adel S.
Committee Member
Park, Juw Won
Author's Keywords
document inversion; inverted document; GPGPU; research computing; computing cluster
Abstract
Bioinformatics is a domain that has experienced rapid research growth in recent years, as evidenced by the increasing number of articles in biomedical databases such as PubMed, which adds over a million publications every year. However, this growth also poses a challenge for researchers who need to find relevant citations for their work. Therefore, developing efficient indexing and searching methods for text data is crucial for Bioinformatics. One key technique for information retrieval is document inversion, which involves creating an inverted index to enable efficient searching through vast collections of text or documents. This Ph.D. research aims to design the research computing environment and implement a document inversion system on the multi-core Graphics Processing Unit (GPU) as a multithreaded application using a linear-time, hash-based, single program multiple data algorithm. The GPU is a powerful tool for general-purpose computing, especially for parallel and data-intensive applications. However, the GPU architecture differs from the Central Processing Unit (CPU) architecture, which creates two main challenges for GPU computing. The first challenge is to design the thread blocks and distribute the data among them. The second challenge is to use each type of GPU memory efficiently, such as global memory, constant memory, and shared memory, to achieve high-performance solutions. The dissertation research evaluates the performance of the system with two test datasets from PubMed abstracts and e-commerce product reviews. It shows that the multithreaded application on the GPU can perform document inversion around two to three times faster than the sequential one on the CPU. The research computing environments for this work include the Computer Science and Engineering Research Network and the Genomics cluster, which is a high-performance computing cluster with CPU/GPU computing nodes, large storage devices, and virtual environment systems.
The cluster was initially designed for Bioinformatics researchers and research groups in the Department of Computer Science and Engineering. The dissertation contributes to information retrieval by proposing a novel and efficient document inversion system on the GPU for extensive document collections, and to Bioinformatics researchers by providing a flexible and efficient research computing environment design with massive computing power and sufficient storage space.
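As a rough illustration of the document inversion described in the abstract, the following is a minimal sequential sketch in Python. It is not the dissertation's GPU implementation. The function names (build_inverted_index, search), the tokenization rule, and the three sample documents are all assumptions made for this example. A Python dictionary stands in for the hash table, so building the index takes time roughly linear in the number of tokens.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it.

    `docs` is a dict of {doc_id: text}. Tokenization here is an assumed,
    simplistic rule: lowercase runs of letters and digits.
    """
    index = defaultdict(set)  # hash table: term -> set of document IDs
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample collection, for illustration only.
docs = {
    1: "GPU computing for text indexing",
    2: "Inverted index on the GPU",
    3: "CPU baseline for indexing",
}
index = build_inverted_index(docs)
```

With this sample collection, a lookup such as search(index, "gpu index") intersects the posting lists for both terms. A GPU version along the lines the abstract describes would instead partition the documents across thread blocks and place the hash table in the appropriate GPU memory types, which this sequential sketch does not attempt.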
Recommended Citation
Jung, Sungbo, "Multithreaded applications on the heterogeneous research computing environment." (2024). Electronic Theses and Dissertations. Paper 4374.
https://doi.org/10.18297/etd/4374
Included in
Bioinformatics Commons, Computer and Systems Architecture Commons, Databases and Information Systems Commons, Systems Architecture Commons