Date on Master's Thesis/Doctoral Dissertation
5-2024
Document Type
Doctoral Dissertation
Degree Name
Ph. D.
Department
Computer Engineering and Computer Science
Degree Program
Computer Science and Engineering, PhD
Committee Chair
Badia, Antonio
Committee Co-Chair
Khalefa, Mohamed
Committee Member
Khalefa, Mohamed
Committee Member
Altiparmak, Nihat
Committee Member
Frigui, Hichem
Author's Keywords
NoSQL; document stores; database schema; JSON
Abstract
Query optimization in document stores has traditionally relied on rule-based approaches, but recent research advocates a shift toward cost-based optimization. This transition is hindered because cost-based query optimization for document databases is still at an early stage of development, and existing approaches remain fragmented as a result. A key challenge is the absence of a standardized query language and semantics. The diverse, schema-less nature of JSON document collections makes this problem worse. To address these challenges, the literature has proposed dynamic schemas, used primarily at parsing time. However, these schemas lack a formal foundation that gives them semantics meaningful for query optimization. This thesis proposes a novel framework based on relational-style query plans, in which queries are represented internally as algebra expressions. Manipulating these expressions generates multiple plans, which are then evaluated for cost. To accommodate the characteristics of JSON data, the thesis introduces a Document Algebra. It also formalizes a dynamic schema concept, termed Data Pilot, inspired by XML DataGuides. An algebra over Data Pilots is presented that supports cardinality estimation without executing operations, which aids query optimization. Furthermore, the thesis proposes a strategy for deciding when query rewriting based on Document Algebra properties may be advantageous. Experimental validation demonstrates the feasibility of the proposed framework and the construction of Data Pilot structures. This research takes a step toward standardized, cost-based query optimization in document stores, paving the way for more efficient and scalable query processing.
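The Data Pilot structure and its algebra are formalized in the dissertation itself and are not reproduced here. As a rough, non-authoritative illustration of the DataGuide idea they build on, the Python sketch below summarizes a JSON collection as its set of distinct label paths, each annotated with the number of documents that contain it. The cardinality of a "path exists" filter can then be read from the summary without scanning the documents. The function names paths, build_path_summary, and estimate_exists are hypothetical, and traversing arrays transparently (so array elements share their parent's path) is an assumption of this sketch.

```python
from __future__ import annotations

from collections import Counter
from typing import Any, Iterable


def paths(value: Any, prefix: str = "") -> set[str]:
    """Collect the distinct label paths in one JSON value.

    Assumption: arrays are traversed transparently, so objects inside
    an array contribute paths under the array's own path.
    """
    found: set[str] = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            found.add(path)
            found |= paths(child, path)
    elif isinstance(value, list):
        for item in value:
            found |= paths(item, prefix)
    return found


def build_path_summary(docs: Iterable[dict]) -> Counter:
    """Map each path to the number of documents containing it at least once."""
    summary: Counter = Counter()
    for doc in docs:
        summary.update(paths(doc))
    return summary


def estimate_exists(summary: Counter, path: str) -> int:
    """Cardinality of the filter 'path exists', read from the summary alone."""
    return summary[path]  # Counter returns 0 for paths never seen


# Hypothetical example collection.
docs = [
    {"name": "a", "address": {"city": "Louisville"}},
    {"name": "b", "tags": ["x", "y"]},
    {"name": "c", "address": {"city": "Lexington", "zip": "40502"}},
]
summary = build_path_summary(docs)
print(estimate_exists(summary, "address.city"))  # 2
```

Counting documents rather than occurrences keeps existence-filter estimates exact in this sketch. Estimates for value predicates or joins would need additional statistics that this simplified summary does not keep.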
Recommended Citation
Llano-Rios, Tomas Felipe, "Using dynamic schemas for query optimization over JSON data." (2024). Electronic Theses and Dissertations. Paper 4308.
https://doi.org/10.18297/etd/4308