Date on Master's Thesis/Doctoral Dissertation

8-2021

Document Type

Master's Thesis

Degree Name

M.S.

Department

Computer Engineering and Computer Science

Degree Program

Computer Science, MS

Committee Chair

Kantardzic, Mehmed

Committee Co-Chair (if applicable)

Desoky, Ahmed

Committee Member

Desoky, Ahmed

Committee Member

Frieboes, Hermann

Author's Keywords

Nearest neighbor; network adequacy; trilateration; geospatial; database; indexing

Abstract

We present an alternative method for pre-processing and storing point data, particularly for Geospatial points, by storing multilateration distances to fixed points rather than coordinates such as Latitude and Longitude. We explore the use of this data to improve query performance for some distance related queries such as nearest neighbor and query-within-radius (i.e. “find all points in a set P within distance d of query point q”). Further, we discuss the problem of “Network Adequacy” common to medical and communications businesses, to analyze questions such as “are at least 90% of patients living within 50 miles of a covered emergency room.” This is in fact the class of question that led to the creation of our pre-processing and algorithms, and is a generalization of a class of Nearest-Neighbor problems. We hypothesize that storing the distances from fixed points (typically three, as in trilateration) as an alternative to Latitude and Longitude can be used to improve performance on distance functions when large numbers of points are involved, allowing algorithms that are efficient for Nearest Neighbor and Network Adequacy queries. This effectively creates a coordinate system where the coordinates are the trilateration distances. We explore this alternative coordinate system and the theoretical, technical, and practical implications of using it. Multilateration itself is a common technique in surveying and geo-location widely used in cartography, surveying, and orienteering, although algorithmic use of these concepts for NN-style problems are scarce. GPS uses the concept of detecting the distance of a device to multiple satellites to determine the location of the device; a concept known as true-range multilateration. However while the approach is common, the distance values from multilateration are typically immediately converted to Latitude/Longitude and then discarded. Here we attempt to use those intermediate distance values to computational benefit. Conceptually, our multilateration construction is applicable to metric spaces in any number of dimensions. Rather than requiring the complex pre-calculated tree structures (as in Ball and KD-Trees)(Liu, Moore, and Gray 2006), or high cost pre-calculated nearest-neighbor graphs (as in FAISS)(Johnson, Douze, and Jégou 2017), we rely only on sorted arrays as indexes. This approach also allows for processing computationally intensive distance queries (such as nearest-neighbor) in a way that is easily implemented with data manipulation languages such as SQL. We experiment with simple algorithms using the multilateration index to exploit these features. Weset up experiments for Nearest Neighbor and Network Adequacy on high computational cost distance functions, on various sized data sets to compare our performance to other existing algorithms. Our results include a roughly 10x performance improvement to existing query logic using SQL engines, and a 30x performance gain in Cython - compared to other NN algorithms using the popular ann-benchmark tool - when the cost of the atomic distance calculation itself is high, such as with geodesic distances on earth requiring high precision. While we focus primarily on geospatial data, potential applications to this approach extend to any distance-measured n-dimensional metric space where the distance function itself has a high computational cost.

Share

COinS