Date on Master's Thesis/Doctoral Dissertation
5-2007
Document Type
Master's Thesis
Degree Name
M. Eng.
Department
Computer Engineering and Computer Science
Committee Chair
Badia, Antonio Emilio
Subject
World Wide Web; Information organization
Abstract
This thesis proposes a new method of automatic taxonomy generation that uses the link structure of Webpages. A taxonomy is a hierarchy of concepts in which each child concept is encompassed by its parent concept. Techniques have previously been developed to extract taxonomies from a traditional text corpus, but this thesis relies exclusively on the links between documents in the corpus rather than on the text of the corpus itself. A series of algorithms was designed and implemented to realize the objectives of this thesis. These programs perform comparably to techniques that use the text of the documents, showing that the link structure of Webpages carries information useful for creating concept taxonomies.
Recommended Citation
Elliott, Joseph Paul, "Automatic pure anchor-based taxonomy generation from the world wide web." (2007). Electronic Theses and Dissertations. Paper 399.
https://doi.org/10.18297/etd/399