Date on Master's Thesis/Doctoral Dissertation

5-2007

Document Type

Master's Thesis

Degree Name

M. Eng.

Department

Computer Engineering and Computer Science

Committee Chair

Badia, Antonio Emilio

Subject

World Wide Web; Information organization

Abstract

This thesis proposes a new method of automatic taxonomy generation using the link structure of Webpages. Taxonomy is a hierarchy of concepts where each child concept is said to be encompassed by its parent concept. Techniques have previously been developed to extract taxonomies from a traditional text corpus, but this thesis relies exclusively on the links between documents in the corpus, as opposed to the text of the corpus itself. A series of algorithms were designed and implemented to realize the objectives of this thesis. These programs perform comparably to other techniques using the text in the documents and have shown that there is information available in the link structure of Webpages when creating concept taxonomies.

Share

COinS