Date on Master's Thesis/Doctoral Dissertation


Document Type

Doctoral Dissertation

Degree Name

Ph. D.


Industrial Engineering

Degree Program

Industrial Engineering, PhD

Committee Chair

Saleem, Jason

Committee Member

Gentili, Monica

Committee Member

DePuy, Gail

Committee Member

Nasraoui, Olfa

Author's Keywords

Usability evaluation; log file analysis; task-based segmentation; log file simulation; usability metrics


Usability evaluation is an essential aspect of software production. It should be carried out throughout the life cycle of a software application, from pre-production through production to post-production. However, collecting and evaluating usability data manually can be challenging, time-consuming, and expensive, particularly for certain types of products and working conditions. The challenges include recruiting participants, keeping them fully engaged and motivated during the evaluation, and accounting for environmental conditions. They also include collecting data in real-world environments, especially when users are geographically dispersed, minimizing evaluator and participant bias, and analyzing complex data sets, particularly when the volume of data is large. This research explores an alternative approach that automates the entire usability evaluation process by applying data mining and machine learning techniques to data recorded in log files. The objective of this dissertation is to extract the values of usability-related metrics from user interactions and to estimate usability measures quantitatively. Using log files for usability evaluation offers several advantages over traditional evaluation methods: data can be collected objectively, in some cases without subjective interpretation by the evaluator; a comprehensive view of user interactions can be built, making it possible to identify behavioral patterns and trends that appear in large data sets; and data come from real-world scenarios rather than simulated ones. Other advantages include lower evaluation costs, remote data collection from anywhere in the world, which helps identify location-dependent usability problems, and continuous monitoring over time, which helps identify time-dependent usability issues.
In this dissertation, Chapter II categorizes, compares, and summarizes the usability evaluation techniques that use log files as input data in both academic and industrial research. Each method is examined, and its strengths and weaknesses are highlighted to provide a systematic understanding of the advantages and limitations of each technique. Chapter III assesses the originality of the research questions and proposed solutions. Because log file analysis tools and techniques are under continuous development by industry teams, and because substantial research has already been conducted in web usage mining, this dissertation compares its solution with prior work in these two domains. The similarities and differences are evaluated to determine the uniqueness and value of the approach advocated in this research. Chapter IV explores the complexities of data collection, one of the foremost challenges in working with log files. The chapter covers both the generation of simulated data and the collection of real data. To generate synthetic log files that closely resemble real log files and reflect the same challenges, a Bayesian network model is proposed. The top-level nodes of this model are numerically measurable variables, such as the number of task actions, the number of entries, the number of words per item, the percentage of items with missing information, the percentage of legible items, and other relevant variables. By assigning values to these top-level variables, log files can be generated from measurable and understandable factors. Moreover, the values can be adjusted in each iteration to produce a new simulated log file of any desired size.
Chapter V introduces a general framework for estimating usability metrics and attributes through log file analysis. The steps of the framework are followed in order to introduce the models needed for log file analysis and knowledge extraction. In the knowledge extraction phase, a two-stage clustering approach that uses a similarity distance and Hidden Markov Models (HMMs) identifies the page view sequences related to each task. In the subsequent knowledge analysis stage, the data required for calculating usability metrics are extracted and the metrics are computed. Finally, the experimental results are reported as evidence of the model's effectiveness. Chapter V closes with a summary of the overall conclusions of this dissertation and of its unique contributions.