Date on Master's Thesis/Doctoral Dissertation
5-2025
Document Type
Doctoral Dissertation
Degree Name
Ph.D.
Department
Electrical and Computer Engineering
Degree Program
Electrical Engineering, PhD
Committee Chair
Popa, Dan
Committee Member
McIntyre, Michael
Committee Member
Naber, John
Committee Member
Roussel, Thomas
Author's Keywords
Robotics; affective computing; healthcare; deep learning; large language models
Abstract
This dissertation explores the integration of multimodal data streams and artificial intelligence pipelines to understand human affect in neurotypical individuals and children with Autism Spectrum Disorder (ASD). Human affect is captured in the context of human-robot interaction (HRI) through multiple studies involving both children with ASD and neurotypical adults. This dissertation makes four contributions: 1) The first study introduces autonomy into perspective-taking teaching sessions by generating verbal content with large language models (LLMs). This system is the first of its kind to teach perspective-taking in a semi-autonomous manner under the supervision of domain experts. The robotic intervention was evaluated by domain experts using the NASA TLX and Godspeed questionnaires, and the semi-autonomous system was perceived as safe and likable on the Godspeed questionnaire. The second part of this study examines the physiological responses of neurotypical individuals when the robot operates in different modes (different voice types and hand gestures). It concludes that different voice types and hand gestures elicit not only distinct physiological responses but also distinct perceptions of the robot. 2) The second study forecasts the Blood Volume Pulse (BVP) signal of children with ASD using a CNN+LSTM time-series model, applied to data recorded during candid conversations between six pairs of children with ASD. This contribution differs from prior approaches in the literature, where time-series forecasting has not been explicitly leveraged to anticipate challenging behaviors in children with ASD from physiological signals. Such forecasting is important for making robotic interventions personalized and adaptable to an individual's needs. 3) The third contribution highlights the importance of multimodal data for affect recognition in neurotypical individuals and individuals with ASD, demonstrated through two studies: (i) affective analysis of human-led and robot-led sessions for a social stories intervention, and (ii) multimodal sensing and machine learning to compare printed and robot-based instruction for a simulated assembly task. In both cases, multimodal data outperformed individual modalities for affect recognition. 4) The last study addresses speech emotion recognition (SER) in the context of HRI. Vision transformers have not previously been applied in the HRI-SER literature across diverse demographics. This dissertation bridges that gap by classifying speech collected from participants into four primary emotions: happy, sad, angry, and neutral. Vision-transformer-based models outperformed previous state-of-the-art models on speech from non-North-American accents even though they were initially fine-tuned on datasets of speakers with North American accents, and they also outperformed the current state of the art on the RAVDESS and TESS SER datasets.
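To illustrate the semi-autonomous design of the first contribution, the sketch below shows one way LLM-drafted verbal content could be gated by a human expert before a robot speaks. This is a minimal sketch only: the model (distilgpt2 via the Hugging Face transformers pipeline), the prompt, and the approval loop are illustrative assumptions, not the dissertation's actual system.

```python
# Minimal sketch: an LLM drafts a perspective-taking utterance and a domain
# expert approves it before the robot speaks (semi-autonomous operation).
# The model and prompt are illustrative assumptions.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")

def draft_utterance(scenario: str) -> str:
    prompt = (
        "You are a friendly robot teaching perspective-taking to a child. "
        f"Scenario: {scenario}\nRobot says:"
    )
    out = generator(prompt, max_new_tokens=60, do_sample=True)[0]["generated_text"]
    return out[len(prompt):].strip()

draft = draft_utterance("Your friend dropped their ice cream and looks sad.")
print("Proposed utterance:", draft)
# Expert-in-the-loop gate: the robot only speaks approved content.
if input("Approve for the robot to speak? [y/n] ").lower() == "y":
    print("Robot speaks:", draft)
```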
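The second contribution names a CNN+LSTM architecture for BVP forecasting; below is a minimal PyTorch sketch of that family of models. The window length, channel counts, and forecast horizon are illustrative assumptions, not the dissertation's configuration.

```python
# Minimal sketch of a CNN+LSTM forecaster for a physiological signal
# such as BVP. All hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

class CNNLSTMForecaster(nn.Module):
    def __init__(self, window=64, horizon=8, channels=32, hidden=64):
        super().__init__()
        # 1-D convolutions extract local waveform features from the window
        self.cnn = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # LSTM models temporal dependencies across the feature sequence
        self.lstm = nn.LSTM(channels, hidden, batch_first=True)
        # Linear head maps the final hidden state to the forecast horizon
        self.head = nn.Linear(hidden, horizon)

    def forward(self, x):
        feats = self.cnn(x.unsqueeze(1))   # (batch, 1, window) -> (batch, C, window)
        feats = feats.transpose(1, 2)      # (batch, window, C) for the LSTM
        out, _ = self.lstm(feats)          # (batch, window, hidden)
        return self.head(out[:, -1, :])    # (batch, horizon) future samples

model = CNNLSTMForecaster()
past = torch.randn(4, 64)        # four dummy BVP windows
print(model(past).shape)         # torch.Size([4, 8])
```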
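The third contribution's claim that multimodal data beats single modalities can be illustrated with feature-level fusion: features from two modalities are concatenated before classification. In the sketch below, the synthetic "physiological" and "facial" features, their dimensions, and the logistic-regression classifier are all assumptions for illustration, not the dissertation's pipeline.

```python
# Minimal sketch of feature-level (early) fusion on synthetic data:
# compare each modality alone against their concatenation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
labels = rng.integers(0, 2, n)                              # binary affect label
physio = rng.normal(labels[:, None] * 0.5, 1.0, (n, 8))    # assumed modality 1
facial = rng.normal(labels[:, None] * 0.5, 1.0, (n, 16))   # assumed modality 2
fused = np.hstack([physio, facial])                         # feature-level fusion

for name, X in [("physio", physio), ("facial", facial), ("fused", fused)]:
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, labels, cv=5).mean()
    print(f"{name:>6}: {acc:.2f}")   # fused typically scores highest
```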
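For the SER contribution, a common vision-transformer recipe treats a log-mel spectrogram as an image and classifies it with a pretrained ViT. The sketch below follows that general recipe using torchaudio and torchvision's vit_b_16; the preprocessing, resolution, and model choice are assumptions, not necessarily the dissertation's pipeline.

```python
# Minimal sketch: speech -> log-mel spectrogram "image" -> ViT with a
# 4-way emotion head. Hyperparameters are illustrative assumptions.
import torch
import torchaudio
import torchvision

EMOTIONS = ["happy", "sad", "angry", "neutral"]

to_mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=128)
to_db = torchaudio.transforms.AmplitudeToDB()

def speech_to_image(waveform):
    spec = to_db(to_mel(waveform))                     # (1, 128, time)
    spec = torch.nn.functional.interpolate(
        spec.unsqueeze(0), size=(224, 224), mode="bilinear"
    ).squeeze(0)                                       # (1, 224, 224)
    return spec.repeat(3, 1, 1)                        # ViT expects 3 channels

# Pretrained ViT with its head replaced for four emotion classes
model = torchvision.models.vit_b_16(weights="IMAGENET1K_V1")
model.heads = torch.nn.Linear(model.hidden_dim, len(EMOTIONS))
model.eval()

wave = torch.randn(1, 16000)                           # 1 s dummy utterance
with torch.no_grad():
    logits = model(speech_to_image(wave).unsqueeze(0)) # (1, 4)
print(EMOTIONS[logits.argmax(dim=1).item()])
```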
Recommended Citation
Mishra, Ruchik, "Multimodal emotion recognition for human-robot interaction across neuro-diverse populations." (2025). Electronic Theses and Dissertations. Paper 4567.
Retrieved from https://ir.library.louisville.edu/etd/4567