Learning from Human Conversations: A Seq2Seq based Multi-modal Robot Facial Expression Reaction Framework in HRI

Zhegong Shangguan, Xiaoxuan Hei, Fangjun Li, Chuang Yu, Siyang Song*, Jianzhuang Zhao, Angelo Cangelosi, Adriana Tapus

*Corresponding author for this work

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

Abstract


Nonverbal communication plays a crucial role in both human-human and human-robot interactions (HRIs), where facial expressions convey emotions, intentions, and trust. Enabling humanoid robots to generate human-like facial reactions in response to human speech and facial behaviours remains a significant challenge. In this work, we leverage human-human interaction (HHI) datasets to train a humanoid robot, allowing it to learn and imitate facial reactions to both speech and facial expression inputs. Specifically, we extend a sequence-to-sequence (Seq2Seq)-based framework that enables robots to simulate human-like virtual facial expressions appropriate for responding to perceived human user behaviours. We then propose a deep neural network-based motor mapping model to translate these expressions into physical robot movements. Experiments demonstrate that our facial reaction–motor mapping framework successfully enables the robot to react to various human behaviours, and that our model best predicts 50 frames (two seconds) of facial reactions in response to input user behaviour of the same duration, aligning with human cognitive and neuromuscular processes.
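
The abstract describes a two-stage pipeline: a Seq2Seq model that predicts a 50-frame facial-reaction sequence from multimodal (speech + facial) input of the same length, followed by a neural motor mapping that converts predicted expressions into robot joint commands. The sketch below is a minimal, hypothetical PyTorch illustration of such a pipeline, not the authors' implementation; the feature dimensions, number of expression parameters, number of motors, and the GRU/MLP choices are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code): a GRU-based Seq2Seq model that maps a
# 50-frame window of multimodal human behaviour features (speech + facial) to a
# 50-frame sequence of predicted facial-reaction parameters, followed by a small
# MLP that maps each predicted frame to robot motor positions. All dimensions
# are illustrative assumptions, not values reported in the paper.
import torch
import torch.nn as nn


class FacialReactionSeq2Seq(nn.Module):
    def __init__(self, audio_dim=128, face_dim=17, hidden_dim=256, out_dim=17):
        super().__init__()
        # Encoder consumes concatenated per-frame speech and facial features.
        self.encoder = nn.GRU(audio_dim + face_dim, hidden_dim, batch_first=True)
        # Decoder autoregressively generates the listener's facial reaction.
        self.decoder = nn.GRU(out_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, out_dim)

    def forward(self, audio, face, reaction_len=50):
        # audio: (B, T, audio_dim), face: (B, T, face_dim)
        x = torch.cat([audio, face], dim=-1)
        _, h = self.encoder(x)                 # context from observed user behaviour
        frame = torch.zeros(x.size(0), 1, self.proj.out_features, device=x.device)
        outputs = []
        for _ in range(reaction_len):          # generate one reaction frame at a time
            out, h = self.decoder(frame, h)
            frame = self.proj(out)
            outputs.append(frame)
        return torch.cat(outputs, dim=1)       # (B, reaction_len, out_dim)


class MotorMapping(nn.Module):
    """Maps predicted facial-expression parameters to robot motor commands."""
    def __init__(self, expr_dim=17, num_motors=12, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(expr_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_motors), nn.Tanh(),  # normalised motor range [-1, 1]
        )

    def forward(self, expr):
        return self.net(expr)


if __name__ == "__main__":
    # 50 input frames (two seconds) -> 50 predicted reaction frames -> motor commands.
    audio = torch.randn(2, 50, 128)
    face = torch.randn(2, 50, 17)
    reaction = FacialReactionSeq2Seq()(audio, face)   # (2, 50, 17)
    motors = MotorMapping()(reaction)                  # (2, 50, 12)
    print(reaction.shape, motors.shape)
```

In this sketch the two-second (50-frame) window mentioned in the abstract simply fixes the encoder input length and the decoder rollout length; the actual model architecture and feature representations used by the authors may differ.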
Original language: English
Title of host publication: 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Publisher: IEEE
Publication status: Accepted/In press - 16 Jun 2025
