TY - GEN
T1 - Learning from Human Conversations: A Seq2Seq based Multi-modal Robot Facial Expression Reaction Framework in HRI
AU - Shangguan, Zhegong
AU - Hei, Xiaoxuan
AU - Li, Fangjun
AU - Yu, Chuang
AU - Song, Siyang
AU - Zhao, Jianzhuang
AU - Cangelosi, Angelo
AU - Tapus, Adriana
PY - 2025/6/16
Y1 - 2025/6/16
AB - Nonverbal communication plays a crucial role in both human-human and human-robot interactions (HRIs), where facial expressions convey emotions, intentions, and trust. Enabling humanoid robots to generate human-like facial reactions in response to human speech and facial behaviours remains a significant challenge. In this work, we leverage human-human interaction (HHI) datasets to train a humanoid robot, allowing it to learn and imitate facial reactions to both speech and facial expression inputs. Specifically, we extend a sequence-to-sequence (Seq2Seq)-based framework that enables robots to simulate human-like virtual facial expressions appropriate for responding to perceived human user behaviours. We then propose a deep neural network-based motor mapping model to translate these expressions into physical robot movements. Experiments demonstrate that our facial reaction–motor mapping framework successfully enables robotic self-reactions to various human behaviours: our model best predicts 50 frames (two seconds) of facial reactions in response to input user behaviour of the same duration, aligning with human cognitive and neuromuscular processes.
M3 - Conference contribution
BT - 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
PB - IEEE
ER -