Abstract: Emotion Recognition in Conversations (ERC) is fundamental to creating emotionally intelligent machines. Graph-Based Network (GBN) models have gained popularity for detecting conversational context in ERC tasks. However, their limited ability to collect and acquire contextual information hinders their effectiveness. To address this, we propose a Text Augmentation-based computational model for recognizing emotions using transformers (TA-MERT). The proposed model uses the Multimodal Emotion Lines Dataset (MELD), which ensures a balanced representation for recognizing human emotions. The model uses text augmentation techniques to produce more training data, improving the proposed model's accuracy. The deep neural network (DNN) model is trained with transformer encoders, in particular Bidirectional Encoder (BE) representations that capture both forward and backward contextual information. This integration improves the accuracy and robustness of the proposed model. Furthermore, we present a method for balancing the training dataset by creating enhanced samples from the original dataset. By balancing the dataset across all emotion categories, we lessen the adverse effects of data imbalance on the accuracy of the proposed model. Experimental results on the MELD dataset show that TA-MERT outperforms earlier methods, achieving a weighted F1 score of 62.60% and an accuracy of 64.36%. Overall, the proposed TA-MERT model addresses the GBN models' weaknesses in obtaining contextual data for ERC. The TA-MERT model recognizes human emotions more accurately by employing text augmentation and transformer-based encoding. The balanced dataset and the additional training samples also enhance its resilience. These findings highlight the significance of transformer-based approaches for emotion recognition in conversations.
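As a rough illustration of the balancing-by-augmentation step described in this abstract, the sketch below oversamples minority emotion classes with perturbed copies of existing utterances before transformer encoding. The `augment_utterance` helper, the word-dropout scheme, and the toy data are illustrative assumptions, not the authors' exact procedure; only the seven-way MELD label set is taken from the dataset itself.

```python
# Minimal sketch of class balancing via text augmentation (assumed scheme, not TA-MERT's exact one).
import random
from collections import Counter

# The seven emotion labels annotated in MELD.
EMOTIONS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]

def augment_utterance(text: str, drop_prob: float = 0.1) -> str:
    """Create a perturbed copy of an utterance by randomly dropping words (illustrative augmentation)."""
    words = text.split()
    kept = [w for w in words if random.random() > drop_prob]
    return " ".join(kept) if kept else text

def balance_by_augmentation(samples):
    """Oversample minority emotion classes with augmented copies.

    `samples` is a list of (utterance, emotion_label) pairs with labels drawn from EMOTIONS.
    """
    counts = Counter(label for _, label in samples)
    target = max(counts.values())
    balanced = list(samples)
    for label in counts:
        pool = [text for text, lab in samples if lab == label]
        while counts[label] < target:
            balanced.append((augment_utterance(random.choice(pool)), label))
            counts[label] += 1
    return balanced

if __name__ == "__main__":
    toy = [("I am so happy today", "joy"), ("This is awful", "anger"), ("Great news everyone", "joy")]
    print(balance_by_augmentation(toy))
```

The balanced utterance list would then be tokenized and encoded with a bidirectional transformer encoder (a BERT-style model) before classification, as the abstract describes.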
Abstract: Artificial entities, such as virtual agents, have become more pervasive. Their long-term presence among humans requires the virtual agent's ability to express appropriate emotions to elicit the necessary empathy from users. Affective empathy involves behavioral mimicry, a synchronized co-movement between dyadic pairs. However, the characteristics of such synchrony between humans and virtual agents remain unclear in empathic interactions. Our study evaluates participants' behavioral synchronization when a virtual agent exhibits an emotional expression congruent with the emotional context through facial expressions, behavioral gestures, and voice. Participants viewed an emotion-eliciting video stimulus (negative or positive) with a virtual agent. The participants then conversed with the virtual agent about the video, for example, how they felt about its content. The virtual agent expressed either emotions congruent with the video or a neutral emotion during the dialog. The participants' facial expressions, such as facial expressive intensity and facial muscle movement, were measured during the dialog using a camera. The results showed significant behavioral synchronization (i.e., cosine similarity ≥ .05) in both the negative and positive emotion conditions, evident in the participants' facial mimicry with the virtual agent. Additionally, the participants' facial expressions, in both movement and intensity, were significantly stronger with the emotional virtual agent than with the neutral virtual agent. In particular, we found that the facial muscle intensity of AU45 (Blink) is an effective index for assessing participant synchronization that differs by the individual's empathic capability (low, mid, high). Based on these results, we suggest an appraisal criterion that provides empirical conditions for validating empathic interaction based on facial expression measures.
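A minimal sketch of the synchrony measure implied by this abstract: cosine similarity between the participant's and the agent's facial Action Unit intensity traces, flagged against the reported ≥ .05 criterion. The per-frame sampling rate, the synthetic AU45 data, and the function names are illustrative assumptions, not the study's analysis pipeline.

```python
# Minimal sketch of behavioral synchrony via cosine similarity of AU intensity time series.
import numpy as np

SYNC_THRESHOLD = 0.05  # synchronization criterion reported in the abstract (cosine similarity >= .05)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two equal-length intensity time series."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def is_synchronized(participant_au: np.ndarray, agent_au: np.ndarray) -> bool:
    """Flag behavioral synchronization for one AU trace (e.g., AU45 blink intensity)."""
    return cosine_similarity(participant_au, agent_au) >= SYNC_THRESHOLD

if __name__ == "__main__":
    # Synthetic per-frame AU45 intensities over a 10-second dialog segment at 30 fps (assumed rate).
    rng = np.random.default_rng(0)
    participant_au45 = rng.random(300)
    agent_au45 = rng.random(300)
    print(is_synchronized(participant_au45, agent_au45))
```

In practice the AU intensities would come from a facial-analysis toolkit applied to the camera recording, and the same comparison could be repeated per AU and per empathic-capability group (low, mid, high) as the abstract suggests.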