Can scientists create emotionally intelligent robots?


Scientists at the Japan Advanced Institute of Science and Technology are trying to use neuroscience to create emotionally intelligent robots

Most human beings can recognise and understand emotion in one another. Robots, so far, have been unable to develop the kind of emotional intelligence that humans wield so easily.

This is partly because emotion is extremely difficult to define, let alone to distil into something that can be taught to a machine. Now, scientists in Ishikawa, Japan, believe they have a strategy that could work: using cognitive neuroscience, they want to teach robots to gauge a human's emotional state and respond appropriately.

Robots have already been floated as therapists in other studies, a role that deals directly with human emotion. In general, though, they process logical instructions and miss the dimensional emotion carried in a human voice.

How do we define emotion for a robot?

“Continuous dimensional emotion can help a robot capture the time dynamics of a speaker’s emotional state and accordingly adjust its manner of interaction and content in real time,” said Professor Masashi Unoki from Japan Advanced Institute of Science and Technology (JAIST), who works on speech recognition and processing.

Emotions such as happiness, sadness, and anger are well-understood by us but can be hard for robots to figure out.
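
The phrase "continuous dimensional emotion" refers to describing an emotional state with continuous coordinates, commonly valence (how positive or negative) and arousal (how calm or excited), rather than a single label. The short Python sketch below is purely illustrative and is not drawn from the study; the coordinate values and the describe helper are assumptions used to show the idea.

```python
# A minimal illustration of "dimensional" emotion: instead of discrete labels,
# each moment of speech maps to continuous coordinates such as valence
# (negative-positive) and arousal (calm-excited).
# The coordinates below are rough textbook placements, not values from the study.
emotion_coordinates = {
    "happiness": {"valence": 0.8, "arousal": 0.6},
    "sadness":   {"valence": -0.7, "arousal": -0.4},
    "anger":     {"valence": -0.6, "arousal": 0.8},
}

def describe(state):
    """Turn a (valence, arousal) reading into a coarse description a robot could act on."""
    tone = "positive" if state["valence"] >= 0 else "negative"
    energy = "high-energy" if state["arousal"] >= 0 else "low-energy"
    return f"{tone}, {energy}"

for name, coords in emotion_coordinates.items():
    print(name, "->", describe(coords))
```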

Researchers want to give robots listening capacity

Essentially, the researchers want the robot to pick up on “temporal modulation cues”, patterns in how a voice changes over time that capture the dynamics of dimensional emotion. Neural networks can then be employed to extract features from these cues that reflect those time dynamics.

However, the complexity and variety of auditory perception models make this feature extraction step particularly difficult.

How did the researchers figure this out?

The researchers have proposed a novel feature called multi-resolution modulation-filtered cochleagram (MMCG), which combines four modulation-filtered cochleagrams (time-frequency representations of the input sound) at different resolutions to obtain the temporal and contextual modulation cues.
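
The study itself does not include code here, but the following Python sketch gives a rough sense of what a multi-resolution, modulation-filtered time-frequency feature might look like. The band-pass filterbank, envelope extraction, and modulation cut-off frequencies are illustrative assumptions, not the authors' actual MMCG implementation.

```python
# Hedged sketch of the MMCG idea: stack modulation-filtered cochleagram-like
# representations at several temporal resolutions. All parameters are assumptions.
import numpy as np
from scipy.signal import butter, sosfilt

def cochleagram_envelope(signal, sr, n_channels=32, fmin=100.0, fmax=4000.0):
    """Crude cochleagram: band-pass the signal into n_channels and take the
    magnitude envelope of each band (a stand-in for gammatone filtering)."""
    edges = np.geomspace(fmin, fmax, n_channels + 1)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        bands.append(np.abs(sosfilt(sos, signal)))
    return np.stack(bands)            # shape: (n_channels, n_samples)

def modulation_filtered(env, sr, cutoff_hz):
    """Low-pass the per-channel envelopes to keep temporal modulations below
    cutoff_hz, giving one 'resolution' of the multi-resolution cue."""
    sos = butter(2, cutoff_hz, btype="lowpass", fs=sr, output="sos")
    return sosfilt(sos, env, axis=-1)

def mmcg_like(signal, sr, cutoffs=(2.0, 4.0, 8.0, 16.0)):
    """Stack four modulation-filtered cochleagrams at different temporal
    resolutions, loosely mirroring the four-resolution MMCG feature."""
    env = cochleagram_envelope(signal, sr)
    return np.stack([modulation_filtered(env, sr, c) for c in cutoffs])

# Example: one second of synthetic noise standing in for speech at 16 kHz.
sr = 16000
x = np.random.randn(sr)
features = mmcg_like(x, sr)
print(features.shape)                 # (4, 32, 16000): resolutions x channels x time
```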

To account for the diversity of these cochleagrams, the researchers designed a parallel neural network architecture based on “long short-term memory” (LSTM) networks, which models the time variations of the multi-resolution signals, and they validated it in extensive experiments on two datasets of spontaneous speech.
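
As a hedged illustration rather than the authors' model, a “parallel” recurrent design can be sketched as one LSTM branch per modulation resolution, with the branch outputs fused by a regression head that predicts dimensional emotion values such as valence and arousal. Layer sizes, the fusion step, and the two-output head below are assumptions.

```python
# Hedged sketch of a parallel LSTM over MMCG-style input. PyTorch is used
# only for illustration; the paper's exact architecture may differ.
import torch
import torch.nn as nn

class ParallelLSTMEmotion(nn.Module):
    def __init__(self, n_resolutions=4, n_channels=32, hidden=64, n_outputs=2):
        super().__init__()
        # One LSTM per resolution, each reading a (time, channels) sequence.
        self.branches = nn.ModuleList(
            [nn.LSTM(n_channels, hidden, batch_first=True) for _ in range(n_resolutions)]
        )
        # Fuse the last hidden states and regress e.g. valence and arousal.
        self.head = nn.Linear(hidden * n_resolutions, n_outputs)

    def forward(self, x):
        # x: (batch, n_resolutions, time, n_channels)
        last_states = []
        for i, lstm in enumerate(self.branches):
            _, (h_n, _) = lstm(x[:, i])       # h_n: (1, batch, hidden)
            last_states.append(h_n[-1])
        return self.head(torch.cat(last_states, dim=-1))

# Example: a batch of 8 inputs with 4 resolutions, 200 frames, 32 channels.
model = ParallelLSTMEmotion()
dummy = torch.randn(8, 4, 200, 32)
print(model(dummy).shape)              # torch.Size([8, 2])
```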

Robots with emotional intelligence, soon?

Professor Unoki further commented: “Our next goal is to analyze the robustness of environmental noise sources and investigate our feature for other tasks, such as categorical emotion recognition, speech separation, and voice activity detection.”

Read the full study here.
