ICT Research on Emotionally-Aware AI Accepted to ACL 2025 (Vienna)

Published: June 5, 2025
Category: News | Essays

By Ala N. Tak, PhD Candidate, Computer Science, Viterbi School of Engineering; Research Assistant, Affective Computing Group, ICT

We are pleased to announce that our paper, "Mechanistic Interpretability of Emotion Inference in Large Language Models" (Ala N. Tak, Amin Banayeeanzade, Anahita Bolourani, Mina Kian, Robin Jia, Jonathan Gratch), has been accepted to the 63rd Annual Meeting of the Association for Computational Linguistics in Vienna, Austria (July 27 – August 1, 2025). The Association for Computational Linguistics (ACL), founded in 1962, is the international scientific and professional society for people working on computational problems involving natural language.

In our paper, we open the black box of LLMs and explore how they understand emotions. We show that emotional representations are functionally localized, and by intervening on cognitive appraisals, we can causally steer emotional outputs in theory-aligned ways. This could be a big step forward for safer, more emotionally-aware AI systems.

Opening the Black Box

Large language models have grown adept at recognizing, describing, and even emulating human emotion, yet the circuitry behind those feats has remained opaque. Our work set out to illuminate that hidden machinery. We wanted to know where, inside tens of billions of parameters, an LLM asks the age-old questions that people ask themselves: Was this pleasant? Am I to blame? Could I have predicted it? Answering those questions convincingly could turn a black box into a system we can guide, audit, and trust in sensitive domains such as mental-health support or legal drafting.

A Neuroscience‑Inspired Peek Inside

To trace how LLMs process emotion, we drew inspiration from cognitive neuroscience, particularly the idea of "functional localization," in which specific brain regions are linked to distinct cognitive functions. First, we trained lightweight "probes" and swept them across every layer of several LLM families. Whenever a probe fired with high confidence on an emotion label, we knew we were close to the model's emotional representations. Next came "causal patching": we transplanted internal activations from an example that produced, say, guilt into another that normally expressed sadness. When the second example shifted from sadness to guilt, we confirmed that the transplanted activations were directly responsible for shaping the model's emotional output. Finally, we visualized attention patterns to identify which words these components focused on; consistently, they concentrated on tokens that humans typically recognize as emotionally significant.
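For readers who want a concrete picture of the probing step, here is a minimal sketch that fits one linear probe per layer on the hidden state of the final token. It assumes a Hugging Face causal LM and a scikit-learn logistic-regression classifier; the model name, the texts and emotion labels, and the probe choice are placeholders for illustration, not the exact setup used in the paper.

```python
# A minimal sketch of layer-wise emotion probing, assuming a Hugging Face
# causal LM and a logistic-regression probe; the model name, data, and
# probe choice below are illustrative, not the paper's exact setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def last_token_states(text):
    """Return the final token's hidden state at every layer."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # out.hidden_states: tuple of (num_layers + 1) tensors, each [1, seq, hidden]
    return [h[0, -1].float().cpu().numpy() for h in out.hidden_states]

def probe_accuracy_per_layer(texts, labels):
    """Fit one linear probe per layer and see how well it separates emotions."""
    per_layer = list(zip(*[last_token_states(t) for t in texts]))
    accuracies = []
    for layer_states in per_layer:
        clf = LogisticRegression(max_iter=1000).fit(list(layer_states), labels)
        accuracies.append(clf.score(list(layer_states), labels))  # use a held-out split in practice
    return accuracies
```

In this style of analysis, the layers whose probes reach the highest held-out accuracy indicate where the emotional representation is most readable.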

What We Found

Emotional processing occurs in a surprisingly narrow band of middle layers, which is unexpected given the typically distributed nature of representations in LLMs. Within that region, we found that emotion-relevant signals consistently appeared at specific points in the forward pass, such as after multi-head attention units, indicating that certain components in the network are disproportionately involved in processing emotional content. Disable those components and emotion predictions collapse; nudge them and the expressed feeling pivots in a controlled, theory-consistent direction. Most intriguingly, their internal structure aligns with classic appraisal dimensions, such as self-agency, predictability, and goal conduciveness, suggesting that LLMs may rely on mechanisms similar to those proposed in theories of human emotional reasoning.
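To give a flavor of what "disabling" these components looks like in code, the sketch below zero-ablates the attention output of a few middle layers with PyTorch forward hooks, reusing the model loaded in the probing sketch above. The layer indices and module path are assumptions for a LLaMA-style architecture, not the specific components identified in the paper.

```python
# Illustrative ablation: zero out the attention output of a few middle layers
# and check whether the model's emotion predictions degrade. Layer indices and
# the self_attn module path are assumptions for a LLaMA-style model.
import torch

ABLATE_LAYERS = [12, 13, 14]  # hypothetical "middle band" of layers

def zero_attention_output(module, inputs, output):
    # Attention modules typically return a tuple whose first element is the
    # attention output; replace it with zeros and keep the rest unchanged.
    if isinstance(output, tuple):
        return (torch.zeros_like(output[0]),) + output[1:]
    return torch.zeros_like(output)

handles = [
    model.model.layers[i].self_attn.register_forward_hook(zero_attention_output)
    for i in ABLATE_LAYERS
]

# ... run the emotion-inference prompt here and compare predictions ...

for h in handles:
    h.remove()  # detach the hooks to restore the original model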

Steering Emotions, Safely

Because the appraisal representations behave approximately linearly, we can modify them in a controlled and predictable way. For example, increasing self-agency shifts the model’s output from sadness to guilt, while reducing perceived threat softens responses from alarm to calm reflection. These changes affect only the emotional framing, without altering the factual content or grammar. This level of control opens up practical uses—like building chatbots that respond with empathy or legal assistants that maintain a consistent, neutral tone—without needing to retrain or fine-tune the entire model.
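As a rough illustration of this kind of intervention, the sketch below adds a scaled "appraisal direction" to one layer's hidden states during generation, again reusing the model and tokenizer from the probing sketch. The layer index, the steering strength, and the direction vector itself are assumptions rather than values from the paper; a real direction would be estimated, for example, from the difference in mean activations between high- and low-self-agency prompts.

```python
# Illustrative steering: add a scaled "self-agency" direction to one layer's
# hidden states while generating. The direction here is a random placeholder;
# a real direction would be estimated from contrasting activations.
import torch

STEER_LAYER = 14   # hypothetical middle layer
ALPHA = 4.0        # steering strength; larger values shift the framing more

direction = torch.randn(model.config.hidden_size)
direction = direction / direction.norm()  # unit-norm "appraisal direction"

def add_direction(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    steered = hidden + ALPHA * direction.to(hidden)
    if isinstance(output, tuple):
        return (steered,) + output[1:]
    return steered

handle = model.model.layers[STEER_LAYER].register_forward_hook(add_direction)
prompt = tok("I missed my best friend's wedding because I overslept.",
             return_tensors="pt")
out = model.generate(**prompt, max_new_tokens=40)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()
```

Because only the internal representation is shifted, the prompt and its factual content are untouched; the change shows up in how the response frames the event emotionally.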

Looking Ahead

At ACL in Vienna, I’ll present this work as part of a broader conversation on the future of emotionally aware AI. As language models take on greater roles in domains like mental health, law, and human–AI interaction, understanding how they represent and reason about emotion is no longer optional—it’s essential. By uncovering structured, manipulable emotion representations inside the model, we take a step toward more transparent, controllable, and psychologically grounded AI systems that can operate safely and responsibly in affective, high-stakes settings.