By Ala N. Tak, PhD Candidate, Computer Science, Viterbi School of Engineering; Research Assistant, Affective Computing Group, ICT
My recent paper, “Impact of LLM Alignment on Impression Formation in Social Interactions,” has been accepted to the 2025 Conference on Language Modeling (COLM). This work, conducted with collaborators Anahita Bolourani, Daniel Shank, and Jonathan Gratch, explores how large language models form impressions of people in social scenarios—and how alignment processes, designed to improve these models, affect that capacity.
Impression formation is a core function of social cognition. In human interactions, we continually make rapid evaluations of others based not only on who they are, but also on what they do and in what context. Affect Control Theory (ACT), a longstanding model in sociology, formalizes this process by evaluating how much an interaction conforms to or deviates from culturally shared expectations. These expectations are typically framed in terms of identities, behaviors, and recipients—e.g., how society might evaluate a “professor praising a student” versus “a stranger confronting a nurse.”
ACT gives us a mathematical and empirically grounded way to measure social plausibility. It treats impression formation as a structured, context-sensitive process—one that accounts for the interplay between social roles, emotional meaning, and cultural norms.
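To make that concrete: ACT scores an event by how far the “transient” impressions the event produces drift from the culturally shared “fundamental” sentiments attached to its elements, measured along the Evaluation, Potency, and Activity (EPA) dimensions. That distance is called deflection, and low deflection means the event feels socially expected. The sketch below shows the shape of this computation; the EPA numbers and the transient vector are illustrative placeholders, not values from an ACT sentiment dictionary or the theory’s empirically estimated impression-formation equations.

```python
import numpy as np

# Illustrative (made-up) fundamental EPA sentiments for an Actor-Behavior-Object event.
# Real ACT work draws these from empirically collected sentiment dictionaries.
fundamentals = {
    "professor": np.array([2.2, 2.0, 0.5]),   # Evaluation, Potency, Activity
    "praise":    np.array([2.8, 1.8, 1.2]),
    "student":   np.array([1.5, 0.3, 1.4]),
}

def deflection(fundamental: np.ndarray, transient: np.ndarray) -> float:
    """ACT deflection: squared Euclidean distance between fundamental sentiments
    and the transient impressions an event produces. Low values mean the event
    conforms to cultural expectations; high values mean it feels surprising."""
    return float(np.sum((fundamental - transient) ** 2))

# Stack the actor, behavior, and object profiles into one 9-dimensional vector.
f = np.concatenate([fundamentals["professor"], fundamentals["praise"], fundamentals["student"]])

# In practice the transient vector comes from ACT's impression-formation equations;
# here it is a made-up stand-in so the computation runs end to end.
tau = np.array([2.0, 1.9, 0.7, 2.5, 1.6, 1.3, 1.8, 0.6, 1.3])

print(f"Deflection for 'professor praises student': {deflection(f, tau):.2f}")
```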
Our question was: do large language models, especially those tuned to follow human preferences, exhibit a similar form of structured reasoning?
To test this, we created a large benchmark dataset of nearly 26,000 synthetic social events. These events varied systematically along the emotional dimensions of valence, arousal, and dominance. We compared how several language models—including base and fine-tuned versions of LLaMA-3, DeepSeek, and GPT-4—formed impressions of these events, using ACT predictions as a reference point.
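As a rough illustration of how such a comparison can be wired up (a hypothetical sketch, not the paper’s released evaluation code), one can elicit a numeric rating on each affective dimension from a model for every Actor-Behavior-Object event and measure how far those ratings fall from the ACT reference values. The `rate_event` stub, the toy events, and the reference numbers below are placeholders.

```python
import numpy as np

def rate_event(actor: str, behavior: str, recipient: str) -> np.ndarray:
    """Placeholder for a model-specific prompting pipeline that returns a
    3-dimensional rating (valence, arousal, dominance) for the described event."""
    raise NotImplementedError("plug in your LLM prompting code here")

def mean_absolute_error(model_rating: np.ndarray, act_reference: np.ndarray) -> float:
    """Average per-dimension gap between a model's impression and the ACT prediction."""
    return float(np.mean(np.abs(model_rating - act_reference)))

# Toy benchmark rows: (actor, behavior, recipient, ACT reference rating).
# The reference values are illustrative, not entries from the actual dataset.
events = [
    ("professor", "praises",   "student", np.array([ 2.1, 0.8,  1.5])),
    ("stranger",  "confronts", "nurse",   np.array([-1.4, 1.9, -0.2])),
]

errors = []
for actor, behavior, recipient, reference in events:
    try:
        prediction = rate_event(actor, behavior, recipient)
    except NotImplementedError:
        continue  # no model wired in for this sketch
    errors.append(mean_absolute_error(prediction, reference))

if errors:
    print(f"Mean absolute error vs. ACT predictions: {np.mean(errors):.2f}")
```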
We found that most models form impressions in a markedly different way than humans do. Whereas ACT assigns substantial weight to context—such as the behavior taking place and the roles involved—most models base their impressions primarily on the actor’s identity alone. The result is a kind of rigidity, in which certain types of actors are assigned default social meanings regardless of what they do or whom they are interacting with.
Even more telling, we observed that post-training alignment procedures (such as supervised fine-tuning and reinforcement learning from human feedback) had inconsistent and sometimes surprising effects on model behavior. Some models became more extreme in their preferences; others diverged further from ACT’s predictions. In nearly every case, alignment introduced shifts in impression formation that were difficult to predict and rarely grounded in context-sensitive reasoning.
This matters. As LLMs are increasingly deployed in settings where social perception is implicit—education, healthcare, hiring, and interpersonal support—it becomes essential to understand how they are interpreting the roles and actions of the people they encounter. If they rely on overly simplified heuristics, they risk misrepresenting complex human interactions and reinforcing narrow interpretations of social meaning.
This study introduces a structured benchmark for evaluating impression formation in LLMs, and we offer it to the research community as a tool for further probing the boundaries of machine social reasoning. It also makes a broader point: without principled, theory-driven frameworks like ACT, it becomes difficult to assess when and how language models are deviating from the kind of nuanced understanding that real social interaction demands.
There is more to uncover—across other domains of identity, context, and interaction—but this is a step toward greater clarity in how these models interpret the social world. And more importantly, how we might improve that understanding in ways that reflect the complexity of human life.