Spencer Lin is an AI & XR researcher/developer at ICT, with a passion for building embodied socially interactive AI agents. He holds both Master’s and Bachelor’s degrees in Computer Science and a minor in Immersive Media from USC. In this essay Lin discusses his work in embodied companion AI agents, research at ICT, and the broader ramifications of AI agents in society.
BYLINE: Spencer Lin, AI & XR Researcher/Developer, ICT
Prelude
As technology brings us closer than ever, it can ironically leave us feeling more isolated. This narrative is not novel, but it is one to which we have slowly but surely acquiesced as we trade face-to-face time for FaceTime, and now some may even say it’s improper to call without texting first. This isn’t so much a critique of what has come to be as it is an acknowledgment.
Of course, for an issue as enormous and abstract as this, I can only humbly speak as an AI & XR researcher and from lived experience as a fellow member of society. It is undeniable that the internet, and now AI, has brought knowledge within reach and connected us like never before. Still, many would argue that the rapid atrophying of “genuine human connection” is a complete travesty. And who could blame them as we plunge into a reality of shortening attention spans, fast trends, and the latest boogeyman: “replacement AI friends”? It is clear there exists no one-size-fits-all solution such as simply pulling the plug or imposing societal-level mandates; economics and social reality compel it so. Perhaps the next step lies in AI agents that cater to the individual’s needs.
Developing embodied socially interactive agents, or what you might call companion AI or chatbots in the broadest sense, was my first interest in life. It captures the imagination: creating life from effectively nothing more than our thoughts translated to code. It’s what inspired me to pursue computer science as a study, and it’s this zealous fascination that brought me to the gates of ICT, where much cutting-edge work combining agents and XR technologies is happening. If my time conducting research at ICT and USC has taught me anything, it’s that there is no such thing as a silver bullet. But there are better approaches.
Corporate Private Eyes
Humans have somehow managed to commoditize nearly every problem on Maslow’s Hierarchy of Needs except the top layers. It comes as no surprise that self-actualization, self-esteem, and love & belonging are the very issues we are grappling with today despite the wonders of modern technology. But is it even possible to make a product that solves those issues? Will agents be that product? And more importantly, should we even attempt to solve these issues through products?
It seems obvious that agents could have a tremendous impact on issues like loneliness, therapeutics, education, and more. In an ideal world, having an intelligent and selfless companion by your side, always available, patient, confidential, and non-judgemental, would be priceless. But if the use cases are so compelling, why haven’t we seen the widespread adoption of AI agents as sci-fi prophesied? Like all big questions, there is no monolithic answer. But I remember receiving a critical piece of insight from Scott Fisher, Director of the Mobile & Environmental Media Lab, one of the earliest pioneers in XR and an important mentor to me. Scott spoke to me of his own experience using digital assistants: “It just doesn’t feel like Siri is working for me; rather, it’s working for some big company.” What always cracks me up is that he said it so nonchalantly I’m not sure whether he meant it as a veiled hint or was just plain jaded. It was probably a bit of both. But I like to think it was a nugget of wisdom delivered in his classic vexing style, so I can preserve my headcanon of him as a mystic sensei, heh.
Scott’s clue was big. If a companion agent is meant to grow with you long term, it’s imperative that we trust it. However, the majority of current agent services are subscription-based cloud services. That alone is food for thought. The essence of a companion agent is interaction. Do we really want a rent-a-buddy that can disappear the moment we stop paying the parent company, or worse, the moment the company ceases operations? How exactly are we supposed to continually interact with our agent if the service is rife with usage quotas despite an already hefty monthly price tag? And the greatest value of any companion lies in the deep understanding it forms of you over the long term, which raises the biggest question: who actually owns your data? A major revenue driver for these services is selling your data and, with it, the contents of your conversations. Naturally for many, the answer to whether we want a corporate private eye in our pocket, or even in our glasses, is no.
Estuary
I decided to put my own spin on the problem. At the time, I, alongside my ragtag team of classmates, had just pivoted from developing digital assistants for AR heads-up displays for NASA astronauts on Martian/lunar exploration missions to developing Estuary, a platform that makes it easy for anyone to build embodied companion AI agents in AR that see and interact with you in your world. Building Estuary into a privacy-first platform quickly became my top priority, out of a dogged belief in AI that people can truly call their own.
Fortunately, my team had originally engineered our system to work on other planets where, to no one’s surprise, there is no internet. This meant we already had the infrastructure to run agents entirely on your own device. While other platforms continue to treat agents as wrappers for cloud APIs and SDKs, we began treating them as bespoke digital entities that adapt themselves to the individual without centralization. By the end of two semesters, we had developed a pretty compelling proof of concept: an AR cartoon skeleton character that could navigate the physical world, semantically understand its surroundings, and, most importantly, run on your own device. We called him Marvin. Admittedly, he wasn’t the brightest thing in the world, but he could hold a conversation (when he worked) and had a quirky voice!

Marvin on the Apple Vision Pro
Software engineers jokingly liken our craft to the tale of Sisyphus, endlessly rolling his boulder up a hill only for it to roll back down. We build systems only to endlessly patch, re-engineer, and support them. This is especially true for AI agents, cranked to 11 given the number of moving parts and how fast each part becomes obsolete. Marvin brought with him much novelty and technical sophistication, but it was clear he still was not at the level we had envisioned. Among the myriad things we could do to improve him, finding the most meaningful next step was difficult when, on practically any day, another new piece of AI technology could sprout up and render all of our progress effectively obsolete.
At this point, I had run into the engineer’s equivalent of writer’s block. That is, until I met my next important mentor, Sharon Mozgai, Director of the Virtual Human Therapeutics Lab and a veteran in the field of agents. Little did I know that fateful Taco Tuesday would kick off a year and counting of close collaboration. I lugged home a stack of her textbooks that day, which got me up to speed on the past three decades of socially interactive agent research. Needless to say, I gained a wealth of crucial knowledge about how best to continue improving Estuary. From there, I made my bones in research with Sharon. I thought it was amazing enough that I had managed to lead our team in publishing my first first-author Estuary paper with her help. But sometimes, the deepest form of flattery is being offered more work.
Sharon strongly recommended that I publish a case study on Estuary for CHI 2025, the flagship conference for the broader Human-Computer Interaction field. Admittedly, I had no idea what a herculean task that would be when I accepted the challenge, but I suppose Sharon saw something more in my caffeine-fueled team, our field-of-dreams ambition, and just plain dumb determination.
Getting the case study from conception to acceptance and presentation was nothing short of sprinting through a marathon. On top of that, I had to simultaneously travel to the UK to present our previous paper at IVA 2024, which not only took time but also meant my case study participant pool (socially interactive agent researchers at ICT) was largely unavailable, since most of them were traveling too. Having started the case study with an ambitious timeline while being so new to the process, our team was truly down to the wire. We crunched our data and wrote our paper like our lives depended on it, all the way until the last minute. What’s even more amazing is that Sharon stayed up with us until the 5 am submission deadline, supporting us in any way she could. And I should mention she went to work the next day at 8 am as usual, which I have no clue how she managed. By the time I submitted, I was running on nothing more than fumes, and I’m pretty sure those fumes were pure adrenaline.
Fortunately, this story has a happy ending. We were accepted to CHI, and we open-sourced Estuary for all researchers to use. It’s especially useful for private, replicable, and/or off-cloud experiments involving agents. One of the most important pieces of wisdom I received from Sharon, and one I still think about today, is “what does IP mean these days?” With tech moving so fast, the best thing to do is simply open-source; otherwise, you risk your hard work becoming redundant or obsolete with minimal impact, truly Sisyphean. Our case study, along with the knowledge and experience we gathered across our interviews with ICT’s venerable researchers, has proven invaluable in guiding Estuary into the useful platform it is today.

Presenting our case study paper at CHI 2025
Concluding Remarks
Estuary has without a doubt been the highlight of my college career, and CHI its capstone. Though my team and I have done much to make it a powerful research tool, much more remains to be done to improve its capabilities and its accessibility to a wider audience beyond computer-savvy researchers. Soon, we shall make it so that we can all own our data, our devices, and ultimately, our AI.
We are truly living in a wild west when it comes to AI and agents. Each day is filled with exciting innovations and, with them, uncertainty. With more people beginning to use AI for companionship and even therapeutics, it is tempting to subscribe to doomerist theories of AI agents deteriorating human connection further. Of course, I am inclined to take the more optimistic view that such a future won’t happen, both out of my passion for building the technology and a genuine belief that the fundamental purpose of AI agents should not be to replace humans in any capacity.
I suppose an intriguing thought experiment to end with would be this. ChatGPT and other foundation models have already passed the Turing test in more than a few instances. Strictly online friendships and even relationships are already observed at all levels of our society. What’s to say an online friend of ours isn’t an AI? And even if we later discover that friend to be an AI, should we retroactively invalidate the joys we shared through hours of texting, calling, gaming, and more? Alas, this all leads back to the biggest question we started with: what even is “genuine human connection”? Sometimes, the shortest and laziest answer, “it depends,” is simply true. Connection understandably exists on a gradient, and it’s what we make of it as individuals. With AI agents that genuinely belong to the individual, we’re afforded an opportunity to define what connection means for ourselves.
Acknowledgements
I am indebted to my parents for living, but to my teachers for living well. In addition to Sharon and Scott, I am thankful to have had the pleasure of working with and learning from the many wonderful researchers here at ICT. Among them, I have David Nelson, Director of ICT’s Mixed Reality (MxR) Lab, to thank for his generosity and astute vision in allowing me to join MxR and ICT, as well as Allison Aptaker for her support throughout my time with MxR. I will always have fond memories of my water cooler chats with Dr. Benjamin Nye, Director of ICT’s Learning Sciences Lab, and of the many projects on which I’ve collaborated with him and the Learning Sciences Lab. Finally, I want to thank Mark Bolas, Jonathan Gratch, and Mohammad Soleymani for their excellent teaching throughout my college career.
I am grateful to have met wonderful peers to grow with at ICT. I am ceaselessly amazed by Natali Chavez’s boundless passion and energy in weaving art and technology together; I will be even more surprised if I ever meet another person as gregarious or who lives life as well as she does. Furthermore, I am glad to have had the pleasure of working with Bin Han, Brian Kwon, and Kaleen Shreshta, whose stalwart endurance and determination in bringing our class project into a full research study cannot be overstated. Last but not least, I recognize I am exceptionally lucky to have met Basem Rizk, Miru Jun, and my other Estuary and NASA SUITS team members, many of whom worked with me through too many restless nights to count and throughout nearly all of my time at USC. This includes my brother (also at ICT), Stanley Lin, who has always found a way to support my ventures in life and whose cinematography skills in producing sizzle reels of all my projects (and even the photos in this essay) probably had a nontrivial impact in getting me into ICT. And because of the sheer magnitude of his contributions and how integral he has been to building the foundations of Estuary, I must acknowledge my long-time partner, Basem Rizk, again. You’ve truly made the grade, and I look forward to what the future holds for Estuary and us.
Follow Estuary
If this essay was in any way insightful or if you hope to follow and use Estuary, please connect with us on LinkedIn and check out our website!
Publications
[1] Spencer L., et al. – “Optimizing SIA Development: A Case Study in User-Centered Design for Estuary, a Multimodal Socially Interactive Agent Framework” [CHI 2025]
[2] Justin C., Spencer L., et al. – “Can Vision Language Models Understand Mimed Actions?” [ACL 2025]
[3] Bin H., Brian K., Spencer L., Kaleen S. – “Can LLMs Generate Behaviors for Embodied Virtual Agents Based on Personality Prompting?” [IVA 2025]
[4] Zeynep Abes, Nathan Fairchild, Spencer L., et al. – “The Immersive Archive: Archival Strategies for the Sensorama & Sutherland HMD” [IEEE AIxVR 2025]
[5] Stanley Lin, Nathan Fairchild, Spencer L., et al. – “Bridging Cinema and VR: Practical Workflows for 180° Stereoscopic Filmmaking” [SIGGRAPH Asia 2025 Under Review]
[6] Spencer L., Basem R., Miru J., et al. – “Estuary: A Framework For Building Multimodal Low-Latency Real-Time Socially Interactive Agents” [IVA 2024]
[7] Kimberly P., Benjamin F., Brent L., David N., Ben N., Rhys Y., Spencer L. – “See Like a Satellite: Adapting Human Vision to Complex Sensing Technologies with Adaptive Synthetic Aperture Radar Image Recognition Training (ASIRT)” [I/ITSEC 2024]