Ensuring Safety, Ethics, and Efficacy: Best Practices for AI Conversational Agents in Healthcare

Published: July 21, 2025

By Dr. Albert “Skip” Rizzo, Director, MedVR Lab, ICT, and Sharon Mozgai, Director, Virtual Human Therapeutics Lab (VHTL), ICT

Editor’s Note: This article is based on research accepted for publication in the Journal of Medical Extended Reality (July 2025): Rizzo, A., Mozgai, S., Sigaras, A., Rubin, J.E., & Jotwani, R. (2025). Expert Consensus Best Practices for the Safe, Ethical, and Effective Design and Implementation of Artificially Intelligent Conversational Agent (i.e., Chatbot/Virtual Human) Systems in Healthcare Applications. Journal of Medical Extended Reality.


Artificial Intelligence is steadily transforming the contours of modern healthcare. From diagnostic tools powered by machine learning to wearable biosensors monitoring vital signs in real time, technology has begun to serve as an invisible hand supporting both clinicians and patients. Yet, among these advances, one form of AI is uniquely poised to reshape the human experience of care: the Artificially Intelligent Conversational Agent (AICA).

Often referred to as chatbots or virtual humans, AICAs are increasingly used to support patients, guide clinical training, and offer scalable mental health support. With the rise of large language models (LLMs) and extended reality (XR) platforms, these agents now engage in remarkably human-like interactions. But as their capabilities grow, so too does the urgency for thoughtful guardrails to govern their design and deployment. The potential for positive impact is immense—but so are the risks if these systems are implemented without ethical forethought, clinical oversight, and robust data protections.

To address this critical inflection point, our team assembled a multidisciplinary panel of experts in AI, healthcare, extended reality, and ethics to develop consensus-driven best practices for AICAs in healthcare. The resulting recommendations, recently accepted for publication in the Journal of Medical Extended Reality, reflect a shared commitment to innovation that respects patient dignity, protects vulnerable populations, and upholds the integrity of clinical practice.


From Promise to Practice: The Rise of AICAs in Healthcare

Virtual humans have been a focal point of ICT research for nearly two decades. In early iterations, these embodied agents offered scripted interactions for training scenarios. Today, AICAs can simulate empathetic dialogue, respond adaptively, and guide users through clinical content—all while operating on smartphones, kiosks, or immersive XR platforms.

These systems are not science fiction. They are already assisting in PTSD screening, delivering psychoeducation, and providing a bridge to care for patients hesitant to seek help in traditional settings. AICAs have been shown to reduce perceived stigma, foster honest disclosure, and expand access in underserved communities. But alongside these promising use cases, troubling headlines have emerged: unmonitored AI agents giving harmful advice, or adolescents forming unhealthy attachments to fictional chatbot personas.

In the absence of clear guidelines, we risk eroding public trust, compromising patient safety, and undermining the very systems we hope to strengthen.


Building Guardrails: Five Domains of Best Practice

Our consensus framework organizes best practices into five essential domains: (1) AICA Manifestations and User Engagement, (2) Privacy, Safety, and Security, (3) Optimizing User Experience, (4) Systems Improvement, and (5) Integration of External Data. Each is underpinned by ethical reasoning, technical feasibility, and clinical sensibility.

1. AICA Manifestations and User Engagement

At the heart of ethical AICA design is transparency. Users must be clearly informed when they are interacting with an AI system. In healthcare, this is not merely a design preference—it’s a foundational matter of informed consent. We recommend explicit disclosure through verbal cues, visual markers, or user onboarding prompts. An AICA must never be allowed to pass as human without the user’s knowledge.
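As a deliberately simple illustration, the sketch below shows one way an onboarding disclosure might be enforced in code. The wording, the acknowledgement step, and the function names are our assumptions for illustration, not prescribed implementation details.

```python
# Minimal sketch of an onboarding AI-disclosure gate (illustrative only).

AI_DISCLOSURE = (
    "You are talking with an AI assistant, not a human clinician. "
    "It can offer information and resources, but it does not replace professional care."
)

def start_session(user_acknowledged_disclosure: bool) -> str:
    """Show the disclosure first; deliver no clinical content until it is acknowledged."""
    if not user_acknowledged_disclosure:
        return AI_DISCLOSURE
    return "Thanks for confirming. How can I help you today?"
```

The essential property is ordering: the disclosure precedes any clinical interaction, so the agent cannot be mistaken for a human from the first turn onward.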

Further, the virtual representation of AICAs—especially those mimicking human identities—must be developed with cultural sensitivity and stakeholder input. This includes attention to diversity, avoidance of harmful stereotypes, and respect for users who may have philosophical objections to engaging with AI agents.

Accountability must also be built into the chain of design and deployment. Developers, platform providers, and healthcare institutions share responsibility for ensuring that AICAs function within clearly defined roles and limitations. Importantly, we advocate a “human-in-the-loop” approach: AI should augment—not replace—the judgment of trained professionals.

2. Privacy, Safety, and Security

Healthcare AI applications require the highest level of data protection. AICAs must comply with regional data regulations such as HIPAA (US), GDPR (EU), and equivalent standards worldwide. We recommend that all personal health information (PHI) be processed within HITRUST-certified environments, with zero data retention models used wherever feasible.
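To make these requirements auditable rather than aspirational, a deployment can carry an explicit data-handling policy. The sketch below is a minimal, hypothetical example in Python; the class and field names are ours, not those of any particular platform, and a real system would bind them to contractual and infrastructure controls.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PHIHandlingConfig:
    """Hypothetical, explicit data-handling policy for an AICA deployment."""
    hosting_environment: str             # e.g., a HITRUST-certified enclave
    applicable_regulations: tuple        # e.g., ("HIPAA", "GDPR")
    retention_days: int                  # 0 = zero data retention wherever feasible
    encrypt_in_transit: bool
    encrypt_at_rest: bool
    secondary_use_requires_opt_in: bool  # research or system-improvement use needs explicit opt-in

ZERO_RETENTION_POLICY = PHIHandlingConfig(
    hosting_environment="hitrust-certified-enclave",
    applicable_regulations=("HIPAA", "GDPR"),
    retention_days=0,
    encrypt_in_transit=True,
    encrypt_at_rest=True,
    secondary_use_requires_opt_in=True,
)
```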

Moreover, clear boundaries must be established for the use of patient data. If data is to be used for research or system improvement, users must opt in with full understanding of the scope and purpose.

Just as crucial is the need for emergency protocols. AICAs must be able to detect language indicative of psychological distress or crisis and escalate appropriately to human support. This includes the use of natural language processing to recognize red flags such as suicidal ideation, and built-in escalation pathways to connect users to care.
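The following is a minimal sketch of such an escalation layer, under simplifying assumptions: the phrase list and the escalate_to_human handoff are illustrative stand-ins, and a deployed system would rely on validated clinical risk models and human review rather than keyword matching alone.

```python
# Illustrative crisis-detection and escalation layer; not a clinical instrument.

CRISIS_PHRASES = ["want to die", "kill myself", "end my life", "hurt myself"]

def detect_crisis(user_message: str) -> bool:
    """Flag language suggestive of acute distress (keyword matching stands in for a validated model)."""
    text = user_message.lower()
    return any(phrase in text for phrase in CRISIS_PHRASES)

def escalate_to_human(session_id: str) -> None:
    """Placeholder handoff: alert an on-call clinician or crisis service."""
    print(f"[ALERT] Session {session_id}: routing to human support and crisis resources.")

def generate_agent_reply(user_message: str) -> str:
    """Stand-in for the agent's normal response pipeline."""
    return "Thanks for sharing that. Can you tell me more?"

def handle_turn(session_id: str, user_message: str) -> str:
    """Check for crisis language before any other processing on every turn."""
    if detect_crisis(user_message):
        escalate_to_human(session_id)
        return ("I'm concerned about what you've shared. "
                "I'm connecting you with a person who can help right now.")
    return generate_agent_reply(user_message)
```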

3. Optimizing User Experience

AICA design should prioritize autonomy, accessibility, and empathy. Users must be empowered to pause or exit interactions, adjust the pace or tone of dialogue, and access explanations about how AI responses are generated.

We also advocate for robust cultural competency. AICAs should respond appropriately across linguistic, educational, and cultural contexts. Emotional support functions—while not a substitute for real empathy—should be designed to offer validation, resources, and self-care tools without misleading users about the agent’s capabilities.

To be effective and trustworthy, AICAs must be rooted in evidence-based practice. Responses should be drawn from peer-reviewed sources, and systems should be able to cite the origin of clinical advice. When uncertainty exists, AICAs must acknowledge it rather than fabricate confident but inaccurate responses.
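One way to operationalize this at the application layer is to answer only when a vetted source can be retrieved and cited, and to acknowledge uncertainty otherwise. The sketch below assumes a toy knowledge base and a crude word-overlap relevance score as stand-ins for a retrieval pipeline built on peer-reviewed sources.

```python
# Illustrative evidence-grounding policy: cite a vetted source or acknowledge uncertainty.

VETTED_SOURCES = {
    "sleep hygiene basics": {
        "answer": "Keeping a consistent sleep schedule and limiting screens before bed can improve sleep quality.",
        "citation": "[placeholder: peer-reviewed source on file]",
    },
}

def relevance(query: str, topic: str) -> float:
    """Crude word-overlap score; a real system would use a proper retrieval model."""
    q, t = set(query.lower().split()), set(topic.lower().split())
    return len(q & t) / max(len(t), 1)

def answer_with_citation(query: str, threshold: float = 0.5) -> str:
    """Return a cited answer only when retrieval confidence clears the threshold."""
    topic, entry = max(VETTED_SOURCES.items(), key=lambda kv: relevance(query, kv[0]))
    if relevance(query, topic) < threshold:
        # Acknowledge uncertainty rather than fabricating a confident answer.
        return "I'm not certain about that. A clinician or a trusted resource would be a better guide here."
    return f"{entry['answer']} (Source: {entry['citation']})"
```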

4. Systems Improvement

No AICA should remain static. Developers must implement mechanisms for iterative improvement based on user feedback and independent validation studies. This includes auditing chatbot performance over time, assessing for bias, and conducting comparative studies to evaluate effectiveness against human-led interventions.

Critically, users must be included in these feedback loops—not merely as data sources, but as co-designers of better systems. Trust is not a one-time achievement; it must be maintained through continuous transparency and responsiveness.

5. Integration of External Data

Wearable biosensors and behavior tracking hold vast potential to personalize AICA interactions—detecting fatigue, anxiety, or other emotional states in real time. But with great insight comes great responsibility.

We call for explicit, informed consent prior to any use of physiological data. Users must understand what is being tracked, how often, and to what end. Further, such data must be stored securely, anonymized, and never shared beyond the scope of the user’s care without clear permission.
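A minimal sketch of consent gating for physiological data follows; the ConsentRecord fields, the pseudonymization step, and the ingest function are illustrative assumptions rather than a prescribed design.

```python
from dataclasses import dataclass
from typing import Optional
import hashlib

@dataclass
class ConsentRecord:
    """Illustrative record of which streams a user has agreed to share, and for what purpose."""
    user_id: str
    streams_allowed: set          # e.g., {"heart_rate"}
    purpose: str                  # e.g., "in-session support only"
    may_share_beyond_care: bool = False

def ingest_sensor_sample(consent: ConsentRecord, stream: str, value: float) -> Optional[dict]:
    """Accept a biosensor sample only if it falls within the user's recorded consent."""
    if stream not in consent.streams_allowed:
        return None  # drop anything the user has not explicitly agreed to share
    pseudonym = hashlib.sha256(consent.user_id.encode()).hexdigest()[:12]  # simple pseudonymization
    return {"subject": pseudonym, "stream": stream, "value": value, "purpose": consent.purpose}
```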

Biometric cues, when ethically used, can enhance AICAs’ ability to deliver timely, supportive interventions. But we must guard against intrusive or manipulative practices. The line between support and surveillance must never be blurred.


Toward a Trusted Future

This moment in healthcare innovation echoes an earlier era. In 1966, MIT’s Joseph Weizenbaum developed ELIZA, a program simulating a Rogerian therapist. To his surprise, users developed deep emotional connections to what was, in effect, a string-matching script. Weizenbaum famously warned against substituting machines for human functions that require “interpersonal respect, understanding, and love.”

Today’s AICAs are far more sophisticated—but Weizenbaum’s ethical caution endures. Our systems must be built not to impersonate humanity, but to support it. They must reduce barriers to care, not replace it. They must earn trust not through polish, but through integrity.

By adhering to these best practices, we believe AICAs can play a powerful role in the future of healthcare—expanding access, enhancing training, and supporting patients in ways that complement human expertise. But they must be designed, deployed, and monitored with the utmost care.

Let us not be dazzled by what AI can do. Let us be defined by what we choose to do with it.
