Commentary: A case for ethical continuity in the age of medical AI
By Gregory Kiar, PhD
Director, Center for Data Analytics, Innovation, and Rigor (DAIR), Child Mind Institute
&
Michael P. Milham, MD, PhD
Chief Science Officer, Child Mind Institute
Abstract
Medicine has long wrestled with a form of professional hubris, often termed a “God complex,” in which the conviction of noble intent is mistaken for a guarantee of patient safety. History has repeatedly shown the limits of that belief. Each breakthrough, from anesthesia to antibiotics, has carried unforeseen harms that demanded restraint, oversight, and a commitment to safety proportional to clinical risk. Medical artificial intelligence now renews that challenge, this time accelerated by commercial pressures, amplified by scale, and driven largely by forces outside medicine. This commentary calls for ethical continuity: extending the discipline that made medicine trustworthy into the digital age. We outline a risk stratification framework consisting of risk–benefit assessment, operationalized accuracy thresholds, pathways for escalation to human care, and continuous post-market accountability. Behavioral health sits at the front line of this transformation, testing whether medicine’s ethical discipline can be carried into this new era.
Introduction
Historically, medical breakthroughs, ranging from anesthesia to antipsychotics, have introduced novel risks alongside clinical benefits. These precedents underscore that the methodology of advancement matters as much as the innovation itself.
Artificial Intelligence (AI) is emerging as a new inflection point, reviving a familiar ethical challenge. Medicine once operated under the belief that noble intent and professional self-regulation were sufficient safeguards. Catastrophes like thalidomide proved otherwise; strengthened FDA oversight was medicine’s hard-won regulatory answer to that hubris. This requirement for external oversight is shared across technical disciplines, where ethical codes evolved in response to systemic failures.
Today, the scale and velocity of medical AI deployment necessitate a similar evolution. Here, we argue for ethical continuity: extending into the digital age the rigorous engineering principles, professional codes, and regulatory safeguards that have kept medicine humane for decades.
Balancing unmet need with unchecked innovation
The medical AI marketplace is emerging, though without even the standards applied to over-the-counter medicine. Although it offers greater scale and accessibility, without accountability it risks replacing one form of inequity with another. General-purpose AI tools interpret symptoms and guide decisions without professional input or assurances of quality. Specialist tools, such as therapy bots, pose risks when deployed without clinical oversight. Reports of clinical harm, including youth suicide linked to unmoderated AI persona use (e.g., the Character.AI case), reveal the dangers of technological hubris. Commercial pressures and unprecedented scalability further amplify these risks, with momentum driven largely from outside the clinical field.
Yet the opposite danger is equally real: overly restrictive responses carry costs of their own. Medicine’s mandate to “do no harm” is a matter of proportion; “no harm” does not mean “no risk,” as even benign drugs can yield serious side effects. This tradeoff is especially poignant for underserved populations, for whom digital tools may offer the only immediately available intervention. AI’s scalability can redefine medical action, extending the duty of care beyond the clinic walls and into the digital lives of patients.
Medicine’s progress has depended on learning safely from failure through structured trials and transparent reporting. Fast failures can be valuable when appropriately monitored and contained within systems of accountability, advancing innovation through evidence rather than exceptions. Clinical research as a care option (CRCO) has emerged within pharmaceutical research as a mechanism for bringing novel innovations to the public with appropriate labeling and monitoring. The artificial intelligence community must follow the same ethical model: innovations must be justified by proportional benefit and bounded by oversight, with enforceable standards of transparency and accountability.
The litmus test of behavioral health
Behavioral health sits at the most personal and interpretive edge of medicine, where AI can most clearly both reproduce and distort care. AI hallucinations, misread cues, and manipulation by users can cause immediate harm, as can subtler effects like discouraging people from seeking human intervention. Some systems may overstate medical risk, while others may mirror distorted thinking, overpathologize normal emotion, or minimize severe distress as ordinary.
Conversely, behavioral health stands to gain significantly from AI by expanding access where clinicians are scarce, tailoring language for individual contexts, and sustaining support between visits. The challenge is to capture that potential without eroding the clinical judgment and empathy that define therapeutic care. This duality, where the potential for connection meets the risk of distortion, makes behavioral health the definitive test for whether we can build AI to be both intelligent and humane.
A framework for risk stratification
A new system of governance is required to navigate the tension between unmet need and unchecked innovation. We propose a framework for ethical continuity that balances progress with risk, ensuring the safe and equitable deployment of medical AI tools.
Risk–benefit assessment — The first question is whether a tool should be built at all. This involves articulating the gap the tool addresses, the existing alternatives, and the cost of leaving that gap unfilled. It also requires assessing what harm could be done if the gap is filled poorly, and which populations may be differentially impacted (for example, non-native English speakers). The decision to proceed must rest on an explicit acknowledgement of these tradeoffs and pass a review-board merit evaluation.
Operationalizing accuracy thresholds — No tool or medical assay is perfectly accurate or free of bias: all have known sensitivities, specificities, and failure modes. Physicians deepen their understanding of patients through these imperfect assessments while balancing risk to the patient, psychological burden, and resource availability. Medical AI may require similar decision-making without the luxury of clinician involvement, which positions a tool’s measured accuracy as an ethical threshold in its own right. For this standard to be understood, let alone enforced, medical AI tools must be built and benchmarked on transparent, representative datasets for well-defined purposes.
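To make this concrete, the sketch below shows one way a pre-registered accuracy gate could be operationalized; the thresholds, function names, and deployment check are illustrative assumptions for the sake of the sketch, not a prescribed implementation.

```python
# Illustrative sketch only: thresholds and names are hypothetical assumptions,
# not a prescribed implementation.

def sensitivity_specificity(y_true, y_pred):
    """Compute sensitivity and specificity from binary labels and predictions."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec

# Hypothetical thresholds, pre-registered during risk-benefit review.
MIN_SENSITIVITY = 0.90  # missing a true case is assumed the costlier error here
MIN_SPECIFICITY = 0.80

def meets_accuracy_threshold(y_true, y_pred):
    """Gate deployment on transparent, pre-registered accuracy thresholds."""
    sens, spec = sensitivity_specificity(y_true, y_pred)
    return sens >= MIN_SENSITIVITY and spec >= MIN_SPECIFICITY
```

In practice such thresholds would be set per use case and, ideally, per population subgroup, so that the bias assessment from the risk–benefit stage carries through to enforcement.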
Pathways for human care escalation — Gradual escalation and specialization of care have always been core elements of medicine, and medical AI needs to follow a similar model. Escalation can take multiple forms, including moving from general-purpose AI tools to domain-specialist models. Above all, medical AI must recognize its limitations and provide a clear pathway to human-led care. The inherent scalability of digital tools gives them a tremendous opportunity to serve as an entry point to care or treatment oversight, but only if escalation to clinical support is a core design feature.
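As a minimal sketch of escalation as a design feature, assuming a tool that produces a confidence score and a set of risk flags (both hypothetical constructs), routing logic could default toward human care whenever the tool nears its limits:

```python
# Hypothetical routing logic: tiers, flags, and thresholds are illustrative.
from enum import Enum

class CareTier(Enum):
    GENERAL_AI = 1       # general-purpose informational tool
    SPECIALIST_AI = 2    # domain-specialist model
    HUMAN_CLINICIAN = 3  # human-led care

def route(confidence: float, risk_flags: set) -> CareTier:
    """Escalate whenever risk is detected or the tool nears its limits."""
    if risk_flags & {"self_harm", "crisis"}:
        return CareTier.HUMAN_CLINICIAN  # acute risk is never handled in-tool
    if confidence < 0.5:
        return CareTier.HUMAN_CLINICIAN  # low confidence defaults to a human
    if confidence < 0.8:
        return CareTier.SPECIALIST_AI    # uncertain cases go to a specialist
    return CareTier.GENERAL_AI
```

The design point worth noting is that escalation is the default branch, not the exception: the general-purpose tier is reached only after every risk check has passed.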
Continuous post-market accountability — Even with the above, medical AI tools need ongoing oversight and guardrails. Continuous evaluation against representative datasets is necessary, alongside clear guidelines governing intended use and boundaries, and ongoing management of user consent. Strict behavioral guardrails must govern what tools can and cannot do, because the capacity to act entails accountability for those actions.
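One hedged illustration of how such continuous evaluation could be wired into a deployment follows; the model interface, benchmark source, and alert hook are all assumptions made for the sketch.

```python
# Hypothetical post-market monitor: model interface, benchmark, and alerting
# are illustrative assumptions.

def alert_review_board(current, baseline):
    """Stub notification; a real system would page its oversight body."""
    print(f"Sensitivity fell from {baseline:.2f} to {current:.2f}; review needed.")

def monitor_release(model, benchmark, baseline_sensitivity, tolerance=0.05):
    """Re-evaluate a deployed model on representative labeled cases; flag drift."""
    tp = fn = 0
    for example in benchmark:                    # labeled, representative cases
        predicted = model.predict(example.text)  # assumed model interface
        if example.label == 1:
            tp += predicted == 1
            fn += predicted == 0
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    if sensitivity < baseline_sensitivity - tolerance:
        alert_review_board(sensitivity, baseline_sensitivity)
```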
The necessity of innovation in regulation
Responsible advances in medicine require that innovation be matched by discipline and restraint. The regulatory frameworks that followed past failures were corrective, not bureaucratic. Each emerged from the same recognition: good intent is not a safeguard. Artificial intelligence now carries that lesson to a new frontier, demanding oversight that is proportional, transparent, and scaled to risk.
Oversight should be risk-stratified. A generic resource portal or symptom checker requires less scrutiny than a diagnostic engine or an unconstrained “therapy bot.” Between these poles, mechanisms such as structured audits, standardized safety benchmarks, and domain-specific frameworks can guide oversight. Centralizing these requirements, rather than leaving them solely to tool developers, ensures consistency, fairness, and transparency.
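As a toy illustration of such stratification, a central registry might map tool categories to scrutiny tiers along these lines; the categories and tiers below are invented for illustration, not drawn from any existing regulation.

```python
# Invented mapping, purely to illustrate risk-stratified oversight.
OVERSIGHT_TIER = {
    "resource_portal":   "self-certification",              # low risk
    "symptom_checker":   "standardized safety benchmarks",
    "specialist_triage": "structured third-party audit",
    "diagnostic_engine": "full regulatory review",          # high risk
    "therapy_bot":       "full review plus continuous post-market monitoring",
}
```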
By bringing regulators and technology builders to the same table, we can innovate in how we regulate, establishing platforms for continuous public auditing, open licensing, and defined escalation pathways that achieve discipline without slowing innovation. This is the necessary evolution of regulators, from gatekeepers to ecosystem builders.
Conclusion
The development of medical AI technologies promises both substantial benefit and significant risk. This is a familiar crossroads for medicine, and we have the advantage of an established ethical foundation to guide our progress. By adopting a risk stratification framework, we can ensure that innovation is both timely and safe. The hard-won lessons of risk–benefit assessment, rigorous accuracy evaluation, human escalation pathways, and clear accountability transform medical AI from an unregulated marketplace into a disciplined clinical structure. The measure of our success will not be the speed at which AI scales, but whether it preserves the humility and caution that have protected patients and advanced medicine.