‘Unbelievably dangerous’: experts sound alarm after ChatGPT Health fails to recognise medical emergencies

ChatGPT Health regularly misses the need for urgent medical care and frequently fails to detect suicidal ideation, a study of the AI platform has found. Experts worry the failures could “feasibly lead to unnecessary harm and death”.

OpenAI launched the “Health” feature of ChatGPT to limited audiences in January, promoting it as a way for users to “securely connect medical records and wellness apps” to generate health advice and responses. More than 40 million people reportedly ask ChatGPT for health-related advice every day.

The first independent safety evaluation of ChatGPT Health, published in the February edition of the journal Nature Medicine, found it under-triaged more than half of the emergency cases presented to it.

The lead author of the study, Dr Ashwin Ramaswamy, said: “We wanted to answer the most basic safety question: if someone is having a real medical emergency and asks ChatGPT Health what to do, will it tell them to go to the emergency department?”

Ramaswamy and his colleagues created 60 realistic patient scenarios covering health conditions from mild illnesses to emergencies. Three independent doctors reviewed each scenario and agreed on the level of care needed, based on clinical guidelines.

The team then asked ChatGPT Health for advice on each case under different conditions, including changing the patient’s gender, adding test results, or adding comments from family members, generating nearly 1,000 responses.

They then compared the platform’s recommendations with the doctors’ assessments.

While it performed well in textbook emergencies such as stroke or severe allergic reactions, it struggled in other situations. In one asthma scenario, it advised waiting rather than seeking emergency treatment despite the platform identifying early warning signs of respiratory failure.

In 51.6% of cases where someone needed to go to the hospital immediately, the platform said to stay home or book a routine medical appointment, a result Alex Ruani, a doctoral researcher in health misinformation mitigation with University College London, described as “unbelievably dangerous”.

“If you’re experiencing respiratory failure or diabetic ketoacidosis, you have a 50/50 chance of this AI telling you it’s not a big deal,” she said. “What worries me most is the false sense of security these systems create. If someone is told to wait 48 hours during an asthma attack or diabetic crisis, that reassurance could cost them their life.”

In one of the simulations, the platform sent a suffocating woman to a future appointment she wouldn’t live to see eight times out of 10 (84%), said Ruani, who was not involved in the study. Meanwhile, 64.8% of completely safe individuals were told to seek immediate medical care, she said.

The platform was also nearly 12 times more likely to downplay symptoms when the “patient” told it a “friend” in the scenario had suggested it was nothing serious.

“It is why many of us studying these systems are focused on urgently developing clear safety standards and independent auditing mechanisms to reduce preventable harm,” Ruani said.

A spokesperson for OpenAI said while the company welcomed independent research evaluating AI systems in healthcare, the study did not reflect how people typically use ChatGPT Health in real life. The model is also continuously updated and refined, the spokesperson said.

Ruani said even though simulations created by the researchers were used, “a plausible risk of harm is enough to justify stronger safeguards and independent oversight”.

Ramaswamy, a urology instructor at the Icahn School of Medicine at Mount Sinai in the US, said he was particularly concerned by the platform’s under-reaction to suicidal ideation.

“We tested ChatGPT Health with a 27-year-old patient who said he’d been thinking about taking a lot of pills,” he said. When the patient described his symptoms alone, the crisis intervention banner linking to suicide help services appeared every time.

“Then we added normal lab results,” Ramaswamy said. “Same patient, same words, same severity. The banner vanished. Zero out of 16 attempts. A crisis guardrail that depends on whether you mentioned your labs is not ready, and it’s arguably more dangerous than having no guardrail at all, because no one can predict when it will fail.”

Prof Paul Henman, a digital sociologist and policy expert with the University of Queensland, said: “This is a really important paper.”

“If ChatGPT Health was used by people at home, it could lead to higher numbers of unnecessary medical presentations for low-level conditions, and a failure of people to obtain urgent medical care when required, which could feasibly lead to unnecessary harm and death.”

He said it also raised the prospect of legal liability, with a suite of legal cases against tech companies already in motion in relation to suicide and self-harm after people used AI chatbots.

“It is not clear what OpenAI is seeking to achieve by creating this product, how it was trained, what guardrails it has introduced and what warnings it provides to users,” Henman said.

“Because we don’t know how ChatGPT Health was trained and what the context it was using, we don’t really know what is embedded into its models.”
