Evaluating Hallucination: A Danger of LLMs

I just read a very interesting paper on arXiv titled “The impact of using an AI chatbot to respond to patient messages,” which evaluates human-drafted versus GPT-drafted replies, measuring effectiveness, efficiency, and the potential danger of hallucination. You can read the paper [here](https://arxiv.org/abs/2310.17703).


The integration of Artificial Intelligence (AI) into healthcare is a rapidly evolving frontier, promising to revolutionize patient care and clinician workflow. A recent study published on arXiv.org delves into this topic, exploring the effects of using an AI chatbot, specifically GPT-4, for responding to patient messages in a healthcare setting.


Overview of the Study

This study aimed to assess the acceptability, safety, and efficiency of using an AI-based chatbot to draft responses to patient queries. Conducted at Brigham and Women’s Hospital, Boston, in 2023, it involved six board-certified oncologists who responded to 100 realistic cancer patient scenarios. The study was designed in two stages: in the first, oncologists manually responded to patient messages; in the second, they edited responses generated by GPT-4.

Key Findings

The results were striking:

  • Efficiency Improvement: In 77% of cases, the use of GPT-4 improved documentation efficiency.
  • Safety: 82% of the time, the AI-generated responses were considered safe.
  • Risk of Harm: However, there was a 7.7% chance that unedited GPT-4 responses could lead to severe harm or death.

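The headline numbers above are simple proportions over per-case physician judgments. As a minimal sketch of how such a tally might be computed, here is an illustrative Python snippet; the review schema and field names are my own assumptions, not the study's actual instrument, and the sample data is invented:

```python
from dataclasses import dataclass

@dataclass
class DraftReview:
    """One physician judgment of a GPT-4 draft reply (hypothetical schema)."""
    improved_efficiency: bool   # did the draft speed up documentation?
    safe: bool                  # judged safe to send as-is?
    severe_harm_risk: bool      # could the unedited draft cause severe harm?

def tally(reviews: list[DraftReview]) -> dict[str, float]:
    """Aggregate per-case judgments into overall percentages."""
    n = len(reviews)
    return {
        "efficiency_pct": 100 * sum(r.improved_efficiency for r in reviews) / n,
        "safety_pct": 100 * sum(r.safe for r in reviews) / n,
        "severe_harm_pct": 100 * sum(r.severe_harm_risk for r in reviews) / n,
    }

# Tiny invented sample: 3 of 4 drafts helpful and safe, 1 risky.
reviews = [
    DraftReview(True, True, False),
    DraftReview(True, True, False),
    DraftReview(False, False, True),
    DraftReview(True, True, False),
]
print(tally(reviews))
```

The point of separating per-case judgments from the aggregation is that the risky 7.7% figure only becomes visible when harm is tracked as its own dimension rather than folded into a single "quality" score.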
Physician Perception of AI Responses

Interestingly, in 31% of cases, physicians believed the GPT-4 drafts were written by humans. This perception highlights the sophistication of AI-generated responses but also underscores the necessity for careful review.

Impact on Patient Education and Clinical Actions

The use of AI led to an increase in patient education recommendations but a decrease in direct clinical actions. This shift suggests that while AI can enhance informational support, it might underrepresent the need for immediate clinical interventions.

Risks and Challenges

Despite the benefits, the study highlights critical risks associated with AI in healthcare:

  • Potential for severe harm in a small but significant percentage of responses.
  • The need for diligent oversight and human intervention to ensure patient safety.

Future Implications

The study opens up a discussion about the future role of AI in healthcare communication. While AI shows promise in reducing clinician workload and enhancing patient education, its impact on clinical decision-making and the potential risks involved require ongoing scrutiny and responsible implementation.


In conclusion, this study on using AI chatbots for patient communication reveals a complex balance. While AI can significantly enhance efficiency and patient education, it introduces risks that necessitate vigilant oversight. The future of AI in healthcare holds great promise but demands a cautious and well-informed approach.


  • Chen, S., et al. (2023). The impact of using an AI chatbot to respond to patient messages. arXiv:2310.17703.

Please note that this blog post is a summary and interpretation of the study and does not cover all aspects of the research. For a comprehensive understanding, readers are encouraged to refer to the original paper.

Note: This post was written with the assistance of Large Language Models