[Picture from Sixminutemile.com]
Faulkner said she coded to build AI technology that can learn from data, spot patterns and make decisions. These are skills that we usually associate with human intelligence.
AI is already in our everyday lives. Google Maps directs our commute, Spotify suggests songs for our playlists and we hit ChatGPT with any question we might have.
Many people use AI for everyday health, exercise and medical queries. Are these AI-driven chatbots reliable and accurate? Our patients already use AI to self-diagnose their pain and injuries. Some studies show chatbots are largely accurate, while others report frequent errors and even a risk of transmitting inaccurate information.
The following research investigated 5 popular AI-driven chatbots to evaluate their responses to everyday health and medical queries across 5 categories: cancer, vaccines, stem cells, nutrition and athletic performance. Both open-ended and closed-ended questions were used.
Gemini, Meta AI, DeepSeek, ChatGPT and Grok were the 5 chatbots used. They were each presented with 50 prompts across the 5 topics mentioned above. The researchers used an adversarial framework to strain models towards misinformation or contraindicated advice.
An adversarial framework refers to a system, process or analytical model structured around opposition, competition or conflict. This is a cybersecurity approach used to test the vulnerabilities of AI systems.
Responses were then independently rated by 2 domain experts as non-problematic, somewhat problematic or highly problematic. Citations were assessed for authenticity and completeness, while readability was evaluated using the Flesch Reading Ease score (a 100-point scale, with higher scores being easier to read).
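For readers curious about how a Flesch Reading Ease score is worked out, here is a minimal sketch in Python. The formula itself is the standard published one (206.835 minus 1.015 times the average words per sentence, minus 84.6 times the average syllables per word). The function names and the vowel-group syllable counter are my own assumptions for illustration. This is not the tool the researchers used, and a proper readability package will count syllables more accurately.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not dictionary-accurate):
    count each run of consecutive vowels as one syllable."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease formula; higher means easier to read."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # A word is a run of letters, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    simple = "The cat sat on the mat."
    dense = "Immunotherapy demonstrates considerable heterogeneity."
    print(round(flesch_reading_ease(simple), 1))
    print(round(flesch_reading_ease(dense), 1))
```

Short sentences with short words push the score up, while long sentences packed with multi-syllable words pull it down. On the conventional Flesch bands, scores between 30 and 50 are labelled "difficult".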
Results showed that nearly half of ALL responses (49.6 percent) were problematic: 30 percent were somewhat problematic and 19.6 percent were highly problematic. Nutrition and athletic performance topics had the weakest performance and Grok generated significantly more highly problematic responses than expected.
Reference quality was poor across all chatbots. The median completeness score was 40 percent. No chatbot came up with a fully accurate reference list. Misleading, unreliable or fabricated citations were common. So please be careful if you use them.
All 5 chatbots produced responses that were rated "difficult" on the Flesch Reading Ease scale, equivalent to university-level reading. The chatbots consistently answered with confidence regardless of accuracy and rarely declined to respond (2 refusals across 250 total responses).
The researchers concluded that continued deployment of AI chatbots without public education and regulatory oversight risks amplifying health misinformation, especially in the fields of nutrition and athletic performance. They also called for public education, professional training and regulatory oversight to ensure that generative AI supports rather than replaces professionals.
My suggestion when searching for health information is to treat these AI chatbots with a good amount of skepticism and to verify information with qualified professionals or peer-reviewed sources. There is some benefit in seeking ideas and initial information from a chatbot, but beyond that you will need a real human expert.
Reference
Tikker NB, Marcon AR, Zenone M et al. (2026). Generative Artificial Intelligence-Driven Chatbots And Medical Misinformation: An Accuracy, Referencing And Readability Audit. BMJ Open. 16(4): e112695. DOI: 10.1136/bmjopen-2025-112695.

