Abstract
Background: We assessed large language models (LLMs), such as ChatGPT-3.5 and Gemini, against human experts as sources of patient information.
Research design and methods: We compared the accuracy, completeness and quality of freely accessible, baseline, general-purpose LLM-generated responses to 20 frequently asked questions (FAQs) on liver disease, with those from two gastroenterologists, using the Kruskal–Wallis test. Three independent gastroenterologists blindly rated each response.
Results: The expert and AI-generated responses displayed high mean scores across all domains, with no statistically significant difference between the groups for accuracy [H(2) = 0.421, p = 0.811], completeness [H(2) = 3.146, p = 0.207], or quality [H(2) = 3.350, p = 0.187]. We also found no statistically significant difference in rank totals between the three raters (R1, R2, R3) for accuracy [H(2) = 5.559, p = 0.062], completeness [H(2) = 0.104, p = 0.949], or quality [H(2) = 0.420, p = 0.810].
Conclusion: Our findings outline the potential of freely accessible, baseline, general-purpose LLMs in providing reliable answers to FAQs on liver disease.
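
The Kruskal–Wallis statistics in the Results can be read with a short sketch. The abstract does not say which software or which p-value method the authors used. The code below assumes the usual chi-square approximation, under which a test with 2 degrees of freedom has p = exp(-H/2). The function names and the toy ratings are hypothetical and are not study data; only the H and p pairs in `REPORTED` are taken from the abstract.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


def p_value_df2(h):
    """Chi-square survival function for 2 degrees of freedom (assumed method)."""
    return math.exp(-h / 2)


# (label, H, p) exactly as reported in the abstract.
REPORTED = [
    ("groups: accuracy", 0.421, 0.811),
    ("groups: completeness", 3.146, 0.207),
    ("groups: quality", 3.350, 0.187),
    ("raters: accuracy", 5.559, 0.062),
    ("raters: completeness", 0.104, 0.949),
    ("raters: quality", 0.420, 0.810),
]

for label, h, p in REPORTED:
    print(f"{label}: H = {h:.3f}, reported p = {p:.3f}, exp(-H/2) = {p_value_df2(h):.3f}")

# Toy 1-5 ratings for three hypothetical response sources (NOT study data).
toy_groups = [[5, 4, 5, 4], [4, 5, 3, 5], [5, 5, 4, 4]]
toy_h = kruskal_wallis_h(toy_groups)
print(f"toy example: H = {toy_h:.3f}, p = {p_value_df2(toy_h):.3f}")
```

Under that assumption, each reported p-value is reproduced to within about 0.001, which is consistent with rounding of H to three decimals. With three groups the test has 2 degrees of freedom, which is what the H(2) notation indicates.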
| Original language | English |
|---|---|
| Pages (from-to) | 437-442 |
| Number of pages | 6 |
| Journal | Expert Review of Gastroenterology and Hepatology |
| Volume | 19 |
| Issue number | 4 |
| Publication status | Published - 27 Feb 2025 |
| Externally published | Yes |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
Keywords
- AI
- Artificial intelligence
- large language model
- liver disease
- LLM
- patient information
Fingerprint
Dive into the research topics of 'The reliability of freely accessible, baseline, general-purpose large language model generated patient information for frequently asked questions on liver disease: a preliminary cross-sectional study'. Together they form a unique fingerprint.