WORLD JOURNAL OF INNOVATION AND MODERN TECHNOLOGY (WJIMT)
E-ISSN 2504-4766
P-ISSN 2682-5910
VOL. 6 NO. 1 2022
Vikram Pasupuleti
The integration of artificial intelligence (AI) in healthcare offers promising avenues for enhancing diagnostic accuracy and patient engagement. However, AI models must be evaluated for their effectiveness in healthcare-specific tasks. This study assesses the performance of three AI models (BERT, GPT-3, and ClinicalBERT) in simulated healthcare environments, focusing on their conversational AI capabilities and medical knowledge extraction. We conducted a comparative evaluation using simulated patient interactions. ClinicalBERT was trained on clinical data from the MIMIC-III database, whereas BERT and GPT-3 relied on general-purpose language training. Five medical experts assessed model performance on four metrics: Accuracy, Relevance, Coherence, and Medical Appropriateness. ClinicalBERT, with its specialized training, significantly outperformed BERT and GPT-3 across multiple measures, including accuracy and F1 score. For example, ClinicalBERT achieved an F1 score of 0.82, indicating a superior ability to interpret complex medical dialogues and extract relevant information. The study concludes by emphasizing the necessity of tailored training for AI models in healthcare. ClinicalBERT's performance suggests that domain-specific AI can enhance clinical outcomes. A hybrid approach combining generalist and specialist AI capabilities may further optimize healthcare communication. Future research should explore these models further and address the ethical and practical considerations of clinical AI deployment.
Conversational AI; Clinical language models; Healthcare communication
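The abstract reports an F1 score and expert ratings on four metrics without showing how such figures are computed. The following is a minimal sketch of one plausible way to do so, not the study's own code. The function names, the 1-to-5 rating scale, and all counts and ratings are assumptions made for illustration only; none of the numbers come from the study.

```python
from statistics import mean

# The four metric names are taken from the abstract.
METRICS = ("Accuracy", "Relevance", "Coherence", "Medical Appropriateness")


def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when either is undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def aggregate_ratings(ratings: list[dict[str, float]]) -> dict[str, float]:
    """Average each metric over all raters (one dict per expert)."""
    return {metric: mean(r[metric] for r in ratings) for metric in METRICS}


# Hypothetical extraction counts (NOT the study's data).
example_f1 = f1_score(true_positives=8, false_positives=2, false_negatives=2)

# Hypothetical ratings from five experts on an assumed 1-5 scale.
example_ratings = [
    {"Accuracy": 4, "Relevance": 5, "Coherence": 4, "Medical Appropriateness": 4},
    {"Accuracy": 5, "Relevance": 4, "Coherence": 4, "Medical Appropriateness": 3},
    {"Accuracy": 4, "Relevance": 4, "Coherence": 5, "Medical Appropriateness": 4},
    {"Accuracy": 3, "Relevance": 4, "Coherence": 4, "Medical Appropriateness": 4},
    {"Accuracy": 4, "Relevance": 3, "Coherence": 3, "Medical Appropriateness": 5},
]
example_summary = aggregate_ratings(example_ratings)

if __name__ == "__main__":
    print(f"F1: {example_f1:.2f}")
    for metric, score in example_summary.items():
        print(f"{metric}: {score}")
```

Averaging across raters is only one possible aggregation; a full evaluation would typically also report inter-rater agreement, which this sketch omits.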