International Peer-Reviewed Journal

Advancing Healthcare through Language Models for Enhanced Conversational AI and Knowledge Extraction

Vikram Pasupuleti

Abstract

The integration of artificial intelligence (AI) in healthcare offers promising avenues for enhancing diagnostic accuracy and patient engagement. However, AI models must be evaluated for their effectiveness in healthcare-specific tasks. This study assesses the performance of three AI models (BERT, GPT-3, and ClinicalBERT) in simulated healthcare environments, focusing on their conversational AI capabilities and medical knowledge extraction. We conducted a comparative evaluation using simulated patient interactions. ClinicalBERT was trained on clinical data from the MIMIC-III database, while BERT and GPT-3 relied on general-purpose language processing. Five medical experts assessed model performance across four metrics: Accuracy, Relevance, Coherence, and Medical Appropriateness. ClinicalBERT, with its specialized training, significantly outperformed BERT and GPT-3 across multiple metrics, including accuracy and F1 score. For example, ClinicalBERT achieved an F1 score of 0.82, indicating a superior ability to interpret complex medical dialogues and extract relevant information. The study concludes by emphasizing the necessity of tailored training for AI models in healthcare. ClinicalBERT’s performance suggests that domain-specific AI can enhance clinical outcomes. A hybrid approach combining generalist and specialist AI capabilities may further optimize healthcare communication. Future research should explore such hybrid models and address ethical and practical considerations in clinical AI deployment.
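
For readers less familiar with the F1 score reported above, the following is a minimal sketch of how the metric is conventionally computed from true positives, false positives, and false negatives. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the study, and the abstract does not state how the authors computed their scores.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the harmonic mean of precision and recall from raw counts."""
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)  # share of extracted items that are correct
    recall = tp / (tp + fn)     # share of correct items that were extracted
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not data from the study).
example = f1_score(tp=8, fp=2, fn=2)
print(round(example, 2))
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both high, which is why it is often preferred to plain accuracy for extraction tasks.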

Keywords

Conversational AI; Clinical language models; Healthcare communication
