Answers generated by artificial intelligence can pass the examinations needed to be granted a medical licence in the US, a new study has claimed.
Researchers said OpenAI’s software ChatGPT scored at or around the 60 per cent threshold in the series of three tests that make up the US Medical Licensing Exam (USMLE), with “coherent” responses that “contained frequent insights”.
Achieving a pass in the “notoriously difficult” assessments, usually taken by medical students after at least two years of study, was seen as a “milestone” for the development of AI tools that could have wide-reaching implications for medical education, according to the study’s authors.
But other academics questioned the validity of the findings, published in an open access journal, and called the study a publicity stunt for the healthcare company that backed the researchers involved.
Author Tiffany Kung (a clinical fellow in anaesthesia at Massachusetts General Hospital, part of Harvard Medical School) and colleagues used 350 questions from the June 2022 USMLE, incorporating most medical disciplines from biochemistry to diagnostic reasoning.
Their paper found that, after indeterminate responses were removed, ChatGPT scored between 52.4 per cent and 75 per cent across the exams, which usually have a pass threshold of around 60 per cent.
They add that ChatGPT also demonstrated 94.6 per cent concordance across all its responses and produced at least one significant insight, defined as “something that was new, non-obvious and clinically valid”, for 88.9 per cent of its responses.
These were higher scores than those achieved by another AI chatbot, PubMedGPT, which had been trained exclusively on biomedical domain literature. It scored 50.8 per cent on an older dataset of USMLE-style questions.
The authors note that the sample size of questions used was relatively small but feel their study provides “a glimpse of ChatGPT’s potential to enhance medical education, and eventually, clinical practice”.
A preprint of the article circulated on social media had listed ChatGPT as an author because the researchers had asked it to “synthesise, simplify and offer counterpoints to drafts in progress”. The chatbot’s citation was removed ahead of final publication, but Dr Kung stressed that it had “contributed substantially to the writing of [our] manuscript”.
Reacting to the study, Peter Bannister, executive chair of the Institution of Engineering and Technology, said ChatGPT “continues to demonstrate an impressive ability to generate logical content in numerous settings” and the results “serve to highlight the limitations of written tests as the only way of assessing performance in complex and multidisciplinary professions such as medicine”.
“While the results may be of great interest, the study has important limitations that call for caution,” warned Lucía Ortiz de Zárate Alcarazo, a pre-doctoral researcher in the ethics and governance of artificial intelligence at the Autonomous University of Madrid.
“We will have to wait and see what results are obtained when ChatGPT is applied to a larger number of questions and, in turn, is trained with a larger volume of data and more specialised content,” she said.
Ms Ortiz de Zárate Alcarazo added that the results had only been evaluated by two doctors and further studies would need to employ a larger number of qualified evaluators to be able to endorse the findings.
Collin Bjork, senior lecturer in science communication at Massey University, said the claim that ChatGPT could pass the exams was “overblown and should come with a lengthy series of asterisks”.
He noted that all but one of the authors work for Ansible Health, a Silicon Valley-based healthcare start-up that would soon be likely to need more investment capital. “The media splash from this well-timed journal article will certainly help fund their next round of growth,” Dr Bjork said.
He added that claims about the insight shown by the chatbot were “misleading” because of the “vague” definition the researchers used for what constituted an insight. Claims that AI would one day be able to teach medicine were “naive”, Dr Bjork said. “How can an unaware learner distinguish between true and false insights, especially when ChatGPT only offers ‘accurate’ answers on the USMLE a little more than half the time?”