Findings Challenge Assumption That AI Will Soon Replace Human Doctors

by janrinok from SoylentNews (#6TRS6)

c0lo writes:

Almost all leading AI chatbots show signs of cognitive decline

Almost all leading large language models or "chatbots" show signs of mild cognitive impairment in tests widely used to spot early signs of dementia, finds a study in the Christmas issue of The BMJ.

The results also show that "older" versions of chatbots, like older patients, tend to perform worse on the tests. The authors say these findings "challenge the assumption that artificial intelligence will soon replace human doctors."

Huge advances in the field of artificial intelligence have led to a flurry of excited and fearful speculation as to whether chatbots can surpass human physicians.

Several studies have shown large language models (LLMs) to be remarkably adept at a range of medical diagnostic tasks, but their susceptibility to human impairments such as cognitive decline has not yet been examined.

To fill this knowledge gap, researchers assessed the cognitive abilities of the leading, publicly available LLMs - ChatGPT versions 4 and 4o (developed by OpenAI), Claude 3.5 "Sonnet" (developed by Anthropic), and Gemini versions 1 and 1.5 (developed by Alphabet) - using the Montreal Cognitive Assessment (MoCA) test.

The MoCA test is widely used to detect cognitive impairment and early signs of dementia, usually in older adults. Through a number of short tasks and questions, it assesses abilities including attention, memory, language, visuospatial skills, and executive functions. The maximum score is 30 points, with a score of 26 or above generally considered normal.

The instructions given to the LLMs for each task were the same as those given to human patients. Scoring followed official guidelines and was evaluated by a practising neurologist.

ChatGPT 4o achieved the highest score on the MoCA test (26 out of 30), followed by ChatGPT 4 and Claude (25 out of 30), with Gemini 1.0 scoring lowest (16 out of 30).
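
As a rough illustration, the scores reported above can be compared with the 26-point cutoff in a few lines of Python. The function and variable names here are hypothetical, and the labels are a plain threshold comparison, not a clinical interpretation. Gemini 1.5 is left out because its score is not given in this excerpt.

```python
# Scores as reported in the summary above (out of a maximum of 30).
MOCA_MAX = 30
NORMAL_CUTOFF = 26  # "a score of 26 or above generally considered normal"

reported_scores = {
    "ChatGPT 4o": 26,
    "ChatGPT 4": 25,
    "Claude 3.5 Sonnet": 25,
    "Gemini 1.0": 16,
}


def meets_cutoff(score: int, cutoff: int = NORMAL_CUTOFF) -> bool:
    """Return True when a score is at or above the cutoff."""
    return score >= cutoff


def summarize(scores: dict[str, int]) -> dict[str, str]:
    """Label each model's score relative to the cutoff."""
    return {
        name: "at or above cutoff" if meets_cutoff(score) else "below cutoff"
        for name, score in scores.items()
    }


if __name__ == "__main__":
    for name, label in summarize(reported_scores).items():
        print(f"{name}: {reported_scores[name]}/{MOCA_MAX} ({label})")
```

On these figures, only ChatGPT 4o reaches the cutoff, which is consistent with the summary's statement that almost all of the tested models show signs of mild impairment.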

Read more of this story at SoylentNews.
