Similarities Between How Humans and LLMs Represent Language

by hubie from SoylentNews on (#712CS)

An Anonymous Coward writes:

In a study published last month (https://www.nature.com/articles/s41562-025-02297-0), researchers analyzed the internal sentence representations of both humans and LLMs. It turns out that humans and LLMs use similar tree structures. From their conclusions: "The results also add to the literature showing that the human brain and LLM, albeit fundamentally different in terms of the implementation, can have aligned internal representations of language."

Originally seen on TechXplore (https://techxplore.com/news/2025-10-humans-llms-sentences-similarly.html):

A growing number of behavioral science and psychology studies have thus started comparing the performance of humans to that of LLMs on specific tasks, in the hope of shedding new light on the cognitive processes involved in the encoding and decoding of language. As humans and LLMs are inherently different, however, designing tasks that realistically probe how both represent language can be challenging.

Researchers at Zhejiang University have recently designed a new task for studying sentence representation and tested both LLMs and humans on it. Their results, published in Nature Human Behaviour, show that when asked to shorten a sentence, humans and LLMs tend to delete the same words, hinting at commonalities in their representation of sentences.

"Understanding how sentences are represented in the human brain, as well as in large language models (LLMs), poses a substantial challenge for cognitive science," wrote Wei Liu, Ming Xiang, and Nai Ding in their paper. "We develop a one-shot learning task to investigate whether humans and LLMs encode tree-structured constituents within sentences."

[...] Interestingly, the researchers' findings suggest that the internal sentence representations of LLMs are aligned with linguistic theory. In the task they designed, both humans and ChatGPT tended to delete full constituents (i.e., coherent grammatical units) as opposed to random word sequences. Moreover, the word strings they deleted appeared to vary based on the language they were completing the task in (i.e., Chinese or English), following language-specific rules.
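
To make the notion of "deleting a full constituent" concrete, the following sketch checks whether a deleted word string corresponds to a subtree of a constituency parse. The sentence, its parse, and the candidate deletions are invented examples built with NLTK; this is not the authors' analysis code.

```python
# Toy check of whether a deleted word string is a full constituent of a parse.
from nltk import Tree

# Hypothetical constituency parse of "the old man saw the dog in the park"
parse = Tree.fromstring(
    "(S (NP (DT the) (JJ old) (NN man)) "
    "(VP (VBD saw) (NP (DT the) (NN dog)) "
    "(PP (IN in) (NP (DT the) (NN park)))))"
)


def constituent_spans(tree):
    """Return the set of (start, end) word-index spans covered by some subtree."""
    spans = set()

    def walk(t, start):
        if isinstance(t, str):          # leaf: a single word
            return start + 1
        pos = start
        for child in t:
            pos = walk(child, pos)
        spans.add((start, pos))
        return pos

    walk(tree, 0)
    return spans


def is_constituent(tree, start, end):
    """True if words[start:end] form a full constituent in the parse."""
    return (start, end) in constituent_spans(tree)


words = parse.leaves()
# Deleting the whole PP ("in the park") removes a constituent;
# deleting an arbitrary word string ("the dog in") does not.
print(words[6:9], is_constituent(parse, 6, 9))   # ['in', 'the', 'park'] True
print(words[4:7], is_constituent(parse, 4, 7))   # ['the', 'dog', 'in'] False
```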

"The results cannot be explained by models that rely only on word properties and word positions," wrote the authors. "Crucially, based on word strings deleted by either humans or LLMs, the underlying constituency tree structure can be successfully reconstructed."

Overall, the team's results suggest that when processing language, both humans and LLMs are guided by latent syntactic representations, specifically tree-structured sentence representations. Future studies could build on this recent work to further investigate the language representation patterns of LLMs and humans, either using adapted versions of the team's word deletion task or entirely new paradigms.

Journal Reference: Liu, W., Xiang, M. & Ding, N. Active use of latent tree-structured sentence representation in humans and large language models. Nat Hum Behav (2025). https://doi.org/10.1038/s41562-025-02297-0

