How Much Does Chatbot Bias Influence Users? A Lot, It Turns Out
hubie writes:
Customers are 32 percentage points more likely to say they would buy a product after reading a chatbot-generated review summary than after reading the original review written by a human. That's because large language models introduce bias into their summaries, in this case a positive framing, and that bias in turn affects users' behavior.
These are the findings of the first study to show evidence that cognitive biases introduced by large language models, or LLMs, have real consequences for users' decision-making, said computer scientists at the University of California San Diego. To the researchers' knowledge, it's also the first study to quantitatively measure that impact.
Researchers found that LLM-generated summaries changed the sentiment of the reviews they summarized in 26.5% of cases. They also found that the LLMs hallucinated 60% of the time when answering user questions whose answers were not covered by the models' training data. The hallucinations occurred when the LLMs answered questions about news items, real or fake, that could easily be fact-checked. "This consistently low accuracy highlights a critical limitation: the persistent inability to reliably differentiate fact from fabrication," the researchers write.
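The study's own methodology isn't detailed here, but the kind of sentiment-shift measurement it describes can be illustrated with a minimal sketch: score the overall sentiment of an original review and of its summary, and flag the pair when the framing flips. The word lexicon, scoring rule, and example texts below are illustrative assumptions, not the researchers' actual method.

```python
import re

# Tiny illustrative lexicons (assumption, not the study's word lists).
POSITIVE = {"great", "excellent", "love", "reliable", "comfortable"}
NEGATIVE = {"broke", "poor", "terrible", "flimsy", "disappointed"}

def sentiment(text: str) -> int:
    """Return +1 (positive), -1 (negative), or 0 (neutral) from a crude word count."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)

def framing_changed(original: str, summary: str) -> bool:
    """Flag a review whose summary carries a different overall sentiment."""
    return sentiment(original) != sentiment(summary)

review = "The headset broke after a week and the sound was poor."
summary = "Users found the headset comfortable, though some noted issues."
print(framing_changed(review, summary))  # True: negative review, positive-framed summary
```

Running such a check over every (review, summary) pair and counting the flagged pairs would yield a sentiment-change rate analogous to the study's 26.5% figure, though the researchers' actual sentiment analysis was presumably far more sophisticated than a word-count lexicon.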
How does bias creep into LLM output? The models tend to rely on the beginning of the text they summarize, leaving out nuances that appear further down. LLMs also become less reliable when confronted with data outside their training set.
To test how the LLMs' biases influenced user decisions, the researchers chose examples with extreme framing changes (e.g., negative to positive) and recruited 70 people to read either the original reviews or LLM-generated summaries of various products, such as headsets, headlamps and radios. Participants who read the LLM summaries said they would buy the products in 84% of cases, compared to 52% of participants who read the original reviews.
"We did not expect how big the impact of the summaries would be," said Abeer Alessa, the paper's first author, who completed the work while a master's student in computer science at UC San Diego. "Our tests were set in a low-stakes scenario. But in a high-stakes setting, the impact could be much more extreme."
Read more of this story at SoylentNews.