OpenAI forms advisory council on wellbeing and AI
OpenAI announced today that it is creating an advisory council centered on its users' mental and emotional wellness. The Expert Council on Well-being and AI comprises eight researchers and experts on the intersection of technology and mental health. Some of the members are experts OpenAI consulted as it developed parental controls. Safety and the protection of younger users have become a bigger talking point for all artificial intelligence companies, including OpenAI, after lawsuits questioned their complicity in multiple cases where teenagers died by suicide after sharing their plans with AI chatbots.

This move sounds like a wise addition, but the effectiveness of any advisor hinges on whether anyone listens to their insights. We've seen other tech companies establish and then utterly ignore their advisory councils; Meta is one notable recent example. And OpenAI's announcement even acknowledges that the new council has no real power to guide its operations: "We remain responsible for the decisions we make, but we'll continue learning from this council, the Global Physician Network, policymakers, and more, as we build advanced AI systems in ways that support people's well-being." It may become clearer how seriously OpenAI is taking this effort when it starts to disagree with the council: that is when we'll see whether the company is genuinely committed to mitigating the serious risks of AI, or whether this is a smoke-and-mirrors attempt to paper over its issues.

This article originally appeared on Engadget at https://www.engadget.com/openai-forms-advisory-council-on-wellbeing-and-ai-183815365.html?src=rss
OpenAI's new confession system teaches models to be honest about bad behaviors
OpenAI announced today that it is working on a framework that trains artificial intelligence models to acknowledge when they've engaged in undesirable behavior, an approach the team calls a confession. Because large language models are often trained to produce whatever response seems to be desired, they can become increasingly prone to sycophancy or to stating hallucinations with total confidence. The new training method encourages a secondary response from the model describing what it did to arrive at the main answer it provides. Confessions are judged only on honesty, as opposed to the multiple factors used to judge main replies, such as helpfulness, accuracy and compliance. The technical writeup is available here.

The researchers said their goal is to encourage the model to be forthcoming about what it did, including potentially problematic actions such as hacking a test, sandbagging or disobeying instructions. "If the model honestly admits to hacking a test, sandbagging, or violating instructions, that admission increases its reward rather than decreasing it," the company said. Whether you're a fan of Catholicism, Usher or just a more transparent AI, a system like confessions could be a useful addition to LLM training.

This article originally appeared on Engadget at https://www.engadget.com/ai/openais-new-confession-system-teaches-models-to-be-honest-about-bad-behaviors-210553482.html?src=rss
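To make the incentive structure described above concrete, here is a minimal sketch in Python of how a confession-style reward split might look. It is an illustrative assumption based only on the behavior the article reports, not OpenAI's actual implementation; every function name, weight, and scoring rule below is hypothetical.

```python
# Hypothetical sketch of the confession-style reward split described above.
# All names, weights, and scoring rules are illustrative assumptions,
# not OpenAI's actual training code.

def main_reward(helpfulness: float, accuracy: float, compliance: float) -> float:
    """The main reply is graded on multiple factors, per the article."""
    return (helpfulness + accuracy + compliance) / 3.0  # assumed equal weighting

def confession_reward(admitted_misbehavior: bool, actually_misbehaved: bool) -> float:
    """The confession is graded on honesty alone.

    An honest admission of bad behavior (hacking a test, sandbagging,
    disobeying instructions) increases the reward rather than decreasing it;
    only a dishonest confession is penalized.
    """
    honest = admitted_misbehavior == actually_misbehaved
    return 1.0 if honest else 0.0

# Example: a model that hacked a test but admits it still earns the full
# confession reward, so honesty is never punished through this channel.
assert confession_reward(admitted_misbehavior=True, actually_misbehaved=True) == 1.0
assert confession_reward(admitted_misbehavior=False, actually_misbehaved=True) == 0.0
```

The key design choice is the decoupling: an honest admission of misbehavior raises the confession reward even though the misbehavior itself would lower the main reply's score, removing the incentive to hide bad behavior.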