Article 6NM5B Refusal in Language Models Is Mediated by a Single Direction

Refusal in Language Models Is Mediated by a Single Direction

by
from Hacker News on (#6NM5B)
Comments
External Content
Source RSS or Atom Feed
Feed Location https://news.ycombinator.com/rss
Feed Title Hacker News
Feed Link https://news.ycombinator.com/
Reply 0 comments