Offline Reinforcement Learning for LLM Multi-Step Reasoning, from Hacker News on 2024-12-23 10:16 (#6T3WY)