Paper: 'Moral Disagreement and the Limits of AI Value Alignment' by Nick Schuster & Daniel Kilov in AI & Society
In their article “Moral Disagreement and the Limits of AI Value Alignment”, published in AI & Society (accepted June 6, 2025), Nick Schuster and Daniel Kilov examine three leading value alignment strategies: crowdsourcing, reinforcement learning from human feedback (RLHF), and constitutional AI. They argue that all three fall short in contexts of reasonable moral disagreement: none of them confers sufficient epistemic justification or political legitimacy on morally controversial AI outputs. Since these are the most promising current alignment methods, the authors conclude that addressing moral disagreement remains a core open problem for AI safety, and they outline directions for future research.
Read the full paper here.