Anthropic’s System Card: Claude Sonnet 4.5 was able to recognize many alignment evaluation environments as tests and would modify its behavior accordingly (Celia Ford/Transformer)
Celia Ford / Transformer:
Anthropic’s new model appears to use “eval …