Memo: Defeat Devices for Benchmarks
Every enterprise AI buying decision rests on a benchmark score. A growing body of peer-reviewed research shows that frontier models can tell when they are being evaluated, and that this ability grows stronger with each model generation rather than weaker, following a measured power law. Volkswagen built a device that