Long context models have been getting increasingly good at passing “Needle in a Haystack” tests recently, but what about a problem in the opposite direction?