Evaluating advanced AI models: A complex issue - Rollup News