Benchmarks that have been killed by LLM-based systems
Killed by LLM is a project that documents public AI benchmarks that LLM-based AI systems have largely solved since 2018. A benchmark counts as "killed" when it no longer measures the frontier of AI technology as a challenge asking "Can AI do X?", though it might still be a useful tool. The project provides links to papers documenting the fallen benchmarks.
The project is on GitHub, and other people are invited to contribute new benchmarks that have been overcome.
UMBC Center for AI
Posted: January 7, 2025, 9:18 AM