{"id":8616,"date":"2025-01-25T18:48:23","date_gmt":"2025-01-25T18:48:23","guid":{"rendered":"https:\/\/theoreti.ca\/?p=8616"},"modified":"2025-01-25T18:48:23","modified_gmt":"2025-01-25T18:48:23","slug":"humanitys-last-exam","status":"publish","type":"post","link":"https:\/\/theoreti.ca\/?p=8616","title":{"rendered":"Humanity&#8217;s Last Exam"},"content":{"rendered":"<p>Researchers with the Center for AI Safety and Scale AI are gathering submissions for <a href=\"https:\/\/agi.safe.ai\/\">Humanity&#8217;s Last Exam<\/a>. The <a href=\"https:\/\/agi.safe.ai\/submit\">submission form is here<\/a>. The idea is to develop an exam with questions from a breadth of academic specializations that current LLMs can&#8217;t answer.<\/p>\n<blockquote><p>While current LLMs achieve very low accuracy on Humanity&#8217;s Last Exam, recent history shows benchmarks are quickly saturated &#8212; with models dramatically progressing from near-zero to near-perfect performance in a short timeframe. Given the rapid pace of AI development, it is plausible that models could exceed 50% accuracy on HLE by the end of 2025. High accuracy on HLE would demonstrate expert-level performance on closed-ended, verifiable questions and cutting-edge scientific knowledge, but it would not alone suggest autonomous research capabilities or &#8220;artificial general intelligence.&#8221; HLE tests structured academic problems rather than open-ended research or creative problem-solving abilities, making it a focused measure of technical knowledge and reasoning. HLE may be the last academic exam we need to give to models, but it is far from the last benchmark for AI.<\/p><\/blockquote>\n<p>One wonders if it really will be the last exam. Perhaps we will get more complex exams that test for integrated skills. Andrej Karpathy <a href=\"https:\/\/x.com\/karpathy\/status\/1882498281089241545\">criticises the exam on X<\/a>. 
I agree that what we need are AIs that can carry out complex, intern-level tasks rather than just answer questions.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Researchers with the Center for AI Safety and Scale AI are gathering submissions for Humanity&#8217;s Last Exam. The submission form is here. The idea is to develop an exam with questions from a breadth of academic specializations that current LLMs can&#8217;t answer. While current LLMs achieve very low accuracy on Humanity&#8217;s Last Exam, recent history &hellip; <a href=\"https:\/\/theoreti.ca\/?p=8616\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Humanity&#8217;s Last Exam<\/span><\/a><\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[74,10,58,27],"tags":[],"class_list":["post-8616","post","type-post","status-publish","format-standard","hentry","category-artificial-intelligence","category-computers-and-education","category-ethics-of-data-science","category-media-and-news"],"_links":{"self":[{"href":"https:\/\/theoreti.ca\/index.php?rest_route=\/wp\/v2\/posts\/8616","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/theoreti.ca\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/theoreti.ca\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/theoreti.ca\/index.php?rest_route=\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/theoreti.ca\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=8616"}],"version-history":[{"count":1,"href":"https:\/\/theoreti.ca\/index.php?rest_route=\/wp\/v2\/posts\/8616\/revisions"}],"predecessor-version":[{"id":8617,"href":"https:\/\/theoreti.ca\/index.php?rest_route=\/wp\/v2\/posts\/8616\/revisions\/8617"}],"wp:attachment":[{"href":"https:\/\/theoreti.ca\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=8616"}],"wp:te
rm":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/theoreti.ca\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=8616"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/theoreti.ca\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=8616"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}