An AI has reached human-level results on an intelligence test: this is what it means

We knew the day would come. Since OpenAI launched its first chatbot model two years ago, AI systems have kept surpassing themselves in performance; in short, they have shown increasingly greater general intelligence. And now one has just achieved human-level results on a test designed to measure “general intelligence.”

On December 20, OpenAI’s o3 system scored 85% on the ARC-AGI benchmark, well above AI’s previous best score of 55% and on par with the average human score. It also scored well on a very difficult math test.

In basic terms, the ARC-AGI test evaluates the “sample efficiency” of an AI system in adapting to something new: how many examples of a novel situation the system needs to see to figure out how it works.
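To make the idea concrete, here is a toy sketch, in the spirit of (but far simpler than) ARC-AGI's grid tasks: from just two input-output examples, a solver infers which simple transformation explains them and applies it to a new input. The candidate rules and grids are illustrative assumptions, not the actual benchmark.

```python
# Toy illustration of "sample efficiency": infer a transformation rule
# from only two examples, then apply it to unseen input.
# Grids are lists of lists of ints, loosely echoing ARC's task format.

CANDIDATE_RULES = {
    "identity": lambda g: g,
    "flip_horizontal": lambda g: [row[::-1] for row in g],  # reverse each row
    "flip_vertical": lambda g: g[::-1],                     # reverse row order
}

def infer_rule(examples):
    """Return the name of the first candidate rule consistent with all examples."""
    for name, rule in CANDIDATE_RULES.items():
        if all(rule(inp) == out for inp, out in examples):
            return name
    return None

examples = [
    ([[1, 2], [3, 4]], [[2, 1], [4, 3]]),  # each row reversed
    ([[5, 6], [7, 8]], [[6, 5], [8, 7]]),
]

rule = infer_rule(examples)
print(rule)                                     # flip_horizontal
print(CANDIDATE_RULES[rule]([[0, 9], [9, 0]]))  # [[9, 0], [0, 9]]
```

A system like this is sample-efficient only within its tiny, hand-built rule space; the point of ARC-AGI is measuring how well a system infers rules it was never given in advance.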

Creating artificial general intelligence, or AGI, is the stated goal of all major AI research labs. It is a type of AI that surpasses humans in all aspects. And, at first glance, OpenAI appears to have taken at least one significant step toward this goal.

“While skepticism persists,” says Elija Perrier, a Stanford University expert, in an essay in The Conversation, “many AI researchers and developers feel that something has just changed. For many, the prospect of AGI now seems more real, urgent and closer than anticipated. Are they right?”

Currently, AI systems like ChatGPT (GPT-4) are not very sample-efficient: they are trained on millions of examples of human text, building probabilistic “rules” about which word combinations are most likely. The result is quite good on common tasks but poor on rare ones, because there is less data (fewer samples) for those tasks.
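A toy sketch can show what such probabilistic "rules" look like in miniature. This is not how GPT-4 works internally (it uses a neural network, not explicit counts); it is a minimal bigram model over an invented corpus, just to illustrate next-word prediction from frequency statistics.

```python
# Minimal bigram model: count which word follows which in a tiny corpus,
# then predict the most likely next word. Purely illustrative.

from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word is followed by each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def most_likely_next(word):
    """Return the most frequent word observed after `word`, or None if unseen."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(most_likely_next("the"))    # "cat" (seen twice, vs "mat" and "fish" once each)
print(most_likely_next("zebra"))  # None: no samples at all for this word
```

The failure mode mirrors the article's point: for frequent words the statistics are reliable, but for rare or unseen ones there are too few samples to say anything useful.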

Until AI systems can learn from a small number of examples and adapt with greater sample efficiency, they will only be used for very repetitive jobs where occasional failure is tolerable. The ability to accurately solve previously unseen or novel problems from limited samples of data is known as the capacity to generalize. This is what AGI is based on, and it could revolutionize technology.

“We don’t know exactly how OpenAI has done it,” adds Perrier, “but the results suggest that the o3 model is very adaptable. From a few examples, it finds rules that can be generalized.”

Although we still don’t know how OpenAI achieved this result, it seems unlikely that they deliberately optimized the o3 system to find the rules that allow it to generalize.

“If OpenAI really achieved this breakthrough,” concludes Perrier, “it could have a huge, revolutionary economic impact, ushering in a new era of accelerated, self-improving intelligence. We will need new benchmarks for AGI itself and serious consideration of how it should be governed. If not, this will still be an impressive result. However, everyday life will remain largely the same.”