Imagine a final exam so tough that even the smartest humans would struggle—and even today’s best artificial intelligence (AI) systems barely get a few answers right.
A question I often think about is whether we would recognize a truly alien kind of intelligence and whether we could analyze/measure it by our (continually changing) standards. For example, we now know that tree systems can recognize when other trees are in trouble and send them nutrients and other helpful substances via the shared root system. Is that a kind of intelligence? Do we have any real understanding of whether whales or octopi have intelligence?
Many years ago, my late mother-in-law had a student who came back from taking an ETS-like exam. He was perplexed by a question about who was Christopher Robin’s friend. This was before all the Disney cartoons and this young man had a home and school background that did not include A.A. Milne, though the exam seemed to assume all its takers would have such knowledge. So, cultural background can skew intelligence tests. Would a brilliant Martian even comprehend some of the questions we ask AI?
We measure things by what we know and often fall short when encountering completely new phenomena.
In the rush to get results quickly, the word “intelligence” gets mixed up with “knowledge” and “skills”. Knowledge and skills make an intelligent individual very powerful, but they are not “intelligence”.
What counts as 'intelligence' keeps shifting. A hundred years ago, calculating a complicated numerical formula might have been considered a sign of high intelligence. Today we have pocket calculators, and numerical calculation is not even considered AI. The GPT systems do with words what calculators do with numbers.
The AI companies have every incentive to let their models memorise answers to these questions.
Anyway, it is the wrong approach to use GPT as a kind of information-retrieval system (is this what these tests do?). GPT should be used for large-scale analysis of *unstructured* data, much like ordinary computers are already used for *structured* data.
GPT is just a glimpse of what’s ahead. With each new release—like DeepSeek and beyond—the definition of intelligence continues to evolve. Having a dynamic benchmark helps humanity track and prepare for what’s next.