The Turing test
 
 
 



This month, we’re celebrating the 75th anniversary of a landmark paper by the mathematician Alan Turing. Published in October 1950, his famous article, ‘Computing Machinery and Intelligence’, proposed an “imitation game” that became known as the ‘Turing test’.

In it, an examiner converses freely via a form of chat (at the time, a teleprinter was envisaged) with a computer and a human being. If the examiner cannot distinguish which of the two is the computer and which is the human being, then it can be concluded, Turing argued, that the computer “thinks” or at least is capable of perfectly imitating human thought and is therefore as intelligent as a human being. He said:
“I believe that in about fifty years’ time it will be possible to programme computers, with a storage capacity of about 10⁹ [binary digits, about a gigabit] (!), to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning...”
There are many questions that can be raised about Turing’s test and his predictions about digital intelligence.

The main question for me, however, is: even if the computer’s responses are very similar to what one would expect from a human, does that actually have any significance? The test explicitly makes no attempt to verify that actual intelligence has been used in response to the questions. Indeed, it cannot. It can only test for what may simply be a matter of appearance, rather than actual reasoning ability.

When Alan Turing first put forward the test, his idea was obviously that the computer would be programmed with numerous rules from which it could make decisions, having first been given the relevant background information.

As late as the 1990s, computers could not actually learn from experience. Indeed, the ultimate triumph at that time was the ability of a computer, ‘Deep Blue’, to play chess to a high level. This ability was built on heuristic rules of a kind Alan Turing had sketched many years previously.

Obviously, the number of moves available at the start of a game is limited. If, though, you try to explore all the possible continuations after that, the number of permutations soon becomes completely unmanageable. Experienced players have learned which possibilities are worth exploring and which to ignore.

For this reason, Alan Turing had proposed 10 rules of thumb to give the computer a series of guidelines: for example, if you can capture an opponent's piece without putting yourself in peril, then go ahead and do it. In the light of experience, researchers added further heuristics and, with the increase in computing power available, a computer finally reached grandmaster strength.
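For the technically curious, here is a minimal sketch, in Python, of both halves of that idea. It assumes an average of roughly 35 legal moves per chess position (a commonly quoted figure) to show how quickly exhaustive look-ahead becomes unmanageable, and then orders a few invented candidate moves using a toy ‘safe capture’ rule of thumb of the sort just described. It is an illustration only, not a reconstruction of Turing’s rules or of Deep Blue’s program.

    from dataclasses import dataclass

    # Roughly 35 legal moves are available in a typical chess position
    # (a commonly quoted average branching factor).
    BRANCHING_FACTOR = 35

    def positions_to_examine(depth: int) -> int:
        """Positions a brute-force search would visit if it explored
        every move to the given depth (in half-moves)."""
        return BRANCHING_FACTOR ** depth

    for depth in (2, 4, 6, 8):
        print(f"{depth} half-moves ahead: about {positions_to_examine(depth):,} positions")
    # Eight half-moves ahead is already over two million million positions.

    # A toy version of the kind of rule of thumb described above:
    # prefer moves that capture material without leaving the capturing
    # piece in danger, so those branches are explored first.
    @dataclass
    class Move:
        captures_value: int          # value of any piece taken (0 if none)
        leaves_piece_attacked: bool  # would our piece be in danger afterwards?

    def heuristic_score(move: Move) -> int:
        if move.captures_value and not move.leaves_piece_attacked:
            return move.captures_value   # safe capture: look at it early
        if move.leaves_piece_attacked:
            return -1                    # risky move: look at it last
        return 0

    candidate_moves = [Move(0, False), Move(3, True), Move(5, False)]
    for move in sorted(candidate_moves, key=heuristic_score, reverse=True):
        print(move)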

But it was only a traditional computer and so without the flexibility of a brain. And it wasn’t the computer which had learned from experience, but the researchers.

Now, we have computers capable of discerning patterns, even ones that we ourselves cannot see. They can produce predictions from those patterns, such as how protein molecules will fold and so how they will react with other chemicals. But they still do not mimic human beings, with their error-prone reasoning, their emotions and their general intelligence.

Turing could have had no concept of Large Language Models (LLMs). But now we have LLMs, machines which seem to use human reasoning and so appear to be human substitutes.

As we know, they actually scoop up masses of text from the internet and, on a probabilistic basis, give it back to us as answers to the questions we put to them. An LLM makes no overt pretence of understanding what it is doing. But it can nonetheless be convincing, very convincing.
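Again for the technically curious, the sketch below shows what “on a probabilistic basis” amounts to: a model assigns a probability to each word (token) that might come next, and an answer is built by repeatedly drawing from those probabilities. The tiny hand-written probability table merely stands in for a real model’s billions of learned parameters; the words and numbers are invented purely for illustration.

    import random

    # A toy "model": for each current word, a hand-made probability
    # distribution over possible next words. A real LLM computes such
    # probabilities with a neural network over its whole context.
    NEXT_WORD_PROBS = {
        "<start>":  {"The": 0.6, "A": 0.4},
        "The":      {"computer": 0.5, "answer": 0.5},
        "A":        {"computer": 0.7, "machine": 0.3},
        "computer": {"thinks": 0.3, "responds": 0.7},
        "machine":  {"responds": 1.0},
        "answer":   {"appears": 1.0},
    }

    def sample_next(word: str) -> str:
        """Draw the next word at random, weighted by its probability."""
        options = NEXT_WORD_PROBS.get(word)
        if not options:
            return "<end>"
        words, weights = list(options), list(options.values())
        return random.choices(words, weights=weights, k=1)[0]

    def generate(max_words: int = 6) -> str:
        word, output = "<start>", []
        for _ in range(max_words):
            word = sample_next(word)
            if word == "<end>":
                break
            output.append(word)
        return " ".join(output)

    print(generate())  # e.g. "The computer responds": plausible, but no understanding involved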

A recently published paper, however, looks at their reliability, not just in the sense of factual accuracy but also of bias. The researchers found that deep research agents and the search engines powered by them frequently make unsupported and biased claims, ones that aren’t backed up by the sources they cite.

That’s according to an analysis which found that about one-third of answers provided by the AI tools aren’t backed up by reliable sources.

The AI engines were given 303 queries to answer, and their responses were assessed against eight different metrics, designed to test whether an answer was one-sided or overconfident, how relevant it was to the question, what sources, if any, it cited, and how much support those citations actually gave for the claims made in the answers.

The AI-powered search engines performed poorly. Many models provided one-sided answers. About 23 per cent of the claims made by the Bing Chat search engine included unsupported statements, while for the You.com and Perplexity AI search engines the figure was about 31 per cent. OpenAI’s GPT-4.5 produced even more unsupported claims: 47 per cent.

The AI answers were, however, themselves evaluated by an LLM. So then, AI marking its own homework? Sort of. The LLM used was in fact one claimed to have been trained to judge answers of that sort, the training being based on a comparison with how two human annotators assessed answers to more than 100 questions similar to those used in the study. A source, then, of potentially even more bias, which leaves us with a piece of research that is not the strongest of studies.

But then, we humans are filled to the brim with biases and fail to provide sources to justify our assertions.
 
Many people find the answers from LLMs to be convincing and indistinguishable from answers produced by actual human beings. They even ascribe personality and self-awareness to them, sometimes accepting their advice even where it suggests self-harm or suicide as the answer to life’s problems.

But in that sense an LLM reflects us and our abilities very well indeed and so passes the Turing test with flying colours.

If, though, a computer process which has no actual intelligence or reasoning ability can convince us that it is a person, then that in turn tells us that the Turing test is not of any obvious value. It is merely a test of the appearance of rationality or of human personality. It does not tell us that the computer “is therefore as intelligent as a human being”, as Turing claimed.

Indeed, I would suggest that the very ability of LLMs to produce answers instantly should tell us that they are not rational beings. After all, except in the simplest of cases, we usually have to pause to allow ourselves to reflect on what we are saying.

Which in turn sheds an interesting light on a much-debated question: free will, a concept which I consider to be an illusion.

As explanations of how we make decisions, we have determinism and/or randomness. The randomness would necessarily be at the quantum level, but would be able to trigger new thoughts. Of course, the new thoughts are then subject to our reasoning processes (and biases).

We are, though, convinced that there is somehow a third way. This is, I now suspect, partly down to our learned ability to pause in order to gather extra information and reflect on what we are saying, using our reasoning for quality control.

The very ability to pause gives the impression of choice outside the deterministic chain.

Typically, then, the expression of our opinions is very different from the smoothly flowing, and so very obviously completely determined, unthinking and unchecked output of an LLM.

This is also why we should be wary of populist speakers, those capable of talking for hours on end without pause. They are utterly convinced of their opinions, opinions so deeply rooted and devoid of any rational justification that they could be the product of an LLM.

Who might I be referring to?

Paul Buckingham

5th October 2025



