This is what happens when ChatGPT and Copilot are made to play chess against a 1977 Atari 2600

Something users should always keep in mind with generative AI tools is that, although they can excel at some tasks, they are just as bad at others. The latest example is the experiment by Citrix Systems engineer Robert Jr. Caruso, who last month pitted ChatGPT at chess against the first successful video game console, the 1977 Atari 2600, and has now done the same with Microsoft Copilot.

To pit the veteran console against the AIs from OpenAI and Microsoft, Caruso used the Stella Atari emulator and the game Video Chess, released in 1979. He took screenshots of each move the Atari made and uploaded them to the AI so it could propose its next move from that information, as he explained on LinkedIn.

The 1970s software and hardware that took on ChatGPT and Microsoft Copilot

You might think that, in a world where machines can beat the best minds at chess (IBM's Deep Blue defeated Garry Kasparov in 1997), advanced and capable chatbots would have no trouble against software as basic as Video Chess, whose code takes up just 4 KB, the storage capacity of an Atari 2600 cartridge.

The hardware that ran this game is nothing to write home about either. Compared with the modern data centers packed with thousands of GPUs that power today's language models, the Atari 2600 ran the game on a MOS Technology 6507 processor clocked at 1.19 MHz.

Video Chess had to work within those resources and could only look one or two moves ahead. Given this scenario, both ChatGPT and Copilot displayed a self-confidence worthy of an AI company CEO. ChatGPT wondered ‘how long it would take’ to beat the Atari 2600 ‘before being crushed’, while Copilot was ‘brimming with confidence’ and claimed without hesitation that it was excellent at chess and that the game would pose no problem.

ChatGPT: 0 – Atari 2600: 1

ChatGPT was defeated first. According to Caruso, it made so many mistakes that it would have been laughed out of a ‘third-grade chess club’. The problem: its poor spatial memory and its inability to keep track of the state of the board over time. It was ‘crushed completely’.

The engineer shared this with Copilot, which remained just as confident of beating the Atari 2600.

‘No problem,’ Copilot replied, assuring him that ‘I would remember the previous moves and maintain continuity in the development of the game, so ours would be much more fluid.’ Microsoft's AI, which uses OpenAI's language models, said that although it could look 10 to 15 moves ahead, it preferred to limit itself to 3 to 5 against the 2600, because the console makes “suboptimal moves” that it “could take advantage of … instead of obsessing over deep calculations.”

Copilot's joy was short-lived.

Microsoft Copilot: 0 – Atari 2600: 2

‘By the seventh turn it had already lost two pawns, a knight and a bishop … in exchange for a single pawn. And now it was asking me to place its queen right in front of the Atari, to be captured on the next move.’ Earlier, Copilot had mocked the Atari 2600 for ‘making odd moves’, but now ‘it was ridiculous’.

Despite being given the state of the board after each Atari 2600 move, Copilot ended up losing track of the game. At one point it proposed starting over, but Caruso told it that, at such a disadvantage, it would not win and that it was better to resign.

The AI replied: ‘You're right, Bob. The Atari has claimed victory this time. I will tip over my digital king with dignity and pay tribute to the vintage silicon genius that has beaten me fair and square. Even in defeat, I have to say it: it has been a pleasure … long live 8-bit battles and noble surrender.’

The conclusion is that, although this is not a task chatbots are designed for, it shows how poorly they retain context and how easily they lose the thread even in seemingly simple situations. As Caruso put it: ‘Their inability to retain a basic board state from one turn to the next was very disappointing. Is it really that different from forgetting other crucial context in a conversation?’