Technologies
AI Is Bad at Sudoku. It’s Even Worse at Showing Its Work
Researchers did more than ask chatbots to play games. They tested whether AI models could describe their thinking. The results were troubling.
Chatbots are genuinely impressive when you watch them do things they’re good at, like writing a basic email or creating weird, futuristic-looking images. But ask generative AI to solve one of those puzzles in the back of a newspaper, and things can quickly go off the rails.
That’s what researchers at the University of Colorado at Boulder found when they challenged large language models to solve sudoku. And not even the standard 9×9 puzzles. An easier 6×6 puzzle was often beyond the capabilities of an LLM without outside help (in this case, specific puzzle-solving tools).
A more important finding came when the models were asked to show their work. For the most part, they couldn’t. Sometimes they lied. Sometimes they explained things in ways that made no sense. Sometimes they hallucinated and started talking about the weather.
If gen AI tools can’t explain their decisions accurately or transparently, that should make us cautious as we give these things more control over our lives and decisions, said Ashutosh Trivedi, a computer science professor at the University of Colorado at Boulder. Trivedi is one of the authors of the paper, which was published in July in the Findings of the Association for Computational Linguistics.
“We would really like those explanations to be transparent and be reflective of why AI made that decision, and not AI trying to manipulate the human by providing an explanation that a human might like,” Trivedi said.
The paper is part of a growing body of research into the behavior of large language models. Other recent studies have found, for example, that models hallucinate in part because their training procedures incentivize them to produce results a user will like, rather than what is accurate, or that people who use LLMs to help them write essays are less likely to remember what they wrote. As gen AI becomes more and more a part of our daily lives, the implications of how this technology works and how we behave when using it become hugely important.
When you make a decision, you can try to justify it, or at least explain how you arrived at it. An AI model may not be able to accurately or transparently do the same. Would you trust it?
Why LLMs struggle with sudoku
We’ve seen AI models fail at basic games and puzzles before. OpenAI’s ChatGPT (among others) has been totally crushed at chess by the computer opponent in a 1979 Atari game. A recent research paper from Apple found that models can struggle with other puzzles, like the Tower of Hanoi.
These failures come down to the way LLMs work and fill in gaps in information. The models try to fill those gaps based on what happens in similar cases in their training data or other things they’ve seen in the past. With a sudoku, the question is one of logic. The AI might try to fill each gap in order, based on what seems like a reasonable answer, but to solve it properly, it instead has to look at the entire picture and find a logical order that changes from puzzle to puzzle.
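To make the contrast concrete, here is a minimal sketch of a solver for the 6×6 puzzles mentioned above. It is not the researchers’ tool or method; it is an illustrative backtracking search that assumes the common 6×6 layout with boxes 2 rows tall and 3 columns wide. The names `candidates` and `solve` are hypothetical. Instead of filling cells left to right, it always picks the empty cell with the fewest legal values, so the order of moves depends on the puzzle.

```python
from typing import List, Optional, Set, Tuple

Grid = List[List[int]]  # 0 marks an empty cell

# Assumed layout: 6x6 grid, boxes 2 rows tall and 3 columns wide.
SIZE, BOX_ROWS, BOX_COLS = 6, 2, 3


def candidates(grid: Grid, r: int, c: int) -> Set[int]:
    """Values not already used in the cell's row, column, or box."""
    used = set(grid[r]) | {grid[i][c] for i in range(SIZE)}
    br, bc = r - r % BOX_ROWS, c - c % BOX_COLS
    for i in range(br, br + BOX_ROWS):
        for j in range(bc, bc + BOX_COLS):
            used.add(grid[i][j])
    return set(range(1, SIZE + 1)) - used


def solve(grid: Grid) -> bool:
    """Fill the grid in place; return True if a solution was found."""
    # Look at the whole board and pick the most constrained empty cell.
    best: Optional[Tuple[int, int, Set[int]]] = None
    for r in range(SIZE):
        for c in range(SIZE):
            if grid[r][c] == 0:
                opts = candidates(grid, r, c)
                if best is None or len(opts) < len(best[2]):
                    best = (r, c, opts)
    if best is None:
        return True  # no empty cells left
    r, c, opts = best
    for value in sorted(opts):
        grid[r][c] = value
        if solve(grid):
            return True
    grid[r][c] = 0  # undo and backtrack
    return False


if __name__ == "__main__":
    puzzle = [
        [1, 0, 3, 0, 5, 0],
        [0, 5, 0, 1, 0, 3],
        [2, 0, 1, 0, 6, 0],
        [0, 6, 0, 2, 0, 1],
        [3, 0, 2, 0, 4, 0],
        [0, 4, 0, 3, 0, 2],
    ]
    if solve(puzzle):
        for row in puzzle:
            print(row)
```

The search still backtracks when a guess fails, but every value it places is checked against the row, column and box first, which is the kind of whole-board bookkeeping a next-word predictor has no built-in reason to do.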
Chatbots are bad at chess for a similar reason. They find logical next moves but don’t necessarily think three, four or five moves ahead, the fundamental skill needed to play chess well. Chatbots also sometimes move chess pieces in ways that don’t really follow the rules or put pieces in needless jeopardy.
You might expect LLMs to be able to solve sudoku because they’re computers and the puzzle consists of numbers, but the puzzles themselves are not really mathematical; they’re symbolic. “Sudoku is famous for being a puzzle with numbers that could be done with anything that is not numbers,” said Fabio Somenzi, a professor at CU and one of the research paper’s authors.
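Somenzi’s point can be illustrated with a short, hedged sketch: a checker that never does arithmetic and only asks whether each row, column and box contains the same set of distinct symbols. The function name `is_valid_solution` and the default 2×3 box shape are assumptions made for this example, not anything taken from the paper.

```python
from typing import Hashable, Sequence


def is_valid_solution(
    grid: Sequence[Sequence[Hashable]],
    box_rows: int = 2,
    box_cols: int = 3,
) -> bool:
    """Check a filled grid using set comparisons only, never arithmetic."""
    size = box_rows * box_cols
    if len(grid) != size or any(len(row) != size for row in grid):
        return False
    # Whatever appears in the first row defines the symbol set.
    symbols = set(grid[0])
    if len(symbols) != size:
        return False
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(size)} for c in range(size)]
    boxes = [
        {grid[r][c]
         for r in range(br, br + box_rows)
         for c in range(bc, bc + box_cols)}
        for br in range(0, size, box_rows)
        for bc in range(0, size, box_cols)
    ]
    return all(group == symbols for group in rows + cols + boxes)


if __name__ == "__main__":
    digits = [
        [1, 2, 3, 4, 5, 6],
        [4, 5, 6, 1, 2, 3],
        [2, 3, 1, 5, 6, 4],
        [5, 6, 4, 2, 3, 1],
        [3, 1, 2, 6, 4, 5],
        [6, 4, 5, 3, 1, 2],
    ]
    # Swap every digit for a letter; the same rules still apply.
    letters = [["ABCDEF"[v - 1] for v in row] for row in digits]
    print(is_valid_solution(digits), is_valid_solution(letters))
```

Because the check relies only on equality between symbols, the grid could just as well hold letters, colors or emoji.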
I used a sample prompt from the researchers’ paper and gave it to ChatGPT. The tool showed its work, and repeatedly told me it had the answer before showing a puzzle that didn’t work, then going back and correcting it. It was like the bot was turning in a presentation that kept getting last-second edits: This is the final answer. No, actually, never mind, this is the final answer. It got the answer eventually, through trial and error. But trial and error isn’t a practical way for a person to solve a sudoku in the newspaper. That’s way too much erasing and ruins the fun.
AI struggles to show its work
The Colorado researchers didn’t just want to see if the bots could solve puzzles. They asked for explanations of how the bots worked through them. Things did not go well.
Testing OpenAI’s o1-preview reasoning model, the researchers saw that its explanations, even for correctly solved puzzles, didn’t accurately explain or justify the model’s moves and got basic terms wrong.
“One thing they’re good at is providing explanations that seem reasonable,” said Maria Pacheco, an assistant professor of computer science at CU. “They align to humans, so they learn to speak like we like it, but whether they’re faithful to what the actual steps need to be to solve the thing is where we’re struggling a little bit.”
Sometimes, the explanations were completely irrelevant. Since the paper’s work was finished, the researchers have continued to test newly released models. Somenzi said that when he and Trivedi ran OpenAI’s o4 reasoning model through the same tests, at one point it seemed to give up entirely.
“The next question that we asked, the answer was the weather forecast for Denver,” he said.
(Disclosure: Ziff Davis, CNET’s parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)
Explaining yourself is an important skill
When you solve a puzzle, you’re almost certainly able to walk someone else through your thinking. The fact that these LLMs failed so spectacularly at that basic job isn’t a trivial problem. With AI companies constantly talking about “AI agents” that can take actions on your behalf, being able to explain yourself is essential.
Consider the types of jobs being given to AI now, or planned for in the near future: driving, doing taxes, deciding business strategies and translating important documents. Imagine what would happen if you, a person, did one of those things and something went wrong.
“When humans have to put their face in front of their decisions, they better be able to explain what led to that decision,” Somenzi said.
It isn’t just a matter of getting a reasonable-sounding answer. It needs to be accurate. One day, an AI’s explanation of itself might have to hold up in court, but how can its testimony be taken seriously if it’s known to lie? You wouldn’t trust a person who failed to explain themselves, and you also wouldn’t trust someone you found was saying what you wanted to hear instead of the truth.
“Having an explanation is very close to manipulation if it is done for the wrong reason,” Trivedi said. “We have to be very careful with respect to the transparency of these explanations.”
Verum Messenger has unveiled a new project — a mini-series created using Verum AI. The story consists of 7 episodes and will be released on the messenger’s social media channels.
The plot revolves around a global corporation seeking to take control of digital communications and a group of heroes who use Verum Messenger as a tool of resistance. Beyond the story itself, the series highlights the app’s key features, technologies, and advantages.
Combining entertainment with a showcase of the Verum ecosystem, the project presents a dynamic digital series designed for the modern era.
The first episode premieres today, with the remaining episodes to be released over time.
Stay tuned for more.
Verum Finance: Earn While You Communicate — The Super App That Pays You
Verum has officially launched Verum Finance, an innovative financial application that transforms a private messenger into a true financial super app. News of the launch was also featured on the respected platform Dealroom.co.
Verum Finance can now be used both within Verum Messenger and as a standalone application for iPhone and iPad. When users sign in to Verum Finance with their Verum Messenger account, all balances, settings, and account data are automatically synchronized for maximum convenience.
Users can now do more than communicate securely and protect their data — they can also generate passive income directly within the ecosystem.
What Verum Finance Offers
• Top up your balance with a bank card, Apple Pay, or USDT
• Send money instantly anywhere in the world
• Issue and manage debit cards (virtual and physical)
• Full Apple Pay support
• Exchange assets and withdraw funds quickly
One of the most distinctive features is the built-in cryptocurrency mining system inside Verum Messenger.
The application utilizes your device’s resources and allows you to earn cryptocurrency in the background — passively, while chatting, traveling, or simply using the messenger.
Maximum Privacy + Real Freedom
• Registration without a phone number, email address, or passport
• End-to-end encryption and full control over your data
• Lifetime free VPN
• eSIM connectivity in more than 150 countries
• Reliable offline communication mode
• Support for 12+ languages for users worldwide
Everything is available in one place: secure communication, financial tools, earning opportunities, and privacy protection.
Users can access the full experience directly within Verum Messenger or switch to the dedicated Verum Finance app for iOS. All data is synchronized automatically between the two applications.
Why Download Verum Today
While many messaging platforms collect user data and expose users to restrictions, Verum offers greater independence and the opportunity to earn.
With a one-time purchase of the feature package, users receive lifetime access to privacy tools, VPN, eSIM services, cryptocurrency mining, and financial features.
This is more than just a messenger.
It is your personal tool for financial and digital freedom.
Download Verum Finance and Verum Messenger today — start communicating securely and begin earning tomorrow.
Download Links:
→ App Store (iPhone / iPad): Verum Finance
→ App Store (Verum Messenger): Verum Messenger
Verum Finance: A Super App for Private Finance Integrated Into a Messenger
Verum Finance has announced the launch of a new financial application that allows users to manage their money directly within the secure Verum Messenger ecosystem.
The project has already attracted attention from major media outlets. A dedicated feature was published by Forbes Türkiye, while one of the world’s largest cryptocurrency exchanges, MEXC, covered the launch. Yahoo Finance had previously reported on the evolution of Verum Messenger into a comprehensive financial ecosystem.
What Verum Finance Offers
Verum Finance transforms a messenger into a complete financial platform. Users can:
• Manage their balance and top up using bank cards or USDT
• Send money instantly to other Verum users
• Issue and use debit cards, including Apple Pay support
• Exchange assets and withdraw funds
• Access all these services without installing separate banking applications
A strong emphasis is placed on privacy. The platform offers registration without a phone number or email address, end-to-end encryption, and full user control over personal data.
Recognition from Forbes Türkiye
In a dedicated article, Forbes Türkiye highlighted Verum Finance as a notable example of modern privacy-driven fintech. The publication emphasized the growing trend of financial services moving from standalone banking applications into unified messaging ecosystems — a model that has proven successful in Asia through platforms such as WeChat and Alipay and is now expanding globally.
Support from the Crypto Community
Alongside the Forbes Türkiye coverage, news about the launch of Verum Finance was also featured by MEXC, one of the world’s leading cryptocurrency exchanges. This reflects growing interest in the project from both traditional business media and the cryptocurrency community.
A Strategic Vision
“We are building more than a payments application and more than a messenger. Verum is a unified secure ecosystem where communication, finance, and privacy tools work together,” the company stated.
Verum Finance is now available for iPhone and iPad users. The application complements Verum Messenger, which offers anonymous chats, voice and video calls, VPN services, eSIM connectivity, and other tools designed to enhance digital freedom.
Verum Finance: https://finance.verum.im
Verum Messenger: https://verum.im
