Technologies

AI Is Bad at Sudoku. It’s Even Worse at Showing Its Work

Researchers did more than ask chatbots to play games. They tested whether AI models could describe their thinking. The results were troubling.

Chatbots are genuinely impressive when you watch them do things they’re good at, like writing a basic email or creating weird, futuristic-looking images. But ask generative AI to solve one of those puzzles in the back of a newspaper, and things can quickly go off the rails.

That’s what researchers at the University of Colorado at Boulder found when they challenged large language models to solve sudoku. And not even the standard 9×9 puzzles. An easier 6×6 puzzle was often beyond the capabilities of an LLM without outside help (in this case, specific puzzle-solving tools).

A more important finding came when the models were asked to show their work. For the most part, they couldn’t. Sometimes they lied. Sometimes they explained things in ways that made no sense. Sometimes they hallucinated and started talking about the weather.

If gen AI tools can’t explain their decisions accurately or transparently, that should cause us to be cautious as we give these things more control over our lives and decisions, said Ashutosh Trivedi, a computer science professor at the University of Colorado at Boulder and one of the authors of the paper published in July in the Findings of the Association for Computational Linguistics.

«We would really like those explanations to be transparent and be reflective of why AI made that decision, and not AI trying to manipulate the human by providing an explanation that a human might like,» Trivedi said.


Don’t miss any of our unbiased tech content and lab-based reviews. Add CNET as a preferred Google source.


The paper is part of a growing body of research into the behavior of large language models. Other recent studies have found, for example, that models hallucinate in part because their training procedures incentivize them to produce results a user will like, rather than what is accurate, or that people who use LLMs to help them write essays are less likely to remember what they wrote. As gen AI becomes more and more a part of our daily lives, the implications of how this technology works and how we behave when using it become hugely important.

When you make a decision, you can try to justify it, or at least explain how you arrived at it. An AI model may not be able to accurately or transparently do the same. Would you trust it?

Why LLMs struggle with sudoku

We’ve seen AI models fail at basic games and puzzles before. OpenAI’s ChatGPT (among others) has been totally crushed at chess by the computer opponent in a 1979 Atari game. A recent research paper from Apple found that models can struggle with other puzzles, like the Tower of Hanoi.

It has to do with the way LLMs work and fill in gaps in information. These models try to fill those gaps based on what happens in similar cases in their training data or other things they’ve seen in the past. With a sudoku, the question is one of logic. The AI might try to fill each gap in order, based on what seems like a reasonable answer, but to solve it properly, it instead has to look at the entire picture and find a logical order that changes from puzzle to puzzle.

Chatbots are bad at chess for a similar reason. They find logical next moves but don’t necessarily think three, four or five moves ahead, the fundamental skill needed to play chess well. Chatbots also sometimes move chess pieces in ways that break the rules or put pieces in needless jeopardy.

You might expect LLMs to be able to solve sudoku because they’re computers and the puzzle consists of numbers, but the puzzles themselves are not really mathematical; they’re symbolic. «Sudoku is famous for being a puzzle with numbers that could be done with anything that is not numbers,» said Fabio Somenzi, a professor at CU and one of the research paper’s authors.
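
Both points, that a sudoku has to be reasoned about as a whole and that its symbols need not be numbers, can be illustrated with a short sketch. The Python below is not the researchers’ tool, and every name in it (`SYMBOLS`, `candidates`, `solve`) is a hypothetical chosen for illustration. It assumes a 6×6 grid split into 2×3 boxes, uses letters instead of digits, and at each step scans the whole board for the empty cell with the fewest legal options, so the order of moves changes from puzzle to puzzle.

```python
from typing import Optional

SYMBOLS = "ABCDEF"          # any six distinct symbols work; digits are only a convention
EMPTY = "."
ROWS, COLS = 6, 6
BOX_ROWS, BOX_COLS = 2, 3   # assumption: the usual 2x3 box layout for a 6x6 grid

Grid = list[list[str]]


def candidates(grid: Grid, r: int, c: int) -> set[str]:
    """Symbols not yet used in the row, column or box of cell (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(ROWS)}
    br, bc = r - r % BOX_ROWS, c - c % BOX_COLS
    for i in range(br, br + BOX_ROWS):
        for j in range(bc, bc + BOX_COLS):
            used.add(grid[i][j])
    return set(SYMBOLS) - used


def solve(grid: Grid) -> Optional[Grid]:
    """Fill the grid in place; return it, or None if no solution exists."""
    empties = [(r, c) for r in range(ROWS) for c in range(COLS)
               if grid[r][c] == EMPTY]
    if not empties:
        return grid
    # Look at the entire board and pick the most constrained cell first.
    r, c = min(empties, key=lambda rc: len(candidates(grid, *rc)))
    for symbol in sorted(candidates(grid, r, c)):
        grid[r][c] = symbol
        if solve(grid) is not None:
            return grid
        grid[r][c] = EMPTY   # undo the guess and try the next symbol
    return None


if __name__ == "__main__":
    # A sample grid made up for this sketch, not taken from the paper.
    puzzle = [list(row) for row in
              ["A.C.E.", ".E.A.C", "B.D.F.", ".F.B.D", "C.E.A.", ".A.C.E"]]
    solved = solve(puzzle)
    if solved is not None:
        for row in solved:
            print(" ".join(row))
```

Swapping `SYMBOLS` for `"123456"` or six emoji changes nothing in the logic, which is the sense in which the puzzle is symbolic rather than mathematical.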

I used a sample prompt from the researchers’ paper and gave it to ChatGPT. The tool showed its work, and repeatedly told me it had the answer before showing a puzzle that didn’t work, then going back and correcting it. It was like the bot was turning in a presentation that kept getting last-second edits: This is the final answer. No, actually, never mind, this is the final answer. It got the answer eventually, through trial and error. But trial and error isn’t a practical way for a person to solve a sudoku in the newspaper. That’s way too much erasing and ruins the fun.
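
A claimed «final answer» like the ones described above can be checked mechanically. The following Python sketch is an illustration, not anything from the paper. The function names `units` and `find_violations` are hypothetical, and it assumes a 6×6 grid of digits 1 through 6 split into 2×3 boxes. It reports every row, column or box that fails to contain each digit exactly once.

```python
SIZE = 6
BOX_ROWS, BOX_COLS = 2, 3  # assumption: the usual 2x3 box layout for a 6x6 grid


def units(grid):
    """Yield (name, cells) for every row, column and box of the grid."""
    for r in range(SIZE):
        yield f"row {r + 1}", list(grid[r])
    for c in range(SIZE):
        yield f"column {c + 1}", [grid[r][c] for r in range(SIZE)]
    for br in range(0, SIZE, BOX_ROWS):
        for bc in range(0, SIZE, BOX_COLS):
            cells = [grid[r][c]
                     for r in range(br, br + BOX_ROWS)
                     for c in range(bc, bc + BOX_COLS)]
            yield f"box starting at row {br + 1}, column {bc + 1}", cells


def find_violations(grid):
    """Return the names of units that do not hold 1..6 exactly once each."""
    expected = list(range(1, SIZE + 1))
    return [name for name, cells in units(grid) if sorted(cells) != expected]


if __name__ == "__main__":
    # A made-up grid for this sketch.
    claimed = [
        [1, 2, 3, 4, 5, 6],
        [4, 5, 6, 1, 2, 3],
        [2, 3, 4, 5, 6, 1],
        [5, 6, 1, 2, 3, 4],
        [3, 4, 5, 6, 1, 2],
        [6, 1, 2, 3, 4, 5],
    ]
    print(find_violations(claimed))   # [] because the grid is valid

    # Swap two cells in the first row, as a careless "correction" might.
    claimed[0][0], claimed[0][1] = claimed[0][1], claimed[0][0]
    print(find_violations(claimed))   # ['column 1', 'column 2']
```

A check like this only says whether an answer is valid. It says nothing about whether the explanation offered alongside it reflects how the answer was reached, which is the harder question the researchers raise.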

AI struggles to show its work

The Colorado researchers didn’t just want to see if the bots could solve puzzles. They asked for explanations of how the bots worked through them. Things did not go well.

Testing OpenAI’s o1-preview reasoning model, the researchers saw that the model’s explanations, even for correctly solved puzzles, didn’t accurately explain or justify its moves and got basic terms wrong.

«One thing they’re good at is providing explanations that seem reasonable,» said Maria Pacheco, an assistant professor of computer science at CU. «They align to humans, so they learn to speak like we like it, but whether they’re faithful to what the actual steps need to be to solve the thing is where we’re struggling a little bit.»

Sometimes, the explanations were completely irrelevant. Since the paper’s work was finished, the researchers have continued to test newly released models. Somenzi said that when he and Trivedi were running OpenAI’s o4 reasoning model through the same tests, at one point, it seemed to give up entirely.

«The next question that we asked, the answer was the weather forecast for Denver,» he said.

(Disclosure: Ziff Davis, CNET’s parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

Explaining yourself is an important skill

When you solve a puzzle, you’re almost certainly able to walk someone else through your thinking. The fact that these LLMs failed so spectacularly at that basic job isn’t a trivial problem. With AI companies constantly talking about «AI agents» that can take actions on your behalf, being able to explain yourself is essential.

Consider the types of jobs being given to AI now, or planned for in the near future: driving, doing taxes, deciding business strategies and translating important documents. Imagine what would happen if you, a person, did one of those things and something went wrong.

«When humans have to put their face in front of their decisions, they better be able to explain what led to that decision,» Somenzi said.

It isn’t just a matter of getting a reasonable-sounding answer. It needs to be accurate. One day, an AI’s explanation of itself might have to hold up in court, but how can its testimony be taken seriously if it’s known to lie? You wouldn’t trust a person who failed to explain themselves, and you also wouldn’t trust someone you found was saying what you wanted to hear instead of the truth. 

«Having an explanation is very close to manipulation if it is done for the wrong reason,» Trivedi said. «We have to be very careful with respect to the transparency of these explanations.»

MacBook Pro May Be Finally Getting a Touchscreen

The OLED MacBook Pro reportedly will break new ground with late 2026 launch.

Get your swiping fingers ready. According to a reliable insider, Apple will introduce touchscreen capability to the MacBook Pro for the first time with the launch of the OLED MacBook Pro in late 2026.


Ming-Chi Kuo, an analyst with TF International Securities, posted on X that the cheaper MacBook, scheduled to go into production later this year, will not have a touchscreen. However, he said the model’s second generation, slated for production in 2027, could get a touchscreen.

A representative for Apple did not immediately respond to a request for comment.

After years of seeing how consumers use their iPads, Kuo believes that Apple realizes that «in certain scenarios, touch controls can enhance both productivity and the overall user experience.»

Apple has not confirmed the report. However, on Wednesday, Kuo listed numerous earlier reports of his that have proved accurate, including the iPhone Air specs, the Apple Watch Ultra 3 and the iPhone 17 Dynamic Island staying the same size.

What’s next for the MacBook?

It’s been nearly two decades since Apple introduced the first MacBook. CNET’s Scott Stein believes that the iPad and MacBook platforms will eventually merge.

YouTuber Jon Rettinger, who has 1.65M subscribers, agrees that the iPad and MacBook are «on a collision course with unity» and that consumers are always looking for the next level of interaction and form factors: in this case, a touchscreen.

But with the devices and their operating systems merging, Rettinger says a big question mark is what’s next for iPadOS. «How will that continue to evolve?»

New PS5 Update Lets DualSense Controllers Pair With Multiple Devices at the Same Time

A software update brings the controller pairing feature to all users and also adds a power-saving option for some games.

All you PlayStation 5 players can now pair your wireless DualSense controllers with up to four devices at a time and toggle among them. 

For some time, you’ve been able to pair your DualSense controllers not just with PS5s and other PlayStation consoles, but also with PCs, Macs, smartphones and other Bluetooth devices. But that pairing could happen with only one device at a time; using a different device with a DualSense required pairing all over again.

That changes now, with system update 25.05-12.00.00. PS5 owners who download and install the update, which is now available, can pair their DualSense controllers with up to four different devices. The new feature also works with DualSense Edge controllers.

How to enable and use multidevice pairing on DualSense controllers

To enable the feature (which has been available to beta testers since June):

  1. Hold the PS button for 5 seconds and then hold one of the action buttons (triangle, circle, square or X) until the light bar and player indicator flash twice.
  2. Turn on Bluetooth pairing on your device and select the DualSense controller.
  3. Once pairing is done on the other device, the light bar and player indicator light should blink in one of four slots.

To use the feature once you’ve assigned other devices:

  1. Make sure the device you want to switch to is on and has Bluetooth enabled.
  2. Hold the PS button and the action button that was assigned (triangle, circle, square or X) for 3 seconds.
  3. The player indicator light (1-4) should flash according to which device pairing has been activated.

Other PS5 system updates

The PS5 system update also brings another new feature: Some games will support power saving, which reduces power consumption by scaling back game performance. You can enable or disable that for specific games by going to Settings > System > Power Saving > Use Power Saver.

In addition, Sony says, the update improves messages and usability on some system screens, as well as software performance and stability.

Forget the iPhone 17, the 2025 Moto G Power Is on Sale for a Record-Low $250

This midrange Motorola is one of our favorite budget-friendly phones on the market, and right now you can snag one for $50 off the usual price.

There’s a lot of buzz about the iPhone 17, which hits shelves this Friday. But not everyone wants to spend $800 (or substantially more) on the latest Apple phone. Motorola has some great options for those looking for a more affordable alternative, and right now you can grab one for even less.

Amazon has knocked $50 off the 2025 Motorola Moto G Power, which drops the price to a record-low $250. It’s also on sale at Motorola, where you’ll get an extra $100 in credit if you’re trading in an old phone.

CNET’s mobile device experts named the Moto G Power as the best phone you can get for less than $300, with Mike Sorrentino calling it the «lowest-priced Motorola phone worth buying.» It’s not the most advanced model on the market, but it still has the hardware needed to «handle the basics without breaking the bank.»

It features a vibrant 6.8-inch display with FHD+ resolution and a 120Hz refresh rate, as well as an impressive 50-megapixel rear camera system that performs surprisingly well in low light. Under the hood, it’s equipped with 8GB of RAM, 128GB of storage and a MediaTek Dimensity 8100 CPU, as well as a respectable 5,000-mAh battery. It also has a unique «RAM Boost» feature that converts a small amount of available storage into virtual RAM to temporarily boost performance. Plus, it supports 5G connectivity and has a durable IP68 design, so it’s resistant to water and dust.

Why this deal matters

As one of our favorite affordable models, the Motorola Moto G Power is already a decent value at full price, and a bargain whenever you can pick it up for less — especially a record-low price. It features decent specs and hardware, as well as a fairly rugged design, making it well worth the money at just $250.
