Technologies
AI Is Bad at Sudoku. It’s Even Worse at Showing Its Work
Researchers did more than ask chatbots to play games. They tested whether AI models could describe their thinking. The results were troubling.
Chatbots are genuinely impressive when you watch them do things they’re good at, like writing a basic email or creating weird, futuristic-looking images. But ask generative AI to solve one of those puzzles in the back of a newspaper, and things can quickly go off the rails.
That’s what researchers at the University of Colorado at Boulder found when they challenged large language models to solve sudoku. And not even the standard 9×9 puzzles. An easier 6×6 puzzle was often beyond the capabilities of an LLM without outside help (in this case, specific puzzle-solving tools).
A more important finding came when the models were asked to show their work. For the most part, they couldn’t. Sometimes they lied. Sometimes they explained things in ways that made no sense. Sometimes they hallucinated and started talking about the weather.
If gen AI tools can’t explain their decisions accurately or transparently, that should cause us to be cautious as we give these things more control over our lives and decisions, said Ashutosh Trivedi, a computer science professor at the University of Colorado at Boulder and one of the authors of the paper published in July in the Findings of the Association for Computational Linguistics.
“We would really like those explanations to be transparent and be reflective of why AI made that decision, and not AI trying to manipulate the human by providing an explanation that a human might like,” Trivedi said.
Don’t miss any of our unbiased tech content and lab-based reviews. Add CNET as a preferred Google source.
The paper is part of a growing body of research into the behavior of large language models. Other recent studies have found, for example, that models hallucinate in part because their training procedures incentivize them to produce results a user will like, rather than what is accurate, or that people who use LLMs to help them write essays are less likely to remember what they wrote. As gen AI becomes more and more a part of our daily lives, the implications of how this technology works and how we behave when using it become hugely important.
When you make a decision, you can try to justify it, or at least explain how you arrived at it. An AI model may not be able to accurately or transparently do the same. Would you trust it?
Why LLMs struggle with sudoku
We’ve seen AI models fail at basic games and puzzles before. OpenAI’s ChatGPT (among others) has been totally crushed at chess by the computer opponent in a 1979 Atari game. A recent research paper from Apple found that models can struggle with other puzzles, like the Tower of Hanoi.
It has to do with the way LLMs work and fill in gaps in information. These models try to complete those gaps based on what happens in similar cases in their training data or other things they’ve seen in the past. With a sudoku, the question is one of logic. The AI might try to fill each gap in order, based on what seems like a reasonable answer, but to solve it properly, it instead has to look at the entire picture and find a logical order that changes from puzzle to puzzle.
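To make that contrast concrete, here is a minimal sketch of a conventional solver for the 6×6 puzzles described above. It is not the researchers' tooling or anything from their paper. The grid layout (2×3 boxes), the sample puzzle and every function name are assumptions made for illustration. Rather than filling blanks in reading order, it always picks the empty cell with the fewest legal options, so the order of moves depends on the puzzle, and it backs out of any guess that leads to a dead end.

```python
from typing import List, Optional, Set, Tuple

# Assumed layout for illustration: a 6x6 grid split into boxes of 2 rows x 3 columns.
SIZE = 6
BOX_ROWS, BOX_COLS = 2, 3
SYMBOLS = "123456"  # any six distinct symbols would work; nothing here does arithmetic
EMPTY = "."

Grid = List[List[str]]

# Hypothetical sample puzzle, not taken from the paper.
PUZZLE = """
1.3.5.
.5.1.3
2.1.6.
.6.2.1
3.2.4.
.4.3.2
"""


def parse(text: str) -> Grid:
    """Turn six whitespace-separated rows of characters into a mutable grid."""
    return [list(line) for line in text.split()]


def candidates(grid: Grid, r: int, c: int) -> Set[str]:
    """Symbols not yet used in the cell's row, column or box."""
    used = set(grid[r]) | {grid[i][c] for i in range(SIZE)}
    br, bc = r - r % BOX_ROWS, c - c % BOX_COLS
    for i in range(br, br + BOX_ROWS):
        for j in range(bc, bc + BOX_COLS):
            used.add(grid[i][j])
    return set(SYMBOLS) - used


def solve(grid: Grid) -> bool:
    """Fill the grid in place; return True if a complete solution was found."""
    best: Optional[Tuple[int, int, Set[str]]] = None
    # Look at the whole board and choose the most constrained empty cell.
    for r in range(SIZE):
        for c in range(SIZE):
            if grid[r][c] == EMPTY:
                opts = candidates(grid, r, c)
                if best is None or len(opts) < len(best[2]):
                    best = (r, c, opts)
    if best is None:
        return True  # no empty cells left
    r, c, opts = best
    for symbol in sorted(opts):
        grid[r][c] = symbol
        if solve(grid):
            return True
    grid[r][c] = EMPTY  # every option failed, so undo and backtrack
    return False


def is_solved(grid: Grid) -> bool:
    """Check that every row, column and box contains each symbol exactly once."""
    want = set(SYMBOLS)
    rows = all(set(row) == want for row in grid)
    cols = all({grid[r][c] for r in range(SIZE)} == want for c in range(SIZE))
    boxes = all(
        {grid[br + i][bc + j] for i in range(BOX_ROWS) for j in range(BOX_COLS)} == want
        for br in range(0, SIZE, BOX_ROWS)
        for bc in range(0, SIZE, BOX_COLS)
    )
    return rows and cols and boxes


if __name__ == "__main__":
    grid = parse(PUZZLE)
    if solve(grid):
        print("\n".join("".join(row) for row in grid))
```

Every step of a program like this can be replayed and checked against the rules, which is the kind of faithful explanation the chatbots had trouble producing.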
Chatbots are bad at chess for a similar reason. They find logical next moves but don’t necessarily think three, four or five moves ahead, the fundamental skill needed to play chess well. Chatbots also sometimes move chess pieces in ways that break the rules or put pieces in pointless jeopardy.
You might expect LLMs to be able to solve sudoku because they’re computers and the puzzle consists of numbers, but the puzzles themselves are not really mathematical; they’re symbolic. “Sudoku is famous for being a puzzle with numbers that could be done with anything that is not numbers,” said Fabio Somenzi, a professor at CU and one of the research paper’s authors.
I used a sample prompt from the researchers’ paper and gave it to ChatGPT. The tool showed its work, and repeatedly told me it had the answer before showing a puzzle that didn’t work, then going back and correcting it. It was like the bot was turning in a presentation that kept getting last-second edits: This is the final answer. No, actually, never mind, this is the final answer. It got the answer eventually, through trial and error. But trial and error isn’t a practical way for a person to solve a sudoku in the newspaper. That’s way too much erasing and ruins the fun.
AI struggles to show its work
The Colorado researchers didn’t just want to see if the bots could solve puzzles. They asked for explanations of how the bots worked through them. Things did not go well.
Testing OpenAI’s o1-preview reasoning model, the researchers saw that the explanations — even for correctly solved puzzles — didn’t accurately explain or justify their moves and got basic terms wrong.
“One thing they’re good at is providing explanations that seem reasonable,” said Maria Pacheco, an assistant professor of computer science at CU. “They align to humans, so they learn to speak like we like it, but whether they’re faithful to what the actual steps need to be to solve the thing is where we’re struggling a little bit.”
Sometimes, the explanations were completely irrelevant. Since the paper’s work was finished, the researchers have continued to test new models released. Somenzi said that when he and Trivedi were running OpenAI’s o4 reasoning model through the same tests, at one point, it seemed to give up entirely.
“The next question that we asked, the answer was the weather forecast for Denver,” he said.
(Disclosure: Ziff Davis, CNET’s parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)
Explaining yourself is an important skill
When you solve a puzzle, you’re almost certainly able to walk someone else through your thinking. The fact that these LLMs failed so spectacularly at that basic job isn’t a trivial problem. With AI companies constantly talking about “AI agents” that can take actions on your behalf, being able to explain yourself is essential.
Consider the types of jobs being given to AI now, or planned for in the near future: driving, doing taxes, deciding business strategies and translating important documents. Imagine what would happen if you, a person, did one of those things and something went wrong.
“When humans have to put their face in front of their decisions, they better be able to explain what led to that decision,” Somenzi said.
It isn’t just a matter of getting a reasonable-sounding answer. It needs to be accurate. One day, an AI’s explanation of itself might have to hold up in court, but how can its testimony be taken seriously if it’s known to lie? You wouldn’t trust a person who failed to explain themselves, and you also wouldn’t trust someone you found was saying what you wanted to hear instead of the truth.
“Having an explanation is very close to manipulation if it is done for the wrong reason,” Trivedi said. “We have to be very careful with respect to the transparency of these explanations.”
Today’s NYT Connections Hints, Answers and Help for Jan. 25 #959
Here are some hints and the answers for the NYT Connections puzzle for Jan. 25, No. 959
Looking for the most recent Connections answers? Click here for today’s Connections hints, as well as our daily answers and hints for The New York Times Mini Crossword, Wordle, Connections: Sports Edition and Strands puzzles.
Really, New York Times? The paper noted for being rather sedate actually put the words SUB and DOM next to each other in today’s NYT Connections puzzle. Of course, they didn’t mean what they could have meant, and they did not end up in the same category, but still. Read on for clues and today’s Connections answers.
The Times has a Connections Bot, like the one for Wordle. Go there after you play to receive a numeric score and to have the program analyze your answers. Players who are registered with the Times Games section can now nerd out by following their progress, including the number of puzzles completed, win rate, number of times they nabbed a perfect score and their win streak.
Hints for today’s Connections groups
Here are four hints for the groupings in today’s Connections puzzle, ranked from the easiest yellow group to the tough (and sometimes bizarre) purple group.
Yellow group hint: Like an understudy.
Green group hint: Delete is another one.
Blue group hint: Like penne.
Purple group hint: At the end of words.
Answers for today’s Connections groups
Yellow group: Act as a backup.
Green group: PC keyboard keys.
Blue group: Pasta shapes.
Purple group: Suffixes.
What are today’s Connections answers?
The yellow words in today’s Connections
The theme is act as a backup. The four answers are cover, fill in, sub and temp.
The green words in today’s Connections
The theme is PC keyboard keys. The four answers are alt, enter, menu and windows.
The blue words in today’s Connections
The theme is pasta shapes. The four answers are bowtie, ribbon, shell and tube.
The purple words in today’s Connections
The theme is suffixes. The four answers are ate, dom, hood and ship.
Today’s NYT Strands Hints, Answers and Help for Jan. 25 #693
Here are hints and answers for the NYT Strands puzzle for Jan. 25, No. 693.
Looking for the most recent Strands answer? Click here for our daily Strands hints, as well as our daily answers and hints for The New York Times Mini Crossword, Wordle, Connections and Connections: Sports Edition puzzles.
Today’s NYT Strands puzzle was a bit tricky at first, although the answers are fairly short and simple. If you need hints and answers, read on.
I go into depth about the rules for Strands in this story.
If you’re looking for today’s Wordle, Connections and Mini Crossword answers, you can visit CNET’s NYT puzzle hints page.
Hint for today’s Strands puzzle
Today’s Strands theme is: The straight and narrow
If that doesn’t help you, here’s a clue: Not curved.
Clue words to unlock in-game hints
Your goal is to find hidden words that fit the puzzle’s theme. If you’re stuck, find any words you can. Every time you find three words of four letters or more, Strands will reveal one of the theme words. These are the words I used to get those hints, but any words of four or more letters that you find will work:
- KITE, KITES, CITE, CITES, LONG, NOTE, NOTES, PATE, PALE, BATE, SPOT, POTS, LION, LIONS, STEAK
Answers for today’s Strands puzzle
These are the answers that tie into the theme. The goal of the puzzle is to find them all, including the spangram, a theme word that reaches from one side of the puzzle to the other. When you have all of them (I originally thought there were always eight but learned that the number can vary), every letter on the board will be used. Here are the nonspangram answers:
- CANE, POLE, POST, BATON, DOWEL, STAKE, PICKET
Today’s Strands spangram
Today’s Strands spangram is STICKYSITUATION. To find it, start with the S that’s the bottom letter in the far-left column, and wind straight up, one over and then straight down.
Every iPhone 17E Rumor and Leak That I Found: Dynamic Island, MagSafe and More
Apple’s reportedly releasing a lower-priced iPhone 17, and it might offer notable improvements over last year’s iPhone 16E.
Key Takeaways:
- Features: Apple might include MagSafe on the iPhone 17E.
- Release date: Possibly as soon as February.
- Price: There have been no leaks about price increases, which is good news at this point.
- Design: Could get the Dynamic Island and look more like an iPhone 15.
Apple might be continuing its lower-cost iPhone line, with an iPhone 17E reportedly releasing early this year. If that’s true, the sequel to last year’s iPhone 16E has a lot of room to step up.
Some rumors point to improvements borrowed from Apple’s iPhone 15, such as Dynamic Island and MagSafe. If these are true, it could make the lower-cost iPhone 17E a compelling value option with fewer trade-offs needed to hit a lower price.
Apple’s $599 iPhone 16E was a bit of an oddity when it was released last year. It replaced Apple’s $429 iPhone SE, effectively retiring the older iPhone SE design that included a home button with Touch ID. Apple’s new “budget” device was a pricier amalgamation, featuring the body of an iPhone 14 with a display notch. It also had the USB-C port from the iPhone 15 and the A18 processor from the iPhone 16 to support Apple Intelligence features.
To save money, Apple scaled back on features by including only a single 48-megapixel main camera and omitting Apple’s MagSafe clip-on capability (though it kept standard wireless charging). While the iPhone 16E is a solid starter iPhone, I found these omissions to be confusing, especially given that Apple increased the price of this entry-level iPhone from $429 to $599.
An iPhone 17E could follow a playbook closer to Samsung’s Galaxy S25 FE. It would have many of the same features as the iPhone 16 and iPhone 17, like the smaller screen notch and an A19 processor, along with smaller stepbacks to the hardware that might be less noticeable.
Apple hasn’t confirmed whether an iPhone 17E exists yet, but we’re keeping an eye out. Here are the rumors we’ve heard so far, with features that could help or hinder the more budget-friendly iPhone 17E.
iPhone 17E release date: February 2026
The iPhone 17E could be announced as early as February, according to a Mashable report citing the Digital Chat Station Weibo account. The phone is said to be launching in the first half of the year. This would align with the iPhone 16E’s February 2025 announcement, establishing winter as Apple’s preferred launch window for cheaper iPhone models.
There are even rumors suggesting the base iPhone 18 will launch in the first half of 2027, but let’s not get too ahead of ourselves.
iPhone 17E design: Gets a Dynamic Island
One aspect that made the iPhone 16E stand out was Apple’s new design, which featured the iPhone 14’s body, a USB-C port and a single camera.
The iPhone 17E, however, will allegedly look more like 2023’s iPhone 15, with a smaller Dynamic Island cutout, according to the same Digital Chat Station Weibo post. The iPhone 17E is rumored to have a 6.1-inch display with a cutout, including dynamically sized notifications for timers and app alerts, such as Uber pickups.
This design is corroborated by the Smart Pikachu Weibo account, which also notes that the iPhone 17E will have a 60Hz refresh rate screen rather than the 120Hz one seen across the iPhone 17 line and the iPhone Air. It’d be nice to see a 17E with a 120Hz display, dubbed ProMotion by Apple. But this is one area that could be less noticeable to people coming from a former iPhone SE or an older base model like the iPhone 14.
While Apple’s ProMotion displays have been available on Pro models for years — as well as on almost every Android phone that costs $300 and more — the smoother animations and always-on displays it provides won’t be as noticeable when switching from a phone that never had them.
iPhone 17E features: MagSafe wireless charging
It baffled me that Apple didn’t include MagSafe with last year’s iPhone 16E. The feature, which lets magnetic accessories like chargers and wallets attach to the phone without a case, has been on most iPhone models since 2020. It felt like a strange omission, since Apple contributed MagSafe’s charging and magnetic profiles to the Qi2 standard, both of which are on Google’s Pixel 10 phones, HMD’s Skyline, and the upcoming Clicks Communicator.
The iPhone 17E is rumored to have a glass back that supports magnetic wireless charging — likely meaning the phone would gain the ability to magnetically attach to MagSafe and Qi2 accessories, according to a report in The Information spotted by 9to5Mac. This would be a major improvement for someone coming to this phone from an iPhone SE or the iPhone 11, both of which do support Qi wireless charging but do not include magnets for attaching accessories and cases.
While we would need more details, hopefully the inclusion of MagSafe also means the iPhone 17E’s wireless charging speed would increase to at least 15 watts, matching the iPhone 15.
iPhone 17E pricing
We’ll keep updating this story as more iPhone 17E rumors arrive. While there isn’t much regarding the pricing of the rumored phone, last year’s iPhone 16E starts at $599 for a 128GB model. I’m hoping the iPhone 17E starts at 256GB of storage, like the base iPhone 17. Apple still sells both the 16E and the iPhone 16 at 128GB, with the latter starting at $699.
