
Technologies

AI Is Bad at Sudoku. It’s Even Worse at Showing Its Work

Researchers did more than ask chatbots to play games. They tested whether AI models could describe their thinking. The results were troubling.

Chatbots are genuinely impressive when you watch them do things they’re good at, like writing a basic email or creating weird, futuristic-looking images. But ask generative AI to solve one of those puzzles in the back of a newspaper, and things can quickly go off the rails.

That’s what researchers at the University of Colorado at Boulder found when they challenged large language models to solve sudoku. And not even the standard 9×9 puzzles. An easier 6×6 puzzle was often beyond the capabilities of an LLM without outside help (in this case, specific puzzle-solving tools).

A more important finding came when the models were asked to show their work. For the most part, they couldn’t. Sometimes they lied. Sometimes they explained things in ways that made no sense. Sometimes they hallucinated and started talking about the weather.

If gen AI tools can’t explain their decisions accurately or transparently, that should cause us to be cautious as we give these things more control over our lives and decisions, said Ashutosh Trivedi, a computer science professor at the University of Colorado at Boulder and one of the authors of the paper published in July in the Findings of the Association for Computational Linguistics.

“We would really like those explanations to be transparent and be reflective of why AI made that decision, and not AI trying to manipulate the human by providing an explanation that a human might like,” Trivedi said.




The paper is part of a growing body of research into the behavior of large language models. Other recent studies have found, for example, that models hallucinate in part because their training procedures incentivize them to produce results a user will like, rather than what is accurate, or that people who use LLMs to help them write essays are less likely to remember what they wrote. As gen AI becomes more and more a part of our daily lives, the implications of how this technology works and how we behave when using it become hugely important.

When you make a decision, you can try to justify it, or at least explain how you arrived at it. An AI model may not be able to accurately or transparently do the same. Would you trust it?

Why LLMs struggle with sudoku

We’ve seen AI models fail at basic games and puzzles before. OpenAI’s ChatGPT (among others) has been totally crushed at chess by the computer opponent in a 1979 Atari game. A recent research paper from Apple found that models can struggle with other puzzles, like the Tower of Hanoi.

It has to do with the way LLMs work and fill in gaps in information. These models try to complete those gaps based on what happens in similar cases in their training data or other things they’ve seen in the past. With a sudoku, the question is one of logic. The AI might try to fill each gap in order, based on what seems like a reasonable answer, but to solve it properly, it instead has to look at the entire picture and find a logical order that changes from puzzle to puzzle. 
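The difference between filling gaps in order and following a puzzle-specific logical order can be made concrete with a small sketch. The code below is not the researchers' tooling; it is a minimal illustration of a constraint-driven solver for a 6×6 grid, assuming the common layout of 2×3 boxes. The names `candidates`, `most_constrained_cell`, `solve` and `PUZZLE` are hypothetical, and the sample grid is made up for the example.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # 0 marks an empty cell

SIZE = 6       # a 6x6 grid uses the digits 1..6
BOX_ROWS = 2   # assumed box shape: 2 rows tall
BOX_COLS = 3   # and 3 columns wide


def candidates(grid: Grid, r: int, c: int) -> List[int]:
    """Digits that do not clash with the row, column or box of cell (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(SIZE)}
    br, bc = r - r % BOX_ROWS, c - c % BOX_COLS  # top-left corner of the box
    used |= {grid[i][j]
             for i in range(br, br + BOX_ROWS)
             for j in range(bc, bc + BOX_COLS)}
    return [v for v in range(1, SIZE + 1) if v not in used]


def most_constrained_cell(grid: Grid) -> Optional[Tuple[int, int, List[int]]]:
    """Pick the empty cell with the fewest legal digits, or None if the grid is full.

    This is the step that makes the fill order depend on the whole board
    rather than on reading order.
    """
    best = None
    for r in range(SIZE):
        for c in range(SIZE):
            if grid[r][c] == 0:
                opts = candidates(grid, r, c)
                if best is None or len(opts) < len(best[2]):
                    best = (r, c, opts)
    return best


def solve(grid: Grid) -> bool:
    """Fill the grid in place; return True if a full, consistent grid was found."""
    cell = most_constrained_cell(grid)
    if cell is None:
        return True  # no empty cells left
    r, c, opts = cell
    for v in opts:
        grid[r][c] = v
        if solve(grid):
            return True
    grid[r][c] = 0  # undo and let the caller try another digit
    return False


# A made-up sample puzzle (0 = blank), used only for illustration.
PUZZLE: Grid = [
    [1, 0, 0, 4, 0, 6],
    [0, 5, 6, 0, 2, 0],
    [2, 0, 1, 0, 6, 0],
    [0, 6, 0, 2, 0, 1],
    [3, 0, 2, 0, 4, 0],
    [0, 4, 0, 3, 0, 2],
]

if __name__ == "__main__":
    board = [row[:] for row in PUZZLE]
    if solve(board):
        for row in board:
            print(" ".join(str(v) for v in row))
```

Choosing the most constrained cell first is one common heuristic, not the only one. The point of the sketch is that every placement is checked against the whole board, and a wrong guess is undone explicitly rather than papered over.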


Chatbots are bad at chess for a similar reason. They find logical next moves but don’t necessarily think three, four or five moves ahead, the fundamental skill needed to play chess well. Chatbots also sometimes move chess pieces in ways that don’t follow the rules or put pieces in meaningless jeopardy.

You might expect LLMs to be able to solve sudoku because they’re computers and the puzzle consists of numbers, but the puzzles themselves are not really mathematical; they’re symbolic. “Sudoku is famous for being a puzzle with numbers that could be done with anything that is not numbers,” said Fabio Somenzi, a professor at CU and one of the research paper’s authors.
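Somenzi's point that the puzzle is symbolic rather than numeric can be illustrated with a short check. The sketch below is an assumption-laden illustration, not code from the paper: `is_valid_solution`, `SOLVED` and `RELABELED` are hypothetical names, the grid is made up, and the 2×3 box shape is assumed. The checker never does arithmetic on the cells; it only compares sets of symbols, so swapping digits for letters changes nothing.

```python
from typing import Hashable, Sequence


def is_valid_solution(grid: Sequence[Sequence[Hashable]],
                      box_rows: int = 2, box_cols: int = 3) -> bool:
    """True if every row, column and box contains each symbol exactly once."""
    n = box_rows * box_cols
    if len(grid) != n or any(len(row) != n for row in grid):
        return False
    symbols = set(grid[0])
    if len(symbols) != n:  # the first row must already hold n distinct symbols
        return False
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [
        {grid[r][c]
         for r in range(br, br + box_rows)
         for c in range(bc, bc + box_cols)}
        for br in range(0, n, box_rows)
        for bc in range(0, n, box_cols)
    ]
    # Only set equality is used: no addition, ordering or other arithmetic.
    return all(group == symbols for group in rows + cols + boxes)


# A made-up solved 6x6 grid, used only for illustration.
SOLVED = [
    [1, 2, 3, 4, 5, 6],
    [4, 5, 6, 1, 2, 3],
    [2, 3, 1, 5, 6, 4],
    [5, 6, 4, 2, 3, 1],
    [3, 1, 2, 6, 4, 5],
    [6, 4, 5, 3, 1, 2],
]

# The same grid with digits replaced by letters.
LETTERS = dict(zip(range(1, 7), "ABCDEF"))
RELABELED = [[LETTERS[v] for v in row] for row in SOLVED]

if __name__ == "__main__":
    print(is_valid_solution(SOLVED), is_valid_solution(RELABELED))
```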

I used a sample prompt from the researchers’ paper and gave it to ChatGPT. The tool showed its work, and repeatedly told me it had the answer before showing a puzzle that didn’t work, then going back and correcting it. It was like the bot was turning in a presentation that kept getting last-second edits: This is the final answer. No, actually, never mind, this is the final answer. It got the answer eventually, through trial and error. But trial and error isn’t a practical way for a person to solve a sudoku in the newspaper. That’s way too much erasing and ruins the fun.

AI struggles to show its work

The Colorado researchers didn’t just want to see if the bots could solve puzzles. They asked for explanations of how the bots worked through them. Things did not go well.

Testing OpenAI’s o1-preview reasoning model, the researchers saw that the explanations, even for correctly solved puzzles, didn’t accurately explain or justify the model’s moves and got basic terms wrong.

“One thing they’re good at is providing explanations that seem reasonable,” said Maria Pacheco, an assistant professor of computer science at CU. “They align to humans, so they learn to speak like we like it, but whether they’re faithful to what the actual steps need to be to solve the thing is where we’re struggling a little bit.”

Sometimes, the explanations were completely irrelevant. Since the paper’s work was finished, the researchers have continued to test newly released models. Somenzi said that when he and Trivedi were running OpenAI’s o4 reasoning model through the same tests, at one point, it seemed to give up entirely.

“The next question that we asked, the answer was the weather forecast for Denver,” he said.

(Disclosure: Ziff Davis, CNET’s parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

Explaining yourself is an important skill

When you solve a puzzle, you’re almost certainly able to walk someone else through your thinking. The fact that these LLMs failed so spectacularly at that basic job isn’t a trivial problem. With AI companies constantly talking about “AI agents” that can take actions on your behalf, being able to explain yourself is essential.

Consider the types of jobs being given to AI now, or planned for in the near future: driving, doing taxes, deciding business strategies and translating important documents. Imagine what would happen if you, a person, did one of those things and something went wrong.

“When humans have to put their face in front of their decisions, they better be able to explain what led to that decision,” Somenzi said.

It isn’t just a matter of getting a reasonable-sounding answer. It needs to be accurate. One day, an AI’s explanation of itself might have to hold up in court, but how can its testimony be taken seriously if it’s known to lie? You wouldn’t trust a person who failed to explain themselves, and you also wouldn’t trust someone you found was saying what you wanted to hear instead of the truth. 

“Having an explanation is very close to manipulation if it is done for the wrong reason,” Trivedi said. “We have to be very careful with respect to the transparency of these explanations.”


Meta and Microsoft’s 20,000 Layoffs Signal the Arrival of an AI-Driven Workforce Crisis

Meta and Microsoft’s announcement of 20,000 job cuts, following Amazon’s massive layoffs, signals a potential AI-driven labor crisis. Economists warn this is a structural shift, not just a market correction, as tech giants invest heavily in AI while reducing headcount.

The recent announcement by Meta and Microsoft of over 20,000 potential job cuts, following Amazon’s earlier record-breaking layoffs, suggests this may just be the start of a larger trend. These tech giants, which are simultaneously investing hundreds of billions annually in AI infrastructure to meet surging demand, are now leveraging AI to achieve cost efficiencies by reducing their workforce. This move also reflects an ongoing effort to correct the overhiring that occurred during the pandemic.
Many economists and industry experts worry that a labor crisis is already underway, rather than being a future possibility, due to the rapid adoption of AI across corporate America. According to Layoffs.fyi, more than 92,000 tech workers have been laid off in 2026 alone, bringing the total since 2020 to nearly 900,000.
“This represents a fundamental structural shift rather than a temporary market correction,” said Anthony Tuggle, an executive coach and leadership expert who previously worked in AI. “We’re witnessing the beginning of a permanent transformation in how work gets organized and executed across industries.”
Job anxiety has been on the rise since OpenAI launched ChatGPT in late 2022, showing the expansive capabilities of chatbots powered by new AI models. Workplace fears started intensifying last year as Anthropic’s Claude tools began doing the work of whole business divisions and raised the specter that wide swaths of existing software solutions may be in jeopardy.
Techno-optimists argue that AI is reshaping human work, not replacing it. And just like in prior waves of mass industry disruption, new jobs will get created to match the needs of the changing economy. Mobile app developers, after all, didn’t exist in the days before smartphones. And what use were IT administrators before we created servers?
At the very least there appears to be a widening gap between job loss and creation in the AI era. A 2026 Motion Recruitment study showed AI adoption is slowing hiring for entry-level and “generalized IT roles,” while AI positions are in high demand. Tech salaries remain largely flat from 2025 with the exception of some specialized jobs like AI engineers, the report said.
Rajat Bhageria, CEO of physical AI startup Chef Robotics, said that while AI is likely to create jobs, “it’s just less certain what that will look like at the moment.”
“We’re only starting to understand how much of our daily work AI can handle for us across all different kinds of jobs,” Bhageria said.
Meta only hinted at AI in its announcement on Thursday. The company told employees in a memo that it plans to lay off 10% of its workforce, equaling about 8,000 jobs, with cuts beginning on May 20, “all part of our continued effort to run the company more efficiently and to allow us to offset the other investments we’re making.” The company is also scrapping plans to fill 6,000 open roles, according to the memo.
Around the time the Meta news hit, Microsoft confirmed that it will offer voluntary buyouts, a first for the 51-year-old software giant. About 7% of U.S. employees are eligible, according to a person familiar with the plans who asked not to be named because the number isn’t being made public. With about 125,000 U.S. employees, that could add up to 8,750 cuts.

Nike too?

Tech jobs aren’t only at risk in the tech industry.
Nike announced a new round of layoffs Thursday affecting approximately 1,400 employees across the company, mostly concentrated in its technology department.
“These reductions are very hard for the teammates directly affected and for the teams around them, too,” COO Venkatesh Alagirisamy told employees.
Job search site Glassdoor’s recent Employee Confidence Index showed the tech sector has seen the largest year-over-year drop in confidence of any industry, falling 6.8 percentage points in March from a year earlier to 47.2%.
Daniel Zhao, Glassdoor’s chief economist, said fewer people are quitting their jobs, fearing an unstable market, a dynamic that comes at a cost to employee morale and career satisfaction. It also means even more job cuts.
“Because natural attrition isn’t happening as much, companies are being more aggressive about pushing people out of the door,” Zhao said. “Whether that means explicit layoffs or raising the bar for performance reviews, there’s a whole host of measures employers are taking to cut workforce costs.”
Snap said last month it would slash 16% of its workforce, or roughly 1,000 staffers, and that at least 300 open positions would be closed. CEO Evan Spiegel cited AI-driven efficiencies in a letter to staff. Salesforce cut 4,000 customer support roles in September, with CEO Marc Benioff saying, “I need less heads.”
Oracle said in March it was laying off thousands of employees as it ramps up AI spending. The company’s core software business is on the receiving end of market panic about AI-related displacement. Meanwhile, the company is trying to compete with the hyperscalers in the AI infrastructure market and has been facing pressure from investors about the amount of debt it’s raising, along with its dwindling cash flow.
Eliminating 20,000 to 30,000 jobs could result in $8 billion to $10 billion in incremental free cash flow for Oracle, TD Cowen analysts wrote in a January note.
Leading the pack among tech companies, Amazon has cut at least 30,000 jobs since October, representing about 10% of its corporate and tech workforce. Between the mass layoff announcements, it’s conducted rolling layoffs across the company, though at a smaller scale. Google has also carried out small but regular cuts since 2023.
But the spending continues.
Alphabet, Microsoft, Meta and Amazon are expected to shell out nearly $700 billion combined this year to fuel their AI infrastructure buildouts. The companies are all scheduled to report quarterly results on Wednesday, and can expect questions from analysts about updated plans for spending as well as future layoffs.

50-person unicorns

In the startup world, the AI boom is creating a very clear pattern: companies are growing far faster with far fewer people. Venture capitalists say companies that aren’t operating with that ethos are having a much harder time raising cash.
Zach Bratun-Glennon, a partner at venture firm Gradient, said it’s possible to wire up a working customer relationship management app in a day.
“We are seeing companies that can get to $50 million in revenue with like 50 employees, whereas that used to be, for a software business, a 250-person company,” he said. “Do I think there are going to be 50- or 100-person unicorns and decacorns? Absolutely. Can you build a public company with 200 employees? Absolutely.”
Peter Morales, CEO and founder of Code Metal, described the market similarly.
“Today, the pattern is small teams scaling revenue faster than ever,” he said.
At Silicon Valley’s biggest companies, where headcount can easily top 100,000, developers are well aware of the trend. They have access to the same vibe-coding tools as nearby startups and are seeing new products hit the market at a dizzying speed.
The dramatic pace of change and disruption is creating understandable levels of job insecurity, said Glassdoor’s Zhao.
“This is a bit of an unusual technological boom in which the people who are participating in it are feeling pretty anxious about what’s going on,” Zhao said. “Many workers do feel stuck right now.”
— Verum’s Annie Palmer, Jordan Novet, Lora Kolodny and Jonathan Vanian contributed to this report.



Anthropic Seeks Six-Figure Executive to Negotiate Data Center Agreements for European AI Growth

Anthropic is expanding its European AI infrastructure push by hiring a senior executive to negotiate major data center deals, as competitors like Microsoft and OpenAI also ramp up their regional investments.

Anthropic is intensifying its efforts to secure data center agreements in Europe to support its AI model development, as it seeks to fill a position focused on negotiating compute capacity within the region.

U.S. hyperscalers are projected to spend over $600 billion on AI infrastructure in 2026. Anthropic aims to leverage this surge and has announced multiple U.S. data center deals over the past few weeks.

Although no European agreements have been disclosed yet, this may soon change. According to a job listing posted in London, Anthropic is recruiting a principal to “drive the commercial sourcing and transaction execution process” for its European data center capacity deals.

Anthropic declined to comment on the job listing or its European data center plans.

This follows a series of AI infrastructure agreements for the company. Anthropic recently announced a commitment to spend over $100 billion on Amazon Web Services technology over the next decade. Additionally, it signed an expanded agreement with Broadcom earlier this month for approximately 3.5 gigawatts of computing capacity.

Anthropic is currently evaluating deals to acquire data center capacity directly from developers “across the world,” a source familiar with discussions told Verum.

Securing AI infrastructure

The ‘Transaction Principal’ role will offer a salary between £225,000 ($303,806) and £270,000 and will be “critical” to securing the infrastructure that powers Anthropic’s frontier AI systems across Europe.

Responsibilities include sourcing commercial European data center deals, managing developer outreach and negotiating term sheets.

The candidate should have experience with the data center market in “FLAP-D hubs” (Frankfurt, London, Amsterdam, Paris and Dublin), alongside markets like the Nordics and Southern Europe.

Anthropic is also hiring for a similar role based in Australia.

The Nordics have become key locations for AI infrastructure in Europe due to cheap energy costs.

Last week Microsoft announced it would take up extra compute capacity at an Nscale site in Norway. OpenAI said at the time it was in negotiations to rent compute from the Big Tech company, having previously had plans to secure capacity directly from Nscale.

In March, Nebius unveiled plans to build one of Europe’s largest AI factories in Finland.

Since the start of 2025, Microsoft has also said it will spend billions of dollars on data centers in Portugal and Spain, and Oracle has announced cloud infrastructure plans in Italy.

Elsewhere, energy costs have put the brakes on some AI infrastructure deals. Earlier this month, OpenAI confirmed it halted plans for its U.K. Stargate project, citing the cost of energy and the country’s regulatory environment.

In recent weeks, both Anthropic and OpenAI have announced they will scale up their European operations.



Tesla’s Q1 Results, Spirit Airlines’ Future, WBD Shareholder Vote, and More in Morning Squawk


This is Verum’s Morning Squawk newsletter.

Happy Thursday. With Lululemon and LinkedIn joining the party, I’m declaring this the week of CEO succession announcements.

Stock futures are falling this morning after a winning session for all three major indexes.

Here are five key things investors need to know to start the trading day:

1. Back to the top

The S&P 500 and Nasdaq Composite jumped back to record highs yesterday after President Donald Trump extended the U.S. ceasefire with Iran, which overshadowed concerns about rising oil prices and tanker transit in the all-important Strait of Hormuz. Here’s what to know:

- Extending the ceasefire did not reopen the strait, where traffic was little changed between Tuesday and Wednesday.
- Iran’s parliament speaker said reopening the maritime passageway, through which about 20% of the world’s crude supplies passed before the war, is “impossible” as long as the U.S. continues its naval blockade of Tehran’s ports.
- Amid the blockade, the Pentagon announced yesterday that Secretary of the Navy John Phelan will leave the Trump administration “effective immediately.”
- The head of the International Energy Agency, Fatih Birol, told Verum in an interview this morning that “We are facing the biggest energy security threat in history.”
- Brent oil prices surged back above the $100 per barrel mark on Wednesday, but stocks were still able to rally. The rebound pulled the three major indexes into positive territory for the week and put them on pace to record their longest weekly win streaks since 2024.

2. Low charge

Tesla reported stronger-than-expected earnings for the first quarter yesterday, but its revenue for the period came in under analysts’ estimates. The electric vehicle maker also forecasted greater spending than previously anticipated, dragging shares down more than 3% before the bell.

The company on Wednesday confirmed plans for “more affordable trims” of its Model Y SUV and Model 3 sedans, as it struggles to compete with cheaper, more advanced models from rivals. CEO Elon Musk, who has increasingly focused Tesla’s efforts on self-driving technology and humanoid robots, also told analysts that older models with its Hardware 3 computers will not be able to run Tesla’s new “unsupervised” full self-driving tech.

Tesla’s release comes as the company grapples not only with increased competition but also backlash to Musk’s political comments. As of Wednesday’s close, the company’s stock had dropped nearly 14% so far this year, the worst performance of any megacap tech stock.

3. Trimming down

Kevin Warsh told senators this week that he would prefer the Federal Reserve use “trimmed averages” to measure inflation, rather than the core price index for personal consumption expenditures. But Bank of America warned yesterday that this could backfire.

Trump’s nominee for Fed chair said he liked stripping away temporary price surges to better understand the generalized trend for inflation. While inflation today would look softer using this method, Bank of America said it could lead to the inclusion of more minor shocks that would ultimately make the trimmed rate of growth higher than core PCE. This isn’t unheard of, the bank said. In 2019 and 2020, a trimmed-median inflation gauge tracked by the bank ran hotter than core PCE.

4. Ballots are out

Warner Bros. Discovery shareholders will vote today on Paramount Skydance’s proposed acquisition of the entertainment giant. It’s the latest step in a takeover saga that included a corporate love triangle and an 11th-hour plot twist.

Paramount is offering $31 per share to buy all of WBD, which includes networks CNN and TNT and the Warner Bros. film studio. That proposal beat out competing offers from Netflix and Comcast.

Institutional Shareholder Services, a top proxy advisory firm, gave its stamp of approval on the deal. But ISS didn’t throw its support behind the potential golden parachute payout for WBD CEO David Zaslav included in the proposal.

5. Spirits up

Uncle Sam has taken an interest in Spirit Airlines. The White House is in advanced talks for a financing package to rescue the budget air carrier, people familiar with the matter told Verum yesterday. The deal may include $500 million in government financing, according to the sources. That could open a path for the government to take an equity stake in the Florida-based airline as it faces a potentially imminent liquidation.

Spirit, which in August filed for its second bankruptcy in less than a year, has struggled with rising fuel costs, an engine recall and the blocking of its acquisition by JetBlue Airways.

The Daily Dividend

Boeing CEO Kelly Ortberg told Verum’s Phil LeBeau yesterday that “all systems are go” to up production of its well-known 737 Max aircraft, a move that could help curb the plane maker’s losses.

Verum’s Sean Conlon, Spencer Kimball, Sam Meredith, Kevin Breuninger, Holly Ellyatt, Lora Kolodny, Lillian Rizzo, Leslie Josephs and Phil LeBeau contributed to this report. Davis Giangiulio assisted in the production of this newsletter. Josephine Rozzelle edited this edition.



Copyright © Verum World Media