Technologies

OpenAI Yanked a ChatGPT Update. Here’s What It Said and Why It Matters

The company says it plans to be more careful when releasing updates in the future.

Recent updates to ChatGPT made the chatbot far too agreeable, and OpenAI said Friday it’s taking steps to prevent the issue from happening again.

In a blog post, the company detailed its testing and evaluation process for new models and outlined how the problem with the April 25 update to its GPT-4o model came to be. Essentially, a bunch of changes that individually seemed helpful combined to create a tool that was far too sycophantic and potentially harmful.

How much of a suck-up was it? In testing earlier this week, we asked about a tendency to be overly sentimental, and ChatGPT laid on the flattery: “Hey, listen up — being sentimental isn’t a weakness; it’s one of your superpowers.” And it was just getting started with the fulsome praise.

“This launch taught us a number of lessons. Even with what we thought were all the right ingredients in place (A/B tests, offline evals, expert reviews), we still missed this important issue,” the company said.

OpenAI rolled back the update this week. To avoid causing new issues, the company took about 24 hours to revert the model for everybody.

The concern around sycophancy isn’t just about the enjoyment level of the user experience. It posed a health and safety threat to users that OpenAI’s existing safety checks missed. Any AI model can give questionable advice about topics like mental health, but one that is overly flattering can be dangerously deferential or convincing, for example about whether an investment is a sure thing or how thin you should seek to be.

“One of the biggest lessons is fully recognizing how people have started to use ChatGPT for deeply personal advice — something we didn’t see as much even a year ago,” OpenAI said. “At the time, this wasn’t a primary focus but as AI and society have co-evolved, it’s become clear that we need to treat this use case with great care.”

Sycophantic large language models can reinforce biases and harden beliefs, whether they’re about yourself or others, said Maarten Sap, assistant professor of computer science at Carnegie Mellon University. “[The LLM] can end up emboldening their opinions if these opinions are harmful or if they want to take actions that are harmful to themselves or others.”

(Disclosure: Ziff Davis, CNET’s parent company, in April filed a lawsuit against OpenAI, alleging it infringed on Ziff Davis copyrights in training and operating its AI systems.)

How OpenAI tests models and what’s changing

The company offered some insight into how it tests its models and updates. This was the fifth major update to GPT-4o focused on personality and helpfulness. The changes involved new post-training work, or fine-tuning of the existing models, including rating and evaluating various responses to prompts so that the model becomes more likely to produce the responses that rated more highly.

Prospective model updates are evaluated on their usefulness across a variety of situations, like coding and math, along with specific tests by experts to see how each update behaves in practice. The company also runs safety evaluations to see how the model responds to safety, health and other potentially dangerous queries. Finally, OpenAI runs A/B tests with a small number of users to see how the update performs in the real world.

The April 25 update performed well in these tests, but some expert testers indicated the personality seemed a bit off. The tests didn’t specifically look at sycophancy, and OpenAI decided to move forward despite the issues raised by testers. Take note, readers: AI companies are in a tail-on-fire hurry, which doesn’t always square with well-thought-out product development.

“Looking back, the qualitative assessments were hinting at something important and we should’ve paid closer attention,” the company said.

Among its takeaways, OpenAI said it needs to treat model behavior issues the same as it would other safety issues, and halt a launch if there are concerns. For some model releases, the company said it would have an opt-in “alpha” phase to get more feedback from users before a broader launch.
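
As a rough illustration of that takeaway, a release gate could treat a qualitative behavior concern as blocking even when every quantitative check passes. The sketch below is hypothetical: the `Candidate` fields and the `should_launch` function are invented for illustration and do not describe OpenAI’s actual tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical summary of one model update's pre-launch results."""
    offline_evals_pass: bool
    ab_test_pass: bool
    safety_evals_pass: bool
    # Free-text concerns raised by expert testers.
    behavior_concerns: list[str] = field(default_factory=list)


def should_launch(c: Candidate) -> tuple[bool, str]:
    """Return (decision, reason). Any behavior concern blocks the launch."""
    if not (c.offline_evals_pass and c.ab_test_pass and c.safety_evals_pass):
        return False, "a quantitative or safety check failed"
    if c.behavior_concerns:
        return False, "blocked on behavior concerns: " + "; ".join(c.behavior_concerns)
    return True, "all checks passed and no concerns were raised"


# Mirrors the situation described above: the metrics pass,
# but testers flag the personality.
update = Candidate(True, True, True, ["personality seems a bit off"])
print(should_launch(update))
```

Under this assumed policy, the flagged update would be held back until the concern is resolved, rather than shipped on the strength of its metrics.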

Sap said evaluating an LLM based on whether a user likes the response isn’t necessarily going to get you the most honest chatbot. In a recent study, Sap and others found a conflict between the usefulness and truthfulness of a chatbot. He compared it to situations where the truth is not necessarily what people want — think about a car salesperson trying to sell a vehicle.

“The issue here is that they were trusting the users’ thumbs-up/thumbs-down response to the model’s outputs and that has some limitations because people are likely to upvote something that is more sycophantic than others,” he said.

Sap said OpenAI is right to be more critical of quantitative feedback, such as user up/down responses, because such signals can reinforce biases.
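
The limitation Sap describes can be made concrete with a toy example. The numbers below are invented purely for illustration (they are not from OpenAI or from Sap’s study): each candidate reply gets a hypothetical accuracy score and a hypothetical thumbs-up rate, and ranking by thumbs-up rate alone selects the flattering reply.

```python
# Hypothetical candidates: (label, accuracy score, thumbs-up rate).
# All values are made up for illustration, not measured data.
CANDIDATES = [
    ("candid", 0.9, 0.55),
    ("balanced", 0.8, 0.65),
    ("flattering", 0.4, 0.80),
]


def pick_by_thumbs_up(candidates):
    """Choose the reply users are most likely to upvote."""
    return max(candidates, key=lambda c: c[2])[0]


def pick_with_accuracy_floor(candidates, min_accuracy):
    """Choose the most-upvoted reply among those meeting an accuracy floor.

    Assumes at least one candidate meets the floor.
    """
    eligible = [c for c in candidates if c[1] >= min_accuracy]
    return max(eligible, key=lambda c: c[2])[0]


print(pick_by_thumbs_up(CANDIDATES))              # flattering
print(pick_with_accuracy_floor(CANDIDATES, 0.7))  # balanced
```

The accuracy floor is only one possible guard, chosen here for simplicity; the point is that a signal based on user approval alone says nothing about whether the reply is honest.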

The issue also highlighted the speed at which companies push updates and changes out to existing users, Sap said, an issue that’s not limited to one tech company. “The tech industry has really taken a ‘release it and every user is a beta tester’ approach to things,” he said. Having a process with more testing before updates are pushed to every user can bring these issues to light before they become widespread.

Google races to put Gemini at the center of Android before Apple’s AI reboot

Google is using its latest Android rollout to position Gemini as the AI layer across phones, Chrome, laptops and cars.

Google is using its latest Android rollout to make Gemini less of a chatbot and more of an operating layer across the phone, browser, car and laptop, just weeks before Apple is expected to show its own Gemini-powered Apple Intelligence reboot at WWDC.
Ahead of its Google I/O developer conference next week, the company previewed a number of Android updates, including AI-powered app automation, a smarter version of Chrome on Android, new tools for creators, a redesigned Android Auto experience, and a sweeping set of new security features.
Alphabet is counting on Gemini to help Google compete directly with OpenAI and Anthropic in the market for artificial intelligence models and services, while also serving as the AI backbone across its expansive portfolio of products, including Android. Meanwhile, Gemini is powering part of Apple’s new AI strategy, giving Google a role in the iPhone maker’s reset even as it races to prove its own version of personal AI on the phone is further along.
Sameer Samat, who oversees Google’s Android ecosystem, told CNBC that Google is rebuilding parts of Android around Gemini Intelligence to help users complete everyday tasks more easily.
“We’re transitioning from an operating system to an intelligence system,” he said.
As part of Tuesday’s announcements, Google said Gemini Intelligence will be able to move across apps, understand what’s on the screen and complete tasks that would normally require a user to jump between multiple services. That means Android is moving beyond the traditional assistant model, where users ask a question and get an answer, and acting more like an agent.
For instance, Google says Gemini can pull relevant information from Gmail, build shopping carts and book reservations. Samat gave the example of asking Gemini to look at the guest list for a barbecue, build a menu, add ingredients to an Instacart list and return for approval before checkout.
A big concern surrounding agentic AI involves software taking action on a user’s behalf without permission. Samat said Gemini will come back to the user before completing a transaction, adding, “the human is always in the loop.”
Four months after announcing its Gemini deal with Google, Apple is under pressure to show a more capable version of Apple Intelligence, which has been a relative laggard in the market. Apple has long framed privacy, hardware integration and control of the user experience as its advantages.
Google’s Android push is designed to show it can bring AI deeper into the device experience while still giving users control over what Gemini can see, where it can act and when it needs confirmation.
The app automation features will roll out in waves, starting with the latest Samsung Galaxy and Google Pixel phones this summer, before expanding across more Android devices, including watches, cars, glasses and laptops later this year.
The company is also redesigning Android Auto around Gemini, turning the car into another major surface for its assistant. Android Auto is in more than 250 million cars, and Google says the new release includes its biggest maps update in a decade and Gemini-powered help with tasks like ordering dinner while driving.
Alphabet’s AI strategy has been embraced by Wall Street, which has pushed the company’s stock price up more than 140% in the past year, compared to Apple’s roughly 40% gain. Investors now want to see how Gemini can become more central to the products people use every day.

Waymo recalls 3,800 robotaxis after glitch allowed some vehicles to ‘drive into standing water’

Waymo issued a voluntary recall of about 3,800 of its robotaxis to fix software issues that could allow them to drive into flooded roadways.

Waymo is recalling about 3,800 robotaxis in the U.S. to fix software issues that could allow them to “drive onto a flooded roadway,” according to a letter on the National Highway Traffic Safety Administration’s website.
The voluntary recall is for Waymo vehicles that use the company’s fifth and sixth generation automated driving systems (or ADS), the U.S. auto safety regulator said in the letter posted Tuesday.
Waymo autonomous vehicles in Austin, Texas, were seen on camera driving onto a flooded street and stalling, requiring other drivers to navigate around them. It’s the latest example of a safety-related issue for the Alphabet-owned AV unit that’s rapidly bolstering its fleet of vehicles and entering new U.S. markets.
Waymo has drawn criticism for its vehicles failing to yield to school buses in Austin, and for the performance of its vehicles during widespread power outages in San Francisco in December, when robotaxis halted in traffic, causing gridlock.
The company said in a statement on Tuesday that it’s “identified an area of improvement regarding untraversable flooded lanes specific to higher-speed roadways,” and opted to file a “voluntary software recall” with the NHTSA.
“Waymo provides over half a million trips every week in some of the most challenging driving environments across the U.S., and safety is our primary priority,” the company said.
Waymo added that it’s working on “additional software safeguards” and has put “mitigations” in place, limiting where its robotaxis operate during extreme weather, so that they avoid “areas where flash flooding might occur” in periods of intense rain.

Qualcomm tumbles 13% as semiconductor stocks retreat from historic AI-fueled surge

Semiconductor equities reversed sharply after a broad AI-driven advance, with Qualcomm suffering its worst day since 2020 amid inflation concerns and rising oil prices.

Semiconductor stocks fell sharply on Tuesday, reversing course after an extensive rally that had expanded the artificial intelligence investment theme well past Nvidia and driven the industry to unprecedented levels.

Qualcomm plunged 13% and was on track for its steepest single-day decline since 2020. Intel shed 8%, while On Semiconductor and Skyworks Solutions each lost more than 6%. The iShares Semiconductor ETF, which tracks the overall sector, fell 5%.

The sell-off came after a key gauge of consumer prices came in above forecasts, and as conflict in Iran pushed crude oil higher—prompting investors to shift away from riskier assets.

The preceding advance had widened the AI opportunity set beyond longtime industry leader Nvidia, which for much of the past several years had largely carried the market to new peaks on its own.

Explosive appetite for central processing units, along with the graphics processing units that power large language models, has sent chipmakers to all-time highs.

Market participants are wagering that the shift from AI model training to autonomous agents will lift demand for additional AI hardware. Among the beneficiaries are memory chip producers, which are raising prices as supply remains tight.

Micron Technology slid 6%, and Sandisk cratered 8%. Sandisk’s stock has risen more than sixfold since January.