Technologies
OpenAI Yanked a ChatGPT Update. Here’s What It Said and Why It Matters
The company says it plans to be more careful when releasing updates in the future.

Recent updates to ChatGPT made the chatbot far too agreeable, and OpenAI said Friday that it's taking steps to prevent the issue from happening again.
In a blog post, the company detailed its testing and evaluation process for new models and outlined how the problem with the April 25 update to its GPT-4o model came to be. Essentially, a bunch of changes that individually seemed helpful combined to create a tool that was far too sycophantic and potentially harmful.
How much of a suck-up was it? In some testing earlier this week, we asked about a tendency to be overly sentimental, and ChatGPT laid on the flattery: "Hey, listen up — being sentimental isn't a weakness; it's one of your superpowers." And it was just getting started being fulsome.
"This launch taught us a number of lessons. Even with what we thought were all the right ingredients in place (A/B tests, offline evals, expert reviews), we still missed this important issue," the company said.
OpenAI rolled back the update this week. To avoid causing new issues, it took about 24 hours to revert the model for everybody.
The concern around sycophancy isn't just about the enjoyment level of the user experience. It posed a health and safety threat that OpenAI's existing safety checks missed. Any AI model can give questionable advice about topics like mental health, but one that is overly flattering can be dangerously deferential or convincing — about whether that investment is a sure thing, say, or how thin you should seek to be.
"One of the biggest lessons is fully recognizing how people have started to use ChatGPT for deeply personal advice — something we didn't see as much even a year ago," OpenAI said. "At the time, this wasn't a primary focus, but as AI and society have co-evolved, it's become clear that we need to treat this use case with great care."
Sycophantic large language models can reinforce biases and harden beliefs, whether they're about yourself or others, said Maarten Sap, assistant professor of computer science at Carnegie Mellon University. "[The LLM] can end up emboldening their opinions if these opinions are harmful or if they want to take actions that are harmful to themselves or others."
(Disclosure: Ziff Davis, CNET’s parent company, in April filed a lawsuit against OpenAI, alleging it infringed on Ziff Davis copyrights in training and operating its AI systems.)
How OpenAI tests models and what’s changing
The company offered some insight into how it tests its models and updates. This was the fifth major update to GPT-4o focused on personality and helpfulness. The changes involved new post-training work, or fine-tuning, on the existing models, including rating and evaluating various responses to prompts so that the model becomes more likely to produce the responses that rated more highly.
Prospective model updates are evaluated on their usefulness across a variety of situations, like coding and math, along with hands-on tests by experts to see how the model behaves in practice. The company also runs safety evaluations to see how it responds to queries about safety, health and other potentially dangerous topics. Finally, OpenAI runs A/B tests with a small number of users to see how the update performs in the real world.
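To make the pipeline concrete, here is a minimal sketch of a launch gate that combines the kinds of signals OpenAI describes: offline eval scores, safety evals, A/B win rates and qualitative expert flags. This is not OpenAI's actual system; every field name and threshold below is a hypothetical stand-in, chosen only to illustrate the lesson that qualitative concerns should block a launch even when the numbers look good.

```python
# Illustrative sketch, NOT OpenAI's real pipeline. All thresholds and
# field names are hypothetical, for explanation only.
from dataclasses import dataclass, field


@dataclass
class EvalReport:
    offline_score: float          # aggregate benchmark score, 0..1
    safety_score: float           # safety-eval pass rate, 0..1
    ab_win_rate: float            # A/B preference vs. current model, 0..1
    expert_flags: list = field(default_factory=list)  # qualitative concerns


def should_launch(report: EvalReport) -> bool:
    """Block launch if any quantitative gate fails OR experts raised flags."""
    if report.offline_score < 0.9 or report.safety_score < 0.99:
        return False
    if report.ab_win_rate < 0.5:
        return False
    # The lesson from the April incident: treat qualitative concerns
    # from expert testers as blocking, just like safety failures.
    if report.expert_flags:
        return False
    return True


# The April-style failure mode: strong numbers, but testers said the
# personality "seemed a bit off" -- this gate would halt that launch.
report = EvalReport(0.93, 0.995, 0.62, ["personality seems off"])
print(should_launch(report))  # False
```

The point of the sketch is the last check: a release process that only gates on quantitative metrics would have shipped this update, while one that treats expert unease as a blocker would not.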
The April 25 update performed well in these tests, but some expert testers indicated the personality seemed a bit off. The tests didn't specifically look for sycophancy, and OpenAI decided to move forward despite the issues raised by testers. Take note, readers: AI companies are in a tail-on-fire hurry, which doesn't always square with well-thought-out product development.
"Looking back, the qualitative assessments were hinting at something important and we should've paid closer attention," the company said.
Among its takeaways, OpenAI said it needs to treat model behavior issues the same as it would other safety issues — and halt a launch if there are concerns. For some model releases, the company said it would have an opt-in "alpha" phase to get more feedback from users before a broader launch.
Sap said evaluating an LLM based on whether a user likes the response isn’t necessarily going to get you the most honest chatbot. In a recent study, Sap and others found a conflict between the usefulness and truthfulness of a chatbot. He compared it to situations where the truth is not necessarily what people want — think about a car salesperson trying to sell a vehicle.
"The issue here is that they were trusting the users' thumbs-up/thumbs-down response to the model's outputs and that has some limitations because people are likely to upvote something that is more sycophantic than others," he said.
Sap said OpenAI is right to be more critical of quantitative feedback, such as user up/down responses, as they can reinforce biases.
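Sap's point can be shown with a toy example. Suppose a model must pick among candidate response styles and is tuned purely to maximize thumbs-up rate; if users upvote flattery slightly more often, the selection drifts toward the sycophantic option even when honesty suffers. All of the numbers below are made up for illustration.

```python
# Toy illustration of feedback-driven sycophancy drift.
# The thumbs-up rates and honesty scores are hypothetical.
candidates = [
    # (response style, simulated thumbs-up rate, honesty score)
    ("blunt-honest", 0.55, 0.95),
    ("balanced",     0.60, 0.80),
    ("sycophantic",  0.70, 0.40),
]

# Optimizing only for user approval picks the flattering response...
best_by_votes = max(candidates, key=lambda c: c[1])
# ...while optimizing for honesty picks a very different one.
best_by_honesty = max(candidates, key=lambda c: c[2])

print(best_by_votes[0])    # sycophantic
print(best_by_honesty[0])  # blunt-honest
```

The gap between the two selections is exactly the usefulness-versus-truthfulness conflict Sap describes: the signal being optimized determines which trade-off wins.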
The issue also highlighted the speed at which companies push updates and changes out to existing users, Sap said — an issue that's not limited to one tech company. "The tech industry has really taken a 'release it and every user is a beta tester' approach to things," he said. Having a process with more testing before updates are pushed to every user can bring these issues to light before they become widespread.
Amazon Unveils AI-Using Warehouse Robot With Human-Like Sense of Touch
Amazon’s new Vulcan robot uses physical AI to carefully stow and pick everything from socks to fragile electronics at fulfillment centers.
Amazon's new Vulcan fulfillment center robot doesn't look humanoid, but it has some very human characteristics, like the ability to "feel" the items it's handling.
Amazon introduced Vulcan at its Delivering the Future event in Germany on May 7.
"Built on key advances in robotics, engineering, and physical AI, Vulcan is our first robot with a sense of touch," the company said in a statement. The event is a showcase for Amazon's technology innovations.
Vulcan can stow or pick items from the fabric-covered pods Amazon uses for inventory storage. It has a human-like finesse when handling objects. Force feedback sensors help the robot avoid damaging the merchandise.
A suction cup and camera system comes into play when Vulcan is pulling items out of bins.
"While the suction cup grabs it, the camera watches to make sure it took the right thing and only the right thing, avoiding what our engineers call the risk of 'co-extracting non-target items,'" Amazon said.
Vulcan is in place at fulfillment centers in Spokane, Washington, and Hamburg, Germany. It's primarily tasked with reaching items stored low, which would require a human to bend down, or stored up high, which would require an employee to use a stepladder.
The rise of robots in traditionally human-powered workplaces can be a sensitive subject. Amazon makes it clear it sees Vulcan as an assistant to its employees rather than a replacement for them.
Vulcan can handle 75% of the types of items stocked at the fulfillment centers. It's designed to know which ones it can move and which ones it needs to ask a human for help with — like a robot-human tag team.
The robot uses a physical AI system that includes "algorithms for identifying which items Vulcan can or can't handle, finding space within bins, identifying tubes of toothpaste and boxes of paper clips and much more." The AI was trained on everything from socks to electronics and continues to learn as the robot works.
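The handle-or-escalate logic Amazon describes can be sketched in a few lines. This is purely a hypothetical illustration, not Amazon's software: the item categories, the clearance margin and the return values are all invented to show the shape of the decision, where the robot stows what it can, looks for space elsewhere when a bin is full, and hands off to a human otherwise.

```python
# Hypothetical sketch of a handle-or-escalate decision; categories,
# margins and labels are invented for illustration only.
UNHANDLEABLE = {"loose-bagged", "oversized", "liquid-unsealed"}


def plan_stow(item_category: str, item_width_cm: float, bin_free_cm: float) -> str:
    """Decide whether to stow the item, try another bin, or ask a human."""
    if item_category in UNHANDLEABLE:
        return "escalate-to-human"          # the robot-human tag team
    if item_width_cm + 1.0 > bin_free_cm:   # needs ~1 cm of clearance
        return "find-another-bin"
    return "stow"


print(plan_stow("toothpaste-tube", 4.0, 10.0))  # stow
print(plan_stow("oversized", 40.0, 50.0))       # escalate-to-human
```

The interesting design choice is the first branch: rather than attempting every item and failing on some, the system classifies what it cannot handle up front and routes those tasks to people.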
Humans and robots can effectively coexist in distribution centers, said logistics and operations researchers Rene de Koster of Erasmus University in the Netherlands and Debjit Roy of the Indian Institute of Management Ahmedabad.
"Right now, at least, distribution center automation with people in the mix is often a more efficient, flexible and cost-effective bet than a completely automated center," the team said last year in a summary of their research for the Harvard Business Review.
Robots have long been part of Amazon's operations, with more than 750,000 robots deployed in its fulfillment centers, the company said.
Vulcan will roll out to more centers in Europe and the US over the next couple of years, increasing the chances of your future Amazon shipments having Vulcan's unseen "fingerprints" on them.
Xbox Handheld Console Seemingly Glimpsed in New Asus Leak
Rumors of a handheld gaming device made by Asus in collaboration with Xbox got a shot in the arm after an alleged prototype surfaced in leaked photos.
Remember those rumors about an Xbox-branded handheld gaming machine? While nothing’s official yet, things are looking a bit more concrete after a big new leak from the FCC.
On Wednesday, images surfaced online from the FCC certification of unannounced new handhelds supposedly on the way from Asus, specifically the successors to its ROG Ally handheld PC, as reported earlier by Engadget. Microsoft’s plans for an Xbox handheld were previously speculated to involve partnering with another company, and now it appears that the ROG Ally 2 could boast an Xbox-branded model, with some different hardware under the hood.
Originally launched in 2023, the Ally is a handheld gaming machine running Windows that allows PC games to be played on the go. It’s emerged as one of the main competitors to Valve’s Steam Deck, which kickstarted a new wave of interest in handheld PCs.
Based on the images circulating online, the Ally 2 appears to be a bit thicker than its predecessor, with grips on the side of the unit redesigned to more closely resemble traditional controller handles. Not much appears different with the Xbox model, aside from a branded Xbox button on the top left.
According to the leaked FCC filings, the Xbox version would run on an 8-core, 36W AMD Ryzen Z2 Extreme processor with 64GB of LPDDR5X memory, while the standard edition would use a 4-core, 20W AMD Aerith Plus chip with an unspecified amount of memory. Both models reportedly feature 7-inch, 120Hz screens.
Aside from those hardware differences, the Xbox edition of the Ally 2 is expected to be differentiated by deeper integration with features like the Xbox Game Bar and services like Game Pass. As a Windows PC, the ROG Ally is already compatible with Game Pass for PC, so it remains to be seen what that deeper integration will look like.
Xbox and Asus did not respond to requests for comment before publication.