Advanced AI chatbots are less likely to admit they don’t have all the answers

Claudio Ctin · 5 hours ago · 5 mins

Researchers have spotted an apparent downside of smarter chatbots. Although AI models predictably become more accurate as they advance, they’re also more likely to (wrongly) answer questions beyond their capabilities rather than saying, “I don’t know.” And the humans prompting them are more likely to take their confident hallucinations at face value, creating a trickle-down effect of confident misinformation.

“They are answering almost everything these days,” José Hernández-Orallo, professor at the Universitat Politecnica de Valencia, Spain, told Nature. “And that means more correct, but also more incorrect.” Hernández-Orallo, the project lead, worked on the study with his colleagues at the Valencian Research Institute for Artificial Intelligence in Spain.

The team studied three LLM families: OpenAI’s GPT series, Meta’s LLaMA and the open-source BLOOM. They tested early versions of each model and moved to larger, more advanced ones, though not today’s most advanced. For example, the team began with OpenAI’s relatively primitive GPT-3 ada model and tested iterations leading up to GPT-4, which arrived in March 2023. The four-month-old GPT-4o wasn’t included in the study, nor was the newer o1-preview. I’d be curious if the trend still holds with the latest models.

The researchers tested each model on thousands of questions about “arithmetic, anagrams, geography and science.” They also quizzed the AI models on their ability to transform information, such as alphabetizing a list. The team ranked their prompts by perceived difficulty.

The data showed that the share of questions the chatbots answered incorrectly, rather than avoided altogether, rose as the models grew. So, the AI is a bit like a professor who, as he masters more subjects, increasingly believes he has the golden answers on all of them.

Further complicating things are the humans prompting the chatbots and reading their answers. The researchers tasked volunteers with rating the accuracy of the AI bots’ answers and found that the volunteers “incorrectly classified inaccurate answers as being accurate surprisingly often.” The share of wrong answers the volunteers falsely perceived as right typically fell between 10 and 40 percent.

“Humans are not able to supervise these models,” concluded Hernández-Orallo.

The research team recommends AI developers begin boosting performance for easy questions and programming the chatbots to refuse to answer complex questions. “We need humans to understand: ‘I can use it in this area, and I shouldn’t use it in that area,’” Hernández-Orallo told Nature.

It’s a well-intended suggestion that could make sense in an ideal world. But fat chance AI companies oblige. Chatbots that more often say “I don’t know” would likely be perceived as less advanced or valuable, leading to less use — and less money for the companies making and selling them. So, instead, we get fine-print warnings that “ChatGPT can make mistakes” and “Gemini may display inaccurate info.”

That leaves it up to us to avoid believing and spreading hallucinated misinformation that could hurt ourselves or others. For accuracy, fact-check your damn chatbot’s answers, for crying out loud.

You can read the team’s full study in Nature.

This article originally appeared on Engadget at https://www.engadget.com/ai/advanced-ai-chatbots-are-less-likely-to-admit-they-dont-have-all-the-answers-172012958.html?src=rss
