Anthropic releases Claude 3.5 Sonnet. 3 things to know about

Anthropic has a new generative AI model to rival OpenAI’s GPT-4o with intelligence, speed, and vision capabilities.

On Thursday, the AI company that touts itself as the ethical and responsible alternative to OpenAI announced Claude 3.5 Sonnet. Within Anthropic’s family of models, Claude Sonnet is the middle child that combines speed and performance for most everyday tasks. By comparison, Claude Haiku is the lightest and fastest model, and Claude Opus is the industrial-strength model for complex math and coding tasks.

Claude 3.5 Sonnet is a more advanced version of Claude 3 Sonnet, and the company claims it surpasses Claude 3 Opus in intelligence.

Claude 3.5 Sonnet (marginally) beats GPT-4o on several benchmarks

The benchmark comparison has become commonplace for every new AI model release. Whether it’s Google Gemini, OpenAI’s GPT-4o, or Meta’s Llama 3, what the public really wants to know is how they compare to their rivals on the standard evaluation tests.

In Anthropic’s testing, Claude 3.5 Sonnet outperforms GPT-4o, Gemini 1.5 Pro, and Llama in several key categories like reasoning and coding. It also beat GPT-4o in graduate-level reasoning and equaled it in undergraduate-level knowledge. That’s not nothing, but on most benchmarks Claude 3.5 Sonnet beats its rivals by only a few percentage points. So the average user might not notice a difference when handling everyday tasks.

As leading AI scientist and professor Gary Marcus notes, the computational gains have slowed lately. “The field spent over $50B last year trying to decisively beat GPT-4, but so far what [I] see evidence for is convergence, rather than continued exponential growth.” Setting aside the implication that AGI might not be as close as we think, this means Claude 3.5 Sonnet will probably seem pretty similar to other advanced models out there.

Claude 3.5 Sonnet has vision capabilities with varying degrees of access

Claude 3.5 Sonnet is Anthropic’s first free version to have vision capabilities. Like its competitor GPT-4o, which came out in May, Anthropic’s latest model can interpret charts and graphs, transcribe text from images, and generally understand visuals and images. A demo in the announcement shows Claude 3.5 Sonnet transcribing data from genome sequencing milestones and a graph of costs over time, and then combining the data into one chart. Next, it puts together a slideshow presentation for a genomics class.

Anthropic says it includes vision capabilities as a feature for the free version of Claude 3.5 Sonnet. But the free version has a window limit that depends on daily usage and capacity. When we tried uploading a screenshot of an image on Facebook, we were told that the limit was exceeded even though it was below the file size maximum. This could be a bug or due to high demand during certain times of day. But just like ChatGPT, 20 bucks a month will get you the Pro version with priority bandwidth and availability.

Claude 3.5 Sonnet doesn’t generate images

Claude 3.5 Sonnet can understand and interpret uploaded images (more successfully if you’re paying for the Pro version), but it can’t generate images. Unlike OpenAI, which has DALL-E 3, Anthropic doesn’t currently have an AI image generator. This might be because of Anthropic’s more cautious approach to deploying generative AI. And AI-generated images take companies into a particularly risky realm when it comes to misuse of the technology.

“Detecting and mitigating prohibited uses of our technology are essential to preventing bad actors from misusing our models to generate abusive, deceptive, or misleading content,” said Anthropic, describing its approach in the white paper announcing the Claude 3 model family. “User prompts that are flagged as violating the [Acceptable Use Policy] trigger an instruction to our models to respond even more cautiously.”

Despite this drawback, users are praising the model for its speed and coding abilities. So there’s still enough wow factor to go around.
