Will Smith eating spaghetti and other weird AI benchmarks that took off in 2024

When a company releases a new AI video generator, it’s not long before someone uses it to make a video of actor Will Smith eating spaghetti.

It’s become something of a meme as well as a benchmark: Seeing whether a new video generator can realistically render Smith slurping down a bowl of noodles. Smith himself parodied the trend in an Instagram post in February.

Google Veo 2 has done it.

We are now eating spaghett at last. pic.twitter.com/AZO81w8JC0

— Jerrod Lew (@jerrod_lew) December 17, 2024

Will Smith and pasta is but one of several bizarre “unofficial” benchmarks to take the AI community by storm in 2024. A 16-year-old developer built an app that gives AI control over Minecraft and tests its ability to design structures. Elsewhere, a British programmer created a platform where AI models play games like Pictionary and Connect 4 against each other.

It’s not like there aren’t more academic tests of an AI’s performance. So why did the weirder ones blow up?

Image Credits: Paul Calcraft

For one, many of the industry-standard AI benchmarks don’t tell the average person very much. Companies often cite their AI’s ability to answer questions on Math Olympiad exams, or figure out plausible solutions to Ph.D.-level problems. Yet most people, yours truly included, use chatbots for things like responding to emails and basic research.

Crowdsourced industry measures aren’t necessarily better or more informative.

Take, for example, Chatbot Arena, a public benchmark many AI enthusiasts and developers follow obsessively. Chatbot Arena lets anyone on the web rate how well AI performs on particular tasks, like creating a web app or generating an image. But raters tend not to be representative (most come from AI and tech industry circles) and cast their votes based on personal, hard-to-pin-down preferences.

The Chatbot Arena interface. Image Credits: LMSYS

Ethan Mollick, a professor of management at Wharton, recently pointed out in a post on X another problem with many AI industry benchmarks: they don’t compare a system’s performance to that of the average person.

“The fact that there are not 30 different benchmarks from different organizations in medicine, in law, in advice quality, and so on is a real shame, as people are using systems for these things, regardless,” Mollick wrote.

Weird AI benchmarks like Connect 4, Minecraft, and Will Smith eating spaghetti are most certainly not empirical, or even all that generalizable. Just because an AI nails the Will Smith test doesn’t mean it’ll generate, say, a burger well.

Mcbench. Note the typo; there’s no such model as Claude 3.6 Sonnet. Image Credits: Adonis Singh

One expert I spoke to about AI benchmarks suggested that the AI community focus on the downstream impacts of AI instead of its ability in narrow domains. That’s sensible. But I have a feeling that weird benchmarks aren’t going away anytime soon. Not only are they entertaining (who doesn’t like watching AI build Minecraft castles?) but they’re easy to understand. And as my colleague Max Zeff wrote about recently, the industry continues to grapple with distilling a technology as complex as AI into digestible marketing.

The only question in my mind is, which odd new benchmarks will go viral in 2025?
