Stay Current on Political News—The US Future

A new AI coding challenge just published its first results — and they aren’t pretty

Sarah Mitchell
Published July 24, 2025

A new AI coding challenge has revealed its first winner, and set a new bar for AI-powered software engineers.

On Wednesday at 5 p.m. PT, the nonprofit Laude Institute announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The winner was a Brazilian prompt engineer named Eduardo Rocha de Andrade, who will receive $50,000 for the prize. But more surprising than the win was his final score: he won with correct answers to only 7.5% of the questions on the test.

“We’re glad we built a benchmark that is actually hard,” Konwinski said. “Benchmarks should be hard if they’re going to matter,” he continued, adding: “Scores would be different if the big labs had entered with their biggest models. But that’s kind of the point. The K Prize runs on limited compute. It levels the playing field.”

Konwinski has pledged $1 million to the first open source model that can score higher than 90% on the test.

Similar to the well-known SWE-Bench system, the K Prize tests models against flagged issues from GitHub as a measure of how well models can deal with real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a “contamination-free version of SWE-Bench.” For the first round, models were due by March 12. The K Prize organizers then built the test using only GitHub issues flagged after that date.
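
The timed-entry idea can be sketched in a few lines of Python. This is only an illustration of the principle, not the organizers' actual pipeline: the `Issue` record, the `select_uncontaminated` helper, the sample repositories and issue numbers, and the year 2025 for the March 12 deadline are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Issue:
    """Hypothetical record for a flagged GitHub issue."""
    repo: str
    number: int
    flagged_on: date


def select_uncontaminated(issues, submission_deadline):
    """Keep only issues flagged strictly after the model submission deadline.

    A model frozen at the deadline cannot have trained on problems
    that did not exist yet.
    """
    return [issue for issue in issues if issue.flagged_on > submission_deadline]


# Assumed deadline: the article gives March 12 without a year.
DEADLINE = date(2025, 3, 12)

# Made-up sample data, for illustration only.
SAMPLE_ISSUES = [
    Issue("example/repo-a", 101, date(2025, 3, 1)),   # before the deadline
    Issue("example/repo-a", 104, date(2025, 3, 12)),  # on the deadline
    Issue("example/repo-a", 102, date(2025, 3, 13)),  # after the deadline
    Issue("example/repo-b", 103, date(2025, 4, 2)),   # after the deadline
]

if __name__ == "__main__":
    for issue in select_uncontaminated(SAMPLE_ISSUES, DEADLINE):
        print(issue.repo, issue.number, issue.flagged_on.isoformat())
```

Using a strict "after" comparison means an issue flagged on the deadline day itself is excluded, which is the cautious reading of "flagged after that date."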

The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a 75% top score on its easier “Verified” test and 34% on its harder “Full” test. Konwinski still isn’t sure whether the disparity is due to contamination on SWE-Bench or simply to the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer the question soon.

“As we get more runs of the thing, we’ll have a better sense,” he told TechCrunch, “because we expect people to adapt to the dynamics of competing on this every few months.”

It may seem like an odd place to fall short, given the wide range of AI coding tools already publicly available, but with benchmarks becoming too easy, many critics see projects like the K Prize as a necessary step toward solving AI’s growing evaluation problem.

“I’m quite bullish about building new tests for existing benchmarks,” says Princeton researcher Sayash Kapoor, who put forward a similar idea in a recent paper. “Without such experiments, we can’t actually tell if the issue is contamination, or even just targeting the SWE-Bench leaderboard with a human in the loop.”

For Konwinski, it’s not just a better benchmark, but an open challenge to the rest of the industry. “If you listen to the hype, it’s like we should be seeing AI doctors and AI lawyers and AI software engineers, and that’s just not true,” he says. “If we can’t even get more than 10% on a contamination-free SWE-Bench, that’s the reality check for me.”
