A new AI coding challenge has revealed its first winner, and set a new bar for AI-powered software engineering.
On Wednesday at 5 p.m. PT, the nonprofit Laude Institute announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The winner was a Brazilian prompt engineer named Eduardo Rocha de Andrade, who will receive $50,000 for the prize. But more surprising than the win was his final score: he won with correct answers to just 7.5% of the questions on the test.
“We're building a benchmark that is actually hard,” Konwinski said. “Benchmarks should be hard if they're going to matter,” he continued, adding: “Scores would be different if the big labs had entered with their biggest models. But that's kind of the point. The K Prize runs with limited compute. It levels the playing field.”
Konwinski has pledged $1 million to the first open-source model that can score above 90% on the test.
Similar to the well-known SWE-Bench system, the K Prize tests models against flagged GitHub issues to measure how well they can handle real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a “contamination-free version of SWE-Bench.” For the first round, submissions were due by March 12. The K Prize organizers then built the test using only GitHub issues flagged after that date.
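The contamination-free design comes down to a date filter: an issue opened after the submission deadline cannot have appeared in any entrant's training data. Here is a minimal sketch of that idea, assuming the public GitHub search API (the endpoint and query syntax are real, but the repository name and helper function are hypothetical, and this is not the K Prize's actual collection pipeline):

```python
# Sketch: collect GitHub issues created only after a submission cutoff,
# so no model submitted before the deadline could have seen them.
import requests

CUTOFF = "2025-03-12"  # the first round's submission deadline

def fetch_fresh_issues(repo: str, cutoff: str = CUTOFF) -> list[dict]:
    """Return issues in `repo` created strictly after `cutoff`."""
    resp = requests.get(
        "https://api.github.com/search/issues",
        params={"q": f"repo:{repo} is:issue created:>{cutoff}"},
        headers={"Accept": "application/vnd.github+json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["items"]

# Usage with a hypothetical repository:
issues = fetch_fresh_issues("example-org/example-repo")
print(f"{len(issues)} candidate test problems created after {CUTOFF}")
```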
The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a 75% top score on its easier “Verified” test and 34% on its harder “Full” test. Konwinski still isn't sure whether the disparity is due to contamination on SWE-Bench or simply to the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer the question soon.
“As we get more runs of the thing, we'll have a better sense,” he told TechCrunch, “because we expect people to adapt to the dynamics of competing on this every few months.”
It may seem like an odd place to fall short, given the wide range of AI coding tools already publicly available, but with benchmarks becoming too easy, many critics see projects like the K Prize as a necessary step toward solving AI's growing evaluation problem.
“I'm quite optimistic about building new tests for existing benchmarks,” said Princeton researcher Sayash Kapoor, who put forward a similar idea in a recent paper. “Without such experiments, we can't actually tell whether the problem is contamination, or simply gaming the SWE-Bench leaderboard with a human in the loop.”
For Konwinski, it's not just a better benchmark, but an open challenge to the rest of the industry. “If you listen to the hype, it's like we should be seeing AI doctors and AI lawyers and AI software engineers, and that's just not true,” he says. “If we can't even get more than 10% on a contamination-free SWE-Bench, that's the reality check for me.”


