
New Study Challenges the Productivity Hype of AI Coding Tools for Experienced Developers
Artificial intelligence continues to reshape the modern software engineering landscape, with tools like GitHub Copilot and Cursor promising to accelerate development, debug code, and streamline testing. Backed by powerful AI models from OpenAI, Google DeepMind, Anthropic, and xAI, these tools have become increasingly prevalent in the workflows of both novice and seasoned developers across the United States and beyond.
However, a newly published study from the non-profit AI research group METR casts doubt on the assumption that these AI-driven coding assistants reliably boost productivity, particularly for experienced developers working in real-world, large-scale codebases.
In one of the first randomized controlled trials of its kind, METR recruited 16 veteran open-source developers and assigned them 246 real tasks drawn from repositories they regularly contribute to. For half of the tasks, developers were allowed to use advanced AI tools such as Cursor Pro; for the other half, AI assistance was prohibited.
Developers initially predicted that AI tools would reduce their completion time by 24%. In reality, the study found the opposite: tasks completed with AI assistance took 19% longer on average.
The unexpected slowdown raises critical questions about the practicality of "vibe coding," the increasingly popular practice of leaning on context-driven AI suggestions rather than structured engineering judgment. According to METR, developers using AI tools spent more of their time crafting prompts, waiting for responses, and reviewing AI output, and correspondingly less time actually writing code. Moreover, the AI struggled to navigate the intricacies of large, complex repositories, an area where human expertise still clearly holds the edge.
It’s important to note that only 56% of the participants had prior hands-on experience with Cursor, though nearly all had used web-based LLMs in some capacity before. The researchers provided training on Cursor before the trial, yet the learning curve and unfamiliarity with specific tooling may have contributed to the delays.
While these findings challenge the prevailing narrative of AI as a panacea for developer productivity, METR’s researchers caution against broad generalizations. They acknowledge that AI models are advancing at a rapid pace, and results could vary significantly just months from now. Furthermore, previous large-scale industry studies have shown measurable improvements in speed and output among less experienced coders or in more controlled environments.
Nonetheless, METR’s research underscores the need for a more nuanced understanding of how and when AI enhances software development—especially within professional and enterprise-level projects. It also adds to a growing body of evidence suggesting that while AI can be a powerful assistant, it’s not without its drawbacks. Misleading outputs, security vulnerabilities, and over-reliance on AI-generated code remain ongoing concerns.
As the U.S. continues to lead the global charge in AI integration across industries, the findings serve as a reminder that technological hype should be matched with empirical rigor and critical evaluation.
As AI coding tools become more embedded in development pipelines, will the future of engineering be defined by human intuition guiding machines—or by machines reshaping how humans think and build altogether?