Is Devin 2.0 getting much better? Is the world's first AI programmer real or fake?
In March, Devin, the "world's first AI programmer", was unveiled and immediately became a cult favourite. Devin is said to be able to plan and execute complex engineering tasks involving thousands of decisions, remember the context of each step, learn over time, and correct its own mistakes. For a while, many programmers panicked.
Recently, Andrew Gao, a former LangChain employee, broke the news about the new features of the upcoming Devin 2.0. First, an interactive mode lets the user step in and help Devin browse the web, which is very useful when it gets stuck on something like an image captcha. It is a bit slow, as the team itself admits, but it works well enough for making clicks. Second, users can now edit code alongside Devin through a web-based VS Code, addressing an earlier complaint that this was not possible.
Another update is cookie support, which lets Devin log in to a site with a user's account without being given the user's password; PhantomBuster does something similar. Devin is also adding a "machine snapshot" feature that lets users save Devin's state, so that if the server shuts down they can restart it. It also supports integration with GitHub, which allows Devin to make commits.
It should be noted that Cognition, the company behind Devin, has not yet officially released these features.
Devin has had two defining moments so far: its initial release on 13 March and, a little over two weeks later, the accusation that it was a fake.
In a recent blog post, Karl, a software engineer with over 35 years of experience, questioned the authenticity of Devin. He went through the demo video frame by frame and highlighted discrepancies between what the demo claims and what Devin actually does.
Devin is claimed to be able to solve arbitrary Upwork tasks. However, in the video demo, the problem Devin solves does not match the requirements specified by the customer. Devin is shown fixing bugs in a GitHub repository, but the file it is editing does not actually exist in that repository. Some of the bugs it fixes are nonsensical, of a type that a human would never introduce. Devin also executes meaningless shell commands such as "head -n 5 foo | tail -n 5", and its code changes are of poor quality.
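To see why a command like "head -n 5 foo | tail -n 5" is considered meaningless, here is a minimal Python sketch that mimics the two tools on a list of lines. The helper functions `head` and `tail` and the sample contents of `foo` are illustrative assumptions, not anything taken from the demo. Taking the last five lines of something that already has at most five lines returns it unchanged, so the pipe adds nothing over a plain "head -n 5 foo".

```python
def head(lines, n):
    """Return the first n lines, roughly like `head -n N`."""
    return lines[:n]


def tail(lines, n):
    """Return the last n lines, roughly like `tail -n N`."""
    return lines[-n:] if n > 0 else []


# Hypothetical contents standing in for the file "foo".
foo = [f"line {i}" for i in range(1, 21)]

piped = tail(head(foo, 5), 5)  # head -n 5 foo | tail -n 5
plain = head(foo, 5)           # head -n 5 foo

# The extra `tail` step does not change the result.
print(piped == plain)
```

The same holds for files shorter than five lines, since both steps then simply pass every line through.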
Karl believes that Cognition Labs has overstated Devin's capabilities, and that the video descriptions and tweets contain misinformation that causes confusion and misunderstanding. He advises against blindly repeating and amplifying claims found online without proper research.
Many expected Cognition to respond to these questions, but so far the team has offered no explanation. A mid-April tweet gives a somewhat vague glimpse of how Scott Wu, Cognition's CEO, views Devin's shortcomings. He acknowledged that today's Devin is far from perfect: it works much of the time, but it also often makes mistakes, writes bugs, or gets stuck. Scott believes that Devin's strengths lie in DevOps and development environment setup. "Devin's first significant achievement for us was the rotation of the database tables and the launch of Kubernetes." Another significant use case is data analytics. Scott highlights that Devin plays a pivotal role in this area, ensuring that requirements are accurately understood and then translated into code.
Scott declined to provide further details about the underlying technology, saying only that his team had developed a unique approach to combining large language models (LLMs), such as OpenAI's GPT-4, with reinforcement learning techniques. Cognition also declined to disclose the extent to which Devin relies on existing LLMs.