The "First AI Programmer" Was A Lie
The much-talked-about AI agent Devin, which was claimed to be capable of solving real-world tasks on Upwork, turned out to be a fake.
Today, I felt betrayed.
As a technical writer whose focus is on generative AI, it’s my responsibility to inform my readers about the latest and most exciting news in the AI space. I take this responsibility seriously, as I know many people, both technical and non-technical, rely on accurate reporting to stay informed about the rapidly evolving world of artificial intelligence.
Last month, the Bay Area startup Cognition unveiled Devin, billed as the first AI software engineer, which could transform the way we build software. Among its features, the one that caught my attention was its ability to complete tasks on Upwork. I was impressed and shared my thoughts in an article, “The First AI Software Engineer Is Here”.
However, today, a YouTuber named Karl analyzed the demo video and debunked the claim that Devin completed, and got paid for, freelance jobs on Upwork.
In the video, Karl examines the claims made about Devin, purportedly the “world’s first AI software engineer.” He is skeptical of the marketing hyperbole surrounding Devin, particularly the specific claim that viewers can watch Devin handle complex tasks on Upwork to earn money.
He deconstructs this claim by dissecting the tasks Devin was shown performing in the promotional video, pointing out discrepancies and exaggerations in its capabilities.
Devin did not deliver what was specifically requested and instead performed simpler, unrelated tasks that were falsely represented as significant achievements.
This lie has led a lot of non-technical people, and even some technical ones, to believe that AI might replace programmers soon.
Aside from the fake demo of Devin doing an actual Upwork job, Devin also made mistakes in the coding examples:
Devin is shown fixing errors in the source of a GitHub repo, but the files it’s shown editing don’t actually exist in that repo and some of the errors it's fixing are nonsensical, of the type that’d never be made by a human.
Devin’s code changes are bad, e.g. writing its own low-level file-read loop instead of using the standard library properly.
The video makes it look like Devin did the task quickly, but the timestamps in the chat show the task stretching over many hours and even into the next day. Karl, by comparison, was able to do the requested task in about 30 minutes.
Devin runs nonsensical shell commands, such as:
`head -n 5 foo | tail -n 5`
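To see why that command is pointless: `head -n 5 foo` already outputs at most the first five lines of the file, so piping them through `tail -n 5` hands back exactly the same lines. The short Python sketch below models the two commands on a list of lines. The helper functions `head` and `tail` and the sample contents of `foo` are illustrative assumptions, not anything taken from the demo.

```python
def head(lines, n):
    """Return the first n lines, mimicking `head -n N`."""
    return lines[:n]


def tail(lines, n):
    """Return the last n lines, mimicking `tail -n N`."""
    return lines[-n:] if n > 0 else []


# Hypothetical stand-in for the contents of the file `foo`.
foo = [f"line {i}" for i in range(1, 21)]

piped = tail(head(foo, 5), 5)  # head -n 5 foo | tail -n 5
plain = head(foo, 5)           # head -n 5 foo

# The extra `tail` stage changes nothing.
print(piped == plain)
```

The same holds when the file has fewer than five lines: `head` passes everything through, and `tail` then returns all of it again.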
Medium writer Devansh also did his own investigation of the video and explained his findings in an article, “Did the makers of Devin AI lie about their capabilities?”
I took this news personally because I am one of those who got fooled. As a technical writer, it’s disheartening to realize that I unintentionally contributed to the spread of misinformation.
Such misleading portrayals harm public understanding of AI’s current abilities, leading non-technical people to overestimate its capabilities and contribute to a distorted narrative about AI replacing human jobs.
I have no idea which LLM powers Devin. Judging by these amateur mistakes, it doesn’t seem to be using a top model like GPT-4, Claude Opus, or Google Gemini. The fact that Devin’s performance falls short of what leading AI models have demonstrated raises questions about the technical competence and honesty of Cognition.
Now I wonder what the Devin team will say in response, if anything. Will they acknowledge the discrepancies and apologize for the misleading demo, or will they double down on their claims?
What is AI Washing?
The phenomenon of AI washing, where companies overhype their product’s AI capabilities to attract investment and mislead consumers, is a growing concern in the industry.
The term “AI washing” was first coined by the research firm Cognilytica, which defines it as “the use of the term ‘AI’ (Artificial Intelligence) to sell products, services, or solutions that either have nothing to do with AI or have very limited AI capabilities.”
Remember Google’s demo video of Gemini?
The demo is meant to highlight Gemini’s multimodal capabilities. It was an impressive video, except for one problem — it was fake.
Google itself acknowledged how the video was produced: “We created the demo by capturing footage in order to test Gemini’s capabilities on a wide range of challenges. Then we prompted Gemini using still image frames from the footage, and prompting via text.”
A staged demo like this not only undermines the credibility of the company but also creates a distorted narrative about AI’s capabilities and potential.
Another example is the Humane AI Pin, a recently viral product: a tiny wearable computer with a built-in AI assistant, a camera, and a little projector that blasts its UI onto your hand.
This wearable is designed to be clothing-based with no screen, using sensors and contextual data to enable AI interactions throughout the day. The AI Pin is designed to be a more convenient way to interact with AI, providing relevant notifications and AI suggestions based on the user’s location and activities.
It could potentially replace smartphones as the primary means of interacting with AI.
However, a review by the popular YouTuber and tech product reviewer Marques Brownlee (MKBHD) revealed that the device’s responses to commands were often slow and incorrect. He also slammed the device’s battery life, pointing out that in his testing it could only last a couple of hours during heavy use.
Marques was brutally honest, and his review caused quite a stir. Some even accused him of causing the collapse of Humane.
He isn’t the only one giving the Humane AI Pin a bad review. If you Google “Humane AI Pin review,” you’ll find that traditional media, independent bloggers, and YouTubers are unanimous in their criticism. The consistency of negative reviews from various sources adds weight to the concerns raised about the device’s capabilities and performance.
Going back to Devin: to be fair to Cognition, while the demo was indeed misleading and exaggerated, it is important to remember that LLMs are still a relatively new and rapidly evolving technology, and AI agents are no different. Even bigger companies still haven’t released AI agents that can complete Upwork jobs faster, more efficiently, and more accurately than a real human. As Devansh said in his blog post, we are essentially trying to build skyscrapers on shifting sands.
However, it is also important to note that AI is improving at a rapid pace, and we can expect future iterations to be better and more reliable.
Final Thoughts
This is a reminder not to believe everything tech companies advertise about their products. They will do whatever they can to win us over and ride the hype to make noise on the internet about whatever they are building.
I’d like to emphasize the importance of honesty and transparency from companies developing AI tools and criticize the media and influencers for not verifying such claims before promoting them.
I am urging users, developers, and content creators to maintain skepticism and adhere to truthfulness to prevent misinformation and its broader implications on society and technology.
And if you are an investor, beware. I expect this to become a trend: hype a product to get your money, then pull the rug.
Hi @Jim Clyde Monge nice to see you here on Substack !
I’m one of your writers for your publication on medium 🙂👋🏾
Definitely don't beat yourself up too much over this!
There's, sadly, a bit too much hype in GenAI, to the extent where even people who are genuinely excited about the possibilities (like me) are also getting weary of hyperbolic claims and promises.
I'm not even remotely surprised about the Humane Ai Pin disaster. I remember calling it a flop back when it was first announced. I simply can't see the necessity of having a standalone AI device that's this expensive (and requires a monthly fee forever) when it's this slow and limited in capabilities.
It's almost painful, because you can see how much care seems to have gone into the hardware and the overall build. I think the Ai Pin could've been more successful if it was a solid but cheaper, less overengineered alternative to the Apple Watch / smartwatch that lived on your shirt and still offered instant access to the phone's AI features, apps, etc., while sporting its own camera for scanning the environment and taking photos/videos. Then you get the benefits of an always-ready camera for any vision-capable LLMs to work with while piggybacking on your phone to do the heavy lifting in terms of AI and processing (instead of going to the cloud for every query, making users wait for up to 30 seconds).
As for Devin, I'm more excited about the glimpses of what it means for agents as a whole rather than its coding capabilities. We haven't succeeded in making AI agents work reliably just yet, but Devin's nascent ability to create its own action plan, work in multiple workspaces, and double-check / troubleshoot its own work is a promising start.