The Reality of AI in the Workplace
Imagine you're redesigning your living space. You could hire an interior designer for thousands of dollars, or you could ask an AI tool like ChatGPT to do it instead. But can AI actually do the work?
A new study compared how top AI systems and human workers performed on hundreds of real work assignments, including producing a digital version of a hand-drawn floor plan.
The results were eye-opening:
- The human produced a professional-looking floor plan
- The best-performing AI system made a plausible-looking floor plan with much less detail, and on closer inspection it was completely wrong
Three years after ChatGPT's release, this failed floor plan illustrates a crucial disconnect between how capable AI tools appear and what they can actually deliver, one with implications for the entire economy.
How the Study Worked
Researchers collected hundreds of real projects from freelancing platforms where humans had been paid to complete tasks like:
- Making 3D product animations
- Transcribing music
- Coding web video games
- Formatting research papers for publication
They then gave each task to AI systems including OpenAI's ChatGPT, Google's Gemini, and Anthropic's Claude.
The headline finding: the best-performing AI system successfully completed only 2.5% of the projects.
"Current models are not close to being able to automate real jobs in the economy," said Jason Hausenloy, one of the researchers on the Remote Labor Index study.
Where AI Falls Short
Another assignment involved creating an interactive dashboard visualizing data from the World Happiness Report. At first glance, the AI results looked adequate, but closer examination revealed:
- Countries inexplicably missing data
- Overlapping text
- Legends using wrong colors or no colors at all
The AI systems failed nearly half of the projects by producing poor-quality work, and they left more than a third incomplete. Nearly 1 in 5 projects had basic technical problems, such as corrupt files.
"A lot of the failures were kind of prosaic," Hausenloy said, pointing to two major limitations:
- No long-term memory - AI can't learn from previous mistakes or remember feedback over time
- Struggles with visual understanding - Problems with graphic design or spatial reasoning
The Visual Challenge
The visual weakness became apparent in a project asking for promotional material for tech earbuds. The task involved taking images and creating a 3D model, along with short video clips demonstrating the design.
No AI system produced acceptable work:
- OpenAI's GPT-5 and Anthropic's Sonnet created poor 3D models
- Manus didn't create a 3D model at all
- In some results, the earbuds changed appearance across clips
Graham Neubig, a Carnegie Mellon University professor who has researched AI systems, explained one reason for these failures: "They don't use the same tools a human expert would use."
A human creating a product rendering would use 3D modeling software with a visual interface, while a chatbot asked to make a 3D model will usually try to generate images by writing code.
Where AI Shows Promise
The AI systems performed better on a task involving producing a web-based video game. The best version made without human work was actually playable, an impressive feat. However, the AI system ignored the instruction that the game have a brewing theme.
The Economic Implications
If AI systems could perform remote work assignments autonomously, businesses using human contractors could instead send that work to a chatbot. This would mean huge cost savings for companies and no work for those contractors.
The study suggests this scenario is far from reality, at least for now.
The Future Trajectory
Though all AI systems failed most projects, newer models showed improvement. The team recently tested Google's Gemini 3 Pro, released in November. It completed 1.3% of tasks, compared with 0.8% for the company's previous version.
"The trend lines are there," Hausenloy noted.
AI can still disrupt the labor market without fully replacing individual workers. Companies may need fewer employees if each one can do more with a chatbot's help. But if the trend toward greater autonomy continues, the economics of work could become challenging for many people.
Consider this: A human made the video game for $1,485. The researchers had Sonnet make it for less than $30.
Whether AI systems need minor tweaks or fundamental breakthroughs to successfully do real work remains "the key question in the AI field at the moment," according to Hausenloy.