AI vs Human Jobs: Shocking Study Reveals How Close AI Really Is to Taking Your Work
The Washington Post · 1 month ago


INDUSTRY INSIGHTS · Tags: ai, jobsecurity, futureofwork, technology, automation

Summary:

  • AI systems successfully completed only 2.5% of real work assignments in a comprehensive study comparing human and AI performance

  • The best-performing AI failed on visual tasks like creating accurate floor plans and 3D product models, often producing completely wrong results

  • Major limitations include no long-term memory and poor visual understanding, preventing AI from learning from mistakes or handling spatial reasoning

  • Despite predictions of widespread job replacement, current AI models are not close to automating real jobs in the economy

  • Newer AI models show improvement but still struggle with complex tasks, with Google's Gemini 3 Pro completing just 1.3% of assignments

The Reality of AI in the Workplace

Imagine you're redesigning your living space. You could hire an interior designer for thousands of dollars, or you could ask an AI tool like ChatGPT to do it instead. But can AI actually do the work?

A groundbreaking study compared how top AI systems and human workers performed on hundreds of real work assignments, including producing a digital version of a hand-drawn floor plan.

The results were eye-opening:

  • The human produced a professional-looking floor plan
  • The best-performing AI system made a plausible-looking floor plan, but with far less detail; on closer inspection, its layout was completely wrong

This failed floor plan illustrates a crucial disconnect three years after ChatGPT's release that has implications for the entire economy.

The Study That Changed Everything

Researchers collected hundreds of real projects from freelancing platforms where humans had been paid to complete tasks like:

  • Making 3D product animations
  • Transcribing music
  • Coding web video games
  • Formatting research papers for publication

They then gave each task to AI systems including OpenAI's ChatGPT, Google's Gemini, and Anthropic's Claude.

The shocking finding: The best-performing AI system successfully completed only 2.5% of the projects.

"Current models are not close to being able to automate real jobs in the economy," said Jason Hausenloy, one of the researchers on the Remote Labor Index study.

Where AI Falls Short

Another assignment involved creating an interactive dashboard visualizing data from the World Happiness Report. At first glance, the AI results looked adequate, but closer examination revealed:

  • Countries inexplicably missing data
  • Overlapping text
  • Legends using wrong colors or no colors at all

The AI systems failed on nearly half of the projects by producing poor-quality work, and they left more than a third incomplete. Nearly 1 in 5 had basic technical problems like producing corrupt files.

"A lot of the failures were kind of prosaic," Hausenloy said, pointing to two major limitations:

  1. No long-term memory - AI can't learn from previous mistakes or remember feedback over time
  2. Struggles with visual understanding - Problems with graphic design or spatial reasoning

The Visual Challenge

This failure became apparent in a project asking for promotional material for tech earbuds. The task involved taking images and creating a 3D model with short video clips demonstrating the design.

No AI system produced acceptable work:

  • OpenAI's GPT-5 and Anthropic's Sonnet created poor 3D models
  • Manus didn't create a 3D model at all
  • In some results, the earbuds changed appearance across clips

Graham Neubig, a Carnegie Mellon University professor who has researched AI systems, explained one reason for these failures: "They don't use the same tools a human expert would use."

A human creating a product rendering would use 3D modeling software with a visual interface, while a chatbot asked to make a 3D model will usually try to generate images by writing code.
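Neubig's point can be made concrete. A hypothetical sketch (not from the study) of the code-first approach a chatbot might take, generating a crude cylinder mesh as a Wavefront OBJ file using only Python's standard library:

```python
import math

def cylinder_obj(radius=1.0, height=2.0, segments=16):
    """Return a crude open cylinder mesh as Wavefront OBJ text."""
    verts, faces = [], []
    for i in range(segments):
        a = 2 * math.pi * i / segments
        x, z = radius * math.cos(a), radius * math.sin(a)
        verts.append((x, 0.0, z))    # bottom ring vertex
        verts.append((x, height, z)) # top ring vertex
    for i in range(segments):
        j = (i + 1) % segments
        # OBJ vertex indices are 1-based; one quad per side segment
        faces.append((2 * i + 1, 2 * j + 1, 2 * j + 2, 2 * i + 2))
    lines = [f"v {x:.4f} {y:.4f} {z:.4f}" for x, y, z in verts]
    lines += [f"f {a} {b} {c} {d}" for a, b, c, d in faces]
    return "\n".join(lines)

obj_text = cylinder_obj()
print(obj_text.splitlines()[0])  # first vertex line of the mesh
```

The gap is obvious: this produces valid geometry, but nothing like the polished, textured product render a designer would build interactively in dedicated 3D software.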

Where AI Shows Promise

The AI systems performed better on a task involving producing a web-based video game. The best version made without human work was actually playable - an impressive feat. However, the AI system ignored the instruction that the game have a brewing theme.

The Economic Implications

If AI systems could perform remote work assignments autonomously, businesses using human contractors could instead send that work to a chatbot. This would mean huge cost savings for companies and no work for those contractors.

The study suggests this scenario is far from reality, at least for now.

The Future Trajectory

Though all AI systems failed most projects, newer models showed improvement. The team recently tested Google's Gemini 3 Pro, released in November. It completed 1.3% of tasks, compared with 0.8% for the company's previous model.

"The trend lines are there," Hausenloy noted.

AI can still disrupt the labor market without fully replacing individual workers. Companies may need fewer employees if each one can do more with a chatbot's help. But if the trend toward greater autonomy continues, the economics of work could become challenging for many people.

Consider this: A human made the video game for $1,485. The researchers had Sonnet make it for less than $30.
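Even as a back-of-envelope calculation (using the article's figures, where $30 is an upper bound on the AI's cost), the gap is stark:

```python
human_cost = 1485  # what the freelancer was paid (USD)
ai_cost = 30       # upper bound on the Sonnet run's cost (USD)

ratio = human_cost / ai_cost  # AI did it for under 1/49th of the price
print(f"Cost ratio: at least {ratio:.1f}x cheaper")
```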

Whether AI systems need minor tweaks or fundamental breakthroughs to successfully do real work remains "the key question in the AI field at the moment," according to Hausenloy.

