On Thursday, OpenAI unveiled a new feature, dubbed Operator, that lets ChatGPT take control of a virtual browser to perform real-world tasks like ordering food or booking flights. But so far, it’s aimed at rich people.
The tool, currently available only to Pro subscribers ($200/month) in the U.S., marks the company’s first venture into autonomous web browsing.
It highlights the emergence of a tiered system in which those who pay more gain access to the best AI features, while lower-paying users are limited to less capable models with restricted functionality. That is arguably not that democratic.
The system works through operator.chatgpt.com, where users can ask ChatGPT to handle various online chores.
There have been some attempts to do similar things in the past, from the OpenAI plugin store to the promise of Large Action Models popularized by Rabbit. Still, their reliance on APIs made them inconvenient and challenging to set up.
What makes this different is how it works. Instead of relying on APIs as its predecessors did, Operator controls a cloud-based browser, clicking buttons and filling forms just like a human would.
Every time Operator makes a move, it snaps a screenshot to show you what it’s doing.
For example, if you need to book a ticket to a game, the AI will open up its own browser, go to a specific site, look for the game in question, and find the best options before asking you to confirm the payment.
It will also walk you through its decision-making process with visual proof. If things go sideways, there’s a “Take Control” button that lets humans grab the wheel.
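For the technically curious, the flow described above (act, take a screenshot, pause before anything sensitive) can be pictured as a simple loop. The sketch below is a toy illustration only, not OpenAI’s code: every name in it (`Action`, `ToyBrowser`, `run_task`, the `sensitive` flag) is hypothetical, and the plan is fixed in advance, whereas the real system decides each step from what it sees on screen.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Action:
    """One hypothetical browser step, e.g. a click or a payment."""
    kind: str
    target: str
    sensitive: bool = False  # assumption: sensitive steps need user approval


@dataclass
class ToyBrowser:
    """Stand-in for a cloud browser; it only records what was done."""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real agent would capture pixels; here we return a text summary.
        return f"screen after {len(self.log)} action(s)"

    def perform(self, action: Action) -> None:
        self.log.append(f"{action.kind}:{action.target}")


def run_task(
    plan: List[Action],
    browser: ToyBrowser,
    confirm: Callable[[Action], bool],
) -> Tuple[List[str], str]:
    """Run each step, keeping a screenshot after every move."""
    screenshots: List[str] = []
    for action in plan:
        if action.sensitive and not confirm(action):
            # The user declined, so stop and hand the wheel back.
            screenshots.append(browser.screenshot())
            return screenshots, "handed_to_user"
        browser.perform(action)
        screenshots.append(browser.screenshot())
    return screenshots, "done"


plan = [
    Action("click", "search box"),
    Action("type", "game tickets"),
    Action("pay", "checkout", sensitive=True),
]
shots, status = run_task(plan, ToyBrowser(), confirm=lambda a: False)
print(status, len(shots))
```

In this toy run the user declines the payment step, so the loop stops before checkout and returns control, which loosely mirrors the confirmation prompt and the “Take Control” button.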
To succeed where others failed, OpenAI had to build its own AI model to visually understand the information shown by a web browser and control actions with keyboard and mouse inputs. The new model, powered by GPT-4o, was named Computer-Using Agent (CUA).
This isn’t just about following scripts. The AI can read and understand website layouts, adapt to different designs, and even handle unexpected pop-ups or error messages.
The system shows off some impressive party tricks. Hand it a photo of your messy handwritten shopping list, and it’ll not only use GPT-Vision to read it but actually order everything from your preferred grocery store.
OpenAI has partnered with several companies to ensure smooth operations across their platforms.
When booking a ride or ordering food, the AI can navigate services like Uber and DoorDash without hiccups since it’s preconfigured to understand their interfaces.
However, for unsupported websites, the system still attempts to complete tasks using its browser control capabilities. This is where Operator beats other alternatives.
As usual, OpenAI shared some benchmarks: The model beats other state-of-the-art models, scoring 38.1% on OSWorld (proficiency at handling standard operating systems) vs. 22% for the best competitor, and 58.1% on WebArena (handling of e-commerce sites) vs. 36.2% for the competition.
That said, the team emphasized Operator is still a research preview, so errors and bugs are expected.
One potential sticking point might make security-minded users pause: you need to trust Operator with your login credentials.
The cloud browser requires access to your accounts to get anything done, and it’s not compatible with local browsers. Logging in through a remote web browser while relying on OpenAI’s pinky promise not to store sensitive data may seem like a bit of a red flag.
The feature is set for a broader rollout soon, with Plus subscribers next in line. Developers won’t be left out either—OpenAI plans to release Operator through its API in the coming weeks, potentially spawning a new generation of AI-powered automation tools.
OpenAI says more instances are coming beyond cloud web browsing control. The team said during their demonstration that they’re also working on expanding the roster of AI agents beyond the current general-purpose assistant.
Edited by Sebastian Sinclair and Josh Quittner