2026.06|6 min#ai#engineering#codex#productivity

The Agent Left the IDE

Computer use turns AI coding from text generation into software operation.

The most interesting thing about AI coding agents right now is not that they can write code.

It is that they are starting to operate computers.

That sounds like a small distinction until you feel it in the workflow. A code generator lives inside a text box. It waits for a prompt, returns a patch, and leaves the rest of the job to you. A software operator can inspect the app, click through the broken flow, read the console, run the server, reproduce the issue, change the code, and check whether the thing actually works.

That is a different kind of tool.

OpenAI's May 29 Codex update points in that direction. Codex now supports computer use on Windows in the Codex app for eligible users, so it can see, click, and type in Windows applications while testing and refining software. The same release also expands remote control, letting a user steer work from ChatGPT on mobile or Codex on Mac while the Windows machine remains the host for the project files, shell, app server, and local context.

I do not think the important part is Windows support by itself.

The important part is the new shape of work.

Coding Was Never Just Typing

For a while, the AI coding story was mostly about generation. Could the model write a component? Could it scaffold an API route? Could it refactor a file without losing the plot?

Useful, but narrow.

Real software work has always been messier than text generation. You open the app. You notice the layout is wrong. You click a button. Nothing happens. You check the terminal. The dev server crashed. You restart it. The page loads, but the empty state is off. You resize the browser. The mobile nav breaks. You skim the network tab. The request is fine, but the UI state is stale.

None of that is "write code" in the pure sense.

It is operating the system around the code.

That is why computer use matters. It gives the agent access to the loop that human engineers actually live in: observe, diagnose, change, verify. The text editor is only one stop in that loop.

The IDE Is Too Small

The IDE was a natural starting point for AI coding tools because code is text. Put the model near the text and it can help.

But the product surface of software is not the IDE. It is the browser, the terminal, the database, the logs, the design tool, the cloud dashboard, the test runner, the email preview, the mobile simulator, and sometimes a random desktop app that only exists because some enterprise workflow depends on it.

If the agent can only see the repository, it is always working from a partial truth.

It can infer what should happen. It can read tests. It can inspect types. It can even run commands if the environment allows it. But it cannot fully understand the gap between the code and the experience unless it can look at the experience.

This is why frontend work has been such a revealing test. A model can produce valid React and still ship an interface that feels wrong. It can pass tests and still overlap text on mobile. It can implement the requested behavior and miss that the loading state jumps the layout.

The browser catches what the diff cannot.

An agent that can look, click, and iterate has a better shot at closing that gap.

Remote Control Changes the Cadence

The remote-control part may end up being just as important as computer use.

When an agent can keep working on the host machine while you check in from somewhere else, the job starts to feel less like a chat session and more like delegated work. You do not need to sit there watching every command. You can let the agent run until it hits a decision point, then answer the question, redirect it, or approve the next step.

That changes the cadence of engineering.

The old cadence was synchronous. You were either coding or you were not. If you stepped away, the work stopped.

The new cadence is supervisory. You define the goal, give the agent enough context, and let it move through the loop. Your job is to keep the judgment layer alive. Is this still the right approach? Did it choose the right tradeoff? Is the patch too broad? Did it verify the thing that matters?

That is closer to managing a capable junior engineer than using autocomplete.

And like managing a junior engineer, the value depends on the quality of your delegation.

The Risk Moves Too

There is a tempting version of this story where more agent autonomy simply means more productivity.

That is not the full picture.

An agent with computer use has a wider action surface. It can click the wrong thing. It can misunderstand a modal. It can test against the wrong environment. It can mistake a locally cached state for a real fix. It can spend time polishing the visible symptom while missing the deeper bug.

More access is only useful when the workflow has boundaries.

That means you still need clear permissions, disposable environments, human approval for risky actions, and a review process that treats agent work like real work. Especially when the agent is touching systems outside the editor.

The mistake is assuming that because the agent can operate more of the computer, it should be allowed to operate everything.

Good delegation is scoped. Give the agent a sandbox. Give it the app server, the browser, the test suite, and enough project context to make progress. Keep production credentials, irreversible actions, billing changes, and sensitive user data behind a stronger gate.

The point is not to make the agent fearless.

The point is to make it useful without making it dangerous.

The Best Workflows Will Be Visible

As agents become more operational, the winning workflows will be the ones that make the agent's work easy to inspect.

I want to see what it tried. I want screenshots when the UI changes. I want terminal output when a test fails. I want a short explanation of why it chose one fix over another. I want the final diff to be boring and the verification trail to be clear.

This is the difference between autonomy and trust.

Autonomy means the agent can move. Trust means I can understand what happened after it moved.

That is also where many teams will get the first productivity gains. Not from letting agents do huge open-ended tasks, but from handing them bounded loops:

▸Reproduce this bug and propose the smallest fix
▸Run the app and check the empty states
▸Add the test that proves this permission boundary
▸Verify this onboarding flow on mobile
▸Compare the implementation to the design and list mismatches

Those are not glamorous tasks. They are exactly the tasks that slow teams down every day.

Computer use makes them more delegable.

What Builders Should Do Now

If you are building with AI agents, I would not wait for the perfect tool before changing your habits.

Start by making your work easier for an agent to operate.

Keep local setup simple. Document the command that runs the app. Make tests deterministic. Write down the flows that matter. Keep secrets out of default environments. Add screenshots or acceptance criteria when the task is visual. Ask the agent to verify behavior, not just change files.

Most of this is just good engineering hygiene.

That is the recurring pattern with AI tools. The better your system is for humans, the better it tends to be for agents. Clear docs, clear tests, clear boundaries, clear review paths. AI does not remove the need for that discipline. It makes the payoff more obvious.

The agent leaving the IDE does not mean engineers leave the process.

It means the process needs to be legible enough that an agent can participate in it.

The Bottom Line

The next phase of AI coding is not about prettier autocomplete.

It is about agents that can operate the software environment around the code. They will run apps, inspect interfaces, respond to prompts, test changes, and keep moving while humans supervise from the judgment layer.

That is a big shift.

The IDE was where AI coding started because it was the easiest surface to understand. But software does not live in the IDE. It lives in the messy loop between code, runtime, product, and user experience.

Now the agents are entering that loop.

The builders who benefit most will not be the ones who hand over everything. They will be the ones who design tight, visible, reviewable workflows where agents can do real operating work and humans still own the decisions that matter.