OpenTelemetry as Microsoft Copilot Studio Brings Computer-Using Agents to the Enterprise

Introduction

OpenTelemetry is becoming more important as enterprise automation moves beyond APIs and into computer-using agents. Microsoft Copilot Studio now targets workflows that live inside legacy apps, vendor portals, and proprietary systems where teams still rely on screen clicks and manual data entry. That shift matters because it changes what automation means in practice: instead of only orchestrating deterministic infrastructure tasks, teams are increasingly asking software to interpret context, choose actions, and operate across interfaces that were never designed for machine control.

This is not just a tooling story. It is a platform engineering and governance story. The move from task automation to decision automation raises new questions about trust, auditability, failure handling, and operational visibility. DevOps teams that have spent years standardizing pipelines and infrastructure now need to understand how to observe AI-driven agents, measure their behavior, and keep them within policy. In that environment, OpenTelemetry is a natural foundation for tracing what happened, where it happened, and, when the agent records it as telemetry, how confidently the system acted.

Key Insights

Microsoft Copilot Studio is being positioned for enterprise workflows that still depend on human interaction with screens, especially when systems do not expose APIs. That makes computer-using agents relevant to the long tail of operational work that traditional automation has struggled to reach.
The core problem described is familiar to DevOps and IT teams: legacy apps, vendor portals, and proprietary line-of-business platforms often force manual clicking and data entry. Computer-using agents aim to reduce that friction by acting directly on the interface rather than waiting for perfect integration.
The broader industry shift is from deterministic infrastructure automation to AI-driven probabilistic judgment. That means automation is no longer only about executing known steps reliably; it is increasingly about making operational decisions under uncertainty, which changes the engineering model.
As automation becomes more decision-oriented, trust becomes a first-class concern. Teams will need to know not only whether an agent completed a task, but why it chose a path, what signals it used, and how to detect when it drifted from expected behavior.
Governance becomes harder when agents can operate across systems that were never built for automation. Policy enforcement, approval boundaries, and exception handling must be designed around agent behavior, not just around API calls or infrastructure events.
OpenTelemetry is especially relevant because AI-driven workflows need observability across multiple layers: user intent, agent actions, system responses, retries, and downstream effects. Without consistent telemetry, debugging an agent failure can become guesswork.
Taken together, the sources point to a shift in platform engineering. Teams will need shared standards for logging, tracing, and metrics so that AI agents can be managed like production services rather than treated as opaque assistants.
Anthropic’s hiring of Andrej Karpathy to lead Claude pre-training research signals that major AI vendors continue investing heavily in model capability. For enterprise teams, that means the surrounding control plane, observability, and governance stack must mature just as quickly as the models themselves.

Implications

The most important implication of Microsoft Copilot Studio bringing computer-using agents to the enterprise is that automation is expanding into environments where the interface itself becomes the integration layer. For years, teams have accepted that some systems would remain outside the reach of clean API-based automation. In those cases, humans became the fallback integration mechanism, moving data between portals, reconciling records, and completing repetitive transactions. Computer-using agents promise to absorb part of that burden, but they also introduce a new class of operational risk because the agent is now interpreting the interface, not just calling a stable endpoint.

That shift changes how platform teams should think about observability. Traditional automation is usually easy to reason about: a pipeline step runs, a job succeeds or fails, and logs show the result. AI-driven agents are different. They may inspect a page, decide which button to click, recover from a transient error, or choose an alternate path when the interface changes. Those decisions are useful, but they are also probabilistic. If an agent misreads a page or takes an unexpected branch, the failure may not be obvious from a single status code. This is where OpenTelemetry becomes strategically important. Traces can connect intent to action to outcome, while metrics can reveal drift, latency, retry storms, or unusual error patterns. Logs can preserve the detailed context needed for post-incident analysis and governance review.
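
One way to picture the trace structure described above is a parent span for the user's intent with one child span per agent action. The following is a minimal sketch, not a Copilot Studio integration: the step handlers, span names, and attribute keys such as agent.intent and ui.target are hypothetical placeholders rather than official semantic conventions. It uses the OpenTelemetry Python API if the opentelemetry-api package is installed and otherwise runs untraced; without a configured SDK and exporter, the API calls record nothing.

```python
from contextlib import contextmanager

try:
    from opentelemetry import trace  # provided by the opentelemetry-api package
    _tracer = trace.get_tracer("agent.workflow.sketch")
except ImportError:
    _tracer = None  # keep the sketch runnable without the package


@contextmanager
def traced(name, attributes):
    """Open a span if OpenTelemetry is available; otherwise do nothing."""
    if _tracer is None:
        yield None
        return
    with _tracer.start_as_current_span(name, attributes=attributes) as span:
        yield span


def run_workflow(intent, steps):
    """Run hypothetical agent steps under one parent span and return outcomes."""
    outcomes = []
    with traced("agent.workflow", {"agent.intent": intent}):
        for action, target, handler in steps:
            attrs = {"agent.action": action, "ui.target": target}
            with traced("agent.action", attrs) as span:
                try:
                    handler()
                    status = "ok"
                except Exception as exc:
                    status = "error"
                    if span is not None:
                        span.record_exception(exc)
                if span is not None:
                    # Hypothetical attribute linking the action to its outcome.
                    span.set_attribute("agent.outcome", status)
                outcomes.append({"action": action, "target": target, "status": status})
    return outcomes


# Placeholder steps: the second one simulates a failed click.
steps = [
    ("open_page", "vendor-portal/invoices", lambda: "loaded"),
    ("click", "button#submit", lambda: 1 / 0),
]
outcomes = run_workflow("submit invoice", steps)
print([o["status"] for o in outcomes])
```

Because every action span shares the workflow span as its parent, a failed click shows up in the context of the intent that triggered it rather than as an isolated error.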

There is also a major governance implication. When automation crosses from infrastructure into business operations, the blast radius expands. A misconfigured deployment pipeline might affect a service. A misbehaving computer-using agent might affect procurement, finance, customer support, or compliance workflows. That means teams need stronger controls around identity, approvals, and scope. They also need a way to answer basic questions after the fact: what did the agent see, what did it decide, what policy allowed it, and what downstream system accepted the result? Without that evidence, enterprise adoption will stall at the pilot stage.
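
The four after-the-fact questions above map naturally onto a structured audit record. The sketch below is illustrative only and assumes a hypothetical schema: the field names, the choice to store a SHA-256 digest of the observed screen text instead of the raw capture, and the sample values are all assumptions, not part of any product.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    observed_digest: str    # what the agent saw (hash of captured screen text)
    decision: str           # what it decided
    policy_id: str          # which policy allowed the action
    downstream_system: str  # which system accepted the result
    accepted: bool


def make_record(screen_text, decision, policy_id, downstream_system, accepted):
    """Build a record, hashing the observation rather than storing it verbatim."""
    digest = hashlib.sha256(screen_text.encode("utf-8")).hexdigest()
    return AuditRecord(digest, decision, policy_id, downstream_system, accepted)


def missing_evidence(record):
    """Return the text fields left empty, i.e. questions with no answer."""
    return [k for k, v in asdict(record).items() if isinstance(v, str) and not v]


# Placeholder values; the empty policy_id simulates a gap in the audit trail.
record = make_record("Invoice 1001 total due", "click:submit", "", "erp", True)
print(json.dumps(asdict(record), sort_keys=True))
print(missing_evidence(record))
```

A check like missing_evidence could run before an action is committed, so that an agent step with no recorded policy is flagged instead of silently accepted.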

The argument that the industry is moving from automating infrastructure to automating decisions is a useful warning. The engineering challenge is no longer only about reliability in the narrow sense. It is about operational reasoning. Teams must decide which decisions can be delegated, which require human review, and which should remain deterministic. In practice, that may mean using agents for low-risk, high-volume tasks first, such as data reconciliation or portal updates, while keeping approvals and financial commitments behind explicit human gates. The more sensitive the workflow, the more important it becomes to instrument every step and preserve a clear audit trail.
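
The delegation decision described above can be expressed as a small, deterministic gate that sits in front of the agent. This is a minimal sketch under stated assumptions: the action names and the policy table are hypothetical, and the default of denying unknown actions is one possible design choice, not a prescribed one.

```python
from enum import Enum


class Decision(Enum):
    AUTONOMOUS = "autonomous"
    NEEDS_APPROVAL = "needs_approval"
    PROHIBITED = "prohibited"


# Hypothetical policy table; the action names are placeholders.
POLICY = {
    "reconcile_record": Decision.AUTONOMOUS,
    "update_portal_field": Decision.AUTONOMOUS,
    "submit_purchase_order": Decision.NEEDS_APPROVAL,
    "delete_audit_log": Decision.PROHIBITED,
}


def may_run(action, approved_by=None):
    """Return True if the action may run now. Unknown actions are denied."""
    decision = POLICY.get(action, Decision.PROHIBITED)
    if decision is Decision.AUTONOMOUS:
        return True
    if decision is Decision.NEEDS_APPROVAL:
        return approved_by is not None
    return False


print(may_run("reconcile_record"))
print(may_run("submit_purchase_order"))
print(may_run("submit_purchase_order", approved_by="finance-lead"))
print(may_run("unknown_action"))
```

Keeping the gate deterministic means the probabilistic part of the system (the agent) never decides its own permissions.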

Finally, the Anthropic hiring news reinforces that model capability will keep improving, but capability alone does not solve enterprise adoption. Better models increase the number of tasks that agents can attempt, yet they also increase the need for guardrails. The organizations that succeed will not be the ones that simply deploy the most capable model. They will be the ones that pair model progress with observability, policy enforcement, and operational discipline. In that sense, OpenTelemetry is not a side concern. It is part of the control plane that makes AI automation safe enough to trust in production.

Actionable Steps

Start by inventorying workflows that still depend on manual screen work. Focus on vendor portals, legacy line-of-business tools, and internal systems without APIs. Rank them by volume, error rate, and business impact so you can identify where computer-using agents would create measurable value without immediately touching high-risk processes.
Define a clear observability model for agent activity using OpenTelemetry. Treat each user request, agent action, page interaction, retry, and downstream system update as part of one traceable workflow. This makes it easier to reconstruct incidents, compare agent behavior over time, and spot patterns that indicate interface drift or model instability.
Establish policy boundaries before enabling broad agent access. Decide which actions can be executed autonomously, which require approval, and which are prohibited entirely. For example, an agent might update a support ticket automatically but require human confirmation before submitting a purchase order or changing a customer record.
Build a review process for failures and near misses. Do not only examine outright errors. Review cases where the agent succeeded but took an unexpected path, needed multiple retries, or selected a fallback route. Those events often reveal weak prompts, brittle interfaces, or hidden assumptions that will become more serious at scale.
Instrument business outcomes, not just technical outcomes. Measure completion time, manual intervention rate, exception frequency, and rework volume. If a computer-using agent reduces clicks but increases downstream corrections, the automation may be creating hidden operational debt rather than delivering value.
Create a sandbox or staging environment that mirrors the real interface as closely as possible. Agents can behave differently when page layouts, session timing, or authentication flows change. A realistic test environment helps teams validate behavior before production rollout and reduces the chance that a small UI change causes a large operational incident.
Assign ownership across platform engineering, security, and business operations. Computer-using agents sit at the intersection of all three. Platform teams need telemetry and reliability controls, security teams need identity and access boundaries, and business owners need to define acceptable outcomes and escalation paths.
Treat model upgrades as change-managed events. As vendors improve their models, agent behavior may change even when the workflow configuration stays the same. Re-run regression tests, compare trace patterns, and verify that approval logic and audit outputs still meet policy before expanding usage.
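
The outcome measurement and model-upgrade regression steps above can be sketched together: summarize a set of agent runs into business-level metrics, then compare a candidate set against a baseline. The Run fields, the metric names, and the 0.05 tolerance are assumptions chosen for illustration; real thresholds would come from the workflow's own risk profile.

```python
from dataclasses import dataclass


@dataclass
class Run:
    completed: bool
    retries: int
    manual_intervention: bool
    reworked: bool  # a human later corrected the result


def summarize(runs):
    """Aggregate a list of runs into per-run averages and rates."""
    n = len(runs)
    if n == 0:
        raise ValueError("no runs to summarize")
    return {
        "completion_rate": sum(r.completed for r in runs) / n,
        "mean_retries": sum(r.retries for r in runs) / n,
        "intervention_rate": sum(r.manual_intervention for r in runs) / n,
        "rework_rate": sum(r.reworked for r in runs) / n,
    }


def regressions(baseline, candidate, tolerance=0.05):
    """List metrics where the candidate is worse by more than the tolerance."""
    worse = []
    for name, base in baseline.items():
        delta = candidate[name] - base
        # Completion rate should rise; every other metric should fall.
        if name == "completion_rate":
            delta = -delta
        if delta > tolerance:
            worse.append(name)
    return worse


# Placeholder data: same completion rate, but more retries and rework.
before = summarize([Run(True, 0, False, False) for _ in range(4)])
after = summarize([
    Run(True, 1, False, False),
    Run(True, 0, False, True),
    Run(True, 0, False, False),
    Run(True, 0, False, False),
])
print(regressions(before, after))
```

In this example the candidate completes every task yet still regresses on retries and rework, which is the kind of hidden operational debt a pure success/failure check would miss.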

Call to Action

If your organization is exploring computer-using agents, do not start with the model alone. Start with the workflow, the risk profile, and the observability layer that will make the system understandable in production. OpenTelemetry gives DevOps and platform teams a practical way to trace agent behavior, measure reliability, and support governance as automation moves into decision-making. The teams that prepare now will be better positioned to adopt AI safely, prove value quickly, and avoid opaque automation failures later.

Sources

Microsoft Copilot Studio Brings Computer-Using Agents to the Enterprise, 2026-05-18, https://devops.com/microsoft-copilot-studio-brings-computer-using-agents-to-the-enterprise/
We Spent 15 Years Automating Infrastructure. Now We’re Automating Decisions, 2026-05-20, https://devops.com/we-spent-15-years-automating-infrastructure-now-were-automating-decisions/
Anthropic hires OpenAI co-founder Andrej Karpathy to lead Claude pre-training research, 2026-05-19, https://thenewstack.io/andrej-karpathy-anthropic-pretraining/