
AI tools designed to assist developers are no longer staying in the background. They are starting to shape what actually gets built and deployed.
They open pull requests.
They modify dependencies.
They generate infrastructure templates.
They interact directly with repositories and CI/CD pipelines.
At some point, this stops being assistance.
It becomes participation.
And participation changes the problem.
The shift from generative to agentic behavior is the inflection point.
Earlier tools operated inside a tight loop. A developer prompted. The system suggested. The developer reviewed. Nothing moved without human intent.
That boundary is eroding.
Newer systems propose changes, update libraries, remediate vulnerabilities and interact with development pipelines with limited human intervention. They don’t just accelerate developers. They begin to shape the artifacts that move through the software supply chain — code, dependencies, configurations and infrastructure definitions.
That makes them something different.
Not tools.
Participants.
And once something participates in the supply chain, it inherits the same question every other participant does:
How is it governed?
Consider a common pattern already emerging in many environments.
An AI system identifies a vulnerable dependency.
It opens a pull request updating the library.
A workflow triggers automated tests.
The change is promoted into a staging environment.
Four steps.
No human review.
No explicit governance checkpoint.
Each step is individually valid. Nothing looks wrong in isolation.
But taken together, they create something fundamentally different: A system that can change enterprise software without human intent being re-established at any point. Research from Black Duck found that while 95% of organizations now use AI in their development process, only 24% properly evaluate AI-generated code for security and quality risks.
This is autonomous change propagation across the software supply chain.
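To make the missing checkpoint concrete, here is a minimal sketch of what re-establishing human intent before promotion could look like. The PullRequest shape, the agent identities and the sensitive-path heuristic are illustrative assumptions, not any real platform's API.

```python
from dataclasses import dataclass

# Assumed identities under which agents open pull requests (illustrative).
AI_AUTHORS = {"ai-remediation-agent", "dependency-bot"}

@dataclass
class PullRequest:
    author: str
    human_approvals: int       # approvals from human reviewers only
    changed_paths: list[str]

def may_promote(pr: PullRequest) -> bool:
    """Gate promotion to staging: AI-authored changes need explicit human intent."""
    if pr.author not in AI_AUTHORS:
        return True  # human-authored changes follow the normal review policy
    # Infrastructure and pipeline definitions get a higher bar (assumed paths).
    sensitive = any(p.startswith(("infra/", ".github/")) for p in pr.changed_paths)
    required = 2 if sensitive else 1
    return pr.human_approvals >= required

# The four-step pattern above: each step valid in isolation, blocked here
# because no human has re-established intent at any point.
pr = PullRequest(author="ai-remediation-agent", human_approvals=0,
                 changed_paths=["requirements.txt"])
print(may_promote(pr))  # False
```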
Many organizations rely on a “human-in-the-loop” (HITL) requirement as a safety mechanism for AI-generated code.
At low volumes, this works.
At scale, it breaks.
When an AI system generates dozens of pull requests in a short window, review becomes a throughput problem, not a control. The cognitive load of validating machine-generated logic exceeds what a human can realistically govern.
What remains is not oversight, but a checkpoint.
And checkpoints without effective review are not controls.
Most governance models assume a stable truth: Humans are the primary actors.
Controls tie identity to individuals, approvals to intent, and audit trails to accountability.
Even automation systems are treated as extensions of human intent — predictable, bounded and deterministic.
AI systems break that model.
They can generate new logic, act on it and propagate changes across systems. Yet in most environments, they are still governed as if they were static tools.
That mismatch is the gap.
One way to see this clearly is through identity.
Every interaction an AI system has — repository access, pipeline execution, API calls — requires credentials. In practice, these systems operate as machine identities.
But they are not traditional machine identities.
A service account executes predefined logic. Its behavior is known in advance. Its risk is bounded by what it was configured to do.
An AI-driven system is different. It generates the logic it then executes.
It can propose new code paths, interact with new systems and trigger actions that were not explicitly predefined at the time access was granted.
That is a category change.
Not just a new identity type, but a new attack surface: Identities that can generate the behavior they are authorized to execute.
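A sketch helps show how such an identity could be bounded. The grant model below is an assumption for illustration: authority is checked per generated action and expires quickly, rather than being decided once when access is granted.

```python
import time

# Illustrative allowlist: the agent may propose changes, never apply them.
ALLOWED_ACTIONS = {"read_repository", "open_pull_request", "run_tests"}

class AgentGrant:
    """Short-lived, per-action authority for an AI-driven identity (sketch)."""

    def __init__(self, agent_id: str, ttl_seconds: int = 900):
        self.agent_id = agent_id
        self.expires_at = time.time() + ttl_seconds  # forces regular re-issuance

    def authorize(self, action: str) -> bool:
        if time.time() > self.expires_at:
            raise PermissionError(f"{self.agent_id}: grant expired, re-attest")
        # Unlike a service account, the behavior is not known in advance,
        # so every generated action is checked against the grant at call time.
        return action in ALLOWED_ACTIONS

grant = AgentGrant("dep-remediation-agent")
print(grant.authorize("open_pull_request"))  # True
print(grant.authorize("merge_to_main"))      # False: generated, but not granted
```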
The World Economic Forum has identified this class of non-human identity as one of the fastest-growing and least-governed security risks in enterprise AI adoption.
Most organizations already track access-related metrics. Those metrics were designed for human-driven systems.
They are no longer sufficient.
If AI systems are participating in the software supply chain, organizations need to measure where and how that participation introduces risk.
A few signals matter immediately. These are not abstract concerns. They are measurable.
And until they are measured, they are not governed.
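As a sketch of what measuring that participation might involve: the two signals below (the share of shipped changes authored by AI, and the share of those merged without human review) are illustrative choices rather than an exhaustive list, and assume a source-control audit trail that records author identity and approvals.

```python
from dataclasses import dataclass

@dataclass
class Change:
    author_is_ai: bool      # from commit/PR identity metadata (assumed available)
    human_reviewed: bool    # at least one substantive human approval

def participation_signals(changes: list[Change]) -> dict[str, float]:
    """Two illustrative signals of AI participation in the supply chain."""
    ai = [c for c in changes if c.author_is_ai]
    if not ai:
        return {"ai_share": 0.0, "unreviewed_ai_share": 0.0}
    return {
        # How much of what ships is machine-generated?
        "ai_share": len(ai) / len(changes),
        # How often does machine-generated change propagate without human intent?
        "unreviewed_ai_share": sum(not c.human_reviewed for c in ai) / len(ai),
    }

log = [Change(True, False), Change(True, True), Change(False, True)]
print(participation_signals(log))
# {'ai_share': 0.666..., 'unreviewed_ai_share': 0.5}
```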
This is not just a technical shift. It is a governance and liability shift.
As regulatory expectations evolve — from AI accountability frameworks to cybersecurity disclosure requirements — organizations are increasingly responsible for explaining and controlling automated decisions inside their environments.
If an AI-driven change introduces a vulnerability or leads to a material incident, “the system generated it” will not be an acceptable answer.
Accountability will still sit with the enterprise.
That raises the bar: Governance must extend to how autonomous systems act, not just how they are accessed.

The issue is not that any one control is missing.
It is that AI systems operate across the seams of systems designed to govern within their own boundaries.
Repositories enforce code controls.
Pipelines enforce deployment controls.
Identity systems enforce access controls.
Security tools enforce policy checks.
Each works as designed.
But AI systems move across all of them.
They read from one system, generate changes, trigger another and influence a third. Authority is exercised across systems, while governance remains within them.
That is the architectural gap.
Most organizations will respond to this shift by trying to extend existing access controls. That instinct is understandable — and insufficient.
The problem is no longer just who or what can access a system. It is how control is maintained when authority can generate new actions dynamically.
This requires a different model of governance.
One that treats software systems as actors whose behavior must be bounded, observed and continuously evaluated across workflows — not just permitted or denied at a point of access. Governance becomes less about static permissions and more about controlling the shape and impact of actions across systems.
That is the shift.
The conversation around AI in software development often focuses on productivity.
But as AI systems begin to participate in producing and modifying enterprise software, the more important question becomes governance.
AI is not just accelerating the software development lifecycle. It is becoming part of the software supply chain itself.
And that changes the problem.
The challenge for CIOs is no longer just managing developers, tools or pipelines. It is understanding and governing the authority that software systems exercise across them.
Because in a world where software can act on behalf of the enterprise, governance is no longer just about access.
It is about authority — what systems are allowed to do, and how that authority is controlled and measured over time.




When financial tech vendor FIS announced its new AI agent for detecting financial crimes on Tuesday, it made much of the team of forward deployed engineers (FDEs) from Anthropic embedded to make it happen. FIS is just one of the dozen or so companies working with Anthropic on developing agents for financial services using new connectors and so-called “ready-to-run” templates Anthropic announced the same day.
Enterprise CIOs are increasingly paying for the services of AI vendors’ FDEs, given their own data quality issues and the complexity of working with AI models.
But how and why such teams are brought in can make the difference between whether the enterprise is helped to get to the next AI level or becomes a hostage to never-ending consulting costs.
FIS listed the Bank of Montreal (BMO) and Amalgamated Bank as the first two companies to deploy its agent, which it said will compress anti-money-laundering investigations from hours to minutes, assembling evidence across a bank’s core systems and surfacing the riskiest cases for review with full auditability and traceability of decisions. “Anthropic’s Applied AI team and forward-deployed engineers (FDEs) are embedded with FIS to co-design the Financial Crimes AI Agent and transfer knowledge so FIS can build and scale additional agents independently over time,” it said.
Aman Mahapatra, chief strategy officer for Tribeca Softtech, a New York City-based technology consulting firm, suggests CIOs follow the money when evaluating similar work with AI vendors.
“The structurally interesting thing about the FIS-Anthropic model is who actually pays the FDE cost. This is the question CIOs should be asking but mostly are not,” Mahapatra said.
The cost of FDEs could put some AI projects in jeopardy, according to a recent report by Alex Coqueiro, a senior director analyst with Gartner. He predicted that by 2028, “70% of enterprises will be forced to abandon agentic AI solutions from FDE-led engagements because of high vendor costs and lack of internal skills to evolve them independently.”
He argued that the problem is not entirely the fault of the AI vendor. Many IT operations don’t put in the necessary preparatory work to clean their data and make it AI-friendly. Internal corporate politics and personalities are another critical factor.
“The domain experts most critical to FDE success have the strongest incentive to undermine it. An expert who perceives the FDE as capturing their expertise for agentic automation will give the official process instead of the real one, and the AI agent built on it will fail on the exact edge cases they chose not to mention,” Coqueiro said in the report. “Flat FDE effort across successive deployments is the signal that an engagement has produced a dependency, not a capability. When effort does not decrease as use cases mature, the organization is paying consulting rates for operations it should own.”
In the case of FIS’s work with Anthropic, said Mahapatra, “BMO and Amalgamated are not writing direct checks to Anthropic for forward-deployed engineers at quarterly consulting rates. FIS is absorbing the FDE engagement and amortizing it across its banking customer base.”
That approach, he said, “is meaningfully better economics than direct Anthropic engagements where each bank funds its own embedded engineering team to redesign the same context boundaries, shadow autonomy controls, and the jailbreak resistance testing in isolation.”
Mahapatra said much of this problem stems from how generative and agentic AI have been marketed. The original ROI thesis, he said, was that AI enables enterprises to do more with fewer people, but that was “a marketing pitch that was never going to survive contact with regulated banking workflows.”
Nik Kale, a member of the Coalition for Secure AI (CoSAI) and of ACM’s AI Security (AISec) program committee, said that he sees FIS’s presentation of its work with Anthropic as “a concession that frontier AI isn’t a product yet. CIOs thought they were buying software. They’re actually buying a professional services engagement. That changes the cost model, the dependency model and the governance model for every enterprise AI deployment.”
Kale said the statement’s wording gives a clue about the agentic strategy.
“The FIS release says every agent decision is traceable and auditable. True statement, wrong sentence. The harder question isn’t auditing what the agent decided. It’s deciding which decisions are the agent’s to make in the first place. Banks have decades of decision-rights frameworks. They don’t translate cleanly to agent harnesses built by someone else’s engineers,” Kale said. “The CIO test is simple: after the forward-deployed team leaves, can your organization still operate, monitor, challenge, and safely modify the agentic workflow? If the answer is no, it’s not mature yet. It may be a successful implementation project, but it’s not yet an enterprise capability.”
Justin Greis, CEO of consulting firm Acceligence and former head of the North American cybersecurity practice at McKinsey, agreed with Kale.
“The bigger risk isn’t the cost of these engagements. It’s the dependency they can create. Spending a few hundred thousand dollars to get something into production isn’t the issue,” Greis said. “Ending up with a system that only the vendor can operate, extend, or even fully understand is where things start to break down.”
The problem with some of these consulting arrangements is not that they hide IT deficiencies as much as they enable AI shortcuts.
Enterprises paying FDE teams “do not undermine the ROI case for agentic AI. They undermine the lazy version of the ROI case. That distinction matters,” said Sanchit Vir Gogia, chief analyst at Greyhound Research. “For the past two years, too much of the enterprise AI narrative has been sold as a tidy labor-reduction story. Buy the model. Automate the work. Reduce the people. Capture the savings. It is neat, board-friendly, and deeply incomplete. Large enterprises are not collections of clean tasks waiting to be automated. They are collections of exceptions, legacy systems, fragile integrations, access controls, undocumented workarounds, compliance obligations, and human judgement pretending to be process. Forward deployed engineers are the invoice for making AI real. That is not transformation. That is dependency with better stationery.”
Another FDE concern is the inevitable conflict of interest that can exist where the AI vendor that is being paid to fix the complexity is also the vendor that created much of that complexity in its model.
Carmi Levy, an independent technology analyst, said the business case can undermine enterprise objectives. “If AI agents are supposed to autonomously create, deploy, and manage super-capable workflows at all levels of the organization, their very capability threatens the future viability of vendors who have long attached lucrative support contracts to those very same deployments. If the FDE is going to be engaged to work alongside customers to make their AI agents come alive, where is the incentive for AI vendors to build agentic systems that are so capable that they don’t require ongoing support? The FDE business model influences up-front model design, and it’s entirely possible that AI platforms are being deliberately designed to require persistent FDE support.”


As organizations rebrand themselves as AI companies, most of the conversation is focused on knowledge workers rather than the people in retail, manufacturing, and healthcare who can benefit from AI just as much. Prakash Kota is CIO of UKG, one of the largest HR tech platforms in the market, whose workforce operating platform is used by 80,000 organizations in 150 countries. He explains how his company uses agentic AI, voice agents, and a democratized innovation framework to transform the frontline worker experience, and why the CIO-CHRO partnership is critical to making it stick.
How do you leverage AI for growth and transformation at UKG?
UKG is one of the largest HR, pay, and workforce management tech platforms in the market, and our expertise is in creating solutions for frontline workers, who account for 80% of the world’s workforce. This is important because when companies rebrand themselves as AI for knowledge workers, they’re not talking about frontline workers. But people in retail, manufacturing, healthcare, and so on also benefit from AI capabilities.
So the richness of our data sets, and our long history with the frontline workforce, positions us well for AI-driven workforce transformation.
What are some examples?
We use agentic AI for dynamic workforce operations, which shows us real-time labor demand. Our customers employ thousands of frontline workers, and the timely market insights and suggested actions we give them are new and valuable.
We also provide voice agents. Traditionally, when a frontline worker requests a shift, managers would review availability, fill out paperwork or update scheduling software, and eventually offer an appropriate job. With voice agents, AI works directly with the frontline worker, going through background and skills validation, communication, and even workflow execution. The worker can also ask if they can swap shifts or even get advice on how to make more money in a particular month. This is where AI changes the entire frontline worker experience.
We also launched People Assist, an autonomous employee support agent. Typically, when an employee is onboarded, IT and HR need to trigger and approve workflows. People Assist not only tracks workflows, but also performs those necessary IT and HR onboarding activities so new employees are productive from day one.
What framework do you use to create these new capabilities?
For internal AI usage for our own employee experience, we use an idea-to-implementation framework, which involves a community of UKG power users who are subject matter experts in their area. Ideas can come from anybody, and since we started nine months ago, more than 800 ideas have been submitted. The power users set our priorities by choosing the ideas that will make the most impact.
Rather than funneling ideas through a small central team — a linear process that kills momentum — we’ve democratized innovation across the business. We give teams the governance frameworks, change models, and risk guardrails they need to move quickly. With AI, the most important thing isn’t to launch, but to land.
But before we adopted the framework, we defined internal personas so we could collaborate with different employee groups across the company, from sales to finance.
With the personas and the framework, we can prioritize ideas by persona, which also facilitates crowdsourcing. You’re asking an entire persona which of these 10 ideas will make their lives better, rather than senior leaders making those decisions for them.
Why do so many CIOs focus on personas for their AI engine?
Across the enterprise, every function has a role to play. We hire marketing, sales, and finance for a particular purpose. Before AI, we gave generic packaged tools to everyone. AI allows us to build capabilities to make a specific job more effective. Even our generic AI tools are delivered by persona. Its impact on specific roles is the reason personas are so important right now. Our focus is on the actual jobs, the people who do them, the skills and tasks needed, and the outcomes they want to achieve.
We know our framework and persona focus work from employee data. In our most recent global employee engagement survey, 90% said they’re getting the right AI tools to be effective. For the AI tools we’ve launched broadly across the company, eight out of 10 employees use them. For me, AI isn’t about launching 10,000 tools, because if no one uses them, it’s just additional cost for the CIO and the company.
Is the build or buy question more challenging in this nascent stage of AI?
The lifecycle of technology has moved from three years to three hours, so whenever we build at UKG, we use an open architecture, which allows us to build with a commercial product if one comes on the market.
Given the speed of innovation, we lean toward augmentation rather than build. There are areas, like our own native products, where a dedicated engineering team makes sense. But for most of our AI capabilities — customer support and voice agents, for example — we work with our vendor partners. We test and learn with multiple vendors, and decide on one usually within two weeks.
This is what AI is giving all CIOs: flexibility, rapid adoption, interoperability, and the ability to quickly switch vendors. It’s IT that’s very different from what it used to be.
Given the shift to augmentation, how will the role of the software engineer change?
For software builders, business acumen — the ability to understand context — is no longer optional. In the past, the business user would own the business context, and the developer, who owns the technology, would bring that business idea to life. Going forward, the builder has the business context to create the right prompts to let AI do the building, and the human in the loop is no longer the technology builder but the provider of context, prompts, and validation of the work. So the engineer doesn’t go away; they now finish a three-week scope of work in hours. With AI, engineers operate at a different altitude. The SDLC stays, but agility increases: a two-week concept compresses into two days.
At UKG, you’re directly connected to the CHRO community. What should they be thinking about as the workforce changes with AI?
The best CHROs are thinking about the skills they’ll need for the future, and how to train existing talent to be ready. They’re not questioning whether we’ll need people, but how to sharpen our teams for new roles. The runbooks for both IT and HR are evolving, which is why the CIO-CHRO partnership has never been more critical to create the right culture for AI transformation.
CIOs can deliver a wealth of employee data like roles, skillsets, and how people spend their time. And as HR leaders help business leaders think through their roadmap for talent — both human and AI — IT leaders can equip them with exactly that intelligence.
What advice would you give to CIOs driving AI adoption?
Invest in AI fluency, not just AI tools. Your people don’t need to become data scientists, but they do need a new kind of literacy — the ability to work alongside AI, question its outputs, and know when to override it. That’s a training and culture investment, not a software investment.
And redesign work before you redeploy people. Don’t just drop AI into existing workflows. Use this moment to ask what work really matters. AI is forcing us to have the job design conversations we should’ve had years ago, so it’s important to be transparent about the journey. What’s killing workforce trust now is ambiguity. Your people can handle hard truths but not silence. Leaders who communicate openly about where AI is taking the organization will retain the talent they need to get there.


Technology is evolving faster than the language we use to describe it. As a result, people are often talking past each other about what software, AI and automation are. These are treated as single categories when in reality they contain several fundamentally different disciplines and economic models. And when reality changes faster than our language, confusion follows.
That’s roughly where we are with technology right now.
This challenge is not technical, it is semantic. When different groups use the same words to mean different things, alignment becomes difficult. A software engineer, product manager and executive may all use the word “software,” but they are often referring to entirely different categories of work.
This lack of precision becomes more problematic as systems scale. Decisions about hiring, tooling and strategy depend on understanding what kind of work is being done. Without clear vocabulary, those decisions and the resulting actions are often based on incorrect assumptions.
We need terms that convey clear, distinct concepts so we can properly express the intended meaning. “Software,” “AI,” “content generation” and many other tech terms each now carry multiple meanings, spanning several fundamentally distinct ideas, disciplines and economic models. Because we lack clearly differentiated terms, people often end up talking past each other.
So, I’m going to propose a few terms. They may not be the ones that ultimately stick, but we need to start somewhere.
Bizware is already the dominant form of software. I’ve previously used this term to describe the class of software that exists primarily to support business infrastructure rather than advance computing itself. Tools like Docker, Kubernetes, React and Angular exist to help organizations assemble and operate the digital part of a business. They solve operational and integration problems rather than fundamental computing problems. Millions of developers now work primarily in this ecosystem. It has its own tools, expectations and culture that are distinct from traditional computer science. Concepts like sprints, deployment pipelines and infrastructure orchestration dominate bizware and arise from the intersection of software and business rather than from computing itself.
The rise of bizware can be seen in the widespread adoption of platforms like the aforementioned Docker and Kubernetes, which exist to standardize the deployment of software infrastructure at scale. Docker, for example, enables developers to package applications into consistent environments, reducing variability between systems. Kubernetes extends this by orchestrating those environments across distributed systems, allowing organizations to manage complex deployments reliably.
These tools are not advancing computing theory. They are solving operational problems that arise when software becomes infrastructure. That distinction is what defines bizware.
Usage example: Our company builds bizware to integrate AWS datasets with high-speed data queries for front-end rendering.
I obviously didn’t invent the term AI Slop, but it still lacks a precise definition despite heavy use. And not all AI output has the same value. I propose that AI Slop should distinguish content that serves some purpose from content that is fundamentally useless. AI Slop, then, is content that exists, or seems to exist, for no purpose other than existing, or content so fundamentally flawed it cannot be used for any intended purpose.
An example of the former is the videos of Will Smith eating spaghetti, which exist because people are entertained by the fact that they can exist. Anthropic’s C compiler fits the latter category: it is so flawed that it has no applicable use case, nor does it do anything novel, particularly with respect to existing solutions.
One of the reasons the blanket term “AI” creates confusion is that it produces outputs across multiple categories at once. The same system generates truly useless content, while also generating content that can serve a function and generate value.
Without language to distinguish these outcomes, discussions about AI tend to become circuitous. If two people didn’t agree on what the color red is, it would be very difficult to discuss art. Right now, people don’t agree on the term “AI Slop,” so we have a challenge coming to a consensus about the nature of what AI generates.
Usage example: Anthropic’s C compiler is AI Slop.
Not everything AI produces is useless. The real divide is economic, not technical. I’ve often said that AI automates mediocrity. But in many circumstances, mediocre output is economically valuable.
I refer to this category as GEA: Good Enough AI.
GEA is AI-generated material that performs its intended function even if the quality is far from exceptional. The output may require small corrections or modifications, but it is good enough to complete the task. In a business context, “working” is often far more valuable than “excellent.” If someone needs a simple Android app to track gym workouts, AI can generate code that isn’t elegant but still does the job. In that situation, perfection has little economic value.
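To make the category concrete, here is a sketch in the spirit of that workout-app example, written in Python rather than for Android, and illustrative rather than actual model output. Nothing about it is elegant, but it completes the task:

```python
import json, os

LOG_FILE = "workouts.json"  # flat file, no database: good enough for one user

def load():
    # No error handling, no schema, no file-handle hygiene. Plain, but it works.
    return json.load(open(LOG_FILE)) if os.path.exists(LOG_FILE) else []

def add(exercise, sets, reps):
    entries = load()
    entries.append({"exercise": exercise, "sets": sets, "reps": reps})
    with open(LOG_FILE, "w") as f:
        json.dump(entries, f)

add("squat", 5, 5)
add("bench press", 3, 8)
print(load())  # it works; nobody would call it exceptional
```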
The important distinction here is, as mentioned above, mostly economic, not technical. GEA is generated content that has value, whereas AI slop does not. It doesn’t imply a quality of the output, only that the quality is high enough that it represents value to the prompter.
This is where many organizations struggle. They attempt to apply a single standard of quality across all outputs, rather than recognizing that different categories of work require different thresholds. In many business contexts, speed and cost efficiency outweigh perfection. In others, precision and originality are critical. Treating all outputs as if they should meet the same standard leads to inefficiency and misaligned expectations.
Usage example: With the right prompts, Claude produced GEA SQL queries roughly 75% of the time.
Some work will remain human by definition and some categories will require human expertise. I propose we refer to these as HRC: Human Required Content. Even when AI produces higher-quality output, that output is instantly accessible to everyone. As a result, it tends to redefine the baseline for mediocrity rather than the ceiling for excellence. Since the best work will always command an economic premium, there will always be economic value in humans that outperform AI.
This class of work is not going away. If anything, it is probably going to demand a higher premium as companies decide what about their business should be “industry-leading” versus what part of their business can merely function.
Usage example: Our clients demand high-quality HRC for their customer-facing frontend products.
For companies, adopting this vocabulary has practical implications. It allows leaders to better define roles, set expectations and allocate resources. It also helps clarify where AI can be effectively deployed and where human expertise remains essential.
More importantly, it reduces confusion. When teams can clearly distinguish between different types of work, they can make better decisions about how to approach each one.
Technological change always outpaces language. When a new technology emerges, we initially try to describe it using the vocabulary we already have. Eventually, that stops working. New terms appear to describe new categories of work, new economic realities and new technical disciplines.
We are currently in that transitional moment with AI and modern software.
Bizware represents one new category of software work. AI Slop, GEA and HRC describe different tiers of AI-generated output and the economic roles they play.
These terms may not be the ones that ultimately stick, but the categories they describe already exist. As AI capabilities stabilize and genuine business models emerge, our language will evolve to reflect how these systems are used.
When that happens, the conversation around AI and software will become a lot clearer.


Water is so woven into our daily lives that we no longer even find it remarkable. We open the tap, flush the toilet or turn on the shower and there it is, waiting. For that to happen, however, a great deal has to take place: a complex water cycle that guarantees not only that water circulates but also that it is fit for human consumption. It is a process in which technology is also very much present.
“You could say that all water is technological. Whether it is analog or digital is another matter,” explains Luis Babiano, manager of the Spanish Association of Public Water Supply and Sanitation Operators (AEOPAS). “It is a highly technified sector. That said, we are only at the start of digitalization. We still have a long way to go before we are true digital champions,” he tells CIO ESPAÑA.
“Although water is still a physical resource, its management today is increasingly digital,” explains María Gil, head of Idrica in Spain, by email. Utilities worldwide have incorporated “IoT sensors, advanced SCADA systems, smart metering, analytics platforms and, more recently, data lake architectures that make it possible to integrate information from across the entire operation,” she notes, enabling management that is more data-driven.
Even so, digitalizing the water cycle remains one of the sector’s major challenges, and it becomes far more pressing given the context in which water operates. “Its importance is enormous because water is an increasingly scarce resource under great pressure,” explains Gil.
Environmental organizations have been warning for years about the impact of growing pressure on aquifers, as well as the toll the climate crisis takes in the form of droughts. According to a UN report published in January, the world has already entered a phase of “water bankruptcy.” “Many regions have been living far beyond their hydrological means. It is like a bank account from which money is withdrawn every day without a single deposit coming in. The balance is already negative,” said Kaveh Madani, the report’s lead author, at the time.
Spain is in fact one of the most complex territories where water pressure is concerned. WWF warns, for example, that the country “is running out of water,” and water stress is an increasingly frequent topic. The situation is complicated because, as the water industry itself acknowledges, significant volumes are also lost to failures in the very infrastructure that supports the water cycle. Some estimates suggest that between 19% and 20% of water is wasted through leaks and breakdowns.
Digitalization could help operators be more effective and, above all, improve the efficiency and resilience of the water cycle. As expert sources point out, it could anticipate complex situations, identify problems, optimize networks and improve operations overall.
In this leap to digitalization there are bright spots, but also nuances that put the optimism into perspective. Talking to the sector makes clear that things are being done and that interest is high, but also that much more investment is needed, along with far more awareness of the scale of the problem and of the need to act to improve water infrastructure.
“Spain is one of the most advanced countries in water management, and that is now carrying over into the digital sphere,” argues Gil. “We are seeing utilities that already operate with integrated platforms, digital twin models, advanced analytics and large-scale smart metering deployments,” she says.
Spain’s water PERTE (the strategic program that earmarked part of the EU Recovery, Transformation and Resilience Plan funds for digitalizing the water cycle) has helped drive the transformation. “The water PERTE has been a true seed for sowing digitalization in the sector, and that is very positive,” says Babiano. Gil likewise confirms that it “is accelerating” the change. There are already projects incorporating key tools that “can serve as locomotives,” as the AEOPAS manager puts it. But that is only part of the picture. “The challenge is not so much technological (the technology already exists) as one of adoption, integration and cultural change within organizations,” notes Gil.
Babiano is blunt about the state of the sector: water digitalization needs funding, and funding that arrives in a sustained way. That may lead to changes in water tariffs, but Babiano notes that “public funding sources are also needed for its development.” “Among other things, because digitalization must go hand in hand with a country-level project,” he argues. One key reason it must be part of a state-level vision, rather than remain confined to isolated cases, is that digitalization has to reach everywhere. Or, as the expert puts it, “we must not focus only on cities, but also on small municipalities.” The aim is to avoid “two speeds”: one for municipalities able to go digital, and another for those left with “significant shortfalls in all kinds of infrastructure, including digitalization.”
Here another important factor that Babiano stresses comes into play. Digitalizing the water cycle needs a solid foundation: the physical infrastructure that carries water to citizens has to be brought up to standard first. Talking about pipes and treatment plants may not be as cool as talking about AI, but they are the foundation of the water cycle, and that is where the first problems appear. Right now there are still areas of Spain without wastewater treatment plants (even though EU rules penalize this). More generally, water infrastructure is decades old, which creates pressure points. “More than 30% of our networks are over 40 years old,” Babiano recalls. To grasp what that means, think of renovating a bathroom at home: there comes a point when replacing the pipes is unavoidable. Here the same thing happens at much larger scale.
“Digitalization allows us to reach a reasonable level of soundness and keep it there over time,” says Babiano. But digital transformation should not travel alone: the expert cautions that “first it is about reducing our losses, investing in our networks and so on, and then moving (or moving in parallel) into digitalization.”
All of this is happening, moreover, at a moment full of challenges for the sector that should not be lost from view. “We face a pressing need for a transition,” says Babiano. River basins are confronting droughts, danas (the violent Mediterranean storms that, as the expert recalls, push infrastructure such as treatment plants to their limits in record time as they absorb a deluge of water) and mounting pressure. “And yet we do not have a very clear plan for how to invest in this water transition,” he says. Babiano contrasts this with the energy and mobility transitions, which have plans, tax measures and investment incentives that the water sector does not. The water transition enjoys nothing comparable, although the sector insists it should.
In that context of transition, digitalization could become an ally in tackling water challenges. “Technology is not the only solution, but it is a key enabler,” says Gil. “The big water challenges (drought, water stress, overexploitation) have structural, climatic and governance dimensions,” she explains, but “without technology it is practically impossible to manage them efficiently.” It shows what is happening and what could fail, supports better decisions, and “brings transparency and traceability.” As Babiano sums it up, “digitalization exponentially increases our excellence.” “For example, if you monitor your entire network, you immediately know the location of the points losing more water than normal,” he illustrates. The end user can be told what is happening, and the leak can be located (and fixed).
In Spain, Babiano says, solutions of this kind already exist. “A large part of the reduction in many of our consumption figures is coming from smart meters and from the monitoring and digitalization of our networks,” he notes. “What we are not yet achieving is greater automation,” he adds, recalling that reaching the highest levels of improvement will take time. “We are still, let’s say, in the phase of moving from digital novice to vertical integration,” he sums up.
But which IT tools are waiting around the corner once an advanced level of digitalization is reached?
A handful of technologies have become emerging forces in global water management, according to a report by the software platform Xylem Vue. Its analysis lists collaboration between public administration and private enterprise, agent-based architectures, cybersecurity, early warning systems and, of course, the now-ubiquitous generative AI.
“Artificial intelligence is starting to play a very relevant role, especially where a solid data foundation already exists,” explains Gil (Idrica, together with Xylem, is behind Xylem Vue). “Its main contribution is the ability to find complex patterns and optimize decisions in environments with many variables,” she notes. “It is important to understand that AI does not replace expert operational knowledge,” she cautions, but when the two are combined, the results are substantial. Another highlight is early warning systems, which, as the expert explains, “are one of the biggest paradigm shifts in water management.” Instead of waiting for a failure to occur and hit the service, they anticipate what is going to happen. “The value lies in gaining time: moving from reacting to preventing. And in a system as complex and sensitive as water, that anticipation has a direct impact on service continuity, operating costs and citizens’ trust,” she says.
That said, the leap to digitalization has another side: potential cybersecurity threats. Water is, after all, critical and highly sensitive infrastructure. “Without a doubt, digitalization widens the exposure surface, and the water sector is no stranger to that,” acknowledges Gil, adding that this has already become “a growing priority.” “What we are seeing is an evolution toward more mature security models,” she says. “There is also greater awareness in the sector. The key is for digitalization and cybersecurity to advance hand in hand. They are not independent elements.”


At its Knowledge customer event this week, ServiceNow unveiled updates to its workflow management platform that advance its redefinition of itself as the “AI control tower for business reinvention.”
The AI Control Tower product itself, introduced at last year’s event, gets new integrations with Microsoft Azure, Amazon Web Services (AWS), Google Cloud Platform (GCP) and other LLM providers to extend governance and observability of enterprise infrastructure, adding to its existing links with OpenAI and Anthropic. The integrations also span applications such as SAP, Oracle, and Workday. In addition, Control Tower can now discover non-human identities and connected devices to bring OT and IoT under the same governance as AI agents and cloud services.
All this ties in to the ServiceNow Action Fabric, which opens the platform to any AI agent, whether built on ServiceNow or from another source, via a Model Context Protocol (MCP) server, the company said.
And thanks to the recent acquisition of Traceloop, Control Tower now provides more extensive observability into agent behavior at runtime. Five new risk frameworks aligned with NIST and EU AI Act standards offer compliance controls.
To expand the reach of what ServiceNow calls the Autonomous Workforce, a group of specialist AI agents announced in February that began with a single L1 IT service desk agent, it has added “AI teammates” that work alongside humans in CRM, IT, employee services, and security and risk management.
The autonomous IT cohort includes an AIOps agent that detects anomalies, correlates events, and triggers remediation, and a specialist for site reliability engineering (SRE) that performs incident triage and postmortem documentation. Other new agents assist with asset lifecycle management and portfolio planning.
Autonomous CRM offers specialist agents for sales qualification and quoting, order fulfillment, managing invoice disputes, and service and renewal, and in the world of employee services, AI specialists act as digital employees with role-specific skills in HR, workplace services, legal, finance, procurement, supplier management, and health and safety.
To round out the offerings, ServiceNow announced Autonomous Security & Risk, designed to span the entire threat landscape, from finding and remediating vulnerabilities to examining third-party vendor risk.
ServiceNow EmployeeWorks, the previously announced “conversational front door for the enterprise”, is now generally available. In addition, ServiceNow announced Otto, an AI assistant that unifies Now Assist, Moveworks, and AI Experience, and operates across the enterprise.
“Rather than living inside a single application, ServiceNow Otto sits across the entire enterprise, understanding intent, routing work to the right agent, and executing it to completion,” the company said. “Employees, customers, and support teams talk, chat, search, browse, analyze, and build. ServiceNow Otto is designed to handle the rest, adapting to each employee’s role and location without requiring them to know which system handles their request. Actions are governed by AI Control Tower, which can log each AI interaction, enforce enterprise policies, and provide explainability for every decision.”
Otto is already available in EmployeeWorks and the AI Control Tower, and will be rolled out in all other products “in the year ahead.”
According to Nenshad Bardoliwalla, ServiceNow’s group VP of AI products, all this means that “together with a new commercial model that bundles everything customers need to deploy AI quickly, we’ve made it clear the era of sidecar AI is over.”
What technology analyst Carmi Levy finds most interesting in these announcements is how quickly we’re seeing AI-enabled workflows extend beyond their initial entry point in IT.
“What was once the exclusive domain of senior IT leaders and planners is now filtering across all operational areas of the typical organization, including CRM, HR, IT operations, security and risk,” he said. “AI is also deeply embedded in the average worker’s desktop and is rewriting their work experiences in the process. Likewise, it puts highly autonomous tools in the hands of organizations intent on improving productivity, sharpening customer responsiveness, and driving operational efficiencies.”
Stephen Elliot, group VP at IDC, added, “The agentic focus is critical as the company continues to expand its specialist agent library. Customers can adopt these across core workflows to realize business value and increase productivity. The recent commercial pricing model complements the agentic capabilities. It meets customers where they are in their AI maturity journey enabling a pragmatic approach to adoption.”
But, he added, “Customers should consider the combination of workflows, AI, data, governance, and security as they deploy AI capabilities. No one model can do it all.”
Indeed, he said, “We are hearing from some CIOs that they are pausing some AI use cases because of the security and governance risks.”
Charles Betz, VP principal analyst at Forrester, said that ServiceNow is on the right track, especially with its continued focus on data. “The data governance, provenance, and currency issues are not trivial. Agents reasoning at machine speed over a stale graph are going to produce wrong outputs, and it’ll be data-quality-based hallucination,” he said. In addition, “documenting decision traces within the AI domain is super important.”
Levy agreed. “ServiceNow’s offerings reflect a keen understanding of where AI can drive optimal benefit throughout all areas of the business, what those workflows might look like, and how the tools and supports need to evolve,” he said.


I keep hearing the same AI conversation everywhere I go. Better models, faster inference, more capable agents. The race is on and everyone wants in.
But something is missing.
Most organizations I work with have already moved past experimentation. AI is embedded in workflows, shaping customer interactions, processing internal documents, informing operational decisions. The question of whether AI works has largely been answered. What has not been answered, in most cases, is something far more basic.
Where does our data actually go when it flows through an LLM? Who can access it? Under which jurisdiction is it processed? Could it end up improving someone else’s model?
These are not hypothetical concerns. These are the questions that surface when a regulator asks how your organization handles personal data, when a client wants to know what happens to the documents they share with your AI-powered service, or when a board member reads about a policy change at one of the major AI providers and wants to know what it means for the business.
In my experience, most organizations cannot answer these questions clearly. Not because they do not care, but because the adoption moved faster than the governance. Teams were encouraged to experiment, pilots became production and somewhere along the way, the data conversation got left behind.
This is not a fringe problem. Recent industry data suggests that most enterprise leaders are now actively redesigning their data architectures, not because the AI did not work, but because the way it was connected to their data became a liability.
The capability conversation has dominated for the last two years. I think the next two years will be defined by a different question entirely: Not what can AI do, but who controls what it knows?
When I talk to CIOs about AI risk, the conversation almost always starts with model accuracy, hallucinations or bias. Rarely does anyone open with: “Do we actually know where our data goes when someone on the team uses an LLM?”
That question matters more than most people realise. Not because of some hypothetical future breach, but because right now, most organizations are operating across a mix of LLM tiers and tools with no unified picture of what data is going where or under what terms.
OpenAI, Anthropic and Google all operate a two-tier system. At the enterprise and API level, based on publicly available policies, the commitments are clear: Your data is not used for model training. But those protections only apply if everyone in your organization is using the enterprise tier. In practice, that is almost never the case.
Teams sign up for free accounts to test things quickly. Employees paste internal documents into consumer-tier tools because it is faster than raising a ticket. Contractors use personal subscriptions for client work. None of this is malicious. All of it is invisible to leadership.
And the consumer tiers operate under very different rules. OpenAI’s consumer ChatGPT may use conversations for model improvement unless the user opts out. Google’s free Gemini tier works similarly. In September 2025, Anthropic introduced changes to its consumer terms: Conversations are now eligible for training by default, with data retention extending from 30 days to up to five years.
This is the shadow AI problem. Corporate data entering consumer-tier systems where it may be retained for extended periods and processed under terms nobody in the organization approved. Not because anyone made a bad decision, but because no one made a deliberate one.
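A first picture of the problem is often available from existing telemetry. As a rough sketch, assuming egress or proxy logs expose (user, destination host) pairs, something like this can surface consumer-tier AI traffic; the hostnames are examples, and the sanctioned/unsanctioned split will differ per organization:

```python
# Example consumer-facing endpoints; sanctioned API traffic (e.g. api.openai.com
# under an enterprise agreement) is deliberately not on this list.
CONSUMER_AI_HOSTS = {"chatgpt.com", "gemini.google.com", "claude.ai"}

def shadow_ai_report(egress_log: list[tuple[str, str]]) -> dict[str, int]:
    """Count hits against consumer-tier AI hosts per user from (user, host) records."""
    hits: dict[str, int] = {}
    for user, host in egress_log:
        if host in CONSUMER_AI_HOSTS:
            hits[user] = hits.get(user, 0) + 1
    return hits

log = [("alice", "chatgpt.com"), ("bob", "api.openai.com"), ("alice", "claude.ai")]
print(shadow_ai_report(log))  # {'alice': 2}
```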
When a regulator in Riyadh asks how your organization handles personal data processed through an LLM, or a client in Doha wants to know where their documents went after your team used AI to summarise them, “we think we are on the enterprise tier” is not a defensible answer. The problem is not that something has gone wrong. It is that most organizations could not prove things are going right.
Most conversations about data sovereignty still default to one question: Where is the data stored? In the context of AI, that is not enough.
I work across the UK, the Gulf and Europe. Each region is moving toward stronger data protection, but they are getting there differently, at different speeds and with different expectations. For any organization operating across borders, that creates real tension.
In Europe, GDPR set the foundation and the EU AI Act is raising the bar further. In Saudi Arabia, the PDPL is no longer a paper exercise. SDAIA issued 48 enforcement decisions in 2025 and published cross-border transfer rules requiring a four-step risk assessment before personal data leaves the Kingdom. In Qatar, the PDPPL has been in place since 2016, but enforcement was historically light. That changed in late 2024, with the National Data Privacy Office now issuing binding decisions against organizations found in violation.
Now add the LLM layer.
When an organization sends data through a cloud-based LLM, the question is not just where the data is stored. It is where the data is processed at inference time. Your infrastructure might sit in Riyadh, but if the model processes your prompt on a server in another jurisdiction, most legal frameworks would say sovereignty has not been preserved.
And as organizations move toward agentic AI, this gets harder still. Agents do not respond to a single prompt. They retrieve context from multiple sources, call external tools and chain decisions across systems. Each step is a potential jurisdiction question and a potential compliance gap that nobody mapped.
Sovereignty is not just geography. It has at least four dimensions: Where data and compute reside, who manages them, who owns the underlying technology and who governs it. Most organizations are only thinking about the first one.
Once an organization recognises the sovereignty problem, the natural instinct is to bring everything in-house. Run your own models, keep your data on your own infrastructure, remove the dependency on external providers entirely.
That instinct is understandable. It is also expensive.
Local models like Llama and Mistral give you full control. No data leaves your boundary. No third-party terms to worry about. No inference happening in a jurisdiction you did not choose. On paper, it solves the problem.
In practice, a production-grade on-premise deployment for a 70 billion parameter model costs anywhere from $40,000 to $190,000 in hardware alone. Self-hosting only becomes cost-effective if you are processing above roughly two million tokens per day. Below that, the API is cheaper. On top of the hardware, you need the talent to deploy, fine-tune, secure, patch and maintain these systems over time. That is not a one-off cost. It is an ongoing operational commitment that most organizations underestimate.
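To make that threshold concrete, here is a back-of-the-envelope break-even sketch in Python. The hardware figure is the low end of the range cited above; the amortization period and the blended API rate are illustrative assumptions, not vendor quotes, so substitute your own numbers.
```python
# Back-of-the-envelope break-even for self-hosting vs. paying per API token.
HARDWARE_COST_USD = 40_000          # low end of the range cited above
AMORTIZATION_YEARS = 3              # assumed hardware lifetime
API_PRICE_PER_1K_TOKENS = 0.02      # assumed blended input+output API rate

daily_self_host_cost = HARDWARE_COST_USD / (AMORTIZATION_YEARS * 365)

# Daily token volume at which API spend equals the self-hosted daily cost.
break_even_tokens = daily_self_host_cost / API_PRICE_PER_1K_TOKENS * 1_000

print(f"Self-hosted hardware cost: ${daily_self_host_cost:,.2f}/day")
print(f"Break-even volume: {break_even_tokens:,.0f} tokens/day")
# Roughly 1.8M tokens/day with these inputs, consistent with the ~2M figure
# above; staffing, power and maintenance push the real break-even higher.
```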
And there is a capability gap. The frontier models, the ones that perform best on complex reasoning, coding, analysis and multi-step tasks, are not available for self-hosting. If your use case demands the best available performance, you are using an API. That means your data is leaving your boundary, processed under someone else’s terms, in someone else’s infrastructure.
So, the trade-off is real. At the extremes, you are either paying serious money to keep your data close, or you are paying with your data by accepting terms you may not fully understand. Most organizations sit somewhere in between, but very few have made that choice deliberately. It happened by default. Someone picked a tool, someone else signed up for an account, a pilot became production and suddenly the organization is operating across a patchwork of tiers, agreements and jurisdictions that nobody designed and nobody fully controls.
This is not a technology decision. It is a strategic one. And it belongs in the boardroom, not buried in an IT procurement process.
If you want to know where enterprise AI is heading, follow the money.
The sovereign cloud market is projected to grow from $154 billion in 2025 to over $800 billion by 2032. That is not a forecast driven by hype. It is driven by enterprise buyers telling their providers: We need to control where our data lives and how it is processed.
The response has been significant. Microsoft launched Foundry Local, which lets organizations run large AI models on their own hardware in fully disconnected environments, and committed to processing Copilot interactions in-country for 15 nations by the end of 2026. Google and Oracle are pushing a model where AI services move to where the data lives rather than the other way around, deploying their cloud stacks inside customer infrastructure and sovereign regions.
These are not experimental initiatives. They are multi-billion-dollar structural shifts. And they tell me something important: The providers are not leading this conversation. They are responding to it.
But it is worth being honest about what sovereign offerings deliver today. They come with cost premiums, longer deployment timelines and in some cases a reduced feature set. The trade-off does not disappear. It changes shape. CIOs still need to understand what sovereignty means for their specific context, not just trust that a sovereign label on a cloud product solves the problem.
If I were advising a CIO today, I would not start with tools or vendors. I would start with visibility.
Know exactly what data flows through which LLM and under what terms. Not at the contract level, at the actual usage level. Which teams are using which tools? Are they on consumer or enterprise tiers? Who approved the terms? If you cannot answer those questions today, that is the first problem to solve.
Map your data exposure against every jurisdiction you operate in, and do not stop at storage. Understand where inference happens. Understand where context is retrieved from. If you are operating across the EU, Saudi Arabia and Qatar, those are three different regulatory frameworks with three different enforcement postures, and the LLM layer touches all of them.
Audit for shadow AI. Not as a one-off exercise, but as a recurring part of your governance. Employees are not going to stop using AI tools. The goal is not to block adoption. It is to make sure adoption happens on terms the organization has chosen deliberately.
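As a starting point for that recurring audit, a minimal sketch of a shadow-AI sweep over egress logs might look like the following. The CSV log format, the department field and the domain list are assumptions for illustration; a real sweep would pull from your proxy or DNS gateway and a maintained inventory of consumer-tier endpoints.
```python
import csv
from collections import Counter

# Consumer-tier endpoints to flag (illustrative, not exhaustive).
CONSUMER_AI_DOMAINS = {
    "chat.openai.com",      # consumer ChatGPT
    "gemini.google.com",    # free Gemini tier
    "claude.ai",            # consumer Claude
}

def sweep(proxy_log_path: str) -> Counter:
    """Count hits to consumer AI tools, grouped by department and host.

    Expects CSV rows with columns: timestamp, user, department, destination_host.
    """
    hits = Counter()
    with open(proxy_log_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["destination_host"] in CONSUMER_AI_DOMAINS:
                hits[(row["department"], row["destination_host"])] += 1
    return hits

if __name__ == "__main__":
    for (dept, host), count in sweep("egress.csv").most_common():
        print(f"{dept:20s} {host:25s} {count:6d} requests")
```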
Do not default to local models out of fear or cloud models out of convenience. Make the trade-off intentionally, with real cost and capability analysis behind it. Understand what you gain and what you give up in each direction and make sure that decision is documented and owned at the right level.
Build procurement frameworks that treat LLM data handling as a first-class requirement. Not a footnote in a vendor assessment, but a core criterion alongside security, resilience and performance. If a provider cannot clearly explain what happens to your data, that is not a gap in their documentation. It is a gap in their offering.
The readiness gap is real. 95% of enterprise leaders say they plan to build sovereign AI foundations. Based on current research, only 13% are on track. The organizations that close that gap first will scale faster, win more trust and defend their choices with confidence. The rest will have the conversation forced on them.
For the last couple of years, the focus has been on what AI can do. Bigger models, faster outputs, more automation. That progress is real and I do not think anyone should slow down.
But I think the next phase will be defined by something different. Not capability, but control. Not what the model can do, but whether you can prove you know where your data went, who had access to it, what terms governed it and what happens to it next.
CIOs will not be judged on whether they adopted AI. They will be judged on whether they adopted it in a way they can defend. To a regulator. To a client. To a board. In plain language, with evidence they can stand behind.
In my first article for CIO Network, I argued that explainability is the control layer that makes AI safe to scale. Data sovereignty is the other half of that equation. Explainability answers “why did the system do that?” Sovereignty answers “where did the data go and who controls it?”
If you can answer both, you can scale with confidence. If you cannot, you are building on a foundation you do not fully own.
And once that foundation is questioned, it is very difficult to rebuild.
This article is published as part of the Foundry Expert Contributor Network.


The scenario is not hypothetical: some of the companies that went furthest in replacing people with AI have had to walk part of it back. For the CIO, that mismatch is especially relevant, because agents are redesigning how IT detects problems, decides and responds. And because that complete view is exactly what senior management and other areas need someone to put on the table.
In 2024, Klarna became a European reference for what AI could do for a company. Its AI assistant came to handle two-thirds of customer service chats in its first month, doing work equivalent to 700 full-time agents. As a result, the company froze hiring, and headcount fell from about 5,000 to 3,800 employees. Barely a year later, the CEO himself admitted that the company had gone too far in replacing people with agents, to the detriment of service and product. The company in fact reversed course, rehiring human agents to ensure customers could always talk to a person.
The interesting point is not to read this as AI having failed. The problem was something else: framing the customer service function in terms of productivity and costs, without seeing the whole. Measured by response times and equivalent FTEs, the automation was optimal. Measured by satisfaction, perceived quality and the ability to resolve complex cases, the result was different, and it ended up forcing a retreat.
It is tempting to read Klarna as a customer service story. But the pattern applies to any business function. Introducing AI agents is not adding one more tool: it reorders decision-making, day-to-day learning and, ultimately, how the service is delivered.
If you think only in terms of productivity (that is, what gets automated, how much is saved, how many equivalent FTEs are freed up), it is easy to lose sight of the deeper implications, and easy to discover too late that what is being delivered is no longer the same, even if on paper more is being produced.
This is hard to see at first. A function can perform worse and still show better operational metrics for months. The consequences surface elsewhere, far from the function that was automated: in reputation, in lost customers or in poorly made decisions.
In the CIO's territory this pattern appears earlier and more forcefully. When an agent stops being an assistant that helps and starts to intervene, the changes arrive. It conditions, for example, which alerts reach the team, which code changes are proposed or which incidents are prioritized. This goes beyond accelerating work: it determines what the team sees or stops seeing, and it shifts the space where decisions are made.
In other words, agents do not just execute. They change how problems are detected, how the team responds and even how it learns. If the phenomenon is evaluated only with performance metrics, you run exactly the Klarna risk in-house: gaining speed while losing perspective.
Hence the paradox many IT leaders are beginning to notice. The organization can act faster, deliver more volume and automate more decisions, and at the same time lose touch with the complexity of reality.
A support team used to learn not only by resolving incidents, but by seeing where integrations failed or which user behaviors revealed a deeper problem. If that work passes through automated systems, the organization can keep resolving, but employees lose that learning path.
The risk for the team is that AI works just well enough to push into the background the knowledge and capabilities of how a business unit should operate.
This is where the CIO's role truly changes. It is not about being the person responsible for adopting agents and automating processes intelligently. The CIO becomes the one who provides, inside and outside their area, the complete reading of what AI does to a business function: going beyond productivity gains to surface the aspects that are not visible, such as experience, business perspective and changes in how a service is delivered, whether to the employee or to the customer.
That perspective is highly valuable to general management and to other areas such as operations, customer service and, of course, human resources. In the current context of continuous workforce-reduction announcements, the conversation tends to stop at cost and time savings. The CIO is well placed to contribute the other part: where solid oversight should be maintained, what can be delegated to AI, and where to plan for the possibility of reversing an automation that works on paper.
That capacity to reverse is, in fact, one the organization cannot afford to lose. Not every organization can recover capabilities as quickly as they are lost.
The mission, then, is not to slow down AI or to distrust agents on principle. It is to bring clarity about what can be delegated and what should not be ceded without losing the capacity to intervene. In some cases the answer is clear: repetitive tasks, initial triage, draft generation or technical search. In others the line is more delicate: prioritizing risks, deciding exceptions, changing legacy systems or acting on processes without sufficient oversight.
That will be one of the most relevant services the CIO provides in the coming years. Beyond advancing agent adoption, the CIO will have to supply, inside and outside IT, that necessary reading of the impact of agents on a business function. And, finally, preserve the ability to reverse course when what should be delivered is not being delivered, however good the metrics look.
All of this points in one direction: the CIO's role is widening beyond the purely technological. Governing AI well demands new capabilities that we are barely beginning to name, and they will make much of the difference in the coming years. The next columns will be devoted to them.


AI represents a fundamental shift in how organizations work and innovate. It demands an equally fundamental shift in how CIOs approach governance.
Forward-looking leaders are moving beyond traditional gatekeeping by creating “paved roads”: secure, pre-approved pathways that embed security controls, automated data protections, and real-time monitoring directly into AI workflows so teams can innovate rapidly within safe boundaries. When done right, this approach accelerates adoption, builds confidence across the C-suite and board, and transforms security from a bottleneck into a competitive advantage.
But how do you know whether it’s done right? Traditional IT metrics aren’t enough to measure success in the AI era. Here, we discuss three essential KPIs to evaluate speed and security as AI usage evolves.
KPI 1: AI deployment speed
What it measures: How long it takes to operationalize new AI tools.
This is your ultimate agility metric. Consider AI adoption using traditional IT processes: After your marketing team requests a new AI tool, security may block it even after a multi-week review. While this initiative loses steam, a competitor with modernized processes could quickly deploy the same capability.
The costs of outdated IT processes are far-reaching. Product roadmaps can be delayed by months, and employees can grow frustrated with the lack of innovation. New hires may accept other offers because they want to work with modern AI tools.
To accelerate processes, adopt secure-by-design templates and pre-approved frameworks. With these, teams can implement security controls upfront and automatically validate tools as ready for use. AI features can be shipped in hours or days, rather than weeks or months.
The goal isn’t just speed — it’s predictable, secure speed. When your deployment time decreases while security incidents also decrease (more on that below), you’ve cracked the code.
KPI 2: Approved AI tool adoption
What it measures: The percentage of employees using approved AI tools, how frequently they use them, and whether they're following guidelines or finding workarounds.
This metric reveals whether your security approach is working. High adoption of approved tools is a sign employees trust your solutions and you’re preventing shadow IT. Low adoption could indicate you’re squeezing the water balloon — blocking tools on one side while employees find riskier workarounds on the other.
Approved tools only provide value when people use them. And users of approved tools are users under corporate security controls. This KPI measures both ROI and risk reduction simultaneously.
What to track:
KPI 3: AI security incidents and prevention rate
What it measures: The number and severity of AI-related security incidents and, more importantly, your prevention rate: how many threats you stop before they become incidents.
You can move fast and drive high adoption of AI tools, but if security incidents are increasing, you’re building on quicksand. Conversely, if you have zero incidents because you’ve blocked everything, you’re not enabling innovation.
The goal is prevention-first security: proactive controls that stop threats at ingress, real-time prompt injection prevention, automated sensitive data detection, and context-aware access controls.
Track these incident categories:
Track prevention metrics:
Here’s the critical insight: These KPIs must improve together. The success pattern looks like this: Deployment speed increases + adoption increases + incidents decrease = effective AI enablement.
Any other pattern indicates problems. Fast deployment with rising incidents? Your security controls have gaps. High adoption with slow deployment? You’re creating bottlenecks. Low incidents with low adoption? You’re blocking innovation.
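As a toy illustration of reading the three KPIs together, the following sketch maps period-over-period movements to the patterns above; the signatures and example inputs are illustrative, not a standard formula.
```python
def diagnose(deploy_speed_delta: float, adoption_delta: float,
             incident_delta: float) -> str:
    """Map period-over-period KPI movements to the patterns above."""
    if deploy_speed_delta > 0 and adoption_delta > 0 and incident_delta < 0:
        return "Effective AI enablement"
    if deploy_speed_delta > 0 and incident_delta > 0:
        return "Security controls have gaps"
    if adoption_delta > 0 and deploy_speed_delta <= 0:
        return "Bottlenecks: approved demand outpaces delivery"
    if incident_delta <= 0 and adoption_delta <= 0:
        return "Blocking innovation: safe but stalled"
    return "Mixed signals: investigate team by team"

print(diagnose(deploy_speed_delta=0.3, adoption_delta=0.2, incident_delta=-0.1))
```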
Don’t wait for perfect measurement infrastructure. Start this month:
The organizations winning with AI aren’t the ones with the best models or the most data. They’re the ones where security and innovation teams have figured out how to move fast together.
These three KPIs are how you measure whether you’re winning.
Learn how CrowdStrike helps organizations build secure, scalable AI pathways with real-time protection and governance built in.


According to IDC, more than 28 million AI agents had been deployed as of the end of last year, and by 2029 more than 1 billion are expected to be running in production, executing 217 billion actions a day.
"Building an AI agent proof of concept (PoC) is easy," said Venkat Achanta, chief technology, data and analytics officer at TransUnion, a global credit reporting company with $4.6 billion in revenue. "But governing it, securing it and scaling it is a challenge of an entirely different order." The difficulty is even greater, he explained, in heavily regulated industries such as financial services and healthcare.
To address the problem, TransUnion spent the past three years building its agentic AI platform, OneTru. The goal was to create something as reliable and predictable as the old rule-based expert systems, yet as flexible as generative AI and as easy to use as a chatbot.
The key was combining the strengths of both approaches: traditional systems handle the core work where explainability and stability matter, while generative AI is applied in a limited way to the tasks it is especially suited for. Because the infrastructure to do this did not exist on the market, TransUnion invested about $145 million to build its own.
It was a large bet on unproven technology, but it has already delivered roughly $200 million in cost savings. Beyond that, the company has used the platform to build customer-facing solutions.
Most notably, in March of this year TransUnion unveiled its AI Analytics Orchestrator Agent, built on the OneTru platform on top of Google's Gemini models. The agent is being used internally to improve analytics efficiency, and it also lets customers perform advanced data analysis without data scientists.
"Many customers use TransUnion data but don't use our other solutions or platforms," Achanta said. "The orchestrator agent has the potential to increase the value customers get from the data and to open new revenue streams."
More agents are in development. "What determines an agent's performance is the orchestration, governance and security layers," Achanta said. "Simply building an agent takes only days; the foundation and controls that let you operate it reliably are the real competitive edge." He added: "Agents on our platform are designed to use all of those guardrails and that foundation, and that is what gives us our power."
The core strategy for keeping AI agents under control is to separate the work into layers and assign each layer to a different system. Each system operates under a set of constraints, which limits the blast radius of any individual agent and creates checks and balances across the whole system. Higher-risk work is assigned to pre-generative-AI technology to lower the risk.
At TransUnion, core decision-making is handled by an upgraded expert system. It operates under clearly defined, auditable rules and is predictable, cost-effective and low-latency. When a new situation arises, an LLM analyzes it, another agent converts the analysis into a new rule, and a human reviews the result before it is added to the expert system. Other agents handle roles such as understanding the semantic layer and interacting with humans.
"We put humans in the loop for the neural reasoning layer, the LLM, and we automate the symbolic reasoning layer, which is based on logic and machine learning," Achanta explained.
When each agent operates under strict constraints, with only the limited data and role it needs for its specific task, the overall system becomes far more controllable and reliable.
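A minimal sketch of that layered pattern, not TransUnion's actual implementation, might look like this in Python: a deterministic rule engine decides the cases it knows, and anything novel is drafted by a stubbed LLM helper and queued for human review before any rule changes.
```python
from typing import Callable, Optional

Rule = Callable[[dict], Optional[str]]   # a rule returns a decision or None

def llm_propose_rule(case: dict) -> str:
    """Placeholder for an LLM call that drafts a candidate rule."""
    return f"draft rule covering case fields: {sorted(case)}"

ACTIVE_RULES: list[Rule] = [
    lambda case: "decline" if case.get("fraud_score", 0) > 0.9 else None,
    lambda case: "approve" if case.get("credit_score", 0) > 750 else None,
]
PENDING_REVIEW: list[str] = []           # human-in-the-loop queue

def decide(case: dict) -> str:
    # Symbolic layer: auditable, predictable, low-latency.
    for rule in ACTIVE_RULES:
        decision = rule(case)
        if decision is not None:
            return decision
    # Neural layer: a novel situation. The LLM drafts a candidate rule,
    # but nothing enters ACTIVE_RULES until a human approves it.
    PENDING_REVIEW.append(llm_propose_rule(case))
    return "escalate_to_human"

print(decide({"credit_score": 800}))     # -> approve (rule layer)
print(decide({"income": 50_000}))        # -> escalate_to_human (queued)
```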
Think of the difference between a production line, where multiple workers each perform a distinct task, and a workshop where a single artisan does everything. The production line is faster and more reliable, yet many companies today still run their AI agents like artisans. That approach can produce creative results, but it is not always the right choice in an enterprise setting.
Nicholas Mattei, a professor at Tulane University and chair of the ACM special interest group on AI, advises strengthening security at the junctions between agent systems.
"You have to secure every junction between systems," he said. "For example, if an agent sends requests to an email service, you need a validation step, a checkpoint, between the two." The boundaries where hard-to-trust agents meet traditional software, he stressed, are exactly where security controls should be concentrated.
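As an illustration of Mattei's point, a checkpoint between an agent and an email service could look like the sketch below. The domain allowlist, recipient cap and content tripwire are assumed policies, and send_email() is a stand-in backend, not a real API.
```python
ALLOWED_DOMAINS = {"example.com"}        # assumed internal allowlist
MAX_RECIPIENTS = 10                      # assumed blast-radius limit

class CheckpointError(Exception):
    """Raised when an agent's request fails validation at the junction."""

def send_email(recipients: list[str], body: str) -> None:
    print(f"sent to {len(recipients)} recipient(s)")   # stand-in backend

def checkpoint_send(agent_id: str, recipients: list[str], body: str) -> None:
    """Validate an agent's email request before it reaches the service."""
    if len(recipients) > MAX_RECIPIENTS:
        raise CheckpointError(f"{agent_id}: too many recipients")
    for addr in recipients:
        if addr.split("@")[-1] not in ALLOWED_DOMAINS:
            raise CheckpointError(f"{agent_id}: external recipient {addr}")
    if "confidential" in body.lower():   # crude data-loss tripwire
        raise CheckpointError(f"{agent_id}: sensitive-content flag")
    send_email(recipients, body)         # reached only if every check passes

checkpoint_send("inbox-agent", ["ops@example.com"], "Weekly status update")
```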
In a survey of 1,500 IT leaders published in March by automation vendor Jitterbit, respondents ranked "AI accountability" as the most important factor in final AI purchasing decisions. The concept covers security, auditability, traceability and guardrails, and it ranked above implementation speed, vendor reputation and even total cost of ownership (TCO). Security, governance and data privacy risks also outranked cost and integration challenges as the main factors blocking AI projects from moving into production. Those concerns are well founded.
Earlier this year, researchers at cybersecurity firm CodeWall managed to breach McKinsey's new AI platform, Lilli. Using their own AI tool, the researchers said they were able to access 47 million chat messages, 728,000 files, 384,000 AI assistants, 94,000 workspaces, 217,000 agent messages, nearly 4 million RAG document fragments, and 95 system prompts and AI model configurations.
"Decades of McKinsey's proprietary research, frameworks and methodologies were sitting in a database anyone could read," the researchers wrote, noting that the firm's crown-jewel knowledge assets were effectively unprotected.
The cause was simple: of more than 200 publicly exposed API endpoints, 22 required no authentication. It took the researchers just two hours to gain full read and write access to Lilli's entire production database. McKinsey responded immediately, closing the unauthenticated endpoints and putting additional security measures in place.
In an official statement, McKinsey said: "Our investigation, supported by a leading external forensics firm, identified no evidence that this researcher or any other unauthorized third party accessed client data or confidential information."
IDC called the incident an example of how devastating a breach of an AI system can be for a company.
"Most companies still think about AI risk in yesterday's terms: data leakage, bad outputs and damage to brand reputation," said Alessandro Perilli, research vice president for AI at IDC. "Those are serious problems, but the bigger risk lies in delegating decision-making authority to AI systems."
An attacker who gains access to an agentic AI platform can do more than view unauthorized information: they can covertly change how the company behaves. And securing enterprise-grade agentic AI systems like Lilli is only half the challenge. According to Gartner, 69% of organizations suspect employees are using prohibited AI tools, and as a result 40% of organizations are expected to suffer security or compliance incidents by 2030.
But today's detection tools are not fully capable of finding AI agents, Gartner notes.
"If I asked you how many agents are running in your company right now, where would you go to check?" asked Swaminathan Chandrasekaran, global head of AI and data labs at KPMG, which now has several thousand agents in production. "Have they all been onboarded and given identities? Have they gone through proper authentication, and who is in charge of them? That infrastructure doesn't exist yet."
He added: "The tools are just starting to emerge, or companies are building their own. That is what will give CIOs peace of mind."
There are already public examples of individual employees deploying powerful agentic AI with negative consequences. Summer Yue, alignment director at Meta, recently decided to use OpenClaw, an open-source agentic AI tool, to manage her email. After it worked on a test inbox, she put it to work for real.
"Nothing humbles you like telling your OpenClaw to confirm before acting and then watching it speed-delete your inbox," she wrote on X in February. "I couldn't stop it from my phone. I had to run to my Mac mini like I was defusing a bomb."
In the past, an employee might paste sensitive information into a chatbot or have it draft a report to copy and reuse. As chatbots evolve into fully agentic systems, agents can now do anything the user has privileges to do, including accessing corporate systems.
To manage this new class of security risk, companies must move beyond traditional role-based and identity-based controls to intent-based controls, argued Rakesh Malhotra, who leads digital and emerging technologies at EY.
"It is not enough to check whether an agent has permission to access a system and change data," he explained. "You have to be able to check why it is making that change."
"Today's observability systems don't capture an agent's intent," he noted. "Trust is grounded in intent, and right now there is no way to measure it."
He added: "If a person tried to refactor the entire codebase, they would have to explain why. You shouldn't do that kind of work without a clear reason. With people, we have ways to judge that; with agents, no such framework exists yet."
TransUnion's Achanta repeatedly emphasized the importance of the semantic foundation of the OneTru platform. A semantic foundation helps systems understand not just what the data is, but what it means and how it relates to other data. Gartner says building a semantic layer is now a must for any company deploying AI.
"A semantic layer is the only way to improve accuracy, manage costs, substantially reduce AI debt, align multi-agent systems and stop costly inconsistencies before they spread," Gartner said.
Gartner also predicts that by 2030, universal semantic layers will be treated as critical infrastructure alongside data platforms and cybersecurity. "Agents need context to do anything meaningful with data," said KPMG's Chandrasekaran. "That is where a company's knowledge lives."
"That is the company's new intellectual property," he added. "Context is the new moat."
John Arsneault, CIO of US law firm Goulston & Storrs, said building a solid data foundation is also a way to avoid vendor lock-in.
"If you move your data into a particular solution for workflow automation or agentic work support, it becomes very hard to get back out," he said. "But if you take a data-centric approach, you can move flexibly to other solutions as the market changes."
The firm has migrated its client-facing work product to NetDocuments, a legal-focused document management system, and stores its other data in Entegrata's legal data lakehouse.
"Ultimately the goal is for every application to connect around this data lake," Arsneault explained. "Then all of the firm's data is consolidated in two environments, and we can apply any AI tool we want on top."
"It also makes data flows much easier to manage and lets us respond quickly to whatever AI technology comes next," he added. "Whether it's generative AI, agentic AI or Anthropic-based technology, the pace of change is too fast to keep up with. The landscape really does shift every six months."
After building security guardrails and a usable data layer, the last piece of the agent infrastructure puzzle is orchestration. Agentic AI systems need agents to interact with each other, collaborate with human users, and connect to a variety of data sources and tools. It is a hard problem, and while the technology is advancing quickly, it is still early. The Model Context Protocol (MCP) is seen as a key piece of the orchestration puzzle, and AI vendors have been notably cooperative in this area.
"In the early days of social networks, when Facebook and Twitter were debating interaction standards, no company wanted to adopt a competitor's protocol," said Agustín Huerta, senior vice president of digital innovation and VP of technology at digital transformation firm Globant. "Now everyone is maturing MCP as the standard."
But agent integration is far from solved. In a Docker survey of more than 800 IT decision-makers and developers, the operational complexity of orchestrating multiple components emerged as the biggest challenge in building agents.
Specifically, 37% of respondents said orchestration frameworks are still too fragile or immature for production, and 30% cited a lack of testing and visibility in complex orchestrations.
And although 85% of teams are aware of MCP, security, configuration and manageability issues still block production deployment. Beyond that, companies face no shortage of other integration challenges.
"One unsolved problem is a dashboard that can control all these agents and show exactly what is happening with each of them," Huerta said. "There are tools to monitor OpenAI-based agents and tools to manage Salesforce-based agents, but no solution consolidates the telemetry for control, audit and logging into a single central dashboard."
"If you run agents on a single platform or are early in adoption, it isn't a big problem yet, but as the agent network grows, these limits really start to show," he said. Globant itself is building its own integrated dashboard for agentic AI.
Meanwhile, Brownstein Hyatt Farber Schreck, a roughly 700-person law firm with clients across the United States, is applying AI in several areas, including a proposal-generation system.
"It used to take days to review a client RFP, analyze handwritten notes or meeting transcripts, and pull together the relevant materials," said CIO Andrew Johnson. "Now we can feed all of that information into the system, extract the key criteria, and produce a high-quality first draft in minutes."
Multiple agents collaborate in the process: one extracts success criteria and staffing requirements, another analyzes precedents and lessons learned, and others handle pricing and brand standards. "Each agent operates independently, but orchestration is essential so that each one's output feeds the next step," Johnson explained. For now the firm relies on a RAG-based architecture, since most of its legacy systems do not yet have an MCP layer.
Different AI models are also used for different tasks, which adds yet another orchestration layer to manage.
Cost control matters too: if an AI agent falls into an infinite feedback loop, inference costs can spike rapidly.
"We're aware of the possibility. We haven't seen it happen yet, but we've built monitoring so that if we cross a threshold, we react immediately," Johnson said.
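A minimal version of that kind of spend guard, with an assumed per-agent budget and a stubbed halt action rather than any specific firm's implementation, might look like this:
```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 3600
BUDGET_PER_WINDOW_USD = 25.0             # assumed per-agent hourly ceiling

_spend: dict[str, deque] = defaultdict(deque)   # agent -> (timestamp, cost)

def record_call(agent_id: str, cost_usd: float) -> bool:
    """Record one inference call; return False if the agent should halt."""
    now = time.time()
    window = _spend[agent_id]
    window.append((now, cost_usd))
    while window and window[0][0] < now - WINDOW_SECONDS:
        window.popleft()                 # drop entries outside the window
    total = sum(cost for _, cost in window)
    if total > BUDGET_PER_WINDOW_USD:
        print(f"HALT {agent_id}: ${total:.2f} spent in the last hour")
        return False                     # caller stops the agent's loop
    return True
```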
Despite all these coping strategies, everything around AI is changing faster than anything enterprises have seen before.
"I've been in technology for 25 years and I've never seen anything like this," said EY's Malhotra. "The fastest-growing companies in history have all emerged in the last three to four years, and the pace of adoption is unprecedented. Technologies that were central just nine or ten months ago have already been left behind."


I have learned to treat small language models (SLMs) as less of a model category and more of a portfolio strategy. They are the pragmatic answer to a question leaders end up asking sooner or later: How do we scale GenAI across real workflows without turning inference cost, latency, data ownership and boundaries into a systemic risk?
The short answer is that SLMs make GenAI operational while frontier LLMs keep it capable; running both responsibly in the enterprise requires a deliberate multi-model strategy.
When I say SLM, I am usually referring to two different things. They are related, and mixing them up leads to bad architecture decisions.
Model size is the mechanical part: Parameter count, memory footprint, compute requirements. It surfaces in questions like whether you can run inference on a single GPU, how unit cost changes as concurrency grows and whether latency holds as context grows. Size determines what is feasible to deploy and what it will cost to operate over time.
Operational intent is the part I care most about in an enterprise setting. I treat a model as a workflow component under tight constraints: Cost/transaction, latency, data boundaries and residency. This is also why agentic systems often benefit from SLMs. Many agent subtasks in production are repetitive and scoped, which makes it sensible to prefer specialist models for most calls and reserve frontier LLMs for the hard exceptions. A clear articulation of this viewpoint is in “Small language models are the future of agentic AI”.
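To make that routing pattern concrete, here is a minimal sketch under stated assumptions: the task taxonomy, the confidence threshold and both model calls are illustrative stand-ins, not a production design.
```python
SLM_TASKS = {"classify", "extract", "summarize_template", "route"}

def call_slm(task_type: str, payload: str) -> tuple[str, float]:
    """Stand-in for a self-hosted specialist-model endpoint."""
    return f"[slm:{task_type}] {payload[:40]}", 0.9

def call_frontier_llm(task_type: str, payload: str) -> str:
    """Stand-in for a metered frontier-model API call."""
    return f"[frontier:{task_type}] {payload[:40]}"

def run_task(task_type: str, payload: str) -> str:
    # Default path: scoped, repetitive subtasks go to the specialist model.
    if task_type in SLM_TASKS:
        answer, confidence = call_slm(task_type, payload)
        if confidence >= 0.8:            # assumed escalation threshold
            return answer
    # Hard exceptions: unknown task types or low-confidence answers.
    return call_frontier_llm(task_type, payload)

print(run_task("extract", "Invoice #4412, net 30, $12,400"))
```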
I see operational intent split across two deployment contexts.
In summary, size sets the ceiling; it determines what is feasible to deploy, what it costs to run at scale and where the model can run. Operational intent sets the standard; the right model may not be the most capable one, but the one that holds up under real workflow constraints, whether in business processes or on edge devices.
There isn’t one universal cutoff, but I use tiers to map infrastructure decisions.
Additionally, in an enterprise, I have seen two categories:
If you want an external, size-aware benchmark view for open models, the Hugging Face open LLM leaderboard is a useful reference point.
For workflows requiring open-ended research, deep multi-step reasoning or broad judgment, I would not recommend an SLM. This is where frontier LLMs still earn their keep.
I do recommend SLMs when:
If any of the above are unclear, the problem is workflow design and not model selection.
In practice the right frame is not which model is smarter, but which produces the best outcome per unit of cost and risk.
| Dimension | SLM | LLM |
| --- | --- | --- |
| Cost per case | Lowest; enables broad rollout | Highest; must be rationed |
| Latency | Usually better; easier to hit p95/p99 targets | Often slower, especially at long context |
| Data boundary | Easier to keep private via self-hosting or by minimizing data sent externally | Higher governance overhead if the model is external |
| Best at | Routing, extraction, templated summaries, RAG retrieval answers | Ambiguous reasoning, synthesis, nuanced drafting |
| Failure surface | Contained; schemas, validators and escalations limit blast radius | Needs guardrails; errors in complex reasoning are harder to catch |
| Architectural pattern | Default engine with escalation routing built in | Escalation tier reserved for exceptions |
The remaining question is whether a general SLM is sufficient or whether the domain is specific enough that its generality becomes a liability. This is where domain-specific small language models (DSLMs) appear, and where the SLM strategy becomes a competitive advantage rather than a cost play.
I think of a DSLM as an SLM fine-tuned on the language, labels and edge cases of a specific workflow. The goal is stable, structured output, not broad generalization. The fine-tuning is supported by governance processes that treat model updates the way engineering teams treat software releases.
Some have equated this to a permanent, embedded RAG, but I avoid describing fine-tuning that way. Fine-tuning changes what the model intrinsically understands; retrieval-augmented generation (RAG) changes what the model can access at runtime. They solve different problems, and in mature systems they are complementary. I recommend using both: the DSLM as the inference engine, with RAG layered on for cases where the model needs current or use-case-specific information it has not been trained on.
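A minimal sketch of that pairing, with a toy keyword retriever standing in for a vector store and a stubbed model endpoint, might look like this:
```python
def retrieve(query: str, kb: dict[str, str], k: int = 2) -> str:
    """Toy keyword retriever standing in for a real vector store."""
    scored = sorted(
        kb.items(),
        key=lambda kv: -sum(w in kv[1].lower() for w in query.lower().split()),
    )
    return "\n".join(text for _, text in scored[:k])

def call_dslm(prompt: str) -> str:
    """Stand-in for the domain-tuned model endpoint."""
    return f"[dslm] {prompt[:60]}..."

def answer(query: str, knowledge_base: dict[str, str]) -> str:
    # RAG layer: fetch current or case-specific context at runtime.
    context = retrieve(query, knowledge_base)
    # DSLM layer: the fine-tuned model already knows the domain's language
    # and labels; retrieval supplies only what it was not trained on.
    return call_dslm(f"Context:\n{context}\n\nQuestion: {query}")

kb = {"policy-7": "Claims above $10,000 require senior adjuster sign-off."}
print(answer("Who signs off on a $12,000 claim?", kb))
```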
In my experience, DSLMs outperform general SLMs because domain tuning reduces brittleness on edge cases. They also outperform LLMs in high-volume, well-defined workflows, where cost and stability dominate; and in regulated environments, the data never needs to leave your infrastructure.
The tradeoff is discipline. A DSLM demands curated training data, evaluation sets tied to workflow outcomes, regression gates before any update ships, versioning and a tested rollback path. The same specificity that made it reliable inside a workflow makes it brittle outside it. Every time the underlying workflow changes, the model potentially needs retraining. Teams that skip the discipline end up with a model that drifts quietly and fails loudly.
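In practice, the regression gate can be as simple as the sketch below. The eval-set format, the callable-model interface and the zero-drop threshold are assumptions to illustrate the discipline, not a complete release pipeline.
```python
def accuracy(model, eval_set: list[dict]) -> float:
    correct = sum(model(case["input"]) == case["expected"] for case in eval_set)
    return correct / len(eval_set)

def regression_gate(candidate, baseline, eval_set: list[dict],
                    max_drop: float = 0.0) -> bool:
    """Ship the candidate only if it holds the line on the frozen eval set."""
    cand, base = accuracy(candidate, eval_set), accuracy(baseline, eval_set)
    print(f"baseline={base:.3f} candidate={cand:.3f}")
    return cand >= base - max_drop

# Models are plain callables here; a real gate would also pin the eval-set
# version and keep a tested rollback path to the baseline model.
frozen_eval = [{"input": "overdue invoice", "expected": "collections"}]
old = lambda text: "collections"
new = lambda text: "collections" if "invoice" in text else "triage"
if not regression_gate(new, old, frozen_eval):
    print("blocked: roll back to baseline")
```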
For governance, the NIST AI Risk Management Framework is a practical anchor because it is designed to be operationalized and adapted.
I recommend a four-stage maturity sequence where order matters more than pace:
The model landscape will keep shifting, context windows will grow, benchmarks will move and new tiers will appear between what we call small and frontier today. What will not change is the underlying question: How you run AI at scale, across real workflows, without turning cost, latency and data boundaries into systemic risks. Enterprises that answer that question well will not do it by chasing the most capable model. They will do it by building the operational discipline first and treating model selection as a downstream decision.
This article is published as part of the Foundry Expert Contributor Network.


Most companies have yet to harness the transformative power of AI, focusing instead on incremental productivity and efficiency improvements that do not lead to competitive advantage, according to a report from analyst firm Forrester. Internal productivity gains from AI remain marginal rather than substantial, because organizations have not figured out how to generate more meaningful benefits from the technology, Forrester says in its recent report, Accelerate Your AI Voyage.
The evidence: 43% of AI decision-makers surveyed by the firm measure AI-driven productivity improvements and 41% measure efficiency gains, but only 32% tie AI outcomes to profit or revenue.
"Saving 10,000 hours of employee time may look good on paper, but it won't cover the GPU bill, let alone drive reinvention," Forrester's analysts write in the report. "This incremental thinking is at the root of a fundamental disconnect from the promise of AI's transformative potential."
Only 5% to 15% of organizations currently have an effective AI strategy, and the share probably sits closer to the low end, estimates Brian Hopkins, vice president of emerging technologies at Forrester. By focusing on productivity or efficiency gains, most organizations miss AI's real power, he adds. "Efficiency is not strategy; it's project management. You're trying to improve your current processes incrementally," he says.
Giving employees a copilot to see what they do with it is not a winning approach, Hopkins adds: "This whole idea that we're going to invest incrementally in productivity and somehow capture the potential AI offers is a pipe dream."
Moreover, AI productivity gains often depend on post-deployment headcount cuts, he adds. "The problem with incremental productivity improvements is that to get the benefits your CFO demands, you have to deploy a solution, prove it works, and then lay people off," he says. "Do you think the people you're going to lay off are going to help you do that? They won't. It's messy, unpleasant work."
Forrester's data aligns with another recent survey from AI agent platform vendor Decidr, which found that 40% of US companies get most of their AI value from ChatGPT-style tools rather than custom AI agents or models.
Other IT leaders also see the problems highlighted in Forrester's research. Many companies are pursuing horse-and-buggy AI strategies in a world moving toward self-driving cars, says Christine Park, head of AI transformation at Branch, a mobile link-tracking platform provider.
"This is exactly what happens when the market moves faster than the operating model. Leaders are optimizing narrowly for efficiency within functions instead of rethinking how the work itself should fundamentally change," she says.
Productivity and efficiency improvements will not amount to meaningful change for most organizations, she adds. True AI transformation is not limited to enabling individual functions; it requires coordination across entire workflows. "AI for cost efficiency raises the floor, so what?" Park says. "If it's just an efficiency strategy, you're not going to get more than short-term gains. Cost-cutting versus efficiency means we can grow without proportional headcount growth, but it takes real transformation to raise the ceiling."
Smart organizations will instead focus on AI as an amplifier of both revenue and people's expertise, she adds. The nature of work is changing, unfolding in multidimensional workflows rather than step-by-step tasks, she says.
"AI is being treated as a feature when it should be treated as a transformation," she says. "That means taking a people-centric view, and leaders changing how we train people, define roles and measure success. AI is a human change, not just a new tool."
Organizations should pursue workflow transformation across the enterprise, adds Mike Flynn, technology sector consulting leader at professional services firm EY. Many organizations focus on task-level automation instead of redesigning workflows end to end, he says.
By focusing on task-level improvements, companies add AI tooling and compute costs without removing a significant amount of work from the system, leading to what Flynn calls "trapped work."
Organizations should take an AI-first approach to all their workflows and try to redesign processes to eliminate repetitive human work as much as possible, Flynn recommends, then add human intervention where it is needed. "If you think about applying AI to your business problem, as you keep adding AI the effort required keeps growing, compared with redesigning your processes so that AI is built into them," he adds.
Building a durable AI strategy goes beyond deploying a few AI tools for employees, Flynn says, adding that EY guides clients through an AI value plan that shows them the potential outcomes of various AI strategies. "Companies are realizing this is not as easy as simply enabling people and giving them tools they can do something with and bolt onto their current jobs," he adds. "To me, the important thing is to think about redesigning operating processes. It is a transformation of processes and people as much as of the AI itself."
Most organizations are not yet ready to take the next step, suggests Thomas Prommer, former president of design, IT and AI firm Huge. Substantial use cases, such as pricing reviews and supply chain decision-making, require model risk management practices and audit trails that most companies do not yet have, he says.
"Internal productivity is the only use case the organization can really test safely under current governance," Prommer notes. "They're using copilots because copilots don't need a model risk committee."
Moreover, the transition from incremental to substantial AI-driven gains requires someone or something to force the change, such as a CEO, an activist investor, or a competitive shock, he adds. CIOs can rarely drive the change alone, he says.
Still, some organizations have moved past productivity savings because they do not show up in the P&L, Prommer says. "If you save an engineer 90 minutes a day, that doesn't appear in the income statement; it shows up as 'we shipped 15% more features,'" he says. "Boards want a concrete line item. The companies that moved to substantial use cases did so because they had a single P&L owner willing to stake their numbers on it."
Forrester's Hopkins urges organizations to rethink their AI strategies and focus on substantial change despite the difficulties. If organizations aim high enough, they can use AI to enable full business transformation and find AI uses that drive competitive advantage, he says.
Forrester advises IT and business leaders to focus on four key areas:
If organizations take the right approach, they can deploy AI in ways that create real competitive advantage, Hopkins says. "Strategy is about applying massive force, based on a vision you have, that strengthens you and weakens the competition. You have a vision your competitors don't see, and you establish a capability your competitors can't replicate."


The most notable announcement at Google Cloud Next 2026, the annual conference Google held last week, wasn’t a new model or a TPU. Nor was yet another way to spread Gemini across the enterprise the real point.
Rather, it reads as an admission, and at the same time as something close to a warning.
It’s something we already knew, but as the saying goes, “to know and not act is not yet to know”: actually acting on it is another matter. We treat agents like digital employees busily getting work done, but they are also vulnerable software systems holding credentials, budgets, memory, and access to sensitive data. And they have a habit of failing in ways that are expensive and hard to trace.
That is the essential message of Google Cloud Next 2026. Many read it as Google moving to seize the agentic enterprise market, but the more interesting reading is that Google showed up to bring that market under control.
Google did, of course, push the “agentic cloud” hard; no event these days skips the theme. It announced the Gemini Enterprise agent platform, an eighth-generation TPU (Tensor Processing Unit), new Workspace Intelligence AI features, and a range of integrations meant to weave AI naturally through the enterprise. Viewed purely as a celebration of the agent era, the announcements were more than enough.
Strip away the staging, though, and a more important message emerges. For the past two years, enterprises have been enthralled by AI agents, and they have now reached the stage where those agents must be kept from damaging the company’s reputation, causing financial losses, or exposing sensitive information.
This is not a criticism of Google. Quite the opposite: these may be the most practically valuable announcements of the event.
The moment AI moves beyond talking and starts taking real actions, essential questions pour out in an enterprise setting: Who approved this? What data did it use? Which systems did it access? Why did it act that way? How much did it cost? And how can it be stopped if necessary?
A large part of Google’s announcements amounts to answers to those questions.
Look at what Google emphasized and this becomes clear. Knowledge Catalog is designed to supplement agents’ judgment with trusted business context drawn from across enterprise data. Gemini Enterprise gained capabilities to manage and monitor agents, including long-running ones.
Workspace added features to monitor, control, and audit agents’ data access, reducing the risks of prompt injection, oversharing, and data exfiltration. Google Cloud also enabled protection of agents across cloud and AI development environments through agent defense capabilities and Wiz-based security.
These are not tools you need when the system works perfectly. They were built for enterprises confronting the practical question: it worked in the demo, but can we trust it with real work?
Industry analysts are gradually converging on the term “agent control plane” to describe this new layer of enterprise AI. The phrase works because the concept is familiar: much as Kubernetes manages infrastructure in a unified way, it evokes a platform that centrally manages the behavior of AI agents. In other words, a unified system that can manage and observe many AI agents in one place and handle routing, security, and optimization.
But reality is still far from that stage.
Agents need a control plane not because they are already replacing employees, but because enterprises, wiring probabilistic systems into existing deterministic business processes, are realizing that someone has to manage the seam between the two. Autonomy looks clean in an agent demo; in real enterprise systems, things play out in far messier ways.
Customer data sits in one system and contract information in another; exception handling lives on in someone’s inbox; the policy document is often a PDF last updated in 2021. And the person who understood that workflow may have left the company during the pandemic.
Into this already complicated environment, agents are now being added.
This is why I sympathize with parts of Google’s control plane strategy while remaining wary of overly tidy vendor narratives. A unified agent platform, governance, monitoring, evaluation, observability, and simulation are all necessary. Gemini Enterprise in particular is meaningful as an attempt to centralize the messy operational pieces enterprises have been stitching together on their own.
But the control plane must not be mistaken for the actual work itself.
The data on agentic AI keeps repeating one message: expectations are running far ahead of operational maturity.
According to the 2026 State of Agent Orchestration and Automation report from workflow automation company Camunda, 71% of organizations said they use AI agents, but only 11% actually deployed them in production environments over the past year. And 73% admitted there is a gap between their agentic AI vision and reality.
Gartner offers a similar outlook, expecting more than 40% of agentic AI projects to be scrapped by the end of 2027, citing cost burdens, unclear business value, and inadequate risk management.
The point worth underlining is that this is not a model problem. It is closer to a classic enterprise software operations problem.
The same pattern appears in security and governance. According to a 2026 survey by Writer, a generative AI management platform, 67% of executives said they had experienced a data leak or security incident caused by unapproved AI tools.
It also found that 36% have no formal plan for overseeing AI agents, and 35% said they cannot immediately shut down an agent when something goes wrong.
Of the three figures, the last is the most worrying. These are software agents with access to corporate systems, customer data, and organizational credentials, yet more than a third of companies are not confident they could quickly stop one when a problem occurs.
And yet we are supposed not to worry?
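The missing capability is not exotic. In its simplest form, it is a shared revocation check consulted before every agent action. The sketch below is purely illustrative, with invented names and in-memory state standing in for what would need to be shared, durable infrastructure, but it is exactly the control that 35% of respondents say they lack:

```python
# Illustrative only: a central kill switch that every agent action must pass.
# A real deployment would persist the halted set in shared, durable storage.

class AgentKillSwitch:
    def __init__(self):
        self._halted = set()

    def halt(self, agent_id: str):
        """Operator action: revoke an agent's right to act, effective at once."""
        self._halted.add(agent_id)

    def checkpoint(self, agent_id: str):
        """Called by the agent runtime before every tool call or external action."""
        if agent_id in self._halted:
            raise RuntimeError(f"Agent {agent_id} has been halted by an operator")

switch = AgentKillSwitch()
switch.checkpoint("expense-agent-3")   # allowed: the agent may act
switch.halt("expense-agent-3")         # an operator pulls the plug
# switch.checkpoint("expense-agent-3") # would now raise immediately
```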
The hidden truth of the agentic enterprise is that the agent itself may be the least important element of the architecture. All the attention and hype goes to the agents, but the real core lies elsewhere: authentication and permission management, workflow boundaries, data quality, retrieval and memory, evaluation regimes, audit trails, cost controls, and deciding which system serves as the single source of truth when an agent gets confused.
The announcements at Google Cloud Next did not prove that the agentic enterprise has arrived. What they showed is that if the agentic enterprise does materialize, it will look a great deal like earlier enterprise software at its decisive moments: it converges on governance-centered structure rather than magical innovation.
That is progress, but hardly a flashy kind.
If you want to pick the winners in the agentic AI market, don’t look for the company with the smartest agents. Look for the ones with clear data contracts, rigorous evaluation regimes, a consistent authentication model, and minimal sprawl of unofficial “shadow AI.” The industry tends to avoid this story, because talking about autonomous digital workers is far more exciting than discussing data lineage or access controls.
But it is precisely in that boredom that enterprise software becomes real.
There is another reason to be cautious about declaring the agent era’s arrival: an agent’s usefulness ultimately depends on data it can safely understand and use. Google clearly recognizes this. Its “agent data cloud” concept, encompassing Knowledge Catalog and a cross-cloud lakehouse strategy, is an acknowledgment that agents need trusted business context.
Without that context, an agent is not an enterprise worker but a well-spoken tourist wandering through your systems.
In the end, the most encouraging announcements at this Google Cloud Next were not the technologies that make agents more autonomous, but the capabilities that make them easier to govern. Agentic AI carries enormous promise, but for that promise to become real, it must above all prove itself to be boringly reliable.


IDC estimates there were over 28 million AI agents deployed by the end of last year, and predicts there’ll be over 1 billion actively deployed by 2029, executing 217 billion actions per day.
It’s easy to build an AI agent POC, says Venkat Achanta, chief technology, data, and analytics officer at TransUnion, a global credit reporting company with $4.6 billion in revenues. But governing, securing, and scaling it are a whole other challenge, especially for companies in highly regulated industries such as financial services and healthcare.
To address the problem, TransUnion spent the last three years building its agentic AI platform, OneTru. The goal was to make something as reliable and deterministic as the old, scripted, expert-style systems but as flexible as gen AI, and as easy to interact with as a chatbot.
The trick, however, was to combine the best of both worlds by using old-school systems for core processes where explainability and reliability are key, and layering in gen AI functionality in limited ways for the tasks it was uniquely suited for. And since the infrastructure to do this wasn’t available, TransUnion built its own, allocating $145 million to the project.
That was a big investment in an unproven technology, but it’s already led to $200 million in cost savings. More than that, once the platform was built, TransUnion used it to build customer-facing solutions.
In March this year, for example, TransUnion released its AI Analytics Orchestrator Agent, built using the OneTru platform and powered by Google’s Gemini models. The agent is already being used by TransUnion internally to improve analytics, and can also be used by customers to run sophisticated data analysis without the need for data scientists.
Many clients use TransUnion’s data but don’t use other solutions and platforms, Achanta says. The new orchestrator agent has the potential to help customers get more value out of the data, and unlock new revenue streams for the company.
And more agents are in the works, Achanta says. The key to making them work is the orchestration, governance, and security layers. Just making an agent do something is very easy for anyone, he says, and can take just a few days. The company can also create agents quickly. “But I have the foundation and guardrails, and the agent sitting on my platform uses all of them,” he says. “That’s what gives us power.”
The secret to making AI agents behave is to separate the layers of the task and assign each layer to a different system, each one operating under a set of constraints. This approach limits the damage any particular agent can do, creates a system of checks and balances, and restricts the riskiest activities to a pre-gen AI technology.
For example, at TransUnion, the core decision-making is performed by an updated version of an expert system. It operates under a set of well-defined, auditable rules and works predictably, cost-effectively, and at low latency. When it encounters a situation it hasn’t seen before, an LLM is used to analyze the problem, a different agent might then turn it into a new rule, and then a human might be called in to review the results before the new rule is added to the expert system. There are different agents that understand the semantic layer, interact with humans, and perform other tasks.
“With the neural reasoning layer — the LLM — we put humans in the loop,” he says. “When it’s a symbolic reasoning layer, which is logic and machine-learning-driven, we let it be automated.”
When each agent operates within narrow constraints, sees only the limited data it needs for its one task, and is restricted in what it can do, the entire system becomes far more governable and reliable.
It’s like the difference between an assembly line, where multiple workers each perform a single, distinct task, and a workshop where a single artisan does everything. The assembly line works faster and more reliably; the artisan can produce creative, unique pieces, but that isn’t always what a company needs. Yet today, many enterprises deploy their AI agents as if they were craftsmen.
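A minimal sketch of this layered pattern follows. The names are hypothetical, not TransUnion’s actual OneTru interfaces: deterministic rules decide known cases automatically, novel cases escalate to an LLM, and any rule the LLM proposes waits in a review queue until a human promotes it into the symbolic layer.

```python
# A minimal sketch of the layered pattern described above (hypothetical names,
# not TransUnion's actual OneTru APIs). The symbolic layer runs automated under
# well-defined rules; novel cases escalate to an LLM, and any rule the LLM
# proposes waits for human approval before it joins the rule base.

class LayeredDecisionSystem:
    def __init__(self):
        self.rules = {}      # symbolic layer: deterministic, auditable, automated
        self.pending = []    # LLM-proposed rules awaiting human review

    def decide(self, case_type, case):
        if case_type in self.rules:
            return self.rules[case_type](case)        # known case: rule engine decides
        proposal = self._analyze_with_llm(case_type, case)
        self.pending.append(proposal)                 # human-in-the-loop checkpoint
        return {"status": "escalated", "proposal": proposal}

    def _analyze_with_llm(self, case_type, case):
        # Stand-in for a real model call that drafts a candidate rule.
        return {"case_type": case_type, "suggested_action": "manual_review"}

    def approve(self, proposal, rule_fn):
        # Only a human reviewer promotes a proposal into the symbolic layer.
        self.pending.remove(proposal)
        self.rules[proposal["case_type"]] = rule_fn

system = LayeredDecisionSystem()
system.rules["address_update"] = lambda case: {"status": "applied"}
print(system.decide("address_update", {"id": 1}))    # symbolic layer handles it
print(system.decide("novel_dispute", {"id": 2}))     # escalates to LLM plus review
```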
Nicholas Mattei, chair of the ACM special interest group on AI and professor at Tulane University, suggests that companies focus on building in extra security at points where different parts of the agentic system connect.
“Make sure you have security at the seams,” he says. For example, if an agent sends requests to an email service, set up a checkpoint between the two. “Around the gaps between the unreliable agents and where the traditional software lives, that’s where you want to focus your security processes,” he says.
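One way to read Mattei’s advice in code is a checkpoint object that sits between an untrusted agent and a traditional service. The sketch below is an assumption-laden illustration, with a hypothetical email sender, an allow-list, and a rate limit, rather than any specific product’s API:

```python
# A sketch of "security at the seams": a checkpoint between an untrusted agent
# and a traditional service (here, a stand-in email sender), enforcing an
# allow-list and a rate limit before any request passes through.

import time

class SeamCheckpoint:
    def __init__(self, send_fn, allowed_domains, max_per_minute=10):
        self.send_fn = send_fn
        self.allowed_domains = set(allowed_domains)
        self.max_per_minute = max_per_minute
        self.sent_timestamps = []

    def send(self, agent_id, to_addr, subject, body):
        domain = to_addr.rsplit("@", 1)[-1]
        if domain not in self.allowed_domains:
            raise PermissionError(f"{agent_id}: recipient domain {domain!r} not allowed")
        now = time.time()
        self.sent_timestamps = [t for t in self.sent_timestamps if now - t < 60]
        if len(self.sent_timestamps) >= self.max_per_minute:
            raise RuntimeError(f"{agent_id}: rate limit hit; possible runaway agent")
        self.sent_timestamps.append(now)
        return self.send_fn(to_addr, subject, body)   # only now reach the real service

checkpoint = SeamCheckpoint(send_fn=lambda *a: "sent", allowed_domains={"example.com"})
checkpoint.send("inbox-agent-1", "ops@example.com", "Status", "All clear")  # passes
```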
In a Jitterbit survey of 1,500 IT leaders released in March, AI accountability — security, auditability, traceability, and guardrails — is the biggest factor when it comes to the final AI purchase decision, ahead of speed of implementation, vendor reputation, and even TCO. Security, governance, and data privacy risks were also top issues preventing AI initiatives from moving to production, ahead of costs and integration challenges. And they’re right to be worried.
Earlier this year, researchers at cybersecurity firm CodeWall were able to breach McKinsey’s new AI platform, Lilli. Using an AI tool of their own, the researchers said they could access 47 million chat messages, 728,000 files, 384,000 AI assistants, 94,000 workspaces, 217,000 agent messages, nearly 4 million RAG document chunks, and 95 system prompts and AI model configurations.
“This is decades of proprietary McKinsey research, frameworks, and methodologies — the firm’s intellectual crown jewels sitting in a database anyone could read,” the researchers wrote.
The reason? Out of over 200 publicly exposed API endpoints, 22 required no authentication. It took just two hours for the researchers to get full read and write access to Lilli’s entire production database. McKinsey responded quickly to the alert, patched the unauthenticated endpoints, and took other security measures.
“Our investigation, supported by a leading third-party forensics firm, identified no evidence that client data or client confidential information were accessed by this researcher or any other unauthorized third party,” the firm said in a statement.
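The underlying lesson generalizes: endpoints should be authenticated by default, so that none can be missed. Below is a minimal sketch using FastAPI; the token check is a hypothetical stand-in for real JWT or OIDC verification, and nothing here describes how Lilli itself was built:

```python
# A sketch of "authenticated by default": instead of opting each endpoint into
# auth (and missing some, as in the incident above), a global dependency makes
# every route require a token. The token check is a placeholder.

from fastapi import Depends, FastAPI, Header, HTTPException

async def require_token(authorization: str | None = Header(default=None)):
    # Replace with real verification (JWT/OIDC); here we only check presence.
    if not authorization or not authorization.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="Authentication required")

# Every route on this app inherits the check; no endpoint can ship unauthenticated.
app = FastAPI(dependencies=[Depends(require_token)])

@app.get("/documents")
async def list_documents():
    return {"documents": []}  # reached only after require_token passes
```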
IDC says the incident underscores just how dangerous the breach of an AI system can be to an enterprise.
“Most companies are still thinking about AI risk in yesterday’s terms: data leakage, bad outputs, and brand reputation damage,” says Alessandro Perilli, IDC’s VP for AI research. “Those are serious issues, but the bigger risk becomes delegating authority to AI systems.”
By getting access to an agentic AI platform, an attacker can not only see things they’re not supposed to, but also covertly change how the company acts. And securing enterprise-scale agentic AI systems like Lilli is only half the challenge. According to Gartner, 69% of organizations suspect employees use prohibited AI tools, and 40% will experience security or compliance incidents by 2030 as a result.
But available discovery tools aren’t fully ready to find AI agents, Gartner says.
“If I asked you how many agents run in your enterprise right now, where are you going to go look it up?” asks Swaminathan Chandrasekaran, global head of AI and data labs at KPMG, which now has several thousand AI agents in production. “Have they all been onboarded and have identities? Have they gone through a proper authentication process and who’s in charge of them? That piece of infrastructure doesn’t exist.”
Tools are just starting to emerge, however, or companies are creating DIY solutions, he says. “That’s what’s going to give CIOs peace of mind,” he says.
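As a sketch of what such a DIY inventory might look like (illustrative names, not KPMG’s or any vendor’s actual tooling), agents could be onboarded with an identity and an accountable owner before they are allowed to authenticate at all:

```python
# A sketch of the missing inventory described above: agents are onboarded with
# an identity and an accountable human owner before they may run, so "how many
# agents do we have?" has a queryable answer. Illustrative only.

from dataclasses import dataclass

@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str
    owner: str            # the human accountable for this agent
    scopes: tuple         # systems the agent is allowed to touch

class AgentRegistry:
    def __init__(self):
        self._agents = {}

    def onboard(self, identity: AgentIdentity):
        self._agents[identity.agent_id] = identity

    def authenticate(self, agent_id: str) -> AgentIdentity:
        if agent_id not in self._agents:
            raise PermissionError(f"Unregistered agent: {agent_id}")
        return self._agents[agent_id]

    def inventory(self):
        return list(self._agents.values())   # the lookup that mostly doesn't exist today

registry = AgentRegistry()
registry.onboard(AgentIdentity("rfp-drafter-1", owner="a.johnson", scopes=("docs",)))
print(len(registry.inventory()))  # 1
```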
We’re already seeing public examples of individual employees deploying powerful agentic AI with negative consequences. Summer Yue, Meta’s alignment director, recently decided to use OpenClaw, a viral open-source agentic AI tool, to help handle her inbox. After it worked in a test inbox, she deployed it for real.
“Nothing humbles you like telling your OpenClaw to confirm before acting and watching it speedrun deleting your inbox,” she wrote on X. “I couldn’t stop it from my phone. I had to run to my Mac mini like I was defusing a bomb.”
In the past, an employee might upload sensitive information to a chatbot or ask it to write a report that they’d then copy and paste, and pass off as their own. As these chatbots evolve into full-on agentic systems, the agents now have the ability to do anything a user has privileges to do, including accessing corporate systems.
To manage this new security risk, companies will need to move past role- and identity-based controls to intent-based ones, says Rakesh Malhotra, principal in digital and emerging technologies at EY.
It’s not enough to ask whether an agent has permission to access a system to make a change to a record, he says. Companies have to be able to ask why the agent is changing it, and that’s a big challenge right now.
“The observability stacks don’t capture the intent of why the agent did something,” he says. “And that’s really important to understand. Trust is based on intent, and there’s no way for any of these systems to capture intent.”
If a human employee tried to refactor the entire code base, they’d be asked to provide a good reason for doing so. “And if you’re refactoring without any specific reason, maybe you shouldn’t do it,” Malhotra says. “With people, there are ways for this to be adjudicated. I don’t know how to do this with agents.”
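What intent capture could look like in practice is an open question, but one hedged sketch is to require every privileged action to carry a declared reason, checked against the agent’s assigned task and retained in the audit trail. The schema and policy below are illustrative assumptions, not an existing observability product:

```python
# A sketch of intent-based control along the lines described above: every
# privileged action an agent requests must carry a declared intent, which is
# checked against the task it was assigned and kept in the audit trail.

from dataclasses import dataclass
import time

@dataclass
class ActionRequest:
    agent_id: str
    action: str          # e.g. "update_record"
    target: str          # e.g. "crm/accounts/42"
    intent: str          # the agent's stated reason for acting

AUDIT_LOG = []

def authorize(request: ActionRequest, assigned_task: str) -> bool:
    # Permission alone isn't enough: the declared intent must reference the task.
    approved = bool(request.intent) and assigned_task.lower() in request.intent.lower()
    AUDIT_LOG.append({"ts": time.time(), "request": request, "approved": approved})
    return approved

# An undeclared or unrelated intent is denied even if the agent has access:
req = ActionRequest("billing-agent-7", "update_record", "crm/accounts/42",
                    intent="correct address as part of invoice reconciliation")
print(authorize(req, assigned_task="invoice reconciliation"))  # True
```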
TransUnion’s Achanta repeatedly mentioned the semantic foundation of the company’s OneTru platform. Such a foundation helps systems grasp not just what the data is, but what it means and how it relates to other data. Gartner says developing a semantic layer is now a must-do for companies deploying AI.
“It’s the only way to improve accuracy, manage costs, substantially cut AI debt, align multi-agent systems, and stop costly inconsistencies before they spread,” the firm says.
By 2030, universal semantic layers will be treated as critical infrastructure, alongside data platforms and cybersecurity, Gartner predicts. And agents need context to be able to do anything meaningful with data, says KPMG’s Chandrasekaran. That’s where a company’s knowledge is contained.
“That’s your new IP for the enterprise,” he says. “Context is the new moat.”
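A semantic layer can be pictured as a catalog that tells agents not just what a field is called but what it means, where it comes from, and how it relates to other data. The entries below are invented examples rather than any vendor’s catalog format:

```python
# A minimal sketch of a semantic layer: a mapping that gives agents business
# meaning and relationships, not just field names. Entries are invented.

SEMANTIC_LAYER = {
    "cust_ltv": {
        "meaning": "Predicted lifetime value of a customer, in USD",
        "source": "warehouse.analytics.customer_value",
        "relates_to": ["cust_id", "churn_risk"],
        "sensitivity": "internal",
    },
    "churn_risk": {
        "meaning": "Probability (0-1) the customer cancels within 90 days",
        "source": "warehouse.analytics.churn_scores",
        "relates_to": ["cust_id"],
        "sensitivity": "internal",
    },
}

def describe_for_agent(field: str) -> str:
    """Render business context an agent can include in its prompt."""
    entry = SEMANTIC_LAYER[field]
    return (f"{field}: {entry['meaning']} (source: {entry['source']}; "
            f"related: {', '.join(entry['relates_to'])})")

print(describe_for_agent("churn_risk"))
```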
For John Arsneault, CIO at Goulston & Storrs, creating a solid data foundation is also a way to avoid vendor lock-in.
“If you’re buying things and moving your data into them to create workflow automation or agentic work assistants, you’ll have a hard time getting out of it,” he says. “But if you take a data-centric approach, you can at least move from one to the other if there’s a shift in the marketplace.”
The law firm has migrated its client-oriented work products into NetDocuments, a document management system specifically focused on the legal industry. And for the rest of the data the company collects, it goes into Entegrata’s legal data lakehouse.
“Our goal is to have all our other applications eventually point at that data lake,” he says. “Then we’ll have these two environments where all the firm’s data exists, which will allow us to put any AI tool we use on top.”
It’ll also make the data flows easier to manage, he adds, and will enable the firm to adapt quickly to whatever AI technology comes next. “Whether gen AI, agentic, or Anthropic stuff, with the Cowork legal plugin, it’s very difficult to keep up with,” he says. “And it changes every six months.”
The last part of the agentic infrastructure puzzle, after getting security guardrails in place and creating a usable data layer, is orchestration. Agentic AI systems require agents to talk to each other and to human users, and to interact with data sources and tools. It’s a complicated challenge, and the technology is still very much in its infancy, though moving quickly. MCP (Model Context Protocol) is one such example, and a key piece of solving the orchestration puzzle. AI vendors have been remarkably willing to cooperate here.
“When social networks were born, and Facebook and Twitter were discussing a standard protocol for interacting, nobody wanted to adopt their competitors’ protocol,” says Agustin Huerta, SVP of digital innovation and VP of technology at Globant, a digital transformation company. “Now everyone is going through MCP and maturing it as a standard protocol.”
But that’s not to say agentic integration has been solved. According to a Docker survey of more than 800 IT decision makers and developers, the operational complexity of orchestrating multiple components is the biggest challenge when it comes to building agents.
In particular, 37% of respondents say orchestration frameworks are too brittle or immature for production use, and 30% report testing and visibility gaps in complex orchestrations.
In addition, while 85% of teams are familiar with MCP, most say there are significant security, configuration, and manageability issues that prevent deployment in production. And there are other integration issues enterprises have to deal with.
“One problem yet to be solved is how to get a proper dashboard to control all these agents, to know exactly what’s going on with each of them,” says Huerta. “One dashboard will let you monitor agents built with OpenAI, and one is for agents that live on Salesforce, but none can expose telemetry in a central dashboard for control, auditing, and logging.”
For companies just starting to deploy agents, or who are sticking to a single platform, this isn’t yet an issue, he adds, but as they leverage a larger network of agents, they’ll start to experience the challenges. Globant itself is building its own internal dashboard for agentic AI, for instance.
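Underneath the dashboard Huerta describes would have to sit something like a normalized event schema, with a small adapter per platform mapping its payload into one shape for central control, auditing, and logging. The field names below are assumptions for illustration:

```python
# A sketch of the layer a cross-platform agent dashboard would need: events
# from different platforms normalized into one schema for central auditing.

from dataclasses import dataclass, asdict
import json, time

@dataclass
class AgentEvent:
    platform: str      # "openai", "salesforce", "in_house", ...
    agent_id: str
    event_type: str    # "tool_call", "error", "handoff", ...
    detail: dict
    ts: float

class CentralTelemetry:
    def __init__(self):
        self.events = []

    def ingest(self, platform: str, raw: dict):
        # Each platform gets a small adapter mapping its payload to AgentEvent.
        self.events.append(AgentEvent(platform=platform,
                                      agent_id=raw.get("agent", "unknown"),
                                      event_type=raw.get("type", "unknown"),
                                      detail=raw, ts=time.time()))

    def dump(self):
        return json.dumps([asdict(e) for e in self.events], indent=2)

hub = CentralTelemetry()
hub.ingest("openai", {"agent": "support-bot", "type": "tool_call"})
hub.ingest("salesforce", {"agent": "lead-router", "type": "handoff"})
print(hub.dump())
```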
And at Brownstein Hyatt Farber Schreck, a 50-year-old law firm with about 700 employees and clients around the US, there are several areas where AI is being deployed, including a proposal generator system.
Normally, it can take several people days to review a client’s request for proposal, go through hand-written notes or meeting transcripts, and pull together other relevant materials, says Andrew Johnson, the firm’s CIO.
“We can feed all that information into a computer and extract key criteria to produce a quality first draft in minutes,” he says.
Multiple agents are required for different parts of the process: one to extract success criteria or staffing requirements, one to look for precedents and lessons learned, and others for pricing and brand standards. “Each of those agents is autonomous and needs to be orchestrated so the outputs of each are fed into the next step,” Johnson says. For the most part, that means a RAG system, since most of the legacy platforms the firm uses have yet to incorporate an MCP layer.
Depending on the task, individual agents may be powered by different models, which is another layer of orchestration that needs to be managed.
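In sketch form, that orchestration can be as simple as an ordered list of steps, each bound to an agent and a model, with each step’s output merged into the context the next step reads. The agent logic below is stubbed and all names are illustrative, not Brownstein Hyatt Farber Schreck’s actual system:

```python
# A sketch of sequential multi-agent orchestration: each step feeds the next,
# and each agent can be bound to a different model. Logic is stubbed.

def extract_criteria(rfp_text, context):
    return {"criteria": ["staffing", "timeline"]}          # stub for a model call

def find_precedents(rfp_text, context):
    return {"precedents": ["matter-2023-114"]}             # stub for a RAG lookup

def draft_proposal(rfp_text, context):
    return {"draft": f"Proposal addressing {context['criteria']}"}

PIPELINE = [
    ("criteria-agent", "model-a", extract_criteria),
    ("precedent-agent", "model-b", find_precedents),
    ("drafting-agent", "model-c", draft_proposal),
]

def run_pipeline(rfp_text):
    context = {}
    for name, model, step in PIPELINE:
        # In a real system, `model` would select which LLM backs this agent.
        context.update(step(rfp_text, context))
    return context

print(run_pipeline("Client RFP: litigation support...")["draft"])
```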
Then there’s cost monitoring. If an AI agent or group of agents gets into an infinite feedback loop, the inference costs can quickly rise.
“We’re aware of the concern, though we have yet to see it manifest,” says Johnson. “So we have monitoring in place. If we exceed thresholds, we react to it.”
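A hedged sketch of that kind of threshold monitoring: a budget guard wraps every model call, accumulates estimated spend, and trips before a feedback loop burns through real money. Prices and limits below are invented for illustration:

```python
# Illustrative cost guard: estimated spend accumulates per charge, and
# exceeding the threshold halts the agent before a loop runs away.

class InferenceBudget:
    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, agent_id: str, tokens: int, usd_per_1k_tokens: float = 0.01):
        self.spent_usd += tokens / 1000 * usd_per_1k_tokens
        if self.spent_usd > self.limit_usd:
            # React to the threshold: here we halt; a real system might page someone.
            raise RuntimeError(
                f"Budget exceeded (${self.spent_usd:.2f}), halting {agent_id}; "
                "possible feedback loop")

budget = InferenceBudget(limit_usd=5.00)
for _ in range(3):
    budget.charge("pricing-agent", tokens=40_000)   # ~$0.40 each; still fine
# A runaway loop of further calls would eventually raise and stop the agent.
```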
Regardless of strategies or measures to absorb setbacks, everything having to do with AI is changing faster than anything else companies have seen.
“I’ve been in technology for 25 years and I’ve never seen anything like this,” says EY’s Malhotra. “The fastest growing companies in the history of companies have all been created in the last three to four years. The growth in adoption is just unprecedented. And I talk to clients all the time implementing technologies that were highly relevant nine or 10 months ago, and everyone’s moved on.”
