Why a modern data foundation takes more than a new platform

May 7, 2026, 09:00

Too many data modernization efforts begin with the platform. The conversation turns to replacing the underlying data environment, moving reporting workloads to the cloud or retiring legacy tooling. Those decisions matter, but in my experience, they are rarely what makes the work hard.

What makes the work hard is everything that has built up around the platform over time.

I have seen this most often in organizations that inherited legacy architecture through acquisition, accumulated technical debt through years of deferred investment or saw reporting logic and master data evolve without enough enterprise discipline. On the surface, the environment may still appear functional. Dashboards are still refreshing. Reports still go out. Teams still find ways to get numbers. But once the business begins to scale, the weaknesses become much harder to hide.

The warning signs usually appear before the platform itself becomes the problem. Different teams start using different numbers for the same KPI, critical reporting logic begins to live outside core systems and analysts spend more time reconciling data than interpreting it. New business units take longer to onboard, reporting changes become harder than they should be and, before long, the issue is no longer just the data platform. It becomes a broader problem of trust, scalability and control.

This is why scoping modernization too narrowly fails: replacing the platform is only one part of the challenge. The real work is untangling years of logic, definitions and integration patterns that were never designed to scale together.

The platform is only one layer of the problem

One of the clearest lessons I have learned is that legacy data environments rarely fail in an isolated way. They fail by becoming harder to trust and harder to change.

In many environments, the data platform is carrying far more than data. It is carrying years of workarounds for things that source systems were never able to handle cleanly. Reporting logic ends up split across ETL jobs, SQL transformations, scripts, spreadsheets and side databases. Some of it was built quickly to solve immediate business needs. Some of it was necessary at the time. But over time, those decisions create duplicated logic, hidden dependencies and handoffs that become harder to govern every time the business changes.

The issue is not only technical debt in the traditional sense. It is also reporting debt, where inconsistent definitions and duplicated logic across reports make data harder to trust and maintain. KPI definitions evolve differently across functions. Business logic gets embedded in too many places. Teams build local workarounds to compensate for mismatched source data. The business keeps moving, but the data foundation falls further behind.

That is why I think CIOs need to treat modernization less like a platform replacement and more like an effort to restore architectural separation and control.

In practice, that means separating ingestion, transformation and reporting instead of allowing all three to collapse into the same layer. It means reducing the number of places where business logic can live. It means establishing a clear source of truth for key metrics before they show up in executive dashboards. It also means making sure master data is defined consistently enough that teams are not comparing duplicate records or conflicting definitions and assuming the platform is to blame.

Fit matters more than feature depth

Platform decisions are often misunderstood.

On paper, most modern data platforms are capable. They all promise scale, flexibility and performance. But in practice, the decision is rarely about capability alone. It is about fit.

In recent modernization work, I have seen firsthand that the wrong decision is not always choosing an inferior technology. More often, it is choosing a platform that introduces unnecessary complexity into an environment that is already fragmented.

That complexity shows up quickly in the form of another cloud to manage, another billing model to track, another toolchain to support, another integration layer to maintain, another set of skills to build and another governance surface to control.

Those costs do not always show up clearly in vendor comparisons, but they show up immediately in execution.

That is why I have become more disciplined about asking a different question. Not what is the most powerful platform on paper, but what choice best aligns with the operating model, capabilities and simplification goals of the enterprise.

There is no one-size-fits-all answer. For some organizations, a separate cloud-native warehouse may make perfect sense. For others, a more unified platform approach is the better fit because it leverages current skills, preserves momentum and avoids duplicating effort inside an ongoing modernization program.

That distinction matters.

The goal is not to build the most theoretically flexible architecture. It is to build one that the organization can actually govern, extend and operate over time.

Master data is where credibility starts

Modernization does not become credible until master data starts to improve.

That is not a side effort. It is part of the foundation.

In many enterprises, the root problem is not just the reporting layer. It is the fact that core entities such as customers, products, suppliers and locations are still defined differently across systems. When that happens, every downstream discussion about trust, reporting consistency and AI readiness becomes harder than it should be.

One area where this becomes tangible is syndication and deduplication. In most legacy environments, the same customer, product or supplier exists multiple times across systems, often with slight variations in naming, attributes or hierarchy. Over time, teams build local workarounds to compensate, which only reinforces the fragmentation.

Deduplication is not just a technical exercise. It forces alignment to what defines a unique entity. Syndication operationalizes that alignment, ensuring that once data is standardized, it is consistently distributed across systems and downstream processes. Without both, organizations end up maintaining multiple versions of the same truth and the platform becomes harder to trust regardless of how modern it is.
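
To make that concrete, here is a minimal sketch of the matching logic behind deduplication, using hypothetical customer records and an illustrative similarity threshold. Real master data tools apply far richer survivorship rules; this only shows the shape of the decision.

```python
from difflib import SequenceMatcher

# Hypothetical customer records from two source systems.
records = [
    {"id": "CRM-001", "name": "Acme Corp.", "city": "Chicago"},
    {"id": "ERP-993", "name": "ACME Corporation", "city": "Chicago"},
    {"id": "CRM-002", "name": "Globex Inc", "city": "Boston"},
]

def normalize(name: str) -> str:
    """Strip punctuation and legal suffixes before matching."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    for suffix in (" corporation", " corp", " inc", " ltd"):
        cleaned = cleaned.removesuffix(suffix)
    return cleaned.strip()

def is_duplicate(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Two records match when their normalized names are similar
    and they share a disambiguating attribute (city, here)."""
    score = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    return score >= threshold and a["city"] == b["city"]

# Survivorship: keep the first record seen as the golden record.
golden: list[dict] = []
for rec in records:
    if not any(is_duplicate(rec, g) for g in golden):
        golden.append(rec)

print(golden)  # Acme and Globex survive; the ERP duplicate is merged away
```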

That is why I keep coming back to master data discipline. If important reports are not built on agreed business definitions and trusted logic, leaders end up looking at different versions of the same KPI. If customers, products and suppliers are not defined consistently across the business, the platform may look modern while the reporting remains hard to trust.

That is also why phased execution matters. Master data does not have to be fully resolved upfront, but it does need to be mature enough in the right domains to support the first releases and give the organization a foundation it can extend with confidence.

A modern foundation has to be engineered for change

What has worked best in my experience is a disciplined architecture that separates ingestion, transformation and reporting instead of mixing them together in ways that are hard to maintain.

That is where the medallion model becomes practical, giving the organization a structured way to separate raw data, standardized data and business-facing reporting. Bronze is where data first comes in from different systems. Silver is where it gets standardized, so the business is not working from conflicting definitions or duplicate records. Gold is where reporting and KPIs can sit on a more trusted foundation. That separation makes the environment easier to scale, troubleshoot and govern over time. The value is not in terminology, but in the discipline behind it.
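
As a rough illustration of that discipline, the sketch below walks one small, invented extract through bronze, silver and gold using pandas; in practice bronze would be read from a landing zone rather than inlined.

```python
import pandas as pd

# Bronze: land the source extract as-is (inlined here for illustration),
# adding only lineage metadata.
bronze = pd.DataFrame({
    "customer_id": [" c-1 ", "C-1", "c-2"],
    "region": ["east", "east", "west"],
})
bronze["_source"] = "crm"
bronze["_ingested_at"] = pd.Timestamp.now(tz="UTC")

# Silver: standardize keys and definitions so downstream consumers
# are not reconciling conflicting or duplicate records.
silver = bronze.copy()
silver["customer_id"] = silver["customer_id"].str.strip().str.upper()
silver = silver.drop_duplicates(subset="customer_id", keep="last")

# Gold: business-facing aggregates built only from silver, so every
# KPI traces back to one standardized source.
gold = (
    silver.groupby("region", as_index=False)
          .agg(active_customers=("customer_id", "nunique"))
)
print(gold)
```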

I have seen organizations modernize into cloud data warehouses, data lakes and lakehouse architectures. The pattern is the same. If the underlying logic, master data and governance are still fragmented, the new platform inherits the same trust problems as the old one.

That same discipline has to carry through to the platform itself. If the environment is going to hold up under growth, the pipelines have to be observable, versioned and resilient enough to support change without constant rework. Environment separation, CI/CD workflows and operational monitoring are not extras. They are part of what makes the platform sustainable.

I also would not lead a modernization effort with AI, even when the pressure is high. AI raises the stakes, but it does not change the core problem. If the data foundation is still fragmented, poorly governed or inconsistent, a new AI layer will not solve it. That is increasingly showing up in the market, with Gartner warning that many generative AI efforts will stall because of poor data quality, inadequate risk controls, escalating costs or unclear business value. Foundry’s latest AI research reinforces this, identifying data storage and management as a top foundational investment for internal AI.

Final thought

The technology will continue to evolve.

The organizations that benefit most will not be the ones chasing every new platform. They will be the ones making disciplined decisions about how those platforms fit into their operating model and executing against them consistently.

Modernization does not fail because the technology is not good enough.

It struggles when the decisions behind it are not grounded in how the business actually runs.

This article is published as part of the Foundry Expert Contributor Network.
Want to join?

The DSPM promise vs the enterprise reality

April 30, 2026, 08:00

The data sprawl problem is worse than anyone admits

Before a DSPM tool can protect data, it must find it. That sounds straightforward. In practice, it is the first place most programs quietly begin to unravel.

Enterprises have been operating in hybrid and multi-cloud environments for a long time. Data has followed every workflow — into Salesforce, into SharePoint, into dozens of S3 buckets that were created by developers who have since moved on, and into collaboration tools adopted during the pandemic without any formal data classification policy attached. Nobody tracked it systematically. Research from Cyera’s 2024 DSPM Adoption Report found that 90% of the world’s data was created in just the last two years, and total data volume by 2025 reached 181 zettabytes. Security teams are being asked to govern a landscape that is growing faster than any tool or team was designed to handle.

When DSPM scanners go to work on a large enterprise environment, the volume of findings almost always exceeds initial expectations — sometimes by an order of magnitude. One organization I worked with discovered sensitive customer PII in seventeen cloud storage locations that they had no formal record of. Another found regulated financial data sitting in a collaboration workspace that had been shared with an external contractor two years prior and never revoked.

The visibility is genuinely valuable. But, as Wiz notes in their DSPM framework, visibility without remediation capacity is just a longer list of things that can go wrong. And that is exactly where the first real friction begins.

Ownership is a political problem, not a technical one

DSPM tools are exceptionally good at identifying data risk. They are not designed to resolve the organizational question of who is responsible for fixing it. That question, in most enterprises, does not have a clean answer.

Security teams surface the finding. The data sits in a business unit’s environment. The IT team may own the cloud account, but the data owner is in Finance, HR, or a product team operating on a separate roadmap and budget cycle. When the DSPM platform generates a remediation ticket, the question of who closes it — and who gets measured on closing it — is rarely answered in advance.

This creates what I call the remediation gap. Findings accumulate. Risk scores rise. But nothing gets fixed, because no single team has both the authority and the incentive to fix it. Security points at the business. The business points at IT. IT points at the data owner. The data owner has a product launch in six weeks and no security budget. Forcepoint’s DSPM implementation research confirms this pattern: Even capable platforms underdeliver when rollout turns into a scanning project with unclear ownership and remediation that lives in a permanently deferred backlog.

I have watched this dynamic play out in organizations across industries. It is not a technology failure. It is a governance failure — and no DSPM platform in the market today ships with a solution to it. That solution must be built by leadership, before deployment, with teeth. That means defined data ownership models, escalation paths and accountability metrics that connect to performance conversations, not just security dashboards.

Classification debt is real, and it compounds

Every DSPM implementation depends on one foundational input: A coherent data classification framework. Most enterprises do not have one that is current, enforced, or agreed upon across business units.

Organizations are working from policy documents written five years ago, whose definitions nobody applies consistently, on top of a growing volume of unstructured content that was never classified at all. According to a 2024 industry survey cited by Securiti, 83% of IT and cybersecurity leaders assert that lack of visibility into data contributes significantly to their weak security posture — a figure that points directly at the classification gap sitting underneath most programs.

DSPM tools apply machine learning to infer sensitivity from data patterns — and they are increasingly good at it. But inference is not a substitute for intentional classification. False positives create noise. False negatives create blind spots. Both erode trust in the platform over time. And once analysts stop trusting the findings, the program stalls regardless of how sophisticated the tooling is.
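
To see why inference produces both kinds of error, consider a deliberately simplified pattern-based detector of the sort DSPM scanners layer ML on top of. The patterns and labels here are illustrative only, not any product's detection logic.

```python
import re

# Illustrative-only patterns; real scanners combine many signals
# plus ML validation to cut down false positives.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def classify(text: str) -> set[str]:
    """Return the sensitivity labels whose patterns match.
    A synthetic test value like 123-45-6789 still matches the SSN
    pattern: exactly the false-positive noise the article describes."""
    return {label for label, rx in PATTERNS.items() if rx.search(text)}

print(classify("Contact jane@example.com, SSN 123-45-6789"))
# {'email', 'us_ssn'}
```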

The harder truth is that many organizations use the DSPM project as a forcing function to finally build the classification framework they should have built years ago. That is not inherently wrong. But it dramatically expands the scope and timeline, and it requires business stakeholder engagement that security teams are rarely resourced to drive on their own. Executives who budget for a DSPM tool without budgeting for the classification work alongside it are setting their programs up for a slow, expensive drift toward shelfware.

Integration complexity is systematically underestimated

DSPM vendors will show you a connector library that spans AWS, Azure, GCP, Microsoft 365, Salesforce, Snowflake and a long list of other platforms. What the demo does not show you is what happens when your specific version of a legacy ERP system does not match the connector’s assumptions or when your on-premises database sits behind a network segment the cloud-native scanner cannot reach without significant architectural change.

Enterprise environments are heterogeneous by nature. Palo Alto Networks’ market analysis puts the DSPM market on a trajectory toward $2 billion by 2025, growing at rates between 25% and 37% annually — a reflection of just how aggressively organizations are investing in this space. But investment velocity and implementation maturity are not the same thing. The average large organization runs hundreds of distinct data stores across multiple cloud providers, legacy systems and third-party SaaS applications. Getting DSPM coverage across all of them is not a deployment — it is an ongoing engineering program.

Connectors break when APIs change. New data sources appear with every acquisition and product build. Maintaining coverage requires dedicated resources that are rarely factored into the initial business case. Executives should push their vendors on exactly which environments will have full coverage at go-live versus which ones are on a roadmap with no committed timeline. The distinction matters enormously because a DSPM deployment with significant coverage gaps gives a false sense of security that can be more dangerous than no deployment at all.

This is a point worth reinforcing with your procurement team: Gartner’s Market Guide for DSPM explicitly flags that organizations can no longer separate data visibility from data control — and that coverage depth, not just breadth, is the critical variable when evaluating platforms.

Alert fatigue arrives faster than expected

A fully operational DSPM deployment in a large enterprise will generate findings at a volume that most security operations teams are not built to absorb. The irony is that the better the tool works, the faster alert fatigue sets in.

Risk prioritization is the answer in theory. In practice, prioritization logic requires ongoing tuning that takes months of calibration with your specific data environment. Varonis, in their DSPM guidance for CISOs, makes the point directly: The goal should not be to generate a list of findings but to surface meaningful, actionable alerts that can be remediated — ideally with automation doing the heavy lifting. Most implementations fall well short of that standard in the early months.

In the meantime, analysts are triaging hundreds of findings per week, many of which turn out to be acceptable risks or known exceptions. Teams burn out. Findings get acknowledged and deprioritized. The board dashboard shows a healthy posture score that no longer reflects ground reality. Zscaler’s analysis of cloud data security challenges identifies this precisely: Security teams need AI and ML-powered prioritization not just to reduce noise but to help analysts focus effort on the data exposures that could realistically lead to a breach.
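
One way teams translate that prioritization into practice is a composite risk score, so the queue surfaces exploitable exposure first. The sketch below is an assumption about how such scoring might look, not any vendor's actual model; the weights are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    name: str
    sensitivity: int          # 1 (public) .. 5 (regulated)
    exposure: int             # 1 (internal only) .. 5 (public internet)
    has_known_exception: bool

def risk_score(f: Finding) -> float:
    """Weight exposure over raw sensitivity: a regulated dataset behind
    tight controls matters less than moderately sensitive data open to
    the internet. Known exceptions drop out of the queue entirely."""
    if f.has_known_exception:
        return 0.0
    return 0.4 * f.sensitivity + 0.6 * f.exposure

queue = [
    Finding("pii_in_locked_vault", sensitivity=5, exposure=1, has_known_exception=False),
    Finding("reports_in_public_bucket", sensitivity=3, exposure=5, has_known_exception=False),
    Finding("accepted_risk_share", sensitivity=4, exposure=4, has_known_exception=True),
]
for f in sorted(queue, key=risk_score, reverse=True):
    print(f.name, round(risk_score(f), 2))
```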

This is not an argument for turning off the tool. It is an argument for honest capacity planning. If your security operations team is already stretched, a DSPM deployment without additional analyst headcount or a meaningful automation investment is not going to improve your security posture. It is going to add a new category of noise to an already overloaded function.

What good looks like

None of the friction described here is insurmountable. Organizations that get DSPM right tend to share a few common attributes that have nothing to do with which vendor they chose.

They treat DSPM as an organizational change program, not a technology deployment. They invest in governance structures before they deploy scanners. They define data ownership at the business unit level with clear accountability, and they build that accountability into how people are measured and managed. They budget for the classification work alongside the tooling. They phase their integration roadmap honestly, scope the first phase to environments where coverage will be complete, and build confidence before expanding.

They also pay attention to what Microsoft’s research on enterprise data security posture flags as the underlying imperative: Organizations must stop seeing data security as a collection of individual tools and start treating it as a holistic program anchored in measurable business outcomes. That shift in framing changes everything — from how the board conversation is structured to how remediation accountability is assigned across the business.

Most importantly, they have executive sponsorship that goes beyond signing the purchase order. The CISOs who successfully land DSPM programs are the ones who have a CFO, COO, or CEO who understands that data security risk is a business risk — and who is willing to hold business unit leaders accountable for their piece of it.

DSPM, at its best, gives your enterprise the situational awareness it needs to make informed decisions about data risk. The organizations that turn that awareness into genuine security improvement are the ones that walk in with eyes open — prepared for the friction, staffed for the remediation work and governed for the accountability.

This article is published as part of the Foundry Expert Contributor Network.
Want to join?

AI won’t fix your data problems. Data engineering will

April 28, 2026, 09:00

Most enterprise AI investments today focus on models, compute and tooling. The assumption is that intelligence is the binding constraint and that a more capable model will produce better outcomes across every dimension that matters. This is a reasonable starting point, but it is also where most initiatives go wrong.

The models organizations are deploying were trained on public data at scale. None of your internal systems, customer schema, pricing logic or support taxonomy appeared in that training.

When a model encounters your internal data, it processes it as best it can, but without the grounding that comes from having been trained on it. Early AI initiatives are struggling not because the models are weak, but because the context they need to operate reliably inside your organization is something they have never seen before.

Data engineering holds the key to this context.

Why context breaks first

Think about what an AI agent handling a support escalation needs to function well: The customer’s support history across time, not just the most recent ticket. Billing records matter too, because the character of a problem often depends on what the customer is paying for and whether anything has changed recently. Product usage data is equally essential, as what a customer reports is frequently explained by how they have been using the product. None of these things live in a single place, as they are scattered across systems that were each built by different teams, on different timelines, with different definitions of what a customer record is supposed to capture.

Human agents work around these gaps through judgment developed over time. They know which system to trust for a particular type of question, they know the usage data runs six hours behind and they know how to weigh conflicting signals based on context that is never written down anywhere. AI systems do not have that judgment. They process whatever they receive and act on it, which means that when the context is inconsistent or incomplete, the output reflects that, not as a visible error but as a subtly wrong decision. The customer notices before anyone on your team does.
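
A hedged sketch of what encoding that judgment might look like: context assembled across systems with an explicit staleness flag, so the agent is told which signals to distrust rather than discovering the problem later. The fetcher functions, payloads and thresholds are hypothetical stand-ins for real system calls.

```python
from datetime import datetime, timedelta, timezone

def _now():
    return datetime.now(timezone.utc)

# Hypothetical fetchers standing in for real system calls.
# Each returns (payload, last_updated).
def fetch_ticket_history(customer_id):
    return [{"ticket": "T-88", "status": "open"}], _now() - timedelta(minutes=5)

def fetch_billing(customer_id):
    return {"plan": "pro", "past_due": False}, _now() - timedelta(hours=2)

def fetch_usage(customer_id):
    # The usage feed is known to run hours behind.
    return {"events_7d": 42}, _now() - timedelta(hours=7)

# How stale each slice of context may be before the agent must distrust it.
MAX_AGE = {
    "tickets": timedelta(hours=1),
    "billing": timedelta(hours=24),
    "usage": timedelta(hours=6),
}

def build_context(customer_id: str) -> dict:
    """Assemble cross-system context and flag stale slices explicitly,
    instead of silently handing them to the agent."""
    sources = {
        "tickets": fetch_ticket_history,
        "billing": fetch_billing,
        "usage": fetch_usage,
    }
    context = {}
    for name, fetch in sources.items():
        payload, last_updated = fetch(customer_id)
        context[name] = {
            "data": payload,
            "stale": _now() - last_updated > MAX_AGE[name],
        }
    return context

print(build_context("cust-123")["usage"]["stale"])  # True: 7h old vs 6h limit
```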

When bad data stops being annoying and starts being operational

In the analytics era, data quality problems surfaced as numbers that looked off in dashboards. Analysts were the error-detection layer, and when something looked wrong, they would investigate, find the issue and get it fixed. The feedback loop was slow, but it existed, and it caught most problems before they reached the business in any consequential way.

AI agents making operational decisions do not have that buffer. They have no way of knowing that a schema migration introduced silent gaps or that a pipeline is running four hours late. Refunds go out incorrectly because the billing context was incomplete at the moment of decision.

What an analytics team could absorb as an occasional anomaly in a report becomes a real problem when an automated system acts on degraded context hundreds of times a day before anyone identifies the pattern. The volume is what makes it dangerous, and by the time it surfaces, the damage is already distributed across thousands of interactions.

The role data engineers play now

For the past decade, data engineering meant building pipelines that fed warehouses so analysts could query data and produce dashboards. The work was foundational but treated as background infrastructure, and its value was measured in pipeline reliability, query performance and reporting freshness.

The agent era changes the purpose of that work entirely. When AI systems make operational decisions, the goal is no longer producing data that is queryable. The goal is producing context that is reliable enough for a system to act on, and those are different problems with different requirements. That starts with entity resolution across systems, so that core entities resolve to one consistent, trustworthy answer in every data source that touches them.

This also means handling late-arriving data explicitly, because agents cannot act on a state of the world that no longer holds. Freshness thresholds need to be calibrated to the decision type, since a personalization recommendation can tolerate six-hour-old usage data in ways that a refund workflow cannot. Lineage needs to survive schema changes and reorganizations, so that the provenance of any piece of context can be traced when something goes wrong.

None of that is a model problem, nor does it yield to prompt engineering. This is data engineering work, and organizations that treat it as anything else will spend a long time debugging production failures that look like AI problems but are infrastructure problems.

Context is only half the problem

Getting the right information to an agent is necessary, but it is not sufficient. There is a second challenge that most organizations have not yet confronted: How do you coordinate, govern and operate dozens or hundreds of autonomous agents making real decisions across your business?

Agent frameworks handle reasoning well. What they do not handle is everything around the agent: Scheduling when it runs, controlling what it is allowed to spend, enforcing who can approve its decisions, managing retries when external systems fail and ensuring that when an agent needs human sign-off, it does not tie up compute for hours while it waits. These are not AI problems. They are operational infrastructure problems, and they are the same class of problems that orchestration platforms have been solving for data pipelines for over a decade.

One agent answering questions in a sandbox is a proof of concept. Fifty agents making operational decisions across finance, compliance and customer operations is a fleet management problem, and it requires the same kind of scheduling, governance, cost controls and auditability that enterprises already demand from their data infrastructure.

Orchestration is typically the one layer that already has visibility across platforms, spanning your warehouse, your transformation layer, your external APIs and your operational databases. That cross-platform vantage point is what makes it possible to build a context layer that is comprehensive rather than siloed.

Governance needs to execute at runtime, not live in documentation. Policies about data access, cost limits and human approval requirements need to be enforced in code as agents run, not described in guidelines that agents cannot read and humans forget to follow.
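
As a minimal sketch of what runtime enforcement could look like, the code below gates every agent action on a policy object before execution. The policy fields, limits and action names are assumptions for illustration, not a reference design.

```python
from dataclasses import dataclass, field

@dataclass
class AgentPolicy:
    allowed_datasets: set[str]
    spend_limit_usd: float
    requires_human_approval: set[str] = field(default_factory=set)

@dataclass
class ActionRequest:
    agent_id: str
    action: str              # e.g. "issue_refund" (hypothetical)
    dataset: str
    est_cost_usd: float

def authorize(req: ActionRequest, policy: AgentPolicy, spent_so_far: float) -> str:
    """Runtime gate: deny, allow, or route to a human.
    The policy executes here, in code, not in a wiki."""
    if req.dataset not in policy.allowed_datasets:
        return "deny: dataset out of scope"
    if spent_so_far + req.est_cost_usd > policy.spend_limit_usd:
        return "deny: over budget"
    if req.action in policy.requires_human_approval:
        return "pending: human approval required"
    return "allow"

policy = AgentPolicy(
    allowed_datasets={"billing", "tickets"},
    spend_limit_usd=50.0,
    requires_human_approval={"issue_refund"},
)
print(authorize(ActionRequest("support-bot", "issue_refund", "billing", 0.12),
                policy, spent_so_far=10.0))
# pending: human approval required
```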

What this means going forward

The organizations that deploy AI agents at scale will have invested in two things before those agents reach production.

First, a context layer that gives agents a reliable, cross-platform understanding of the enterprise’s data. This means not just raw access to tables, but semantic knowledge of what the data means, where it comes from and how much to trust it.

Second, an operational layer that governs how agents act, with the scheduling, cost controls, auditability and human-in-the-loop checkpoints that enterprise deployment demands.

These two investments are not independent. They form a flywheel. Better context makes agents more effective, which drives broader adoption, which generates richer operational metadata, which deepens the context layer further.

Data engineers are becoming the people who determine whether automated decisions are trustworthy, not because they control the models but because they control both the context on which those models operate and the infrastructure through which they act. The organizations that understand this early will keep building on it. The ones that keep treating data engineering and orchestration as background infrastructure will keep rediscovering the same production failures, just with different names on the postmortem each time.

This article is published as part of the Foundry Expert Contributor Network.
Want to join?

Moving autonomous agents into production requires a universal context layer

April 24, 2026, 06:00

I recently sat down with a group of enterprise technology leaders to discuss artificial intelligence deployments. It was a spirited discussion that surfaced plenty of lessons. The consensus among the group highlighted a rapid transition away from simple chat applications. Companies want autonomous agents capable of executing multi-step workflows across human resources and customer service departments. I listened to chief information officers describe the harsh reality of moving agentic workflows from pilot programs into live production. Scaling the technology exposes severe infrastructure gaps. Dropping high-speed agents into old systems creates immediate operational chaos. Achieving business value requires building an architecture of flow.

Building an architecture of flow replaces isolated bottlenecks with continuous execution. Continuous execution ensures intelligence moves instantly across the organization. The universal context layer serves as the technological connective tissue enabling continuous execution. The layer sits beneath the applications, bridging disparate legacy systems to provide a common language for both autonomous agents and human workers.

The data fragmentation crisis

Decades of disjointed data management now block progress. During my conversations with the technology executives, data fragmentation emerged as the primary roadblock. Artificial intelligence agents require absolute ground truth to function securely. Fragmented legacy systems trap enterprise intelligence in isolated silos. Organizations must build the universal context layer to orchestrate underlying data before turning autonomous agents loose on complex workflows. I see companies investing millions in large language models while completely ignoring data readiness.

The 2025 Gartner Hype Cycle for Artificial Intelligence reveals a stark reality regarding infrastructure. Analysts report 57 percent of organizations remain unprepared to support artificial intelligence due to inadequate data foundations. Deploying autonomous agents demands clean information. Relying on disconnected databases forces new autonomous systems to hallucinate at unprecedented speeds. Chief information officers must connect raw data directly to daily workflows. Providing a secure framework prevents compliance disasters and protects customer data. Establishing a solid foundation guarantees agents access to accurate historical records. Integrating scattered documents into a unified stream gives the autonomous agent the exact context needed to complete a task successfully.

Identity and the naked agent

Identity and access emerge as distinct operational hurdles. The leaders I spoke with expressed deep concern over exposing excessive data scope to autonomous agents working on their behalf. Deploying naked agents without rigid operational boundaries guarantees compliance disasters. An autonomous agent scanning a corporate network will inevitably find unsecured payroll files or confidential merger documents unless teams establish strict access limits.

The architecture of flow establishes strict connective tissue. Strict connective tissue ensures agents only receive the exact context required for the specific task. Relying on perimeter defense fails in an agentic world. We must adopt an identity-first zero-trust security posture to govern machine behavior. Providing the exact context at the exact moment limits the blast radius of a potential breach. Governance becomes an enabler of speed rather than a blocker. Security protocols must evolve to match the speed of algorithmic execution. Establishing proper guardrails allows innovation to flourish safely. Giving an agent partitioned access protects the enterprise from internal data leaks. The universal context layer authenticates every request dynamically based on the active workflow.

Budgeting for algorithmic operations

The financial reality of autonomous agents forces a complete restructuring of technology budgets. Multiple executives at our dinner table discussion asked how to budget for processing tokens across different departments. Processing tokens operates like a utility cost. Paying for generative AI resembles paying an electric bill. Companies want to deploy agents to speed up customer onboarding workflows and back-office operations. Expanding the agentic footprint increases token consumption exponentially.

Treating token consumption as a standard software licensing fee breaks financial models. I recommend finance teams redefine AI spending as operating expenses. The ongoing cost of computing requires constant monitoring and optimization. The architecture of flow provides visibility into system usage. Leaders can track exactly which departments consume the most resources. Transparency allows organizations to allocate funds dynamically based on operational outputs. Aligning computational spending directly with business outcomes creates a sustainable growth model.
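
A toy illustration of that utility-style accounting: token consumption recorded per department and priced like a metered bill. The rate and department names are placeholders, not real pricing.

```python
from collections import defaultdict

PRICE_PER_1K_TOKENS = 0.002  # illustrative utility-style rate, not a real price

usage: dict[str, int] = defaultdict(int)  # department -> tokens consumed

def record(department: str, tokens: int) -> None:
    """Meter token consumption the way a utility meters kilowatt-hours."""
    usage[department] += tokens

def monthly_bill() -> dict[str, float]:
    """Price consumption per department so spend is attributable
    to the operations that drove it."""
    return {dept: round(t / 1000 * PRICE_PER_1K_TOKENS, 4)
            for dept, t in usage.items()}

record("customer_onboarding", 1_200_000)
record("back_office", 300_000)
print(monthly_bill())
# {'customer_onboarding': 2.4, 'back_office': 0.6}
```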

The shift toward focused language models

Enterprise technology leaders recognize the inefficiency of using massive foundation models for every request. Routing simple database queries through giant language models wastes processing power and money. Chief information officers now pivot toward deploying smaller language models. Smaller language models trained on specific enterprise data execute narrow workflows more efficiently. Operating a focused model reduces computing costs drastically. A specialized model designed purely for reviewing human resources policies requires a fraction of the token budget.

Building an architecture of flow accommodates a multi-model ecosystem. The universal context layer routes the specific task to the most efficient model available. Connecting a massive foundation model to a series of smaller agentic tools creates a highly optimized digital workforce.
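
Here is one hedged sketch of such routing: narrow, well-bounded requests go to a small specialized model, and everything else escalates to the foundation model. The task types and model names are invented for illustration.

```python
# Placeholder model identifiers; in practice these map to real endpoints.
SMALL_MODEL = "hr-policy-slm"
LARGE_MODEL = "general-foundation-llm"

# Narrow, well-bounded task types a focused model can handle cheaply.
NARROW_TASKS = {"hr_policy_lookup", "sql_generation", "ticket_triage"}

def route(task_type: str, estimated_tokens: int) -> str:
    """Send narrow, small requests to the cheap specialized model;
    escalate open-ended or large requests to the foundation model."""
    if task_type in NARROW_TASKS and estimated_tokens < 2_000:
        return SMALL_MODEL
    return LARGE_MODEL

assert route("hr_policy_lookup", 500) == SMALL_MODEL
assert route("strategy_memo", 500) == LARGE_MODEL
```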

Navigating the interoperability mandate

From my early days at Gartner, covering messaging, communication and collaboration, intra- and interenterprise collaborative workflows have always been hampered by the thorny issue of interoperability. The multi-vendor reality in the enterprise remains undeniable. No single platform will dominate the enterprise artificial intelligence stack. Organizations currently deploy multiple systems across various departments to solve specific problems. Marketing teams use one large language model while software engineering teams use a completely different coding assistant. Interoperability demands an underlying architecture giving disparate agents and legacy databases a common language to flow together seamlessly.

Vendors historically push locked ecosystems to trap customer data. The future digital workplace requires open communication protocols. I continually advocate for frameworks allowing different models to communicate effortlessly. Industry standards like the Model Context Protocol demonstrate the growing demand for universal connectivity. The universal context layer acts as the universal translator. Translating raw processing power into immediate business context breaks down vendor lock-in. Agent-to-agent collaboration requires a shared technological foundation.

Elevating the resolution specialist

Buying isolated software applications perpetuates the static enterprise. Organizations must stop hoarding disconnected tools to fix fragmented workflows. My discussions with industry peers confirm a shift in perspective. The boardroom conversation moves past pure cost reduction. Enterprises deploy agentic workflows to accelerate high-return operations and client-facing experiences. Equipping the human workforce with agentic speed transforms standard employees into empowered resolution specialists.

Removing internal system friction translates directly into better external customer outcomes. A resolution specialist equipped with instant context delivers better results regardless of the market sector. The human worker spends zero time searching for information across disconnected applications. Technology leaders must start orchestrating the flow of context. Designing systems around secure boundaries and interoperability guarantees a more sustainable technological advantage.

This article is published as part of the Foundry Expert Contributor Network.
Want to join?

Data debt will cripple your AI strategy if left unaddressed

April 23, 2026, 07:01

As every CIO knows, AI success hinges on rock-solid data practices. But as CEOs and boards have emphasized digital transformations in recent years, funding for data management transformation efforts has been piecemeal at best. Now, with AI atop the CEO agenda, many CIOs find themselves in a bind, having to also overhaul data operations and address years, or decades, of accumulated data debt.   

If your enterprise has data debt, AI will expose it. In fact, data debt can lead to devastating failure rates with AI projects. For technology leaders, there’s no time like the present to pay down this debt with a comprehensive remediation strategy.

Data debt can arise for a variety of reasons, including old and outdated data management practices, shortcuts and compromises in infrastructure to meet near-term goals, poorly documented data sources, and inefficient data storage practices.

Research firm IDC, in its 2026 CIO Agenda Predictions, notes that by 2027, CIOs who delay the launch of data debt remediation will face 50% higher AI failure rates and rising costs, as model underperformance exposes issues from siloed, redundant, or poor-quality data.

“These findings reinforce that scaling AI requires disciplined investment in data foundations and integrated platforms, and that postponing these fundamentals risks turning AI ambition into sustained operational friction,” the report says.

“AI doesn’t create data problems; it exposes and accelerates them,” says Hrishikesh Pippadipally, CIO at accounting and advisory firm Wiss. “When organizations lack standardized processes, consistent definitions, and disciplined data governance, data naturally decays over time. That decay may not be visible in traditional reporting environments, but AI systems surface those inconsistencies quickly.”

Data debt is often the result of process drift — multiple teams using different definitions, inconsistent data entry standards, and siloed systems evolving independently, Pippadipally says.

“Without standardization and clear ownership, even modern systems degrade,” he says. “At our organization, we’ve learned that remediation isn’t just about cleaning historical data. It’s about instituting disciplined processes that prevent decay going forward: clear data ownership, standardized workflows, and governance embedded into daily operations.”

That said, not all AI initiatives are blocked by imperfect data, Pippadipally says. “There are smaller, well-bounded use cases, such as document summarization, drafting assistance, anomaly flagging, or reconciliation support, that can deliver value with human-in-the-loop verification,” he says. “These contained applications allow organizations to build AI maturity while foundational data improvements are under way.”

A mounting problem that requires a fast fix

A widespread problem, data debt at most organizations has grown organically over decades. In addition to increasing emphasis on data collection, companies have also accumulated data debt over years of mergers and acquisitions, as well as the deployment of new systems and services either enterprisewide or by departments.

“Systems were layered in response to immediate needs, acquisitions, regulatory requirements, or departmental preferences,” Pippadipally says. “Over time, inconsistent processes and standards lead to fragmented data environments.”

Moreover, data management inefficiencies have historically been addressed with manual work-arounds, Pippadipally says. “Teams reconciled reports manually,” he says. “Analysts compensated for inconsistent definitions. But AI reduces tolerance for ambiguity. When automated systems operate at scale, inconsistencies multiply rather than average out.”

It’s vital to address this now: AI initiatives are moving faster than process maturity, and the sense of urgency is clear.

“If organizations don’t institutionalize process discipline and standardization, they risk automating chaos instead of improving outcomes,” Pippadipally says. “The issue is not simply poor data; it is the absence of sustained governance to keep data reliable over time.”

For many enterprises, data debt can stay hidden while they are conducting traditional business intelligence or one-off analytics, says Juan Nassif, regional CTO at software development provider BairesDev.

“AI is different; it’s far less forgiving and it quickly exposes duplicates, inconsistent definitions, missing context, and ‘mystery fields’ with unclear lineage,” Nassif says. “When you scale beyond pilots, those issues show up as model underperformance, higher iteration cycles, and rising operational costs. It’s absolutely a concern for us, too, and we treat it as a prerequisite for scaling AI responsibly.”

If data is incomplete, inconsistent, or duplicated, the output from AI models becomes unreliable. “That can mean wrong answers, poor recommendations, or automations that break at the worst time,” Nassif says. “Teams end up spending most of their time wrangling data, reworking pipelines, and compensating for poor inputs with repeated tuning and exceptions.”

Some form of data debt is present in every sector, and in virtually all sizes of organizations.

“I witness the consequences of data debt in my daily work with schools in the UK every single week,” says Mark Friend, director of Classroom365, which consults with educational institutions on technology, architecture, and strategy.

“Most people assume that when they purchase the latest AI tool, all their problems will be solved no matter how messy the foundation underneath the hood,” Friend says. “My experience with this is that even the most expensive software is useless if the input is not reliable.” Data debt is “a fundamental risk to institutional stability,” he says.

Tips for effective data debt remediation

Enterprise-wide data debt remediation can be a significant, costly undertaking that involves multiple aspects of the business. It’s not just a technology issue, but a discipline issue as well. It requires cleaning up historical data as well as strengthening process governance to keep from repeating the mistakes or poor practices of the past.

Because of this, building and executing an effective strategy requires an organized and thorough approach. Here are some tips from experts.

Get senior management and board-level sponsorship

Any major IT initiative typically needs buy-in from senior business executives and even boards, particularly if it involves a large, global enterprise. Data debt remediation is no different. There is significant financial risk if remediation does not have the blessing and full backing of senior executives and board members.

Explaining the potential ramifications is a good way to bring attention to the need for remediation. “Make data debt visible and tie it to business risk,” Nassif says. “Data debt won’t get prioritized until it’s linked to AI failure rates, rising costs, and compliance exposure.”

Data debt is now a board-level risk, says Adrian Lawrence, founder of executive recruitment firm NED Capital, who advises boards and finance leaders on enterprise data governance, reporting integrity, and AI readiness.

“I see the pressure mounting with boards linking their AI investment to productivity and profitability objectives, but disjointed financial, sales, and operations data severely undermine model accuracy,” Lawrence says. “They lay bare the deficiencies [enterprise platform] upgrades and antiquated technology did not fully address.”

Success with debt remediation “demands executive sponsorship, disciplined data governance, and staged architecture cleanup treating data as an asset on the balance sheet,” Lawrence says.

Standardize core processes before scaling AI

To make the benefits of data debt remediation more long lasting, enterprises need to standardize their core business processes.

“Data quality reflects process quality,” Pippadipally says. “Leaders must align on standardized workflows, definitions, and system usage before expecting AI to operate consistently. Without process standardization, remediation efforts will be temporary.”

AI performs best in predictable environments, Pippadipally says, and standardization creates the stability AI requires.

BairesDev has embedded automated checks for data freshness, completeness, duplicates, and schema changes, so data quality issues get caught before they reach analytics or AI workflows, Nassif says.
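
A sketch of what such automated checks can look like in plain pandas, with hypothetical column names and an expected schema; in production these would run as scheduled pipeline tests rather than ad hoc scripts.

```python
import pandas as pd

# Hypothetical contract for one dataset: the columns and dtypes we expect.
EXPECTED_SCHEMA = {
    "customer_id": "object",
    "amount": "float64",
    "updated_at": "datetime64[ns]",
}

def quality_report(df: pd.DataFrame) -> dict:
    """Freshness, completeness, duplicates and schema drift in one pass."""
    # Schema drift: columns added, dropped, or retyped since last run.
    actual = {c: str(t) for c, t in df.dtypes.items()}
    return {
        "schema_drift": actual != EXPECTED_SCHEMA,
        # Completeness: share of nulls in the business key.
        "null_rate": float(df["customer_id"].isna().mean()),
        # Duplicates on the business key.
        "duplicate_keys": int(df.duplicated(subset="customer_id").sum()),
        # Freshness: hours since the newest record landed.
        "hours_since_update": (pd.Timestamp.now() - df["updated_at"].max())
                              .total_seconds() / 3600,
    }

df = pd.DataFrame({
    "customer_id": ["A1", "A1", None],
    "amount": [10.0, 10.0, 5.0],
    "updated_at": pd.to_datetime(["2026-04-20", "2026-04-21", "2026-04-22"]),
})
print(quality_report(df))
```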

Establish data ownership and ongoing governance

Another way to assure long-term benefits from a remediation effort is to have ongoing governance and accountability processes in place.

“Data remediation is not a one-time cleanup initiative,” Pippadipally says. “Assigning clear ownership at the domain level, and establishing continuous monitoring, prevents data from degrading again.”

This is important, because governance ensures sustainability. “Without discipline, organizations reaccumulate data debt even after cleanup efforts,” Pippadipally says.

“We’ve been tightening dataset ownership and standardizing common business definitions, so teams aren’t training or prompting on conflicting ‘versions of truth,’” Nassif says. “We’ve been strengthening our cataloging and lineage practices, so teams can trace where data comes from, how it transforms, and who can use it — critical for both trust and governance.”

The biggest shift is mindset. “We don’t treat data remediation as a one-time cleanup,” Nassif says. “We treat it as ongoing engineering with guardrails that prevent debt from coming right back.”

Prioritize high-value, contained AI use cases

While large data modernization initiatives progress within an organization, CIOs can deploy AI in tightly scoped areas where outputs are verifiable and human oversight is straightforward, Pippadipally says.


“Examples include drafting support, controlled reconciliations, workflow triage, or anomaly flagging,” Pippadipally says. “This approach builds organizational confidence and demonstrates ROI without overexposing the enterprise to data risk.”


Clean up storage

When it comes to data storage practices, there’s no doubt that organizations need to clean up their act. Poor practices lead to poor data quality, which could impact AI-driven projects.

“Schools are often very good at storing data like [in] an attic where they just keep throwing boxes without looking inside,” Friend says. “Anyone who has lived through a technology refresh knows that messy storage is a massive financial burden.”

Decades of bad collection practices “have created a technical rot that we can no longer ignore,” Friend says. “You might think that your legacy storage is harmless, but it actually places a massive financial burden in the form of rising operational costs,” and can negatively impact AI initiatives.

How poor data foundations can undermine AI success

April 17, 2026, 07:00

The promise of AI is immense, but poor-quality data undermines every attempt to derive any value from it. Without the right inputs, AI produces unreliable, incomplete, and even misleading outcomes.

For the average enterprise, data exists in many forms across many systems, says Brian Sathianathan, CTO at Iterate.ai, and integrating structured and unstructured data is harder than most AI pilots account for. “Structured data from operational systems is rarely as tidy as teams are assuming, and unstructured data, like scanned documents and forms, requires a different preparation process before it can be matched and used effectively,” he says, adding this might explain why businesses hit a wall when trying to move beyond POC.

Organizations with impressive POCs typically succeed because they rely on curated datasets, manual workarounds, and tightly controlled environments, says Rhian Letts, head of group technology strategy at Investec. The real challenge lies in converting pilots into reliable, production-grade implementations. Scaling, she adds, requires resilient pipelines, consistent definitions, operational support, and integration into real workflows. It also raises the bar for governance.

“Many data governance frameworks were designed for human-paced consumption,” she says. “AI significantly increases both the speed and volume of data demand and introduces non-human consumers. Governance, therefore, needs to evolve to become more automated, real-time, and explicit about provenance and permissions.”

For Daniel Acton, CTO at technology firm ADG, too many organizations rush to do something with AI without properly analyzing what they actually want to do with it. “AI can be useful, but if you feed AI data that’s incomplete and inaccurate, or if it doesn’t have the data needed to teach the machine to do what you want it to do, the results will be underwhelming,” he says.

Another core issue is a lack of standardized, high-fidelity metadata. “The quality of metadata is the hardest challenge to overcome,” says Brett Pollak, executive director for workplace technology and infrastructure services at UC San Diego. “Metadata is the essential connective tissue that allows an AI agent to interpret a user’s prompt and map it correctly to the intersection of specific columns and rows. Most organizations have unique, institution-specific interpretations of data that are rarely documented properly or kept current.” This creates a translation gap where an agent might have access to the data but lacks the context to understand what a specific field represents in a business context.

Data, data everywhere

Just because obstacles exist, though, doesn’t mean progress needs to pause. “AI use should be aligned to current maturity,” says Letts. “Rather than treating imperfect data as a constraint, organizations can ask how AI might help improve and better connect the data they already have.” Sathianathan agrees, adding that within the new LLM world, even small amounts of accurate data can have significant value. “With traditional machine learning just a few years ago, you needed a lot of data to train models,” he says. “Today, since most LLMs come with highly pre-packaged knowledge, all you need is sufficient amounts of the right data to get it ready for your domain.”

For organizations that have already deployed structured data warehousing, the new barrier is the transition from human-centric storage to machine-actionable delivery, says Pollak. “Readiness now means ensuring your data is wrapped in specific metadata, exposed via modern protocols like MCP servers, and governed by a selective exposure strategy that ensures agents only act on what’s governed,” he says.
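
As one hedged illustration of “wrapped in specific metadata,” the descriptor below shows the kind of context an agent would need alongside the raw table. The table, field names and exposure flag are invented for illustration; an MCP server would expose something richer.

```python
# Illustrative dataset descriptor an agent could consume alongside the data.
DATASET_DESCRIPTOR = {
    "table": "finance.invoices",          # hypothetical table
    "owner": "finance-data-team",
    "fields": {
        "inv_stat": {
            "meaning": "Invoice lifecycle status; 'P' = paid, 'O' = open",
            "sensitivity": "internal",
        },
        "cust_ref": {
            "meaning": "Customer master key (matches crm.customers.id)",
            "sensitivity": "confidential",
        },
    },
    # Selective exposure: agents may query this dataset, never mutate it.
    "agent_exposure": "read_only",
    "last_reviewed": "2026-03-01",
}
```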

Shift your mindset around data

Today, many organizations want to quickly move from data disorder to being data-driven. But if that’s the end goal, CIOs and tech leaders need to treat data as a first-class citizen within their organizations. As part of this shift, data can no longer be seen as a by-product of business systems, but rather as a core output that should be managed with the same level of care as any other product or service. When this happens, business leaders can unlock insights and value they didn’t know existed.

Also, according to Letts, a use-case-led approach is critical. Trying to fix every dataset across an organization is neither practical nor necessary. Meaningful value can be unlocked even where data is imperfect by focusing on the right use cases. By prioritizing five to 10 high-value use cases and mapping the data required to deliver them in production, it’s easier to focus efforts. Foundations can then be strengthened to serve those priorities.

With AI, the threshold for what’s good enough has lowered for many use cases, particularly those focused on productivity and knowledge work, she adds. AI models can extract value from context and connect dots, even where data isn’t perfectly structured. But higher-stakes use cases demand higher quality and stronger controls. “The key is to be explicit about purpose, risk, and operational dependency,” she says. “Lower-risk use cases can move faster with well-described and well-governed context, while higher-risk applications require tighter thresholds.”

Prioritize ownership, governance, and security

All governance frameworks, policies, standards and procedures should be reviewed with AI in mind, adds Letts. Many were designed for human-paced consumption, whereas AI increases speed, scale, and integration across both structured and unstructured data. So validating ownership of critical data elements and establishing a shared business understanding of their meaning is essential to progress. Standardized definitions and metadata should also ensure that questions like “what does this mean?” and “where did it come from?” can always be answered. “AI access must be secure by default,” she adds. “This means having least privilege, audit trails, handling of sensitive data, and strong controls around retrieval. It should always be demonstrable what a model can and cannot access.”

Additionally, organizations must be mindful of data privacy when using AI, too. “Agentic AI systems require a different level of data access than traditional enterprise apps,” says Sathianathan. “Data needs to be analyzed, not just queried, at scale. That’s a big change to privilege models, and IT and security leaders need to think carefully about where all that data is going and what access the AI system really requires.” The same is true, he adds, if the LLM processing that data is running within or outside an organization’s four walls, and such decisions should be considered before deployment, not after. 

Use AI to fill in the gaps

In areas where the business might be falling short, consider using AI to draft and update your organization-specific data definitions, suggests Pollak. “Prioritize establishing a rigorous human-in-the-loop process to ensure this connective tissue is accurate and current.” Additionally, it’s possible to use LLMs and smaller language models to clean up data in certain areas with restrictive prompts, adds Sathianathan. This way, you can process data efficiently and avoid wasting resources by pumping massive amounts of data into large cloud-based LLMs.

Being AI-ready isn’t a one-time milestone, says Letts. AI capabilities are evolving quickly, which means the threshold for readiness shifts over time. It’s essential to improve end-to-end lineage, build shared semantics and ontology so data is consistently understood, increase interoperability across platforms and domains, and tighten how AI systems access data so it remains secure, auditable, and fit for purpose. “Thresholds change as use cases evolve,” she says, “so data readiness must be treated as an ongoing discipline rather than a completed task.”
