[Hero image: A CFO at a boardroom table with a printed P&L and a red pen, an AI dashboard glowing on the wall behind.]

5 mins to read


The ROI Question Nobody Wants: AI ROI Measurement, Decoded

Lara Demir

Partner, Strategy & Operations


Tags: Strategy, AI ROI, CFO strategy

Every CFO we meet this quarter has the same complaint. The AI dashboard is busy. The line item keeps growing. And not one number on the deck would survive a serious board review. AI ROI measurement is now the most expensive open question in the enterprise.

At ATCON, we sit with finance and strategy leaders who are tired of guessing. AI spend is climbing 86% to 91% year over year. Only 14% of finance chiefs in the recent RGP survey see clear, measurable impact. Deloitte EMEA CFO Signals echoes the same gap on this side of the Atlantic. The problem is not that AI does not work. Pilots are designed to feel productive, not to be measurable. The CFO is paying for the difference.

The 95% number is real, and it is not about the technology

The headline came from the MIT NANDA State of AI in Business 2025 report. Roughly 95% of integrated GenAI pilots produce no measurable P&L impact. The number has critics, but the signal lines up with everyone else's data.

McKinsey's State of AI work points the same way. Over 80% of respondents see no tangible EBIT impact from generative AI. Only 17% attribute more than 5% of EBIT to it. BCG's Build for the Future research calls this a widening value gap. A small leader cohort captures the value. Everyone else funds activity.

The failure mode is structural, not technological. Pilots launch without baselines. They get funded without P&L hooks. They run on legacy workflows nobody redesigned. McKinsey's strongest finding is that workflow redesign predicts EBIT impact more than model choice or vendor. Look at SAP rollouts inside Siemens, Allianz, and BNP Paribas. The teams seeing real margin movement rebuilt the process first and added the model second.

If your AI dashboard cannot survive a CFO with a printed P&L and a sharp pencil, you do not have an AI program. You have a software subscription with good marketing.

Productivity theater, decoded

Walk into most AI steering committees and you see the same slide. Adoption climbing. License use climbing. "Hours saved" climbing. The lift in unit economics missing. Workers report saving about 2.2 hours per week from AI tools, and roughly 37% of that time is eaten by rework. The St. Louis Fed pegs the real productivity gain at 1.3%. Solow's paradox is back, dressed in a chatbot.
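The rework discount is worth doing out loud. A minimal sketch using the survey figures above; the variable names and the "effective hours" framing are ours, not the surveys':

```python
# Illustrative arithmetic only: the two input figures come from the article;
# everything else here is our framing.
reported_hours_saved = 2.2   # self-reported hours saved per worker per week
rework_share = 0.37          # share of that saved time consumed by rework

effective_hours = reported_hours_saved * (1 - rework_share)
print(f"Effective hours saved per week: {effective_hours:.2f}")
# Roughly 1.4 hours — before any reconciliation against headcount or throughput.
```

Even the generous self-reported number shrinks by more than a third before it touches a roster or a service level.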

The metrics on the deck are not the metrics that move the P&L. They are activity dressed as outcome. Four common offenders show up everywhere.

  • Adoption rate. It tracks who logged in, not who produced a different business outcome. A 90% adoption rate on a tool nobody uses well is a worse signal than 30% on a tool that rewrote a workflow.

  • Hours saved. Almost always self-reported. Almost never reconciled against headcount or throughput. If saved hours did not change a roster, a service level, or a revenue number, they did not happen.

  • Token consumption. A usage metric vendors love because it scales with their invoice. It tells you nothing about value and quite a lot about cost.

  • Employee satisfaction with AI tooling. Useful for change management. Useless as a board KPI. Sentiment is not margin.

Underneath all of this sits the shadow AI gap. Most real AI use happens outside central IT. Consumer tools quietly outperform sanctioned ones. Your most productive AI users are probably not in the rollout metrics at all. The EU AI Act adds a second twist. Governance and audit costs land on the same line as the pilot, and most ROI models pretend those costs do not exist.

A simple table comparing vanity AI metrics on the left to P&L-linked KPIs on the right, with arrows mapping each pair

The translation problem. Every vanity metric has a CFO-grade equivalent. Most programs never make the swap.

The CFO test that AI ROI measurement has to pass

The McKinsey leader cohort shares one trait. Every funded AI initiative connects to a tracked, redesigned workflow with a named P&L line. Not a deck. A line. The KPIs that survive a board review fall into a small set.

  • Cost-to-serve per transaction is the cleanest. Klarna's customer-service automation reportedly drove cost per inquiry from about $0.32 to $0.19 before the company rebalanced toward human agents.

  • Cycle-time-to-revenue captures the same idea on the top line. Quote-issued to cash-collected, in days.

  • Gross margin on the affected workflow is what a CFO will defend in a board meeting.

  • Revenue per FTE tells you whether output rose or you simply moved work around.

  • Quality-adjusted output catches the Klarna problem before it becomes a press release.

That last one is where most enterprise AI ROI measurement breaks. Klarna walked back its automated customer service after the CEO admitted the company "went too far." Cost savings were measured. The CSAT regression was not. An unmeasured quality regression is an unbooked cost that shows up later as churn or refunds. ING and Allianz tied their AI rollouts to quality-adjusted KPIs for exactly this reason.
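The first and last of those KPIs can be wired together in a few lines. A sketch, assuming a linear churn penalty per lost CSAT point; the dollar and CSAT figures are illustrative (the cost-per-inquiry level echoes the Klarna-style numbers above), and the penalty model and function names are ours, not anyone's published methodology:

```python
# Sketch: quality-adjusted cost-to-serve. All inputs are illustrative
# assumptions, not Klarna's or ATCON's actual figures or methodology.

def cost_to_serve(total_cost: float, transactions: int) -> float:
    """Raw cost per transaction: the cleanest CFO-grade KPI."""
    return total_cost / transactions

def quality_adjusted_cost(cost_per_txn: float,
                          csat_before: float,
                          csat_after: float,
                          churn_cost_per_point: float) -> float:
    """Book an estimated churn cost for any CSAT regression.

    An unmeasured quality regression is an unbooked cost; here we charge
    churn_cost_per_point (assumed) per lost CSAT point, per transaction.
    """
    regression = max(0.0, csat_before - csat_after)
    return cost_per_txn + regression * churn_cost_per_point

raw = cost_to_serve(total_cost=190_000, transactions=1_000_000)
adjusted = quality_adjusted_cost(raw, csat_before=4.5, csat_after=4.1,
                                 churn_cost_per_point=0.25)
print(f"raw: ${raw:.2f}/inquiry, quality-adjusted: ${adjusted:.2f}/inquiry")
# With these assumed inputs, a $0.19 raw cost becomes $0.29 once the
# CSAT regression is costed — most of the headline saving disappears.
```

The point is not the penalty model, which any finance team would tune. The point is that the quality term sits in the same formula as the saving, so it cannot be left off the slide.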

Designing the next pilot for board defensibility

Productivity theater is comfortable because it is cheap to produce. Real ROI is uncomfortable because it forces a CFO and a workflow owner to agree on one number, in writing, before the pilot ships.

Set the baseline before the pilot, not after

If you cannot show "before," you cannot claim "after." Baselines are cheap in week zero and impossible to reconstruct in week twelve. A pilot without a baseline is a demo with a budget. We have watched programs try to build baselines backwards from a vendor invoice. It never holds.

Bind every pilot to one P&L line, with kill criteria written in

One cost line, one revenue line, or one working-capital line. No abstract "productivity uplift." If the sponsor cannot name the line in one sentence, the pilot is not ready to be funded. Pair it with kill criteria agreed on day one. Gartner's projection that 40% of agentic AI projects will be cancelled by 2027 lands differently when exit terms are in the funding memo. The discipline is not pessimism. It is how a CFO signs the check as an investment, not a hope.
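The funding-memo discipline above reduces to three checks that can run before a single license is bought. A minimal sketch; the field names and checks are our illustration, not a formal standard:

```python
# A minimal funding gate encoding the discipline described above.
# Field names, checks, and example values are our illustration.
from dataclasses import dataclass
from typing import Optional

@dataclass
class PilotMemo:
    pnl_line: str                     # the single named P&L line
    baseline_value: Optional[float]   # measured BEFORE the pilot ships
    kill_criteria: str                # exit terms agreed on day one

def ready_to_fund(memo: PilotMemo) -> list:
    """Return blocking issues; an empty list means the memo passes."""
    issues = []
    if not memo.pnl_line.strip():
        issues.append("No named P&L line: 'productivity uplift' does not count.")
    if memo.baseline_value is None:
        issues.append("No baseline: a pilot without a baseline is a demo.")
    if not memo.kill_criteria.strip():
        issues.append("No kill criteria: exit terms belong in the funding memo.")
    return issues

memo = PilotMemo(pnl_line="cost per inquiry, customer service",
                 baseline_value=0.32,
                 kill_criteria="cancel if <10% reduction by week 12")
print(ready_to_fund(memo))  # prints [] — this memo passes
```

Three fields, written down before launch, are the difference between a funded investment and a demo with a budget.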


If your next AI portfolio review is close and the numbers do not feel defensible, that is the conversation worth having now, not after the board meeting. At ATCON, we work with finance and strategy leaders to rebuild AI pilots around KPIs that survive a P&L review.
