The Risk of Agentic AI Isn’t Error. It’s Scale.
When generative AI first emerged, I used to laugh at the caveats that said “AI can make mistakes”. It seemed to go against everything I’d learned as a programmer. You don’t just ship software with a caveat that it could be wrong and hope for the best. You test it exhaustively. Unit testing, integration testing, system testing, user acceptance testing. You ensure that you have weeded out all the known errors. You make sure anything that is mission-critical - like moving money - passes all tests with an A+. There is no room for error when it comes to handling money.
When the hype over AI didn’t die down, I learned more about it and quickly discovered why the old rules for validating software don’t apply anymore. Unlike traditional rules-based software, modern AI is probabilistic. It learns patterns from data and makes predictions based on likelihood, not certainty.
Neural network architecture is loosely inspired by biological neurons. In many ways, it functions closer to how our brains work than traditional software with fixed rules and consistent outputs. This is why we find ourselves treating tools like Claude and Gemini less like software and more like actual advisors. And it’s why we need the disclaimers.
Because while AI is very fast and seems very smart, like people, it can also be somewhat careless.
Validation of AI models focuses more on lowering error rates than eliminating mistakes. It’s impossible to test for every possible variation. What’s more, when an issue is found, there isn't a straightforward way to isolate and correct it without sacrificing that probabilistic nature that makes modern AI so very powerful.
When I started to use AI on a daily basis, I realized that it really is okay that it sometimes makes mistakes. As long as I take responsibility for validating any information I use to make an important decision, the value far outweighs the weakness. Modern AI is right much more than it is wrong, and the use cases for it are absolutely endless. Learning about prompt engineering and writing better requests can also help to reduce issues.
Of course, sometimes lessons have to be learned the hard way. I ordered the wrong part for my washing machine twice because I relied on ChatGPT. It was something I could have - and should have - easily validated before ordering.
Making mistakes is manageable when AI is acting as an assistant.
It becomes much more complicated when AI starts acting on its own, and when we pivot to talking about fintechs and financial institutions using AI at scale.
AI agents are autonomous operators that can make decisions and take action without human intervention. It’s not uncommon right now to hear talk of completely replacing an operations team with AI agents. The push to automate and reduce staff to offset the cost of AI—and to keep up with its promise—is stronger than ever.
And while it makes sense for us to think of our AI assistants as trusted advisors, AI agents should not be thought of as trusted operators. It’s critical to keep humans in the loop in financial services processes where a bad decision could scale rapidly and cause disastrous consequences.
Consider something like loan underwriting.
If a human underwriter makes a mistake, it’s contained. It can be reviewed, corrected, and learned from. Even if a mistake is a common one that is not realized until months, or even years, down the line, as was the case during the financial crisis of 2008, the scale of the problem is still limited by human speed.
But while human underwriters often take 1–3 weeks to complete a mortgage review, agentic AI companies advertise that they can generate similar decisions in about a minute. That is orders of magnitude faster than a human.
The number of “bad loans” that fueled the financial crisis was roughly 14 million. Imagine if that number had been exponentially higher. It likely wouldn’t have mattered that banks were deemed “too big to fail”. Government bailouts would not have been possible simply because of the size of the problem and how quickly it would have scaled.
The question isn’t whether agentic AI belongs in banking. It does.
The question is how quickly we allow systems that can make mistakes to operate at financial scale.
The path forward isn’t to avoid agentic AI, but we do need to recognize that the risk requires more than an “AI can make mistakes” disclaimer. AI agents should not be rushed into replacing every back-end process as fast as possible. Instead, financial institutions and fintechs need to take a more cautious approach:
Start by mirroring existing workflows that are clear-cut and low risk.
Define a well-thought-out plan that includes mitigation measures and human checkpoints along the way. Some processes should always require a human in the loop.
Run systems in parallel for a meaningful period so that errors can be observed under real-world conditions.
Expand autonomy gradually, only when there is confidence in performance and governance is well established.
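One concrete way to build the human checkpoints described above is a confidence-gated router: the agent acts on its own only when its confidence clears a threshold, and everything else lands in a human review queue. The sketch below is purely illustrative; the names (`Decision`, `ReviewQueue`, `route`) and the 0.95 threshold are assumptions for the example, not any vendor’s actual API.

```python
# Minimal sketch of a human-in-the-loop checkpoint for an agentic decision.
# All names and the threshold value are illustrative assumptions.
from dataclasses import dataclass, field

CONFIDENCE_FLOOR = 0.95  # below this, the agent must defer to a human


@dataclass
class Decision:
    applicant_id: str
    approve: bool
    confidence: float  # model's self-reported confidence, 0.0-1.0


@dataclass
class ReviewQueue:
    pending: list = field(default_factory=list)

    def escalate(self, decision: Decision) -> None:
        # Park the decision for a human underwriter instead of acting on it.
        self.pending.append(decision)


def route(decision: Decision, queue: ReviewQueue) -> str:
    """Auto-apply only high-confidence decisions; escalate the rest."""
    if decision.confidence >= CONFIDENCE_FLOOR:
        return "auto"
    queue.escalate(decision)
    return "human_review"
```

Tightening or loosening `CONFIDENCE_FLOOR` over time is one simple lever for the gradual expansion of autonomy: start with a floor so high that nearly everything is escalated, then lower it only as observed performance and governance justify it.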
We’ve spent years building systems that assume precision. Now we’re introducing systems that operate on probability. That doesn’t mean we shouldn’t move forward. It just means we should move forward with a different set of expectations.
Because the risk isn’t that agentic AI will fail.
The risk is that it will fail at scale.