Thinking About AI Agents? Get Your Tools Right First

MIT research reveals 95% of AI pilots fail to deliver value as enterprises skip crucial tool-building for flashier autonomous agents.


Over the past year, the potential of AI agents has captured the imagination of many enterprises. Projects like AutoGPT and frameworks such as LangChain and CrewAI have enabled AI systems to take autonomous actions: reading and analyzing documents, calling application programming interfaces (APIs), and making decisions on their own. The promise is enticing: AI that doesn't just chat but acts.

Insurance and financial services leaders are envisioning the automation of complex tasks, such as handling claims or investigating possible fraud cases using these intelligent agents.

Yet, amid the hype, a new MIT Media Lab report found that, despite $30–40 billion in enterprise AI spending, 95% of organizations have not seen a measurable return on these investments. Only a handful (about 5%) of AI pilots are delivering tangible value, while the rest stall out with no effect on the bottom line. This creates a noticeable gap that separates the few companies gaining value from these agents from those stagnating in experimentation.

What causes this divide? It's not model quality or lack of enthusiasm: executives are eager to adopt AI, and 90% of them have seriously explored solutions built on these models. The core of the issue is the approach.

Early enterprise AI efforts struggled to integrate AI into real value-creating workflows because they lacked mechanisms for the AI to learn and adapt. According to MIT's recent "State of AI in Business 2025" report, only 5% of custom AI tools in large organizations ever make it from early pilot to production. The rest never move beyond proofs of concept, often because they cannot handle the complexity of real-world cases.

To make real progress in AI, organizations need to rethink their approach: Rather than jumping straight to autonomous agents, they should first master the fundamentals by developing robust AI-powered tools. Simply put, walk before you run.

Individual employees are already figuring this out. In their "State of AI in Business 2025" survey, MIT researchers found that while only 40% of companies officially bought an LLM subscription, employees at over 90% of firms were using personal AI tools (like ChatGPT or Claude) to speed up their work. These "shadow AI" users succeeded by leveraging focused tools (e.g., drafting emails, summarizing text) that deliver quick wins.

Enterprises can learn from this pattern. By deploying a portfolio of narrow and reliable AI tools that augment specific tasks, organizations set the foundation for bigger agent-driven transformations. It's a phased approach to AI maturity, one that is proving far more effective than leaping straight into "autonomous everything" without significant groundwork.

Agents or Tools First?

To understand why developing specialized tools comes first, it helps to demystify how AI agent frameworks work. An agent is a decision-making engine (often a large language model) that can perform a series of reasoning steps toward a goal. Crucially, agents aren't all-powerful on their own; they rely on tools to take actions in the world outside the model itself. A tool, by contrast, is an external function or API that the agent can invoke. For example, an insurance-focused agent might have tools for retrieving a customer's policy data, running a fraud check, or sending an email. The agent's job is to figure out which tools to use, and in what sequence, to accomplish a complex task.
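To make this concrete, here is a minimal sketch of tools as plain Python functions behind a name-to-callable registry. The tool names, sample data, and fraud rule are invented for illustration; in practice they would wrap real policy systems and fraud models.

```python
# Tools as plain functions an agent can invoke by name.
# All names and data here are hypothetical stand-ins.

def get_policy(customer_id: str) -> dict:
    """Retrieve a customer's policy data (stubbed with sample values)."""
    return {"customer_id": customer_id, "coverage": "auto", "limit": 50_000}

def fraud_check(claim_amount: float, limit: float) -> bool:
    """Flag a claim as suspicious if it exceeds the policy limit."""
    return claim_amount > limit

# The agent sees its capabilities through a registry of name -> callable.
TOOLS = {"get_policy": get_policy, "fraud_check": fraud_check}

policy = TOOLS["get_policy"]("C-1001")
suspicious = TOOLS["fraud_check"](75_000, policy["limit"])
```

The registry is the key design choice: the agent's reasoning layer only ever emits a tool name and arguments, so each capability can be tested, versioned, and hardened on its own before any agent is allowed to call it.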

In practical terms, agent frameworks implement a loop often described as "think, act, observe." The AI agent "thinks" (generates a reasoning step in natural language), decides to invoke a tool with some inputs, then "observes" the tool's output and incorporates it into the next reasoning cycle. This continues until the agent arrives at a final answer or action.
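The loop can be sketched in a few lines of Python. The hard-coded plan below stands in for the LLM's reasoning step, which a real framework would generate dynamically at each turn; the goal and tools are hypothetical.

```python
# A minimal sketch of the "think, act, observe" loop.
# A fixed plan replaces the LLM's "think" step for illustration.

def run_agent(goal: str, tools: dict, plan: list) -> list:
    """Execute (tool_name, args) steps, appending each observation
    to a transcript the agent can 'see' on later cycles."""
    transcript = [f"GOAL: {goal}"]
    for tool_name, args in plan:               # "think": choose next tool
        observation = tools[tool_name](*args)  # "act": invoke the tool
        transcript.append(f"{tool_name}{args} -> {observation}")  # "observe"
    return transcript

tools = {"add": lambda a, b: a + b, "double": lambda x: 2 * x}
log = run_agent("compute (2+3)*2", tools, [("add", (2, 3)), ("double", (5,))])
```

In a real framework, the transcript (goal plus observations so far) is what gets fed back to the model so it can decide the next step instead of following a fixed plan.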

An agent is like a sophisticated orchestrator or problem-solver, and tools are the discrete capabilities it can leverage (query a database, call an API, run a calculation, etc.). The agent chooses and sequences tools to achieve an objective, which means it is only as powerful as the tools it is given. If the tools are weak, unreliable, or missing for a needed function, the agent will inevitably stumble. This is why building a robust toolset is a necessary foundation for any agent.

For instance, imagine asking an AI agent to "Process this new claim and flag any anomalies." The agent might first use a document parsing tool to extract key fields, then call a knowledge base search tool to compare details with past claims, then use a calculation tool to compute risk scores, and so on, reasoning at each step about what to do next. The framework (LangChain, CrewAI, etc.) provides the orchestration that ties these steps together and manages intermediate states (so that, say, the result of step 1 can be fed into step 2). It also manages the agent's memory, which in this context means the information the agent retains as it moves through the sequence. Memory could include the conversation history or results from prior tool calls, allowing the agent to maintain context over multiple turns.
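As an illustration of how intermediate state threads through such a pipeline, the sketch below chains three stub tools so each step's output becomes the next step's input. The parsing rule, stub claim history, and scoring threshold are all invented for the example.

```python
# A hypothetical claims pipeline: each tool's output feeds the next,
# mirroring how a framework threads intermediate state between steps.

def parse_document(text: str) -> dict:
    """Step 1: extract key fields from a claim (naive keyword split)."""
    amount = int(text.split("amount=")[1].split()[0])
    return {"amount": amount}

def search_history(fields: dict) -> dict:
    """Step 2: compare against a stubbed history of past claims."""
    past_average = 4_000  # invented baseline
    fields["ratio_to_average"] = fields["amount"] / past_average
    return fields

def risk_score(fields: dict) -> float:
    """Step 3: compute a risk score; over 1.0 counts as anomalous."""
    return fields["ratio_to_average"]

state = "claim amount=12000 filed 2025-01-15"
for step in (parse_document, search_history, risk_score):
    state = step(state)  # each tool consumes the previous tool's output

print(state)  # 3.0 (well above 1.0, so this claim would be flagged)
```

The difference with a true agent is that the sequence above is fixed; an agent would decide at runtime which of these tools to call next based on what it has observed so far.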

Not every task calls for an agent. Some are better handled by sharp, single-purpose tools. An AI tool might be something as simple as a document summarizer, a language translation function, or an email classification model: narrow in scope but high in accuracy.

Tools can be deployed directly into workflows (for example, auto summarizing each incoming claim report). In contrast, an AI agent tackles open-ended tasks that involve multiple decisions or tool uses (for example, handling an entire claims adjustment process). Trying to skip straight to agents without a base of proven tools is a recipe for frustration.

For enterprise leaders, the takeaway should be clear: Tools are the building blocks of any agentic system. If you want an AI agent to reliably execute a multi-step process, you must first invest in those individual steps as standalone competencies. Think of it like training an employee: you wouldn't expect a new hire to handle an entire claims process on day one without first ensuring they know how to do each component task (review documents, check databases, draft responses). Equipping an AI agent is similar; you need to furnish it with dependable mini-skills and data access (the tools) and then give it the autonomy to string them together.

Just as you wouldn't deploy a piece of software to production without testing each module, you shouldn't deploy an autonomous AI agent without first proving out the individual tool actions, data connections, and guardrails. Tools are the foundation of AI agents, and any organization that wants agents in its workflows should prioritize building strong tools first.


Alejandro Zarate Santovena


Alejandro Zarate Santovena is a managing director at Marsh-USA.

He has more than 25 years of global experience in technology, consulting, and marketing in Europe, Latin America, and the U.S. He focuses on using machine learning and data science to drive business intelligence and innovative product development globally, leading teams in New York, London, and Dublin.

Santovena received an M.S. in management of technology - machine learning, AI, and predictive modeling from the Massachusetts Institute of Technology, an M.B.A. from Carnegie Mellon University, and a B.S. in chemical engineering from the Universidad Iberoamericana in Mexico City.



Shravankumar Chandrasekaran


Shravankumar Chandrasekaran is a global product manager at Marsh McLennan.

He has over 13 years of experience across product management, software development, and insurance. He focuses on leveraging advanced analytics and AI to drive benchmarking solutions globally. 

He received an M.S. in operations research from Columbia University and a B.Tech in electronics and communications engineering from Amrita Vishwa Vidyapeetham in Bangalore, India.
