Welcome to the “find out” stage of AI

Welcome to the “find out” stage of AI - Stack Overflow

When I went to the first HumanX conference in January 2025, agents were vaguely defined frontier tech. It was the first time I heard the letters MCP. The big conversations were around inference, hallucinations, and retrieval augmented generation. The tech felt new; Tomasz Tunguz of Theory Ventures called it the “bottom of the first inning.” Every company was running AI experiments all the time.

Since then, companies have played a few innings in the AI game. As Anish Agarwal, CEO at Traversal, told me, “More companies have gone through a renewal cycle with customers. They've understood what it takes to actually win a contract.” LLMs no longer run raw call-and-response games in company chatbots. We’ve attached tooling, implemented automation, attached evals, and formalized these as agents, usually with the word “claw” in the name. Those companies, and their customers, need to justify the ballooning token spend with real results.

I started saying that we’re in the “find out” stage of AI, meaning we’re past the experimental phase and entering one where these systems need to work and provide real value. HumanX validated that notion, with almost everybody who gave comments referring to “an inflection point”, “a second phase of AI”, and “the conversation shifting.” Here are a few places the conversation is shifting to.

The dream machine grows up

In the early days of AI, much of the chatter I heard was about all the cool new things AI could do. There was a lot of talk about emergent behavior: things like getting an AI to guess a movie based on emojis or draw a unicorn.
It was a source of wonder and surprise, cool tech that wowed the folks exploring it.

The promise of AI grew, and large enterprises started figuring out how to build AI features into their software and business processes. Enterprises, traditionally, are where wonder and surprise lead to lost customers and lawsuits. Sectors like healthcare, law, and energy have real consequences for errors. “In these environments, mistakes aren’t just technical, they can be fatal,” said Radha Basu, CEO and Founder of iMerit. “That changes the mindset entirely. It forces a more careful, purposeful approach to how we build and deploy these systems.”

For a couple of years, AI has been a story of better and better models trained on more and more data. But as Ravindra Mistri, founding operator at Better Auth, said, “The next phase of AI adoption won’t be limited by model performance; it will be limited by trust.” As HumanX CEO Stefan Weitz said in his opening keynote, “Without trust, all we're doing is building a high-tech house of cards and hoping no one coughs too hard.”

To earn that trust, your AI needs to be reliable. “Model intelligence has been advancing rapidly, but reliability hasn’t kept up,” said Dan Klein, co-founder and CTO at Scaled Cognition. “You need to hit a high bar on reliability to deploy these systems confidently. You can’t ship a system that’s making up policies as it goes or lying to you about your account balance.”

A lot of this shift can be attributed to how AI is being used now. With chatbots, you could call BS on their output. In the agentic paradigm, those ponies run until the race is finished. Agents autonomously break a problem down into multiple steps, call a bunch of tools to achieve an outcome, and hopefully do all this without deleting your database or inventing information out of whole stochastic cloth. As Basu said, “AI is becoming less about static answers and more about taking the right action in complex, ambiguous environments.
That shift demands accountability, judgment, and a culture that values questioning the model.”

As for how folks are thinking about solving the trust and reliability issues, conversations fell into a few different buckets:

Is it true? - The hallucination problem is still prevalent, despite everyone running RAG pipelines left, right, and center. New solutions for ensuring agents have true information include better context, agentic memory, and other inference-time data access solutions.

Should the agent do this? - A number of people and organizations looked at trust from an identity and user access paradigm. That included tying agentic actions to a human user, just-in-time and ephemeral auth controls, and zero-trust permissioning systems. The context problem creates a new issue in this domain: with all that data an agent has, who’s to say it isn’t leaking it?

Can I prove and audit it? - Trust but verify at scale. Lots of folks are trying to build agentic trust with visibility and data. Observability companies were all around, as were AI SRE companies. But this is also a conversation about activity trails, automated and human-in-the-loop evals, and traceability.

AI’m a business, man

At a conference like this, obviously there will be a lot of people trying to sell you their products. Most of them are in the AI technology space in some way, and are both providers and consumers of AI. I saw plenty of returning logos on the floor, and as the opening quote about renewal cycles indicates, people are starting to eye the tech with a business lens. That is, how do I make more money and spend less money? “Every single person I talked to was thinking about how to change their monetization model, how to monetize AI products,” said Cosmo Wolf, CTO of Metronome. “No one's figured it out yet.”

I heard from plenty of folks that token spend is the new cloud compute bill. Corey Quinn’s genie joke needs a fifth rule: you can’t spend it on AI tokens.
Grizzled DevOps engineers used to tell war stories about blowing six figures in a weekend over misconfigured SQS, but recently plenty of folks have started seeing their token spend ramping up as usage explodes. This comes as per-token pricing has dropped about 200x in under three years, open-source and small models perform very well, and competition is fierce.

So what’s the rub? There are a few things at play. The trust and reliability issues mean people are stuffing more and more into context windows. Agents and agentic skills rely a lot on a clever system prompt and context window (plus tools and other harness functions). While input tokens are generally cheaper than output tokens, they still add up: someone I talked to mentioned $1 in context per agent per session. For large enterprises with lots of AI-assisted engineers or customer-facing agents, that cost multiplies quickly. Context windows are limited, so if you need to change something, that’s a pile more tokens to send.

The agentic paradigm also burns more tokens than the old way of prompt-and-response chatbots. Agents break a problem down into steps, call tools and receive responses from them, and run evals and loops. Some agents run tasks overnight, chewing up delicious tokens on their complex (and often opaque) thought processes.

Costs really start multiplying when you have multiple agents working together, so-called agent swarms. Miranda Nash, Group VP at Oracle AI, talked a lot about multiple agents working alongside people in her Future of Work presentation. This future is already here in some places (not just Gastown), and it is spending tokens like a kid in Chuck E. Cheese.

While some folks are saying that coding agents have made code essentially free (they haven’t, ask around), reviewing and running code has grown decidedly more expensive. There’s increased load on code review, security, and production operations. This seems like a place where organizations are looking to tooling to help.
“There’s a growing gap between how fast teams can generate and ship code and how well they can operate it once it’s in production,” said Spiros Xanthos, founder and CEO at Resolve AI. “Should they build, buy, or wait and see? These aren’t new questions, but AI is amplifying them to a point where it’s harder to wait and costlier to make a wrong decision.”

As for monetization and profitability, nobody’s quite got an answer there. Even the big dogs of Anthropic and OpenAI don’t expect to be profitable until 2028 and 2030, respectively.

Anxiety among the inference class

Besides the concerns about implementing and monetizing AI applications, there was a fair bit of chatter about the social effects of AI. Many folks felt that, despite or because of how powerful the tech is, the world outside of the tech industry could face some harsh changes. A lot of this is speculation based on headlines, mind you, because AI in its current form has only been around for a little over three years, and the effects have yet to shake out and be studied. Heck, we’re still coming to grips with how social media affects us, and that’s been around for over 20 years.

Most of this talk was off the record, casually dropped during happy hours. But Dr. Danielle Schlosser, co-founder and chief business officer at mpathic, went into greater detail:

“The technical capabilities are accelerating quickly, but our frameworks for evaluating impact, especially on people, are still catching up. Much of today’s AI is optimized around human preference signals, what people like in the moment, rather than what actually supports long-term well-being. Optimizing for engagement or validation can lead to unintended consequences, like reinforcing bias or reducing critical thinking.”

These concerns were fresh in my mind after researching some of the psychologically distorting and disempowering aspects of AI. I will admit I brought it up plenty in conversation, mostly to see whether people in this industry were looking at this issue.
Fortunately, I wasn’t the first person to mention this to them; most people had heard about it and were aware. And most were hopeful that being forewarned made us forearmed.

There was some concern for the economic effects of AI, but less so. For
