Most AI-generated business analysis is wrong. Not because the models are bad, but because we ask them to research, analyze, and synthesize in a single step. No human analyst would do that. Splitting these tasks changes everything, and what comes out the other side is more useful than you might expect.
If you have ever pasted a company URL into ChatGPT and asked for a go-to-market assessment, you know the result. It sounds authoritative. The structure looks professional. And a significant portion of the claims have no basis in reality. This is not an edge case. It is the default behavior when you compress complex analytical work into one prompt.
I learned this the hard way. When I started building an AI-powered analysis system as a learning project, the first approach was the obvious one: one large prompt, one comprehensive output. The results were inconsistent, superficial, and hallucinated freely. What changed everything was understanding why.
The one-prompt fallacy
The mental model most people have: AI is smart, give it a complex question, get a complex answer. One prompt in, one analysis out.
This conflates three fundamentally different cognitive tasks: gathering evidence, interpreting that evidence, and synthesizing conclusions. A McKinsey consultant does not walk into a boardroom, Google the company for five minutes, and deliver strategic recommendations in the same breath. Yet that is exactly what a single AI prompt attempts.
The research backs this up. Wei et al. (2022) showed that when language models answer directly instead of working through intermediate steps, accuracy on the GSM8K reasoning benchmark drops from 47% to 16%. A related study on intermediate computation found accuracy jumping from 35% to 95% simply by recording intermediate steps. Separating thinking into phases nearly triples accuracy.
This is not a minor optimization. It is the difference between useful analysis and expensive noise.
Why business analysis makes it worse
Mathematical reasoning has clear right answers. Business analysis is harder because models must navigate ambiguity, contradictions, and incomplete information.
When you ask an AI to analyze a company's strategy in one shot, three things happen:
The model fills gaps with confidence. Where evidence is missing, the model does not say "I don't know." It generates plausible-sounding statements. A 2024 Deloitte survey found that 47% of enterprise AI users had made at least one major business decision based on hallucinated content. Not slightly wrong content. Fabricated content.
Strengths crowd out weaknesses. A single-pass analysis tends to confirm whatever hypothesis the model forms in its first few tokens. If the company's website emphasizes growth, the analysis emphasizes growth. Confirmation bias is not just a human problem. It is baked into how autoregressive models generate text.
Observations replace root causes. "The company has a long sales cycle" is an observation. "Enterprise-only positioning without a self-serve motion creates dependency on outbound, which does not scale at their current team size" is analysis. Single-prompt outputs almost always stay at the observation layer.
What changes when you split the work
The fix is not better prompts. It is separating the tasks entirely, the same way you would structure a consulting engagement.
| Phase | Job | Constraint |
|---|---|---|
| Evidence gathering | Collect data, no interpretation | Every claim needs a traceable source |
| Strength analysis | Find what is working, with receipts | Strengths without evidence get filtered |
| Stress testing | Challenge every finding | Adversarial by design, not cheerleading |
| Root-cause synthesis | Connect observations to structural dynamics | Only after all perspectives exist |
Each phase has a single job. The output of one feeds the next. No phase is allowed to do the work of another.
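As a minimal sketch of that hand-off (the phase functions, data types, and example strings here are illustrative, not intentic's actual code), the constraint is that each phase consumes only the previous phase's output and unsupported findings get filtered at the boundary:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Evidence:
    source_url: str   # every claim must trace back to one of these
    excerpt: str

@dataclass
class Finding:
    claim: str
    evidence: list[Evidence] = field(default_factory=list)

def gather_evidence(company_url: str) -> list[Evidence]:
    # Phase 1: collect raw data only. No interpretation happens here.
    return [Evidence(source_url=company_url,
                     excerpt="Pricing page lists an enterprise tier only")]

def analyze_strengths(evidence: list[Evidence]) -> list[Finding]:
    # Phase 2: every finding must cite evidence; unsupported ones are dropped.
    findings = [Finding(claim="Enterprise-only positioning", evidence=evidence)]
    return [f for f in findings if f.evidence]

def stress_test(findings: list[Finding]) -> list[Finding]:
    # Phase 3: adversarial pass. Keep only findings that survive challenge.
    return [f for f in findings if f.evidence]

def synthesize(findings: list[Finding]) -> str:
    # Phase 4: root-cause synthesis runs only after all other phases exist.
    return "; ".join(f.claim for f in findings)

report = synthesize(stress_test(analyze_strengths(
    gather_evidence("https://example.com"))))
```

The point of the typed boundaries is that a phase physically cannot do another phase's job: `synthesize` never sees raw search results, only findings that already carry evidence.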
This is not theoretical. A 2025 Google Research study evaluating 180 agent configurations found that coordinated multi-phase systems improved performance by over 80% on parallelizable analysis tasks like financial reasoning compared to single-agent approaches. For sequential tasks, the picture is more nuanced. The architecture has to match the problem.
When I built intentic's infrastructure, one of the core decisions was making evidence immutable. Search results get stored in the database before any LLM touches them. If the analysis later claims something, it must trace back to raw evidence. No evidence, no claim. That single constraint eliminated an entire category of hallucination.
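One way to enforce "no evidence, no claim" is an append-only store with write-once entries. This is a sketch under stated assumptions: intentic's real implementation uses a database, and these class and method names are made up for illustration.

```python
import hashlib

class EvidenceStore:
    """Append-only store: search results go in before any LLM sees them,
    and analysis claims are rejected unless they cite a stored entry."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}  # evidence_id -> raw text

    def add(self, raw_text: str) -> str:
        # Content-addressed id; setdefault makes entries write-once.
        evidence_id = hashlib.sha256(raw_text.encode()).hexdigest()[:12]
        self._entries.setdefault(evidence_id, raw_text)
        return evidence_id

    def validate_claim(self, claim: str, cited_ids: list[str]) -> bool:
        # No evidence, no claim: at least one citation, all resolvable.
        return bool(cited_ids) and all(i in self._entries for i in cited_ids)

store = EvidenceStore()
eid = store.add("Company careers page lists 4 open SDR roles")
store.validate_claim("They are scaling outbound sales", [eid])   # accepted
store.validate_claim("They raised a Series B", [])               # rejected
```

Content-addressing the evidence means a later phase cannot silently swap in text that was never collected: the id only resolves if the raw data was stored first.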
One shot by design, but not one prompt
Here is where it gets interesting. intentic is deliberately a one-shot tool. You enter a URL and a growth goal. That is it. Minimal input, maximum output. No questionnaire, no onboarding flow, no 30-minute setup.
That sounds like it contradicts everything I just said about single-prompt analysis. It does not.
The user experience is one shot. The work behind it is not. Multiple specialized phases research the company, score strengths across six dimensions, stress-test findings, synthesize root causes, and produce a structured report. The user sees a 5-minute wait and a result. The system runs a full analytical process in the background.
But the core idea goes deeper than just a better report. What the pipeline actually does is build an external picture of a company's GTM reality from minimal input. How does the market see you? Where do competitors overlap? What gaps exist between positioning and evidence? The result is a structured, machine-readable view of your go-to-market situation.
The output is what I call Strategy as Code: a structured representation of your GTM situation that any AI tool can use as context. Paste it into ChatGPT, Claude, or whatever you work with, and your conversations about strategy are suddenly grounded in validated analysis instead of the model's generic knowledge.
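To make that concrete, here is a minimal sketch of what such a structured representation could look like. The field names and example values are hypothetical, not intentic's actual schema; the point is only that the output is data, not prose.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class GTMContext:
    """A machine-readable GTM snapshot, portable into any LLM conversation."""
    company: str
    positioning: str
    strengths: list[str]
    gaps: list[str]
    competitors: list[str]

ctx = GTMContext(
    company="example.com",
    positioning="Enterprise workflow automation for finance teams",
    strengths=["Deep ERP integrations", "SOC 2 compliance"],
    gaps=["No self-serve motion", "Thin mid-market messaging"],
    competitors=["vendor-a.example", "vendor-b.example"],
)

# Serialized, it becomes context you can paste into ChatGPT, Claude, or an agent:
print(json.dumps(asdict(ctx), indent=2))
```

Because the representation is plain structured data, it survives tool changes: the same JSON grounds a chat prompt today and an agent's system context tomorrow.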
This idea is part of a broader shift. Martin Brüggemann describes it well in his concept of Company as Code: the real bottleneck for AI agents is not the technology. It is undocumented decisions, scattered knowledge, and strategic context that lives in people's heads instead of in machine-readable files. His argument: treat your entire company knowledge like infrastructure code. Versioned, searchable, in one place. A recent academic paper on Codified Context reinforces this from the engineering side, showing how a structured knowledge base reduced errors and maintained consistency across 283 AI development sessions.
I have been doing exactly this for my own setup. All of intentic's business knowledge lives in a single knowledge base repository: brand identity, voice guidelines, ICP definitions, messaging frameworks, competitor positioning. Every AI agent I work with has access to the same structured context. When I ask an agent to write a blog post or analyze a competitor, it does not start from zero. It starts from documented decisions. Strategy as Code is the same principle applied to the output: turning a company's external GTM reality into structured context that any tool can use.
This flips the one-shot problem on its head. A single prompt asking "What should my GTM strategy be?" produces hallucinations. A single prompt asking "Given this validated strategic context, what should I prioritize?" produces something useful. The difference is not the model. It is whether the strategic context behind your question has been captured, challenged, and structured before the model tries to answer.
What comes next: closing the context gap
The current version works purely from external data. That is both its strength and its limitation. An outside-in perspective catches blind spots you cannot see from the inside. But it cannot know your budget, your team size, or which channels you have already tried.
The next step is layering internal context into the analysis. Not just "here is what the market says about you," but "here is what the market says, and here is what that means given your specific situation." That is the difference between a generic recommendation and a concrete outreach campaign you can run next Monday.
The longer-term idea is turning strategic intent itself into something machines can work with. Not just "here is what the market says" but capturing where you want to go, challenging that against reality, and structuring the result so every future interaction builds on it. Your strategic context evolves as you test recommendations and feed results back in. But that is the direction, not the current state.
Three signs your AI analysis is shallow
71% of companies now use generative AI in at least one business function (McKinsey, 2025). If you rely on AI for business analysis, here is how to tell whether the output is worth anything.
| Sign | What it means | What to do |
|---|---|---|
| No sources linked | Claims are generated, not discovered | Demand evidence tracing for every statement |
| Everything sounds positive | Model is pattern-matching marketing language | Look for tension between strengths and weaknesses |
| Observations without explanations | Describing, not analyzing | Ask "why" and "what would change this" |
No sources means no trust. If a claim does not link back to a specific data point, it is probably generated. Evidence tracing is the minimum bar for analysis you would act on.
The confidence trap
AI analysis will keep getting better. Models will get smarter, context windows will grow, and outputs will sound even more convincing. That is actually the danger.
OpenAI's own testing showed that their most advanced reasoning models hallucinate between 33% and 48% of the time on factual questions about real entities (OpenAI, 2025). Global losses from AI hallucinations reached $67.4 billion in 2024 (Forrester Research). And an MIT study found that models are 34% more likely to use confident language when generating incorrect information than when providing accurate answers (MIT, 2025).
The more polished the output, the harder it is to spot when it is wrong.
Retrieval-augmented generation (RAG) helps. It reduces hallucinations by up to 71% when properly implemented. But RAG addresses the evidence problem. It does not address the reasoning problem. You need both.
Three takeaways for builders
Separate the steps, always. Research, analysis, and synthesis are different cognitive tasks. Compressing them into one prompt is not efficiency. It is the reason the output is unreliable.
Demand evidence at every boundary. If a claim in your AI output cannot trace back to a specific source, treat it as hallucinated until proven otherwise. Schema validation between phases catches errors before they cascade.
Make the output reusable. The goal is not a better report. It is a structured strategic context that makes every future AI interaction about your business more grounded. One good analysis, properly structured, is worth more than a hundred ad-hoc prompts.
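The schema validation mentioned in the second takeaway can be as simple as a guard at each phase boundary. A minimal sketch, with hypothetical field names; real pipelines would typically use a validation library, but the principle fits in a few lines:

```python
# Fail fast at the phase boundary instead of letting a malformed
# record cascade into downstream analysis.
REQUIRED_FIELDS = {"claim": str, "sources": list}

def validate_phase_output(record: dict) -> dict:
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(record.get(name), expected_type):
            raise ValueError(f"phase output missing or malformed field: {name}")
    if not record["sources"]:
        # No evidence, no claim: treat unsourced claims as hallucinated.
        raise ValueError("claim has no sources")
    return record

ok = validate_phase_output({
    "claim": "Long sales cycle",
    "sources": ["https://example.com/pricing"],
})
```

A record that fails validation stops the pipeline at the boundary where the error occurred, which is far cheaper to debug than a polished final report built on a malformed intermediate.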
Frequently Asked Questions
Does splitting prompts actually improve AI analysis accuracy?
Yes. Research consistently shows multi-step reasoning improves output quality significantly. Wei et al. demonstrated that separating reasoning steps nearly tripled accuracy on standardized benchmarks compared to direct single-prompt answers. The same principle applies to business analysis.
Can a better single prompt solve this problem?
Better prompts help with formatting and focus, but they do not solve the structural issue. A single prompt still asks the model to research, analyze, and synthesize simultaneously. The problem is task conflation, not prompt quality.
What is Strategy as Code?
A structured, machine-readable representation of a company's GTM situation. Instead of asking an LLM to figure out your strategy from scratch, you give it validated context: your positioning, competitive landscape, strengths, and gaps, all derived from evidence. The model then reasons about your specific situation instead of generating generic advice.
What about RAG? Does retrieval solve hallucinations?
RAG reduces hallucinations by up to 71% when implemented well. But models still smooth gaps into fluent conclusions even with retrieved context. RAG addresses evidence availability. Multi-phase analysis addresses reasoning quality. Both are necessary.
When is a single prompt good enough?
For simple, well-defined questions with clear factual answers, single prompts work fine. The problem emerges with complex analytical tasks where evidence gathering, interpretation, and synthesis require different approaches. If the output would inform a real business decision, split the work. Or better: use a tool that already did the splitting for you and work from its structured output.
Pedram Shahlaifar is building intentic as a learning project: a complex AI system built by someone from the business side, using AI as the development partner. He writes about building GTM intelligence from the outside in. Connect on LinkedIn.
Sources
- ACM Computing Surveys — Wei et al., Chain-of-Thought Prompting, Multi-Step Reasoning Survey (GSM8K benchmark, 2022)
- Deloitte — Enterprise AI hallucination impact survey (2024)
- Google Research — Scaling Agent Systems: 180 configurations evaluated (2025)
- McKinsey & Co. — AI adoption across business functions report (March 2025)
- OpenAI — o3/o4-mini hallucination rates on PersonQA benchmark (2025)
- Forrester Research — Global AI hallucination losses, $67.4B (2024)
- MIT — Confident language in incorrect AI outputs study (January 2025)
- Martin Brüggemann / brgmn.de — Company as Code: Warum dein Unternehmen ein Repository braucht (February 2026)
- Vasilopoulos (arXiv) — Codified Context: Infrastructure for AI Agents in a Complex Codebase (February 2026)