Cooper D

Why AI Analytics Tools Are Solving the Wrong Problem

TL;DR: The AI analytics industry is obsessed with building better query engines - using LLMs to turn natural language into SQL. But that's only 20% of the real challenge. The other 80%? Capturing and maintaining the massive amount of business context that exists in people's heads, undocumented meetings, and scattered wikis across five layers of your organization. Until we solve this unglamorous documentation problem, AI-powered analytics will remain impressive demos that struggle in production.


Every "chat with your data" demo looks the same. Someone types "show me sales by region last month" into a sleek interface. An LLM generates a SQL query. Results appear. Everyone nods approvingly.

Then you try to deploy it at your company.

Suddenly, questions that should be simple become impossible. "What's our revenue from premium customers?" sounds straightforward until you realize three different teams define "premium" differently, and "revenue" means something else to finance than it does to operations.

The demo worked because the demo had clean, simple data. Your reality is messier.

What Everyone Is Building

Open any AI analytics product and you'll find roughly the same architecture under the hood.

They connect to your database and pull the schema - tables, columns, data types, foreign keys. They use retrieval-augmented generation (RAG) to find relevant metadata when you ask a question. An LLM takes that context and generates SQL. Execute the query, format the results, maybe generate some insights.
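Stripped to its bones, the pipeline looks something like this - a minimal sketch in Python, where the function names, the keyword-overlap retrieval, and the prompt shape are all illustrative stand-ins rather than any vendor's actual API:

```python
import re

# Illustrative only: the keyword-overlap "retrieval" stands in for
# real embedding similarity search.

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def retrieve_schema_context(question: str, schema_docs: list[str], k: int = 2) -> list[str]:
    """Stand-in for RAG retrieval: rank schema snippets by keyword
    overlap with the question."""
    q = tokens(question)
    return sorted(schema_docs, key=lambda doc: -len(q & tokens(doc)))[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the context-plus-question prompt the LLM turns into SQL."""
    return ("Given these tables:\n" + "\n".join(context)
            + f"\n\nWrite a SQL query that answers: {question}")

schema_docs = [
    "orders(order_id, customer_id, total, placed_at)",
    "customers(customer_id, name, tier, created_at)",
    "shipments(shipment_id, order_id, shipped_at)",
]

question = "top 10 customers by spend"
prompt = build_prompt(question, retrieve_schema_context(question, schema_docs))
# In production: sql = llm.complete(prompt); rows = db.execute(sql)
print(prompt)
```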

For simple questions against well-designed databases, this works. "Total orders last week" or "top 10 customers by spend" - no problem.

This is the 20% of analytics that everyone's racing to perfect.

The 80% That Nobody Talks About

The real work begins when you step outside the database schema and into the messy world of business meaning.

Your challenges live in five distinct layers, each one adding complexity that no amount of clever prompting can solve.


Layer 1: Business Definitions That Live Nowhere

Your database has a customers table with 2 million rows. Great. But which ones are "premium customers"?

Is it customers who spend over $10K annually? Or is it the VIP tier from your loyalty program? Or maybe it's anyone with a dedicated account manager? Different teams give different answers.
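Write those competing definitions down and the collision is obvious. A sketch, with hypothetical column names:

```python
# Three teams, three "correct" filters for the same phrase. The point
# is that the schema alone can't arbitrate between them.
PREMIUM_DEFINITIONS = {
    "finance": "annual_spend > 10000",
    "loyalty": "loyalty_tier = 'VIP'",
    "sales":   "account_manager_id IS NOT NULL",
}

for team, predicate in PREMIUM_DEFINITIONS.items():
    print(f"{team:>7}: SELECT SUM(total) FROM orders JOIN customers "
          f"USING (customer_id) WHERE {predicate}")
# Which one should the LLM pick? Nothing in the database says.
```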

What about "high volume stores"? Is that top 10% by revenue? By transaction count? By square footage? The answer exists somewhere - maybe in a strategy deck from 18 months ago, maybe in someone's head who's been here for five years.

"Peak hours" sounds objective until you learn that retail defines it as 10am-2pm and 5pm-8pm, but the warehouse team uses 7am-11am and 3pm-7pm.

None of this lives in your database. It's business knowledge, and it needs to be documented in a structured way before any LLM can use it.

Layer 2: Metrics and Their Hidden Complexity

Ask five people what "revenue" means and you might get five different answers.

Does revenue include pending orders? What about returns? Is it before or after discounts? Do you count the shipping fee? What about tax? Is it when the order is placed, when it ships, or when payment clears?

Every one of these questions has an answer somewhere in your organization. Probably in multiple places with multiple versions, some contradictory.

Your analytics team might calculate "Monthly Recurring Revenue" one way. Finance calculates it differently for the board. The sales dashboard shows a third number because it excludes trials.

Each metric needs a single source of truth - not just a definition in plain English, but the actual business logic. The conditions, the exclusions, the edge cases. All of it documented and maintained.
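Here's one hedged sketch of what that single source of truth could look like - a homegrown registry, assuming nothing about any particular semantic layer; the metric's rules are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class MetricDefinition:
    name: str
    description: str                 # plain English for humans
    sql_expression: str              # the actual logic for machines
    exclusions: list[str] = field(default_factory=list)
    owner: str = ""

net_revenue = MetricDefinition(
    name="net_revenue",
    description="Revenue recognized at shipment, after discounts, "
                "excluding tax, shipping fees, trials, and returns.",
    sql_expression="SUM(item_total - discount_amount)",
    exclusions=["is_trial = TRUE", "status = 'returned'", "shipped_at IS NULL"],
    owner="finance",
)
print(net_revenue.exclusions)
```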

Layer 3: Domain-Specific Business Rules

Now add in the decisions each business unit makes to solve their specific problems.

Marketing runs a campaign and excludes customers who purchased in the last 30 days. Operations has special handling for orders over $5,000. Customer service treats warranty claims differently than regular support tickets. Finance has revenue recognition rules that vary by product type.
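Here's how one of those rules typically ships - a hypothetical sketch, but representative:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical campaign code: the rule lives here and nowhere else.

def eligible_for_campaign(customer: dict) -> bool:
    """Marketing's rule: skip anyone who purchased in the last 30 days."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=30)
    return customer["last_purchase_at"] < cutoff

lapsed = {"last_purchase_at": datetime(2024, 1, 1, tzinfo=timezone.utc)}
print(eligible_for_campaign(lapsed))  # True - but no analytics system knows why
```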

These rules get implemented to solve today's business problem. The people making these decisions aren't thinking about downstream analytics. They're not documenting for future AI systems. They're shipping features and closing deals.

But every one of these decisions affects what the data means and how it should be interpreted.

Layer 4: Technical Implementation Decisions

The business requirements land with engineering. Now developers make their own choices.

They build microservices, each owning its own data. They choose data structures that make sense for their use case. They optimize for performance, for their API contracts, for their deployment constraints.

Is a customer ID a string or an integer? Are addresses stored as structured fields or freeform text? Is the timestamp in UTC or local time? Different services make different choices.
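A sketch of that divergence, with two hypothetical services:

```python
from dataclasses import dataclass

@dataclass
class BillingCustomer:
    customer_id: int        # integer surrogate key from the billing DB
    address: str            # freeform text - good enough for invoices
    created_at: str         # local time string, "YYYY-MM-DD HH:MM"

@dataclass
class CrmCustomer:
    customer_id: str        # UUID string issued by the CRM vendor
    address_street: str     # structured fields for territory mapping
    address_city: str
    created_at_utc: float   # UTC epoch seconds
```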

These aren't wrong choices - they're pragmatic engineering decisions. But data becomes a byproduct of operations, not a first-class concern.

And again, most of these decisions aren't documented anywhere that a data system can access.

Layer 5: The Data Platform Transformation Layer

Finally, the data team pulls everything together. They extract data from dozens of sources, cleanse it, standardize it, transform it.

They create dim_customer by joining six different customer tables. They build fact_orders by combining order data with returns, refunds, and adjustments. They calculate derived metrics like customer_lifetime_value using complex logic.

Every table, every transformation, every derived field represents decisions. What business logic is embedded in this ETL job? Why was this data transformed this way? What assumptions were made? What edge cases are handled?

Without documentation, this knowledge lives in the data engineer's head or buried in hundreds of lines of SQL code.
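One low-tech remedy is to capture the why next to the what. A sketch, with hypothetical table names and policies:

```python
FACT_ORDERS_SQL = """
-- fact_orders: one row per completed order, net of refunds.
-- Why: revenue is recognized at shipment (finance policy), trials are
-- excluded (sales definition), and refunds are netted into the order
-- row rather than shown as negative rows (BI team decision).
SELECT
    o.order_id,
    o.customer_id,
    o.item_total - COALESCE(r.refund_amount, 0) AS net_revenue,
    o.shipped_at AS recognized_at
FROM orders o
LEFT JOIN refunds r USING (order_id)
WHERE o.shipped_at IS NOT NULL   -- recognition point
  AND o.is_trial = FALSE         -- trials never count as revenue
"""
```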


The Documentation Debt Crisis

Add it up:

  • Business definitions for every domain term
  • Precise logic for every metric and KPI
  • Rules and exclusions from every business unit
  • Technical decisions from every engineering team
  • Transformation logic from every data pipeline

This is the context an LLM needs to generate correct queries for real business questions.

And almost none of it is documented in a way machines can understand.

This documentation problem is the 80%.

Why This Is So Hard

Documentation is manual work. Unglamorous, time-consuming, never-ending manual work.

Business definitions change. A "premium customer" today might be redefined next quarter. Metrics evolve as the business grows. Rules get updated when regulations change. The data platform refactors tables and schemas.

Static documentation becomes stale the moment it's written. You need a living system that evolves with the business.

But who owns this? The business teams are focused on business problems. Engineering is shipping features. The data team is drowning in pipeline maintenance. Nobody has "document everything for future AI" in their job description.

Even worse, this problem compounds. Every new business rule, every new data source, every new transformation adds to the documentation debt. The gap between what your AI needs to know and what's actually documented grows wider every sprint.

The Real Opportunity

RAG retrieval algorithms, query optimization, result formatting - these are interesting technical problems. They make for great blog posts and conference talks.

But they're the easy 20%.

The companies that will win in AI-powered analytics aren't the ones with the smartest query generation. They're the ones who solve documentation.

What if business context was captured automatically as a byproduct of work instead of as extra work? What if implementing a business rule automatically created the metadata an AI needs? What if refactoring a data pipeline updated the documentation simultaneously?

What if documentation was a living system that breathed with your business instead of a static artifact that goes stale?

That's the hard problem. That's the valuable problem.
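To make those what-ifs concrete, here's one possible shape - an entirely hypothetical sketch in which implementing a rule registers its metadata as a side effect:

```python
# The catalog is a dict here; in practice it would feed the same
# context store the AI queries.
RULE_CATALOG: dict[str, dict] = {}

def business_rule(description: str, owner: str):
    """Register a rule's metadata at definition time, as a side effect."""
    def wrap(fn):
        RULE_CATALOG[fn.__name__] = {"description": description, "owner": owner}
        return fn
    return wrap

@business_rule("Orders over $5,000 require manual review before fulfillment.",
               owner="operations")
def needs_manual_review(order_total: float) -> bool:
    return order_total > 5000

print(RULE_CATALOG["needs_manual_review"])
```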

Where to Start

You can't solve all five layers overnight. But you can start making documentation a first-class concern:

For business teams: Maintain a single source of truth for business definitions. Not in slides or wikis, but in a structured system that can be queried programmatically.
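Even a minimal store beats a slide deck. A sketch with hypothetical entries:

```python
GLOSSARY = {
    "premium customer": {
        "definition": "Customer with annual spend over $10K",
        "sql": "annual_spend > 10000",
        "owner": "finance",
        "last_reviewed": "2025-01-15",
    },
}

def define(term: str) -> dict:
    """What an LLM - or a new hire - should call before writing a query."""
    return GLOSSARY[term.lower()]

print(define("Premium Customer")["sql"])
```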

For product teams: When you implement business logic, document it where the data team can find it. Make "how will this be analyzed?" part of the design conversation.

For engineering teams: Treat data as a first-class output, not a byproduct. Document your implementation decisions. Make your data contracts explicit.
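An explicit contract can be as small as a typed event schema - a sketch using plain type hints, with hypothetical field names (teams often reach for pydantic or JSON Schema instead):

```python
from typing import TypedDict

class OrderEvent(TypedDict):
    """The order-placed event the analytics team consumes."""
    order_id: str        # UUID, never reused
    customer_id: str     # matches customers.customer_id in the warehouse
    total_cents: int     # integer cents to avoid float drift; pre-tax
    placed_at_utc: str   # ISO 8601, always UTC

# Changing this shape is now a versioned, reviewable decision,
# not a silent surprise downstream.
```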

For data teams: Don't just transform data - document why. Capture the business logic embedded in your pipelines. Make your schemas self-describing.

For everyone: Stop treating documentation as a chore to avoid. It's the foundation that makes everything else possible.

The Bottom Line

Better LLMs won't solve AI analytics. Better RAG won't solve it. Better query optimization won't solve it.

The bottleneck isn't technical capability. It's business context.

You can't assemble an answer from context you don't have. You can't generate the right query if you don't know what "premium customers" means in your organization.

The 20% is getting commoditized fast. Every AI analytics vendor has roughly the same query generation capability.

The real differentiator is the 80% - capturing, maintaining, and surfacing the business knowledge that turns database schemas into business intelligence.

That's where the hard work is. That's where the value is.

And that's the problem that's still unsolved.


Key Takeaways:

  • AI analytics tools focus on query generation (20%) while the real challenge is business context documentation (80%)
  • Business knowledge exists across five layers: business definitions, metrics/KPIs, domain rules, technical implementation, and data transformations
  • Each layer contains critical context that isn't captured in database schemas or code
  • Static documentation fails because business context changes constantly
  • The solution requires making documentation a byproduct of regular work, not extra work
  • Whoever builds systems that capture and maintain this context automatically will win the AI analytics race
