Autonomous Forecast Optimization — Proven on the M5 Benchmark

Executive Summary (For Busy Leaders)

Most enterprises today are caught between two unscalable approaches to forecasting:

  • Human-driven forecasting, where planners manage complexity through spreadsheets, dashboards, and manual overrides.

  • AI-driven forecasting, where data science teams continuously tune models to prevent accuracy from degrading.

Both approaches can work in limited contexts. Neither scales reliably across thousands of SKUs, volatile demand patterns, and ongoing business change.

We believe there is a third path: autonomous forecast optimization—where forecasts improve continuously without heroic human effort or constant model tuning, and where uncertainty is made usable for decision-making, not hidden.

To validate this belief, we ran our production forecasting engine (AURA, within VYAN) against the M5 Forecasting Competition, one of the most demanding public benchmarks in demand forecasting.

A few results matter for executives:

  • 60% item-level win rate versus the official M5 winner

  • Stronger performance in the long tail, where enterprises struggle most

  • When we won, we often won by a lot, not by marginal gains

  • Achieved without manual tuning or competition-specific optimization

These results don’t prove we “won” M5. They prove something more important:

A production-ready, autonomous forecasting system can deliver competition-grade performance where it actually matters for business decisions—at scale, under volatility, and without fragile human effort.

If forecasting in your organization still depends on spreadsheets, constant AI/ML tuning, or point forecasts that hide risk, this article is for you.

The Real Problem: Forecasting That Doesn’t Scale

Most enterprises today are stuck between two unscalable extremes.

On one end is spreadsheet- and dashboard-heavy forecasting:

  • Human effort becomes the control system.

  • Knowledge lives in people’s heads.

  • Scale is achieved by adding more planners, more overrides, and more time.

  • Consistency and reliability degrade as complexity grows.

This approach is familiar—and operationally brittle.

On the other end is AI/ML-driven forecasting:

  • Small teams of highly skilled data scientists build sophisticated models.

  • Pilots look impressive on selected SKUs.

  • Accuracy metrics improve—for a while.

  • Sustaining performance requires constant retraining, retuning, and babysitting.

This approach looks modern—and is equally fragile at scale.

Both approaches eventually hit a ceiling.

One collapses under operational complexity. The other collapses under model maintenance effort and cost—especially when volatility becomes the norm.

And both ultimately fail the same test:

Can the business actually run on these forecasts without constant manual intervention?

Why We Used M5 (and How We Used It)

The M5 Forecasting Competition is one of the most rigorous public benchmarks in the field, using real Walmart data across tens of thousands of item-store series and a demanding evaluation metric.

But we did not approach M5 as a leaderboard contest.

We used it as a stress test.

Specifically, we wanted to answer one question:

Can an autonomous forecasting system designed for real enterprise use—rather than competition theatrics—hold up under transparent, public evaluation?

So we ran AURA against M5:

  • No manual tuning

  • No SKU-specific feature engineering

  • No hand-picked subsets

  • No post-hoc cleanup

Just the autonomous system, running exactly as it is designed to run for customers.

What the M5 Results Actually Tell Us (and What They Don’t)

Let’s be precise.

We did not top the M5 leaderboard. We ranked #23 relative to the published leaderboard.

That is not the headline.

The headline is where and how performance showed up:

  • 60% item-level win rate versus the M5 winner

  • Stronger median and mean absolute error performance

  • Significant wins in intermittent, sparse, and long-tail SKUs

  • Low systematic bias, a critical requirement for inventory decisions

In other words:

  • We did not optimize for a weighted metric that favors a handful of high-volume items.

  • We delivered consistent performance across the demand landscape, including the messy parts enterprises care about most.

This distinction matters.

Many top M5 entries achieved extraordinary weighted scores by optimizing heavily for a narrow subset of items. That’s impressive engineering—but not how real businesses operate.

Enterprises fail in the long tail, not the top 20 SKUs.
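
To make “item-level win rate” concrete, here is a minimal sketch, in Python with NumPy, of how such a per-item comparison can be computed. It assumes two forecast matrices (ours and a reference), the matching actuals, and each item’s sales history; it scores every item with RMSSE, the scaled error that underlies M5’s weighted accuracy metric, and counts the share of items where one forecast beats the other. The function names and data shapes are illustrative, not our evaluation code.

import numpy as np

def rmsse(actual, forecast, train):
    """Root Mean Squared Scaled Error for one item.
    Scales forecast error by the item's naive in-sample error,
    so sparse and high-volume items become comparable."""
    num = np.mean((actual - forecast) ** 2)
    denom = np.mean(np.diff(train) ** 2)  # one-step naive error on history
    return np.sqrt(num / denom) if denom > 0 else np.nan

def item_level_win_rate(actuals, ours, reference, history):
    """Share of items where 'ours' scores a lower RMSSE than 'reference'.
    actuals, ours, reference: arrays of shape (n_items, horizon)
    history: array of shape (n_items, n_train_days)"""
    wins, scored = 0, 0
    for a, f1, f2, h in zip(actuals, ours, reference, history):
        s1, s2 = rmsse(a, f1, h), rmsse(a, f2, h)
        if np.isnan(s1) or np.isnan(s2):
            continue
        scored += 1
        wins += s1 < s2
    return wins / scored if scored else np.nan

An unweighted win rate like this treats the long-tail SKU and the top seller as equally important, which is exactly the distinction drawn above.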

Why “Best Accuracy” Is the Wrong Goal

Forecasting competitions are scored on symmetric accuracy metrics. That’s appropriate for benchmarking.

Businesses, however, do not run on symmetric loss functions. They run on asymmetric cost.

A negative bias that causes a stockout does not cost the same as a positive bias that creates excess inventory. Lost revenue, expediting, customer churn, and market-share erosion are often far more damaging than carrying additional inventory.

Yet many forecasting approaches still optimize as if all error is equal and bias should be driven to zero.

That is a mistake.

What traditional accuracy metrics fail to capture is the Cost of Forecast Error (CoFE):

  • Expediting and recovery costs

  • Lost sales and margin

  • Cash trapped in excess inventory

  • Downstream impact on service, trust, and competitiveness

Optimizing for near-zero bias may look statistically clean—but it often produces forecasts that are economically wrong.
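
To make the asymmetry tangible, the sketch below uses assumed per-unit costs to show how two forecasts with identical absolute error can carry very different economic cost depending on the direction of the bias. The cost figures and the cost_of_forecast_error helper are illustrative only; they are not drawn from our results or from any customer data.

import numpy as np

def cost_of_forecast_error(actual, forecast,
                           under_cost=9.0,   # assumed: lost margin + expediting per unit short
                           over_cost=2.0):   # assumed: holding cost per unit of excess
    """Asymmetric Cost of Forecast Error (CoFE) for a batch of periods.
    Under-forecasting (stockouts) is penalized more heavily than
    over-forecasting (excess inventory)."""
    error = np.asarray(actual) - np.asarray(forecast)
    shortage = np.clip(error, 0, None)   # units demand exceeded the forecast
    excess = np.clip(-error, 0, None)    # units the forecast exceeded demand
    return float(np.sum(shortage * under_cost + excess * over_cost))

# Two forecasts with the same absolute error, opposite bias:
actual = np.array([100, 100, 100])
under = np.array([90, 90, 90])    # biased low: stockouts
over = np.array([110, 110, 110])  # biased high: excess stock

print(cost_of_forecast_error(actual, under))  # 270.0 -- stockouts dominate
print(cost_of_forecast_error(actual, over))   #  60.0 -- same MAE, far cheaper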

This is why chasing the “best” point forecast so often backfires:

  • Fragile models

  • Volatile outputs

  • Excessive forecast churn

  • Loss of planner trust

A forecast that is slightly more accurate but constantly changing is often worse than a forecast that is marginally less accurate but stable and explainable.

This is why so many AI-generated forecasts quietly end up overridden in Excel.
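
One common way to quantify that churn is the average absolute revision between successive planning cycles for the same target periods. The helper below is a hypothetical sketch of that idea, not a metric taken from AURA; the example values are invented to show why two forecasts with similar accuracy can feel completely different to planners.

import numpy as np

def forecast_churn(cycle_forecasts):
    """Mean absolute revision between successive forecast cycles.

    cycle_forecasts: array of shape (n_cycles, horizon), where each row is
    the forecast published in one planning cycle for the same target periods.
    High churn means planners see a different number every cycle, even if
    headline accuracy looks fine."""
    revisions = np.abs(np.diff(cycle_forecasts, axis=0))
    return float(revisions.mean())

# Illustrative: two forecasts with similar average levels but different stability.
stable   = np.array([[100, 102, 101], [101, 101, 102], [100, 102, 101]])
volatile = np.array([[ 80, 130,  95], [125,  70, 140], [ 90, 135,  75]])

print(forecast_churn(stable))    #  1.0 -- planners can act on this
print(forecast_churn(volatile))  # 52.5 -- planners will override this in Excel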

From Forecast Accuracy to Decision Intelligence

The shift enterprises need to make is not incremental. It is conceptual.

Stop asking:

“What is the most accurate forecast?”

Start asking:

“What forecast can the business confidently act on?”

That requires focusing on the economic impact of bad forecasts as well as moving beyond point forecasts.

Single numbers hide risk. They create false certainty. They force executives to bet without understanding the downside.

Decision-intelligent forecasting makes uncertainty explicit and usable.

This is where deterministic and stochastic views come together—not to overwhelm users, but to support better decisions:

  • inventory positioning

  • service commitments

  • capacity planning

  • financial risk exposure

When uncertainty is visible, tradeoffs become explicit—and conversations change.

Forecasting stops being a statistical exercise and becomes a decision system.
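
As one illustration of uncertainty made usable, the sketch below assumes a stochastic forecast is available as demand samples for a single SKU-period and maps a target service level to a stock-to quantity. The distribution, service level, and function name are hypothetical; the point is that a quantile, unlike a single number, encodes the decision-maker’s risk appetite.

import numpy as np

def stocking_quantity(demand_samples, service_level=0.95):
    """Turn a probabilistic (stochastic) forecast into a decision:
    stock to the demand quantile that matches the target service level.
    demand_samples: simulated or sampled demand for one item and period."""
    return float(np.quantile(demand_samples, service_level))

# Hypothetical stochastic forecast for one SKU-week (1,000 samples).
rng = np.random.default_rng(42)
demand_samples = rng.gamma(shape=2.0, scale=50.0, size=1000)

point = float(np.mean(demand_samples))         # the "single number" view
p95 = stocking_quantity(demand_samples, 0.95)  # the decision view

print(f"point forecast: {point:.0f} units")              # hides risk
print(f"stock-to level @ 95% service: {p95:.0f} units")  # makes the tradeoff explicit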

What Makes This a Third Path

Most forecasting approaches still depend on one of two things:

  • Human judgment to compensate for system weaknesses

  • Continuous model tuning to fight reality

Autonomous forecast optimization removes both dependencies.

Instead of humans managing models, the system continuously evaluates which forecast signals perform best under current conditions and at different horizons—optimizing for economic outcomes rather than metric aesthetics.

Not which algorithm is fashionable. Not which model won last quarter. Which forecast actually reduces business risk now.
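
A minimal sketch of that selection step, under simplified assumptions (a fixed set of candidate forecast signals, a rolling window of recent actuals, and the asymmetric cost sketched earlier as the criterion), might look like the following. It illustrates the principle of choosing per horizon on economic outcomes; it is not AURA’s implementation.

import numpy as np

def pick_signal_per_horizon(recent_actuals, candidate_forecasts,
                            under_cost=9.0, over_cost=2.0):
    """Choose, for each horizon, the candidate forecast signal with the
    lowest recent asymmetric cost, rather than the lowest symmetric error.

    recent_actuals: array of shape (n_windows, n_horizons)
    candidate_forecasts: dict of name -> array of shape (n_windows, n_horizons)
    Returns a dict mapping horizon index -> best signal name."""
    n_horizons = recent_actuals.shape[1]
    choice = {}
    for h in range(n_horizons):
        costs = {}
        for name, fc in candidate_forecasts.items():
            err = recent_actuals[:, h] - fc[:, h]
            shortage = np.clip(err, 0, None)
            excess = np.clip(-err, 0, None)
            costs[name] = float(np.sum(shortage * under_cost + excess * over_cost))
        choice[h] = min(costs, key=costs.get)
    return choice

# Illustrative use: two candidate signals over 4 recent windows and 3 horizons.
rng = np.random.default_rng(0)
actuals = rng.poisson(20, size=(4, 3)).astype(float)
candidates = {
    "seasonal_naive": actuals + rng.normal(0, 5, size=(4, 3)),
    "ml_signal": actuals + rng.normal(2, 2, size=(4, 3)),
}
print(pick_signal_per_horizon(actuals, candidates))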

This autonomy is what allows forecasting to scale:

  • across thousands of SKUs

  • across volatile demand patterns

  • without exploding planner effort or data science cost

Why This Matters to Executives

Forecasting is often framed as a technical problem.

In reality, it is a business performance problem.

When forecasts aren’t reliable:

  • Inventory rises “just in case”

  • Service still misses

  • Planners debate numbers instead of acting

  • Executives stop trusting planning outputs

Over time, this erodes confidence not just in forecasting—but in planning as a whole.

What the M5 results validated for us is not that autonomous systems can win competitions—but that they can deliver reliable, scalable performance where enterprises actually need it.

That combination is rare.

And it is what lets the business actually run on its forecasts.

The Offer (Revisited)

If forecasting in your organization:

  • Depends on unscalable human effort

  • Depends on constant AI model tuning

  • Struggles with intermittency and volatility

  • Produces point forecasts without usable risk context

then we invite you to start with a complimentary forecasting health-check or working session, focused on:

  • How forecasting is actually done today

  • Where effort and trust break down

  • Where Cost of Forecast Error is silently accumulating

From there, we propose a 2–4 week rapid pilot on your own data, designed to:

  • Cover a large portion of your demand universe

  • Demonstrate 20%+ forecast error reduction versus current performance

  • Require no manual model tuning

  • Show how uncertainty can be used—not hidden—in decision-making

If you’re proud of your AI team, skeptical because your demand is “too volatile,” or ready to move beyond point forecasts toward true decision intelligence—this is the fastest way to find out what’s possible.

Forecasting doesn’t need to be debated anymore. It needs to be trusted.

If you’re ready to test that assumption on your own data, let’s talk.
