Blackworks Capital | Systematic Insights

Overfitting Risk: When a Model Learns the Past Too Well

Written by Blackworks Capital Team | Mar 20, 2026

A student memorizes every practice test for a standardized exam. They can recite 500 answers perfectly. Then they sit the actual test and their score is mediocre. They memorized the questions, not the principles.

That’s overfitting. And in systematic investing, it’s one of the most common ways well-intentioned programs blow up.

The promise of rules-based investing is that patterns tested against history will persist in the future. The danger is that you can engineer rules to work perfectly against history while being worthless going forward. A model so finely tuned to past data that it’s learned the noise rather than the signal.

How It Happens

Building a trading model starts simply enough. Identify an economic relationship, code it into a rule, backtest it. If the results look promising, the instinct is to tweak. Adjust the entry parameter. Smooth the exit signal. Add a filter. Test again.

Each tweak improves the historical results. Smoother returns, fewer drawdowns, higher Sharpe ratios. It feels like progress. But what’s actually happening is the model is fitting itself to the specific sequence of past market events—events that are unlikely to repeat in that exact configuration.

This is the multiple comparisons problem. Test one strategy against 30 years of data, and you get an honest answer. Test 100 variations against the same data, and by pure probability, some will show strong results by random chance. The more you look, the more you find—and the less likely it is to be real.

A researcher with 20 potential variables runs 1,000 regression models. One combination shows a stunning backtest: 25% annual returns, 8% volatility, Sharpe of 3.1. Six months into live trading, it’s down 18%. The relationships were statistical artifacts, not economic truths.
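
To feel the force of this, here is a minimal Python simulation (entirely illustrative, not a Blackworks model): 1,000 strategies that are pure noise by construction, all backtested against the same simulated 30-year return series.

```python
import numpy as np

rng = np.random.default_rng(42)
returns = rng.normal(0.0, 0.01, 252 * 30)  # ~30 years of daily returns, zero drift: no real edge exists

best_sharpe = -np.inf
for _ in range(1000):
    signal = rng.integers(0, 2, returns.size)  # each "strategy" is a random long/flat rule
    strat = signal * returns
    sharpe = strat.mean() / strat.std() * np.sqrt(252)
    best_sharpe = max(best_sharpe, sharpe)

print(f"best backtested Sharpe among 1,000 noise strategies: {best_sharpe:.2f}")
```

Every one of those strategies is coin-flipping, yet the best of them posts a distinctly positive Sharpe. Run one test and you get roughly zero; run a thousand and the winner looks like skill.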

What It Looks Like When It Fails

Overfitted models enter live trading looking great. The backtest is flawless. Capital gets deployed.

Then reality diverges from history. Usually the first sign is gradual decay—the 2% monthly alpha becomes 0.5%, the “controlled” 8% drawdown becomes 15%. Sometimes it’s sudden. A model trained on pre-2008 data works for years, then falls apart the moment a volatility regime arrives that wasn’t in the training set.

This dynamic has destroyed systematic funds. Launch with a 10-year backtest showing 20% annual returns. Year 2, returns are 8%. Year 3, a 25% drawdown in a flat market. Investors redeem. Year 4, the fund closes. The tragedy is that the overfitting was invisible to the person building the model. Strong historical results feel like evidence of signal. They’re not always.

How We Defend Against It

There’s no single fix. It takes layered discipline.

Out-of-sample testing is non-negotiable. You cannot validate a model using the data you built it on. Train on years 1-10, then test mechanically on years 11-15 without any further tweaking. If the model returned 18% annually in training and 2% in testing, it is overfit. Reject it. The gap between in-sample and out-of-sample performance is your overfitting meter. At Blackworks Capital, we’ve rejected dozens of models that looked promising in training but failed this test.
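
As a sketch of the mechanics (the moving-average rule, simulated prices, and parameter grid below are illustrative assumptions, not our methodology):

```python
import numpy as np

def sharpe(r):
    # annualized Sharpe ratio of a daily return series
    return r.mean() / r.std() * np.sqrt(252)

def backtest(prices, lookback):
    # toy rule: long when price closes above its trailing moving average, flat otherwise
    ma = np.convolve(prices, np.ones(lookback) / lookback, mode="valid")
    signal = (prices[lookback - 1:-1] > ma[:-1]).astype(float)
    rets = np.diff(prices)[lookback - 1:] / prices[lookback - 1:-1]
    return signal * rets

rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0.0002, 0.01, 252 * 15)))  # ~15 years of simulated prices
split = 252 * 10                                                      # years 1-10 train, years 11-15 test

# tune the lookback on the training years alone...
best_lb = max(range(5, 200), key=lambda lb: sharpe(backtest(prices[:split], lb)))

# ...then score it once, mechanically, on the held-out years
print("in-sample Sharpe: ", round(sharpe(backtest(prices[:split], best_lb)), 2))
print("out-of-sample Sharpe:", round(sharpe(backtest(prices[split:], best_lb)), 2))
```

The in-sample number is flattering almost by construction, because the search was allowed to pick it; only the out-of-sample number was earned.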

Economic intuition filters out spurious signals. Before deploying anything, we ask: why should this variable predict returns? Is there a causal mechanism rooted in market structure or investor behavior? A signal grounded in fundamental economic logic has a reason to persist; a signal based on statistical noise risks being just that: noise. No amount of backtested significance justifies deploying a signal you can’t explain.

Cross-validation tests whether a signal is real or just fitted to a particular context. Does it work across different time periods? Different market regimes? Different asset types within our universe? A signal that works only in U.S. large-caps during a calm decade is probably overfit to that environment.
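
In code, the time-based version of that check might look like this (placeholder returns, illustrative fold count):

```python
import numpy as np

def sharpe(r):
    return r.mean() / r.std() * np.sqrt(252)

rng = np.random.default_rng(1)
strat_returns = rng.normal(0.0003, 0.01, 252 * 20)  # placeholder for a strategy's daily returns

# sequential, unshuffled folds of ~4 years each; ordering matters in markets
folds = np.array_split(strat_returns, 5)
print(["%.2f" % sharpe(f) for f in folds])
# broadly consistent Sharpes across blocks suggest a real effect;
# one strong block amid flat ones suggests a fit to that block's regime
```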

Parsimony—preferring simpler models—matters more than most people think. A model with 2 parameters is far more likely to be robust than one with 20. The additional complexity creates more degrees of freedom for the model to fit noise. The best models are often deceptively simple, and that simplicity is a feature.
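
A toy regression outside finance makes the point concrete. Under the stated assumptions (simulated data, polynomial models as stand-ins), the 20-parameter model wins on the data it was fitted to and loses on the data it wasn’t:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(7)
x = np.linspace(0, 1, 60)
y = 1.5 * x + rng.normal(0, 0.3, x.size)  # a simple linear relationship plus noise

idx = rng.permutation(x.size)             # hold out a third of the points
train, test = idx[:40], idx[40:]

for n_params in (2, 20):                  # 2 coefficients versus 20
    p = Polynomial.fit(x[train], y[train], deg=n_params - 1)
    train_mse = np.mean((p(x[train]) - y[train]) ** 2)
    test_mse = np.mean((p(x[test]) - y[test]) ** 2)
    print(f"{n_params:2d} params: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The extra 18 parameters buy a better fit to the noise in the training points and a worse fit to everything else.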

Spotting It in Others

If you’re evaluating a systematic fund, a few questions cut through the noise quickly. What’s the out-of-sample performance? How does it perform across different market environments? How stable are the parameters over time? And critically: how did the first 1-2 years of actual trading compare to the backtest? A 10-year backtest showing 25% returns with 6% volatility paired with 2-year live performance of 6% returns and 15% volatility tells you what you need to know.

Our Approach

At Blackworks Capital, we’ve built the BWC Founders Fund on these defenses. Every strategy undergoes rigorous out-of-sample testing across multiple cycles and market conditions. We require economic intuition before statistical testing. We use walk-forward analysis to ensure parameter stability. We’ve rejected models that looked statistically perfect but failed our validation process.
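
For readers who want the mechanics of walk-forward analysis, here is a hedged sketch in the same illustrative style as above, with a toy rule and simulated prices rather than the BWC pipeline: refit on a rolling window, trade only the next unseen year, and watch whether the chosen parameter stays stable.

```python
import numpy as np

def sharpe(r):
    return r.mean() / r.std() * np.sqrt(252)

def backtest(prices, lookback):
    # same toy moving-average rule as in the out-of-sample sketch above
    ma = np.convolve(prices, np.ones(lookback) / lookback, mode="valid")
    signal = (prices[lookback - 1:-1] > ma[:-1]).astype(float)
    rets = np.diff(prices)[lookback - 1:] / prices[lookback - 1:-1]
    return signal * rets

rng = np.random.default_rng(3)
prices = 100 * np.exp(np.cumsum(rng.normal(0.0002, 0.01, 252 * 15)))

window, step = 252 * 5, 252  # 5-year training window, stepped forward one year at a time
chosen, oos = [], []
for start in range(0, len(prices) - window - step + 1, step):
    train = prices[start:start + window]
    test = prices[start + window:start + window + step]
    lb = max(range(5, 100), key=lambda k: sharpe(backtest(train, k)))
    chosen.append(lb)               # unstable parameters across windows are a red flag
    oos.append(backtest(test, lb))  # only ever traded on data the fit never saw

print("lookback chosen per window:", chosen)
print("stitched out-of-sample Sharpe:", round(sharpe(np.concatenate(oos)), 2))
```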

That rejection discipline costs us in the short term—fewer strategies, deployed more slowly. But the strategies that survive are robust to real market conditions. Our multi-factor framework demands that signals earn their place through independent validation—not historical convenience. Daily rebalancing reinforces this by executing tested rules consistently, without drift.

Overfitting isn’t solved once. It’s a discipline maintained continuously. Our conviction is that sustainable alpha comes from rigorous process, not optimistic backtests. The question isn’t whether the risk exists—it’s whether the discipline is there to minimize it.

Get in touch to discuss how disciplined systematic investing can work for your portfolio.