Why backtests lie
And what you can do about it
Today, the starting point for how I approach research is completely different from how I approached it years ago.
I used to start with data. Pull a bunch of price history, run some analysis, hunt for patterns that looked promising. Optimise the parameters until everything looked great. Then I’d go live and watch the whole thing fall apart.
The problem wasn’t the statistics. The problem was I had no idea why the pattern should exist in the first place.
The mental model came last when it should have come first.
Every price you see on your screen is the result of someone buying and someone selling.
Sounds obvious, right?
But most of us - myself included, for a long time - treat price data like it fell from the sky. Just numbers to be analysed in isolation.
In reality, prices move because participants with different objectives, constraints, and information interact. Some are trying to make money. Some are hedging risks they’d rather not have. Some are rebalancing portfolios on a schedule. Some are forced to trade regardless of price.
That last one is key. Forced to trade regardless of price.
When you understand who’s in the market and why they’re trading, you start to see where edge might really exist. Not because you found a correlation in historical data, but because you understand the mechanism creating it.
Building the Model
A useful mental model answers a few questions:
Who are the major participants? In equities, you’ve got index funds rebalancing mechanically, hedge funds hunting alpha, market makers providing liquidity, and retail traders doing... various things.
In crypto, there are basis traders, funding rate arbitrageurs, trend-followers, leverage-hungry speculators, and protocols with treasury management needs.
What are their objectives? Index funds want to minimise tracking error, not maximise returns. Market makers want to capture spread while managing inventory. Hedge funds want uncorrelated returns. Each objective creates predictable behaviour.
What are their constraints? This is where it gets interesting. Constraints create predictable, price-insensitive trading. An index fund has to buy when a stock enters the index. A risk parity fund has to rebalance when volatility changes. A leveraged trader has to liquidate when margin runs out.
When are they forced to act? End of month, end of quarter, index rebalances, options expiration - these create windows where certain participants must trade regardless of price.
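To make the constraint idea concrete, here’s a rough sketch of the leveraged trader’s case. It assumes a simplified isolated-margin long position that gets liquidated once equity falls to a maintenance fraction of current notional. The function name, the margin rule, and the numbers are illustrative assumptions, not any particular exchange’s formula.

```python
def liquidation_price(entry_price, leverage, maintenance_margin):
    """Price at which a leveraged long is forced to sell (simplified, assumed rule).

    Equity per unit at price p is entry_price / leverage + (p - entry_price).
    Liquidation is assumed to trigger when equity equals maintenance_margin * p.
    """
    if leverage < 1 or not 0 <= maintenance_margin < 1:
        raise ValueError("expected leverage >= 1 and 0 <= maintenance_margin < 1")
    return entry_price * (1 - 1 / leverage) / (1 - maintenance_margin)


# Hypothetical example: a long entered at 100 with a 5% maintenance margin.
entry = 100.0
for lev in (2, 5, 10):
    liq = liquidation_price(entry, lev, 0.05)
    drop = (entry - liq) / entry
    print(f"{lev}x leverage: forced seller at {liq:.2f} ({drop:.1%} below entry)")
```

The exact number matters less than the shape of it: below that level, the seller’s decision no longer depends on price.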
From Model to Edge
Once you have this framework, research becomes more directed.
Instead of asking “is there a pattern here?”, you ask “who might be creating predictable price pressure, and can I trade against them?”
Instead of data mining for anomalies, you look for situations where participants with constraints are forced to trade at bad prices (for them).
Take rebalance flows. Index funds mechanically buy stocks entering an index and sell stocks leaving. This isn’t a secret. Everyone knows it happens. But knowing it happens and understanding the dynamics well enough to trade it profitably are different things.
You need to understand the timing, the size of the flows relative to liquidity (even a rough estimate helps), how other participants might be positioning, whether the effect is already priced in, and under what conditions it becomes more or less pronounced.
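A back-of-the-envelope version of “size of the flows relative to liquidity” might look like the sketch below. It expresses forced index buying as a number of days of average daily volume. The function and every input value are made up for illustration, and it ignores real-world details like float adjustments and funds that trade ahead of the effective date.

```python
def days_of_volume(tracking_assets, index_weight, price, avg_daily_shares):
    """Rough size of forced index buying, in days of average daily volume."""
    if price <= 0 or avg_daily_shares <= 0:
        raise ValueError("price and avg_daily_shares must be positive")
    dollars_to_buy = tracking_assets * index_weight  # passive money that must own the stock
    shares_to_buy = dollars_to_buy / price
    return shares_to_buy / avg_daily_shares


# All inputs below are hypothetical, illustrative numbers.
demand = days_of_volume(
    tracking_assets=500e9,       # assets tracking the index, in dollars
    index_weight=0.0002,         # the new stock's weight in the index (2 bps)
    price=50.0,                  # share price in dollars
    avg_daily_shares=1_000_000,  # average daily volume in shares
)
print(f"Forced demand is roughly {demand:.1f} days of average volume")
```

A number well above one day of volume suggests the flow is large enough to move price. A small fraction of a day suggests the market can probably absorb it without much fuss.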
All of this comes from having a rich mental model, not from running regressions.
The Practical Bit
Before you dive into backtesting your next idea, ask yourself: Who’s on the other side of this trade? Why are they trading? What constraint or objective makes their behaviour predictable? Why won’t this get arbitraged away? And why am I the one who gets to trade against it?
If you can’t answer these questions, you’re probably data mining. And data mining without understanding causation is how you build strategies that worked beautifully in the past and fall apart in the future.
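You can see the data mining problem in a toy simulation. The sketch below generates a pile of “strategies” that are pure noise, picks the one with the best in-sample Sharpe ratio, and then looks at how that same strategy does out of sample. Everything here is an assumption made for illustration: the function names, the 200 candidates, the 1% daily volatility, and the 252-day annualisation.

```python
import random
import statistics


def sharpe(returns):
    """Annualised Sharpe ratio of daily returns (zero risk-free rate, 252 days assumed)."""
    sd = statistics.stdev(returns)
    return 0.0 if sd == 0 else statistics.mean(returns) / sd * 252 ** 0.5


def mine_best_strategy(n_strategies=200, n_days=500, seed=0):
    """Pick the best in-sample strategy out of many that have no edge at all."""
    rng = random.Random(seed)
    half = n_days // 2
    results = []
    for _ in range(n_strategies):
        # Every strategy has zero true edge by construction: mean-zero noise.
        daily = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        results.append((sharpe(daily[:half]), sharpe(daily[half:])))
    # "Optimise until it looks great": keep whichever looked best in sample.
    return max(results, key=lambda pair: pair[0])


best_in, best_out = mine_best_strategy()
print(f"Best in-sample Sharpe:       {best_in:.2f}")
print(f"Same strategy out of sample: {best_out:.2f}")
```

The winner looks impressive in sample purely because it was selected from many tries. Its out-of-sample number is just another draw of noise, because there was never a mechanism behind it.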
The market is very rational in aggregate, even though there are a lot of maniacs in it. Your edge comes from understanding the structure well enough to know where rationality creates predictable behaviour you can exploit.
That mental model is the foundation everything else builds on. Get it right, and research becomes clearer.

Solid piece on model-first thinking. The point about constraints forcing predictable trades is exactly where systematic edge lives. Most quants optimize until they find something that works in sample, but if there's no structural reason why participants must act that way, it's just noise fitting. The rebalance flow example nails it: nobody's gonna stop index funds from mechanically buying, so that edge persists.