Do stock factors actually work? Testing momentum, low volatility, and reversal on 5 years of S&P 500 data

We tested three classic price-based factors (momentum, low volatility, and short-term reversal) on 619,040 daily price rows of S&P 500 data from 2013–2018. Two worked weakly, one had the wrong sign, and none was statistically significant. Here's what 5 years of real data tell us about factor investing.

The previous three posts in this series — on credit risk, loan defaults, and credit card fraud — were all about classification on binary outcomes. They were also explicitly setting up a different kind of test: can we apply the same evidence-first discipline to equity scoring, which is QScoring's actual job?

This post answers that question on real data. We pulled the camnugent/sandp500 Kaggle dataset — 619,040 daily price rows across all 505 S&P 500 constituents from February 2013 to February 2018 — and ran a clean cross-sectional factor test on the 474 names with full price history.

Three price-based factors. Five years of data. Monthly cross-sectional ranks. Long-short quintile portfolios. Information coefficients with t-statistics. All gross of costs and computed without look-ahead. The results are not as clean as the academic literature suggests.

Headline (long-short returns, gross of costs): Momentum +3.9% annualized · Short-term reversal +2.8% · Low volatility −6.4% (wrong sign in this period). None of the three factors had a statistically significant IC (|t-stat| > 2) over the 5-year window.

1. What we're testing and how

With price-only data we can't compute true fundamental-value factors like price-to-earnings or price-to-book — those require financial-statement data the Kaggle file doesn't carry. What we can compute, cleanly and without ambiguity, are the three classic price-based factors:

For each factor and each month t:

2. The market backdrop

Before looking at factors, it helps to know what the market did:

Cumulative growth of equal-weighted S&P 500 plus AAPL, XOM, KO over 2013-2018 showing strong bull market
Figure 1. Equal-weighted S&P 500 (gold) compounded to 1.91× over the 5-year window: about +91% total return, +14% annualized. This was a quintessential post-QE bull market: low rates, high multiples, shallow drawdowns. The interesting question for factor testing isn't whether stocks made money (they did, broadly) but whether one could predict which stocks would do best.

The monthly-return cross-section is also worth eyeballing:

Histogram of all monthly returns across all S&P 500 names showing fat-tailed distribution centered slightly above zero
Figure 2. Pooled monthly returns across all 474 names × 61 months ≈ 29,000 observations. Median monthly return: +1.4%. Mean: +1.3%. The distribution is fat-tailed on both sides: many stocks move ±5% or more in a typical month. That cross-sectional dispersion is the raw material a factor needs to work with.

3. Information coefficients — the headline metric

The IC time series for the three factors:

Three IC time series showing all three factors hovering around zero with noisy monthly variation
Figure 3. Cross-sectional Spearman IC by month (thin lines) and 6-month rolling averages (bold). Mean ICs: momentum +0.016, low vol −0.020, short-term reversal +0.019. The literature considers ICs in the 0.03–0.05 range “weakly informative”; all three of our factors are below that bar on a 5-year window. None of the t-statistics exceed 2.

Read those numbers carefully. Two factors (momentum, short-term reversal) had IC of the expected sign — positive — but the magnitudes are small and the variability across months is huge. The third factor (low volatility) had IC of the wrong sign: high-vol stocks outperformed low-vol stocks in this period, on average.

That last finding isn't a coding bug. It's a well-documented feature of the 2013–2018 environment: in a zero-interest-rate, QE-driven bull market, high-beta and high-growth names dominated. The low-volatility anomaly that worked decades earlier was structurally fighting the regime.
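For readers who want to see what a monthly IC and its t-statistic involve, here is a minimal sketch in plain Python. It assumes the IC is the Spearman rank correlation between a factor signal known at month-end t and the return over month t+1, and that the t-statistic treats the monthly ICs as independent draws. The function names are illustrative, not taken from the QScoring codebase.

```python
import math
import statistics


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman_ic(signal, next_return):
    """Cross-sectional IC for one month: rank correlation of signal vs. next-month return."""
    return pearson(average_ranks(signal), average_ranks(next_return))


def ic_t_stat(monthly_ics):
    """Mean IC divided by its standard error (assumes independent months)."""
    n = len(monthly_ics)
    return statistics.fmean(monthly_ics) / (statistics.stdev(monthly_ics) / math.sqrt(n))
```

With roughly 60 monthly observations, a mean IC needs to be about a quarter of its month-to-month standard deviation before the t-statistic reaches 2, which is why small, noisy ICs like the ones above fall short.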

4. Quintile portfolios — momentum is the most well-behaved

IC summarizes the rank correlation. Quintile portfolios show what actually happens when you act on that ranking. Here's momentum:

Five lines showing cumulative growth of momentum quintile portfolios — Q5 (highest momentum) reaches 1.75x by 2018, Q1 reaches 1.41x
Figure 4. Equal-weighted momentum quintile portfolios, rebalanced monthly. Q1 (worst momentum) grew 1.41×; Q5 (best momentum) grew 1.75×, a roughly 33-percentage-point cumulative spread. The middle quintiles aren't perfectly monotonic (Q4 dips below Q3), but the top-vs-bottom spread has the sign the literature predicts.

That said: a 33pp spread over 4+ years is not a lot. The equal-weighted S&P 500 returned 91% over the same window. The quintile spread is just over a third of the broad market's total return.
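The quintile construction can be sketched in a few lines. This is an illustration under stated assumptions, not the exact code behind the figures: `signal` and `next_return` are hypothetical dicts keyed by ticker, the signal uses only data available at the rebalance date, and buckets are equal-weighted.

```python
import statistics


def quintile_returns(signal, next_return, n_buckets=5):
    """Equal-weighted next-month return per bucket; index 0 is Q1 (lowest signal)."""
    ranked = sorted(signal, key=lambda ticker: signal[ticker])
    n = len(ranked)
    out = []
    for q in range(n_buckets):
        # Integer slicing spreads any remainder across buckets.
        members = ranked[q * n // n_buckets:(q + 1) * n // n_buckets]
        out.append(statistics.fmean([next_return[t] for t in members]))
    return out


def compound(monthly_returns):
    """Cumulative growth multiple of 1 unit invested, e.g. 1.75 for +75%."""
    growth = 1.0
    for r in monthly_returns:
        growth *= 1.0 + r
    return growth
```

Calling `quintile_returns` once per month and passing each quintile's return series to `compound` gives growth multiples of the kind plotted in Figure 4.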

5. Long-short returns — the operational view

A long-short portfolio buys the top quintile and shorts the bottom quintile. It's the cleanest test of whether the factor carries real signal, because it strips out the market beta — what's left is pure factor return.

Cumulative long-short returns for momentum (positive), short-term reversal (positive), and low volatility (negative)
Figure 5. Long-short cumulative returns. Momentum compounded +21% gross over 5 years (+3.9% annualized). Short-term reversal: +14% (+2.8% annualized). Low volatility: −28% (−6.4% annualized); high-vol names beat low-vol names every year of this period.
Summary table:

Factor        Mean IC   IC t-stat   Sharpe   Max DD
Momentum      +0.016    0.53        0.35     −24.2%
ST Reversal   +0.019    1.04        0.36     −7.3%
Low Vol       −0.020    −0.82       −0.51    −22.8%
Bar chart of annualized Sharpe ratios: momentum +0.35, low vol -0.51, short-term reversal +0.36
Figure 6. Annualized Sharpe ratios. Low volatility's −0.51 isn't just “factor didn't work”; it's “factor worked in reverse”: statistically indistinguishable from zero but consistently wrong-signed.
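The annualized figures follow from the cumulative ones by geometric compounding, and the Sharpe ratio from the monthly long-short series. A minimal sketch, assuming monthly returns, a zero risk-free rate (a long-short portfolio is treated here as self-financing), and the usual square-root-of-12 scaling:

```python
import math
import statistics


def cumulative_return(monthly):
    """Total compounded return over the whole series."""
    growth = 1.0
    for r in monthly:
        growth *= 1.0 + r
    return growth - 1.0


def annualized_return(monthly):
    """Geometric annualization of a monthly return series."""
    years = len(monthly) / 12
    return (1.0 + cumulative_return(monthly)) ** (1.0 / years) - 1.0


def annualized_sharpe(monthly):
    """Mean over sample standard deviation of monthly returns, scaled by sqrt(12)."""
    return statistics.fmean(monthly) / statistics.stdev(monthly) * math.sqrt(12)
```

As a check on the momentum figure: `1.21 ** (1 / 5) - 1` is about 0.039, matching +21% cumulative and +3.9% annualized over 5 years.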

6. Drawdowns — factors are not free lunches

Drawdown chart showing momentum and low volatility going through 20%+ drawdowns while short-term reversal stays shallow
Figure 7. Drawdowns from peak. Momentum's 24% drawdown happened in early 2016, the kind of episode after a sharp market dislocation that the classic “momentum crash” papers describe. Low-vol grinds down continuously. Short-term reversal stays close to its peak the whole period.

If you ran momentum or low volatility as a standalone strategy with real capital, you'd need the discipline to hold through a 20%+ drawdown without changing your mind. Most investors can't. This is the soft constraint that makes factor investing harder in practice than in backtests.
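Maximum drawdown is simple to compute from a return series: track the running peak of the cumulative growth curve and record the worst percentage gap below it. A minimal sketch, assuming a list of monthly returns:

```python
def max_drawdown(monthly):
    """Worst peak-to-trough decline of the compounded series, as a negative fraction."""
    growth, peak, worst = 1.0, 1.0, 0.0
    for r in monthly:
        growth *= 1.0 + r
        peak = max(peak, growth)
        worst = min(worst, growth / peak - 1.0)
    return worst
```

Note that the drawdown is measured against the most recent high, not the starting value, so a strategy can be up overall and still be sitting in a deep drawdown.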

7. Are the factors independent?

Combining factors into a composite score only makes sense if they carry distinct information. The correlation matrix:

Heatmap of correlations between the three long-short factor returns
Figure 8. Long-short return correlations. Momentum × Low Vol = +0.62 (both lost out to the same high-vol names). Momentum × Reversal = −0.30 (opposite-horizon, expected). Low Vol × Reversal = −0.38. The factors are not three independent bets; only the negatively correlated pairs offer a meaningful diversification benefit.

If you naively averaged the three factor signals into a composite score on 2013–2018, you'd effectively be double-weighting “don't buy high-vol names” (the shared bet between momentum and low-vol) and only partially cancelling that with the reversal signal. Composite scoring requires factor de-correlation, not just factor averaging.
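One simple way to de-correlate before averaging is to residualize one signal against another within each month's cross-section. The sketch below is an illustrative approach, not a description of QScoring's method: it z-scores two signals across stocks, strips from low-vol the part linearly explained by momentum, and averages the result. The same step could be repeated for a third signal.

```python
import statistics


def zscore(xs):
    """Standardize a cross-section to mean 0 and (population) std 1."""
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]


def residualize(y, x):
    """Remove from y the part linearly explained by x.

    Assumes both inputs are already mean-zero, so no intercept is needed.
    """
    beta = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    return [b - beta * a for a, b in zip(x, y)]


def decorrelated_composite(momentum, low_vol):
    """Average of momentum and the momentum-orthogonal part of low-vol."""
    z_mom = zscore(momentum)
    z_lv = zscore(low_vol)
    lv_resid = zscore(residualize(z_lv, z_mom))
    return [(a + b) / 2 for a, b in zip(z_mom, lv_resid)]
```

The order matters: whichever signal is residualized loses the shared component, so this choice encodes a view about which factor "owns" the common bet.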

8. The honest conclusions

Five years of S&P 500 data is not enough to confidently say a factor “works” or “doesn't.”

The decades-long academic record on momentum is robust — but that record is built on ~100 years of data spanning multiple regimes. On any individual 5-year window, momentum can be flat, positive but weak, or even negative (the 2008–2009 momentum crash is famous). Our finding of “positive but not significant” over 2013–2018 is consistent with the long-run literature, not in conflict with it.

The low-volatility anomaly is structurally regime-dependent. It works when expensive low-vol stocks beat cheap high-vol ones — typically in slow-growth, risk-off environments. The 2013–2018 window was the opposite environment, and the anomaly inverted. The right read isn't “low-vol is dead” — it's “low-vol has macro-regime exposure that long-term backtests average out.”

Short-term reversal had a clean low-drawdown profile but only +2.8% annualized gross. Subtract realistic transaction costs (5–15 bps per month at high monthly turnover) and the return is plausibly negative net of costs.
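To see how quickly costs can eat a +2.8% gross return, here is a rough sensitivity sketch. Everything except the 2.8% and the 5–15 bps range is an assumption made for illustration: the bps figure is read as a one-way cost per unit traded, the 80% monthly turnover per leg is hypothetical, each replaced position counts as one sell plus one buy, and costs are subtracted linearly rather than compounded.

```python
def net_annual_return(gross_annual, one_way_cost_bps, monthly_turnover, legs=2):
    """Gross annual return minus a simple linear estimate of trading costs."""
    # Each unit of turnover means selling one position and buying another,
    # on both the long and the short leg.
    trades_per_month = legs * 2 * monthly_turnover
    annual_cost = 12 * trades_per_month * one_way_cost_bps / 10_000
    return gross_annual - annual_cost


for bps in (5, 10, 15):
    # monthly_turnover=0.8 is a hypothetical value, not measured from the data.
    print(bps, round(net_annual_return(0.028, bps, monthly_turnover=0.8), 4))
```

Under these assumed inputs the net return is positive at 5 bps and negative at 10 and 15 bps, so the sign of the net result depends heavily on the cost and turnover estimates.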

9. What QScoring does about this

This post is, in a sense, the empirical justification for several specific choices in the QScoring methodology:

The factor zoo problem in equity research (Cochrane's “hundreds of significant factors discovered”) is the same overfitting problem we warned about in the loan-default post. The honest fix is the same one: a small, vetted feature set with a real empirical record, evaluated on the operational metric that actually matters.
