title: "replacing the regime score"
slug: "replacing-the-regime-score"
date: "2026-04-12"
updated: null
type: "methodology"
excerpt: "The original regime score had almost no variance over a 180-session window. The replacement hits r = 0.439 against forward returns. Here is how it happened."
tags: ["scoring", "regime", "factor-analysis"]
readTime: 9
draft: false
The original regime score in the scoring engine was a composite of four FRED macro series — unemployment, ISM PMI, yield curve slope, and a financial-conditions index. It was the first factor I built and I had a clear story for why it mattered: recession-adjacent environments should depress expected returns, expansionary environments should lift them, and the composite would pick that up and feed it into every ticker's score. The story was tight. The factor was useless.
— —
the thing I noticed
I started actually looking at factor outputs after a run of backtests where the composite score barely moved between two obviously different market weeks. One was the week after a surprise FOMC dissent, the other was a quiet melt-up into an earnings cycle. Both weeks had near-identical regime scores.
That shouldn't have been possible. If the factor was actually reading the environment, those two weeks should have produced meaningfully different regime outputs. I pulled the factor's per-day values for a 180-session window and the problem was immediate: the standard deviation was under two points on a 0–100 scale. The factor was a constant dressed up as a variable.
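The flatness check described above is easy to reproduce. A minimal sketch, assuming the factor's daily values are available as a plain list of floats; the function names and the 2.0 threshold (taken from the "under two points" observation) are illustrative, not the engine's actual code.

```python
from statistics import pstdev


def rolling_std(values, window=180):
    """Population standard deviation of each trailing `window`-length slice."""
    if window < 2 or len(values) < window:
        return []
    return [pstdev(values[i - window:i]) for i in range(window, len(values) + 1)]


def is_effectively_constant(values, window=180, min_std=2.0):
    """True when every trailing window has a standard deviation below `min_std`.

    `min_std` is in score points on the 0-100 scale (hypothetical threshold).
    """
    stds = rolling_std(values, window)
    return bool(stds) and max(stds) < min_std
```

A factor that returns `True` here is moving less than two points in any 180-session stretch, which is the "constant dressed up as a variable" case.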
The reason, once I looked at it, was obvious. The macro series I was pulling from FRED are published monthly, and a few quarterly. None of them tick fast enough to move a daily scoring factor, and the smoothing I'd added to avoid month-over-month jumps turned the residual signal into mud. By the time the composite rolled up four of those, the daily factor was effectively a constant with a slow drift over quarters.
the factor correlation sweep
Before replacing it I wanted to understand what did correlate with forward returns over a 5-day horizon. The sweep was simple: construct candidate regime factors from any reasonable combination of market-observable inputs, compute a daily value for each, and regress the forward 5-day total return on SPY against it, recording the r-value.
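The core of a sweep like this is two small functions: one that builds the forward return target and one that computes Pearson's r. The sketch below is an assumption about the mechanics, not the actual sweep code. It uses close-to-close returns, so it only approximates total return if the closes are dividend-adjusted, and all function names are hypothetical.

```python
from math import sqrt


def forward_returns(closes, horizon=5):
    """close[t + horizon] / close[t] - 1 for every t with a full horizon ahead."""
    return [closes[t + horizon] / closes[t] - 1.0
            for t in range(len(closes) - horizon)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant series; report 0
    return sxy / sqrt(sxx * syy)


def factor_vs_forward(factor, closes, horizon=5):
    """Correlate factor[t] with the return from t to t + horizon."""
    fwd = forward_returns(closes, horizon)
    return pearson_r(factor[:len(fwd)], fwd)
```

The slice `factor[:len(fwd)]` matters: the last `horizon` factor values have no realized forward return yet, so they are dropped rather than paired with anything.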
The candidates I tested:
- Single-series: SPY 20/50/200 trend agreement, QQQ/SPY ratio 20-day change, VIX level, VIX percentile, VIX term structure slope, credit spread proxies from HYG/LQD, XLU/SPY as a defensive ratio, equal-weight S&P over cap-weight.
- Pair combinations: each of the above with VIX as a regime filter.
- Triples: everything plus a volatility and a momentum leg.
The winning combination was not subtle: SPY trend agreement (a 0-to-3 count of how many of its 20-, 50-, and 200-day moving averages price is above), plus QQQ/SPY ratio momentum, minus a VIX percentile penalty. The r-value against forward 5-day SPY returns was 0.439 on the 2023–2025 in-sample window, and held at 0.41 on the 2025–2026 hold-out. No ML and no tuning: nothing is fitted beyond the lookback windows and the linear weights on the three terms.
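For concreteness, here is one way the three sub-signals could be computed from plain lists of daily closes. This is a sketch under stated assumptions: the 252-session VIX percentile lookback is my placeholder (the post does not give one), and the function names are hypothetical.

```python
def sma(values, n):
    """Simple moving average of the last n values."""
    if len(values) < n:
        raise ValueError(f"need at least {n} values")
    return sum(values[-n:]) / n


def trend_agreement(closes, windows=(20, 50, 200)):
    """Count (0 to 3) of moving averages the latest close sits above."""
    last = closes[-1]
    return sum(1 for n in windows if last > sma(closes, n))


def ratio_momentum(qqq, spy, lookback=20):
    """Fractional change in the QQQ/SPY ratio over `lookback` sessions."""
    if min(len(qqq), len(spy)) <= lookback:
        raise ValueError("series shorter than lookback")
    now = qqq[-1] / spy[-1]
    then = qqq[-1 - lookback] / spy[-1 - lookback]
    return now / then - 1.0


def vix_percentile(vix, lookback=252):
    """Share of the trailing window at or below the latest VIX close (0 to 1).

    The 252-session lookback is an assumed value, not one from the post.
    """
    window = vix[-lookback:]
    return sum(1 for v in window if v <= vix[-1]) / len(window)
```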
For comparison, the original FRED-based factor produced an r-value of 0.003 on the same window, which is close enough to "no correlation" that it was measuring noise.
the chart
The thing that made the difference legible was plotting the new factor against SPY's 5-day forward return, binned into deciles of the factor score. The bottom decile showed consistently negative forward returns (-80 basis points on average over the hold-out window) and the top decile showed consistently positive forward returns (+140 bps on average). The monotonicity of the relationship, with the deciles stepping cleanly up from bottom to top, is what convinced me the factor was doing real work rather than catching a few outlier days.
The FRED-based factor, run through the same decile analysis, had bins that were indistinguishable from one another. The top and bottom deciles differed by 12 basis points, which is noise on a sample of the size we're working with.
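The decile analysis itself is a sort and a grouped mean. A minimal sketch, assuming `factor` and `fwd` are already aligned so that `fwd[i]` is the forward return following `factor[i]`; the names are illustrative.

```python
def decile_means(factor, fwd, bins=10):
    """Mean forward return per factor-score bin, lowest scores first."""
    if len(factor) != len(fwd) or len(factor) < bins:
        raise ValueError("need aligned inputs with at least one point per bin")
    # Rank observations by factor score, then cut the ranking into equal bins.
    order = sorted(range(len(factor)), key=lambda i: factor[i])
    n = len(order)
    means = []
    for b in range(bins):
        idx = order[b * n // bins:(b + 1) * n // bins]
        means.append(sum(fwd[i] for i in idx) / len(idx))
    return means


def is_monotonic_up(means):
    """True when each bin's mean is at least as large as the previous bin's."""
    return all(a <= b for a, b in zip(means, means[1:]))
```

The spread `means[-1] - means[0]` is the top-minus-bottom figure quoted above, and `is_monotonic_up` is a strict version of the "deciles stepped cleanly up" observation. Real data will rarely pass it exactly, so it is better read as a diagnostic than as a gate.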
what changed in the code
The regime factor module now takes three inputs at scan time: the SPY daily bar series, the QQQ daily bar series, and the VIX daily close. It computes the three sub-signals, combines them using the regression coefficients rounded to the nearest 0.05 as weights, and returns a single 0–100 score. The computation is 38 lines of Python. There is no calibration step and no normalization against a moving baseline beyond what's already in the VIX percentile. The whole thing runs in a few milliseconds and has no external data dependencies beyond prices.
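To show the shape of that combination step, here is a sketch that maps the three sub-signals onto a 0–100 score. Everything numeric in it is a placeholder: the weights, the momentum cap, and the rescaling are my assumptions for illustration, since the post gives only the structure (trend plus momentum minus VIX penalty, weights rounded to 0.05).

```python
# Hypothetical weights. The post says only that the real ones are regression
# coefficients rounded to the nearest 0.05; these values are placeholders.
WEIGHTS = {"trend": 0.50, "momentum": 0.30, "vix_penalty": 0.20}


def clamp(x, lo=0.0, hi=1.0):
    return max(lo, min(hi, x))


def regime_score(trend, momentum, vix_pct, weights=WEIGHTS, momentum_cap=0.05):
    """Combine sub-signals into a 0-100 score (illustrative scaling only).

    trend:    0..3 count of moving averages price is above
    momentum: fractional 20-day change in the QQQ/SPY ratio
    vix_pct:  VIX percentile in 0..1
    """
    trend_unit = trend / 3.0
    # Map momentum in [-cap, +cap] onto [0, 1]; the cap is an assumed value.
    momentum_unit = clamp(0.5 + momentum / (2 * momentum_cap))
    raw = (weights["trend"] * trend_unit
           + weights["momentum"] * momentum_unit
           - weights["vix_penalty"] * vix_pct)
    # raw spans [-w_vix, w_trend + w_momentum]; stretch that range to 0..100.
    lo = -weights["vix_penalty"]
    hi = weights["trend"] + weights["momentum"]
    return 100.0 * clamp((raw - lo) / (hi - lo))
```

With these placeholder weights, full trend agreement, strong relative momentum, and a VIX at the bottom of its range score 100, while the opposite extreme scores 0.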
The weight of the regime factor in the overall composite went from 15% (the weight it had earned in the original regression when it was basically a constant) to 22% (the weight it earned in the re-run regression after replacement). That 7-percentage-point shift came at the expense of three other factors — narrative freshness, catalyst density, and IV rank — each of which gave up 2–3 points.
what it taught me about the system
Two things.
First, the problem wasn't that the FRED data was wrong. FRED data is correct for what it describes. The problem was that I'd assigned it a job it couldn't do — provide a daily signal — and built a factor around that assumption without ever checking whether the factor actually varied day over day. A five-minute plot of the factor's output would have caught this two years earlier. I now have a mandatory "plot the factor's distribution and autocorrelation" check in the commit hook that blocks a factor module from being added to the engine until someone looks at the output.
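The hook described above requires a person to look at a plot, but the two things it asks about (spread and persistence) can also be summarized numerically. A sketch of such a summary follows; the thresholds are hypothetical, and this is a stand-in for the idea rather than the hook's actual implementation.

```python
from statistics import mean, pstdev


def lag_autocorr(values, lag=1):
    """Sample autocorrelation at the given lag."""
    n = len(values)
    if n <= lag:
        raise ValueError("series shorter than lag")
    m = mean(values)
    denom = sum((v - m) ** 2 for v in values)
    if denom == 0:
        return 1.0  # a constant series is treated as perfectly persistent
    num = sum((values[i] - m) * (values[i + lag] - m) for i in range(n - lag))
    return num / denom


def factor_health(values, min_std=2.0, max_autocorr=0.99):
    """Summarize spread and day-over-day persistence of a factor's output.

    Both thresholds are placeholder values for illustration.
    """
    std = pstdev(values)
    ac = lag_autocorr(values)
    return {
        "std": std,
        "lag1_autocorr": ac,
        "passes": std >= min_std and ac <= max_autocorr,
    }
```

A factor built from forward-filled monthly data tends to fail on both counts: tiny standard deviation and a lag-1 autocorrelation very close to 1.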
Second, the story the replacement factor tells (that SPY trend agreement plus QQQ/SPY momentum minus a VIX penalty is a reasonable proxy for the market's current regime) is almost embarrassingly simple. It is what a discretionary trader would tell you they look at on a Monday morning. The quantitatively interesting result was not the factor itself; it was that the obvious factor produced a measurable signal and the clever factor produced nothing. There is a temptation, when you build these things, to reach for complexity because complexity feels like effort. In this case less was more. The uncomfortable part is that I spent nine months iterating on the FRED-based version when I could have tried the SPY/VIX version on day one.
what I'd still like to know
The next question, which I haven't answered yet, is whether the replacement factor's edge is stable across regime changes. The hold-out window was mostly in a constructive market. If we hit a real drawdown — the kind of thing the factor is supposed to warn about — does its top-decile advantage collapse, or does it still rank the cleanest days correctly? I don't know yet, and the dataset I need to answer it doesn't fully exist. The current plan is to re-run this analysis once per month and note any drift. If the r-value falls below 0.25 for two consecutive months, I'll write about it.
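The drift rule in that plan (r below 0.25 for two consecutive months) is simple enough to write down. A minimal sketch, assuming the monthly r-values are collected in a list in chronological order; the function name is hypothetical.

```python
def first_drift_month(monthly_r, floor=0.25, run=2):
    """Index of the month that completes `run` consecutive readings below `floor`.

    Returns None if no such streak occurs.
    """
    streak = 0
    for i, r in enumerate(monthly_r):
        streak = streak + 1 if r < floor else 0
        if streak >= run:
            return i
    return None
```

A single weak month resets nothing permanently: the streak counter goes back to zero as soon as one reading is at or above the floor, so only back-to-back misses trigger the alert.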
— —
The ledger reflects the new regime score as of the February 2026 rebuild. If you are looking for the specific date the swap happened, it's in the commit history.