The Human Complexities of Correcting the Record

In a recent article on his website Early Retirement Now, Karsten Jeske extended his long-running series on safe withdrawal rates with a new entry detailing his perspective on the dangers of expecting small cap value stocks to help modern portfolios. This is not the first time he has expressed doubt about the small and value premiums, but in this case he also used the Golden Butterfly as an example to warn against using historical small cap value data to make educated retirement decisions.

I believe there is plenty of room for differing opinions in the personal investing space, and I am not normally the type to reflexively reply to every criticism. That said, the article raises several interesting points that I believe are worth discussing. On some things I agree with Karsten. On others we clearly have very different philosophies when it comes to the best use of data. And on at least one issue, I believe the article is misleading and requires a balancing explanation.

Just to be clear from the start — while we may disagree on some things, my goal is not to lob rhetorical grenades or participate in petty internet fights. I simply plan to share my own unique perspective to help you see another side to the story. No drama. Just real talk about how to interpret historical data.

So no matter whether you love small cap value stocks or think the value premium is ancient history, let’s all lay down our arms and talk about the best way to approach the numbers in front of us.

The Declining Value Premium


To get things started, there’s one important point where we are in agreement — the small cap value premium has very clearly eroded since the mid 1970s.

Here’s a chart I made that shows the relative rolling 10-year returns of small cap value versus large cap blend. For detail-oriented readers, note that my data tracks real-world returns from index providers rather than factor regressions. That actually makes my SCV data more conservative, as it more closely tracks a fund like VBR that is less heavily weighted towards small and value than ERN’s chosen factor model.

Chart: small cap value premiums vs large cap blend

Since this chart shows the difference between SCV and LCB rolling returns, a blue column means that small cap value did better while red means that large cap blend prevailed.
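For readers who want to see the mechanics behind a chart like this, here is a minimal Python sketch of a rolling return difference. The function names and the sample returns are hypothetical placeholders for illustration, not my actual dataset or charting code, and the window is shortened to 2 years only to keep the example small.

```python
def rolling_cagr(returns, window):
    """Annualized compound return for every consecutive `window`-year span."""
    results = []
    for start in range(len(returns) - window + 1):
        growth = 1.0
        for r in returns[start:start + window]:
            growth *= 1.0 + r
        results.append(growth ** (1.0 / window) - 1.0)
    return results


def rolling_premium(scv, lcb, window=10):
    """SCV rolling return minus LCB rolling return.

    Positive values correspond to blue columns (SCV did better),
    negative values to red columns (LCB prevailed).
    """
    return [s - l for s, l in zip(rolling_cagr(scv, window),
                                  rolling_cagr(lcb, window))]


# Hypothetical annual returns, NOT real market data.
scv = [0.30, 0.10, -0.20, 0.05]
lcb = [0.10, 0.10, 0.10, 0.10]

premium = rolling_premium(scv, lcb, window=2)
for value in premium:
    color = "blue" if value > 0 else "red"
    print(f"{value:+.2%} ({color})")
```

With real data you would pass full annual return histories and leave the window at 10 years.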

See the big peak around 1975? That’s the type of outperformance in the historical record that factor skeptics are right to call out. Just like it would be inappropriate to base your expectations for stock performance on the US bubble in the 80’s and 90’s, it is a bad idea to base your expectations for the small and value premiums looking only at the 1970s. Beyond that peak, you can also see a general decline in small cap value performance relative to large cap blend over time. While it still outperformed large cap blend in about 3 out of every 4 rolling timeframes, the magnitude of that outperformance has fallen over the years.

Next, if you look at the red to the right you can see why factor investors have been stressed out since 2009 and critics have become more vocal. As huge tech growth stocks have taken off since the 2008 financial crisis and only accelerated in recent years, large cap growth has been the place to be. So yeah, small cap value is clearly not a magic bullet for constant outperformance.

Where I depart a bit from ERN’s opinion is how to interpret this observation. He leans towards the position that factor premiums have been dead for ages and are at best much smaller looking forward than anything you see in the record. I look at data like this and see a cycle with peaks and valleys. We’re clearly experiencing a low point for SCV, but I also imagine that investors probably felt the same way in the 1990s, when the dot-com growth boom produced very similar factor underperformance. We know how that turned out.

SCV wasn’t dead then, and I don’t believe it is irreparably broken now. It’s likely only a matter of time before the cycle repeats. But I do think it’s fair to think about how much of a premium to expect in the good years looking forward.

That said, it’s fine for people to look at the same data and come to different conclusions. So I have no problem with anyone who sees things differently. There are plenty of observers on both sides of this topic.

The Need for Altering Data


Another issue where I’m sympathetic is the need in certain circumstances to adjust the historical record. For example, both ERN and Portfolio Charts utilize Fama French data in our research. Raw Fama French numbers, however, are not always very useful on their own and require further massaging to get something that represents a true history that investors actually experienced outside of an academic research setting.

Portfolio Charts and Early Retirement Now take different paths to the same destination. I find ERN’s detailed numbers on factor loadings in different ETFs to be quite interesting and helpful as a way of modeling reasonably realistic fund-specific histories. Personally, I prefer to use the data to back out the underlying index returns that went into constructing the Fama French database. But honestly both mindsets have their place and the goal is the same: to accurately present data that represents investing choices you and I can make today.

Now where our philosophies differ significantly is just how far we’re willing to go to massage historical data. I will do everything I can to create numbers that I can prove to be historically accurate and dutifully track real-world investment options because I want to have a solid foundation to see where the data leads. But when it comes to small cap value, ERN has a different mindset.

The main thrust of the article is a long, technical explanation of sophisticated data manipulation techniques. But the end goal is clearly stated — to remove the “ill-gotten excess returns”, and replace the recorded history with new “realistic” numbers based on a forward-looking Vanguard projection that factor skeptics agree with. Think of it as giving all of the tall blue columns in the previous chart a major buzz cut. Not just looking forward, but looking backward, too.

Perhaps this is a common thing in academic circles, but rewriting financial history to support one’s personal viewpoint is just not something I respond to. And even if the premise is 100% correct, I would argue the solution is unnecessary.

Let’s assume for a moment that the factor skepticism is fully justified. For people like me who prefer accurate histories to data models, here’s a quick example of how to use the Portfolio Charts tools to temper expectations.

How to Interpret SCV Charts


Sticking with the retirement topic, for this example I made two Withdrawal Rates charts that find the 30-year safe withdrawal rates in the United States for a portfolio of 75% US stocks and 25% intermediate term treasury bonds, the preferred benchmark in the article. On the left, I used large cap blend for the stocks. And on the right, I used small cap value. Move the slider to see the difference.

For a full explanation of this chart, read How to Harness the Flowing Nature of Withdrawal Rate Math

A 4% SWR for the LCB portfolio matches the well-known rule of thumb first established by Bill Bengen and replicated by many others. So it is no surprise at all. However, a 5.9% SWR using the same methodology but substituting SCV does look very appealing in comparison.
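If the idea of a safe withdrawal rate feels abstract, a small sketch may help. The Python below assumes one common simplification: a constant inflation-adjusted withdrawal taken at the start of each year, with the "safe" rate defined as the one that drains the portfolio to exactly zero in the worst historical start year. The function names and the sample returns are hypothetical, and this is not a description of the exact method behind the charts.

```python
def safe_withdrawal_rate(real_returns):
    """Constant real withdrawal (as a fraction of the starting balance),
    taken at the start of each year, that leaves exactly zero at the end."""
    growth = 1.0        # cumulative real growth before the current year
    discount_sum = 0.0  # sum of 1 / growth over all withdrawal dates
    for r in real_returns:
        discount_sum += 1.0 / growth
        growth *= 1.0 + r
    return 1.0 / discount_sum


def ending_balance(real_returns, rate):
    """Simulate the same withdrawal schedule to check the result."""
    balance = 1.0
    for r in real_returns:
        balance = (balance - rate) * (1.0 + r)
    return balance


def worst_case_swr(real_returns, years):
    """Lowest sustainable rate across every start year with a full history."""
    starts = range(len(real_returns) - years + 1)
    return min(safe_withdrawal_rate(real_returns[s:s + years]) for s in starts)


# Hypothetical real (inflation-adjusted) annual returns, NOT market data.
path = [0.05, -0.20, 0.10, 0.02, -0.05, 0.08]
worst = worst_case_swr(path, years=3)
print(f"worst-case 3-year rate: {worst:.2%}")
```

The worst start year, not the average one, sets the number. That is why a few bad recent cohorts can pull an entire SWR line down.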

Now slide it all the way to the left and look at the full chart for the SCV portfolio. See that hitch in the orange line? It indicates that small cap value has performed worse in recent years and that those newer timeframes are establishing the all-time low. It’s also possible that even newer paths with less than 15 years of data could set another new low in the future. So yes, the trend in small cap value has an impact on withdrawal rates.

Assuming for the sake of argument that the small and value premiums are truly dead, the most useful SWR using historical data is the simple 4% number using large cap blend with no small or value tilts at all. So putting both charts together, a safe number to plan for is somewhere between 4% and 5.9%, with the precise value depending largely on one’s faith in the argument for small cap value. To split the difference, let’s call it 5%.

While that may not sound very precise, just realize that it’s no less accurate than calculating an SWR to the second decimal from a dataset completely remade with one’s own assumptions. It’s also about the same as the long term withdrawal rate that both the safe and perpetual rates approach over very long timeframes. That makes it a reasonable planning number regardless of your portfolio: conservative, without going overboard in fear.

No fancy models from Nobel laureates required. Just good data and a bit of common sense. And the benefit of preserving the true history rather than sweeping it all away is that you can also continue studying those lines as they develop and adjust accordingly.

The Benefits of Diversification


Speaking of common sense, here’s a quick experiment to test yours.

This table contains the 30-year safe withdrawal rates in the previous example according to the stock choice for the 75% portion.

30-year SWR for a 75/25 portfolio by stock type

Large Cap Blend    Small Cap Value
4.0%               5.9%

Instead of thinking only in binary all or nothing terms, guess what the withdrawal rate would be for a portfolio that divides the stock portion equally between LCB and SCV. Truly think about it and pick a number before checking the answer.

The answer is 5.4%.

Surprised?

One of the things often lost in debates about factors is that the benefits of asset classes like small cap value go beyond the simple theoretical factor premiums. Small cap value stocks tend to represent different industries than large cap blend stocks and can thus react differently to various market events. By owning both, one can capture a pretty decent rebalancing bonus from the simple process of selling some of the one that is high to buy more of the one that is low. That’s how mixing the two produces a 5.4% SWR, noticeably better than the 4.95% simple average of the two individual rates that you might intuitively expect.

For a longer explanation of how this works, check out my article on Shannon’s Demon. But the short story is that regularly rebalancing multiple volatile assets can create a portfolio that is greater than the simple sum of its parts.
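To make the rebalancing effect concrete, here is a toy Python sketch. The two return series are hypothetical and deliberately extreme: each asset alternately doubles and halves, out of phase with the other. They are not SCV or LCB data, and real assets are nowhere near this tidy, but the mechanics are the same.

```python
def grow(weights, period_returns):
    """Final value of $1 that is rebalanced back to `weights`
    at the start of every period."""
    value = 1.0
    for returns in period_returns:
        value *= sum(w * (1.0 + r) for w, r in zip(weights, returns))
    return value


# Hypothetical returns for (asset A, asset B) over ten periods.
# Each asset alternates between +100% and -50%, so either one
# held alone ends exactly where it started.
periods = [(1.0, -0.5), (-0.5, 1.0)] * 5

only_a = grow((1.0, 0.0), periods)
only_b = grow((0.0, 1.0), periods)
mixed = grow((0.5, 0.5), periods)

print(f"100% A: {only_a:.2f}")
print(f"100% B: {only_b:.2f}")
print(f"50/50 rebalanced: {mixed:.2f}")
```

Neither asset goes anywhere on its own, yet the rebalanced mix compounds at 25% per period because every rebalance sells the recent winner and buys the recent loser.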

This phenomenon is important to this topic because it starts to hint at a few things omitted from ERN’s analysis. Yes, the article compares portfolios including LCB to others that split the money between LCB and SCV. But even though it’s ostensibly a post on factor premiums, it never actually goes full SCV. Instead, just like a previous article that discussed a Paul Merriman portfolio (a newer version of the Ultimate Buy and Hold Portfolio) when taking a similar anti-SCV position, it pivots to a well-diversified portfolio where factors are only one part of the equation.

While that may sound like nitpicking at first, it presents a real problem when transitioning from factor debates to portfolio discussions. Attributing the “ill-gotten excess returns” in the withdrawal rates of diversified portfolios that happen to contain small cap value stocks purely to factors does not tell the whole story. There’s more to it than that, and it’s important to genuinely engage the full mechanics of the portfolio being studied. A lot of those excess portfolio returns may be well justified.

A Few Small Details


On that note, we get to the part where the article discusses the Golden Butterfly. Despite what you may expect, I’m not going to defend it as the greatest portfolio to rule them all. I share information on 20+ portfolios and offer tools to study infinitely more for a reason. Everyone is different, and there’s no one investing style suitable for all people.

I’m also not going to belabor a defense on certain points where I don’t believe the article does my work justice. Luckily, I have been writing about withdrawal rates for long enough that I can just point to a few resources for people who want my side of the story.

  • On the issue of the shorter Portfolio Charts dataset since 1970, first note that the 4% number I calculated above for the 75/25 ERN portfolio is less than 0.2% higher than the one he calculates since 1900. That’s within the 0.3% error range I found when calibrating my numbers against several other independent retirement researchers with much longer datasets, and also within the normal range of error for different sources of the same asset data. I cover all of that (and much more) in the Withdrawal Rates FAQ.
  • On the claim that the worst retirement start dates were in 1929, 1937, and the 1960s, that’s actually a sign of data conditioning from someone who has been looking at the same two US asset options for too long. Every portfolio has its own unique worst case year. That applies not only to different asset classes in the US but especially to international portfolios. For even more comparisons to withdrawal rates calculated from long histories outside the US, check out the “calibration” section of the Global Withdrawal Rates page.
  • On the claim that the Golden Butterfly was optimized specifically with the December 1972 cohort in mind, I have absolutely no clue where that’s coming from. The entire concept is built around consistency in every timeframe, and the track record backs it up. For a thorough walkthrough, read The Theory Behind the Golden Butterfly.

But really, those are all small things tertiary to the topic at hand and no big deal. It’s all well documented for anyone who has questions.

Errors of Omission


With that out of the way, the real problem arises when the article starts comparing numbers and introduces a sleight of hand that most people won’t immediately recognize. Perhaps anticipating some pushback, it takes care to mention this note early on in the Golden Butterfly section.

“One caveat: US investors were not allowed to own gold between 1933 and 1974. So, take the simulations with a grain of salt.”

While that is factually true, it misses the point entirely.

The big elephant in the room is not that gold was illegal to hold in the US. It’s that gold — by law, not markets — had zero return in the US from 1834 to 1971 and also globally under the Bretton Woods treaty. Gold was the foundation underlying currencies worldwide with every Dollar, Pound, and Deutschmark corresponding either directly or indirectly to a fixed weight in gold. That’s what it means when a currency is based on a gold standard. And most importantly, that economic system ended in 1971 and no longer exists today.

For a full explanation of the gold history and why it is critical to understand the historical context when interpreting backtests including gold, read Metal, Money, and the Measurable Value of Gold. But for a quick summary relevant to this conversation, check out this gold price chart using data pulled directly from the ERN SWR Toolbox.

Think about how volatile gold is today, currently up over 25% this year alone. Now look at how it maintained a perfect return of exactly zero for 50 years, experienced a step function up in the 1930’s when the US deliberately devalued its own currency, and moved only slightly during the Bretton Woods period based solely on the government tinkering with minor changes in negotiated exchange rates. Then look at 1971, when it took off like a rocket after Nixon ended gold convertibility.

While we’re here, also note that the flat return started decades before private US citizens were disallowed from owning gold in 1933. And the gold price started moving 3 years before they were allowed to own it again. This clearly demonstrates that ownership legality wasn’t the driving force in the price behavior. It was all about the gold standard.

Even if you know nothing about monetary history, it’s easy to see that the gold data before and after 1971 represents two completely different assets. In fact, flipping the switch from a zero-return cash equivalent to one of the more volatile assets you can buy is a MUCH bigger change than any slow fade in small cap value. And the impact on withdrawal rate calculations using datasets where most of the gold data does not even represent a free market cannot be overstated.

If one wants to model the impact of gold in a portfolio under a gold monetary standard like the one that existed prior to 1971, there are two choices. First, you can load the model with gold data that had no return by law. And second, you can leave that portion of the portfolio completely empty as if you simply stuffed uninvested dollar bills in a vault. The reason for that is simple — under a gold standard, gold bullion is legal tender just like paper money.

If you don’t believe me, it is easy to test using ERN’s spreadsheet.* Looking only at the first five decades where the worst case 30-year SWR calculations do not touch the unlocked gold data after 1971, here’s how the withdrawal rate numbers compare for portfolios of 100% gold versus 100% nothing.

30-year Safe Withdrawal Rates

Decade    100% Gold    100% Nothing
1900s     1.93%        1.93%
1910s     2.22%        1.98%
1920s     4.50%        2.84%
1930s     1.87%        1.78%
1940s     1.80%        1.74%

Trust, but verify. See the end of the section for how to replicate this on your own.

Check out the 1900s row. That’s the one that corresponds to the section of the gold history in the previous chart with a long straight horizontal line with no return. And the results of investing in gold and in literally nothing are identical.

If you’re wondering where the larger difference in the 1920’s comes from, that’s simply the result of the one-time larger gold revaluation around 1930 early on in the retirement runs. And the small deviations in the other cohorts are from tertiary effects of ongoing currency adjustments, nothing more.
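The gold-versus-nothing comparison can also be sketched in a few lines of Python. Everything here is an illustrative assumption: the constant 2% inflation, the 30-year horizon, the size and timing of the one-time revaluation, and the function names. It is not the ERN spreadsheet logic or real price data, just the same idea in miniature.

```python
def real_returns(nominal_returns, inflation):
    """Convert nominal annual returns to inflation-adjusted returns."""
    return [(1.0 + n) / (1.0 + i) - 1.0
            for n, i in zip(nominal_returns, inflation)]


def swr(real):
    """Constant real withdrawal, taken at the start of each year,
    that leaves exactly zero at the end of the path."""
    growth, discount_sum = 1.0, 0.0
    for r in real:
        discount_sum += 1.0 / growth
        growth *= 1.0 + r
    return 1.0 / discount_sum


years = 30
inflation = [0.02] * years       # hypothetical constant inflation

pegged_gold = [0.0] * years      # price fixed by law: zero nominal return
cash_in_vault = [0.0] * years    # uninvested dollars: zero nominal return
# Hypothetical one-time official revaluation in year 6, flat otherwise.
revalued_gold = [0.0] * 5 + [0.5] + [0.0] * 24

gold_swr = swr(real_returns(pegged_gold, inflation))
cash_swr = swr(real_returns(cash_in_vault, inflation))
revalued_swr = swr(real_returns(revalued_gold, inflation))

print(f"pegged gold: {gold_swr:.2%}")
print(f"cash in a vault: {cash_swr:.2%}")
print(f"gold with one revaluation: {revalued_swr:.2%}")
```

Pegged gold and idle cash produce the same number because they are the same return series, and a single official revaluation early in retirement is enough to open a gap like the one in the 1920s row.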

This is where the comparison between the 75/25 portfolio and the Golden Butterfly in the article starts to fall apart. All of those early cohorts before 1971 are what drive down the withdrawal rates for the Golden Butterfly. The article attributes the disappointing numbers to two things: using a longer history with more worst-case years, and “correcting” the small cap value history with more “appropriate” returns. But in reality, it’s mostly due to an inappropriate use of gold data from an old economic system that no longer applies today.

To see how much zeroing out gold returns under a monetary gold standard affects the numbers, here’s a similar comparison I ran for the Golden Butterfly since 1970. The first chart uses the true gold history, and the second uses a flat zero-return gold history that applies under a gold standard. Drag the slider to see the difference.

That’s a drop of 1.7 percentage points in the safe withdrawal rate due solely to the monetary system applied to the gold data. And it has absolutely nothing to do with dataset size or small cap value.

With that in mind, let’s circle back to the comparison in the article. It points to the 1930s as proof that the Golden Butterfly didn’t perform nearly as well as the 75/25 portfolio even before you “fix” the small cap value data. But if one credits back anything close to the haircut caused by the huge difference in gold economic regimes, the conclusion would be quite different.

50-year SWRs starting in the 1930s

Portfolio                                                                50-year SWR
75% LCB, 25% IT Bonds                                                    3.87%
Golden Butterfly                                                         3.31%
Golden Butterfly, but use HP-Filtered SMB+HML plus 1.20%/0.80% alpha     2.87%
Same GB with ERN adjustments, but accounting for a free gold market      4.47%

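Data from ERN calculations. For the last row, I added the gold-regime haircut back to the ERN-adjusted Golden Butterfly number, using the slightly smaller 1.6 delta in LTWR to account for the longer 50-year timeframe (2.87% + 1.6% = 4.47%).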

Long story short, most of the cited withdrawal rate differences in the Golden Butterfly can be attributed NOT to small cap value but to blind use of inappropriate gold data from an antiquated economic system that has zero bearing on modern investors. Even if the top-line hypothesis about a diminishing small cap value premium is completely right, the example proves nothing but the prevalence of blind spots.

The irony is worth repeating. In the process of explaining why it’s important to correct the historical record so that it excludes small cap value data that it argues no longer applies today, the article cites a case study using gold data that provably no longer applies today.

It’s ok. We’re all human. I also have an open mind, and I look forward to future ERN articles on the small cap value topic that provide better examples without the unnecessary noise. I’d also love to see an update to an older article on gold to address the same issue discussed above. Karsten came very close to approving of gold in a portfolio there, even without realizing how much the defunct gold standard depressed the SWR calculations. I think further clarification could be enlightening.

But in the meantime, just keep this in mind when researching portfolios like the Golden Butterfly for yourself. Context is important. And sometimes the things left unsaid are the points that matter the most.

* How to replicate the gold vs. nothing numbers

Because the ERN spreadsheet applies inflation to each asset and not at the portfolio level, it’s not as simple as entering a null portfolio with no assets. Go to the Asset Returns sheet, look for the commodities data in column M, and enter a 1 in every row. That will create a dummy asset option that represents nothing but uninvested cash still subject to inflation. Now you can directly compare gold against uninvested cash by using the commodities input.

Lessons in Self Awareness


If there’s one thing I want people to take away from this article, it’s not that I have a strong opinion in the small cap value debate (I can see both sides), that I feel the need to stan for the Golden Butterfly (you can take it or leave it; there are many other good options), or that I have a bone to pick with a fellow finance blogger (I genuinely hope Karsten reads this in the tone of mutual respect that I intend). Instead, let it be this.

All people — no matter how intelligent, qualified, and thoughtful — have their own biases and blind spots.

I know I do. One could argue that I have a bias towards risk parity style portfolios, and that I might have a blind spot where I am too accepting of the documented historical record. But I also try to temper those instincts to the best of my ability because insecurity is in my nature.

Maybe it’s the engineering background where failures often occur in unexpected ways or just a terminal case of imposter syndrome, but I’m always scared to death of what I don’t know. That’s why I make it a point to read outside opinions with an open mind, triple check my own calculations against as many independent sources as possible, and generally try to stay humble and avoid hard claims of thought superiority. The pursuit of knowledge is an unending process, and it requires effort to overcome normal human impulses to call it a day and assume you have figured it all out.

As bright and qualified as Karsten is, he has a few biases and blind spots, too. There’s arguably a bias against small cap value to the point of feeling the need to rewrite history, and also a blind spot with gold where properly studying it requires more nuance and shorter dataset lengths than he’s normally inclined to indulge. To be clear, he contributes so much more in the finance sphere that the occasional missteps do not negate the overall good he has accomplished. Because I have a different perspective, I just find some of his arguments more convincing than others.

Make no mistake, it also extends elsewhere in finance discussions. Interestingly, one more point where Karsten and I agree is that it’s a head scratcher when smart researchers like Scott Cederburg and team work so hard to create a new composite global data series, one that no investor in history ever experienced, in order to support a clearly stated bias against market histories in the United States. I believe the Cederburg research also has its own blind spot with international bonds, which I cover in my writeup of global withdrawal rates. But again, that’s not a shot. Every researcher has their own angle and target audience, which is what makes us all unique.

It’s for that reason that this article took longer than normal to write. I chose my words carefully, knowing full well that it is bound to make some people unhappy. And I’m not even really talking about fellow retirement researchers, but everyday fans of their work. One thing I have learned over the years writing about investing and participating in countless conversations online is that people tend to gravitate towards financial writers who share the same opinions they already have.

If factor investing sounds appealing, you probably like books by Larry Swedroe or podcasts by Ben Felix. If you like modern portfolio theory or unique assets beyond the standard stock/bond fare, you probably read Portfolio Charts or listen to Frank Vasquez. If you’re naturally inclined to invest heavily in global stocks, you probably seek out every Scott Cederburg interview. And if you enjoy sophisticated withdrawal rate analysis with an eye on valuations (but not the value factor), you’re probably a big fan of Karsten Jeske. You get the idea.

The same process works in the negative. If you just don’t care for things like small cap value or gold, you might instinctively be overly dismissive of any portfolio that contains them or any calculation that shows them in a positive light. And you probably accept any example that supports your own position without scrutiny. It’s just human nature.

You’re biased, too. We all are.

So here is my challenge to people who made it this far.

If there is a financial thesis you strongly believe in, take a moment to seriously consider the other side of the argument. Don’t buy the first reasonable sounding explanation that supports what you already believe, and don’t assume that every example tells the whole truth. No one person will have the entire story, and jumping to conclusions only shuts down the learning process.

To truly build knowledge, it is important to exercise the complementary cognitive skills of input collection and discernment. Frank Vasquez often references a Bruce Lee quote that I think is appropriate here.

“Absorb what is useful, discard what is useless, and add what is uniquely your own.”

That applies to so many things, and finance is no exception. Even if it’s just a bit of philosophy and you’re still working through the numerical back and forth, I hope I contributed something useful today worth absorbing.

