The 2023 Data Puzzle Is Complete

Assembling thousands of investing datapoints is a lot like solving a puzzle, and there’s really nothing like the feeling of slotting the last piece into place. The tactile sensation of the perfect fit just can’t be beat. And after countless hours of work, you can finally stand back and see the full picture.

As I mentioned when I updated all of the site data in early January, the results were only preliminary as there are always a few data stragglers. Well, the official numbers are finally in. And beyond a few revised datapoints, I also took the opportunity to update the data methodology.

Here’s everything that changed.

Final returns data


The final December numbers are all in. Bonds had a nice last month of the year, and inflation was revised down as well. So the 2023 results for your favorite portfolio may actually have ticked up a notch from when you last checked.

New bond sources


In the past I’ve had to lean heavily on my own models for bond returns, but I recently came across a few new free data sources to supplement my own work with real-world index data. A few of the bond series have now been updated with improved data that should help reduce tracking error overall.

Updated data prioritization


Collecting many years of historical data means assembling each series from a variety of sources. And when putting all of that together with the goal of maximizing accuracy, it’s important to be consistent in how you prioritize each source.

For a while now, I’ve been putting data from real-world ETFs at the top of the priority list, on the assumption that fund data should be realistic rather than just theoretical. However, as I’ve learned more about things like historical expense ratios, the picture has gotten murkier. I’ll save the dirty details for another day, but as just one example consider the case of the Vanguard S&P 500 fund VFIAX.

Vanguard is well known for its low expense ratios, and VFIAX doesn’t disappoint with an ER of only 0.04%. However, if you want older data and use the numbers from VFINX (a different and older share class of the same fund), you may be surprised to learn that the expense ratio back in 1977 was as high as 0.46%. And importantly, the fees are baked into the historical returns. That still may not sound like much, but even within the same basic fund that’s more than an 11x difference in fee-driven tracking error, and it drifts depending on the year you’re observing. And because historical expense ratios are often very hard to find, that error is also difficult to properly correct.
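To put rough numbers on that, here is a minimal Python sketch of the arithmetic. It compares the two expense ratios quoted above and shows how a fee compounds when it is baked into returns. The 30-year horizon and the idea of a constant fee every year are simplifying assumptions for illustration only (real historical ERs changed over time), and the function names are hypothetical.

```python
def fee_ratio(old_er: float, new_er: float) -> float:
    """How many times larger the old expense ratio is than the new one."""
    return old_er / new_er


def cumulative_fee_drag(expense_ratio: float, years: int) -> float:
    """Fraction of ending wealth lost to a fee charged every year.

    Assumes the same fee is deducted multiplicatively each year,
    which is a simplification: real historical ERs drift over time.
    """
    return 1.0 - (1.0 - expense_ratio) ** years


# Expense ratios quoted in the text, written as decimals.
VFINX_1977_ER = 0.0046  # 0.46%
VFIAX_ER = 0.0004       # 0.04%
YEARS = 30              # illustrative horizon, not from the text

ratio = fee_ratio(VFINX_1977_ER, VFIAX_ER)
drag_old = cumulative_fee_drag(VFINX_1977_ER, YEARS)
drag_new = cumulative_fee_drag(VFIAX_ER, YEARS)

print(f"Old ER is {ratio:.1f}x the modern ER")
print(f"{YEARS}-year drag at 0.46%: {drag_old:.1%}")
print(f"{YEARS}-year drag at 0.04%: {drag_new:.1%}")
```

The point of the sketch is only that a small annual difference adds up once it is embedded in a long return series, which is why the era of the source data matters.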

The good news is that there’s an easy solution: focus less on individual funds and more on the underlying index that they are designed to track. In the case of VFIAX that means looking for S&P 500 data, and the same method can be applied to all sources. So I’ve shifted the data prioritization to move clean index data to the top of the list whenever it’s available.
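For readers who think in code, the idea of a source priority list can be sketched in Python as follows. This is only an illustration and not the site's actual pipeline: the source names, the order of everything below "index", and the return values are all placeholders invented for the example.

```python
# Assumed priority order. Only "index data first" comes from the text;
# the relative order of the other sources is a guess for illustration.
PRIORITY = ["index", "fund", "model"]


def merge_by_priority(sources, priority=PRIORITY):
    """For each year, keep the value from the highest-priority source that has one.

    `sources` maps a source name to a {year: annual_return} dict.
    Returns {year: (annual_return, source_name)} in ascending year order.
    """
    merged = {}
    all_years = sorted({year for series in sources.values() for year in series})
    for year in all_years:
        for name in priority:
            value = sources.get(name, {}).get(year)
            if value is not None:
                merged[year] = (value, name)
                break
    return merged


# Placeholder numbers, not real market data.
example_sources = {
    "index": {2001: 0.02, 2002: 0.03},
    "fund": {2000: 0.011, 2001: 0.019, 2002: 0.028},
    "model": {1999: 0.01, 2000: 0.012},
}

merged = merge_by_priority(example_sources)
for year, (value, name) in merged.items():
    print(year, value, name)
```

In this toy example the index series wins wherever it exists, and the fund and model series only fill the years the index does not cover.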

For anyone interested in the details, you can read about my data methodology on the Data Sources page. Feel free to contact me if you have any questions or suggestions for further improvements. And if digging into the fine print of the instruction manual isn’t your thing, the most important takeaway is this:

I always strive to keep the data as accurate and realistic as possible.

Also, just because I’m looking to minimize the effect of ERs in the source data doesn’t mean they aren’t important. As a reminder, all of the site tools account for fund expenses! They simply deduct the modern ER (or a custom fee of your choosing) uniformly from the historical index returns so that the results are more applicable to you.
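As a rough sketch of what deducting a fee from index returns could look like, the Python below subtracts one flat expense ratio from every year of a return series. Plain subtraction is an assumption here, since the exact formula the site tools use is not spelled out above, and the return values are placeholders rather than real data.

```python
def apply_expense_ratio(index_returns, expense_ratio):
    """Subtract the same annual fee from every year's index return.

    Simple subtraction is an assumed convention for this sketch;
    the site's tools may apply fees differently.
    """
    return [r - expense_ratio for r in index_returns]


# Placeholder annual index returns as decimals, not real data.
gross_returns = [0.10, -0.05, 0.07]
MODERN_ER = 0.0004  # the 0.04% VFIAX expense ratio mentioned earlier

net_returns = apply_expense_ratio(gross_returns, MODERN_ER)
print(net_returns)
```

Because the same fee is applied to every year, the adjustment does not depend on what funds happened to charge in any particular era.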

New data, new challenges


As you can imagine, compiling so much data takes a lot of time. But the great thing about wrapping everything up is that it finally frees me up to dig into the numbers for new insights. I have a few cool ideas on the short list, but this also seems like a good opportunity to invite feedback.

What would you like to know?

My inbox is wide open to suggestions, and you can reach me by email, Discord, or Twitter. I look forward to your ideas! Let’s put our minds together and make it our goal to learn something new in 2024.

