“All models are wrong, but some are useful.” — George Box
The statement above has sparked many arguments. Some think that calling all models wrong is unhelpful.
I agree with Box. It was true in 1976 when he first said it, and it is still true today, decades later.
Complexity Aspect
When I built my first model to predict the winner of the next soccer game, it performed poorly. There were too many unknowns to account for and too many outcomes that were impossible to predict.
I watched the game. Within the first ten minutes, one of the best players was injured, something I had not accounted for in the model.
In an attempt to improve my model, I began to add other variables that were not immediately on my radar — the possibility of injury, weather, and the type of grass in the stadium.
The improvement was insignificant.
Even with today’s computational power, statistical models fall short of the complexities of reality. No model can ever account for all possible outcomes.
Some Are Useful
Box is right.
Instead of trying to get every single aspect of a model correct, we should judge models by their utility rather than by absolute truth.
As Box argued in his paper, the scientist’s real job is identifying what is importantly wrong — not chasing perfection.
Focusing on how much error you can accept is, arguably, more productive than focusing on absolute truth.
In fact, I now emphasize the margin of error in every model I work on. This is critically important in my line of work, where accuracy is expected.
Before I start building the model, I set a threshold for the margin of error acceptable for the project. That alone has saved me hours of over-engineering models that were already useful.
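Here is a minimal sketch of that workflow, not my actual project code. It assumes mean absolute error as the metric, and the function names, candidate models, and numbers are all hypothetical. The idea is to fix the threshold first, then walk through candidate models from simplest to most complex and stop at the first one that is good enough.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def first_good_enough(candidates, actual, max_error):
    """Return (name, error) for the first candidate within max_error, else None.

    candidates: list of (name, predictions), ordered simplest to most complex.
    """
    for name, predicted in candidates:
        error = mean_absolute_error(actual, predicted)
        if error <= max_error:
            return name, error
    return None


# Threshold chosen before any modeling starts (hypothetical value).
MAX_ERROR = 0.5

# Made-up goal counts and made-up predictions, for illustration only.
actual = [2, 1, 3, 0, 2]
candidates = [
    ("baseline", [1, 1, 1, 1, 1]),
    ("with_form", [2, 1, 2, 1, 2]),
    ("with_form_and_weather", [2, 1, 3, 0, 2]),
]

chosen = first_good_enough(candidates, actual, MAX_ERROR)
print(chosen)
```

With these made-up numbers, the search stops at the second candidate. The third fits better, but the second already meets the threshold, so the extra complexity is not needed.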