2 Comments

Based


Happy to discuss at length, but the term "overfitting" is too squishy. It gets applied whenever someone thinks a process yields results they consider suboptimal.

Your example is showing an early-stopping algorithm, but I guarantee people will still accuse it of overfitting, rightfully so, even if you use early stopping. Often this happens when you have a poorly specified loss function or a badly chosen data set, so that even a model that does well on the validation set still doesn't generalize.
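For concreteness, here is a minimal, self-contained sketch of what an early-stopping loop looks like; the data, model, and hyperparameters are all made up for illustration and have nothing to do with the article's actual example.

```python
# Minimal early-stopping sketch: gradient descent on polynomial regression,
# halted when held-out validation loss stops improving. Everything here
# (data, model, patience) is invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: noisy sine curve, split into a small train set and a validation set.
x = rng.uniform(-1, 1, size=200)
y = np.sin(3 * x) + rng.normal(scale=0.3, size=x.shape)
X = np.vander(x, N=10, increasing=True)            # degree-9 polynomial features
X_tr, y_tr, X_val, y_val = X[:40], y[:40], X[40:], y[40:]

w = np.zeros(X.shape[1])
best_w, best_val, patience, bad_epochs = w.copy(), np.inf, 20, 0

for epoch in range(5000):
    # Gradient step on training mean-squared error.
    grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= 0.05 * grad

    # Early stopping: keep the weights with the best validation loss seen so far.
    val_loss = np.mean((X_val @ w - y_val) ** 2)
    if val_loss < best_val:
        best_val, best_w, bad_epochs = val_loss, w.copy(), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:   # stop once validation loss has not
            break                    # improved for `patience` epochs

print(f"stopped at epoch {epoch}, best validation MSE {best_val:.3f}")
```

The commenter's point still applies to a loop like this: if the loss or the validation set is badly chosen, the stopping criterion can be satisfied while the model still fails to generalize.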

I find this passage clarifying for describing why overfitting is an underspecified, and unhelpful, term.

This came from Bob Carpenter on the Stan mailing list:

It’s not overfitting so much as model misspecification.

If your model is correct, “overfitting” is impossible. In its usual form, “overfitting” comes from using too weak of a prior distribution.

One might say that “weakness” of a prior distribution is not precisely defined. Then again, neither is “overfitting.” They’re the same thing.

P.S. In response to some discussion in comments: One way to define overfitting is when you have a complicated statistical procedure that gives worse predictions, on average, than a simpler procedure.

Or, since we’re all Bayesians here, we can rephrase: Overfitting is when you have a complicated model that gives worse predictions, on average, than a simpler model.

I’m assuming full Bayes here, not posterior modes or whatever.

Anyway, yes, overfitting can happen. And it happens when the larger model has too weak a prior. After all, the smaller model can be viewed as a version of the larger model, just with a very strong prior that restricts some parameters to be exactly zero.
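To make that last point concrete, here is a rough sketch of my own (not from the quote): a 50-parameter regression with a weak prior predicts worse out of sample than the same model with a strong prior that pins most coefficients near zero, which is effectively the "simpler" model. It uses MAP/ridge estimates rather than full Bayes, purely to keep the example short.

```python
# Weak vs. strong prior on the same linear model, compared by test MSE.
# MAP estimate under y ~ N(Xw, 1), w_j ~ N(0, 1 / prior_precision_j):
# solve (X'X + diag(prior_precision)) w = X'y.
import numpy as np

rng = np.random.default_rng(1)

def map_fit(X, y, prior_precision):
    A = X.T @ X + np.diag(prior_precision)
    return np.linalg.solve(A, X.T @ y)

# True data-generating process: only the first 2 of 50 features matter.
n, p = 60, 50
w_true = np.zeros(p)
w_true[:2] = [3.0, -2.0]
X_tr, X_te = rng.normal(size=(n, p)), rng.normal(size=(1000, p))
y_tr = X_tr @ w_true + rng.normal(size=n)
y_te = X_te @ w_true + rng.normal(size=1000)

# "Big" model: weak prior on every coefficient.
weak = np.full(p, 1e-3)
# "Small" model: the same big model, but with a very strong prior
# that pins all but the first two coefficients near zero.
strong = np.concatenate([np.full(2, 1e-3), np.full(p - 2, 1e6)])

for name, prec in [("weak prior (big model)", weak),
                   ("strong prior (small model)", strong)]:
    w_hat = map_fit(X_tr, y_tr, prec)
    mse = np.mean((X_te @ w_hat - y_te) ** 2)
    print(f"{name}: test MSE {mse:.2f}")
```

With 60 observations and 50 weakly-regularized coefficients, the "big" model fits noise and predicts worse; tightening the prior on the irrelevant coefficients recovers the simpler model's behavior, which is exactly the reframing of overfitting in the quote above.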
