Happy to discuss at length, but the term "overfitting" is too squishy. It simply gets applied whenever someone thinks a process yields suboptimal results.

Your example shows an early stopping algorithm, but I guarantee you people will still accuse you of overfitting, and rightfully so, even if you use early stopping. Often this occurs when you have a poorly specified loss function, or a data set so badly chosen that even learning the validation set doesn't generalize.

I find this passage clarifying in describing why overfitting is an underspecified, and unhelpful, term.

This came from Bob Carpenter on the Stan mailing list:

It’s not overfitting so much as model misspecification.

If your model is correct, “overfitting” is impossible. In its usual form, “overfitting” comes from using too weak of a prior distribution.

One might say that “weakness” of a prior distribution is not precisely defined. Then again, neither is “overfitting.” They’re the same thing.

P.S. In response to some discussion in comments: One way to define overfitting is when you have a complicated statistical procedure that gives worse predictions, on average, than a simpler procedure.

Or, since we’re all Bayesians here, we can rephrase: Overfitting is when you have a complicated model that gives worse predictions, on average, than a simpler model.

I’m assuming full Bayes here, not posterior modes or whatever.

Anyway, yes, overfitting can happen. And it happens when the larger model has too weak a prior. After all, the smaller model can be viewed as a version of the larger model, just with a very strong prior that restricts some parameters to be exactly zero.
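To make that last point concrete, here is a rough sketch of my own (not from the quoted post). It assumes a toy linear-Gaussian setup with known noise variance, where the "smaller" model is a straight line and the "larger" model is a degree-5 polynomial. The helper `posterior_mean`, the data, and the prior strengths are all made up for illustration. In this conjugate setup the posterior mean of the coefficients is what the posterior predictive mean uses, so it is enough to look at that. As the prior precision on the extra coefficients grows, the larger model's fit should collapse onto the smaller model's.

```python
import numpy as np

def posterior_mean(X, y, prior_precision, noise_var):
    """Posterior mean of w in y = X w + noise, noise ~ N(0, noise_var),
    with independent priors w_j ~ N(0, 1 / prior_precision[j])."""
    A = X.T @ X / noise_var + np.diag(prior_precision)
    return np.linalg.solve(A, X.T @ y / noise_var)

# Hypothetical toy data: the truth is a straight line plus noise.
rng = np.random.default_rng(0)
n, noise_sd = 30, 0.3
x = rng.uniform(-1.0, 1.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=noise_sd, size=n)

X_big = np.vander(x, 6, increasing=True)  # columns 1, x, x^2, ..., x^5
X_small = X_big[:, :2]                    # columns 1, x

weak = 1e-6  # nearly flat prior on the intercept and slope in both models
w_small = posterior_mean(X_small, y, np.full(2, weak), noise_sd**2)

# Vary only the prior precision on the four extra coefficients (x^2..x^5).
w_big_by_strength = {}
for strength in (1e-6, 1.0, 1e8):
    precision = np.concatenate([np.full(2, weak), np.full(4, strength)])
    w_big_by_strength[strength] = posterior_mean(X_big, y, precision, noise_sd**2)
    print(f"extra-term precision {strength:g}:",
          np.round(w_big_by_strength[strength], 3))

print("smaller model:", np.round(w_small, 3))
```

With the weakest prior the higher-order coefficients are free to chase noise; with the strongest one they are pinned near zero and the first two coefficients match the straight-line fit. Which setting predicts best on new data depends on the problem, which is sort of the point: the "overfitting" question is really a question about the prior.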

Based
