Is k-fold cross-validation worth teaching?

  • 6 May 2026 5:23 PM
    Message # 13628615

    I have been interrogating AI about k-fold cross-validation for a course I am writing. I thought I understood it before, but I understand it better now. AI is really brilliant for this kind of thing. It is like talking to an enthusiastic RA who is smart and knowledgeable, makes some mistakes, but always sucks up to you because you hold the research grant.  ;)

    Anyway, it seems to me that cross-validation is b-s, at least in any application I can think of. So I intend not to teach it, or at least to give it a couple of slides and say not to bother.

    Hear me out.

    CV gives you the accuracy of the modelling strategy, not the model you actually use. It estimates accuracy averaged over different possible models from different random subsets of the data. And then at the end, you apply the modelling strategy to all the data and say that the CV estimate described it. But it doesn’t describe the model; it describes the modelling strategy.
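    To make that distinction concrete, here is a minimal stdlib-only sketch (toy data and a toy one-parameter "classifier", all names hypothetical). The CV score is an average over k *different* fitted models, while the model you ship at the end is a single fit on all the data — the number and the model come from different things:

    ```python
    import random
    from statistics import mean

    random.seed(0)

    # Toy data: label is 1 when the feature exceeds a noisy threshold of 5.
    xs = [random.uniform(0, 10) for _ in range(100)]
    data = [(x, int(x + random.gauss(0, 1) > 5)) for x in xs]

    def fit(train):
        # "Model" = one learned threshold: the midpoint of the two class means.
        m1 = mean(x for x, y in train if y == 1)
        m0 = mean(x for x, y in train if y == 0)
        return (m0 + m1) / 2

    def accuracy(threshold, test):
        return mean(int((x > threshold) == y) for x, y in test)

    def kfold_cv(data, k=5):
        random.shuffle(data)
        folds = [data[i::k] for i in range(k)]
        scores = []
        for i in range(k):
            test = folds[i]
            train = [pt for j, fold in enumerate(folds) if j != i for pt in fold]
            scores.append(accuracy(fit(train), test))  # a *different* model per fold
        return mean(scores)

    cv_estimate = kfold_cv(data)   # accuracy of the *procedure*, averaged over k models
    final_model = fit(data)        # the model actually deployed, fit on all the data
    ```

    Note that `cv_estimate` never touches `final_model`: it is an average over five thresholds fitted to five different 80-point subsets, which is exactly the "modelling strategy, not the model" point.
    
    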

    Now I am a pretty hard core frequentist. But in this case, I want to estimate future prediction accuracy conditional on all the (training) data that led to the model. I do not want to average over models that might have been! 

    It's like estimating my life outcome averaging over different decisions I could have made over the last 68 years.

    That’s why Kaggle just uses a single partition – holding some data out and judging the winner on that held-out test set.
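    The holdout scheme, by contrast, scores exactly one fitted model (a minimal sketch with the same kind of toy data and toy threshold model, names hypothetical):

    ```python
    import random
    from statistics import mean

    random.seed(1)

    # Toy data with a deterministic rule: label is 1 when x > 5.
    data = [(x, int(x > 5)) for x in [random.uniform(0, 10) for _ in range(100)]]

    random.shuffle(data)
    train, test = data[:80], data[80:]   # single train/test partition

    threshold = mean(x for x, _ in train)  # toy "model": split at the training mean
    holdout_acc = mean(int((x > threshold) == y) for x, y in test)
    ```

    Here `holdout_acc` is one score for the one model defined by `threshold` – no averaging over models that might have been, though at the cost of a noisier estimate from a smaller test set.
    
    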

    Help me out here! Have I completely missed the point? Am I becoming a closet Bayesian in my old age?!

    It could happen! Fred Hoyle became a Catholic, after all....

