In statistics and machine learning, double descent is the phenomenon in which a model with a small number of parameters and a model with an extremely large number of parameters both achieve low test error, while a model whose number of parameters is about the same as the number of training data points has a high test error.[2] This phenomenon has been considered surprising, as it contradicts assumptions about overfitting in classical machine learning.[1]
History
Early observations of what would later be called double descent in specific models date back to 1989.[3][4] The term "double descent" was coined in 2019,[1] when the phenomenon as a broader concept shared by many models gained popularity.[5][6][7] The latter development was prompted by a perceived contradiction between the conventional wisdom that too many parameters in a model result in significant overfitting error (an extrapolation of the bias-variance tradeoff) and the empirical observation in the 2010s that some modern machine learning techniques tend to perform better with larger models.[6][8]
Theoretical models
Double descent occurs in linear regression with isotropic Gaussian covariates and isotropic Gaussian noise.[9]
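The effect can be reproduced numerically in a small simulation. The following is a minimal sketch rather than the construction used in the cited work: it assumes a fixed training set size of 40, a ground-truth linear model over 100 isotropic Gaussian features, Gaussian label noise, and a fit that uses only the first p features and takes the minimum-norm least-squares solution (via the pseudoinverse) when p exceeds the number of training points. The function name, sizes, and noise level are illustrative choices, not values from the source.

```python
import numpy as np


def min_norm_test_error(n_train, n_features_used, n_features_total=100,
                        noise_std=0.5, n_test=1000, n_trials=50, seed=0):
    """Average test MSE of minimum-norm least squares on the first
    `n_features_used` features (all sizes here are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    p = n_features_used
    errors = []
    for _ in range(n_trials):
        # Ground-truth coefficients with unit norm.
        beta = rng.standard_normal(n_features_total)
        beta /= np.linalg.norm(beta)

        # Isotropic Gaussian covariates and isotropic Gaussian noise.
        X_train = rng.standard_normal((n_train, n_features_total))
        X_test = rng.standard_normal((n_test, n_features_total))
        y_train = X_train @ beta + noise_std * rng.standard_normal(n_train)
        y_test = X_test @ beta + noise_std * rng.standard_normal(n_test)

        # The pseudoinverse gives ordinary least squares when p < n_train
        # and the minimum-norm interpolating solution when p > n_train.
        beta_hat = np.linalg.pinv(X_train[:, :p]) @ y_train

        errors.append(np.mean((X_test[:, :p] @ beta_hat - y_test) ** 2))
    return float(np.mean(errors))


n_train = 40
widths = [10, 20, 30, 40, 50, 70, 100]
curve = {p: min_norm_test_error(n_train, p) for p in widths}
for p, err in curve.items():
    print(f"p = {p:3d}  test MSE = {err:.3f}")
```

Under these assumptions, the printed test error should rise sharply as p approaches the number of training points (p = 40) and fall again as p grows beyond it. The exact values depend on the seed and the chosen sizes.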
Xiangyu Chang; Yingcong Li; Samet Oymak; Christos Thrampoulidis (2021). "Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks". Proceedings of the AAAI Conference on Artificial Intelligence. 35 (8). arXiv:2012.08749.