How and why overparameterized models behave the way they do remains a mystery. It is especially intriguing that they can generalize well despite their excessive capacity, and that highly sparse neural networks can still match the performance of their dense counterparts (as suggested by the lottery ticket hypothesis).
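As a concrete illustration of the kind of sparsity involved here, below is a minimal sketch of one-shot magnitude pruning, the sparsification technique underlying lottery-ticket-style experiments: after training, the smallest-magnitude weights are simply zeroed out. The weights and the 90% sparsity level are illustrative stand-ins, not results from any specific paper.

```python
import random

random.seed(0)
# Stand-in for a trained weight tensor, flattened to a list.
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

def magnitude_prune(w, sparsity):
    """Zero out the `sparsity` fraction of weights smallest in magnitude."""
    k = int(sparsity * len(w))
    threshold = sorted(abs(x) for x in w)[k]
    return [x if abs(x) >= threshold else 0.0 for x in w]

sparse = magnitude_prune(weights, 0.9)
kept = sum(1 for x in sparse if x != 0.0)
print(kept / len(sparse))  # fraction of weights kept: 0.1
```

The surprising empirical finding is that networks pruned this way (often iteratively, with weights rewound to their initial values) can be retrained to accuracy comparable to the dense original.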

Over the past few months, I have been intrigued by the possible relationship between model generalization and sparsity. The slides below give a brief review of several related works and the questions I cared about. Luckily, I am now able to answer a few of them with my own research (see sparse double descent).

[Embedded slides: a brief review of related works on sparsity and generalization]