Buy from Amazon.

Buy from Cambridge University Press.

Download a draft from the arXiv.

Homepage: deeplearningtheory.com


An overview of the material in *The Principles of Deep Learning Theory* (PDLT) was given virtually by Dan Roberts and Sho Yaida in 2021 at the Princeton Deep Learning Theory Summer School.

TL;DW Dan gives an overview of the course and begins a discussion of training dynamics by covering linear models and kernel methods.
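As a companion to this lecture, here is a minimal sketch of the kind of linear-in-parameters model it covers: kernel ridge regression on toy 1D data. The Gaussian kernel, the data, and the ridge value are my own illustrative choices, not code from the lecture.

```python
import numpy as np

def kernel(x1, x2):
    # Gaussian (RBF) kernel between two sets of inputs; bandwidth 1 is arbitrary.
    d2 = np.sum((x1[:, None, :] - x2[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / 2.0)

rng = np.random.default_rng(0)
x_train = rng.normal(size=(20, 1))
y_train = np.sin(x_train[:, 0])

# Kernel ridge regression: solve (K + lambda * I) alpha = y,
# then predict at a new point with k(x, x_train) @ alpha.
K = kernel(x_train, x_train)
alpha = np.linalg.solve(K + 1e-3 * np.eye(len(K)), y_train)

x_test = np.array([[0.1]])
y_pred = kernel(x_test, x_train) @ alpha
```

The solution is linear in the training targets, which is what makes the training dynamics of such models exactly solvable.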

TL;DW Dan introduces the quadratic model as a minimal model of representation learning, and uses gradient descent to solve the training dynamics. This extends kernel methods to “nearly-kernel methods.”
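A toy version of such a quadratic model can be trained directly. In this sketch (my own, not from the lecture) the model output is linear in the features plus a small quadratic correction in the parameters; the features, targets, and coupling `eps` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_feat, n_samp = 5, 8
phi = rng.normal(size=(n_samp, n_feat))          # linear features
psi = rng.normal(size=(n_samp, n_feat, n_feat))  # quadratic features
psi = (psi + psi.transpose(0, 2, 1)) / 2.0       # symmetrize
y = rng.normal(size=n_samp)
eps = 0.1  # strength of the quadratic correction; eps = 0 recovers a linear model

theta = np.zeros(n_feat)
lr = 0.01
for _ in range(1000):
    # Model output: z_a = phi_a . theta + (eps/2) theta^T psi_a theta
    z = phi @ theta + 0.5 * eps * np.einsum("aij,i,j->a", psi, theta, theta)
    dz = phi + eps * np.einsum("aij,j->ai", psi, theta)  # dz_a / dtheta_i
    # Gradient descent on the MSE loss 0.5 * sum_a (z_a - y_a)^2
    theta -= lr * (z - y) @ dz

loss = 0.5 * np.sum((z - y) ** 2)
```

Because the output depends quadratically on the parameters, the effective kernel itself evolves during training, which is the minimal mechanism for representation learning.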

TL;DW Sho explains how to recursively compute the statistics of a deep and finite-width MLP at initialization. Due to the principle of sparsity, the distribution of the network output is tractable.
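The simplest instance of such a recursion is the layer-to-layer flow of the preactivation variance in an infinitely wide MLP. The sketch below (assumptions mine: tanh activations, Monte Carlo evaluation of the Gaussian expectation, arbitrary $C_b$, $C_W$, and depth) iterates $K_{\ell+1} = C_b + C_W\, \mathbb{E}_{z \sim \mathcal{N}(0, K_\ell)}[\tanh(z)^2]$.

```python
import numpy as np

rng = np.random.default_rng(2)
z_std_normal = rng.normal(size=200_000)  # standard-normal samples, reused each layer

def next_K(K, C_b, C_W):
    # One step of the variance recursion, with the Gaussian expectation
    # estimated by Monte Carlo: z ~ N(0, K) is sqrt(K) times a standard normal.
    z = np.sqrt(K) * z_std_normal
    return C_b + C_W * np.mean(np.tanh(z) ** 2)

K = 1.0  # variance of the first-layer preactivations (arbitrary start)
for layer in range(10):
    K = next_K(K, C_b=0.0, C_W=1.5)
```

Tracking a handful of such low-order statistics, rather than the full distribution, is what the principle of sparsity makes possible.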

TL;DW Sho solves the layer-to-layer recursions derived in the previous lecture using the principle of criticality. We learn that the leading finite-width effects scale like the depth-to-width ratio of the network.
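Criticality is easiest to see for ReLU, where the variance recursion is exact: $\mathbb{E}[\mathrm{ReLU}(z)^2] = K/2$ for $z \sim \mathcal{N}(0, K)$, so $K_{\ell+1} = C_W K_\ell / 2$. This sketch (my own) compares the flow at, below, and above the critical value $C_W = 2$; the depth and off-critical values are arbitrary.

```python
def flow(C_W, K0=1.0, depth=50):
    # Iterate the exact ReLU variance recursion K -> C_W * K / 2.
    K = K0
    for _ in range(depth):
        K = C_W * K / 2.0
    return K

K_sub = flow(1.9)   # subcritical: variance decays exponentially with depth
K_crit = flow(2.0)  # critical: variance is preserved layer to layer
K_sup = flow(2.1)   # supercritical: variance grows exponentially with depth
```

Tuning to criticality keeps signals from exponentially vanishing or exploding, so that the subleading finite-width corrections, controlled by the depth-to-width ratio, become the dominant effect.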

TL;DW By combining the initialization statistics with the training dynamics, we obtain an effective description of fully-trained finite-width networks. Then, Dan explains how MLPs *-polate, how to estimate a network’s optimal aspect ratio, and how to think about complexity.