Cover image, based on "The Blueprint of Intelligence," by Rafael Araujo.

The Principles of Deep Learning Theory

An Effective Theory Approach to Understanding Neural Networks

Daniel A. Roberts, Sho Yaida, Boris Hanin

A Cambridge University Press Book

This book develops an effective theory approach to understanding deep neural networks of practical relevance.

A free draft is available from the arXiv. You can also buy a copy in print from Amazon or direct from Cambridge University Press.

Lectures by the book’s authors and errata for the print edition are also available.

Citing the Book

If you would like to cite PDLT, you can use this BibTeX entry:

    @book{PDLT-2022,
        title = "The Principles of Deep Learning Theory",
        author = "Roberts, Daniel A. and Yaida, Sho and Hanin, Boris",
        publisher = "Cambridge University Press",
        year = "2022",
        eprint = "2106.10165",
        archivePrefix = "arXiv",
        primaryClass = "cs.LG",
    }

Yann LeCun, New York University and Chief AI Scientist at Meta:

In the history of science and technology, the engineering artifact often comes first: the telescope, the steam engine, digital communication. The theory that explains its function and its limitations often appears later: the laws of refraction, thermodynamics, and information theory. With the emergence of deep learning, AI-powered engineering wonders have entered our lives — but our theoretical understanding of the power and limits of deep learning is still partial. This is one of the first books devoted to the theory of deep learning, and lays out the methods and results from recent theoretical approaches in a coherent manner.

Edward Witten, Institute for Advanced Study:

For a physicist, it is very interesting to see deep learning approached from the point of view of statistical physics. This book provides a fascinating perspective on a topic of increasing importance in the modern world.

Scott Aaronson, University of Texas at Austin:

This is an important book that contributes big, unexpected new ideas for unraveling the mystery of deep learning’s effectiveness, in unusually clear prose. I hope it will be read and debated by experts in all the relevant disciplines.

William Bialek, Princeton University:

It is not an exaggeration to say that the world is being revolutionized by deep learning methods for AI. But why do these deep networks work? This book offers an approach to this problem through the sophisticated tools of statistical physics and the renormalization group. The authors provide an elegant guided tour of these methods, interesting for experts and non-experts alike. They write with clarity and even moments of humor. Their results, many presented here for the first time, are the first steps in what promises to be a rich research program, combining theoretical depth with practical consequences.

Gilbert Strang, Massachusetts Institute of Technology:

This book’s physics-trained authors have made a cool discovery, that feature learning depends critically on the ratio of depth to width in the neural net.

From the Back Cover

This textbook establishes a theoretical framework for understanding deep learning models of practical relevance. With an approach that borrows from theoretical physics, Roberts and Yaida provide clear and pedagogical explanations of how realistic deep neural networks actually work. To make results from the theoretical forefront accessible, the authors eschew the subject’s traditional emphasis on intimidating formality without sacrificing accuracy. Straightforward and approachable, this volume balances detailed first-principle derivations of novel results with insight and intuition for theorists and practitioners alike. This self-contained textbook is ideal for students and researchers interested in artificial intelligence with minimal prerequisites of linear algebra, calculus, and informal probability theory, and it can easily fill a semester-long course on deep learning theory. For the first time, the exciting practical advances in modern artificial intelligence capabilities can be matched with a set of effective principles, providing a timeless blueprint for theoretical research in deep learning.

  • Detailed step-by-step explanations for all equations and clear exposition of both old and new concepts in deep learning theory make the book accessible to readers with a minimal prerequisite of linear algebra, calculus, and informal probability theory
  • Many novel results that appear for the first time in the literature, taking readers to the forefront of deep learning theory
  • Provides a unique approach that bridges deep learning and theoretical physics, demonstrating to the ML community how a theoretical physics approach can be useful, while also teaching techniques that are valuable for theoretical physicists
Press Coverage