Axler
Linear Algebra Done Right
by Sheldon Axler
Axler's text is renowned for its decision to banish determinants to the end of the book. This forces the student to understand linear maps, eigenvalues, and inner product spaces based on their geometric properties rather than algebraic formulas. This "operator-centric" view aligns perfectly with modern deep learning, where layers are viewed as operators acting on function spaces. It builds the mental models necessary to understand concepts like Low-Rank Adaptation (LoRA) and the spectral properties of weight matrices, which are crucial for understanding model stability and compression.
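To make the low-rank idea concrete, here is a minimal sketch in plain Python. It assumes the usual LoRA parameterization, in which the update to a frozen d x k weight matrix is factored as B @ A, with B of shape d x r and A of shape r x k. The function names and the layer size are hypothetical, chosen only for illustration.

```python
def full_update_params(d: int, k: int) -> int:
    """Trainable parameters when the whole d x k update is learned."""
    return d * k


def lora_update_params(d: int, k: int, r: int) -> int:
    """Trainable parameters when the update is factored as B (d x r) @ A (r x k)."""
    return r * (d + k)


def matmul(B, A):
    """Plain-Python matrix product, enough for a toy example."""
    return [[sum(b * a for b, a in zip(row, col)) for col in zip(*A)] for row in B]


# Hypothetical layer size, chosen only for illustration.
d, k, r = 4096, 4096, 8
full = full_update_params(d, k)
lora = lora_update_params(d, k, r)
ratio = full / lora  # how many times fewer parameters are trained

# A rank-1 toy update: every row of B @ A is a multiple of the single row of A.
B = [[1.0], [2.0], [3.0]]
A = [[1.0, 0.0, -1.0]]
delta_w = matmul(B, A)
```

Because every row of `delta_w` is a multiple of the one row of `A`, the update has rank at most 1. This is the kind of statement about ranges and ranks that Axler's treatment of linear maps makes routine.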
Strang
Introduction to Linear Algebra
by Gilbert Strang
While Axler provides rigor, Strang provides the connection to computation. His focus on the "Four Fundamental Subspaces" provides a concrete mental image of how matrices manipulate data.
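As a rough illustration of the four subspaces, the sketch below computes their dimensions for a small matrix. It relies on the standard facts that, for an m x n matrix of rank r, the column space and row space both have dimension r, the null space has dimension n - r, and the left null space has dimension m - r. The helper names and the example matrix are assumptions made for this example.

```python
from fractions import Fraction


def rank(matrix):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        # Find a row at or below r with a nonzero entry in this column.
        pivot = next((i for i in range(r, n_rows) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate the entries below the pivot.
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


def subspace_dimensions(matrix):
    """Dimensions of the four fundamental subspaces of an m x n matrix."""
    m, n = len(matrix), len(matrix[0])
    r = rank(matrix)
    return {
        "column space": r,
        "row space": r,
        "null space": n - r,
        "left null space": m - r,
    }


# The second row is twice the first, so the rank is 1.
A = [[1, 2, 3],
     [2, 4, 6]]
dims = subspace_dimensions(A)
```

For this 2 x 3 matrix, the null space is two-dimensional: two independent directions in the input are mapped to zero, and only one direction survives in the output.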
Hubbard & Hubbard
Vector Calculus, Linear Algebra, and Differential Forms: A Unified Approach
by John H. Hubbard, Barbara Burke Hubbard
This book is legendary among mathematics enthusiasts for treating the derivative not just as a number or a vector, but as a linear transformation (represented by the Jacobian matrix) that best approximates a function near a point. This viewpoint is exactly how automatic differentiation engines such as PyTorch's autograd operate: reverse mode propagates vector-Jacobian products backward through the computation, and forward mode propagates Jacobian-vector products. It integrates linear algebra and calculus seamlessly, which is how they appear in machine learning. It provides proofs that allow a researcher to understand *when* optimization might fail (e.g., non-differentiable points like ReLU at 0, saddle points).
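The sketch below illustrates the "derivative as a linear map" viewpoint on a toy function, with the Jacobian worked out by hand rather than by an autodiff library. It distinguishes the Jacobian-vector product (forward mode) from the vector-Jacobian product (the operation used by reverse-mode engines). The function `f` and all helper names are assumptions made for this example.

```python
def f(x, y):
    """A toy map from R^2 to R^2."""
    return (x * y, x + y * y)


def jacobian(x, y):
    """Jacobian of f at (x, y), worked out by hand."""
    return [[y, x],
            [1.0, 2.0 * y]]


def jvp(J, v):
    """Jacobian-vector product J @ v: pushes an input perturbation forward."""
    return [sum(J[i][j] * v[j] for j in range(len(v))) for i in range(len(J))]


def vjp(u, J):
    """Vector-Jacobian product u^T @ J: pulls an output sensitivity backward."""
    return [sum(u[i] * J[i][j] for i in range(len(u))) for j in range(len(J[0]))]


point = (2.0, 3.0)
J = jacobian(*point)

# The derivative as the best linear approximation: f(p + h) - f(p) ~ J @ h.
h = [1e-4, -2e-4]
actual = [a - b for a, b in zip(f(point[0] + h[0], point[1] + h[1]), f(*point))]
linear = jvp(J, h)
error = max(abs(a - l) for a, l in zip(actual, linear))

# Reverse mode: if u holds the gradient of a scalar loss with respect to the
# outputs, then u^T @ J is the gradient with respect to the inputs.
grad_inputs = vjp([1.0, 1.0], J)
```

The approximation error is second order in `h`, which is the sense in which the Jacobian is the "best" linear approximation near the point.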
Spivak
Calculus on Manifolds
by Michael Spivak
A concise, dense classic. Spivak is elegant, but Hubbard & Hubbard is generally preferred for self-study because it is more explanatory and takes a unified approach.
Blitzstein & Hwang
Introduction to Probability
by Joseph Blitzstein, Jessica Hwang
Based on the famous Harvard Stat 110 course. This book is unrivaled in building *intuition*. It emphasizes "story proofs"—understanding *why* a formula works through narrative logic rather than algebraic manipulation.
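A classic example of the style is the "team captain" identity, k * C(n, k) = n * C(n - 1, k - 1). Both sides count the ways to form a team of k people out of n with one member designated captain: either pick the team and then its captain, or pick the captain and then the rest of the team. The short check below is only a numerical sanity test of that story, not a substitute for it, and the function names are illustrative.

```python
from math import comb


def team_then_captain(n, k):
    """Pick the team of k first, then one of its k members as captain."""
    return comb(n, k) * k


def captain_then_team(n, k):
    """Pick the captain first (n ways), then the other k - 1 members from the remaining n - 1."""
    return n * comb(n - 1, k - 1)


# Compare the two counts for every valid (n, k) with n up to 11.
checks = [(n, k) for n in range(1, 12) for k in range(1, n + 1)]
all_agree = all(captain_then_team(n, k) == team_then_captain(n, k) for n, k in checks)
```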
Wasserman
All of Statistics: A Concise Course in Statistical Inference
by Larry Wasserman
This book covers a massive amount of ground, from basic probability to VC dimension and the bootstrap, very quickly. It is an excellent bridge to *The Elements of Statistical Learning*.
Skiena
The Algorithm Design Manual
by Steven Skiena
Unlike the standard *Introduction to Algorithms* (CLRS), which is encyclopedic and theoretical, Skiena's book focuses on the *design* process and practical "war stories." It teaches you how to recognize a problem type and select the right tool, which is critical for research interviews and actual engineering work.
CS:APP
Computer Systems: A Programmer's Perspective
by Randal E. Bryant, David R. O'Hallaron
This is the standard text for understanding how software interacts with hardware.
PRML
Pattern Recognition and Machine Learning
by Christopher Bishop
This book is the gold standard for the **Bayesian** perspective. It explains regularization not just as a heuristic, but as a prior belief on the model parameters.
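The link between regularization and priors can be seen in a one-parameter sketch. Assume a model y = w * x + noise with Gaussian noise of variance `noise_var` and a zero-mean Gaussian prior of variance `prior_var` on w. Maximizing the posterior is then the same as ridge regression with penalty `lam = noise_var / prior_var`. The data and variances below are made up for illustration.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form minimizer of sum((y - w*x)^2) + lam * w^2 for a scalar weight."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def neg_log_posterior(w, xs, ys, noise_var, prior_var):
    """Negative log posterior, up to an additive constant."""
    neg_log_likelihood = sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / (2.0 * noise_var)
    neg_log_prior = w * w / (2.0 * prior_var)
    return neg_log_likelihood + neg_log_prior


# Made-up data and variances, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.2, 2.8]
noise_var, prior_var = 0.5, 2.0

# The ridge penalty implied by the prior.
lam = noise_var / prior_var
w_ridge = ridge_weight(xs, ys, lam)

# The ridge solution should sit at the minimum of the negative log posterior,
# so nudging it in either direction must increase the objective.
eps = 1e-3
centre = neg_log_posterior(w_ridge, xs, ys, noise_var, prior_var)
left = neg_log_posterior(w_ridge - eps, xs, ys, noise_var, prior_var)
right = neg_log_posterior(w_ridge + eps, xs, ys, noise_var, prior_var)
```

A tighter prior (smaller `prior_var`) gives a larger `lam` and shrinks the weight further toward zero, which is the Bayesian reading of a stronger L2 penalty.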
ESL
The Elements of Statistical Learning
by Trevor Hastie, Robert Tibshirani, Jerome Friedman
This text is more "frequentist" and statistical, excellent for understanding the bias-variance tradeoff and decision trees.
Shalev-Shwartz & Ben-David
Understanding Machine Learning: From Theory to Algorithms
by Shai Shalev-Shwartz, Shai Ben-David
This book is mathematically dense and focuses on **PAC Learning** (Probably Approximately Correct). It answers the fundamental question: "Under what conditions is learning even possible?"
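One concrete answer to that question is the standard sample-complexity bound for a finite hypothesis class in the realizable setting: roughly ln(|H| / delta) / epsilon examples suffice for empirical risk minimization to return, with probability at least 1 - delta, a hypothesis with error at most epsilon. The sketch below simply evaluates that bound. The function name and the numbers plugged in are illustrative assumptions.

```python
import math


def pac_sample_bound(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Sufficient sample size for a finite class in the realizable PAC setting.

    epsilon is the accuracy target, delta the allowed failure probability.
    """
    return math.ceil(math.log(hypothesis_count / delta) / epsilon)


# Illustrative numbers: 1000 hypotheses, 5% error, 1% failure probability.
m = pac_sample_bound(hypothesis_count=1000, epsilon=0.05, delta=0.01)

# Halving epsilon roughly doubles the requirement, while the dependence on
# the class size and on delta is only logarithmic.
m_tighter = pac_sample_bound(hypothesis_count=1000, epsilon=0.025, delta=0.01)
m_bigger_class = pac_sample_bound(hypothesis_count=1_000_000, epsilon=0.05, delta=0.01)
```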
Prince
Understanding Deep Learning
by Simon Prince
While Goodfellow's *Deep Learning* (2016) is a classic, it predates the Transformer revolution. Prince's book is modern, visually intuitive, and covers Transformers, diffusion models, and generative AI more broadly. It is the superior choice for a student starting in 2025.
Hwu & Kirk
Programming Massively Parallel Processors
by David B. Kirk, Wen-mei W. Hwu
Sutton & Barto
Reinforcement Learning: An Introduction
by Richard S. Sutton, Andrew G. Barto
The foundational text of the field.