Recently, I have been thinking about machine learning. This post is a list of the books I have been reading.

Wasserman. All of Statistics: This short book is a great introduction to statistics. Wasserman comments in the preface,

*using fancy machine learning tools without understanding basic statistics is like doing brain surgery before knowing how to use a band aid.*

Hyperbole aside, he has a point. Most machine learning algorithms are easy to understand if you know statistics. At the very least, if you understand basic statistics, you are unlikely to try to train an MNIST classifier using least squares...

Bishop. Pattern Recognition and Machine Learning: This is a standard machine learning text. Covers all the basics and has tons of exercises.

Murphy. Machine Learning, A Probabilistic Perspective: Also a standard text. Covers the same stuff as Bishop, but it is a little thicker. There is also MATLAB code for every single algorithm and figure in the book, which is fantastic!

MacKay. Information Theory, Inference and Learning Algorithms: This is my favorite book on the subject. MacKay is a master expositor. The number of pictures and examples is mind-boggling. In my opinion, the exercises are more thoughtful than those in Bishop and Murphy.

Goodfellow, Bengio, Courville. Deep Learning: In the past few years, supervised deep learning has matured into a tool which many companies are using to extract value from their data. This book explains how to transform the beautiful statistical ideas behind neural networks into concrete software.

Goodman and Tenenbaum. Probabilistic Models of Cognition: This book has a different flavor from the others. Code in a conventional programming language describes a sequence of machine instructions. Code in a probabilistic programming language describes a joint distribution. The compiler takes this code and returns machine code which samples from the joint distribution. Many machine learning problems can be expressed and solved inside probabilistic programming languages, so it is good to understand the basic idea, even if the technology doesn't scale well at this point.
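To make the "code describes a joint distribution" idea concrete, here is a minimal sketch in plain Python rather than in the language the book uses. Everything in it is an illustrative assumption: the rain/sprinkler model, the probabilities, and the function names `model`, `rejection_sample` and `estimate_rain_given_wet` are made up for this example. Rejection sampling stands in for the much smarter inference a real probabilistic programming system would perform.

```python
import random


def model(rng):
    """One run of this program is one draw from the joint distribution."""
    rain = rng.random() < 0.2        # P(rain) = 0.2 (made-up number)
    sprinkler = rng.random() < 0.1   # P(sprinkler) = 0.1 (made-up number)
    wet = rain or sprinkler          # the grass is wet if either happened
    return {"rain": rain, "sprinkler": sprinkler, "wet": wet}


def rejection_sample(model, condition, n, rng):
    """Draw from the joint and keep only runs consistent with the observation."""
    kept = []
    while len(kept) < n:
        sample = model(rng)
        if condition(sample):
            kept.append(sample)
    return kept


def estimate_rain_given_wet(n=20000, seed=0):
    """Monte Carlo estimate of P(rain | wet) from the conditioned samples."""
    rng = random.Random(seed)
    kept = rejection_sample(model, lambda s: s["wet"], n, rng)
    return sum(s["rain"] for s in kept) / n
```

For this toy model the exact answer is 0.2 / (1 - 0.8 * 0.9) = 0.2 / 0.28, roughly 0.71, so `estimate_rain_given_wet()` should land close to that. Rejection sampling throws away every run that disagrees with the data, which is one way to see why naive inference scales badly.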

Hal Abelson, Jerry Sussman and Julie Sussman. Structure and Interpretation of Computer Programs: In statistical inference, you spend time extracting useful information from data. For example, if you have a ton of patient records from a clinical trial, you want to decide if the drug works or not. Maybe you have a lot of user data from a website and you want to increase the number of sales. On the other hand, machine learning is about using statistical tools to solve problems which we know can be done by a computer in theory, but are hard to do in a conventional logic-based programming language, for example image classification, image generation or sentiment analysis. Computation is a fundamental part of machine learning, and my favorite computer science book is Structure and Interpretation of Computer Programs. It teaches you about how we compute rather than teaching you the syntax of a popular language.