Grokking Machine Learning
By Luis Serrano
About this ebook
In Grokking Machine Learning you will learn:
Supervised algorithms for classifying and splitting data
Methods for cleaning and simplifying data
Machine learning packages and tools
Neural networks and ensemble methods for complex datasets
Grokking Machine Learning teaches you how to apply ML to your projects using only standard Python code and high school-level math. No specialist knowledge is required to tackle the hands-on exercises using Python and readily available machine learning tools. Packed with easy-to-follow Python-based exercises and mini-projects, this book sets you on the path to becoming a machine learning expert.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the technology
Discover powerful machine learning techniques you can understand and apply using only high school math! Put simply, machine learning is a set of techniques for data analysis based on algorithms that deliver better results as you give them more data. ML powers many cutting-edge technologies, such as recommendation systems, facial recognition software, smart speakers, and even self-driving cars. This unique book introduces the core concepts of machine learning, using relatable examples, engaging exercises, and crisp illustrations.
About the book
Grokking Machine Learning presents machine learning algorithms and techniques in a way that anyone can understand. This book skips the confusing academic jargon and offers clear explanations that require only basic algebra. As you go, you’ll build interesting projects with Python, including models for spam detection and image recognition. You’ll also pick up practical skills for cleaning and preparing data.
What's inside
Supervised algorithms for classifying and splitting data
Methods for cleaning and simplifying data
Machine learning packages and tools
Neural networks and ensemble methods for complex datasets
About the reader
For readers who know basic Python. No machine learning knowledge necessary.
About the author
Luis G. Serrano is a research scientist in quantum artificial intelligence. Previously, he was a Machine Learning Engineer at Google and Lead Artificial Intelligence Educator at Apple.
Table of Contents
1 What is machine learning? It is common sense, except done by a computer
2 Types of machine learning
3 Drawing a line close to our points: Linear regression
4 Optimizing the training process: Underfitting, overfitting, testing, and regularization
5 Using lines to split our points: The perceptron algorithm
6 A continuous approach to splitting points: Logistic classifiers
7 How do you measure classification models? Accuracy and its friends
8 Using probability to its maximum: The naive Bayes model
9 Splitting data by asking questions: Decision trees
10 Combining building blocks to gain more power: Neural networks
11 Finding boundaries with style: Support vector machines and the kernel method
12 Combining models to maximize results: Ensemble learning
13 Putting it all in practice: A real-life example of data engineering and machine learning
Luis Serrano
Luis G. Serrano is a research scientist in quantum artificial intelligence at Zapata Computing. He has worked previously as a Machine Learning Engineer at Google, as a Lead Artificial Intelligence Educator at Apple, and as the Head of Content in Artificial Intelligence and Data Science at Udacity. Luis has a PhD in mathematics from the University of Michigan, a bachelor’s and master’s in mathematics from the University of Waterloo, and worked as a postdoctoral researcher at the Laboratoire de Combinatoire et d’Informatique Mathématique at the University of Quebec at Montreal. Luis maintains a popular YouTube channel about machine learning with over 75,000 subscribers and over 3 million views, and is a frequent speaker at artificial intelligence and data science conferences.
inside front cover
The way to descend from the mountain is to take that one small step in the direction that makes us descend the most and to continue doing this for a long time.
Grokking Machine Learning
Luis G. Serrano
To comment go to liveBook
Manning
Shelter Island
For more information on this and other Manning titles go to
www.manning.com
Copyright
For online information and ordering of these and other Manning books, please visit www.manning.com. The publisher offers discounts on these books when ordered in quantity.
For more information, please contact
Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: [email protected]
©2021 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.
♾ Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.
ISBN: 9781617295911
contents
front matter
foreword
preface
acknowledgments
about this book
about the author
1 What is machine learning? It is common sense, except done by a computer
Do I need a heavy math and coding background to understand machine learning?
OK, so what exactly is machine learning?
How do we get machines to make decisions with data? The remember-formulate-predict framework
2 Types of machine learning
What is the difference between labeled and unlabeled data?
Supervised learning: The branch of machine learning that works with labeled data
Unsupervised learning: The branch of machine learning that works with unlabeled data
What is reinforcement learning?
3 Drawing a line close to our points: Linear regression
The problem: We need to predict the price of a house
The solution: Building a regression model for housing prices
How to get the computer to draw this line: The linear regression algorithm
How do we measure our results? The error function
Real-life application: Using Turi Create to predict housing prices in India
What if the data is not in a line? Polynomial regression
Parameters and hyperparameters
Applications of regression
4 Optimizing the training process: Underfitting, overfitting, testing, and regularization
An example of underfitting and overfitting using polynomial regression
How do we get the computer to pick the right model? By testing
Where did we break the golden rule, and how do we fix it? The validation set
A numerical way to decide how complex our model should be: The model complexity graph
Another alternative to avoiding overfitting: Regularization
Polynomial regression, testing, and regularization with Turi Create
5 Using lines to split our points: The perceptron algorithm
The problem: We are on an alien planet, and we don’t know their language!
How do we determine whether a classifier is good or bad? The error function
How to find a good classifier? The perceptron algorithm
Coding the perceptron algorithm
Applications of the perceptron algorithm
6 A continuous approach to splitting points: Logistic classifiers
Logistic classifiers: A continuous version of perceptron classifiers
How to find a good logistic classifier? The logistic regression algorithm
Coding the logistic regression algorithm
Real-life application: Classifying IMDB reviews with Turi Create
Classifying into multiple classes: The softmax function
7 How do you measure classification models? Accuracy and its friends
Accuracy: How often is my model correct?
How to fix the accuracy problem? Defining different types of errors and how to measure them
A useful tool to evaluate our model: The receiver operating characteristic (ROC) curve
8 Using probability to its maximum: The naive Bayes model
Sick or healthy? A story with Bayes’ theorem as the hero
Use case: Spam-detection model
Building a spam-detection model with real data
9 Splitting data by asking questions: Decision trees
The problem: We need to recommend apps to users according to what they are likely to download
The solution: Building an app-recommendation system
Beyond questions like yes/no
The graphical boundary of decision trees
Real-life application: Modeling student admissions with Scikit-Learn
Decision trees for regression
Applications
10 Combining building blocks to gain more power: Neural networks
Neural networks with an example: A more complicated alien planet
Training neural networks
Coding neural networks in Keras
Neural networks for regression
Other architectures for more complex datasets
11 Finding boundaries with style: Support vector machines and the kernel method
Using a new error function to build better classifiers
Coding support vector machines in Scikit-Learn
Training SVMs with nonlinear boundaries: The kernel method
12 Combining models to maximize results: Ensemble learning
With a little help from our friends
Bagging: Joining some weak learners randomly to build a strong learner
AdaBoost: Joining weak learners in a clever way to build a strong learner
Gradient boosting: Using decision trees to build strong learners
XGBoost: An extreme way to do gradient boosting
Applications of ensemble methods
13 Putting it all in practice: A real-life example of data engineering and machine learning
The Titanic dataset
Cleaning up our dataset: Missing values and how to deal with them
Feature engineering: Transforming the features in our dataset before training the models
Training our models
Tuning the hyperparameters to find the best model: Grid search
Using K-fold cross-validation to reuse our data as training and validation
Appendix A. Solutions to the exercises
Appendix B. The math behind gradient descent: Coming down a mountain using derivatives and slopes
Appendix C. References
index
front matter
foreword
Did you think machine learning is complicated and hard to master? It’s not! Read this book!
Luis Serrano is a wizard when it comes to explaining things in plain English. I met him first when he taught machine learning on Udacity. He made our students feel that all of machine learning is as simple as adding or subtracting numbers. And most of all, he made the material fun. The videos he produced for Udacity were incredibly engaging and remain among the most liked content offered on the platform.
This book is better! Even the most fearful will enjoy the material presented herein, as Serrano demystifies some of the best-kept secrets of the machine learning society. He takes you step by step through each of the critical algorithms and techniques in the field. You can become a machine learning aficionado even if you dislike math. Serrano minimizes the mathematical kauderwelsch that so many of us hard-core academics have come to love, and instead relies on intuition and practical explanations.
The true goal of this book is to empower you to master these methods yourself. So the book is full of fun exercises, in which you get to try out those mystical (and now demystified) techniques yourself. Would you rather gorge on the latest Netflix TV show, or spend your time applying machine learning to problems in computer vision and natural language understanding? If the latter, this book is for you. I can’t express how much fun it is to play with the latest in machine learning, and see your computer do magic under your supervision.
And since machine learning is just about the hottest technology to emerge in the past few years, you will now be able to leverage your new-found skills in your job. A few years back, the New York Times proclaimed that there were only 10,000 machine learning experts in the world, with millions of open positions. That is still the case today! Work through this book and become a professional machine learning engineer. You are guaranteed to possess one of the most in-demand skills in the world today.
With this book, Luis Serrano has done an admirable job explaining complex algorithms and making them accessible to almost everyone. But he doesn’t compromise depth. Instead, he focuses on the empowerment of the reader through a sequence of enlightening projects and exercises. In this sense, this is not a passive read. To fully benefit from this book, you have to work. At Udacity, we have a saying: You won’t lose weight by watching someone else exercise. To grok machine learning, you have to learn to apply it to real-world problems. If you are ready to do this, this is your book—whoever you are!
Sebastian Thrun, PhD
Founder, Udacity
Adjunct Professor, Stanford University
preface
The future is here, and that future has a name: machine learning. With applications in pretty much every industry, from medicine to banking, from self-driving cars to ordering our coffee, the interest in machine learning has rapidly grown day after day. But what is machine learning?
Most of the time, when I read a machine learning book or attend a machine learning lecture, I see either a sea of complicated formulas or a sea of lines of code. For a long time, I thought that this was machine learning, and that machine learning was reserved only for those who had a solid knowledge of both math and computer science.
However, I began to compare machine learning with other subjects, such as music. Musical theory and practice are complicated subjects. But when we think of music, we do not think of scores and scales; we think of songs and melodies. And then I wondered, is machine learning the same? Is it really just a bunch of formulas and code, or is there a melody behind it?
Figure FM.1 Music is not only about scales and notes. There is a melody behind all the technicalities. In the same way, machine learning is not only about formulas and code. There is also a melody, and in this book, we sing it.
With this in mind, I embarked on a journey to understand the melody of machine learning. I stared at formulas and code for months. I drew many diagrams. I scribbled drawings on napkins and showed them to my family, friends, and colleagues. I trained models on small and large datasets. I experimented. After a while, I started listening to the melody of machine learning. All of a sudden, some very pretty pictures started forming in my mind. I started writing stories that go along with all the machine learning concepts. Melodies, pictures, stories—that is how I enjoy learning any topic, and it is those melodies, those pictures, and those stories that I share with you in this book. My goal is to make machine learning fully understandable to every human, and this book is a step in that journey—a step that I’m happy you are taking with me!
acknowledgments
First and foremost, I would like to thank my editor, Marina Michaels, without whom this book wouldn’t exist. Her organization, thorough editing, and valuable input helped shape Grokking Machine Learning. I thank Marjan Bace, Bert Bates, and the rest of the Manning team for their support, professionalism, great ideas, and patience. I thank my technical proofers, Shirley Yap and Karsten Strøbæk; my technical development editor, Kris Athi; and the reviewers for giving me great feedback and correcting many of my mistakes. I thank the production editor, Keri Hales, the copy editor, Pamela Hunt, the graphics editor, Jennifer Houle, the proofreader, Jason Everett, and the entire production team for their wonderful work in making this book a reality. I thank Laura Montoya for her help with inclusive language and AI ethics, Diego Hernandez for valuable additions to the code, and Christian Picón for his immense help with the technical aspects of the repository and the packages.
I am grateful to Sebastian Thrun for his excellent work democratizing education. Udacity was the platform that first gave me a voice to teach the world, and I would like to thank the wonderful colleagues and students I met there. Alejandro Perdomo and the Zapata Computing team deserve thanks for introducing me to the world of quantum machine learning. Thanks also to the many wonderful leaders and colleagues I met at Google and Apple who were instrumental in my career. Special thanks to Roberto Cipriani and the team at Paper Inc. for letting me be part of the family and for the wonderful job they do in the education community.
I’d like to thank my many academic mentors who have shaped my career and my way of thinking: Mary Falk de Losada and her team at the Colombian Mathematical Olympiads, where I first started loving mathematics and had the chance to meet great mentors and create friendships that have lasted a lifetime; my PhD advisor, Sergey Fomin, who was instrumental in my mathematical education and my style of teaching; my master’s advisor, Ian Goulden; Nantel and François Bergeron, Bruce Sagan and Federico Ardila, and the many professors and colleagues I had the opportunity to work with, in particular those at the Universities of Waterloo, Michigan, Quebec at Montreal, and York; and finally, Richard Hoshino and the team and students at Quest University, who helped me test and improve the material in this book.
To all the reviewers: Al Pezewski, Albert Nogués Sabater, Amit Lamba, Bill Mitchell, Borko Djurkovic, Daniele Andreis, Erik Sapper, Hao Liu, Jeremy R. Loscheider, Juan Gabriel Bono, Kay Engelhardt, Krzysztof Kamyczek, Matthew Margolis, Matthias Busch, Michael Bright, Millad Dagdoni, Polina Keselman, Tony Holdroyd, and Valerie Parham-Thompson, your suggestions helped make this a better book.
I would like to thank my wife, Carolina Lasso, who supported me at every step of this process with love and kindness; my mom, Cecilia Herrera, who raised me with love and always encouraged me to follow my passions; my grandma, Maruja, for being the angel that looks at me from heaven; my best friend, Alejandro Morales, for always being there for me; and my friends who have enlightened my path and brightened my life, I thank you and love you with all my heart.
YouTube, blogs, podcasts, and social media have given me the chance to connect with thousands of brilliant souls all over the world. Curious minds with an endless passion for learning, fellow educators who generously share their knowledge and insights, form an e-tribe that inspires me every day and gives me the energy to continue teaching and learning. To anyone who shares their knowledge with the world or who strives to learn every day, I thank you.
I thank anyone out there who is striving to make this world a more fair and peaceful place. To anyone who fights for justice, for peace, for the environment, and for equal opportunities for every human on Earth regardless of their race, gender, place of birth, conditions, and choices, I thank you from the bottom of my heart.
And last, but certainly not least, this book is dedicated to you, the reader. You have chosen the path of learning, the path of improving, the path of feeling comfortable in the uncomfortable, and that is admirable. I hope this book is a positive step along your way to following your passions and creating a better world.
about this book
This book teaches you two things: machine learning models and how to use them. Machine learning models come in different types. Some of them return a deterministic answer, such as yes or no, whereas others return the answer as a probability. Some of them use equations; others use if statements. One thing they have in common is that they all return an answer, or a prediction. The branch of machine learning that comprises the models that return a prediction is aptly named predictive machine learning. This is the type of machine learning that we focus on in this book.
How this book is organized: A roadmap
Types of chapters
This book has two types of chapters. The majority of them (chapters 3, 5, 6, 8, 9, 10, 11, and 12) each contain one type of machine learning model. The corresponding model in each chapter is studied in detail, including examples, formulas, code, and exercises for you to solve. Other chapters (chapters 4, 7, and 13) contain useful techniques for training, evaluating, and improving machine learning models. In particular, chapter 13 contains an end-to-end example on a real dataset, in which you’ll be able to apply all the knowledge you’ve obtained in the previous chapters.
Recommended learning paths
You can use this book in two ways. The one I recommend is to go through it linearly, chapter by chapter, because you’ll find that the alternation between learning models and learning techniques to train them is rewarding. However, another learning path is to first learn all the models (chapters 3, 5, 6, 8, 9, 10, 11, and 12), and then learn the techniques for training them (chapters 4, 7, and 13). And of course, because we all learn in different ways, you can create your own learning path!
Appendices
This book has three appendices. Appendix A contains the solutions to each chapter’s exercises. Appendix B contains some formal mathematical derivations that are useful but more technical than the rest of the book. Appendix C contains a list of references and resources that I recommend if you’d like to further your understanding.
Requirements and learning goals
This book provides you with a solid framework of predictive machine learning. To get the most out of this book, you should have a visual mind and a good understanding of elementary mathematics, such as graphs of lines, equations, and basic probability. It is helpful (although not mandatory) if you know how to code, especially in Python, because you are given the opportunity to implement and apply several models to real datasets throughout the book. After reading this book, you will be able to do the following:
Describe the most important models in predictive machine learning and how they work, including linear and logistic regression, naive Bayes, decision trees, neural networks, support vector machines, and ensemble methods.
Identify their strengths and weaknesses and what parameters they use.
Identify how these models are used in the real world, and formulate potential ways to apply machine learning to any particular problem you would like to solve.
Learn how to optimize these models, compare them, and improve them, to build the best machine learning models we can.
Code the models, whether by hand or using an existing package, and use them to make predictions on real datasets.
If you have a particular dataset or problem in mind, I invite you to think about how to apply what you learn in this book to it, and to use it as a starting point to implement and experiment with your own models.
I am super excited to start this journey with you, and I hope you are as excited!
Other resources
This book is self-contained. This means that aside from the requirements described earlier, every concept that we need is introduced in the book. However, I include many references, which I recommend you check out if you’d like to understand the concepts at a deeper level or if you’d like to explore further topics. The references are all in appendix C and also at this link: http://serrano.academy/grokking-machine-learning.
In particular, several of my own resources accompany this book’s material. On my page at http://serrano.academy, you can find a lot of materials in the form of videos, posts, and code. The videos are also on my YouTube channel www.youtube.com/c/LuisSerrano, which I recommend you check out. As a matter of fact, most of the chapters in this book have a corresponding video that I recommend you watch as you read the chapter.
We’ll be writing code
In this book, we’ll be writing code in Python. However, if your plan is to learn the concepts without the code, you can still follow the book while ignoring the code. Nevertheless, I recommend you at least take a look at the code, so that you become familiar with it.
This book comes with a code repository, and most chapters will give you the opportunity to code the algorithms from scratch or to use some very popular Python packages to build models that fit given datasets. The GitHub repository is www.github.com/luisguiserrano/manning, and I link the corresponding notebooks throughout the book. In the README of the repository, you will find the instructions for the packages to install to run the code successfully.
The main Python packages we use in this book are the following:
NumPy: for storing arrays and performing complex mathematical calculations
Pandas: for storing, manipulating, and analyzing large datasets
Matplotlib: for plotting data
Turi Create: for storing and manipulating data and training machine learning models
Scikit-Learn: for training machine learning models
Keras (TensorFlow): for training neural networks
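For reference, here is a rough sketch of how these packages are commonly imported in Python. Treat these lines only as an illustration; the exact installation steps and versions are in the repository README mentioned above.

```python
# Typical import lines for the packages listed above.
# This is only a sketch; exact versions and setup steps are in the repository README.
import numpy as np               # arrays and numerical computations
import pandas as pd              # loading, manipulating, and analyzing datasets
import matplotlib.pyplot as plt  # plotting data

import turicreate as tc                               # Turi Create: data and built-in models
from sklearn.linear_model import LogisticRegression  # one of Scikit-Learn's many models
from tensorflow import keras                          # Keras (on TensorFlow): neural networks
```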
About the code
This book contains many examples of source code, both in numbered listings and in line with normal text. In both cases, source code is formatted in a fixed-width font like this to separate it from ordinary text. Sometimes code is also in bold to highlight code that has changed from previous steps in the chapter, such as when a new feature adds to an existing line of code.
In many cases, the original source code has been reformatted; we’ve added line breaks and reworked indentation to accommodate the available page space in the book. Additionally, comments in the source code have often been removed from the listings when the code is described in the text. Code annotations accompany many of the listings, highlighting important concepts.
The code for the examples in this book is available for download on the Manning website (https://www.manning.com/books/grokking-machine-learning), and from GitHub at www.github.com/luisguiserrano/manning.
liveBook discussion forum
Purchase of Grokking Machine Learning includes free access to a private web forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the author and from other users. To access the forum, go to https://livebook.manning.com/#!/book/grokking-machine-learning/discussion. You can also learn more about Manning’s forums and the rules of conduct at https://livebook.manning.com/#!/discussion.
Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It is not a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the author some challenging questions lest his interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.
about the author
1 What is machine learning? It is common sense, except done by a computer
In this chapter
what is machine learning
is machine learning hard (spoiler: no)
what do we learn in this book
what is artificial intelligence, and how does it differ from machine learning
how do humans think, and how can we inject those ideas into a machine
some basic machine learning examples in real life
I am super happy to join you in your learning journey!
Welcome to this book! I’m super happy to be joining you in this journey through understanding machine learning. At a high level, machine learning is a process in which the computer solves problems and makes decisions in much the same way as humans.
In this book, I want to bring one message to you: machine learning is easy! You do not need to have a heavy math and programming background to understand it. You do need some basic mathematics, but the main ingredients are common sense, a good visual intuition, and a desire to learn and apply these methods to anything that you are passionate about and where you want to make an improvement in the world. I’ve had an absolute blast writing this book, because I love growing my understanding of this topic, and I hope you have a blast reading it and diving deep into machine learning!
Machine learning is everywhere
Machine learning is everywhere. This statement seems to be truer every day. I have a hard time imagining a single aspect of life that cannot be improved in some way or another by machine learning. For any job that requires repetition or looking at data and gathering conclusions, machine learning can help. During the last few years, machine learning has seen tremendous growth due to the advances in computing power and the ubiquity of data collection. Just to name a few applications of machine learning: recommendation systems, image recognition, text processing, self-driving cars, spam recognition, medical diagnoses . . . the list goes on. Perhaps you have a goal or an area in which you want to make an impact (or maybe you are already making it!). Very likely, machine learning can be applied to that field—perhaps that is what brought you to this book. Let’s find out together!
Do I need a heavy math and coding background to understand machine learning?
No. Machine learning requires imagination, creativity, and a visual mind. Machine learning is about picking up patterns that appear in the world and using those patterns to make predictions in the future. If you enjoy finding patterns and spotting correlations, then you can do machine learning. If I were to tell you that I stopped smoking and am eating more vegetables and exercising, what would you predict will happen to my health in one year? Perhaps that it will improve. If I were to tell you that I’ve switched from wearing red sweaters to green sweaters, what would you predict will happen to my health in one year? Perhaps that it won’t change much (it may, but not based on the information I gave you). Spotting these correlations and patterns is what machine learning is about. The only difference is that in machine learning, we attach formulas and numbers to these patterns to get computers to spot them.
Some mathematics and coding knowledge are needed to do machine learning, but you don’t need to be an expert. If you are an expert in either of them, or both, you will certainly find your skills will be rewarded. But if you are not, you can still learn machine learning and pick up the mathematics and coding as you go. In this book, we introduce all the mathematical concepts we need at the moment we need them. When it comes to coding, how much code you write in machine learning is up to you. Machine learning jobs range from those who code all day long, to those who don’t code at all. Many packages, APIs, and tools help us do machine learning with minimal coding. Every day, machine learning is more available to everyone in the world, and I’m glad you’ve jumped on the bandwagon!
Formulas and code are fun when seen as a language
In most machine learning books, algorithms are explained mathematically using formulas, derivatives, and so on. Although these precise descriptions of the methods work well in practice, a formula sitting by itself can be more confusing than illustrative. However, like a musical score, a formula may hide a beautiful melody behind the confusion. For example, let’s look at this formula: $\sum_{i=1}^{4} i$. It looks ugly at first glance, but it represents a very simple sum, namely, 1 + 2 + 3 + 4. And what about $\sum_{i=1}^{n} w_i$? That is simply the sum of many (n) numbers. But when I think of a sum of many numbers, I’d rather imagine something like 3 + 2 + 4 + 27, rather than $\sum_{i=1}^{n} w_i$. Whenever I see a formula, I immediately have to imagine a small example of it, and then the picture is clearer in my mind. When I see something like P(A|B), what comes to mind? That is a conditional probability, so I think of some sentence along the lines of The probability that an event A occurs given that another event B already occurs.
For example, if A represents rain today and B represents living in the Amazon rain forest, then the formula P(A|B) = 0.8 simply means The probability that it rains today given that we live in the Amazon rain forest is 80%.
If you do love formulas, don’t worry—this book still has them. But they will appear right after the example that illustrates them.
The same phenomenon happens with code. If we look at code from far away, it may look complicated, and we might find it hard to imagine that someone could fit all of that in their head. However, code is simply a sequence of steps, and normally each of these steps is simple. In this book, we’ll write code, but it will be broken down into simple steps, and each step will be carefully explained with examples or illustrations. During the first few chapters, we will be coding our models from scratch to understand how they work. In the later chapters, however, the models get more complicated. For these, we will use packages such as Scikit-Learn, Turi Create, or Keras, which have implemented most machine learning algorithms with great clarity and power.
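As a tiny example of this, here is the sum 1 + 2 + 3 + 4 from the formula earlier in this section, written as Python code, one small step at a time:

```python
# The sum 1 + 2 + 3 + 4 from the formula above, written as a sequence of small steps
total = 0
for i in range(1, 5):   # i takes the values 1, 2, 3, 4
    total += i          # add the current number to the running total
print(total)            # 10
```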
OK, so what exactly is machine learning?
To define machine learning, first let’s define a more general term: artificial intelligence.
What is artificial intelligence?
Artificial intelligence (AI) is a general term, which we define as follows:
artificial intelligence The set of all tasks in which a computer can make decisions
In many cases, a computer makes these decisions by mimicking the ways a human makes decisions. In other cases, they may mimic evolutionary processes, genetic processes, or physical processes. But in general, any time we see a computer solving a problem by itself, be it driving a car, finding a route between two points, diagnosing a patient, or recommending a movie, we are looking at artificial intelligence.
What is machine learning?
Machine learning is similar to artificial intelligence, and often their definitions are confused. Machine learning (ML) is a part of artificial intelligence, and we define it as follows:
machine learning The set of all tasks in which a computer can make decisions based on data
What does this mean? Allow me to illustrate with the diagram in figure 1.1.
Figure 1.1 Machine learning is a part of artificial intelligence.
Let’s go back to looking at how humans make decisions. In general terms, we make decisions in the following two ways:
By using logic and reasoning
By using our experience
For example, imagine that we are trying to decide what car to buy. We can look carefully at the features of the car, such as price, fuel consumption, and navigation, and try to figure out the best combination of them that adjusts to our budget. That is using logic and reasoning. If instead we ask all our friends what cars they own, and what they like and dislike about them, we form a list of information and use that list to decide, then we are using experience (in this case, our friends’ experiences).
Machine learning represents the second method: making decisions using our experience. In computer lingo, the term for experience is data. Therefore, in machine learning, computers make decisions based on data. Thus, any time we get a computer to solve a problem or make a decision using only data, we are doing machine learning. Colloquially, we could describe machine learning in the following way:
Machine learning is common sense, except done by a computer.
Going from solving problems using any means necessary to solving problems using only data may feel like a small step for a computer, but it has been a huge step for humanity (figure 1.2). Once upon a time, if we wanted to get a computer to perform a task, we had to write a program, namely, a whole set of instructions for the computer to follow. This process is good for simple tasks, but some tasks are too complicated for this framework. For example, consider the task of identifying if an image contains an apple. If we start writing a computer program to develop this task, we quickly find out that it is hard.
Figure 1.2 Machine learning encompasses all the tasks in which computers make decisions based on data. In the same way that humans make decisions based on previous experiences, computers can make decisions based on previous data.
Let’s take a step back and ask the following question. How did we, as humans, learn how an apple looks? The way we learned most words was not by someone explaining to us what they mean; we learned them by repetition. We saw many objects during our childhood, and adults would tell us what these objects were. To learn what an apple was, we saw many apples throughout the years while hearing the word apple, until one day it clicked, and we knew what an apple was. In machine learning, that is what we get the computer to do. We show the computer many images, and we tell it which ones contain an apple (that constitutes our data). We repeat this process until the computer catches the right patterns and attributes that constitute an apple. At the end of the process, when we feed the computer a new image, it can use these patterns to determine whether the image contains an apple. Of course, we still need to program the computer so that it catches these patterns. For that, we have several techniques, which we will learn in this book.
And now that we’re at it, what is deep learning?
In the same way that machine learning is part of artificial intelligence, deep learning is a part of machine learning. In the previous section, we learned that there are several techniques we can use to get the computer to learn from data. One of these techniques has been performing tremendously well, so it has its own field of study called deep learning (DL), which we define as follows and as shown in figure 1.3:
deep learning The field of machine learning that uses certain objects called neural networks
What are neural networks? We’ll learn about them in chapter 10. Deep learning is arguably the most used type of machine learning because it works really well. If we are looking at any of the cutting-edge applications, such as image recognition, text generation, playing Go, or self-driving cars, very likely we are looking at deep learning in some way or another.
Figure 1.3 Deep learning is a part of machine learning.
In other words, deep learning is part of machine learning, which in turn is part of artificial intelligence. If this book were about transportation, then AI would be vehicles, ML would be cars, and DL would be Ferraris.
How do we get machines to make decisions with data? The remember-formulate-predict framework
In the previous section, we discussed that machine learning consists of a set of techniques that we use to get the computer to make decisions based on data. In this section, we learn what is meant by making decisions based on data and how some of these techniques work. For this, let’s again analyze the process humans use to make decisions based on experience. This is what is called the remember-formulate-predict framework, shown in figure 1.4. The goal of machine learning is to teach computers how to think in the same way, following the same framework.
How do humans think?
When we, as humans, need to make a decision based on our experience, we normally use the following framework:
We remember past situations that were similar.
We formulate a general rule.
We use this rule to predict what may happen in the future.
For example, if the question is, Will it rain today?, the process to make a guess is the following:
We remember that last week it rained most of the time.
We formulate that in this place, it rains most of the time.
We predict that today it will rain.
We may be right or wrong, but at least we are trying to make the most accurate prediction we can based on the information we have.
Figure 1.4 The remember-formulate-predict framework is the main framework we use in this book. It consists of three steps: (1) We remember previous data; (2) we formulate a general rule; and (3) we use that rule to make predictions about the future.
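To make the three steps concrete, here is a minimal Python sketch of the framework applied to the rain example; the week of weather data is made up purely for illustration.

```python
# Remember-formulate-predict, applied to the rain example
# Step 1: remember -- last week's weather (made-up data for illustration)
last_week = ["rain", "rain", "rain", "sun", "rain", "sun", "rain"]

# Step 2: formulate -- a general rule: whatever happened most of the time
rainy_days = last_week.count("rain")
rule = "rain" if rainy_days > len(last_week) / 2 else "no rain"

# Step 3: predict -- apply the rule to today
print("Prediction for today:", rule)   # Prediction for today: rain
```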
Some machine learning lingo—models and algorithms
Before we delve into more examples that illustrate the techniques used in machine learning, let’s define some useful terms that we use throughout this book. We know that in machine learning, we get the computer to learn how to solve a problem using data. The way the computer solves the problem is by using the data to build a model. What is a model? We define a model as follows:
model A set of rules that represent our data and can be used to make predictions
We can think of a model as a representation of reality using a set of rules that mimic the existing data as closely as possible. In the rain example in the previous section, the model was our representation of reality, which is a world in which it rains most of the time. This is a simple world with one rule: it rains most of the time. This representation may or may not be accurate, but according to our data, it is the most accurate representation of reality that we can formulate. We later use this rule to make predictions on unseen data.
An algorithm is the process that we used to build the model. In the current example, the process is simple: we looked at how many days it rained and realized it was the majority. Of course, machine learning algorithms can get much more complicated than that, but at the end of the day, they are always composed of a set of steps. Our definition of algorithm follows:
algorithm A procedure, or a set of steps, used to solve a problem or perform a computation. In this book, the goal of an algorithm is to build a model.
In short, a model is what we use to make predictions, and an algorithm is what we use to build the model. Those two definitions are easy to confuse and are often interchanged, but to keep them clear, let’s look at a few examples.
Some examples of models that humans use
In this section we focus on a common application of machine learning: spam detection. In the following examples, we will detect spam and non-spam emails. Non-spam emails are also referred to as ham.
spam and ham spam is the common term used for junk or unwanted email, such as chain letters, promotions, and so on. The term comes from a 1970 Monty Python sketch in which every item on the menu of a restaurant contained Spam as an ingredient. Among software developers, the term ham is used to refer to non-spam emails.
Example 1: An annoying email friend
In this example, our friend Bob likes to send us email. A lot of his emails are spam, in the form of chain letters. We are starting to get a bit annoyed with him. It is Saturday, and we just got a notification of an email from Bob. Can we guess if this email is spam or ham without looking at it?
To figure this out, we use the remember-formulate-predict method. First, let us remember, say, the last 10 emails that we got from Bob. That is our data. We remember that six of them were spam, and the other four were ham. From this information, we can formulate the following model:
Model 1: Six out of every 10 emails that Bob sends us are spam.
This rule will be our model. Note, this rule does not need to be true. It could be outrageously wrong. But given our data, it is the best that we can come up with, so we’ll live with it. Later in this book, we learn how to evaluate models and improve them when needed.
Now that we have our rule, we can use it to predict whether the email is spam. If six out of 10 of Bob’s emails are spam, then we can assume that this new email is 60% likely to be spam and 40% likely to be ham. Judging by this rule, it’s a little safer to think that the email is spam. Therefore, we predict that the email is spam (figure 1.5).
Again, our prediction may be wrong. We may open the email and realize it is ham. But we have made the prediction to the best of our knowledge. This is what machine learning is all about.
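As a minimal Python sketch of model 1 (the ten labels below simply encode the six spam and four ham emails from the story), the same reasoning looks like this:

```python
# Model 1: six out of every 10 of Bob's emails are spam
previous_emails = ["spam"] * 6 + ["ham"] * 4        # the 10 emails we remember

spam_fraction = previous_emails.count("spam") / len(previous_emails)
prediction = "spam" if spam_fraction > 0.5 else "ham"

print(spam_fraction)   # 0.6
print(prediction)      # spam
```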
You may be thinking, can we do better? We seem to be judging every email from Bob in the same way, but there may be more information that can help us tell the spam and ham emails apart. Let’s try to analyze the emails a little more. For example, let’s see when Bob sent the emails to see if we find a pattern.
Figure 1.5 A very simple machine learning model
Example 2: A seasonal annoying email friend
Let’s look more carefully at the emails that Bob sent us in the previous month. More specifically, we’ll look at what day he sent them. Here are the emails with dates and information about being spam or ham:
Monday: Ham
Tuesday: Ham
Saturday: Spam
Sunday: Spam
Sunday: Spam
Wednesday: Ham
Friday: Ham
Saturday: Spam
Tuesday: Ham
Thursday: Ham
Now things are different. Can you see a pattern? It seems that every email Bob sent during the week is ham, and every email he sent during the weekend is spam. This makes sense—maybe during the week he sends us work email, whereas during the weekend, he has time to send spam and decides to roam free. So, we can formulate a more educated rule, or model, as follows:
Model 2: Every email that Bob sends during the week is ham, and those he sends during the weekend are spam.
Now let’s look at what day it is today. If it is Sunday and we just got an email from Bob, then we can predict with great confidence that the email he sent is spam (figure 1.6). We make this prediction, and without looking, we send the email to the trash and carry on with our day.
Figure 1.6 A slightly more complex machine learning model
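Purely as an illustration, here is a short Python sketch of model 2, using the dates and labels listed above; the helper function name is made up just for this sketch.

```python
# Model 2: emails sent on the weekend are spam; the rest are ham
emails = [("Monday", "ham"), ("Tuesday", "ham"), ("Saturday", "spam"),
          ("Sunday", "spam"), ("Sunday", "spam"), ("Wednesday", "ham"),
          ("Friday", "ham"), ("Saturday", "spam"), ("Tuesday", "ham"),
          ("Thursday", "ham")]

def predict_from_day(day):
    return "spam" if day in ("Saturday", "Sunday") else "ham"

# The rule agrees with every email we remembered...
print(all(predict_from_day(day) == label for day, label in emails))   # True
# ...and it predicts that today's email (sent on a Sunday) is spam
print(predict_from_day("Sunday"))                                     # spam
```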
Example 3: Things are getting complicated!
Now, let’s say we continue with this rule, and one day we see Bob in the street, and he asks, Why didn’t you come to my birthday party?
We have no idea what he is talking about. It turns out last Sunday he sent us an invitation to his birthday party, and we missed it! Why did we miss it? Because he sent it on the weekend, and we assumed that it would be spam. It seems that we need a better model. Let’s go back to look at Bob’s emails—this is our remember step. Let’s see if we can find a pattern.
1 KB: Ham
2 KB: Ham
16 KB: Spam
20 KB: Spam
18 KB: Spam
3 KB: Ham
5 KB: Ham
25 KB: Spam
1 KB: Ham
3 KB: Ham
What do we see? It seems that the large emails tend to be spam, whereas the smaller ones tend to be ham. This makes sense, because the spam emails frequently have large attachments.
So, we can formulate the following rule:
Model 3: Any email of size 10 KB or larger is spam, and any email of size less than 10 KB is ham.
Now that we have formulated our rule, we can make a prediction. We look at the email we received today from Bob, and the size is 19 KB. So, we conclude that it is spam (figure 1.7).
Figure 1.7 Another slightly more complex machine learning model
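And here is a similar Python sketch of model 3, with the sizes from the list above and the 10 KB cutoff from the rule (again, the helper name is made up for the sketch):

```python
# Model 3: emails of 10 KB or larger are spam; smaller ones are ham
sizes_and_labels = [(1, "ham"), (2, "ham"), (16, "spam"), (20, "spam"),
                    (18, "spam"), (3, "ham"), (5, "ham"), (25, "spam"),
                    (1, "ham"), (3, "ham")]               # sizes in KB

def predict_from_size(size_kb):
    return "spam" if size_kb >= 10 else "ham"

print(all(predict_from_size(s) == label for s, label in sizes_and_labels))  # True
print(predict_from_size(19))   # spam -- today's 19 KB email from Bob
```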
Is this the end of the story? Not even close.
But before we keep going, notice that to make our predictions, we used the day of the week and the size of the email. These are examples of features. A feature is one of the most important concepts in this book.
feature Any property or characteristic of the data that the model can use to make predictions
You can imagine that there are many more features that could indicate if an email is spam or ham. Can you think of some more? In the next paragraphs, we’ll see a few more features.
Example 4: More?
Our two classifiers were good, because they rule out large emails and emails sent on weekends.