Machine Learning Systems: Designs that scale
About this ebook
Machine Learning Systems: Designs that scale is an example-rich guide that teaches you how to implement reactive design solutions in your machine learning systems to make them as reliable as a well-built web app.
Foreword by Sean Owen, Director of Data Science, Cloudera
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the Technology
If you’re building machine learning models to be used on a small scale, you don't need this book. But if you're a developer building a production-grade ML application that needs quick response times, reliability, and good user experience, this is the book for you. It collects principles and practices of machine learning systems that are dramatically easier to run and maintain, and that are reliably better for users.
About the Book
Machine Learning Systems: Designs that scale teaches you to design and implement production-ready ML systems. You'll learn the principles of reactive design as you build pipelines with Spark, create highly scalable services with Akka, and use powerful machine learning libraries like MLlib on massive datasets. The examples use the Scala language, but the same ideas and tools work in Java as well.
What's Inside
- Working with Spark, MLlib, and Akka
- Reactive design patterns
- Monitoring and maintaining a large-scale system
- Futures, actors, and supervision
About the Reader
Readers need intermediate skills in Java or Scala. No prior machine learning experience is assumed.
About the Author
Jeff Smith builds powerful machine learning systems. For the past decade, he has been working on building data science applications, teams, and companies as part of various teams in New York, San Francisco, and Hong Kong. He blogs (https://medium.com/@jeffksmithjr), tweets (@jeffksmithjr), and speaks (www.jeffsmith.tech/speaking) about various aspects of building real-world machine learning systems.
Table of Contents
PART 1 - FUNDAMENTALS OF REACTIVE MACHINE LEARNING
- Learning reactive machine learning
- Using reactive tools
PART 2 - BUILDING A REACTIVE MACHINE LEARNING SYSTEM
- Collecting data
- Generating features
- Learning models
- Evaluating models
- Publishing models
- Responding
PART 3 - OPERATING A MACHINE LEARNING SYSTEM
- Delivering
- Evolving intelligence
Copyright
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact
Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email:
©2018 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.
Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.
Development editor: Susanna Kline
Review editor: Aleksandar Dragosavljević
Technical development editor: Kostas Passadis
Project editor: Tiffany Taylor
Copyeditor: Corbin Collins
Proofreader: Katie Tennant
Technical proofreader: Jerry Kuch
Typesetter: Gordan Salinovic
Cover designer: Marija Tudor
ISBN 9781617293337
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – EBM – 23 22 21 20 19 18
Brief Table of Contents
Copyright
Brief Table of Contents
Table of Contents
Foreword
Preface
Acknowledgments
About this book
About the author
About the cover illustration
1. Fundamentals of reactive machine learning
Chapter 1. Learning reactive machine learning
Chapter 2. Using reactive tools
2. Building a reactive machine learning system
Chapter 3. Collecting data
Chapter 4. Generating features
Chapter 5. Learning models
Chapter 6. Evaluating models
Chapter 7. Publishing models
Chapter 8. Responding
3. Operating a machine learning system
Chapter 9. Delivering
Chapter 10. Evolving intelligence
Getting set up
A reactive machine learning system
Phases of machine learning
Index
List of Figures
List of Tables
List of Listings
Table of Contents
Copyright
Brief Table of Contents
Table of Contents
Foreword
Preface
Acknowledgments
About this book
About the author
About the cover illustration
1. Fundamentals of reactive machine learning
Chapter 1. Learning reactive machine learning
1.1. An example machine learning system
1.1.1. Building a prototype system
1.1.2. Building a better system
1.2. Reactive machine learning
1.2.1. Machine learning
1.2.2. Reactive systems
1.2.3. Making machine learning systems reactive
1.2.4. When not to use reactive machine learning
Summary
Chapter 2. Using reactive tools
2.1. Scala, a reactive language
2.1.1. Reacting to uncertainty in Scala
2.1.2. The uncertainty of time
2.2. Akka, a reactive toolkit
2.2.1. The actor model
2.2.2. Ensuring resilience with Akka
2.3. Spark, a reactive big data framework
Summary
2. Building a reactive machine learning system
Chapter 3. Collecting data
3.1. Sensing uncertain data
3.2. Collecting data at scale
3.2.1. Maintaining state in a distributed system
3.2.2. Understanding data collection
3.3. Persisting data
3.3.1. Elastic and resilient databases
3.3.2. Fact databases
3.3.3. Querying persisted facts
3.3.4. Understanding distributed-fact databases
3.4. Applications
3.5. Reactivities
Summary
Chapter 4. Generating features
4.1. Spark ML
4.2. Extracting features
4.3. Transforming features
4.3.1. Common feature transforms
4.3.2. Transforming concepts
4.4. Selecting features
4.5. Structuring feature code
4.5.1. Feature generators
4.5.2. Feature set composition
4.6. Applications
4.7. Reactivities
Summary
Chapter 5. Learning models
5.1. Implementing learning algorithms
5.1.1. Bayesian modeling
5.1.2. Implementing Naive Bayes
5.2. Using MLlib
5.2.1. Building an ML pipeline
5.2.2. Evolving modeling techniques
5.3. Building facades
5.3.1. Learning artistic style
5.4. Reactivities
Summary
Chapter 6. Evaluating models
6.1. Detecting fraud
6.2. Holding out data
6.3. Model metrics
6.4. Testing models
6.5. Data leakage
6.6. Recording provenance
6.7. Reactivities
Summary
Chapter 7. Publishing models
7.1. The uncertainty of farming
7.2. Persisting models
7.3. Serving models
7.3.1. Microservices
7.3.2. Akka HTTP
7.4. Containerizing applications
7.5. Reactivities
Summary
Chapter 8. Responding
8.1. Moving at the speed of turtles
8.2. Building services with tasks
8.3. Predicting traffic
8.4. Handling failure
8.5. Architecting response systems
8.6. Reactivities
Summary
3. Operating a machine learning system
Chapter 9. Delivering
9.1. Shipping fruit
9.2. Building and packaging
9.3. Build pipelines
9.4. Evaluating models
9.5. Deploying
9.6. Reactivities
Summary
Chapter 10. Evolving intelligence
10.1. Chatting
10.2. Artificial intelligence
10.3. Reflex agents
10.4. Intelligent agents
10.5. Learning agents
10.6. Reactive learning agents
10.6.1. Reactive principles
10.6.2. Reactive strategies
10.6.3. Reactive machine learning
10.7. Reactivities
10.7.1. Libraries
10.7.2. System data
10.8. Reactive explorations
10.8.1. Users
10.8.2. System dimensions
10.8.3. Applying reactive principles
Summary
Getting set up
Scala
Git code repository
sbt
Spark
Couchbase
Docker
A reactive machine learning system
Phases of machine learning
Index
List of Figures
List of Tables
List of Listings
Foreword
Today’s data scientists and software engineers are spoiled for choice when looking for tools to build machine learning systems. They have a range of new technologies that make it easier than ever to build entire machine learning systems. Considering where we—the machine learning community—started, it’s exciting to see a book that explores how powerful and approachable the current technologies are.
To better understand how we got here, I’d like to share a bit of my own story. They tell me I’m a data scientist, but I think I’m only here by accident. I began as a software person and grew up on Java 1.3 and EJB. I left the software-engineer role at Google a decade ago, although I dabbled in open source and created a recommender system that went on to be part of Apache Mahout in 2009. Its goal was to implement machine learning algorithms on the then-new Apache Hadoop MapReduce framework. The engineering parts were familiar—MapReduce came from Google, after all. The machine learning was new and exciting, but the tools were lacking.
Not knowing any better, and with no formal background in ML, I tried to help build ML at scale. In theory, this was going to open an era of better ML, because more data generally means better models. ML just needed tooling rebuilt on the nascent distributed computing platforms like Hadoop.
Mahout (0.x) was what you’d expect when developers with a lot of engineering background and a little stats background try to build ML tools: JVM-based, modular, scalable, complex, developer-oriented, baroque, and sometimes eccentric in its interpretation of stats concepts. In retrospect, classic Mahout wasn’t interesting because it was a better version of stats tooling. In truth, it was much less usable than, say, R (which I admit having never heard of until 2010). Mahout was interesting because it was built from the beginning to work at web scale, using tooling developed for enterprise software engineering. The collision of stats tooling with new approaches to handling web-scale data gave birth to what became known as data science.
The more I back-filled my missing context about how real statisticians and analysts had been successfully applying ML for decades, thank you very much, the more I realized that the existing world of analytics tooling optimizes for some usages and not others. Python, R, and their ecosystems have rich analytics libraries and visualization tools. They’re not as concerned with issues of scale or production deployment.
Coming from an enterprise software world, I was somewhat surprised that the tooling generally ended at building a model. What about doing something with the model in production? I found this was usually viewed as a separate activity for software engineers to undertake. The engineering community hadn’t settled on clear patterns for product application around Hadoop-related technologies.
In 2012, I spun out a small company, Myrrix, to expand on the core premise of Mahout and make it into a continuously learning, updating service with the ability to serve results from the model in production—not just a library that output coefficients. This became part of Cloudera and was reimagined again, on top of Apache Spark, as Oryx (https://github.com/OryxProject/oryx).
Spark was another game changer for the Hadoop ecosystem. It brought a higher-level, natural functional paradigm to big data software development, more like you’d encounter in Python. It added language bindings to Python and R. It brought a new machine learning library, Spark MLlib. By 2015, the big data ecosystem at large was suddenly much closer to the world of conventional analytics tools.
These and other tools have bridged the worlds of stats and software engineering such that the two now interact regularly. Today’s big data engineer has ready access to Python-only tooling like TensorFlow for deep learning and Seaborn for visualization. The software-engineering culture of version control and testing and strongly typed languages has flowed into the data science community, too.
That brings us back to this book. It doesn’t cover just tools but also the entire job of building a machine learning system. It gets into topics that people used to gloss over, like model serialization and building model servers. The language of the book is primarily Scala, a unique language that is both principled and expressive without sacrificing conveniences like type inference. Scala has been used to build powerful technologies like Spark and Akka, which the book shows you how to use to build machine learning systems. The book also doesn’t ignore the importance of interoperability with Python technologies or portable application builds with Docker.
We’ve come a long way, and there’s farther to go. The person who can master the tools and techniques in this book will be well prepared to play a role in machine learning’s even more exciting future.
SEAN OWEN
DIRECTOR OF DATA SCIENCE, CLOUDERA
Preface
I’ve been working with data for my entire professional career. Following my interests, I’ve worked on ever-more-analytically sophisticated systems as my career has progressed, leading to a focus on machine learning and artificial intelligence systems.
As my work content evolved from more traditional data-warehousing sorts of tasks to building machine learning systems, I was struck by a strange absence. When I was working primarily with databases, I could rely on the rich body of academic and professional literature about how to build databases and applications that interact with them to help me define what a good design was. So, I was confused and surprised to find that machine learning as a field generally lacked this sort of guidance. There were no canonical implementations of anything other than the model learning algorithms. Huge chunks of the system that needed to be built were largely glossed over in the literature. Often, I couldn’t even find a consistent name for a given system component, so my colleagues and I inevitably confused each other with our choices of terminology.
What I wanted was a framework, something like a Ruby on Rails for machine learning, but no such framework seemed to exist.[¹] Barring a commonly accepted framework, I wanted at least some clear design patterns for how to build machine learning systems; but alas, there was no Design Patterns for Machine Learning Systems to be found, either.
¹ Eventually, I came across Sean Owen’s work on Oryx and Simon Chan’s on PredictionIO, which were super-instructive. If you’re interested in the background of machine learning architectures, you’ll benefit from reviewing them both.
So, I built machine learning systems the hard way: by trying things and figuring out what didn’t work. When I needed to invent terminology, I just picked reasonable terms. Over time, I tried to synthesize some of my learnings about what worked for machine learning system design and what didn’t into a coherent whole. Fields like distributed systems and functional programming offered the promise of adding coherence to my views about machine learning systems, but neither was particularly focused on application to machine learning.
Then, I discovered reactive systems design, via reading the Reactive Manifesto (www.reactivemanifesto.org). It was startling in its simple coherence and bold mission. Here was a complete world view of what the challenge of building modern software applications was and a principled way of building applications that met that challenge. I was excited by the promise of the approach and immediately began attempting to apply it to the problems I’d seen in architecting and building machine learning systems.
Poop prediction
This inquiry led me to poop—specifically, to dog poop. I tried to imagine how a naive machine learning system could be refactored into something much better, using the tools from reactive systems design. To do this, I wrote a blog post about a dog poop prediction startup (http://mng.bz/9YK8; see figure).
The post got a surprisingly large and serious response from a wide range of people. I learned two things from that response:
I wasn’t the only one interested in coming up with a principled approach to building machine learning systems.
People really enjoyed talking about machine learning in terms of cartoon animals.
Those insights led to the book you’re reading. In this book, I try to cover a range of issues you’re likely to encounter in building real-world machine learning systems that have to keep customers happy. My focus is on all the stuff you won’t find in other books. I’ve tried to make the book as broad as possible, in the hopes of covering the full responsibilities of the modern data scientist or engineer. I explore how to use general principles and techniques to break down the seemingly unique problems of a given component of a machine learning system. My goal is to be as comprehensive as possible in my coverage of machine learning system components, but that means I can’t be comprehensive on huge topics like model learning algorithms and distributed systems. Instead, I’ve designed examples that provide you with experience building various components of a machine learning system.
I firmly believe that to build a truly powerful machine learning system, you must take a system-level view of the problem. In this book, I provide that high-level perspective and then help you build skills around each of the key components in that system. I learned through my experience as a technical lead and manager that understanding the entire machine learning system and the composition of its components is one of the most important skills a developer of machine learning systems can have. So, the book tries to cover all the different pieces it takes to build up a powerful, real-world machine learning system. Throughout, we’ll take the perspective of teams shipping sophisticated machine learning systems for live users. So, we’ll explore how to build everything in a machine learning system. It’s a big job, and I’m excited that you’re interested in taking it on.
Acknowledgments
A book is the opposite of an academic paper when it comes to attribution. In an academic paper, everyone who ever even grabbed lunch at the lab can get their name on the paper; but in a book, for some reason, we only put one or two names on the cover. But it’s not that simple to pull a book together; lots of people are involved. Here are all the people who made this book happen.
As I mentioned in the preface, the book grew out of (believe it or not) a blog post about dog poop (http://mng.bz/9YK8). I’m immensely grateful to the serious and accomplished people who took my cartoons about dog poop seriously enough to provide useful feedback: Roland Kuhn, Simon Chan, and Sean Owen.
In the early days of the book, the members of the reactive study group and the data team at Intent Media were invaluable in helping me understand where I was trying to take these ideas about building machine learning systems. I’m also indebted to Chelsea Alburger from Intent Media, who provided great early art direction for the book’s visuals.
Thanks go to the team at Manning who took my original ideas and helped them become a book: Frank Pöhlmann, who suggested that there might be a book in this reactive machine learning stuff; Susanna Kline, who dragged me kicking and screaming through the dark forest; Kostas Passadis, who kept me from looking like a complete fool; and Marjan Bace, who green-lit the whole mad endeavor. I also want to thank the technical peer reviewers, led by Aleksandar Dragosavljević: David Andrzejewski, Jose Carlos Estefania Aulet, Óscar Belmonte-Fernández, Tony M. Dubitsky, Vipul Gupta, Jason Hales, Massimo Ilario, Shobha Iyer, Shanker Janakiraman, Jon Lehto, Anuja Kelkar, Alexander Myltsev, Tommy O’Dell, Jean Safar, José San Leandro, Jeff Smith, Chris Snow, Ian Stirk, Fabien Tison, Jeremy Townson, Joseph Wang, and Jonathan Woodard.
Once the book really got rolling, the team at x.ai were immensely helpful in providing a test lab for various ideas and supporting me as I took the book’s ideas on the road in the form of talks. I thank you, Dennis Mortensen, Alex Poon, and everyone on the tech team.
Also, thanks go to anyone who came out to hear one of the talks associated with the book at conferences and meetups. All the feedback provided, in person and online, was instrumental to helping me understand how the material was evolving.
Finally, I thank my illustrator, yifan, without whom the book wouldn’t have been possible. You’ve brought to life my vision of cartoon animals who do machine learning, and now I’m excited to be able to share it with the world.
P.S. Thanks to my muse: nom nom, the data dog. Who’s a good little machine learner? You are!
About this book
This book serves two slightly different audiences. First, it serves software engineers who are interested in machine learning but haven’t built many real-world machine learning systems. I presume such readers want to put their skills into practice by actually building something with machine learning. The book is different from other books you may have picked up on machine learning. In it, you’ll find techniques applicable to building whole production-grade systems, not just naive scripts. We’ll explore the entire range of possible components you might need to implement in a machine learning system, with lots of hard-won tips about common design pitfalls. Along the way, you’ll learn about the various jobs of a machine learning system, in the context of implementing systems that fulfill those needs. So, if you don’t have a lot of background in machine learning, don’t worry that you’ll have to wade through pages of math before you get to build things. The book will have you coding all the way through, often relying on libraries to handle the more complex implementation concerns like model learning algorithms and distributed data processing.
Second, this book serves data scientists who are interested in the bigger picture of machine learning systems. I presume that such readers know the concepts of machine learning but may only have implemented simple machine learning functionality (for example, scripts over files on a laptop). For such readers, the book may introduce you to a range of concerns that you’ve never before considered part of the work of machine learning. In places, I’ll introduce vocabulary to name components of a system that are often neglected in academic machine learning discussions, and then I’ll show you how to implement them. Although the book does get into some powerful programming techniques, I don’t presume that you have deep experience in software engineering, and I’ll introduce all concepts beyond the very basic, in context.
For either type of reader, I assume that you have some interest in reactive systems and how this approach can be used to build better machine learning systems. The reactive perspective on system design underpins every part of the book, so you’ll spend a lot of time examining the properties your system has or doesn’t have, often presuming that real-world problems like server outages and network partitions will occur in your system.
Concretely, this focus on reactive systems means the book contains a fair bit of material on distributed systems and functional programming. The goal of unifying these concerns with the task of building machine learning systems is to give you tools to solve some of the hardest problems in technology today. Again, if you don’t have a background in distributed systems or functional programming, don’t worry: I’ll introduce this material in context with the appropriate motivation. Once you see tools like Scala, Spark, and Akka in action, I hope it will become clear to you how helpful they can be in solving real-world machine learning problems.
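To give a taste of what that looks like in practice, here's a minimal Scala sketch (illustrative only, not code from the book; `predict` and its fallback value are hypothetical) of the kind of failure handling the reactive approach favors: a model-serving call wrapped in a Future, with an explicit recovery path instead of an unhandled exception.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// A hypothetical model-serving call that might fail at runtime.
def predict(features: Vector[Double]): Future[Double] =
  Future {
    if (features.isEmpty) throw new IllegalArgumentException("no features")
    features.sum / features.size
  }

// recover supplies a fallback prediction, so a bad request
// degrades gracefully instead of crashing the caller.
val resilientPrediction: Future[Double] =
  predict(Vector.empty).recover { case _: IllegalArgumentException => 0.0 }

val result = Await.result(resilientPrediction, 2.seconds)
```

The same compose-and-recover style scales up from toy functions like this one to the Akka- and Spark-based components built later in the book.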
How this book is organized
This book is organized into three parts. Part 1 introduces the overall motivation of the book and some of the tools you’ll use:
Chapter 1 introduces machine learning, reactive systems, and the goals of reactive machine learning.
Chapter 2 introduces three of the technologies the book uses: Scala, Spark, and Akka.
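As a hint of the "reacting to uncertainty" theme chapter 2 opens with, here's a small sketch in plain Scala (my own illustration, with hypothetical names; the book's examples differ): missing or malformed raw data is modeled explicitly with Option and Try rather than with nulls and exceptions.

```scala
import scala.util.Try

// Raw readings may be missing or malformed; Try/Option make that explicit.
def parseReading(raw: String): Option[Double] =
  Try(raw.trim.toDouble).toOption

val readings = List("42.0", "", "3.14", "oops")

// Invalid readings simply drop out; no null checks, no try/catch.
val valid = readings.flatMap(parseReading)
val mean  = if (valid.isEmpty) None else Some(valid.sum / valid.size)
```

Encoding uncertainty in the types like this is the starting point for the larger reactive patterns the rest of the book builds on.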
Part 2 forms the bulk of the book. It proceeds component by component, helping you to deeply understand all the things a machine learning system must do, and how you can do them better using reactive techniques:
Chapter 3 discusses the challenges of collecting data and ingesting it into a machine learning system. As part of that, it introduces various concepts around handling uncertain data. It also goes into detail about how to persist data, focusing on properties of distributed databases.
Chapter 4 gets into how you can extract features from raw data and the various ways in which you can compose this functionality.
Chapter 5 covers model learning. You’ll implement your own model learning algorithms and use library implementations. It also covers how to work with model learning algorithms from other languages.
Chapter 6 covers a range of concerns related to evaluating models once they’ve been learned.
Chapter 7 shows how to take learned models and make them available for use. In the service of this goal, this chapter introduces Akka HTTP, microservices, and containerization via Docker.
Chapter 8 is all about using machine learned models to act on the real world. It also introduces an alternative to Akka HTTP for building services: http4s.
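To sketch the flavor of chapter 4's feature generators and feature-set composition (a simplified illustration with hypothetical types — `Instance`, `FeatureGenerator`, and both generators are mine, not the book's): each generator maps an instance to named feature values, and a feature set is just the merge of several generators' outputs.

```scala
// A hypothetical instance type: one document's words.
case class Instance(words: List[String])

// Each generator produces named feature values for an instance.
trait FeatureGenerator {
  def generate(i: Instance): Map[String, Double]
}

object WordCount extends FeatureGenerator {
  def generate(i: Instance) = Map("wordCount" -> i.words.size.toDouble)
}

object AvgWordLength extends FeatureGenerator {
  def generate(i: Instance) =
    Map("avgWordLength" ->
      (if (i.words.isEmpty) 0.0
       else i.words.map(_.length).sum.toDouble / i.words.size))
}

// Composing a feature set is just merging the generators' outputs.
def featureSet(gens: List[FeatureGenerator], i: Instance): Map[String, Double] =
  gens.flatMap(_.generate(i)).toMap

val features = featureSet(List(WordCount, AvgWordLength), Instance(List("nom", "data")))
```

Structuring feature code as small, composable units like this is what lets the pipeline chapters swap, add, and test features independently.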
Finally, part 3 introduces a few more concerns that become relevant once you’ve built a machine learning system and need to keep it running and evolve it into something better:
Chapter 9 shows how to build Scala applications using sbt. It also introduces concepts from continuous delivery.
Chapter 10 shows how to build artificially intelligent agents of various levels of complexity as an example of system evolution. It also covers more techniques for analyzing the reactive properties of a machine learning system.
How should you read this book? If you have good experience in Scala, Spark, and Akka, then you might skip chapter 2. The heart of the book is the journey through the various system components in part 2. Although they’re meant to stand alone as much as possible, it will probably be easiest to follow the flow of the data through the system if you proceed in order from chapter 3 through chapter 8. The final two chapters are separate concerns and can be read in any order (after you’ve read part 2).
Code conventions and downloads
This book contains many examples of source code, both in numbered listings and in line with normal text. In both cases, source code is formatted in a fixed-width font like this to separate it from ordinary text.
In many cases, the original source code has been reformatted; we’ve added line breaks and reworked indentation to accommodate the available page space in the book. In rare cases, even this was not enough, and listings include line-continuation markers (➥). Additionally, comments in the source code have often been removed from the listings when the code is described in the text. Code annotations accompany many of the listings, highlighting important concepts.
The code used in the book can be found on the book’s website, www.manning.com/books/machine-learning-systems, and in this Git repository: http://github.com/jeffreyksmithjr/reactive-machine-learning-systems.
Book forum
Purchase of Machine Learning Systems includes free access to a private web forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the author and from other users. To access the forum, go to https://forums.manning.com/forums/machine-learning-systems. You can also learn more about Manning’s forums and the rules of conduct at https://forums.manning.com/forums/about.
Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It is not a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking him some challenging questions lest his interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.
Other online resources
For more information about Scala and pointers to various resources on how to learn the language, the language website (www.scala-lang.org) is a good place to start.