From the course: AWS Certified Machine Learning - Specialty (MLS-C01) Cert Prep: 1 Data Engineering

Overview

- [Noah] Welcome to the AWS Certified Machine Learning Specialty Exam, MLS-C01 for 2023. I'm going to talk about this course and the key things you'll learn. My name is Noah Gift, and I'm a prolific O'Reilly author. I've written multiple books over the years, including, recently, "Python for DevOps," which talks about how to do DevOps. I've also recently written "Developing on AWS with C#," which covers how to build solutions using the .NET framework. I've also written two books on machine learning operations, "Practical MLOps" and "Implementing MLOps." And I host a podcast called "52 Weeks of Cloud." So I've got a lot of background with cloud computing and machine learning. I've also written successful courses for Coursera and authored many different materials on data science and machine learning for programs like Duke MIDS, the Masters in Interdisciplinary Data Science, and the UC Davis Graduate School of Management. So I've been teaching machine learning for quite some time, and I'm also an AWS Machine Learning Hero.

So with that background out of the way, let's talk through the core competencies necessary for the AWS Certified Machine Learning Specialty. There are four domains here. In Domain 1, 20% of the exam is going to cover data engineering. Data engineering is a fundamental skill for machine learning, in that you need to be able to create data pipelines and move data across systems. Domain 2 covers exploratory data analysis, so it's critical to know things like ingesting data, creating scatter plots and other visualizations, and understanding clustering, which can be very important for exploratory data analysis and for getting an idea of the features that you will use. And then in Domain 3, you'll need to know modeling. This is a huge portion of the exam, so if you have some experience with data science already, some experience with Kaggle, building data science projects, you're going to feel really comfortable in this section. Domain 4 is going to cover machine learning implementation and operations. Another way to put this is MLOps. So you're going to need to know how to take those models and put them into production.

Now let's dive into these four domains in a little more detail and talk about some of the key things that you'll need to know for this exam. In the first domain, data engineering, you're going to need to know how to identify data sources, content, and location, how to identify and implement a data ingestion solution, and how to identify and implement a data transformation solution. In Domain 2, you're going to need to know how to sanitize and prepare data for modeling, how to perform feature engineering, and how to analyze and visualize data for machine learning. In Domain 3, Modeling, you'll need to know how to frame business problems as machine learning problems, select the appropriate model for a given machine learning problem, train machine learning models, and perform hyperparameter optimization. Finally, you'll need to know how to evaluate machine learning models.

In Domain 4, you'll dive into machine learning implementation and operations, or MLOps. This means building machine learning solutions for performance, availability, scalability, resilience, and fault tolerance, and knowing how to recommend and implement the appropriate machine learning services and features for a given problem. And then finally, two other things you'll need to know are how to apply basic AWS security practices to ML and how to deploy and operationalize your machine learning model.
All right, let's go ahead and get started.

Contents