C2W3 Lab 01 Model Evaluation and Selection
Quantifying a learning algorithm’s performance and comparing different models are some of the
common tasks when applying machine learning to real world applications. In this lab, you will
practice doing these using the tips shared in class. Specifically, you will:
• split datasets into training, cross validation, and test sets
• evaluate regression and classification models
• add polynomial features to improve the performance of a linear regression model
• compare several neural network architectures
This lab will also help you become familiar with the code you’ll see in this week’s programming
assignment. Let’s begin!
First, you will import the packages needed for the tasks in this lab. We also included some
commands to make the outputs later more readable by reducing verbosity and suppressing
non-critical warnings.
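[1]: # for array computations and loading data
import numpy as np
# for building linear regression models and preparing data
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# for building and training neural networks
import tensorflow as tf
# custom functions
import utils
# suppress warnings
tf.get_logger().setLevel('ERROR')
tf.autograph.set_verbosity(0)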
1.2 Regression
First, you will be tasked to develop a model for a regression problem. You are given the dataset
below consisting of 50 examples of an input feature x and its corresponding target y.
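[2]: # Load the dataset from the text file
data = np.loadtxt('./data/data_w3_ex1.csv', delimiter=',')
# Split the inputs and outputs into separate arrays (assumes column 0 is x and column 1 is y)
x = data[:, 0]
y = data[:, 1]
# Convert 1-D arrays into 2-D because the commands later will require it
x = np.expand_dims(x, axis=1)
y = np.expand_dims(y, axis=1)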
1.3 Split the dataset into training, cross validation, and test sets
In previous labs, you might have used the entire dataset to train your models. In practice, however,
it is best to hold out a portion of your data to measure how well your model generalizes to new
examples. This will let you know if the model has overfit to your training set.
As mentioned in the lecture, it is common to split your data into three parts:
• training set - used to train the model
• cross validation set (also called validation, development, or dev set) - used to
evaluate the different model configurations you are choosing from. For example, you can use
this to make a decision on what polynomial features to add to your dataset.
• test set - used to give a fair estimate of your chosen model’s performance against new
examples. This should not be used to make decisions while you are still developing the
models.
Scikit-learn provides a train_test_split function to split your data into the parts mentioned
above. In the code cell below, you will split the entire dataset into 60% training, 20% cross
validation, and 20% test.
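[4]: # Get 60% of the dataset as the training set. Put the remaining 40% in temporary variables: x_ and y_.
x_train, x_, y_train, y_ = train_test_split(x, y, test_size=0.40)
# Split the 40% subset above into two: one half for cross validation and the other for the test set
x_cv, x_test, y_cv, y_test = train_test_split(x_, y_, test_size=0.50)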
1.4 Fit a linear model
Now that you have split the data, one of the first things you can try is to fit a linear model. You
will do that in the next sections below.
In the previous course of this specialization, you saw that it is usually a good idea to perform
feature scaling to help your model converge faster. This is especially true if your input features
have widely different ranges of values. Later in this lab, you will be adding polynomial terms so
your input features will indeed have different ranges. For example, x runs from around 1600 to
3600, while x^2 will run from 2.56 million to 12.96 million.
You will only use x for this first model but it’s good to practice feature scaling now so you can
apply it later. For that, you will use the StandardScaler class from scikit-learn. This computes
the z-score of your inputs. As a refresher, the z-score is given by the equation:
$$ z = \frac{x - \mu}{\sigma} $$
where µ is the mean of the feature values and σ is the standard deviation. The code below shows
how to prepare the training set using this class. You can plot the results again to check whether
they still follow the same pattern as before. The new graph should have a reduced range of values
for x.
[6]: # Initialize the class
scaler_linear = StandardScaler()
# Compute the mean and standard deviation of the training set then transform it
X_train_scaled = scaler_linear.fit_transform(x_train)
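As a quick check of the z-score formula above, the sketch below computes it by hand and compares the result with StandardScaler. The array `x_demo` and the names `scaler_demo`, `z_manual`, and `z_sklearn` are hypothetical and are not part of the lab's data or code.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature column in the same range as x in this lab
x_demo = np.array([[1600.0], [2100.0], [2600.0], [3100.0], [3600.0]])

# z-score computed by hand: subtract the mean, divide by the standard deviation
mu = x_demo.mean(axis=0)
sigma = x_demo.std(axis=0)  # population standard deviation (ddof=0)
z_manual = (x_demo - mu) / sigma

# Same computation with StandardScaler
scaler_demo = StandardScaler()
z_sklearn = scaler_demo.fit_transform(x_demo)

print(f"mean: {mu[0]:.1f}, standard deviation: {sigma[0]:.1f}")
print(f"manual z-scores:  {z_manual.ravel()}")
print(f"sklearn z-scores: {z_sklearn.ravel()}")
print(f"match: {np.allclose(z_manual, z_sklearn)}")
```

After fitting, the scaler keeps the statistics it computed in its `mean_` and `scale_` attributes, which is what allows the same transformation to be reapplied to other data later.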
Next, you will create and train a regression model. For this lab, you will use the LinearRegression
class but take note that there are other linear regressors which you can also use.
[7]: # Initialize the class
linear_model = LinearRegression()
# Train the model
linear_model.fit(X_train_scaled, y_train)
To evaluate the performance of your model, you will measure the error for the training and cross
validation sets. For the training error, recall the equation for calculating the mean squared error
(MSE):
$$J_{train}(\vec{w}, b) = \frac{1}{2m_{train}}\left[\sum_{i=1}^{m_{train}}\left(f_{\vec{w},b}(\vec{x}_{train}^{(i)}) - y_{train}^{(i)}\right)^2\right]$$
Scikit-learn also has a built-in mean_squared_error() function that you can use. Take note though
that as per the documentation, scikit-learn’s implementation only divides by m and not 2*m, where m
is the number of examples. As mentioned in Course 1 of this Specialization (cost function lectures),
dividing by 2m is a convention we will follow but the calculations should still work whether or not
you include it. Thus, to match the equation above, you can use the scikit-learn function then divide
by 2 as shown below. We also included a for-loop implementation so you can check that it’s equal.
Also note that, because you trained the model on scaled values (i.e. using the z-score),
you should feed in the scaled training set instead of its raw values.
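[8]: # Feed the scaled training set and get the predictions
yhat = linear_model.predict(X_train_scaled)
# Use scikit-learn's utility function and divide by 2
print(f"training MSE (using sklearn function): {mean_squared_error(y_train, yhat) / 2}")
# for-loop implementation
total_squared_error = 0
for i in range(len(yhat)):
    squared_error_i = (yhat[i] - y_train[i])**2
    total_squared_error += squared_error_i
# Divide by 2m to match the equation above
mse = total_squared_error / (2 * len(yhat))
print(f"training MSE (for-loop implementation): {mse.squeeze()}")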
You can then compute the MSE for the cross validation set with basically the same equation:
$$J_{cv}(\vec{w}, b) = \frac{1}{2m_{cv}}\left[\sum_{i=1}^{m_{cv}}\left(f_{\vec{w},b}(\vec{x}_{cv}^{(i)}) - y_{cv}^{(i)}\right)^2\right]$$
As with the training set, you will also want to scale the cross validation set. An important thing to
note when using the z-score is you have to use the mean and standard deviation of the training
set when scaling the cross validation set. This is to ensure that your input features are transformed
as expected by the model. One way to gain intuition is with this scenario:
• Say that your training set has an input feature equal to 500 which is scaled down to 0.5
using the z-score.
• After training, your model is able to accurately map this scaled input x=0.5 to the target
output y=300.
• Now let’s say that you deployed this model and one of your users fed it a sample equal to
500.
• If you get this input sample’s z-score using any other values of the mean and standard
deviation, then it might not be scaled to 0.5 and your model will most likely make a wrong
prediction (i.e. not equal to y=300).
You will scale the cross validation set below by using the same StandardScaler you used earlier
but only calling its transform() method instead of fit_transform().
[9]: # Scale the cross validation set using the mean and standard deviation of the training set
X_cv_scaled = scaler_linear.transform(x_cv)
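The scenario with the input of 500 can be reproduced with a small NumPy-only sketch of what fitting and transforming amount to. The arrays below are hypothetical values, chosen so that the training mean is 400 and the training standard deviation is 200; they are not taken from the lab's dataset.

```python
import numpy as np

# Hypothetical training and cross validation inputs (values chosen for illustration)
x_train_demo = np.array([100.0, 300.0, 400.0, 500.0, 700.0])
x_cv_demo = np.array([500.0, 900.0, 1000.0])

# "fit": compute the statistics from the training set only
mu_train = x_train_demo.mean()      # 400.0
sigma_train = x_train_demo.std()    # 200.0

# "transform": reuse the training statistics on the cross validation set
x_cv_scaled_ok = (x_cv_demo - mu_train) / sigma_train

# Mistake: recompute the statistics from the cross validation set itself
x_cv_scaled_wrong = (x_cv_demo - x_cv_demo.mean()) / x_cv_demo.std()

print(f"500 scaled with training statistics: {x_cv_scaled_ok[0]:.2f}")
print(f"500 scaled with CV statistics:       {x_cv_scaled_wrong[0]:.2f}")
```

With the training statistics, the input 500 maps to 0.50, the value the model saw during training. With statistics recomputed from the cross validation set, the same input maps to about -1.39, so the model would receive an input it never associated with y=300.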
From the graphs earlier, you may have noticed that the target y rises more sharply at smaller
values of x compared to higher ones. A straight line might not be the best choice because the
target y seems to flatten out as x increases. Now that you have these values of the training and
cross validation MSE from the linear model, you can try adding polynomial features to see if you
can get a better performance. The code will mostly be the same but with a few extra preprocessing
steps. Let’s see that below.
First, you will generate the polynomial features from your training set. The code below
demonstrates how to do this using the PolynomialFeatures class. It will create a new input feature
which has the squared values of the input x (i.e. degree=2).
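[10]: # Instantiate the class to make polynomial features
poly = PolynomialFeatures(degree=2, include_bias=False)
# Compute the number of features and transform the training set
X_train_mapped = poly.fit_transform(x_train)
# Preview the first 5 elements of the new training set. Left column is `x` and right column is `x^2`
# Note: The `e+<number>` in the output denotes how many places the decimal point should be moved
print(X_train_mapped[:5])
[[3.32e+03 1.11e+07]
[2.34e+03 5.50e+06]
[3.49e+03 1.22e+07]
[2.63e+03 6.92e+06]
[2.59e+03 6.71e+06]]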
You will then scale the inputs as before to narrow down the range of values.
[11]: # Instantiate the class
scaler_poly = StandardScaler()
# Compute the mean and standard deviation of the training set then transform it
X_train_mapped_scaled = scaler_poly.fit_transform(X_train_mapped)
[[ 1.43 1.47]
[-0.28 -0.36]
[ 1.71 1.84]
[ 0.22 0.11]
[ 0.15 0.04]]
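To see why scaling matters once polynomial terms are added, the sketch below compares the column ranges before and after the z-score transformation. It uses hypothetical inputs that span the range quoted earlier for x (about 1600 to 3600) rather than the lab's data, and the names `x_demo`, `poly_demo`, and `x_mapped` are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical inputs spanning the range quoted earlier (about 1600 to 3600)
x_demo = np.linspace(1600, 3600, 5).reshape(-1, 1)

# Add the squared feature; include_bias=False omits the constant column of ones
poly_demo = PolynomialFeatures(degree=2, include_bias=False)
x_mapped = poly_demo.fit_transform(x_demo)

# Column ranges before scaling differ by several orders of magnitude
print("min per column before scaling:", x_mapped.min(axis=0))
print("max per column before scaling:", x_mapped.max(axis=0))

# After z-score scaling, each column has mean 0 and standard deviation 1
x_mapped_scaled = StandardScaler().fit_transform(x_mapped)
print("mean per column after scaling:", x_mapped_scaled.mean(axis=0).round(2))
print("std per column after scaling: ", x_mapped_scaled.std(axis=0).round(2))
```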
You can then proceed to train the model. After that, you will measure the model’s performance
against the cross validation set. Like before, you should make sure to perform the same
transformations as you did in the training set. You will add the same number of polynomial
features then scale the range of values.
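[12]: # Initialize the class
model = LinearRegression()
# Train the model
model.fit(X_train_mapped_scaled, y_train)
# Add the polynomial features to the cross validation set
X_cv_mapped = poly.transform(x_cv)
# Scale the cross validation set using the mean and standard deviation of the training set
X_cv_mapped_scaled = scaler_poly.transform(X_cv_mapped)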
# Loop over 10 times. Each adding one more degree of polynomial higher than the last.
polys.append(poly)
1.5.2 Choosing the best model
When selecting a model, you want to choose one that performs well on both the training and
cross validation sets. This implies that the model is able to learn the patterns from your training
set without overfitting. If you used the defaults in this lab, you will notice a sharp drop in cross
validation error from the model with degree=1 to the one with degree=2. This is followed by a
relatively flat line up to degree=5. After that, however, the cross validation error generally gets
worse as you add more polynomial features. Given these, you can decide to use the model with the
lowest cv_mse as the one best suited for your application.
[14]: # Get the model with the lowest CV MSE (add 1 because list indices start at 0)
# This also corresponds to the degree of the polynomial added
degree = np.argmin(cv_mses) + 1
print(f"Lowest CV MSE is found in the model with degree={degree}")
# Add polynomial features to the test set
X_test_mapped = polys[degree-1].transform(x_test)
# Scale the test set
X_test_mapped_scaled = scalers[degree-1].transform(X_test_mapped)
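The whole procedure (loop over degrees, pick the lowest cross validation error, then score the test set once) can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the data is synthetic rather than the lab's CSV file, `random_state=1` is fixed only to make the sketch repeatable, and the list names `train_mses` and `models` are assumed by analogy with `cv_mses`, `polys`, and `scalers`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the lab data: a curve that flattens out, plus noise
rng = np.random.default_rng(0)
x = np.linspace(1600, 3600, 50).reshape(-1, 1)
y = 100 * np.log(x) + rng.normal(0, 5, size=x.shape)

# 60/20/20 split; random_state is fixed only to make this sketch repeatable
x_train, x_, y_train, y_ = train_test_split(x, y, test_size=0.40, random_state=1)
x_cv, x_test, y_cv, y_test = train_test_split(x_, y_, test_size=0.50, random_state=1)

train_mses, cv_mses, models, polys, scalers = [], [], [], [], []

# Try degrees 1 through 10, fitting every transformer on the training set only
for degree in range(1, 11):
    poly = PolynomialFeatures(degree, include_bias=False)
    scaler = StandardScaler()
    X_train = scaler.fit_transform(poly.fit_transform(x_train))
    X_cv = scaler.transform(poly.transform(x_cv))

    model = LinearRegression().fit(X_train, y_train)

    # Divide by 2 to follow the 1/(2m) convention used in this lab
    train_mses.append(mean_squared_error(y_train, model.predict(X_train)) / 2)
    cv_mses.append(mean_squared_error(y_cv, model.predict(X_cv)) / 2)
    models.append(model)
    polys.append(poly)
    scalers.append(scaler)

# Pick the degree with the lowest cross validation error
best = int(np.argmin(cv_mses))
print(f"Lowest CV MSE is found in the model with degree={best + 1}")

# Only now touch the test set, reusing the transformers fitted for that degree
X_test = scalers[best].transform(polys[best].transform(x_test))
test_mse = mean_squared_error(y_test, models[best].predict(X_test)) / 2
print(f"Test MSE: {test_mse:.2f}")
```

Storing the fitted `poly` and `scaler` objects alongside each model is what makes the final step possible: the test set must pass through exactly the same transformations as the training data for the selected degree.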
The same model selection process can also be used when choosing between different neural network
architectures. In this section, you will create the models shown below and apply them to the same
regression task above.
You will use the same training, cross validation, and test sets you generated in the previous section.
From earlier lectures in this course, you may recall that neural networks can learn non-linear
relationships so you can opt to skip adding polynomial features. The code is still included below in
case you want to try later and see what effect it will have on your results. The default degree is set
to 1 to indicate that it will just use x_train, x_cv, and x_test as is (i.e. without any additional
polynomial features).
Next, you will scale the input features to help gradient descent converge faster. Again, notice
that you are using the mean and standard deviation computed from the training set by just using
transform() in the cross validation and test sets instead of fit_transform().
[17]: # Scale the features using the z-score
scaler = StandardScaler()
X_train_mapped_scaled = scaler.fit_transform(X_train_mapped)
X_cv_mapped_scaled = scaler.transform(X_cv_mapped)
X_test_mapped_scaled = scaler.transform(X_test_mapped)
You will then create the neural network architectures shown earlier. The code is provided in the
build_models() function in the utils.py file in case you want to inspect or modify it. You will
use that in the loop below then proceed to train the models. For each model, you will also record
the training and cross validation errors.
[18]: # Initialize lists that will contain the errors for each model
nn_train_mses = []
nn_cv_mses = []
print(f"Training {model.name}...")
print("Done!\n")
# print results
print("RESULTS:")
for model_num in range(len(nn_train_mses)):
    print(
        f"Model {model_num+1}: Training MSE: {nn_train_mses[model_num]:.2f}, " +
        f"CV MSE: {nn_cv_mses[model_num]:.2f}"
    )
Training model_1…
Done!
Training model_2…
Done!
Training model_3…
Done!
RESULTS:
Model 1: Training MSE: 73.44, CV MSE: 113.87
Model 2: Training MSE: 73.40, CV MSE: 112.28
Model 3: Training MSE: 44.56, CV MSE: 88.51
From the recorded errors, you can decide which is the best model for your application. Look at the
results above and see if you agree with the selected model_num below. Finally, you will compute
the test error to estimate how well it generalizes to new examples.
[19]: # Select the model with the lowest CV MSE
model_num = 3
Selected Model: 3
Training MSE: 44.56
Cross Validation MSE: 88.51
Test MSE: 87.77
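Instead of setting `model_num` by hand, the choice can also be made programmatically. The sketch below assumes the two error lists hold the values from the RESULTS printout above, typed in directly so that the snippet runs on its own.

```python
import numpy as np

# Errors copied from the RESULTS printout above
nn_train_mses = [73.44, 73.40, 44.56]
nn_cv_mses = [113.87, 112.28, 88.51]

# Choose by cross validation error, not training error
# (add 1 because list indices start at 0)
model_num = int(np.argmin(nn_cv_mses)) + 1

print(f"Selected Model: {model_num}")
print(f"Training MSE: {nn_train_mses[model_num - 1]:.2f}")
print(f"Cross Validation MSE: {nn_cv_mses[model_num - 1]:.2f}")
```

This selects model 3, which agrees with the value set manually in the cell above.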
1.7 Classification
In this last part of the lab, you will practice model evaluation and selection on a classification task.
The process will be similar, with the main difference being the computation of the errors. You will
see that in the following sections.
First, you will load a dataset for a binary classification task. It has 200 examples of two input
features (x1 and x2), and a target y of either 0 or 1.
# Convert y into 2-D because the commands later will require it (x is already 2-D)
1.7.2 Split and prepare the dataset
Next, you will generate the training, cross validation, and test sets. You will use the same 60/20/20
proportions as before. You will also scale the features as you did in the previous section.
[22]: from sklearn.model_selection import train_test_split
# Get 60% of the dataset as the training set. Put the remaining 40% in temporary variables.
# Split the 40% subset above into two: one half for cross validation and the other for the test set
print(f"the shape of the cross validation set (input) is: {x_bc_cv.shape}")
print(f"the shape of the cross validation set (target) is: {y_bc_cv.shape}\n")
print(f"the shape of the test set (input) is: {x_bc_test.shape}")
print(f"the shape of the test set (target) is: {y_bc_test.shape}")
# Compute the mean and standard deviation of the training set then transform it
x_bc_train_scaled = scaler_linear.fit_transform(x_bc_train)
x_bc_cv_scaled = scaler_linear.transform(x_bc_cv)
x_bc_test_scaled = scaler_linear.transform(x_bc_test)
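The split and scaling steps for the classification data can be sketched on a synthetic dataset of the same shape (200 examples, two features, binary target). Everything below is an assumption made for illustration: the data is randomly generated, `random_state=1` is fixed only for repeatability, and the variable names with `demo`, `tr`, and `te` in them are not the lab's.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the lab data:
# 200 examples, 2 features, binary target stored as a 2-D column
rng = np.random.default_rng(0)
x_bc_demo = rng.uniform(0, 10, size=(200, 2))
y_bc_demo = (x_bc_demo.sum(axis=1) > 10).astype(int).reshape(-1, 1)

# 60% training, then split the remaining 40% in half for cross validation and test
x_tr, x_tmp, y_tr, y_tmp = train_test_split(
    x_bc_demo, y_bc_demo, test_size=0.40, random_state=1)
x_cv_d, x_te, y_cv_d, y_te = train_test_split(
    x_tmp, y_tmp, test_size=0.50, random_state=1)

# Fit the scaler on the training set only, then reuse it for the other two sets
scaler_demo = StandardScaler()
x_tr_scaled = scaler_demo.fit_transform(x_tr)
x_cv_scaled = scaler_demo.transform(x_cv_d)
x_te_scaled = scaler_demo.transform(x_te)

print(f"training: {x_tr.shape}, cross validation: {x_cv_d.shape}, test: {x_te.shape}")
```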
In the previous sections on regression models, you used the mean squared error to measure how
well your model is doing. For classification, you can get a similar metric by getting the fraction of
the data that the model has misclassified. For example, if your model made wrong predictions for
2 samples out of 5, then you will report an error of 40% or 0.4. The code below demonstrates this
using a for-loop and also with NumPy’s mean() function.
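[24]: # Sample model output
probabilities = np.array([0.2, 0.6, 0.7, 0.3, 0.8])
# Apply a threshold to the model output. If greater than or equal to 0.5, set to 1. Else 0.
predictions = np.where(probabilities >= 0.5, 1, 0)
# Ground truth labels (illustrative values, chosen so that 2 of the 5 predictions are wrong)
ground_truth = np.array([1, 1, 1, 1, 1])
# Initialize counter for misclassified data
misclassified = 0
# Get number of predictions
num_predictions = len(predictions)
# Loop over each prediction
for i in range(num_predictions):
    # Add one to the counter if the prediction does not match the ground truth
    if predictions[i] != ground_truth[i]:
        misclassified += 1
# Compute the fraction of the data that the model misclassified
fraction_error = misclassified / num_predictions
print(f"probabilities: {probabilities}")
print(f"predictions with threshold=0.5: {predictions}")
print(f"targets: {ground_truth}")
print(f"fraction of misclassified data (for-loop): {fraction_error}")
print(f"fraction of misclassified data (with np.mean()): {np.mean(predictions != ground_truth)}")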
You will use the same neural network architectures as in the previous section, so you can call the
build_models() function again to create new instances of these models.
You will follow the recommended approach mentioned last week where you use a linear activation
for the output layer (instead of sigmoid) then set from_logits=True when declaring the loss
function of the model. You will use the binary crossentropy loss because this is a binary classification
problem.
After training, you will use a sigmoid function to convert the model outputs into probabilities.
From there, you can set a threshold and get the fraction of misclassified examples from the training
and cross validation sets.
You can see all these in the code cell below.
[25]: # Initialize lists that will contain the errors for each model
nn_train_error = []
nn_cv_error = []
# Loop over each model
for model in models_bc:
    print(f"Training {model.name}...")
    print("Done!\n")
    # Record the fraction of misclassified examples for the cross validation set
    yhat = model.predict(x_bc_cv_scaled)
    yhat = tf.math.sigmoid(yhat)
    yhat = np.where(yhat >= threshold, 1, 0)
    cv_error = np.mean(yhat != y_bc_cv)
    nn_cv_error.append(cv_error)
Training model_1…
Done!
Training model_2…
Done!
Training model_3…
Done!
Selected Model: 3
Training Set Classification Error: 0.0500
CV Set Classification Error: 0.1500
Test Set Classification Error: 0.1750
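The conversion from raw model outputs to a classification error used in this section can be sketched without TensorFlow. The logits and labels below are hypothetical values picked for illustration, and `sigmoid` is a small NumPy helper standing in for `tf.math.sigmoid`.

```python
import numpy as np

def sigmoid(z):
    """Map raw model outputs (logits) to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical logits from a linear output layer, with their true labels
logits = np.array([[-1.4], [0.4], [0.85], [-0.85], [1.4]])
y_true = np.array([[1], [1], [1], [1], [1]])

# Convert logits to probabilities, then apply the 0.5 threshold
probabilities = sigmoid(logits)
threshold = 0.5
yhat = np.where(probabilities >= threshold, 1, 0)

# Fraction of misclassified examples
error = np.mean(yhat != y_true)

print(f"probabilities: {probabilities.ravel().round(2)}")
print(f"predictions:   {yhat.ravel()}")
print(f"classification error: {error}")
```

Because the sigmoid of 0 is exactly 0.5, thresholding the probabilities at 0.5 gives the same labels as thresholding the logits at 0. The sigmoid step is only needed when you want the probabilities themselves or a threshold other than 0.5.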
1.8 Wrap Up
In this lab, you practiced evaluating a model’s performance and choosing between different model
configurations. You split your datasets into training, cross validation, and test sets and saw how
each of these is used in machine learning applications. In the next section of the course, you will
see more tips on how to improve your models by diagnosing bias and variance. Keep it up!