Answer 1722791857 NLP and Classification Practical MCQ 4991
Answered in 20.43 minutes
Question 1/15
What does the CountVectorizer output X represent in the code snippet below?
Explanation:
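For reference, a minimal sketch of the usual CountVectorizer pattern (the corpus and variable names below are assumptions, not the quiz's snippet): X is a sparse document-term matrix, one row per document and one column per vocabulary token, holding raw token counts.

from sklearn.feature_extraction.text import CountVectorizer

corpus = ["the cat sat", "the dog barked", "the cat barked"]  # illustrative corpus
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)  # sparse matrix, shape (n_documents, n_vocabulary_terms)

print(X.shape)
print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(X.toarray())  # each row holds the token counts for one document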
Question 2/15
What is the accuracy of the logistic regression model on the test data?
0.958
0.975
0.962
0.945
Explanation:
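A hedged sketch of how such a test accuracy would be computed (the dataset, split and random_state are assumptions, so the printed value will not necessarily match any option above):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)  # assumed dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

lr = LogisticRegression(max_iter=5000)  # high max_iter so the solver converges
lr.fit(X_train, y_train)
print(accuracy_score(y_test, lr.predict(X_test)))  # accuracy on the held-out test data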
Question 3/15
What is the value of True Positive (TP) in the confusion matrix generated by the RandomForestClassifier below? Modify the code to print the value.
130
97
113
118
Explanation:
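A hedged sketch of extracting TP from a RandomForestClassifier's confusion matrix (dataset and random_state are assumptions, so the printed TP need not equal an option above). With label 1 treated as positive, sklearn's confusion matrix puts TP at cm[1, 1]:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)  # assumed dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)
cm = confusion_matrix(y_test, rf.predict(X_test))  # rows = true labels, columns = predictions
print("TP:", cm[1, 1])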
Question 4/15
What is the best value of the parameter 'C' for the SVC according to the grid search? Modify the code to print the best parameter.
# Load a dataset
digits = load_digits()
X = digits.data
y = digits.target
0.001
0.1
1.0
0.01
Explanation:
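A hedged sketch of the grid search the question implies, built on the load_digits() lines above (the C grid and cv=5 are assumptions):

from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

digits = load_digits()
X = digits.data
y = digits.target

param_grid = {'C': [0.001, 0.01, 0.1, 1.0]}  # assumed grid matching the options
grid_search = GridSearchCV(SVC(), param_grid, cv=5)
grid_search.fit(X, y)
print("Best C:", grid_search.best_params_['C'])  # the requested modification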
Question 5/15
Which code snippet can be used to fill in the missing lines of code to train the SVM classifier, predict the test set results, and print the classification report?
svm_rbf.train(X_train, y_train)
y_pred = svm_rbf.classify(X_test)
report = classification_report(y_test, y_pred)
print(report)
svm_rbf.train(X_train, y_train)
y_pred = svm_rbf.test(X_test)
print(classification_report(y_pred, y_test))
svm_rbf.fit(X_train, y_train)
y_pred = svm_rbf.predict(X_test)
report = classification_report(y_pred, y_test)
print(report)
svm_rbf.fit(X_train, y_train)
y_pred = svm_rbf.predict(X_test)
print(classification_report(y_test, y_pred))
Explanation:
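In scikit-learn, SVC trains via fit and predicts via predict (there are no train, classify or test methods), and classification_report expects (y_true, y_pred) in that order. A self-contained sketch of the correct pattern (the dataset is an assumption):

from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # assumed dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)                 # SVC trains via fit()
y_pred = svm_rbf.predict(X_test)              # and predicts via predict()
print(classification_report(y_test, y_pred))  # true labels first, predictions second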
Question 6/15
Given the code below, your task is to select the function from the options provided that correctly completes the task by:
i) Creating a function that determines which classifier (KNN or Naive Bayes) has a higher F1 score, or if they have equal scores.
ii) Printing the name of the classifier along with its F1 score in the format: 'ClassifierName has the higher F1 score of Score' or 'Both classifiers have the same F1 score of Score'.
Explanation:
-
def print_best_classifier(f1_knn, f1_nb):
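    # A hedged completion of the signature above: the message wording follows the
    # format required by the question, and ties report the shared score.
    if f1_knn > f1_nb:
        print(f"KNN has the higher F1 score of {f1_knn}")
    elif f1_nb > f1_knn:
        print(f"Naive Bayes has the higher F1 score of {f1_nb}")
    else:
        print(f"Both classifiers have the same F1 score of {f1_knn}")

print_best_classifier(0.82, 0.79)  # prints: KNN has the higher F1 score of 0.82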
Question 7/15
Which of the following options will complete the missing code lines to:
# [Your Code Here] - Train the MLPClassifier on the scaled training data
# [Your Code Here] - Predict the labels for the scaled test data
# [Your Code Here] - Print the number of misclassified samples in the test set
mlp.fit(X_train_scaled, y_train)
y_pred = mlp.predict(X_test_scaled)
print(np.sum(y_test != y_pred))
mlp.train(X_train_scaled, y_train)
y_pred = mlp.test(X_test_scaled)
print(np.count_nonzero(y_test == y_pred))
mlp.fit(X_train_scaled, y_train)
y_pred = mlp.predict(X_test_scaled)
misclassified = np.where(y_test != y_pred, 1, 0)
print(misclassified.sum())
mlp.train(X_train_scaled, y_train)
y_pred = mlp.classify(X_test_scaled)
print((y_test - y_pred).count_nonzero())
Explanation:
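MLPClassifier, like other scikit-learn estimators, trains via fit and predicts via predict, and np.sum(y_test != y_pred) counts the misclassified samples (the np.where variant computes the same total). A runnable sketch with an assumed dataset:

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)  # assumed dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

mlp = MLPClassifier(max_iter=1000, random_state=42)
mlp.fit(X_train_scaled, y_train)
y_pred = mlp.predict(X_test_scaled)
print(np.sum(y_test != y_pred))  # number of misclassified test samples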
Question 8/15
Before running the final line of the code in the snippet below to fit the grid_search object, you are asked to perform the following tasks directly in the code:
param_grid['max_features'] = [1, 2, 3, 4]
grid_search = GridSearchCV(dt, param_grid, cv=5)
grid_search.fit(X_train, y_train)
print(f"Optimal Parameters: {grid_search.best_params_}, CV Accuracy: {grid_search.best_score_}")

param_grid['max_features'] = range(1, 5)
grid_search.fit(X_train, y_train)
print(f"Best Params: {grid_search.best_params_}, CV Score: {grid_search.best_score_}")
Explanation:
-
param_grid['max_features'] = range(1, 5)
grid_search.fit(X_train, y_train)
print(f"Best Params: {grid_search.best_params_}, CV Score: {grid_search.best_score_}")
-
param_grid.update({'max_features': [1, 2]})
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
print(f"Best parameters found: {best_params}, Score: {grid_search.best_score_}")
-
param_grid = {'max_features': [1, 2, 3, 4]}
grid_search.fit(X_train, y_train)
print("Best Parameters:", grid_search.best_params_)
print("Best Cross-validation Score:", grid_search.best_score_)
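A hedged end-to-end sketch of the flow these snippets assume (the dataset, the pre-existing contents of param_grid, and the split are assumptions; note that range(1, 5) and [1, 2, 3, 4] describe the same candidate values):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # assumed dataset (4 features, so max_features up to 4)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

dt = DecisionTreeClassifier(random_state=42)
param_grid = {'max_depth': [2, 3, 4]}      # assumed pre-existing grid
param_grid['max_features'] = [1, 2, 3, 4]  # the requested addition

grid_search = GridSearchCV(dt, param_grid, cv=5)
grid_search.fit(X_train, y_train)
print(f"Optimal Parameters: {grid_search.best_params_}, CV Accuracy: {grid_search.best_score_}")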
Question 9/15
You are fine-tuning a decision tree classifier for a marketing dataset. To prevent overfitting and ensure robust generalisability, you must adjust the depth of the decision tree after its initialisation but before it is fitted with data. Considering the decision tree `dt` has already been initialised with a random state, which of the following is the correct way to modify the tree's maximum depth?
# Load data
data = load_breast_cancer()
X = data.data
y = data.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
dt.set_params(max_depth=5).fit(X_train, y_train)
dt.max_depth = 42
dt.set_params(max_depth=5)
dt = DecisionTreeClassifier(max_depth=5, random_state=42)
Explanation:
- dt.set_params(max_depth=5)
- dt = DecisionTreeClassifier(max_depth=5, random_state=42)
- dt.max_depth = 42
- dt.set_params(max_depth=5).fit(X_train, y_train)
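set_params() is the idiomatic way to reconfigure an already-initialised estimator before fitting: it validates the parameter name and returns the estimator itself. A sketch using the data-loading lines above:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.25, random_state=42)

dt = DecisionTreeClassifier(random_state=42)  # initialised with a random state only
dt.set_params(max_depth=5)                    # adjust depth after initialisation, before fitting
dt.fit(X_train, y_train)
print(dt.get_params()['max_depth'])           # confirms the depth is now 5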
Question 10/15
Suppose you are analysing the performance of a new email spam detection system using precision and recall. You have already computed these metrics, and you are about to explore their trade-offs to optimise the classifier's threshold. Given the code snippet below, identify the correct function call that would allow you to adjust and visualise the precision-recall trade-off.
# [Your Code Here] - Generate precision and recall values for various thresholds
precision_recall_curve(classifier, X_test, y_test)
precision, recall = precision_recall_curve(y_test, y_scores)
plt.plot(precision_recall_curve(y_test, y_scores))
Explanation:
- precision_recall_curve(classifier, X_test, y_test)
- plt.plot(precision_recall_curve(y_test, y_scores))
- precision, recall = precision_recall_curve(y_test, y_scores)
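Worth noting when completing this snippet: precision_recall_curve takes (y_true, probas_pred) and returns three arrays, precision, recall and thresholds, so a full unpacking needs all three names. A hedged sketch (dataset and classifier are assumptions):

import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)  # assumed dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

classifier = LogisticRegression(max_iter=5000).fit(X_train, y_train)
y_scores = classifier.predict_proba(X_test)[:, 1]  # scores for the positive class

precision, recall, thresholds = precision_recall_curve(y_test, y_scores)
plt.plot(recall, precision)  # visualise the precision-recall trade-off
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.show()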
Question 11/15
You are tasked with enhancing the robustness of a logistic regression model by incorporating feature scaling. You're currently working with a dataset that has significantly varying scales among its features, which can affect the model's performance. Below is a preliminary setup for the logistic regression model. Identify the correct sequence of steps to integrate feature scaling into the modelling process.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
lr.fit(X_train_scaled, y_train)
scaler = StandardScaler()
X_test_scaled = scaler.fit_transform(X_test)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
lr.fit(X_train_scaled, y_train)

scaler = StandardScaler()
X_train_scaled = scaler.transform(X_train)
lr.fit(X_train_scaled, y_train)
X_test_scaled = scaler.fit_transform(X_test)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
lr.fit(X_scaled, y)
Explanation:
-
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
lr.fit(X_train_scaled, y_train)
-
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
lr.fit(X_train_scaled, y_train)
scaler = StandardScaler()
X_test_scaled = scaler.fit_transform(X_test)
-
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
lr.fit(X_scaled, y)
-
scaler = StandardScaler()
X_train_scaled = scaler.transform(X_train)
lr.fit(X_train_scaled, y_train)
X_test_scaled = scaler.fit_transform(X_test)
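The rule these options test: fit the scaler on the training data only, then reuse the same fitted scaler to transform the test data, so no test-set statistics leak into training. A runnable sketch with an assumed dataset:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # assumed dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)        # apply those same statistics to test data

lr = LogisticRegression(max_iter=5000)
lr.fit(X_train_scaled, y_train)
print(lr.score(X_test_scaled, y_test))  # test accuracy on consistently scaled data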
Question 12/15
You are fine-tuning a support vector machine (SVM) classifier to categorise images based on their content. The dataset consists of various animal images, and you suspect that different kernel functions might yield better classification accuracy. You decide to test which SVM kernel—linear or radial basis function (RBF)—works best for your specific dataset. Below is your initial code setup:
# Initialise two SVM classifiers, one with a linear kernel and another with an RBF kernel
svm_linear = SVC(kernel='linear')
svm_rbf = SVC(kernel='rbf')
Which of the following options correctly completes the task of training both SVM classifiers, predicting the test set results, and calculating the accuracy for each?
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_train)
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_train)
print("Accuracy with Linear Kernel:", accuracy_score(y_train, y_pred_linear))
print("Accuracy with RBF Kernel:", accuracy_score(y_train, y_pred_rbf))

svm_linear.train(X_train, y_train)
svm_rbf.train(X_train, y_train)
y_pred_linear = svm_linear.classify(X_test)
y_pred_rbf = svm_rbf.classify(X_test)
print("Linear Kernel Accuracy:", accuracy_score(y_test, y_pred_linear))
print("RBF Kernel Accuracy:", accuracy_score(y_test, y_pred_rbf))

svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)
print("Accuracy with Linear Kernel:", accuracy_score(y_test, y_pred_linear))
print("Accuracy with RBF Kernel:", accuracy_score(y_test, y_pred_rbf))

svm_linear.fit(X_train, y_train)
svm_rbf.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)
y_pred_rbf = svm_rbf.predict(X_test)
print("Linear Accuracy:", accuracy_score(y_test, y_pred_linear))
print("RBF Accuracy:", accuracy_score(y_test, y_pred_rbf))
Explanation:
-
svm_linear.fit(X_train, y_train)
svm_rbf.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)
y_pred_rbf = svm_rbf.predict(X_test)
print("Linear Accuracy:", accuracy_score(y_test, y_pred_linear))
print("RBF Accuracy:", accuracy_score(y_test, y_pred_rbf))
-
svm_linear.train(X_train, y_train)
svm_rbf.train(X_train, y_train)
y_pred_linear = svm_linear.classify(X_test)
y_pred_rbf = svm_rbf.classify(X_test)
print("Linear Kernel Accuracy:", accuracy_score(y_test, y_pred_linear))
print("RBF Kernel Accuracy:", accuracy_score(y_test, y_pred_rbf))
-
svm_linear.fit(X_train, y_train)
svm_rbf.fit(X_train, y_train)
y_pred_linear = svm_linear.test(X_test)
y_pred_rbf = svm_rbf.test(X_test)
print("Linear Test Accuracy:", accuracy_score(y_test, y_pred_linear))
print("RBF Test Accuracy:", accuracy_score(y_test, y_pred_rbf))
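A runnable version of the fit/predict pattern, substituting load_digits for the animal-image data so the sketch is self-contained:

from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # stand-in for the animal-image dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

svm_linear = SVC(kernel='linear')
svm_rbf = SVC(kernel='rbf')

svm_linear.fit(X_train, y_train)
svm_rbf.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)  # evaluate on the held-out test set
y_pred_rbf = svm_rbf.predict(X_test)

print("Linear Kernel Accuracy:", accuracy_score(y_test, y_pred_linear))
print("RBF Kernel Accuracy:", accuracy_score(y_test, y_pred_rbf))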
Question 13/15
You are currently evaluating two classifiers, K-Nearest Neighbours (KNN) and Naive Bayes, for a project that involves classifying texts into different categories based on their content. To finalise your model selection, you decide to visually compare their performance using a bar chart. Below is the setup for calculating the accuracy of both models on your dataset. Complete the code by adding the necessary lines to plot the accuracies in a bar chart:
from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
# Load data
data = fetch_20newsgroups(subset='all')
X = data.data
y = data.target
# Initialise classifiers
knn = KNeighborsClassifier()
nb = MultinomialNB()
# Train classifiers
knn.fit(X_train_tfidf, y_train)
nb.fit(X_train_tfidf, y_train)
Which snippet of code will correctly plot the accuracies of KNN and Naive Bayes classifiers in a bar chart?
Explanation:
-
plt.bar(['KNN', 'Naive Bayes'], [knn_accuracy, nb_accuracy])
plt.xlabel('Classifier')
plt.ylabel('Accuracy')
plt.title('Classifier Accuracies')
plt.show()
-
plt.plot(['KNN', 'Naive Bayes'], [knn_accuracy, nb_accuracy])
plt.xlabel('Classifier')
plt.ylabel('Accuracy')
plt.title('Comparison of Classifier Performance')
plt.show()
-
acc_data = [knn_accuracy, nb_accuracy]
labels = ['KNN', 'Naive Bayes']
plt.barh(labels, acc_data)
plt.xlabel('Accuracy')
plt.ylabel('Classifier')
plt.title('Accuracy Comparison')
plt.show()
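A hedged sketch that also fills in the vectorising and splitting steps the setup above leaves implicit (the split parameters are assumptions):

import matplotlib.pyplot as plt
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier

data = fetch_20newsgroups(subset='all')
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=42)

vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)  # fit the vocabulary on training text only
X_test_tfidf = vectorizer.transform(X_test)

knn = KNeighborsClassifier().fit(X_train_tfidf, y_train)
nb = MultinomialNB().fit(X_train_tfidf, y_train)

knn_accuracy = accuracy_score(y_test, knn.predict(X_test_tfidf))
nb_accuracy = accuracy_score(y_test, nb.predict(X_test_tfidf))

plt.bar(['KNN', 'Naive Bayes'], [knn_accuracy, nb_accuracy])
plt.xlabel('Classifier')
plt.ylabel('Accuracy')
plt.title('Classifier Accuracies')
plt.show()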
Question 14/15
You are tasked with evaluating a simple binary classification model using a confusion matrix. The dataset involves predicting whether a given email is spam or not. To better understand the model's performance, you plan to extract specific metrics from the confusion matrix, specifically True Positives (TP) and False Positives (FP). Below is your initial code setup:
# [Your code here] - Extract and print True Positives and False Positives
Which snippet of code correctly extracts and prints the True Positives (TP) and False Positives (FP) from the confusion matrix?
tp = cm[1, 1]
fp = cm[0, 1]
print("True Positives:", tp)
print("False Positives:", fp)
print("TP:", cm[1][1])
print("FP:", cm[2][1])
Explanation:
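In sklearn's confusion matrix, rows are true labels and columns are predicted labels, so with binary labels {0, 1} and 1 as the positive (spam) class, TP is cm[1, 1] and FP is cm[0, 1]; cm[2][1] would raise an IndexError on a 2x2 matrix. A small sketch with illustrative labels:

from sklearn.metrics import confusion_matrix

y_test = [0, 0, 1, 1, 1, 0, 1, 0]  # illustrative true labels, 1 = spam
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]  # illustrative predictions

cm = confusion_matrix(y_test, y_pred)
tp = cm[1, 1]  # spam correctly flagged as spam
fp = cm[0, 1]  # non-spam wrongly flagged as spam
print("True Positives:", tp)   # prints 3
print("False Positives:", fp)  # prints 1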
Question 15/15
You are refining a logistic regression model to predict customer churn. The dataset includes various customer interaction metrics. To enhance your model, explore how polynomial features can improve prediction accuracy. This approach allows the model to capture complex interactions between variables. Which snippet of code correctly completes the setup to create a pipeline including PolynomialFeatures and LogisticRegression, fits it on the training data, and makes predictions?
model = LogisticRegression()
model.fit(X_train_poly, y_train)
y_pred = model.predict(X_test_poly)
model = LogisticRegression()
model.fit(X_train_poly, y_test)
y_pred = model.predict(X_test_poly)
model = LogisticRegression()
model.fit(X_test_poly, y_test)
y_pred = model.predict(X_train_poly)
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Explanation:
-
model = LogisticRegression()
model.fit(X_train_poly, y_train)
y_pred = model.predict(X_test_poly)
-
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
-
model = LogisticRegression()
model.fit(X_test_poly, y_test)
y_pred = model.predict(X_train_poly)
-
model = LogisticRegression()
model.fit(X_train_poly, y_test)
y_pred = model.predict(X_test_poly)
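A hedged end-to-end sketch of the intended flow: expand the features on the training set, transform the test set with the same fitted expander, then fit and predict (dataset and degree are assumptions):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

X, y = load_breast_cancer(return_X_y=True)  # stand-in for the churn dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)  # fit the expansion on training data only
X_test_poly = poly.transform(X_test)

model = LogisticRegression(max_iter=5000)
model.fit(X_train_poly, y_train)    # train on expanded training features
y_pred = model.predict(X_test_poly)
print((y_pred == y_test).mean())    # test accuracy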