Tutorial 9 - Questions 2023
Tutorial 9
Classification - Machine learning models
SVM
The support vector machine constructs a hyperplane (or set of hyperplanes) that maximizes the margin width between two classes in a high-dimensional space. The cases that lie closest to this hyperplane and define it are the support vectors, as shown in the following figure:
# We can remove the state, area_code, and account_length attributes, which are not appropriate as classification features:
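A possible implementation of this step is sketched below; it assumes the churn data that ships with the C50 package, with churnTrain as the working data frame (the tutorial itself does not name the data source):
library(C50)
data(churn)   # assumed data source; loads churnTrain and churnTest
# Drop the three attributes that will not be used as features
churnTrain = churnTrain[, !names(churnTrain) %in% c("state", "area_code", "account_length")]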
# Then, we split the data into a training dataset (70 percent) and a testing dataset (30 percent) using the sample function:
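A sketch of the split, assuming the churnTrain data frame from the previous step; the seed value 2 is arbitrary:
set.seed(2)
# Assign each row to group 1 (training) or group 2 (testing) with 70/30 probability
ind = sample(2, nrow(churnTrain), replace = TRUE, prob = c(0.7, 0.3))
trainset = churnTrain[ind == 1, ]
testset = churnTrain[ind == 2, ]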
Note: The set.seed() function in R is used to reproduce results, i.e. it produces the same sample again and again. When random numbers are generated without set.seed(), a different sample is produced at each execution.
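A quick illustration of this behaviour (the seed value 123 is arbitrary):
set.seed(123)
sample(1:10, 3)   # draws a fixed set of three numbers
set.seed(123)
sample(1:10, 3)   # resetting the seed reproduces exactly the same draw
sample(1:10, 3)   # without resetting the seed, a different draw is produced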
# Lastly, use dim to explore the dimensions of both the training and testing
datasets:
dim(trainset)
dim(testset)
What is the disadvantage of this training-testing data partition strategy? How can
we improve?
# Train the support vector machine using the svm function, with trainset as the input dataset and churn as the classification category:
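A sketch of the training call, assuming the svm function from the e1071 package; the radial kernel and cost shown here are illustrative choices, not settings prescribed by the tutorial:
library(e1071)
# Fit the SVM with churn as the response and all remaining attributes as features
model = svm(churn ~ ., data = trainset, kernel = "radial", cost = 1)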
# Finally, you can obtain overall information about the built model with summary:
summary(model)
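Before the classification table below can be built, the predictions stored in svm.pred need to be generated; a minimal sketch, assuming the churn label column is excluded from the test features passed to predict:
# Predict churn labels for the testing dataset
svm.pred = predict(model, testset[, !names(testset) %in% c("churn")])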
# Then, you can use the table function to generate a classification table with the
prediction result and labels of the testing dataset:
svm.table=table(svm.pred, testset$churn)
svm.table
# Now, you can use confusionMatrix from package caret to measure the
prediction performance based on the classification table:
library(caret)
confusionMatrix(svm.table)