Tutorial 9 - Questions 2023


ICT583 Data Science Applications

Tutorial 9
Classification - Machine learning models
SVM

After completing this lab, you should know:


a. How to prepare the training and testing datasets
b. How to build support vector machines in R using libsvm (through the e1071 package)

Do you still remember how support vector machines work?

The support vector machine constructs a hyperplane (or set of hyperplanes) that
maximizes the margin width between two classes in a high-dimensional space. The
cases that define the hyperplane are called support vectors, as shown in the
following figure:
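
Independently of the figure, the role of support vectors can be seen in a small experiment. The sketch below is an illustration only: it assumes the e1071 package is installed and uses the built-in iris data (not the churn data) reduced to two classes and two features. The names toy and toy_model are made up for this example.

```r
library(e1071)  # assumes the e1071 package is installed

# Two-class toy problem: keep two iris species and two features.
toy <- droplevels(subset(iris, Species != "setosa",
                         select = c(Petal.Length, Petal.Width, Species)))

# A linear kernel keeps the separating hyperplane easy to picture.
toy_model <- svm(Species ~ ., data = toy, kernel = "linear", cost = 1)

toy_model$tot.nSV              # total number of support vectors
toy_model$nSV                  # number of support vectors per class
head(toy[toy_model$index, ])   # training cases that act as support vectors
```

Only the cases listed in toy_model$index determine the fitted boundary; the remaining training cases lie outside the margin and could be removed without changing the model.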

Building a classification model requires a training dataset to train the
classification model, and a testing dataset to then evaluate its prediction
performance. We will use the customer churn dataset as the input data, and split
the data into training and testing datasets.

# Retrieve the churn dataset:


install.packages("modeldata")
library(modeldata)
data(mlc_churn)

Understand your dataset first!
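
A few base R calls are usually enough for a first look at the data. The sketch below assumes the modeldata package is installed and only uses the churn column named in this tutorial; which checks matter most is a judgment call.

```r
library(modeldata)  # assumes the modeldata package is installed
data(mlc_churn)

dim(mlc_churn)                        # number of rows and columns
str(mlc_churn)                        # type of every attribute
summary(mlc_churn)                    # per-attribute summary statistics
table(mlc_churn$churn)                # class counts of the target
prop.table(table(mlc_churn$churn))    # class proportions of the target
colSums(is.na(mlc_churn))             # missing values per attribute
```

The class proportions are worth noting before splitting the data, because a heavily unbalanced target affects both the split and the interpretation of accuracy later on.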

# We can remove the state, area_code, and account_length attributes, which are not
# appropriate as classification features:

mlc_churn = mlc_churn[, !names(mlc_churn) %in% c("state", "area_code",
                                                 "account_length")]

# Then, we split 70 percent of the data into the training dataset and 30 percent of
# the data into the testing dataset using the sample function:

# Set random seed


set.seed(123)

Note: The set.seed() function in R makes results reproducible, i.e. the same seed
produces the same sample every time. If we generate random numbers without calling
set.seed(), we will get different samples each time the code is executed.
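
The effect of the seed can be checked directly. This is a minimal base R sketch; the variable names first_draw and second_draw are made up for the example.

```r
# Same seed, same draw: the two samples are identical.
set.seed(123)
first_draw <- sample(1:100, 5)

set.seed(123)
second_draw <- sample(1:100, 5)

identical(first_draw, second_draw)  # TRUE

# Drawing again without resetting the seed continues the random stream,
# so this sample will generally differ from the first one.
third_draw <- sample(1:100, 5)
```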

ind = sample(2, nrow(mlc_churn), replace = TRUE,
             prob = c(0.7, 0.3))
trainset = mlc_churn[ind == 1,]
testset = mlc_churn[ind == 2,]

# Lastly, use dim to explore the dimensions of both the training and testing
# datasets:

dim(trainset)
dim(testset)

What is the disadvantage of this training-testing data partition strategy? How can
we improve it?
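
One possible line of thought, offered as a sketch rather than the only answer: sample() with prob assigns each row independently, so the split is only approximately 70/30 and the class proportions can differ between the two sets. A stratified split samples within each class instead. The helper stratified_split below is hypothetical (it is not part of any package), and the toy labels are made up for illustration.

```r
# Hypothetical helper: return training row indices, sampled within each class
# so that both sets keep the class proportions of the full data.
stratified_split <- function(labels, train_frac = 0.7) {
  idx_by_class <- split(seq_along(labels), labels)
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    n_train <- round(train_frac * length(idx))
    idx[sample.int(length(idx), size = n_train)]
  }), use.names = FALSE)
  sort(train_idx)
}

# Made-up unbalanced labels: 20 "yes" and 80 "no".
set.seed(123)
toy_labels <- factor(rep(c("yes", "no"), times = c(20, 80)))
train_idx <- stratified_split(toy_labels)

table(toy_labels[train_idx])    # no: 56, yes: 14
table(toy_labels[-train_idx])   # no: 24, yes: 6
```

With the tutorial data, the same idea would read train_idx <- stratified_split(mlc_churn$churn), followed by trainset = mlc_churn[train_idx, ] and testset = mlc_churn[-train_idx, ]. A single hold-out split still gives only one performance estimate; k-fold cross-validation is another common improvement.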

We train the SVM using the following steps:

# Load the e1071 package:


library(e1071)

# Train the support vector machine using the svm function with trainset as the
# input dataset and use churn as the classification category:

model = svm(churn ~ ., data = trainset, kernel = "radial", cost = 1,
            gamma = 1/ncol(trainset))
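
The cost and gamma values above are fixed by hand. As a hedged sketch of how they could be chosen from the data instead, e1071 provides tune.svm, which runs a cross-validated grid search. The example below uses the built-in iris data so that it runs quickly; the grid values and the 5-fold setting are arbitrary assumptions, not recommendations for the churn data.

```r
library(e1071)  # assumes the e1071 package is installed

set.seed(123)   # cross-validation folds are random

# Grid search over gamma and cost with 5-fold cross-validation.
tuned <- tune.svm(Species ~ ., data = iris,
                  kernel = "radial",
                  gamma = 10^(-2:0),
                  cost = 10^(0:2),
                  tunecontrol = tune.control(cross = 5))

tuned$best.parameters    # gamma/cost pair with the lowest CV error
tuned$best.performance   # the corresponding cross-validated error
summary(tuned$best.model)
```

For the churn data, the formula and data arguments would become churn ~ . and trainset; a larger grid takes noticeably longer to run.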

# Finally, you can obtain overall information about the built model with summary:

summary(model)

Predict labels for the testing dataset with the trained support vector machine model:

svm.pred = predict(model, testset[, !names(testset) %in% c("churn")])

# Then, you can use the table function to generate a classification table with the
# prediction result and labels of the testing dataset:

svm.table=table(svm.pred, testset$churn)
svm.table

# Now, you can use confusionMatrix from the caret package to measure the
# prediction performance based on the classification table:

library(caret)
confusionMatrix(svm.table)
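
To see where the main figures in that output come from, the same quantities can be computed by hand from a 2 x 2 table. The counts below are made up purely for illustration (they are not results from the churn data), and toy_table is a hypothetical name. As in table(svm.pred, testset$churn), rows are predictions and columns are actual labels, and the first level ("yes") is treated as the positive class, which is also the default in confusionMatrix for two-level factors.

```r
# Made-up counts for illustration only; rows = predicted, columns = actual.
toy_table <- as.table(matrix(c(30, 20, 5, 145), nrow = 2,
                             dimnames = list(predicted = c("yes", "no"),
                                             actual    = c("yes", "no"))))

tp <- toy_table["yes", "yes"]   # predicted yes, actually yes
fn <- toy_table["no",  "yes"]   # predicted no,  actually yes
fp <- toy_table["yes", "no"]    # predicted yes, actually no
tn <- toy_table["no",  "no"]    # predicted no,  actually no

accuracy    <- (tp + tn) / sum(toy_table)   # 0.875
sensitivity <- tp / (tp + fn)               # 0.6
specificity <- tn / (tn + fp)               # about 0.967
precision   <- tp / (tp + fp)               # about 0.857
```

In this made-up table the accuracy looks high while sensitivity is much lower, which is why accuracy alone can be misleading when one class dominates.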
