Lab Week 7


Report CSC 645

Muhd Fakhrullah Bin Mohd Zaini 2020995275 5B

Random Data

Step 1: Open Anaconda Navigator and launch Jupyter Notebook


Save the iris.csv dataset in the same folder as the code
Create a new Python file and start coding

Step 2: Import the pandas library in the code


import pandas as pd
data = pd.read_csv('iris.csv')

Step 3: Drop the last (species) column to create the input data


input_data = data.drop(columns = 'species')
input_data

Step 4: Extract the target column


target_data = data['species']
target_data

Step 5: Import scikit-learn and randomly split off 20% of the data for testing

from sklearn.model_selection import train_test_split
input_train, input_test, target_train, target_test = train_test_split(input_data, target_data, test_size=0.2)
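
A note on reproducibility: because this split is random, the reported accuracy can vary between runs. A minimal sketch, assuming a fixed seed is acceptable for this exercise (random_state=42 is an arbitrary choice, not part of the original lab):

# Seeded split: every run produces the same 80/20 partition
input_train, input_test, target_train, target_test = train_test_split(input_data, target_data, test_size=0.2, random_state=42)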

Step 6: After splitting the data, import the SVC model from sklearn.svm and try four different kernels

from sklearn.svm import SVC
svcModel = SVC(kernel='linear')  # or 'sigmoid', 'rbf', 'poly'
svcModel.fit(input_train, target_train)

Step 7: Make a prediction on the test data


output_test = svcModel.predict(input_test)
output_test

Step 8: Import sklearn.metrics to get a report


from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
print(classification_report(target_test, output_test))
print(confusion_matrix(target_test, output_test))
print(accuracy_score(target_test, output_test))

Step 9: Compare the accuracy of each kernel and choose the kernel with the highest accuracy
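
One way to carry out this comparison is to loop over the four kernels. The following is a sketch of one possible approach rather than part of the original lab code; it reuses the train/test split from Step 5:

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Train one SVC per kernel and print its accuracy on the held-out 20%
for kernel in ['linear', 'sigmoid', 'rbf', 'poly']:
    model = SVC(kernel=kernel)
    model.fit(input_train, target_train)
    print(kernel, accuracy_score(target_test, model.predict(input_test)))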

Step 10: In this exercise the best kernel is poly, as it achieves 100% accuracy

Fixed Data

Step 1: Open Anaconda Navigator and launch Jupyter Notebook


Save the iris.csv dataset in the same folder as the code
Create a new Python file and start coding

Step 2: Split the data into 30 testing rows and 120 training rows in a new Microsoft Excel file using a random number generator
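
The same 120/30 split can also be produced with pandas instead of Excel. A minimal sketch, assuming the full dataset is in iris.csv and that the row index is written out as the 'No' column (the seed value is arbitrary):

import pandas as pd

data = pd.read_csv('iris.csv')
# Shuffle all 150 rows, then write 120 to the training file and 30 to the testing file
shuffled = data.sample(frac=1, random_state=1).reset_index(drop=True)
shuffled.head(120).to_csv('iris.train.csv', index_label='No')
shuffled.tail(30).to_csv('iris.test.csv', index_label='No')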

Step 3: Import the pandas library in the code


import pandas as pd
data_train = pd.read_csv('iris.train.csv')

Step 4: Drop the number ('No') column and the last (species) column


input_data_train = data_train.drop(columns=['No', 'species'])
input_data_train

Step 5: Extract the target column


target_data_train = data_train['species']
target_data_train

Step 6: Import the SVC model from sklearn.svm and try four different kernels

from sklearn.svm import SVC
svcModel = SVC(kernel='linear')  # or 'sigmoid', 'rbf', 'poly'
svcModel.fit(input_data_train, target_data_train)

Step 7: Import the testing data


data_test = pd.read_csv('iris.test.csv')
data_test

Step 8: Drop the 'No' and species columns from the testing data
input_data_test = data_test.drop(columns = ['No', 'species'])
input_data_test

Step 9: Extract the target column


target_data_test = data_test['species']
target_data_test

Step 10: Make a prediction on the testing data


output_test = svcModel.predict(input_data_test)
output_test

Step 11: Import sklearn.metrics to get a report


from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
print(classification_report(target_data_test, output_test))
print(confusion_matrix(target_data_test, output_test))
print(accuracy_score(target_data_test, output_test))

Step 12: Compare the accuracy of each kernel
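
As with the random split, this comparison can be scripted. A sketch assuming the variables from the steps above (input_data_train, target_data_train, input_data_test, target_data_test):

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Train one SVC per kernel on the 120 fixed training rows and score it on the 30 fixed testing rows
for kernel in ['linear', 'sigmoid', 'rbf', 'poly']:
    model = SVC(kernel=kernel)
    model.fit(input_data_train, target_data_train)
    print(kernel, accuracy_score(target_data_test, model.predict(input_data_test)))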

Step 13: In this exercise the best kernel is linear, as it achieves 96.7% accuracy.
The differences in accuracy come from the different kernels used; comparing them identifies the most optimized and efficient kernel technique.

Advantages of SVM

• SVM works relatively well when there is a clear margin of separation between classes.
• SVM is more effective in high-dimensional spaces.
• SVM is effective in cases where the number of dimensions is greater than the number of samples.
• SVM is relatively memory efficient.

Disadvantages of SVM

• The SVM algorithm is not suitable for large data sets.
• SVM does not perform very well when the data set has more noise, i.e. when target classes are overlapping.
• In cases where the number of features for each data point exceeds the number of training samples, the SVM will underperform.
• As the support vector classifier works by placing data points above and below the classifying hyperplane, there is no probabilistic explanation for the classification.
