Affiliated To VTU, Belgaum and Approved by AICTE
LABORATORY MANUAL
Machine Learning Laboratory
[As per Choice Based Credit System (CBCS) scheme]
(Effective from the academic year 2015-2016)
15CSL76
PREPARED BY
HKBK COLLEGE OF ENGINEERING
Nagawara, Bengaluru - 560 045
www.hkbkeducation.org
HKBK COLLEGE OF ENGINEERING
(Affiliated to VTU, Belgaum and Approved by AICTE)
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
Vision
To empower students through wholesome education and enable them to develop into highly qualified and trained professionals with ethics, and to emerge as responsible citizens with a broad outlook to build a vibrant nation.
Mission
To advance the intellectual capacity of the nation and the international community by imparting knowledge to graduates who are globally recognized as innovators, entrepreneurs and competent professionals.
PEO-3 To equip students in their chosen field of engineering and related fields, enabling them to work in multidisciplinary teams.
PEO-5 To provide students with an environment for life-long learning that allows them to successfully adapt to evolving technologies throughout their professional career and face global challenges.
Programme Outcomes
e. Modern Tool Usage: Create, select and apply appropriate techniques, resources and modern engineering and IT tools, including prediction and modeling, to complex engineering activities with an understanding of the limitations.
k. Life-long Learning: Recognize the need for, and have the preparation and ability to engage in, independent and life-long learning in the broadest context of technological change.
Course objectives:
This course will enable students to
1. Make use of data sets in implementing the machine learning algorithms.
2. Implement the machine learning concepts and algorithms in any suitable language of choice.
Lab Experiments:
1. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.
2. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
3. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
4. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the
same using appropriate data sets.
5. Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
6. Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to
perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy,
precision, and recall for your data set.
7. Write a program to construct a Bayesian network considering medical data. Use this model to
demonstrate the diagnosis of heart patients using standard Heart Disease Data Set. You can use
Java/Python ML library classes/API.
8. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering
using k-Means algorithm. Compare the results of these two algorithms and comment on the quality
of clustering. You can add Java/Python ML library classes/API in the program.
9. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set. Print both
correct and wrong predictions. Java/Python ML library classes can be used for this problem.
10. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points.
Select appropriate data set for your experiment and draw graphs.
Course outcomes:
The students should be able to:
1. Understand the implementation procedures for the machine learning algorithms.
2. Design Java/Python programs for various Learning algorithms.
3. Apply appropriate data sets to the Machine Learning algorithms.
4. Identify and apply Machine Learning algorithms to solve real-world problems.
Mapping of Course Outcomes to Programme Outcomes

CO\PO   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
CO1     2   2   2   2   2   -   -   -   -   -   3   -   3   3   -
CO2     2   2   2   2   2   -   -   -   -   -   3   -   3   3   -
CO3     2   2   2   2   2   -   -   -   -   -   3   -   3   3   -

(3 = High, 2 = Moderate, 1 = Low, - = Nil)
15CSL76 Machine Learning Lab Manual
1. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.
Algorithm:
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
       For each attribute constraint a_i in h
           If the constraint a_i is satisfied by x, then do nothing
           Else replace a_i in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
Code:
import csv

# Load the training examples from the CSV file (filename assumed).
with open('finds.csv', 'r') as csvfile:
    a = [row for row in csv.reader(csvfile)]

print("\n The Given Training Data Set \n")
for row in a:
    print(row)

num_attributes = len(a[0]) - 1
print("The most general hypothesis : ['?','?','?','?','?','?']")
print("The most specific hypothesis : ['0','0','0','0','0','0']")

# Take the first (positive) example as the initial specific hypothesis.
hypothesis = ['0'] * num_attributes
for j in range(0, num_attributes):
    hypothesis[j] = a[0][j]

# Generalize over each positive example; FIND-S ignores negative examples.
for i in range(0, len(a)):
    if a[i][num_attributes] == 'Yes':
        for j in range(0, num_attributes):
            if a[i][j] != hypothesis[j]:
                hypothesis[j] = '?'
    print("For Training Example No :{0} the hypothesis is".format(i), hypothesis)

Training data (finds.csv):
Sunny,Warm,Normal,Strong,Warm,Same,Yes
Sunny,Warm,High,Strong,Warm,Same,Yes
Rainy,Cold,High,Strong,Warm,Change,No
Sunny,Warm,High,Strong,Cool,Change,Yes
Output:
The most general hypothesis : ['?','?','?','?','?','?']
The most specific hypothesis : ['0','0','0','0','0','0']
For Training Example No :0 the hypothesis is ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']
For Training Example No :1 the hypothesis is ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
For Training Example No :2 the hypothesis is ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
For Training Example No :3 the hypothesis is ['Sunny', 'Warm', '?', 'Strong', '?', '?']
2. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
Algorithm:
Initialize G to the set of maximally general hypotheses in H, and S to the set of maximally specific hypotheses in H. For each training example d:
- If d is a positive example: remove from G any hypothesis inconsistent with d, and minimally generalize each inconsistent hypothesis in S until it is consistent with d.
- If d is a negative example: remove from S any hypothesis inconsistent with d, and minimally specialize each inconsistent hypothesis in G until it is consistent with d.
Code:
import csv

# Load the training examples from the CSV file (filename assumed).
with open('candidate.csv', 'r') as csvfile:
    a = [row for row in csv.reader(csvfile)]

print("\n The Given Training Data Set \n")
for row in a:
    print(row)

num_attributes = len(a[0]) - 1
S = ['0'] * num_attributes   # specific boundary, initialised from the first example
G = ['?'] * num_attributes   # template for members of the general boundary
temp = []                    # hypotheses currently in the G boundary

for j in range(0, num_attributes):
    S[j] = a[0][j]

for i in range(0, len(a)):
    print("------------------------------------------------------------------------------")
    if a[i][num_attributes] == 'Yes':          # positive example: generalise S
        for j in range(0, num_attributes):
            if a[i][j] != S[j]:
                S[j] = '?'
        # drop G members that no longer cover the generalised S
        for j in range(0, num_attributes):
            for h in list(temp):
                if h[j] != '?' and h[j] != S[j]:
                    temp.remove(h)
    if a[i][num_attributes] == 'No':           # negative example: specialise G
        for j in range(0, num_attributes):
            if S[j] != a[i][j] and S[j] != '?':
                G[j] = S[j]
                temp.append(G)
                G = ['?'] * num_attributes
    print("For Training Example No :{0} the hypothesis is S{0}".format(i + 1), S)
    print("For Training Example No :{0} the hypothesis is G{0}".format(i + 1), temp if temp else G)
Output:
------------------------------------------------------------------------------
For Training Example No :1 the hypothesis is S1 ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm',
'Same']
For Training Example No :1 the hypothesis is G1 ['?', '?', '?', '?', '?', '?']
------------------------------------------------------------------------------
For Training Example No :2 the hypothesis is S2 ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
For Training Example No :2 the hypothesis is G2 ['?', '?', '?', '?', '?', '?']
------------------------------------------------------------------------------
For Training Example No :3 the hypothesis is S3 ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
For Training Example No :3 the hypothesis is G3 [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?',
'?'], ['?', '?', '?', '?', '?', 'Same']]
------------------------------------------------------------------------------
For Training Example No :4 the hypothesis is S4 ['Sunny', 'Warm', '?', 'Strong', '?', '?']
For Training Example No :4 the hypothesis is G4 [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?',
'?']]
3. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
Algorithm:
ID3(Examples, Target_attribute, Attributes)
Examples are the training examples. Target_attribute is the attribute whose value is to be predicted by the tree. Attributes is a list of other attributes that may be tested by the learned decision tree. The function returns a decision tree that correctly classifies the given Examples.
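At each node, ID3 picks the attribute whose split yields the highest information gain. As a worked reminder (standard definitions, with S a set of examples, p_i the proportion of S in class i, and S_v the subset of S for which attribute A has value v):

Entropy(S) = - Σ_i p_i log2(p_i)
Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

For instance, assuming the standard 14-example play-tennis data behind the output below (9 yes, 5 no), Entropy(S) = -(9/14)·log2(9/14) - (5/14)·log2(5/14) ≈ 0.940.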
Code:
import numpy as np
import csv

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

    def __str__(self):
        return self.attribute

def read_data(filename):
    # First row is the header (attribute names); the rest are examples.
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile)
        metadata = next(datareader)
        traindata = [row for row in datareader]
    return metadata, np.array(traindata, dtype="|S32")

def subtables(data, col, delete):
    # Partition the examples by each distinct value of column `col`.
    dict = {}
    items = np.unique(data[:, col])
    for x in range(items.shape[0]):
        dict[items[x]] = data[data[:, col] == items[x]]
        if delete:
            dict[items[x]] = np.delete(dict[items[x]], col, 1)
    return items, dict

def entropy(S):
    items = np.unique(S)
    if items.size == 1:          # pure subset: no uncertainty
        return 0
    sums = 0
    for x in range(items.shape[0]):
        ratio = np.sum(S == items[x]) / S.size
        sums += -ratio * np.log2(ratio)
    return sums

def gain_ratio(data, col):
    # Information gain of splitting `data` on column `col`.
    items, dict = subtables(data, col, delete=False)
    total_size = data.shape[0]
    entropies = np.zeros(items.shape[0])
    for x in range(items.shape[0]):
        ratio = dict[items[x]].shape[0] / total_size
        entropies[x] = ratio * entropy(dict[items[x]][:, -1])
    total_entropy = entropy(data[:, -1])
    for x in range(entropies.shape[0]):
        total_entropy -= entropies[x]
    return total_entropy

def create_node(data, metadata):
    # If every example carries the same label, return a leaf node.
    if (np.unique(data[:, -1])).shape[0] == 1:
        node = Node("")
        node.answer = np.unique(data[:, -1])
        return node
    gains = np.zeros(data.shape[1] - 1)
    for col in range(data.shape[1] - 1):
        gains[col] = gain_ratio(data, col)
    split = np.argmax(gains)
    node = Node(metadata[split])
    metadata = np.delete(metadata, split, 0)
    items, dict = subtables(data, split, delete=True)
    for x in range(items.shape[0]):
        child = create_node(dict[items[x]], metadata)
        node.children.append((items[x], child))
    return node

def empty(size):
    s = ""
    for x in range(size):
        s += "   "
    return s

def print_tree(node, level):
    if node.answer != "":
        print(empty(level), node.answer)
        return
    print(empty(level), node.attribute)
    for value, child in node.children:
        print(empty(level + 1), value)
        print_tree(child, level + 2)

# Build and display the tree (filename assumed).
metadata, traindata = read_data("tennisdata.csv")
node = create_node(traindata, metadata)
print_tree(node, 0)
Output:
outlook
   overcast
      [b'yes']
   rainy
      windy
         b'Strong'
            [b'no']
         b'Weak'
            [b'yes']
   sunny
      humidity
         b'high'
            [b'no']
         b'normal'
            [b'yes']
4. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the
same using appropriate data sets.
Algorithm:
BACKPROPAGATION(training_examples, η, n_in, n_out, n_hidden)
Each training example is a pair of the form (x, t), where x is the vector of network input values and t is the vector of target network output values.
η is the learning rate (e.g., 0.05). n_in is the number of network inputs, n_hidden the number of units in the hidden layer, and n_out the number of output units.
The input from unit i into unit j is denoted x_ji, and the weight from unit i to unit j is denoted w_ji.
1. Create a feed-forward network with n_in inputs, n_hidden hidden units, and n_out output units.
2. Initialize all network weights to small random numbers.
3. Until the termination condition is met, Do
   o For each (x, t) in training_examples, Do
      Propagate the input forward through the network:
         1. Input the instance x to the network and compute the output o_u of every unit u in the network.
      Propagate the errors backward through the network:
         2. For each network output unit k, calculate its error term δ_k:
               δ_k ← o_k (1 - o_k)(t_k - o_k)
         3. For each hidden unit h, calculate its error term δ_h:
               δ_h ← o_h (1 - o_h) Σ_{k ∈ outputs} w_kh δ_k
         4. Update each network weight w_ji:
               w_ji ← w_ji + η δ_j x_ji
Code:
import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
Xn = X / np.amax(X, axis=0)  # normalized copy of the inputs
y = y / 100                  # normalize targets to [0, 1]

def sigmoid(x):  # Sigmoid Function
    return 1 / (1 + np.exp(-x))

def derivatives_sigmoid(x):  # Derivative of Sigmoid, given x = sigmoid(z)
    return x * (1 - x)

# Variable initialization
epoch, lr = 5000, 0.1
wh = np.random.uniform(size=(2, 3))    # input -> hidden weights
wout = np.random.uniform(size=(3, 1))  # hidden -> output weights

for i in range(epoch):
    hlayer = sigmoid(np.dot(Xn, wh))                                # forward pass
    output = sigmoid(np.dot(hlayer, wout))
    d_output = (y - output) * derivatives_sigmoid(output)           # output error
    d_hidden = d_output.dot(wout.T) * derivatives_sigmoid(hlayer)   # hidden error
    wout += hlayer.T.dot(d_output) * lr                             # weight updates
    wh += Xn.T.dot(d_hidden) * lr

print("Input:\n", X, "\nActual Output:\n", y, "\nPredicted Output:\n", output)
Output:
Input:
[[2. 9.]
[1. 5.]
[3. 6.]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.89345619]
[0.87813113]
[0.89718075]]
5. Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
Algorithm:
NaiveBayesClassifier(training_examples, new_instance)
Each instance x is described by a conjunction of attribute values (a_1, a_2, ..., a_n) and the target V can take a finite set of values v_j.
a. Estimate the prior P(v_j) of each target value from its frequency in the training examples.
b. Estimate each conditional probability P(a_i | v_j) from the frequency of attribute value a_i among training examples with target v_j.
c. For the new instance, compute
      v_NB = argmax_{v_j ∈ V} P(v_j) Π_i P(a_i | v_j)
   where v_NB denotes the target value output by the naive Bayes classifier.
d. Output v_NB
Code:
import csv
import numpy as np

def read_data(filename):
    # Read every row of the CSV file into a list of lists.
    with open(filename, 'r') as csvfile:
        return [row for row in csv.reader(csvfile)]

def splitDataset(dataset, splitRatio):  # splits dataset into training set and test set based on split ratio
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    testset = list(dataset)
    i = 0
    while len(trainSet) < trainSize:
        trainSet.append(testset.pop(i))
    return [trainSet, testset]

def classify(data, test):
    total_size = data.shape[0]
    print("training data size=", total_size)
    print("test data size=", test.shape[0])
    target = np.unique(data[:, -1])                 # e.g. ['no' 'yes']
    count = np.array([np.sum(data[:, -1] == t) for t in target])
    prob = count / total_size                       # class priors
    print(target)
    print("target\tcount\tprobability")
    for t in range(target.shape[0]):
        print(target[t], "\t", count[t], "\t", prob[t])
    accuracy = 0
    print("Instance\tprediction\ttarget")
    for t in range(test.shape[0]):
        probno = prob[0]
        probyes = prob[1]
        for i in range(test.shape[1] - 1):
            # conditional probabilities P(attribute value | class) from counts
            prob0 = np.sum((data[:, i] == test[t, i]) & (data[:, -1] == target[0])) / count[0]
            prob1 = np.sum((data[:, i] == test[t, i]) & (data[:, -1] == target[1])) / count[1]
            probno = probno * prob0
            probyes = probyes * prob1
        if probno > probyes:  # prediction
            predict = 'no'
        else:
            predict = 'yes'
        print(t + 1, "\t", predict, "\t", test[t, test.shape[1] - 1])
        if predict == test[t, -1]:
            accuracy += 1
    final_accuracy = (accuracy / test.shape[0]) * 100
    print("accuracy", final_accuracy, "%")
    return

dataset = read_data('tennisdata.csv')   # filename assumed
training, testing = splitDataset(dataset, 0.6)
print("------------------Training Data-------------------")
print(training)
print("-------------------Test Data-------------------")
print(testing)
classify(np.array(training), np.array(testing))
Output:
------------------Training Data-------------------
[['sunny', 'hot', 'high', 'Weak', 'no'], ['sunny', 'hot', 'high', 'Strong', 'no'], ['overcast', 'hot', 'high', 'Weak',
'yes'], ['rainy', 'mild', 'high', 'Weak', 'yes'], ['rainy', 'cool', 'normal', 'Weak', 'yes'], ['rainy', 'cool',
'normal', 'Strong', 'no'], ['overcast', 'cool', 'normal', 'Strong', 'yes'], ['sunny', 'mild', 'high', 'Weak', 'no']]
-------------------Test Data-------------------
[['sunny', 'cool', 'normal', 'Weak', 'yes'], ['rainy', 'mild', 'normal', 'Weak', 'yes'], ['sunny', 'mild',
'normal', 'Strong', 'yes'], ['overcast', 'mild', 'high', 'Strong', 'yes'], ['overcast', 'hot', 'normal', 'Weak',
'yes'], ['rainy', 'mild', 'high', 'Strong', 'no']]
training data size= 8
test data size= 6
['no' 'yes']
target count probability
no 4 0.5
yes 4 0.5
Instance prediction target
1 no yes
2 yes yes
3 no yes
4 yes yes
5 yes yes
6 no no
accuracy 66.66666666666666 %
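As a quick hand-check of the first prediction above (computed from the eight training rows shown): for test instance 1 (sunny, cool, normal, Weak), P(no)·P(sunny|no)·P(cool|no)·P(normal|no)·P(Weak|no) = 0.5 × 3/4 × 1/4 × 1/4 × 2/4 ≈ 0.0117, while P(yes)·P(sunny|yes)·P(cool|yes)·P(normal|yes)·P(Weak|yes) = 0.5 × 0/4 × 2/4 × 2/4 × 3/4 = 0, so the classifier outputs 'no' even though the true label is 'yes'.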
6. Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to
perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy,
precision, and recall for your data set.
Code:
import pandas as pd
msg = pd.read_csv('naivetext1.csv', names=['message', 'label'])
print('The dimensions of the dataset', msg.shape)
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
#splitting the dataset into train and test data
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(X, y)
#turning the documents into bag-of-words count vectors
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)
print('the vocabulary\n', count_vect.get_feature_names())
df = pd.DataFrame(xtrain_dtm.toarray(), columns=count_vect.get_feature_names())
# Training Naive Bayes (NB) classifier on training data.
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)
#printing accuracy metrics
from sklearn import metrics
print('Accuracy metrics')
print('Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('Recall and Precision')
print(metrics.recall_score(ytest, predicted), metrics.precision_score(ytest, predicted))
Output:
The dimensions of the dataset (18, 2)
the vocabulary
['about', 'am', 'amazing', 'an', 'and', 'beers', 'boss', 'can', 'dance', 'deal', 'do', 'enemy', 'feel', 'fun', 'good',
'great', 'have', 'holiday', 'horrible', 'house', 'is', 'juice', 'like', 'love', 'my', 'not', 'of', 'place', 'restaurant',
'sandwich', 'sick', 'stuff', 'taste', 'the', 'these', 'this', 'tired', 'to', 'today', 'tomorrow', 'very', 'we', 'went',
'what', 'will', 'with']
Accuracy metrics
Accuracy of the classifier is 0.6
Confusion matrix
[[1 1]
[1 2]]
Recall and Precision
0.6666666666666666
0.6666666666666666
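These numbers are consistent with the confusion matrix above: with TP = 2, FP = 1, FN = 1 and TN = 1, precision = 2/(2+1) ≈ 0.667, recall = 2/(2+1) ≈ 0.667, and accuracy = 3/5 = 0.6.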
7. Write a program to construct a Bayesian network considering medical data. Use this model to
demonstrate the diagnosis of heart patients using standard Heart Disease Data Set. You can use
Java/Python ML library classes/API.
Code:
import pandas as pd
import bayespy as bp

# Load the heart-disease records into a DataFrame (only the data-loading
# portion of the program is listed here).
data = pd.read_csv("heart_disease_data1.csv")
heart_disease = pd.DataFrame(data)
print(heart_disease)
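The listing above ends after loading the data; the model-construction and inference steps did not survive. The sketch below is a hedged reconstruction of those steps using pgmpy instead of bayespy (recent pgmpy versions rename BayesianModel to BayesianNetwork); the column names, the star-shaped structure with every attribute a parent of heartdisease, and the evidence values are assumptions inferred from the prompts in the output below, not the manual's own code.

# Hedged sketch only: structure, column names and evidence values are assumed.
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

model = BayesianModel([('age', 'heartdisease'), ('Gender', 'heartdisease'),
                       ('Family', 'heartdisease'), ('diet', 'heartdisease'),
                       ('Lifestyle', 'heartdisease'), ('cholestrol', 'heartdisease')])
model.fit(heart_disease, estimator=MaximumLikelihoodEstimator)  # learn the CPTs
infer = VariableElimination(model)
q = infer.query(variables=['heartdisease'],
                evidence={'age': 1, 'Gender': 0, 'Family': 1,
                          'diet': 1, 'Lifestyle': 2, 'cholestrol': 1})
print(q)  # posterior probability of heart disease given the evidence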
Output
For age enter SuperSeniorCitizen:0, SeniorCitizen:1, MiddleAged:2, Youth:3, Teen:4
For Gender Enter Male:0, Female:1
For Family History Enter yes:1, No:0
For diet Enter High:0, Medium:1
for lifeStyle Enter Athlete:0, Active:1, Moderate:2, Sedetary:3
for cholesterol Enter High:0, BorderLine:1, Normal:2
enter age1
enter Gender0
enter Family history1
enter diet1
enter Lifestyle2
enter cholestrol1
8. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using k-Means algorithm. Compare the results of these two algorithms and comment on
the quality of clustering. You can add Java/Python ML library classes/API in the program.
Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

data = pd.read_csv("clusterdata.csv")
df1 = pd.DataFrame(data)
print(df1)
f1 = df1['Distance_Feature'].values
f2 = df1['Speeding_Feature'].values
X = np.matrix(list(zip(f1, f2)))

# plot the raw data
plt.plot(1)
plt.subplot(511)
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.title('Dataset')
plt.ylabel('speeding_feature')
plt.xlabel('distance_feature')
plt.scatter(f1, f2)

colors = ['b', 'g', 'r']
markers = ['o', 'v', 's']

# k-Means clustering (cluster count of 3 assumed)
kmeans_model = KMeans(n_clusters=3).fit(X)
plt.subplot(513)
for i, l in enumerate(kmeans_model.labels_):
    plt.plot(f1[i], f2[i], color=colors[l], marker=markers[l])
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.title('K-Means')
plt.ylabel('speeding_feature')
plt.xlabel('distance_feature')

# EM clustering with a Gaussian mixture model
gmm = GaussianMixture(n_components=3).fit(X)
em_labels = gmm.predict(X)
plt.subplot(515)
for i, l in enumerate(em_labels):
    plt.plot(f1[i], f2[i], color=colors[l], marker=markers[l])
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.title('EM (Gaussian Mixture)')
plt.ylabel('speeding_feature')
plt.xlabel('distance_feature')
plt.show()
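Beyond eyeballing the subplots, one hedged way to compare the two clusterings numerically (reusing X, kmeans_model and em_labels from above) is the silhouette score, where values closer to 1 indicate tighter, better-separated clusters:

# Hedged add-on: numeric comparison of clustering quality.
from sklearn.metrics import silhouette_score
print("k-Means silhouette:", silhouette_score(X, kmeans_model.labels_))
print("EM silhouette:", silhouette_score(X, em_labels))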
Output
9. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set. Print both
correct and wrong predictions. Java/Python ML library classes can be used for this problem.
Code:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report

# load the iris data and hold out a test split
iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target)

classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)
print('Confusion matrix is as follows')
print(confusion_matrix(y_test, y_pred))
print('Accuracy Metrics')
print(classification_report(y_test, y_pred))
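The task also asks to print both correct and wrong predictions, which the listing above leaves out; a small hedged addition reusing iris, y_test and y_pred from above:

# Hedged addition: flag each test prediction as Correct or Wrong.
for i in range(len(y_test)):
    status = "Correct" if y_pred[i] == y_test[i] else "Wrong"
    print(status, "- predicted:", iris.target_names[y_pred[i]],
          ", actual:", iris.target_names[y_test[i]])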
Output
Confusion matrix is as follows
[[18 0 0]
[ 0 11 1]
[ 0 0 15]]
Accuracy Metrics
precision recall f1-score support
10. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points.
Select appropriate data set for your experiment and draw graphs.
Code:
import numpy as np1
import matplotlib.pyplot as plt
import pandas as pd

def kernel(point, xmat, k):
    # Gaussian weights centred on the query point: nearby points get weight
    # close to 1, distant points close to 0.
    m, n = np1.shape(xmat)
    weights = np1.mat(np1.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np1.exp(diff * diff.T / (-2.0 * k ** 2))
    return weights

def localWeight(point, xmat, ymat, k):
    wei = kernel(point, xmat, k)
    W = (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m, n = np1.shape(xmat)
    ypred = np1.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred

# load data points (filename assumed) and add a bias column of ones
data = pd.read_csv('tips.csv')
bill = np1.array(data.total_bill)
tip = np1.array(data.tip)
mbill = np1.mat(bill)
mtip = np1.mat(tip)
one = np1.mat(np1.ones(np1.shape(mbill)[1]))
X = np1.hstack((one.T, mbill.T))

#set k here
ypred = localWeightRegression(X, mtip, 2)
SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='blue')
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=1)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()
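The bandwidth k controls how local the fit is: a small k weights only nearby points and produces a wigglier curve, while a large k approaches ordinary least squares. A hedged illustration reusing X and mtip from above:

# Hedged illustration: rerun the regression with a few bandwidths and compare.
for k in [0.5, 2, 10]:
    print("k =", k, "first predictions:", localWeightRegression(X, mtip, k)[:5])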
Output: