Iris Classification: IT Workshop Report (BTCS 305-18)


IT WORKSHOP REPORT

(BTCS 305-18)

IRIS CLASSIFICATION

Submitted in Partial Fulfilment of the Requirements for

the Award of the

Degree of Bachelor of Technology in Information Technology



Submitted to: Heena Wadhwa

Submitted by: Shivam Kumar (1802638)

Department of Information Technology


Chandigarh Engineering College
ABSTRACT

In this machine learning project, we use semi-automated extraction of knowledge
from data to identify Iris flower species. Classification is a form of supervised
learning in which the response is categorical, that is, its values belong to a finite
unordered set. To simplify the classification problem, scikit-learn tools have been
used. The problem concerns the identification of Iris flower species on the basis of
the flowers' attribute measurements. Classifying the Iris data set means discovering
patterns by examining the petal and sepal sizes of the Iris flowers and using those
patterns to predict the class of a flower. In this work we train a machine learning
model with data, and when unseen data is presented, the predictive model predicts
the species using what it has learnt from the training data.

Shivam Kumar Page 2 1802638


CERTIFICATE

This is to certify that the work presented in the thesis entitled “Iris Classification” is
a bonafide record of the work done during the period from July 2019 to December 2019
at Chandigarh Engineering College, Landran, by Shivam Kumar (1802638).

The project work is an authentic record of my own work and was carried out
under the supervision and guidance of Heena Wadhwa. The matter presented in
the report has not been submitted elsewhere, wholly or in part, for the award of any
other degree or diploma.

This is to certify that the above statement made by the candidate is correct to the
best of my knowledge.

Ms Heena Wadhwa

Designation

Department of Information & Technology

Dr. Shashi Bhushan

Professor and Head

Department of Information & Technology

Chandigarh Engineering College, Landran, Mohali, PUNJAB



ACKNOWLEDGEMENT

It is a great pleasure to present this report on the project “IRIS CLASSIFICATION”,
undertaken by me as part of our B.Tech (CSE) curriculum.

I am thankful to our university and college, Punjab Technical University and
Chandigarh Engineering College, Landran, Mohali, for offering us such a wonderful
and challenging opportunity. I express my deepest thanks to all the coordinators for
providing all possible help and assistance and for their constant encouragement.

It is a pleasure to pen down these lines to express my sincere thanks to the people
who helped me along the way in completing my project. I find words inadequate to
express my sincere gratitude towards them.

Shivam Kumar (1802638)



Table of contents
Abstract
Certificate
Acknowledgement
Table of contents

1. Introduction
2. Motivational Work
3. Block Diagram
4. Working
5. Conclusion



1. INTRODUCTION
Machine learning is a subfield of computer science. According to a definition
attributed to Arthur Samuel (1959), it gives “computers the ability to learn without
being explicitly programmed”. Having evolved from the study of pattern recognition
and computational learning theory in artificial intelligence, machine learning explores
the study and construction of algorithms that can learn from and make predictions
on data. Such algorithms go beyond following strictly static program instructions by
making data-driven predictions or decisions, building a model from sample
inputs. Machine learning is employed in a range of computing tasks where
explicitly designing and programming algorithms with good performance is difficult
or infeasible; example applications include email filtering, detection of network
intruders, learning to rank, and computer vision.

This project focuses on Iris flower classification using machine learning with
scikit-learn tools. The problem statement concerns the identification of Iris flower
species on the basis of flower attribute measurements. Classifying the Iris data
set means discovering patterns by examining the petal and sepal sizes of the Iris
flowers and using those patterns to predict the class of a flower. In this report we
train a machine learning model with data, and when unseen data is presented, the
predictive model predicts the species using what it has learnt from the training data.



2. MOTIVATIONAL WORK

2.1 Motivation for the Work:- It is observed from the literature survey that
the existing algorithms face several difficulties. Deep learning demands
considerable computational power, requires a large amount of data, and is
extremely computationally expensive to train. Such models also lack explanatory
power: they may extract the best signals to accurately classify and cluster data,
but they cannot explain how they reached a certain conclusion. Neural networks
are also reported to be difficult to update once trained, so adding data later is
hard. To address these problems, the current work is taken up to develop a new
technique for the identification of Iris flower species using machine learning.

2.2 Iris Flower Species:-

The Iris flower data set, or Fisher's Iris data set, is a multivariate data set introduced
by the British statistician and biologist Ronald Fisher in his 1936 paper “The use of
multiple measurements in taxonomic problems” as an example of linear
discriminant analysis. It is sometimes called Anderson’s Iris data set because
Edgar Anderson collected the data to quantify the morphologic variation of Iris
flowers of three related species. Two of the three species were collected in the Gaspé
Peninsula, all from the same pasture, picked on the same day, and measured
at the same time by the same person with the same apparatus.

The data set consists of 50 samples from each of three species of Iris: 1) Iris
Setosa 2) Iris Virginica 3) Iris Versicolor. Four features were measured from each
sample: 1) Sepal Length 2) Sepal Width 3) Petal Length 4) Petal Width.
All four features are measured in centimeters. Based on the combination
of these four features, the species among the three can be predicted.
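The structure described above can be inspected directly. The following is a minimal sketch, assuming the copy of the data set bundled with scikit-learn (load_iris) is the one used; the report does not state how the data was loaded. Note that scikit-learn orders the species as setosa, versicolor, virginica.

```python
import numpy as np
from sklearn.datasets import load_iris

# Assumption: the copy of Fisher's data bundled with scikit-learn is used.
iris = load_iris()
X = iris.data      # feature matrix: one row per flower, one column per measurement
y = iris.target    # response vector: species encoded as 0, 1, 2

print(iris.feature_names)   # the four measurements, in cm
print(iris.target_names)    # the three species names
print(X.shape, y.shape)     # number of samples and features

# Number of samples per species.
counts = np.bincount(y)
print(counts)
```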

2.3 Problem Definition:- To design and implement the identification of
Iris flower species using machine learning, with Python and the tool
Scikit-Learn.



2.4 Work Carried Out:-
• Data collection: The Iris flower data set was collected. It contains
150 samples in total, belonging to three different species of Iris flower:
Setosa, Versicolor and Virginica.
• Literature survey: Studied various papers related to the proposed work.
• Algorithms developed:
1. A K-Nearest Neighbor algorithm to predict the species of Iris flower.
2. A Logistic Regression algorithm to predict the species of Iris flower.



3. BLOCK DIAGRAM
The proposed method comprises two sub-phases, Loading and Modeling. A schematic
diagram of the proposed model is given in Figure 1.



4. WORKING

Step 1: Import the class which is needed from Scikit-learn.

In the first case, we import KNeighborsClassifier from the sklearn.neighbors
module, which provides functionality for supervised neighbors-based learning
methods. The principle behind nearest neighbor methods is to find a predefined
number of training samples closest in distance to the new point, and predict the
label from these. In the second case, we import LogisticRegression from the
sklearn.linear_model module. The module implements generalized linear models. It
includes Ridge regression, Bayesian regression, and Lasso and Elastic Net estimators
computed with Least Angle Regression and coordinate descent. It also implements
Stochastic Gradient Descent related algorithms.
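The two imports described in Step 1 can be sketched as follows. The check against ClassifierMixin is an illustrative addition, not part of the original workflow; it only confirms that both classes are scikit-learn classifiers.

```python
from sklearn.base import ClassifierMixin
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Both imported classes are classifiers, so they share the same
# fit / predict interface used in the following steps.
estimator_classes = [KNeighborsClassifier, LogisticRegression]
for cls in estimator_classes:
    print(cls.__name__, issubclass(cls, ClassifierMixin))
```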
Step 2: Instantiate the Estimator.
Scikit-learn refers to its models as estimators. An estimator is an object that fits a
model based on some training data and is capable of inferring some properties of new
data. It can be, for instance, a classifier or a regressor. Instantiation means
creating an object, here an instance of the estimator class. In the first case,
instantiating the estimator means making an instance of the KNeighborsClassifier
class. The object has various parameters (the defaults printed depend on the
scikit-learn version):
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=1, n_neighbors=5, p=2, weights='uniform')
In the second case, instantiating the estimator means making an instance of the
LogisticRegression class. The object has various parameters:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l2',
random_state=None, solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
Now there are objects that know how to do K-Nearest Neighbor and Logistic
Regression and are waiting for the user to give them data. The name of the estimator
object can be anything; we tend to choose a name that reflects the model it
represents, such as “est”, short for estimator, or “clf”, short for classifier. Tuning
parameters, also called hyperparameters, can be specified at this step. For example,
n_neighbors is a tuning parameter. All the other parameters which are not specified
here are set to their default values. By printing the estimator object we can see all
the parameters and their values.
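A minimal sketch of the instantiation step follows. The variable names knn and logreg and the value max_iter=200 are illustrative assumptions. Recent scikit-learn versions print only non-default parameters, so get_params() is used here to list every parameter and its value; the defaults may differ from those quoted above.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# n_neighbors is the tuning parameter named in the text; everything not
# specified keeps its default value.
knn = KNeighborsClassifier(n_neighbors=5)

# max_iter=200 is an illustrative choice, not a value from the report.
logreg = LogisticRegression(max_iter=200)

# get_params() returns a dict of all parameters and their current values.
knn_params = knn.get_params()
logreg_params = logreg.get_params()
print(knn_params)
print(logreg_params)
```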

Step 3: Fit the Model with Data.
This is the model training step.
Here the model learns the relationship between the features X and the response y.
The fit method is called on the KNeighborsClassifier and LogisticRegression
objects. It takes two arguments: the feature matrix X and the response vector y.
A fitted model may underfit or overfit the training data. The model is underfitting
the training data when it performs poorly on the training data. This is because the
model is unable to capture the relationship between the input examples (often
called X) and the target values (often called y). The model is overfitting the training
data when it performs well on the training data but does not perform well on the
evaluation data. This is because the model is memorizing the data it has seen and is
unable to generalize to unseen examples.
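The fitting step, and the gap between training and evaluation performance, can be sketched as below. The hold-out split (test_size=0.4, random_state=4) and the choice n_neighbors=1 are assumptions made for illustration; the report does not describe a split. With a single neighbor each training point is its own nearest neighbor, so the training accuracy says little about how the model generalizes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Assumed hold-out split: 60% for training, 40% kept as evaluation data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4, stratify=y
)

# fit(X, y) learns the relationship between features and response.
knn1 = KNeighborsClassifier(n_neighbors=1)
knn1.fit(X_train, y_train)
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)

# Accuracy on the data the model has seen versus data it has not seen.
train_acc = knn1.score(X_train, y_train)
test_acc = knn1.score(X_test, y_test)
print("1-NN training accuracy:", train_acc)
print("1-NN evaluation accuracy:", test_acc)
print("Logistic regression evaluation accuracy:", logreg.score(X_test, y_test))
```

A training accuracy that is clearly higher than the evaluation accuracy is the overfitting symptom described in this step.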

Step 4: Predict the response for a new observation.

In this step, the response is predicted for a new observation. Here a new
observation means “out-of-sample” data. We input the measurements of an
unknown iris and ask the fitted model to predict the iris species based on what
it has learnt in the previous step. The predict method is called on the
KNeighborsClassifier object and the LogisticRegression object, passing the features
of the unknown iris as a nested Python list with one row per observation. The method
expects a two-dimensional NumPy array, but a list still works because it is
automatically converted to an array. The predict method returns a NumPy array
containing the predicted response values. The model can predict the species for
multiple observations at once.
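A minimal sketch of the prediction step follows. The two rows in X_new are made-up measurements for illustration, not data from the report, and the models are refit here on the full data set so that the block runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
logreg = LogisticRegression(max_iter=1000).fit(X, y)

# Hypothetical out-of-sample flowers: one row per observation, columns in
# the order sepal length, sepal width, petal length, petal width (cm).
X_new = [[3, 5, 4, 2], [5, 4, 3, 2]]

# predict returns a NumPy array with one encoded label per row.
knn_pred = knn.predict(X_new)
logreg_pred = logreg.predict(X_new)

# Map the encoded labels (0, 1, 2) back to species names.
print(iris.target_names[knn_pred])
print(iris.target_names[logreg_pred])
```

The two models need not agree on such made-up points, which is one reason to evaluate them on held-out data before trusting either.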



5. CONCLUSION
The primary goal of supervised learning is to build a model that “generalizes”.
In this project we make predictions on unseen data, that is, data not
used to train the model. Hence the machine learning model we build should accurately
predict the species of future flowers rather than merely the labels of
the data it was trained on.
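One common way to estimate how well a model generalizes is k-fold cross-validation. The sketch below is an illustrative addition under the assumption of 5 folds; the report itself does not describe an evaluation procedure or report accuracy figures.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

models = {
    "K-Nearest Neighbor (k=5)": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Each fold is held out once, so every score is measured on data
# the model did not see during fitting.
mean_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    mean_scores[name] = scores.mean()
    print(name, "mean accuracy over 5 folds:", mean_scores[name])
```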
