

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

HEART DISEASE PREDICTION USING MACHINE LEARNING ALGORITHMS
A Project report submitted in partial fulfillment of the requirements for
the award of the degree of

BACHELOR OF TECHNOLOGY
IN COMPUTER SCIENCE ENGINEERING

Submitted by
Neeraj Yadav – 2100100109011
Aryan Srivastava – 2100100109005
Vishal Kumar Yadav – 2000100100192
Amir Hamza Khan – 2100100109003

Under the guidance of


VIVAK PANDEY
Professor
ACKNOWLEDGEMENT
We would like to express our deep gratitude to our project
guide VIVAK PANDEY, Professor, Department of Computer
Science and Engineering, UCER (United College of Engineering
and Research), for her guidance, unsurpassed knowledge, and
immense encouragement. We are grateful to Vijay Dwivedi,
Head of the Department of Computer Science and Engineering,
for providing us with the required facilities for the completion
of the project work.
We are very much thankful to the Management, UCER (United
College of Engineering and Research), for their encouragement
and cooperation in carrying out this work.
We express our thanks to Project Coordinator VIVAK PANDEY,
for her continuous support and encouragement. We thank all
teaching faculty of the Department of CSE, whose suggestions
during reviews helped us in the accomplishment of our project.
We would like to thank Pallavi Shukla of the Department of
CSE, UCER for providing great assistance in the accomplishment
of our project.
We would like to thank our parents, friends, and classmates for
their encouragement throughout our project period. Last but
not least, we thank everyone for supporting us directly or
indirectly in completing this project successfully.
ABSTRACT

Machine Learning is used across many domains around the
world, and the healthcare industry is no exception. Machine
Learning can play an essential role in predicting the
presence or absence of locomotor disorders, heart disease, and
more. Such information, if predicted well in advance, can
provide important insights to doctors, who can then adapt
their diagnosis and treatment to each individual patient.
In this project we work on predicting possible heart disease in
people using Machine Learning algorithms. We perform a
comparative analysis of classifiers such as Decision Tree, Naïve
Bayes, Logistic Regression, SVM, and Random Forest, and we
also evaluate ensemble classifiers that perform hybrid
classification by combining strong and weak learners, since they
can draw on multiple samples for training and validating the
data. We therefore compare the existing classifiers against the
proposed boosting classifiers, AdaBoost and XGBoost, which can
give better accuracy and predictive analysis.
INTRODUCTION
Heart disease prediction is one of the most notable topics in
the machine learning field. The heart pumps blood to all
parts of the body; if blood does not reach every part of the
body, the brain and other organs stop working and the person
may die. It is hard to recognize heart disease because of
several factors, for example diabetes, hypertension, high
cholesterol, heart rate, and various others. According to the
World Health Organization, heart-related diseases are
responsible for 17.7 million deaths every year, 31% of all
deaths worldwide. In India, heart disease has become the
leading cause of mortality. Heart disease killed 1.7 million
Indians in 2016, according to the 2016 Global Burden of
Disease Report.
In clinical science, heart disease is one of the major challenges,
because many parameters and technicalities are involved in
predicting this disease. Machine learning can be a superior
choice for achieving high accuracy for heart disease as well as
other diseases, across the diverse data types found under
different conditions. Algorithms such as Naive Bayes, Decision
Tree, KNN, and Neural Networks are used to predict the risk of
heart disease, each with its own specialty: Naive Bayes is used
for predicting heart disease, Decision Tree is used to give an
ordered report of the heart disease, and Neural Networks offer
the chance to reduce the error in predicting heart disease.
OBJECTIVE:
The dataset currently has 14 clinical factors that are going
to be used to create a machine-learning model. However, when
a person is assessed for heart disease, multiple additional
factors need to be reviewed:
like high cholesterol levels, stress levels play a significant role in
heart disease detection, and food habits also contribute to the
risk of heart disease.
This study would generate a model that individuals can use
to assess themselves and understand where they stand in terms
of heart disease risk. The 14 factors from the dataset are not
exhaustive, and therefore the solution provided by the study
is one possible approach for detecting heart disease and
not the only solution.

Keywords: SVM; Naive Bayes; Decision Tree; Random Forest;
Logistic Regression; AdaBoost; XGBoost; Python programming;
confusion matrix; correlation matrix
SYSTEM CONFIGURATION

Hardware requirements:
❖ Processor : Any updated processor
❖ RAM : Min 4 GB
❖ Hard Disk : Min 100 GB
Software requirements:
❖ Operating System : Windows family
❖ Technology : Python 3.7
❖ IDE : Jupyter Notebook
WORKING OF SYSTEM

1. SYSTEM ARCHITECTURE
The system architecture gives an overview of the working
of the system, which can be described as follows. Dataset
collection gathers the data containing patient details.
The attribute selection process then selects the attributes
that are useful for the prediction of heart disease. After the
available data resources are identified, they are further
selected, cleaned, and transformed into the desired form. The
different classification techniques stated above are applied
to the preprocessed data to predict heart disease, and an
accuracy measure compares the accuracy of the different
classifiers.
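The flow above can be summarized in a short, hedged sketch. It assumes a CSV file named heart.csv with a binary target column; the file name and column name are illustrative assumptions, not taken from this report.

```python
# Minimal sketch of the described pipeline (assumed file/column names).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Dataset collection: load patient records (assumed file name).
df = pd.read_csv("heart.csv")

# 2. Attribute selection: separate predictive attributes from the label.
X = df.drop(columns=["target"])
y = df["target"]

# 3. Preprocessing: split and scale the data into the desired form.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# 4. Classification: train one of the stated classifiers on the data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 5. Accuracy measure: evaluate predictions on the held-out set.
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

In practice, each of the classifiers discussed below would be dropped into step 4 and their accuracies compared.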
2. MACHINE LEARNING
In machine learning, classification refers to a predictive
modeling problem where a class label is predicted for a
given example of input data.

● 2.1 Supervised Learning


Supervised learning is the type of machine learning in
which machines are trained using well "labeled" training
data, and on the basis of that data, machines predict the
output.
Labeled data means the input data is already tagged
with the correct output. In supervised learning, the
training data provided to the machine works as a
supervisor that teaches the machine to predict the
output correctly.

● 2.2 Unsupervised Learning
Unsupervised learning cannot be directly applied to a
regression or classification problem because, unlike
supervised learning, we have the input data but no
corresponding output data.
The goal of unsupervised learning is to find the underlying
structure of the dataset, group that data according to
similarities, and represent that dataset in a compressed
format.
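The difference can be illustrated with a small, hedged sketch on synthetic data (the dataset and all parameters here are placeholders, not the report's data):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

# Toy data: X holds the inputs, y holds the "correct output" labels.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Supervised: the labels y act as the supervisor during training.
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print("Predicted labels:", clf.predict(X[:5]))

# Unsupervised: only X is used; rows are grouped purely by similarity.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster assignments:", km.labels_[:5])
```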
3. ALGORITHMS
• SUPPORT VECTOR MACHINE (SVM):
Support Vector Machine or SVM is one of the most popular
Supervised Learning algorithms, which is used for
Classification as well as Regression problems. However,
primarily, it is used for Classification problems in Machine
Learning.
The goal of the SVM algorithm is to create the best line or
decision boundary that can segregate n-dimensional space
into classes so that we can easily put a new data point in
the correct category in the future. This best decision
boundary is called a hyperplane. SVM chooses the extreme
points/vectors that help in creating the hyperplane. These
extreme cases are called support vectors, and hence the
algorithm is termed a Support Vector Machine.
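A minimal sketch of SVM classification with scikit-learn, using synthetic stand-in data rather than the heart-disease dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# SVC fits the separating hyperplane; the support vectors define it.
svm = SVC(kernel="rbf", C=1.0)
svm.fit(X_train, y_train)

print("Test accuracy:", svm.score(X_test, y_test))
print("Support vectors per class:", svm.n_support_)
```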

• NAIVE BAYES ALGORITHM:


The Naive Bayes algorithm is a supervised learning algorithm
based on Bayes' theorem and used for solving classification
problems. It is mainly used in text classification with
high-dimensional training datasets.
The Naive Bayes classifier is one of the simplest and most
effective classification algorithms and helps in building
fast machine learning models that can make quick
predictions.
It is a probabilistic classifier, which means it predicts on the
basis of the probability of an object. Some popular
applications of the Naive Bayes algorithm are spam filtering,
sentiment analysis, and classifying articles.
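A minimal Gaussian Naive Bayes sketch on synthetic stand-in data; its probabilistic nature is visible through predict_proba:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=10, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

nb = GaussianNB()
nb.fit(X_train, y_train)

# The classifier predicts on the basis of per-class probabilities.
print("Class probabilities for one sample:", nb.predict_proba(X_test[:1]))
print("Test accuracy:", nb.score(X_test, y_test))
```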

• DECISION TREE ALGORITHM:


Decision Tree is a Supervised learning technique that can
be used for both classification and regression problems,
but mostly it is preferred for solving classification
problems.
It is a tree-structured classifier, where
internal nodes represent the features of a dataset,
branches represent the decision rules and each leaf node
represents the outcome.
In a Decision Tree there are two types of nodes: the
Decision Node and the Leaf Node.
Decision nodes are used to make decisions and have
multiple branches, whereas leaf nodes are the outputs of
those decisions and do not contain any further branches.
The decisions or tests are performed on the basis of the
features of the given dataset.
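A minimal Decision Tree sketch on synthetic stand-in data; export_text prints the decision nodes (feature tests) and leaf nodes (outcomes):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=300, n_features=5, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

# max_depth keeps the tree small enough to read its decision rules.
tree = DecisionTreeClassifier(max_depth=3, random_state=3)
tree.fit(X_train, y_train)

print(export_text(tree))
print("Test accuracy:", tree.score(X_test, y_test))
```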

• RANDOM FOREST ALGORITHM:


Random Forest is a supervised learning algorithm. It is an
extension of machine learning classifiers that uses
bagging to improve the performance of Decision Trees. It
combines tree predictors, where each tree depends on a
random vector that is sampled independently, with the same
distribution for all trees. Random Forests split each node
using the best among a subset of predictors randomly chosen
at that node, instead of splitting nodes on all available
variables. The worst-case time complexity of learning with
Random Forests is O(M · d · n log n), where M is the number
of trees grown, n is the number of instances, and d is the
data dimension. Random Forests can be used both for
classification and regression, and they are among the most
flexible and easy-to-use algorithms.
A forest consists of trees, and it is said that the more trees
it has, the more robust the forest is. Random Forests build
Decision Trees on randomly selected data samples, get
predictions from each tree, and select the best solution by
means of voting. They also provide a pretty good indicator of
feature importance.
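A minimal Random Forest sketch on synthetic stand-in data; n_estimators is the number of trees (M above), max_features="sqrt" gives the random predictor subset tried at each split, and feature_importances_ is the feature-importance indicator mentioned:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=4)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4)

# Each tree sees a bootstrap sample and a random feature subset per split.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=4)
rf.fit(X_train, y_train)

print("Test accuracy:", rf.score(X_test, y_test))
print("Feature importances:", rf.feature_importances_)
```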

• LOGISTIC REGRESSION ALGORITHM:


Logistic Regression is one of the most popular Machine
Learning algorithms and comes under the Supervised
Learning technique. It is used for predicting a categorical
dependent variable using a given set of independent
variables.
Because it predicts a categorical dependent variable, the
outcome must be a categorical or discrete value such as Yes
or No, 0 or 1, or True or False; but instead of giving the
exact value 0 or 1, it gives probabilistic values which lie
between 0 and 1.
Logistic Regression is quite similar to Linear Regression
except in how it is used: Linear Regression is used for
solving regression problems, whereas Logistic Regression is
used for solving classification problems.
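A minimal Logistic Regression sketch on synthetic stand-in data; predict_proba returns values between 0 and 1, and predict thresholds them into the final 0/1 class labels:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10, random_state=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=5)

lr = LogisticRegression(max_iter=1000)
lr.fit(X_train, y_train)

# Probabilities between 0 and 1; predict() applies a 0.5 threshold.
print("Probabilities for one sample:", lr.predict_proba(X_test[:1]))
print("Predicted class:", lr.predict(X_test[:1]))
print("Test accuracy:", lr.score(X_test, y_test))
```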

• ADABOOST ALGORITHM:
AdaBoost was the first really successful boosting algorithm
developed for the purpose of binary classification.
AdaBoost is short for Adaptive Boosting and is a very
popular boosting technique which combines multiple
“weak classifiers” into a single “strong classifier”.
Algorithm:
1. Initially, AdaBoost selects a training subset randomly.
2. It iteratively trains the AdaBoost machine learning
model, selecting the training set based on the accuracy of
the predictions from the previous round of training.
3. It assigns higher weights to wrongly classified
observations so that in the next iteration these
observations get more attention and a higher chance of
being classified correctly.
4. It also assigns a weight to the trained classifier in
each iteration according to the accuracy of the classifier:
the more accurate the classifier, the higher its weight.
5. This process iterates until the complete training data fits
without error or until the specified maximum number of
estimators is reached, as in the sketch below.
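A minimal AdaBoost sketch on synthetic stand-in data; by default scikit-learn's AdaBoostClassifier uses depth-1 decision stumps as the weak classifiers, and n_estimators caps the number of boosting iterations:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=6)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=6)

# Each round reweights misclassified samples and adds another weak learner.
ada = AdaBoostClassifier(n_estimators=50, random_state=6)
ada.fit(X_train, y_train)

print("Test accuracy:", ada.score(X_test, y_test))
```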

• XGBOOST ALGORITHM:
XGBoost is an implementation of gradient-boosted
decision trees. It is a software library that was
designed primarily to improve speed and model
performance. In this algorithm, decision trees are created
sequentially. Weights play an important role in XGBoost:
weights are assigned to all the independent variables,
which are then fed into the decision tree that predicts
results. The weights of variables predicted wrongly by
the tree are increased, and these variables are then fed
to the second decision tree. These individual
classifiers/predictors are then assembled to give a stronger
and more precise model.
It can work on regression, classification, ranking, and
user-defined prediction problems. Regularization: XGBoost
has built-in L1 (Lasso regression) and L2 (Ridge regression)
regularization, which prevents the model from overfitting.
That is why XGBoost is also called a regularized form of GBM
(Gradient Boosting Machine). When using the scikit-learn
API, we pass two regularization hyper-parameters (alpha and
lambda) to XGBoost: alpha is used for L1 regularization and
lambda is used for L2 regularization.
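A minimal sketch using the xgboost scikit-learn wrapper on synthetic stand-in data; in this wrapper the L1 and L2 terms (alpha and lambda) are commonly exposed as reg_alpha and reg_lambda, though exact parameter names may vary by xgboost version:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

xgb = XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    reg_alpha=0.1,   # L1 (Lasso-style) regularization
    reg_lambda=1.0,  # L2 (Ridge-style) regularization
)
xgb.fit(X_train, y_train)

print("Test accuracy:", xgb.score(X_test, y_test))
```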
