Final Report
Chronic Kidney Disease Prediction
A Project work-I Report
Submitted in partial fulfillment of requirement of the
Degree of
BACHELOR OF TECHNOLOGY in COMPUTER
SCIENCE & ENGINEERING
BY
EN19CS301013 EN19CS301014 EN19CS301015
ABHISHEK RANA ABHISHEK SODHIYA ABHISHEK TIWARI
Report Approval
Internal Examiner
Name:
Designation:
Affiliation:
External Examiner
Name:
Designation:
Affiliation:
Declaration
I/We hereby declare that the project entitled “Chronic Kidney Disease
Prediction”, submitted in partial fulfillment for the award of the degree of Bachelor
of Technology in ‘Computer Science & Engineering’ and completed under the
supervision of Dr. Harsh Pratap Singh, Computer Science Department, Faculty
of Engineering, Medi-Caps University, Indore, is an authentic work.
Further, I/we declare that the content of this Project work, in full or in parts, has
neither been taken from any other source nor been submitted to any other
Institute or University for the award of any degree or diploma.
Certificate
I/We, Dr. Harsh Pratap Singh, certify that the project entitled “Chronic Kidney
Disease Prediction”, submitted in partial fulfillment for the award of the degree of
Bachelor of Technology by Abhishek Rana, Abhishek Sodhiya, and Abhishek
Tiwari, is the record of work carried out by him/them under my/our guidance and that the
work has not formed the basis of award of any other degree elsewhere.
________________________________ ________________________________
_____________________
Acknowledgements
I would like to express my deepest gratitude to the Honorable Chancellor, Shri R C Mittal, who
has provided me with every facility to successfully carry out this project, and my profound
indebtedness to Prof. (Dr.) Dileep K. Patnaik, Vice Chancellor, Medi-Caps University, whose
unfailing support and enthusiasm have always boosted my morale. I also thank Prof. (Dr.) D
K Panda, Pro Vice Chancellor, and Dr. Suresh Jain, Dean, Faculty of Engineering, Medi-Caps
University, for giving me a chance to work on this project. I would also like to thank my Head
of the Department, Dr. Pramod S. Nair, for his continuous encouragement for the betterment of the
project.
Abhishek Rana
Abhishek Sodhiya
Abhishek Tiwari
B.Tech. IV Year
Department of Computer Science & Engineering
Faculty of Engineering
Medi-Caps University, Indore
Executive Summary
An executive summary is a concise summary of a project report. It restates the purpose of the
report, highlights the major points of the report, and describes any results, conclusions, or
recommendations from the report. An executive summary should be aimed at an audience that
is interested in and wants to learn more about the purpose of the main project report.
Table of Contents
Front Page
Report Approval
Declaration
Certificate
Acknowledgements
Executive Summary
Table of Contents
List of figures
List of tables
Abbreviations
Notations & Symbols
Chapter 1 Introduction (Whichever is applicable)
1.1 Introduction
1.2 Literature Review
1.3 Objectives
1.4 Significance
1.5 Research Design
1.6 Source of Data
1.7 Chapter Scheme
Chapter 2 (Report on Present Investigation)
2.1 Experimental Set-up
2.2 Procedures Adopted
2.3
Chapter 3 (Main Chapter)
3.1
Chapter 4 Main Chapter (Optional)
Chapter 5 Main Chapter (Optional)
Chapter 6 Main Chapter (Optional)
Chapter 7 Results and Discussions
Chapter 8 Summary and Conclusions
Chapter 9 Future scope
Appendix
Bibliography
List of Publications (If any)
Reprints of publications (If any)
Note: Results and Discussions need not be a separate chapter; it may be
included in the main chapter.
List of Figures
List of Tables
Abbreviations
Chapter – 1
Introduction
1.1 Introduction
Chronic kidney disease (CKD) is a global health problem with a high mortality rate, and it can lead to
other diseases. Since there are no obvious symptoms during the early stages of CKD, patients often
fail to notice the disease. Early detection of CKD enables patients to receive timely treatment that
slows the progression of the disease. A recent analysis suggests that in 2017, the global
prevalence of CKD was 9.1% (697.5 million cases). The age-standardized global prevalence of
CKD was higher in women and girls (9.5%) than in men and boys (7.3%). In this study, we
propose a machine learning methodology for diagnosing CKD. We use six
machine learning algorithms (logistic regression, random forest, support vector machine,
k-nearest neighbour, naïve Bayes classifier and feed-forward neural network) to establish the model.
1.2 Literature Review
Machine Learning techniques have been successful in predicting and diagnosing various diseases.
A lot of computational techniques have been previously used to predict different kidney diseases.
Swathi Baby & Panduranga, 2015, used machine learning algorithms such as ADTrees, Naive Bayes, and
Random Forest to predict kidney disease. Naive Bayes showed a better accuracy rate than the
other methods used.
Abhinandan Dubey, 2015, used the K-Means Clustering Algorithm (Lloyd's Algorithm) to form
clusters based on varying probability of suffering from chronic kidney disease. Upon applying the
model, 3 clusters were obtained based on the probability of the subject being prone to CKD.
Ruey Kei Chiu & Renee Yu-Jing, 2011, implemented neural network models, which included
backpropagation NN, generalized feed-forward NN, and modular NN, to predict kidney diseases.
All three models gave an accuracy greater than 85%.
1.3 Objectives
1. In this project, we present machine learning techniques for predicting, from clinical data, whether
a person is suffering from chronic kidney disease.
2. Predictive analytics for healthcare using machine learning is a challenging task that aims to help
doctors decide on specific treatments for saving lives.
3. Various machine learning methods have been explored, including logistic regression, the Naive
Bayes classifier, the K-nearest neighbors (KNN) classifier, Support Vector Machines (SVM) with
various kernels such as linear, polynomial and RBF (radial basis function), the Random Forest
classifier, decision tree classifiers, the Multi-Layer Perceptron (MLP) classifier, and neural networks
implemented using TensorFlow. To improve these learning techniques, various cross-validation
methods have been used.
4. We have compared the performance of these models on the clinical dataset obtained. Different
evaluation metrics such as accuracy score, confusion matrix, precision, recall, F1-score, AUC, and
ROC have been used for the same.
5. The algorithms process a training set containing a set of attributes.
Chronic kidney disease, generally referred to as chronic kidney failure, represents the progressive
deterioration of kidney function.
6. The kidneys filter wastes, toxins, and excess fluids from blood, which are then excreted in the
form of urine. As chronic kidney disease worsens and reaches an advanced stage, dangerous levels
of fluid, electrolytes, toxins, and waste can develop in the human body. Treatment for chronic
kidney disease centers on slowing the progression of kidney damage, usually by controlling the
underlying cause.
7. Chronic kidney disease can advance to end-stage kidney failure, which is fatal without artificial
filtering (dialysis) or a kidney transplant.
1.4 Source of data
The CKD data set is obtained from the University of California Irvine (UCI) machine learning
repository, which has a large number of missing values.
1. Logistic Regression:
Logistic regression is an example of supervised learning. It is used to calculate or predict
the probability of a binary (yes/no) event occurring. An example of logistic regression
could be applying machine learning to determine whether a person is likely to have
chronic kidney disease or not.
2. Support Vector Machine:
Support Vector Machine (SVM) is a supervised machine learning algorithm used for
both classification and regression.
3. Random Forest:
The Random Forest algorithm is a supervised machine learning algorithm that is extremely
popular and is used for classification and regression problems in machine learning. Just as
a forest comprises numerous trees, a Random Forest combines many decision trees.
Increasing the number of trees generally makes its predictions more stable, although the
gain in accuracy levels off beyond a certain number of trees.
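To make the probability output of logistic regression more concrete, the following is a minimal sketch using scikit-learn. The toy data (a single hypothetical lab value, with label 1 standing for 'ckd' and 0 for 'not ckd') is invented for illustration only and is not taken from the project dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature (an illustrative lab value).
# Label 1 = ckd, 0 = not ckd. These numbers are made up for the sketch.
X = np.array([[9.0], [10.0], [11.0], [12.0], [14.0], [15.0], [16.0], [17.0]])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

model = LogisticRegression()
model.fit(X, y)

# Two unseen samples, one on each side of the two groups.
new_samples = np.array([[10.5], [15.5]])

# predict_proba returns one row per sample: [P(class 0), P(class 1)].
probs = model.predict_proba(new_samples)
# predict returns the class with the higher probability.
labels = model.predict(new_samples)

print(probs)
print(labels)
```

Each row of `probs` sums to 1, and the predicted label is simply the class with the larger probability. This is what makes logistic regression suitable for a yes/no outcome.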
Chapter - 2
2.1 Data Set
The dataset, named Chronic Kidney Disease (uploaded in 2020), has been obtained from the UCI
Machine Learning Repository. The 400 instances present in this dataset were collected from
Apollo hospital in Tamil Nadu, India, over two months. It has 25 attributes: 11 numeric and 14
nominal.
https://drive.google.com/drive/folders/1NRovPPHEMA-POQJIbObw9XCH8S3nIiyv?usp=sharing
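As a rough illustration of how such a dataset might be loaded and inspected for missing values, here is a minimal sketch using pandas. The in-memory excerpt, the feature column names, and the use of "?" as the missing-value marker are assumptions made for this example. The real file should be read from its actual path instead.

```python
import io
import pandas as pd

# Hypothetical excerpt standing in for the real CSV file.
# Column names and the "?" missing-value marker are assumptions.
sample_csv = io.StringIO(
    "age,bp,sg,al,hemo,rbc,Class\n"
    "48,80,1.020,1,15.4,?,ckd\n"
    "7,50,1.020,4,11.3,?,ckd\n"
    "62,80,1.010,2,9.6,normal,ckd\n"
    "40,80,1.025,0,15.0,normal,not ckd\n"
    "?,70,1.020,0,13.6,normal,not ckd\n"
)

# na_values tells pandas which strings to treat as missing.
df = pd.read_csv(sample_csv, na_values=["?"])

print(df.shape)                  # (rows, columns)
missing_per_column = df.isna().sum()
print(missing_per_column)        # count of missing entries per column
print(df.dtypes)                 # numeric vs. nominal (object/string) columns
```

Counting missing entries per column first helps decide how each attribute should be handled during preprocessing.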
2.2 Data Preprocessing and Feature Generation
The data is stored in the form of a CSV file with 24 features and an
output variable named "Class", which has the value 'ckd' or 'not ckd'
(binary classification). This dataset also has null values, which are
handled before modeling; in the case of string values, we have replaced them
with a string value. After removing the null values, the data is made
numerical so that the models can be fitted to it.
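The preprocessing described above can be sketched as follows. The report does not state which replacement values were used, so the choices here (median for numeric columns, most frequent value for string columns, integer codes for categories) are assumptions, as are the small example table and its feature column names.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature table with missing entries; not the project data.
df = pd.DataFrame({
    "age": [48.0, np.nan, 62.0, 40.0],
    "hemo": [15.4, 11.3, np.nan, 15.0],
    "rbc": ["normal", None, "abnormal", "normal"],
    "Class": ["ckd", "ckd", "ckd", "not ckd"],
})

feature_cols = [c for c in df.columns if c != "Class"]
numeric_cols = [c for c in feature_cols if pd.api.types.is_numeric_dtype(df[c])]
string_cols = [c for c in feature_cols if c not in numeric_cols]

clean = df.copy()

# Assumption: fill numeric gaps with the column median.
for col in numeric_cols:
    clean[col] = clean[col].fillna(clean[col].median())

# Assumption: fill string gaps with the most frequent value,
# then convert each category to an integer code.
for col in string_cols:
    clean[col] = clean[col].fillna(clean[col].mode()[0])
    clean[col] = clean[col].astype("category").cat.codes

# Encode the target: 1 for 'ckd', 0 for 'not ckd'.
clean["Class"] = (clean["Class"] == "ckd").astype(int)

print(clean)
```

After these steps every column is numeric and free of nulls, which is the form scikit-learn estimators expect.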
2.3 Experimental Setup
Technology Used:
Python (Machine Learning)
Libraries Used:
NumPy: used for performing operations on arrays.
Pandas: used for data manipulation.
Sklearn (scikit-learn): contains many efficient tools for machine
learning and statistical modeling, including classification, regression,
clustering and dimensionality reduction.
Chapter - 3
3.1 Result and Discussion
In this study, we developed and evaluated a series of artificial intelligence-based models considering
a minimal set of variables such as sex, age, symptoms, and medications. These models predict patients’
likelihood of having chronic kidney disease. Among the various models tested, the support vector
classifier (SVC) performed best, with a prediction score of 0.9937.
This work examines the ability to detect CKD using machine learning algorithms while
considering the smallest number of tests or features. We approached this by applying five machine
learning classifiers (logistic regression, SVM, random forest, decision tree, and K-nearest
neighbors) to a small dataset of 400 records. In order to reduce the number of features and
remove redundancy, the associations between variables were studied. A filter feature selection
method was applied to the remaining attributes, and it found that hemoglobin, albumin,
and specific gravity have the greatest impact on predicting CKD.
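A filter feature selection step of this kind can be sketched with scikit-learn's `SelectKBest`. The report does not name the specific filter used, so the ANOVA F-score (`f_classif`) is an assumption here, and the synthetic data and feature names are purely illustrative.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data for illustration only: three columns depend on the label,
# three columns are pure noise.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)
informative = rng.normal(size=(n, 3)) + 2.0 * y[:, None]
noise = rng.normal(size=(n, 3))
X = np.hstack([informative, noise])
names = np.array(["f0", "f1", "f2", "noise0", "noise1", "noise2"])

# Score each feature independently against the label and keep the top 3.
# Assumption: ANOVA F-score as the filter criterion.
selector = SelectKBest(score_func=f_classif, k=3)
selector.fit(X, y)

selected = names[selector.get_support()]
print(dict(zip(names, selector.scores_.round(1))))
print(selected)
```

A filter method scores features without training the final classifier, which keeps it cheap. The trade-off is that it ignores interactions between features.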
The classifiers have been trained, tested, and validated using 5-fold cross-validation. The highest
performance was achieved with the gradient boosting algorithm, with an accuracy score of 0.994.
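The 5-fold cross-validation procedure can be sketched as below for the five classifiers listed earlier. The data is synthetic (generated with `make_classification`), the hyperparameters are scikit-learn defaults, and the scaling step is an added assumption, so the printed scores do not reproduce the results reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 400-record dataset (illustration only).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)

# Scale-sensitive models are wrapped in a pipeline with StandardScaler
# so that scaling is fitted inside each training fold only.
models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "random forest": RandomForestClassifier(random_state=0),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "k-nearest neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

# Stratified folds keep the class ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

mean_accuracy = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    mean_accuracy[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Reporting the mean and spread across folds gives a more reliable picture than a single train/test split, which matters for a dataset of only 400 records.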
The performance of the above models has been compared through various evaluation metrics:
1) Accuracy
2) Precision
3) Recall
4) F1-score
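The four metrics listed above can all be derived from the confusion matrix. The following minimal sketch computes them with scikit-learn on a small set of made-up labels (1 = ckd, 0 = not ckd). The labels are hypothetical and are not results from this project.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical true and predicted labels (1 = ckd, 0 = not ckd).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

# For binary labels {0, 1}, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Here TP = 3, FN = 1, FP = 1 and TN = 5, so accuracy is 8/10 = 0.80, while precision, recall and F1 are each 3/4 = 0.75. In a screening setting, recall is often watched closely because a false negative means a missed CKD case.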
Chapter - 4
4.1 Summary and Conclusion
This project aimed to observe and analyze the results obtained by applying
different machine learning algorithms in the medical field in order to predict
chronic kidney failure. Data mining yields good results when applied
appropriately along with tools and techniques that can enhance its performance in
diagnosing the disease. We can explore more deep learning models in
TensorFlow / PyTorch to increase accuracy. We can also work on image data and fit
it to CNN models.
Chapter – 5
5.1 Future Scope
CKD is extremely common and has emerged as one of the leading
noncommunicable causes of death worldwide. It is projected to affect an
increasing number of individuals over time and to further rise in importance
among the various global causes of death. CKD affects populations in different
regions of the world unequally, likely as a result of differences in population
demographic characteristics, their comorbidities, and access to health care
resources. The common nature and devastating effects of CKD should prompt
major efforts to develop and implement effective preventative and therapeutic
measures aimed at lowering the incidence of CKD and slowing its progression.
The future scope of this project is therefore considerable, as CKD is one of the most
common diseases in humans and its prevalence is increasing.