Diabetics Prediction Using KNN Algorithm

DIABETICS PREDICTION USING KNN ALGORITHM
PROJECT REPORT
submitted in partial fulfillment of the requirements

for the award of the degree in
BACHELOR OF TECHNOLOGY
In
COMPUTER SCIENCE AND ENGINEERING
B. NAGARJUNA REDDY (181061101026)

CH. VENKATA NILESH (181061101032)
B. RAGHUNATH REDDY (181061101023)
DEPARTMENT OF
COMPUTER SCIENCE AND ENGINEERING
MARCH 2022
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
BONAFIDE CERTIFICATE
This is to certify that this Project Report is the bonafide work of Mr./Ms. B. NAGARJUNA
REDDY (181061101026), CH. VENKATA NILESH (181061101032) and B.
RAGHUNATH REDDY (181061101023), who carried out the project entitled “DIABETES
PREDICTION USING MACHINE LEARNING” under our supervision from January 2022 to May
2022.
Internal Guide Project Coordinator HOD

SARVANAN Dr. G. VICTO SUDHA GEORGE Dr. S. GEETHA
ELUMULAI Dr. V. RAMESH BABU
Assistant Professor Professors Professor
CSE Department CSE Department CSE Department

Dr.M.G.R.Educational Dr.M.G.R.Educational Dr.M.G.R.Educational
and Research Institute and Research Institute and Research Institute
Deemed to be University Deemed to be University Deemed to be
University
Submitted for Viva Voce Examination held on_________________
Internal Examiner External Examiner

(Name in Capital letters (Name in Capital letters
with Signature) with signature)
DECLARATION FORMAT
We, B. NAGARJUNA REDDY (181061101026), CH. VENKATA NILESH (181061101032)
and B. RAGHUNATH REDDY (181061101023), hereby declare that the Project Report
entitled “DIABETES PREDICTION USING MACHINE LEARNING” was done by us under the
guidance of Dr./Prof./Mr. SARAVANAN ELUMULAI and is submitted in partial fulfillment of the
requirements for the award of the degree in BACHELOR OF TECHNOLOGY IN COMPUTER
SCIENCE AND ENGINEERING.
DATE:
PLACE: MADHURAVOYAL SIGNATURE OF THE CANDIDATE

ACKNOWLEDGEMENT
We would like to express our thanks and gratitude to our beloved Founder Chancellor
Thiru. A. C. SHANMUGAM, B.A., B.L. and our Honourable President Er. A. C. S.
ARUN KUMAR, B.E for all the encouragement and extended support to us during the
tenure of this project and also our years of studies in this university.
We would like to thank our Secretary Thiru A. RAVI KUMAR for his constant support
and encouragement.
We would like to express our gratitude to our Vice Chancellor Dr. S.
GEETHALAKSHMI for her valuable support and encouragement.
We express our profound gratitude to our Dean (E&T) Dr. N. ETHIRAJ for his valuable
encouragement and for providing the support and facilities necessary to carry out the
project successfully.
We express our heartfelt and sincere thanks to our Head of the Department Prof. Dr. S.
GEETHA, who has been actively involved and very influential from the start until the
successful completion of our project.
Our sincere thanks to our Project Coordinators Dr. G. Victo Sudha George and Dr. V.
Ramesh Babu, and our Project Supervisor Saravanan Elumalai, for their continuous guidance
and encouragement throughout this work, which has made the project a success.
We would also like to thank all the Teaching and Non-Teaching Staff of the
Department of Computer Science and Engineering for their constant support and
encouragement throughout our years of study.
Table of Contents
LIST OF FIGURES ............................................................................................................ iii

LIST OF TABLES ................................................................................................................ iv
LIST OF ABBREVIATIONS .................................................................................................
ABSTRACT ........................................................................................................................... vi
CHAPTER 1 - Introduction .................................................................................................
1.1 DIABETICS ........................................................................................................................ 1
1.2 Types of Diabetics ...................................................................................................… 2
1.3 Symptoms of Diabetics ................................................................................................ 2
1.4 Cause of diabetics……………………………………………………………………. 2
CHAPTER 2
LITERATURE SURVEY ….…………………………………………………………. 3
CHAPTER 3
PROJECT ANALYSIS .................................................................................. 5

3.1 Scope (Aim) ................................................................................................................ 5
3.2 Proposed Work ..........................................................................................................
3.3 Motivation ................................................................................................................. 6
3.4 Learning Objectives .................................................................................................... 6
3.5 Interference……………………………………………………………………… 7
CHAPTER 4
REQUIREMENT ANALYSIS .................................................................. 8
4.1 Requirements ............................................................................................................... 8

4.1.1 Software Requirements.............................................................................................. 8
4.1.1.1 Python 3 ................................................................................................................. 8
4.1.1.2 Python Packages .................................................................................................... 9
4.1.1.2.1 NumPy ................................................................................................................. 9

4.1.1.2.2 Scikit-Learn ........................................................................................................ 9
4.1.1.2.3 pandas ................................................................................................................. 10
4.1.1.2.4 matplotlib ............................................................................................................ 10
4.1.1.3 Kaggle .................................................................................................................... 10
4.1.1.4 Jupyter Notebook (Anaconda3) .............................................................................11

4.1.2 Hardware Requirements ........................................................................................... 11
4.2 UML Diagrams ................................................................................................................ 12

4.2.1 System Architecture Diagram ............................................................................... 12
4.2.2 Flow Diagram ........................................................................................................ 13
4.2.3 Sequence Diagram ..................................................................................................14
4.2.4 Use-case Diagram .................................................................................................. 15
4.3 Module Description ......................................................................................................... 16
4.3.1 Data pre processing ............................................................................................... 16
4.3.2 Classification Modelling ........................................................................................ 16
4.4 Algorithm .......................................................................................................................... 17
4.4.1 KNN ALGORITHM .......................................................................................... 17
CHAPTER 5
IMPLEMENTATION ................................................................................. 18
5.1 Implementation of Training Module ..............................................................................18
5.2 Methodology ............................................................................................................... 20
5.2.1 Data pre-processing ............................................................................................. 20
5.2.2 Feature selection .................................................................................................. 20
5.2.3 Classification Modelling ..................................................................................... 20
5.2.4 Performance Measure .......................................................................................... 20
CHAPTER 6
RESULT ...................................................................................................... 21
6.1 Results ................................................................................................................... 21
CHAPTER 7
CONCLUSION ........................................................................................... 25
7.1 Future Scope ...................................................................................................................25
7.2 Conclusion ..................................................................................................................... 25

References .......................................................................................................................... 26
List of Figures
Figure No. Figure Name Page No.
3.2 Steps followed 6
4.1 Block Diagram 12
4.2 Flow diagram 13
4.3 Sequence diagram 14
4.4 Use-case diagram 15
6.1 Code (Training Phase) 21

6.2 Code (Training Phase) 22
6.3 Output for training phase 22
6.4 Testing phase code and output (with diabetics) 23
6.5 Scan used in figure 6.4 to test the model (without diabetics) 24
ABSTRACT
Abstract: Diabetes is an illness caused by a high glucose level in the human body.
Diabetes should not be ignored. If it is left untreated, it may cause major problems
such as heart-related problems, kidney problems, blood pressure problems and eye
damage, and it can also affect other organs of the human body. Diabetes can be controlled
if it is predicted early. To achieve this goal, this project work performs early prediction
of diabetes in a patient with higher accuracy by applying various machine learning
techniques. Machine learning techniques provide better prediction results by constructing
models from datasets collected from patients. In this work we use machine learning
classification and ensemble techniques on a dataset to predict diabetes: K-Nearest
Neighbor (KNN), Logistic Regression (LR), Decision Tree (DT), Support Vector Machine
(SVM), Gradient Boosting (GB) and Random Forest (RF). The accuracy differs from model
to model. The project work identifies the model with the highest accuracy, which shows
that the model is capable of predicting diabetes effectively. Our results show that
K-Nearest Neighbor achieved higher accuracy than the other machine learning techniques.
Keywords: diabetes, KNN algorithm, accuracy, glucose, ensemble

CHAPTER 1
INTRODUCTION
Diabetes is one of the most harmful diseases in the world. It is associated with obesity,
high blood glucose levels, and other factors. It affects the hormone insulin, resulting in
abnormal metabolism of carbohydrates and raising the level of sugar in the blood. Diabetes
occurs when the body does not make enough insulin. According to the World Health
Organization (WHO), about 422 million people suffer from diabetes, particularly in low- and
middle-income countries, and this number could rise to 490 million by 2030. Diabetes is
prevalent in many countries, such as Canada, China, and India. The population of India is
now more than 1 billion, and the number of diabetics in India is about 40 million. Diabetes
is a major cause of death in the world. Early prediction of a disease like diabetes allows
it to be controlled and can save lives. To accomplish this, this work explores the
prediction of diabetes using various attributes related to the disease. For this purpose we
use the Pima Indian Diabetes Dataset and apply various machine learning classification and
ensemble techniques to predict diabetes. Machine learning is a method for training
computers to learn from data rather than from explicitly programmed rules. Machine learning
techniques extract knowledge efficiently by building classification and ensemble models
from a collected dataset.
Such collected data can be useful for predicting diabetes. Many machine learning
techniques are capable of prediction, but it is difficult to choose the best one. For this
reason we apply popular classification and ensemble methods to the dataset. Several
machine learning techniques are used to perform predictive analysis of big data in a
variety of domains. Predictive analysis in healthcare is a challenging task, but it can
ultimately help practitioners make timely and well-informed decisions about the health and
treatment of patients. The core objective of the study is to assist physicians and
practitioners in the early diagnosis of diabetes by means of ML methods.
The comparison of the various ML methods used here shows which method is best suited
for diabetes prediction. The main objective is to detect new patterns and then interpret
these patterns in order to provide users with relevant and useful knowledge.
Figure 1: Classification of Diabetes and non-diabetes

CHAPTER 2
LITERATURE REVIEW
K. VijiyaKumar et al. [11] proposed the Random Forest algorithm for the prediction of
diabetes, developing a system that can perform early prediction of diabetes for a patient
with higher accuracy. The proposed model gave the best results for diabetes prediction,
and the results showed that the prediction system is capable of predicting the disease
effectively, efficiently and, most importantly, instantly. Nonso Nnamoko et al. [13]
presented "Predicting Diabetes Onset: An Ensemble Supervised Learning Approach", in which
five widely used classifiers are employed for the ensembles and a meta-classifier is used
to aggregate their outputs. The results are presented and compared with similar studies in
the literature that used the same dataset. It is shown that diabetes onset prediction can
be done with higher accuracy using the proposed method. Tejas N. Joshi et al. [12]
presented "Diabetes Prediction Using Machine Learning Techniques", which aims to predict
diabetes via three different supervised machine learning methods: SVM, logistic
regression, and ANN. Their work proposes an effective technique for earlier detection of
diabetes. Deeraj Shetty et al. [15] proposed diabetes disease prediction using data
mining, assembling an Intelligent Diabetes Disease Prediction System that analyzes
diabetes using a database of diabetes patients. In this system, they propose applying
algorithms such as Bayesian classification and KNN (K-Nearest Neighbor) to the diabetes
patients database and analyzing it by taking various attributes of diabetes for prediction
of the disease.
Muhammad Azeem Sarwar et al. [10] proposed a study on the prediction of diabetes using
machine learning algorithms in healthcare. They applied six different machine learning
algorithms, and the performance and accuracy of the applied algorithms are discussed and
compared. The comparison of the different machine learning techniques used in that study
reveals which algorithm is best suited for the prediction of diabetes. Diabetes prediction
is becoming an area of interest for researchers, who train programs to identify whether a
patient is diabetic or not by applying a suitable classifier to the dataset. Based on
previous research work, it has been observed that the classification process has not
improved much. Hence, because diabetes prediction is an important area in computing, a
system is required to handle the issues identified in previous research.
Deeraj Shetty et al. [17] proposed prediction using data mining, assembling an Intelligent
Diabetes Disease Prediction System that provides analysis of diabetes. They use Bayesian
and KNN algorithms and analyze them by taking several attributes for the prediction of
diabetes.
Nonso Nnamoko et al. [9] presented an ensemble supervised learning approach for predicting
diabetes onset. The results are presented and compared with similar approaches in the
literature that used the same dataset.
CHAPTER 3
SYSTEM ANALYSIS
3.1 Existing System
In current practice, healthcare professionals frequently use the Fasting Plasma Glucose
(FPG) test or the A1C test to diagnose diabetes. A Random Plasma Glucose (RPG) test may
also be used. A healthcare professional may also look for specific antibodies to determine
whether a patient has diabetes, and must draw blood for this test. This approach has the
following disadvantages:
❖ Pain from finger pricking.
❖ Worry about one's blood sugar control and state of health.
❖ Time wasted waiting at a hospital for a long time to check the blood glucose level.
3.2 Proposed System
In this research, the prediction of diabetes is based on ML algorithms. A comparison of
the different machine learning techniques used shows which algorithm is best suited for
diabetes prediction. The goal of this work is to find a model that predicts diabetes with
better accuracy. We experimented with different classification and ensemble algorithms to
predict diabetes. Machine learning also helps the medical field detect diseases such as
diabetes, which affects many people in different countries. Insulin is the central concept
when considering the term ‘diabetes’. Insulin enables body cells to use glucose for
energy: it acts as a gateway to the body's cells and controls the glucose level in the
body. Diabetes is a disease in which the level of glucose in the blood increases.
Prediction is necessary to make treatment easier and to enable recovery from the earliest
stages. This is done with the help of machine learning.
CHAPTER 4
IMPLEMENTATION
4.1 Modules
❖ Data Pre-Processing
❖ Feature Selection
❖ Classification Modelling
❖ Performance Measures.
4.1.1 Data Pre-Processing
After the various records are collected, the diabetes data is pre-processed.
A total of 769 patient records are included in the dataset, of which 6 records have
missing values. These 6 records were removed from the dataset, and the remaining 763
patient records are used in pre-processing.
4.1.1.1 Missing values removal: Remove all the instances that have zero (0) as a value
where a value of zero is not possible. By eliminating irrelevant features and instances we
obtain a feature subset. This process is called feature subset selection; it reduces the
dimensionality of the data and helps the model work faster.
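The removal step described above can be sketched in plain Python as follows. This is a minimal illustration, not the report's actual code: the column names (`Glucose`, `BloodPressure`, `BMI`, `Outcome`) and the choice of which columns may not be zero are assumptions, and the three sample records are made up for demonstration.

```python
# Assumed (hypothetical) column names; the report does not list them.
# Only columns where a zero reading is physically impossible are checked.
ZERO_INVALID = ("Glucose", "BloodPressure", "BMI")


def drop_invalid_records(records, columns=ZERO_INVALID):
    """Return only the records whose checked columns are all non-zero."""
    return [r for r in records if all(r[c] != 0 for c in columns)]


# Made-up sample records for illustration only.
sample = [
    {"Glucose": 148, "BloodPressure": 72, "BMI": 33.6, "Outcome": 1},
    {"Glucose": 0, "BloodPressure": 66, "BMI": 26.6, "Outcome": 0},
    {"Glucose": 183, "BloodPressure": 64, "BMI": 0.0, "Outcome": 1},
]

cleaned = drop_invalid_records(sample)
print(len(cleaned))  # number of records kept
```

Restricting the check to selected columns is a design choice: for some attributes a zero can be a legitimate value, so filtering on every column could discard valid records.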
4.1.1.2 Splitting of data: After cleaning, the data is normalized and split into training
and test sets. The algorithm is trained on the training set, and the test set is kept
aside. The training process produces a model based on the logic of the algorithm and the
values of the features in the training data. The aim of normalization is to bring all the
attributes onto the same scale.
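A minimal sketch of splitting and normalization is shown below, using only the Python standard library. The 80/20 split ratio, the fixed random seed, the choice of min-max scaling, and the sample feature values are all assumptions made for illustration; the report does not state which split ratio or scaling method was used.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_min_max(train_rows):
    """Compute the (min, max) of every column from the training rows only."""
    return [(min(col), max(col)) for col in zip(*train_rows)]


def apply_min_max(rows, ranges):
    """Scale each value with the training ranges; constant columns map to 0.0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, ranges)]
        for row in rows
    ]


# Made-up feature rows (two attributes per record) for illustration only.
features = [[148, 33.6], [85, 26.6], [183, 23.3], [89, 28.1], [137, 43.1]]

train, test = train_test_split(features, test_fraction=0.2)
ranges = fit_min_max(train)
train_scaled = apply_min_max(train, ranges)
test_scaled = apply_min_max(test, ranges)
```

The scaling ranges are computed from the training set only and then reused for the test set, so that no information from the held-out data leaks into training.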
4.1.2 Feature Selection

Of the 8 attributes in the data set, one attribute, age, describes the patient's personal
details. The remaining 7 attributes are considered significant because they contain
essential clinical records. Clinical records are important for diagnosing diabetes and
for learning the severity of the disease.
4.1.3 Classification Modelling

The datasets are grouped on the basis of the variables and criteria of the Decision Tree
(DT) features. Each of the classifiers is then applied to the grouped dataset to assess
its performance. The best performing models are identified from these results on the
basis of their low error rate.
❖ Decision Trees Classifier
❖ Support Vector Classifier
❖ Random Forest Classifier
❖ Logistic Regression
❖ K Nearest neighbors
❖ Naive Bayes
4.1.4 Performance Measures

To evaluate the performance of this model, several standard performance metrics, such as
accuracy, precision, and classification error, have been considered.
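The metrics named above can be computed from the confusion counts of a binary classifier. The sketch below assumes the label 1 means diabetic and 0 means non-diabetic; the true and predicted labels are made-up values used only to show the arithmetic, not results from this project.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Fraction of positive predictions that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def error_rate(tp, tn, fp, fn):
    """Fraction of all predictions that are wrong (1 - accuracy)."""
    return (fp + fn) / (tp + tn + fp + fn)


# Made-up labels for illustration only (1 = diabetic, 0 = non-diabetic).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(accuracy(tp, tn, fp, fn), precision(tp, fp), error_rate(tp, tn, fp, fn))
```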
4.1.5 K-Nearest Neighbor
KNN is a supervised machine learning algorithm that can be used for both classification
and regression problems. It is a lazy learning technique: it stores the training records
and defers all computation until a prediction is requested. KNN assumes that similar
things are near to each other, since data points that are similar usually lie close
together. KNN classifies a new record on the basis of a similarity measure between it and
the stored records. Implementations may use a tree-like structure (for example a k-d
tree) to speed up the search for the nearest points. To make a prediction for a new data
point, the algorithm finds the closest data points in the training data set, its nearest
neighbors. Here K is the number of nearest neighbors, and it is always a positive integer.
The predicted class is chosen from the classes of these neighbors. Closeness is mainly
defined in terms of Euclidean distance. The Euclidean distance between two points
P = (p1, p2, ..., pn) and Q = (q1, q2, ..., qn) is defined by the following equation:
$$d(P, Q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$
Algorithm:
• Take the Pima Indian Diabetes data set as the sample (training) dataset, arranged in
rows (records) and columns (attributes).
• Take a test dataset with the same attributes.
• Compute the Euclidean distance between each test record and every training record.
• Choose a value of K, the number of nearest neighbors.
• Using these distances, select the K training records with the minimum distance and read
the class label (the last column) of each.
• Count the class labels among these K neighbors.
• If most of the neighbors are diabetic, the patient is classified as diabetic; otherwise
not.
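The steps above can be sketched as a small, self-contained KNN classifier. This is a brute-force illustration under stated assumptions, not the project's implementation: the function names, the value K = 3, and the two-attribute training points are hypothetical, and the features are assumed to be already normalized.

```python
import math
from collections import Counter


def euclidean_distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Brute force: compute the distance to every training record and sort.
    distances = sorted(
        (euclidean_distance(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up, already normalized training points (1 = diabetic, 0 = non-diabetic).
train_X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
train_y = [1, 1, 1, 0, 0, 0]

print(knn_predict(train_X, train_y, [0.85, 0.75], k=3))
print(knn_predict(train_X, train_y, [0.15, 0.20], k=3))
```

An odd value of K is a common choice for two-class problems because it avoids tied votes.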
CHAPTER 5
SYSTEM SPECIFICATION
5.1 HARDWARE REQUIREMENTS:

❖ System : Pentium IV 2.4 GHz.
❖ Hard Disk : 40 GB.
❖ Monitor : 14" Colour Monitor.
❖ Mouse : Optical Mouse.
❖ RAM : 512 MB to 2 GB (minimum).
5.2 SOFTWARE REQUIREMENTS:

❖ Operating system : Windows 7 Ultimate.
❖ Coding Language : Python.
CHAPTER 6
SOFTWARE ENVIRONMENT
6.1 PYTHON
Python is very easy to read. As an interpreted language, it does not convert code into a
machine-readable form before running it. Python is also a high-level, general-purpose
programming language. Its designers intended it to become a chameleon of the programming
world.
A few interesting facts illustrate the real-world impact of this language and what
Python is used for:
• The well-known BitTorrent began as a Python program.
• The NSA uses Python for data analysis and cryptography.
• Developers built YouTube using Python (among other languages).
• Google is no stranger to Python either: the company based its well-known web search
system on it.
Teaching Machines to Learn

Artificial intelligence is a distinctive concept. It improves personalization and the
prediction of future trends. In the last decade, artificial intelligence has transformed
several industries. It opened the door for new, previously unimaginable developments to
arise from nothing. Well, not from nothing: from Python.
Building AI-enabled software sounds complicated. Machine learning with Python teaches
computers to learn from specific examples and remember them, much as people teach
children. In addition, Python AI is capable of making predictions, evaluating possible
answers, and much more.
Artificial intelligence is driven by the development of neural networks, one of the ideas
that answers the question of what Python is used for. In the simplest terms, a Python
neural network is a system of algorithms modeled on the human brain. With Python,
developers build such systems and use them to make machines learn by analyzing examples.
Why Is Python the Best Programming Language for AI?
• The Python ecosystem strongly supports the development of AI and ML. There are many
well-maintained resources and tutorials. They offer guidance on which Python libraries to
use for artificial intelligence and deep learning.
• Another major use of Python is data management. Handling data properly in the current
era of advanced technology is essential. People are limited in this respect, whereas
artificial intelligence is capable of processing huge volumes of complex data with high
efficiency and lower production costs.
• Since the syntax of Python resembles English, it is comparatively easy to learn. In
addition, the language permits building and managing complex systems.
Past Successful Python AI Projects

Building AI with Python has already proven highly valuable. The travel industry
benefited when Skyscanner applied an unsupervised Python machine learning algorithm.
With minimal effort and high efficiency, it predicted the behavior of new airline routes
and identified potential destinations for travelers.
Another example showing that Python is a strong programming language for AI is its use
in the healthcare sector. Python AI projects are transforming disease prediction and
injury detection, making it easier to monitor patients' health and manage it.
Likewise, Python helps health-related applications to develop. AiCure is one of the
available mobile applications that ensures patients take their medications as prescribed.
This example shows exactly what Python is used for: to improve technology and our lives.
If you are just beginning to learn about AI in Python, it is best to start exploring the
possibilities with the Keras library. It offers a simplified way of building Python
neural networks. From there, you can move on to TensorFlow, PyTorch, or Theano.
How to Make a Bot with Python?

Bots are programs for performing specific tasks over the Internet. Such applications
execute repetitive actions much faster than people.
For example, Twitter is often the target of bots that send the same or similar messages
a hundred times a day. Nevertheless, bots can also be valuable for technical or any other
support, as they can generate responses to users' input. Customer support thus becomes
more efficient.
Bots are one of the answers to what Python is used for. It is one of the leading
languages for building bots.
First of all, consider the available open-source bot frameworks:
• python-rtmbot is a well-known bot framework for building Slack bots with the Real-Time
Messaging API over WebSockets.
• GitHub offers valuable resources for building bots, including code snippets and
helpful hints.
• Errbot is a chatbot framework for building bots for Slack, Discord, and HipChat. The
main goal of Errbot is to let people build on the project by modifying the provided
Python source code.
Data Mining and Python

Data mining is the process of analyzing large databases to predict trends. This process
is complicated. Data scientists look at large amounts of data and base certain
assumptions on them. Data mining includes the analysis of social networks, crime
imaging, and so forth.
Another use of Python is to organize and clean data. It is regarded as one of the
best-suited programming languages for doing so. In addition, machine learning with Python
improves data analysis through the use of algorithms.
Python is known for its wide range of frameworks, offering large amounts of pre-written
code that allow developers to speed up their projects. The same applies to data mining.
Here is a summary of the most popular frameworks for data analysis:
• NumPy is the fundamental package for numerical computation in Python.
• SciPy is a library used for mathematics, science, and engineering.
• Scikit-Learn is a Python machine learning library for efficient data mining, offering
regression, clustering, model selection, preprocessing, and classification.
• Dask is a framework for advanced parallelism for analytics, scaling up to
thousand-node clusters.
Make Games and 3D Graphics with Python
Any overview of what Python is used for should mention that it is also a suitable option
for game development. Currently, there are several frameworks and tools for game and
graphics creation:
• PyGame is the first choice for many developers using Python. The well-known library
offers modules for building fully featured games and multimedia programs. Beginners
should also consider this framework, as the provided examples help in understanding game
development. Do not expect it to explain every step in detail, but the library is a
better than average starting point.
• PyOpenGL is a framework for OpenGL apps. It contains various examples of how to make
3D models.
• Panda3D is an open-source framework for 3D rendering and game development.
• Blender is a complex tool for making 3D graphical models. It uses an embedded Python
interpreter for building 3D games.
• Arcade is a Python library for building 2D games.
Conclusions
Explaining what Python is used for is not always simple. There are many layers to
uncover in order to get a better look at the capabilities of Python. After learning about
the possible uses, we recommend starting with the basics. A possible starting point is
BitDegree's general Python tutorial.
CHAPTER 7
SYSTEM STUDY
7.1 FEASIBILITY STUDY
The feasibility of the project is analyzed in this phase and business proposal is put
forth with a very general plan for the project and some cost estimates. During system
analysis the feasibility study of the proposed system is to be carried out. This is to ensure
that the proposed system is not a burden to the company. For feasibility analysis, some
understanding of the major requirements for the system is essential.
Three key considerations involved in the feasibility analysis are,
❖ ECONOMICAL FEASIBILITY
❖ TECHNICAL FEASIBILITY
❖ SOCIAL FEASIBILITY
7.2 ECONOMICAL FEASIBILITY
This study is carried out to check the economic impact that the system will have on
the organization. The amount of funds that the company can invest in the research and
development of the system is limited, so the expenditures must be justified. The
developed system is well within the budget, and this was achieved because most of the
technologies used are freely available. Only the customized products had to be purchased.
7.3 TECHNICAL FEASIBILITY
This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high demand on the
available technical resources, since that would in turn place high demands on the
client. The developed system must have modest requirements, as only minimal or no
changes are required for implementing this system.
7.4 SOCIAL FEASIBILITY
This aspect of the study checks the level of acceptance of the system by the user. This
includes the process of training the user to use the system efficiently. The user must not
feel threatened by the system, but must instead accept it as a necessity. The level of
acceptance by the users depends solely on the methods that are employed to educate users
about the system and to make them familiar with it. Their level of confidence must be
raised so that they are also able to offer constructive criticism, which is welcomed, as
they are the final users of the system.
CHAPTER 8
INPUT AND OUTPUT DESIGN
8.1 INPUT DESIGN

The input design is the link between the information system and the user. It comprises
developing the specifications and procedures for data preparation, that is, the steps
necessary to put transaction data into a usable form for processing. This can be achieved
by instructing the computer to read data from a written or printed document, or by having
people key the data directly into the system. The design of input focuses on controlling
the amount of input required, controlling errors, avoiding delay, avoiding extra steps and
keeping the process simple. The input is designed so that it provides security and ease
of use while retaining privacy. Input design considered the following things:
➢ What data should be given as input?
➢ How should the data be arranged or coded?
➢ The dialog to guide the operating personnel in providing input.
➢ Methods for preparing input validations and the steps to follow when errors occur.
8.1.1 OBJECTIVES
1. Input design is the process of converting a user-oriented description of the input into
a computer-based system. This design is important to avoid errors in the data input
process and to show management the correct direction for getting correct information from
the computerized system.
2. It is achieved by creating user-friendly screens for data entry that can handle large
volumes of data. The goal of designing input is to make data entry easier and free from
errors. The data entry screen is designed in such a way that all data manipulations can be
performed. It also provides record viewing facilities.
3. When data is entered, it is checked for validity. Data can be entered with the help
of screens. Appropriate messages are provided as and when needed so that the user is never
left confused. Thus the objective of input design is to create an input layout that is
easy to follow.
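The validity check described in objective 3 could be sketched as follows. This is a minimal illustration, not the project's actual input screen: the function name parse_record is hypothetical, the eight field names assume the column layout of the commonly used diabetes.csv file, and the numeric ranges are placeholder assumptions rather than clinical limits.

```python
# Hypothetical field list: names assume the usual diabetes.csv columns, and the
# (low, high) ranges are illustrative assumptions, not clinical limits.
FIELDS = [
    ("Pregnancies", 0, 20),
    ("Glucose", 1, 300),
    ("BloodPressure", 1, 200),
    ("SkinThickness", 1, 100),
    ("Insulin", 1, 900),
    ("BMI", 1, 80),
    ("DiabetesPedigreeFunction", 0, 3),
    ("Age", 1, 120),
]


def parse_record(text):
    """Turn one comma-separated line into a list of floats, or raise ValueError."""
    parts = [part.strip() for part in text.split(",")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(parts)}")
    values = []
    for (name, low, high), raw in zip(FIELDS, parts):
        try:
            value = float(raw)
        except ValueError:
            raise ValueError(f"{name}: '{raw}' is not a number") from None
        # NaN fails this comparison as well, so it is rejected here too.
        if not low <= value <= high:
            raise ValueError(f"{name}: {value} is outside the range {low} to {high}")
        values.append(value)
    return values


print(parse_record("2, 120, 70, 20, 80, 25.5, 0.5, 33"))
```

Each error message names the offending field, which is one way to provide the "appropriate messages" that the objectives call for.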
8.2 OUTPUT DESIGN

A quality output is one that meets the requirements of the end user and presents the
information clearly. In any system, the results of processing are communicated to the
users and to other systems through outputs. In output design, it is determined how the
information is to be displayed for immediate need and also as hard copy output. It is the
most important and direct source of information for the user. Efficient and intelligent
output design improves the system's relationship with the user and helps user
decision-making.
Designing computer output should proceed in an organized, well-thought-out manner; the
right output must be developed while ensuring that each output element is designed so
that people find the system easy and effective to use. When analysts design computer
output, they should:
1. Identify the specific output that is needed to meet the requirements.
2. Select methods for presenting information.
3. Create documents, reports, or other formats that contain information produced by the
system.
The output form of an information system should accomplish one or more of the following
objectives.
• Convey information about past activities, current status or projections of the future.
• Signal important events, opportunities, problems, or warnings.
• Trigger an action.
• Confirm an action.
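As a small illustration of these objectives, the sketch below turns a raw 0/1 prediction into a message that conveys the current status, signals a warning and suggests an action. The function format_result and its parameters are hypothetical names introduced for this example, and the neighbour counts passed to it are made-up values.

```python
def format_result(prediction, positive_neighbours, k):
    """Build a one-line message from a 0/1 prediction and the neighbour vote."""
    if k <= 0 or not 0 <= positive_neighbours <= k:
        raise ValueError("positive_neighbours must lie between 0 and k, with k > 0")
    if prediction == 1:
        # Signals a warning and triggers an action.
        headline = "Result: diabetes indicated. Please consult a doctor."
    else:
        headline = "Result: no diabetes indicated."
    share = positive_neighbours / k
    # Conveys how the decision was reached, not only the decision itself.
    return f"{headline} ({positive_neighbours} of {k} nearest neighbours positive, {share:.0%})"


print(format_result(1, 8, 11))
print(format_result(0, 2, 11))
```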
CHAPTER 9
SYSTEM DESIGN
9.1 System Architecture:

9.2 Data Flow Diagram
1. The DFD is also called a bubble chart. It is a simple graphical formalism that can be
used to represent a system in terms of the input data to the system, the various
processing carried out on this data, and the output data generated by the system.
2. The data flow diagram (DFD) is one of the most important modeling tools. It is used to
model the system components. These components are the system processes, the data used by
the processes, the external entities that interact with the system and the information
flows in the system.
3. The DFD shows how information moves through the system and how it is modified by a
series of transformations. It is a graphical technique that depicts information flow and
the transformations that are applied as data moves from input to output.
4. A DFD may be used to represent a system at any level of abstraction. It may be
partitioned into levels that represent increasing information flow and functional detail.
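One way to read such a data flow in code is as a chain of transformations, each consuming the output of the previous one. The sketch below expresses this with a scikit-learn Pipeline. It is an assumption-laden illustration: the report's sample code performs these steps by hand rather than with Pipeline, the step names are invented for this example, and the three-column data set is synthetic.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the input data store (assumption: three numeric features).
rng = np.random.default_rng(0)
X = rng.normal(loc=[120.0, 70.0, 30.0], scale=[30.0, 10.0, 6.0], size=(200, 3))
y = (X[:, 0] > 125).astype(int)  # toy label driven by the first feature only
X[::25, 1] = np.nan              # simulate a few missing readings

# Each step is one "bubble" of the data flow: input -> clean -> scale -> classify.
flow = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("classify", KNeighborsClassifier(n_neighbors=11)),
])
flow.fit(X, y)

# A new record travels through the same transformations before it is classified.
new_record = np.array([[180.0, 72.0, 31.0]])
prediction = flow.predict(new_record)
print(prediction)
```

Keeping the steps in one object means a new record cannot skip the cleaning or scaling stage by accident.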
9.3 UML Diagrams

UML stands for Unified Modeling Language. UML is a standardized general-purpose
modeling language in the field of object-oriented software engineering. The standard was
adopted, and is managed, by the Object Management Group.
The goal is for UML to become a common language for creating models of object-oriented
computer software. In its current form, UML comprises two major components: a meta-model
and a notation. In the future, some form of method or process may also be added to, or
associated with, UML.
The Unified Modeling Language is a standard language for specifying, visualizing,
constructing and documenting the artifacts of software systems, as well as for business
modeling and other non-software systems.
The UML represents a collection of best engineering practices that have proven
successful in the modeling of large and complex systems.
The UML is a very important part of developing object-oriented software and of the
software development process. The UML uses mostly graphical notations to express the
design of software projects.
GOALS:
The primary goals in the design of the UML are as follows:
1. Provide users with a ready-to-use, expressive visual modeling language so that they can
develop and exchange meaningful models.
2. Provide extensibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Support higher-level development concepts such as collaborations, frameworks, patterns
and components.
7. Integrate best practices.
9.3.1 Use Case Diagram

A use case diagram in the Unified Modeling Language (UML) is a type of
behavioral diagram defined by and created from a use-case analysis. Its purpose is to
present a graphical overview of the functionality provided by a system in terms of actors,
their goals (represented as use cases), and any dependencies between those use cases. The
main purpose of a use case diagram is to show which system functions are performed for
which actor. The roles of the actors in the system can also be depicted.
9.3.2 Sequence Diagram
A sequence diagram in the Unified Modeling Language (UML) is a kind of interaction
diagram that shows how processes operate with one another and in what order. It is a
construct of a Message Sequence Chart. Sequence diagrams are sometimes called event
diagrams, event scenarios, and timing diagrams.
9.3.3 Activity Diagram
Activity diagrams are graphical representations of workflows of stepwise activities and
actions with support for choice, iteration and concurrency. In the Unified Modeling
Language, activity diagrams can be used to describe the business and operational
step-by-step workflows of components in a system. An activity diagram shows the overall
flow of control.
CHAPTER 10
SAMPLE CODE
import math

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Jupyter notebook magic; omit this line when running as a plain Python script.
%matplotlib inline
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

# Load the data and take a first look at it.
dataset = pd.read_csv("diabetes.csv")
dataset.shape
dataset.head()
dataset["Outcome"].value_counts()
dataset.all()
dataset.describe()

# Rows in which a measurement is recorded as zero, which is not a valid reading.
df_problem_rows = dataset[(dataset["BloodPressure"] == 0) | (dataset["SkinThickness"] == 0) |
                          (dataset["BMI"] == 0) | (dataset["Glucose"] == 0)]
df_problem_rows.head()
len(df_problem_rows)

# Treat zeros in these columns as missing and fill them with the column mean.
# np.nan is used because the np.NaN alias was removed in NumPy 2.0.
zero_not_accepted = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]
for column in zero_not_accepted:
    dataset[column] = dataset[column].replace(0, np.nan)
    mean = int(dataset[column].mean(skipna=True))
    dataset[column] = dataset[column].replace(np.nan, mean)
dataset.describe()
dataset.head()

# The first eight columns are the features; the ninth column is the label.
X = dataset.iloc[:, 0:8]
Y = dataset.iloc[:, 8]
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0, test_size=0.2)
X_train.shape
X_test.shape

# Fit the scaler on the training data only, then reuse it for the test data.
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

# Rule of thumb for k: near the square root of the sample count, kept odd to avoid ties.
math.sqrt(len(y_test))
knn = KNeighborsClassifier(n_neighbors=11, p=2, metric="euclidean")
knn.fit(X_train, y_train)
pred = knn.predict(X_test)
cm = confusion_matrix(y_test, pred)
sns.heatmap(cm, annot=True)
print(accuracy_score(y_test, pred))

# Predict for a new record entered as eight comma-separated numbers.
# The values are converted to float, since the scaler cannot transform strings.
input_data = [float(value) for value in input().split(",")]
input_data_array = np.asarray(input_data)
input_data_reshaped = input_data_array.reshape(1, -1)
std_data = sc_X.transform(input_data_reshaped)
print(std_data)
prediction = knn.predict(std_data)
print(prediction)
if prediction[0] == 0:
    print("You do not have diabetes")
else:
    print("You have diabetes, consult your doctor immediately")
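To make clear what KNeighborsClassifier does with the Euclidean metric, the following is a minimal from-scratch sketch of the same idea: measure the distance from the query to every training row, keep the k closest rows, and take a majority vote over their labels. The function names and the six two-feature training points are invented for illustration; this is not the library's implementation.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    distances = sorted(
        (euclidean(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy data: two well-separated groups (illustrative values only).
train_X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, [1.1, 1.0], k=3))  # prints 0
print(knn_predict(train_X, train_y, [3.0, 3.0], k=3))  # prints 1
```

Because every feature contributes to the distance on its own scale, the features are standardized in the sample code before the classifier is fitted.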
CHAPTER 6 RESULT
6.1 RESULTS
Figure 6.1: Code (Training Phase).
Figure 6.2: Code (Training Phase).
Figure 6.3: Output for Training Phase.
Figure 6.4: Testing Phase Code and Output.
Figure 6.5: Testing Phase Code and Output (Without diabetics).
Figure 6.6: Training and predicting the model.
Figure 6.7: Predicting new record.
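The accuracy score and the heat map produced by the sample code are both derived from the confusion matrix. The sketch below shows how accuracy, precision, recall and F1 follow from its four cells. The function name binary_metrics is hypothetical, and the counts passed to it are made-up numbers for illustration only; they are not the results of this project.

```python
def binary_metrics(tn, fp, fn, tp):
    """Compute common scores from the four cells of a binary confusion matrix."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts, not the report's results.
metrics = binary_metrics(tn=90, fp=10, fn=20, tp=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Recall deserves particular attention in a screening setting, since a false negative means a diabetic patient is told that no diabetes was found.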
CHAPTER 12
SYSTEM TESTING
The purpose of testing is to discover errors. Testing is the process of trying to
discover every conceivable fault or weakness in a work product. It provides a way to
check the functionality of components, sub-assemblies, assemblies and/or a finished
product. It is the process of exercising software with the intent of ensuring that the
software system meets its requirements and user expectations and does not fail in an
unacceptable manner. There are various types of tests. Each test type addresses a
specific testing requirement.
TYPES OF TESTS
Unit testing
Unit testing involves the design of test cases that validate that the internal program
logic is functioning properly, and that program inputs produce valid outputs. All decision
branches and internal code flow should be validated. It is the testing of individual
software units of the application. It is done after the completion of an individual unit
and before integration. This is structural testing that relies on knowledge of the unit's
construction and is invasive. Unit tests perform basic tests at component level and test a
specific business process, application, and/or system configuration. Unit tests ensure
that each unique path of a business process performs accurately to the documented
specifications and contains clearly defined inputs and expected results.
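A unit test of this kind could look as follows for the zero-replacement step of the sample code. This is a sketch under assumptions: the helper replace_zeros_with_mean is a hypothetical pure-Python stand-in that mirrors the idea of the pandas loop (zeros become the integer mean of the non-zero values), and the test values are invented.

```python
import unittest


def replace_zeros_with_mean(values):
    """Replace zeros with the integer mean of the non-zero entries."""
    non_zero = [v for v in values if v != 0]
    if not non_zero:
        raise ValueError("column has no non-zero values to average")
    mean = int(sum(non_zero) / len(non_zero))
    return [mean if v == 0 else v for v in values]


class ReplaceZerosTest(unittest.TestCase):
    def test_zeros_are_replaced(self):
        self.assertEqual(replace_zeros_with_mean([0, 70, 80, 0]), [75, 70, 80, 75])

    def test_column_without_zeros_is_unchanged(self):
        self.assertEqual(replace_zeros_with_mean([60, 70]), [60, 70])

    def test_all_zero_column_is_rejected(self):
        with self.assertRaises(ValueError):
            replace_zeros_with_mean([0, 0])


# Run the three cases explicitly so the example also works outside a test runner.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ReplaceZerosTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test case fixes one input and one expected result, which matches the requirement above for clearly defined inputs and expected results.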
Integration testing
Integration tests are designed to test integrated software components to determine
whether they actually run as one program. Testing is event driven and is more concerned
with the basic outcome of screens or fields. Integration tests demonstrate that although
the components were individually satisfactory, as shown by successful unit testing, the
combination of components is correct and consistent. Integration testing is specifically
aimed at exposing the problems that arise from the combination of components.
Functional test
Functional tests provide systematic demonstrations that the functions tested are
available as specified by the business and technical requirements, system documentation,
and user manuals.
Functional testing is centered on the following items:
Valid Input : identified classes of valid input must be accepted.
Invalid Input : identified classes of invalid input must be rejected.
Functions : identified functions must be exercised.
Output : identified classes of application outputs must be exercised.
Systems/Procedures : interfacing systems or procedures must be invoked.
Organization and preparation of functional tests is focused on requirements, key
functions, or special test cases. In addition, systematic coverage pertaining to
identified business process flows, data fields, predefined processes, and successive
processes must be considered for testing. Before functional testing is complete,
additional tests are identified and the effective value of current tests is determined.
System Test
System testing ensures that the entire integrated software system meets requirements. It
tests a configuration to ensure known and predictable results. An example of system
testing is the configuration oriented system integration test. System testing is based on
process descriptions and flows, emphasizing pre-driven process links and integration
points.
White Box Testing

White Box Testing is testing in which the software tester has knowledge of the inner
workings, structure and language of the software, or at least its purpose. It is used to
test areas that cannot be reached from a black box level.
Black Box Testing

Black Box Testing is testing the software without any knowledge of the inner
workings, structure or language of the module being tested. Black box tests, like most
other kinds of tests, must be written from a definitive source document, such as a
specification or requirements document. It is testing in which the software under test is
treated as a black box: you cannot “see” into it. The test provides inputs and responds
to outputs without considering how the software works.
6.1 Unit Testing:
Unit testing is usually conducted as part of a combined code and unit test phase of
the software lifecycle, although it is not uncommon for coding and unit testing to be
conducted as two distinct phases.
Test strategy and approach

Field testing will be performed manually and functional tests will be written in
detail.
Test objectives
• All field entries must work properly.

• Pages must be activated from the identified link.
• The entry screen, messages and responses must not be delayed.
Features to be tested
• Verify that the entries are of the correct format

• No duplicate entries should be allowed
• All links should take the user to the correct page.
6.2 Integration Testing
Software integration testing is the incremental integration testing of two or more
integrated software components on a single platform to produce failures caused by
interface defects.
The task of the integration test is to check that components or software applications
(for example, components in a software system or, one step up, software applications at
the company level) interact without error.
Test Results: All the test cases mentioned above passed successfully. No defects
encountered.
6.3 Acceptance Testing
User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional
requirements.
Test Results: All the test cases mentioned above passed successfully. No defects
encountered.
CHAPTER 7
CONCLUSION
7.1 FUTURE SCOPE
The proposed system uses the KNN algorithm to detect diabetes. Data science offers many
other classification algorithms, such as Naive Bayes, SVM, Decision Tree and ID3. In
future, more algorithms can be added and their outputs compared to find the most
efficient one. A visitor query module can be added, where visitors post queries to the
administrator and the administrator replies to those queries. A treatment module can
also be added, where doctors upload treatment details for patients and patients can
view those treatment details.
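The comparison of algorithms proposed here could be sketched with cross-validation as shown below. This is only an outline under stated assumptions: the data is synthetic (generated with make_classification, not the diabetes data), the four models use default or arbitrary settings, and accuracy is just one possible yardstick for "efficient".

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with eight features (assumption: not the diabetes data).
X, y = make_classification(n_samples=300, n_features=8, n_informative=5, random_state=0)

# Distance- and margin-based models are scaled first; the others do not need it.
candidates = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=11)),
    "Naive Bayes": GaussianNB(),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Mean accuracy over 5 folds for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Putting the scaler inside each pipeline means it is refitted on the training folds only, so no information from a held-out fold leaks into the model.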
7.2 CONCLUSION
The prediction of diabetes is of great importance today because of the severe
complications of the disease. Diabetes is one of the leading causes of death worldwide.
The system model mainly focuses on identifying diabetes using a set of parameters. The
system helps physicians predict diabetes in its initial stages, so that conventional
treatments and solutions can be given to patients. The system uses machine learning
techniques for prediction in order to obtain more precise results. A great deal of
research has been done on diabetes prediction. A diabetes prediction system is useful for
hospitals and doctors: it predicts the disease at an early stage, so doctors can treat
patients in a better way. The proposed model is a real-time application that is meant for
multiple hospitals and predicts the disease in less time. Because machine learning
algorithms are used for disease prediction, more accurate and efficient results can be
expected.
REFERENCE
[1] “Performance Analysis of Machine Learning Techniques to Predict Diabetes Mellitus” Md
Faisal Faruque, Asaduzzaman, Iqbal H. Sarker, IEEE 2019.
[2] “A Comprehensive Exploration to the Machine Learning Techniques for Diabetes
Identification” Sidong Wei, Xuejiao Zhao, Chunyan Miao, Shanghai Jiao Tong
University, China.
[3] “Association Rule Extraction from Medical Transcripts of Diabetic Patients” Lakshmi K
S, G Santhosh Kumar, 2014.
[4] “Diabetes Care Decision Support System” 2nd International Conference on Industrial and
Information Systems IEEE 2010.
[5] “An Intelligent Mobile Diabetes Management and Educational System for Saudi Arabia:
System Architecture” M.M. Alotaibi, R.S.H. Istepanian, A.Sungoor and N. Philip, IEEE
2014.
[6] “Machine Learning Techniques for Classification of Diabetes and Cardiovascular
Diseases” by BerinaAlic, Lejla Gurbeta, IEEE 2017.
[7] “Performance Analysis of Classification Approaches for the Prediction of Type II
Diabetes” by M. Durgadevi, M. Durgadevi, IEEE 2017.
[8] “Cloud-Based Diabetes Coaching Platform for Diabetes Management” Elliot B. Sloane
Senior Member IEEE, Nilmini Wickramasinghe, Steve Goldberg 2016.
[9] Minyechil Alehegn and Rahul Joshi, “Analysis and prediction of diabetes diseases using
machine learning algorithm”: International Research Journal of Engineering and
Technology Volume: 04 Issue:10 | Oct -2017
[10] P. Suresh Kumar and V. Uma tejaswi, “Diagnosing Diabetes using Data Mining
Techniques”, International Journal of Scientific and Research Publications, Volume 7,
Issue 6, June 2017 705 ISSN 2250-3153.
[11] “Clustering Medical Data to Predict the Likelihood of Diseases” by Razan Paul,
Abu Sayed Md. Latiful Hoque, IEEE 2010.
[12] “Robust Parameter Estimation in a Model for Glucose Kinetics in Type 1 Diabetes
Subjects” Proceedings of the 28th IEEE EMBS Annual International Conference New
York City, USA, Aug 30- Sept 3, 2006.
[13] Anjali C And Veena Vijayan V, Prediction and Diagnosis of Diabetes Mellitus, “A
Machine Learning Approach” ,2015 IEEE in Intelligent Computational Systems (RAICS) |
Trivandrum.
