Miniproject 6


ABSTRACT

Agriculture is the backbone of the Indian economy. In India, agricultural yield primarily
depends on weather conditions and cultivated area; rice cultivation, in particular, depends on
rainfall and soil type. Timely advice on future crop productivity, supported by proper
analysis, can help farmers maximize crop production. Yield prediction is therefore an
important agricultural problem. In the past, farmers predicted their yield from previous
years' experience. With the creation of new innovative technologies and techniques, traditional
knowledge of agriculture is slowly being eroded. Owing to these inventions, many growers
concentrate on cultivating artificial, hybrid products, which can lead to an unhealthy life.
Nowadays, many people lack awareness about cultivating crops at the right time and in the
right place. Because of these cultivation practices, seasonal climatic conditions are also
changing, straining fundamental assets such as soil, water and air and leading to food
insecurity. Even after analysing issues such as weather, temperature and several other
factors, there is still no proper solution or technology to overcome the situation we face.

Machine learning algorithms are also useful for predicting crop yield. Using past
information on weather, temperature and a number of other factors, a prediction can be made;
when crop producers know accurate information about the expected yield, losses are
minimized. Machine learning is a fast-growing approach that is spreading into every sector,
helping it make viable decisions and make the most of its applications.

The core objective of crop yield estimation is to achieve higher agricultural crop
production, and many established models have been exploited to increase the yield of crop
production. Nowadays, ML is used worldwide due to its efficiency in various sectors such as
forecasting, fault detection and pattern recognition. ML algorithms also help to improve the
crop yield production rate when losses occur under unfavourable conditions, and they are
applied in the crop selection method to reduce losses in crop yield production irrespective of
a distracting environment.
CHAPTER 1

INTRODUCTION

1.1 GENERAL

Tamil Nadu, the 7th largest state in India by area, has the 6th largest population. It is a
leading producer of agricultural products, and agriculture is the main occupation of its
people. Agriculture has a sound standing in this competitive world. The Cauvery is the main
source of water, and the Cauvery delta regions are called the rice bowl of Tamil Nadu. Rice is
the major crop grown in Tamil Nadu; other crops such as sugarcane, cotton, coconut and
groundnut are also grown, and bio-fertilizers are produced efficiently. In many areas, farming
is the major source of occupation. Agriculture makes a dramatic impact on the economy of a
country. Due to changes in natural factors, farming is degrading nowadays. Agriculture
directly depends on environmental factors such as sunlight, humidity, soil type, rainfall,
maximum and minimum temperature, climate, fertilizers and pesticides. Knowledge of proper
harvesting of crops is needed for agriculture to bloom. India has four seasons:
1. Winter, which occurs from December to March.
2. Summer, from April to June.
3. Monsoon or rainy season, lasting from July to September.
4. Post-monsoon or autumn, occurring from October to November.
Due to this diversity of seasons and rainfall, assessing which crops are suitable to cultivate
is necessary. Farmers face major problems such as crop management, estimating the expected
crop yield, and obtaining a productive yield from their crops. Farmers and cultivators need
proper assistance regarding crop cultivation, especially as many young people are now taking
an interest in agriculture. The impact of the IT sector in addressing real-world problems is
growing at a fast rate, and data in the field of agriculture is increasing day by day. With
advances in the Internet of Things, there are ways to gather huge amounts of data in the field
of agriculture. A system is needed to carry out proper analysis of agricultural data and to
extract useful information from this growing data. To get insights from data, it has to be
learnt.
1.2 OBJECTIVE

In the past, crop prediction was done according to farmers' experience. Although farmers'
knowledge persists, agricultural factors have changed to an astonishing degree, so there is a
need to bring engineering methods into crop prediction. Data mining plays a novel role in
agricultural research [11]. Techniques in this field, such as neural networks and K-nearest
neighbour, use historical data to predict; the k-means algorithm does not use historical data
but predicts by computing the centres of the samples and forming clusters. The computational
cost of the algorithm is a major issue.

1.3 PROBLEM STATEMENT

The problem the Indian agriculture sector faces is integrating technology to achieve the
desired outputs. With the advent of new technologies and the overuse of non-renewable energy
resources, patterns of rainfall and temperature have been disturbed. The inconsistent trends
arising from the side effects of global warming make it difficult for farmers to predict
temperature and rainfall patterns, affecting their crop yield productivity; as crop yields
decrease, India's GDP also decreases. The main aim of this project is to help farmers
cultivate a crop with maximum yield.
1.4 EXISTING SYSTEM

This work surveys the applications of machine learning in agricultural production systems.
The works analysed were categorized into:
(a) crop management, including applications on yield prediction, disease detection, weed
detection, crop quality, and species recognition;
(b) livestock management, including applications on animal welfare and livestock production;
(c) water management; and
(d) soil management.
The filtering and classification of the presented articles demonstrate how agriculture will
benefit from machine learning technologies.

1.4.1 EXISTING SYSTEM DISADVANTAGES

• The main challenge faced in the agriculture sector is the lack of knowledge about changing
climate variations. Each crop has its own suitable climatic features. This can be handled with
precise farming techniques; precision farming not only maintains the productivity of crops but
also increases the yield rate of production.
• Existing systems that recommend crop yield are either hardware-based, and therefore costly
to maintain, or not easily accessible.
• Despite the many solutions that have been proposed recently, there are still open challenges
in creating a user-friendly application for crop recommendation.
1.5 PROPOSED SYSTEM

Farmers need assistance from recent technology to grow their crops. Proper crop
predictions can be communicated to agriculturists on a timely basis. Many machine learning
techniques have been used to analyse agricultural parameters, and some of these techniques in
different aspects of agriculture are studied here.
Blooming neural networks and soft computing techniques play a significant part in
providing recommendations. Considering parameters such as production and season, more
personalized and relevant recommendations can be given to farmers, enabling them to achieve
a good volume of production.

1.5.1 PROPOSED SYSTEM ADVANTAGES

 The proposed model predicts the crop yield for the datasets of a given region.
Integrating agriculture and ML will contribute to further enhancements in the
agriculture sector by increasing yields and optimizing the resources involved. Data
from previous years is the key element in forecasting current performance.
 The proposed system uses a recommender system to suggest the right time for using
fertilizers.
 The methods in the proposed system include increasing the yield of crops, real-time
analysis of crops, selecting efficient parameters, making smarter decisions and getting
better yield.
CHAPTER-2
LITERATURE SURVEY

Title: Applications of machine learning techniques in agricultural crop production: a


review
AUTHORS: Mishra, S., Mishra, D. and Sandra, H.
This paper reassesses research studies on the relevance of machine learning techniques in the
domain of agricultural crop production.
Methods/Statistical Analysis: This method is a new approach to the management of agricultural
crop production. Accurate and timely forecasts of crop production are necessary for important
policy decisions such as import-export, pricing, marketing and distribution, which are issued
by the Directorate of Economics and Statistics. However, it is understood that these prior
estimates are not objective estimates, as they require a great deal of descriptive assessment
based on many different qualitative factors. Hence there is a requirement to develop
statistically sound, objective predictions of crop production. Developments in computing and
information storage have provided large amounts of data. Findings: The problem has been to
extract knowledge from this raw data; this has led to the development of new approaches and
techniques, such as machine learning, that can be used to unite knowledge of the data with
crop yield evaluation. This research evaluates these innovative techniques so that significant
relationships can be found by applying them to the various variables present in the database.
Application/Improvement: Techniques such as artificial neural networks, information fuzzy
networks, decision trees, regression analysis, Bayesian belief networks, time series analysis,
Markov chain models, k-means clustering, k-nearest neighbour and support vector machines, as
applied in the domain of agriculture, were presented.
Title: A Model for Prediction of Crop Yield.
AUTHORS: Manjula
Data mining is an emerging research field in crop yield analysis. Yield prediction is a very
important issue in agriculture, since every farmer is interested in knowing how much yield to
expect. In the past, yield prediction was performed by considering the farmer's experience of
a particular field and crop. Yield prediction remains a major issue to be solved using
available data, and data mining techniques are the better choice for this purpose. Different
data mining techniques are used and evaluated in agriculture for estimating future years' crop
production. This research proposes and implements a system to predict crop yield from previous
data by applying association rule mining to agricultural data, focusing on the creation of a
prediction model that may be used for future prediction of crop yield. The paper presents a
brief analysis of crop yield prediction using a data mining technique based on association
rules for a selected region, i.e., a district of Tamil Nadu in India. The experimental results
show that the proposed work efficiently predicts crop yield production.
Title: Agricultural crop yield prediction using artificial neural network approach
AUTHORS: Dahisar, S. S. and Rode, S. V.
Various climatological phenomena affect local weather conditions in different parts of the
world, and these weather conditions have a direct effect on crop yield. Various studies have
explored the connections between large-scale climatological phenomena and crop yield.
Artificial neural networks have been demonstrated to be powerful tools for modelling and
prediction. The crop prediction methodology is used to predict a suitable crop by sensing
various parameters of the soil as well as parameters related to the atmosphere, such as soil
type, pH, nitrogen, phosphate, potassium, organic carbon, calcium, magnesium, sulphur,
manganese, copper, iron, depth, temperature, rainfall and humidity. For this purpose, an
artificial neural network (ANN) is used.
Title: Predictive ability of machine learning methods for massive crop yield prediction.
AUTHORS: Gonzalez Sanchez, A., Fausto Sols, J. and Ojeda Bustamante, W.
An important issue for agricultural planning purposes is accurate yield estimation for the
numerous crops involved in the planning. Machine learning (ML) is an essential approach for
achieving practical and effective solutions to this problem. Many comparisons of ML methods
for yield prediction have been made in search of the most accurate technique, but generally
the number of evaluated crops and techniques is too low to provide enough information for
agricultural planning purposes. This paper compares the predictive accuracy of ML and linear
regression techniques for crop yield prediction on ten crop datasets. Multiple linear
regression, M5-Prime regression trees, multilayer perceptron neural networks, support vector
regression and k-nearest neighbour methods were ranked. Four accuracy metrics were used to
validate the models: root mean square error (RMSE), root relative square error (RRSE),
normalized mean absolute error (MAE), and correlation factor (R). Real data from an irrigation
zone of Mexico were used for building the models, which were tested on samples from two
consecutive years. The results show that the M5-Prime and k-nearest neighbour techniques
obtain the lowest average RMSE errors (5.14 and 4.91), the lowest RRSE errors (79.46% and
79.78%), the lowest average MAE errors (18.12% and 19.42%), and the highest average
correlation factors (0.41 and 0.42). Since M5-Prime achieves the largest number of crop yield
models with the lowest errors, it is a very suitable tool for massive crop yield prediction in
agricultural planning.
Title: Crop and Yield Prediction Model
AUTHORS: Shreya S., Kalyani A. Begawan
The agricultural sector necessitates a well-defined, systematic approach for predicting crops
and their yield, supporting farmers in taking correct decisions to enhance the quality of
farming. The complexity of predicting the best crops is high due to the unavailability of a
crop knowledge base. Crop prediction is an efficient approach for better-quality farming and
increased revenue. The use of data clustering algorithms is an efficient approach in the field
of data mining for extracting useful information and giving predictions. Various approaches
implemented so far work only for crop prediction. A crop prediction model aids farmers in
taking correct decisions, which in turn helps improve the quality of farming and generate
better revenue for farmers. Traditional clustering algorithms such as k-Means, improved rough
k-Means and k-Means++ make the task complicated due to the random selection of the initial
cluster centres and the choice of the number of clusters. A modified k-Means algorithm is
therefore used to improve the accuracy of the system, as it achieves high-quality clusters
through careful initial cluster centre selection.
CHAPTER 3
REQUIREMENTS ENGINEERING

3.1 GENERAL
These are the requirements for carrying out the project; without these tools and software, the
project cannot be done. There are two kinds of requirements for the project:
1. Hardware Requirements.
2. Software Requirements.

3.2 HARDWARE SPECIFICATION


The hardware requirements may serve as the basis for a contract for the implementation of
the system and should therefore be a complete and consistent specification of the whole
system. They are used by software engineers as the starting point for the system design. It
shows what the system does and not how it should be implemented.

 System: Intel i3 or above
 Hard Disk: 40 GB
 Monitor: 14" Colour Monitor
 Mouse: Optical Mouse
 RAM: 1 GB or above

3.3 SOFTWARE SPECIFICATIONS


The software requirements document is the specification of the system. It should include both
a definition and a specification of requirements. It is a set of what the system should do rather
than how it should do it. The software requirements provide a basis for creating the software
requirements specification. It is useful in estimating cost, planning team activities,
performing tasks and tracking the teams and tracking the team’s progress throughout the
development activity.

 Platform: Jupyter Notebook
 Coding language: Python

3.4 FUNCTIONAL REQUIREMENTS


A functional requirement defines a function of a software system or its component. A function
is described as a set of inputs, the behaviour, and the outputs. For this system, the
functional requirements are to accept agricultural parameters such as state, rainfall,
temperature, soil type and season as input, and to produce a crop recommendation and yield
prediction as output.

3.5 NON-FUNCTIONAL REQUIREMENTS


The major non-functional requirements of the system are as follows:
 Usability
The system is designed as a completely automated process, hence there is little or no
user intervention.
 Reliability
The system is reliable because of the qualities inherited from the chosen platform,
Python, and its mature scientific libraries.
 Performance
The system is developed in a high-level language, and by using advanced front-end
and back-end technologies it responds to the end user on the client system in very
little time.
 Supportability
The system is designed to be cross-platform, supported on a wide range of hardware
and on any software platform that can run Python.
 Implementation
The system is implemented in a Jupyter Notebook environment using Python and its
machine learning libraries.

3.6 STANDARDS AND POLICIES

 Anaconda Prompt
Anaconda Prompt is a command-line interface which explicitly deals with the ML
modules; the Anaconda Navigator is available on Windows, Linux and macOS.
Anaconda provides a number of IDEs which make coding easier, and the UI can also
be implemented in Python.
Standard Used: ISO/IEC 27001

 JUPYTER
Jupyter is an open-source web application that allows us to create and share
documents which contain live code, equations, visualisations and narrative text. It
can be used for data cleaning and transformation, numerical simulation, statistical
modelling, data visualization and machine learning.
Standard Used: ISO/IEC 27001
CHAPTER 4
SYSTEM DESIGN

4.1 GENERAL

Design engineering deals with the various UML (Unified Modelling Language) diagrams for the
implementation of the project. Design is a meaningful engineering representation of a thing
that is to be built. Software design is a process through which the requirements are
translated into a representation of the software. Design is the place where quality is
rendered in software engineering; it is the means to accurately translate customer
requirements into a finished product.

4.2 SYSTEM ARCHITECTURE:

Fig 4.2: System Architecture


4.3 UML DIAGRAMS

4.3.1 INTRODUCTION:
UML stands for Unified Modelling Language. UML is a standardized, general-purpose modelling
language in the field of object-oriented software engineering. The standard is managed, and
was created by, the Object Management Group. The goal is for UML to become a common language
for creating models of object-oriented computer software. In its current form, UML comprises
two major components: a meta-model and a notation. In the future, some form of method or
process may also be added to, or associated with, UML.

The Unified Modelling Language is a standard language for specifying, visualizing,
constructing and documenting the artifacts of software systems, as well as for business
modelling and other non-software systems. The UML represents a collection of best engineering
practices that have proven successful in the modelling of large and complex systems. The UML
is an essential part of developing object-oriented software and the software development
process. The UML uses mostly graphical notations to express the design of software projects.

GOALS: The primary goals in the design of the UML are as follows:
1. Provide users with a ready-to-use, expressive visual modelling language so that they
can develop and exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modelling language.
5. Encourage the growth of the OO tools market.
6. Support higher-level development concepts such as collaborations, frameworks,
patterns and components.
7. Integrate best practices.
4.3.2 USE CASE DIAGRAM

A use case diagram in the Unified Modelling Language (UML) is a type of behavioural
diagram defined by and created from a Use-case analysis. Its purpose is to present a graphical
overview of the functionality provided by a system in terms of actors, their goals (represented
as use cases), and any dependencies between those use cases. The main purpose of a use case
diagram is to show what system functions are performed for which actor. Roles of the actors
in the system can be depicted.
Fig 4.3.2: Use Case Diagram

4.3.3 CLASS DIAGRAM

In software engineering, a class diagram in the Unified Modelling Language (UML) is a type
of static structure diagram that describes the structure of a system by showing the system's
classes, their attributes, operations (or methods), and the relationships among the classes.
It explains which class contains which information. Class diagrams are the blueprints of your
system or subsystem and are useful in many stages of system design.
Fig 4.3.3: Class Diagram

4.3.4 SEQUENCE DIAGRAM

A sequence diagram in the Unified Modelling Language (UML) is a kind of interaction diagram
that shows how processes operate with one another and in what order. It is a construct of a
Message Sequence Chart. Sequence diagrams are sometimes called event diagrams, event
scenarios, or timing diagrams.

Fig 4.3.4: Sequence Diagram


4.3.5 OBJECT DIAGRAM

A UML Object diagram represents a specific instance of a class diagram at a certain moment
in time. When represented visually, you’ll see many similarities to the class diagram. An
Object diagram focuses on the attributes of a set of objects and how those objects relate to
each other.
Fig 4.3.5: Object Diagram

4.3.6 ACTIVITY DIAGRAM:

Activity diagrams are graphical representations of workflows of stepwise activities and


actions with support for choice, iteration and concurrency. In the Unified Modelling
Language, activity diagrams can be used to describe the business and operational step-by-step
workflows of components in a system. An activity diagram shows the overall flow of control.
Fig 4.3.6: Activity Diagram

4.3.7 DEPLOYMENT DIAGRAM

In UML, Deployment diagrams model the physical architecture of a system. Deployment


diagrams show the relationship between the software and hardware components in the
system and the physical distribution of the processing. The deployment diagram visualizes
the physical hardware on which the software will be deployed.

Fig 4.3.7: Deployment Diagram


4.4 DATA FLOW DIAGRAM

1. The DFD is also called a bubble chart. It is a simple graphical formalism that can be
used to represent a system in terms of the input data to the system, the various
processing carried out on this data, and the output data generated by the system.
2. The data flow diagram (DFD) is one of the most important modelling tools. It is used
to model the system components: the system processes, the data used by those
processes, the external entities that interact with the system, and the information
flows in the system.
3. A DFD shows how information moves through the system and how it is modified by
a series of transformations. It is a graphical technique that depicts information flow
and the transformations that are applied as data moves from input to output.
4. A DFD may be used to represent a system at any level of abstraction and may be
partitioned into levels that represent increasing information flow and functional
detail.
Fig 4.4: Data Flow Diagram

CHAPTER-5
IMPLEMENTATION

5.1 GENERAL

In this chapter, various supervised machine learning approaches are used. This section
provides a general description of these approaches.

5.2 METHODOLOGIES

5.2.1 MODULE NAMES:

 Data collection
 Data set
 Data pre-processing
 Model Selection
 Performance Analysis
 Accuracy Prediction

5.2.2 MODULE DESCRIPTION

Data Collection:

This is the first real step towards the development of a machine learning model:
collecting data. This is a critical step that cascades into how good the model will be; the
more and better the data we get, the better our model will perform.

There are several techniques to collect data, such as web scraping and manual
intervention. The dataset used for this crop recommendation system covers India and was taken
from an external source.

Dataset:
The dataset consists of individual records. There are 8 columns in the dataset, which are
described below.

1. State: the state in India
2. Rainfall: rainfall in mm
3. Ground Water: total ground water level
4. Temperature: temperature in degrees Celsius
5. Soil type: the type of soil
6. Season: which season is suitable for the crop
7. Crops: the type of crop
8. Humidity: humidity in that area

Data Pre-processing:

Wrangle the data and prepare it for training. Clean whatever requires it (remove
duplicates, correct errors, deal with missing values, normalization, data type conversions,
etc.). Randomize the data, which erases the effects of the particular order in which it was
collected or otherwise prepared. Visualize the data to help detect relevant relationships
between variables or class imbalances (bias alert!), or perform other exploratory analysis.
Finally, split the data into training and evaluation sets.
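The steps above can be sketched with pandas and scikit-learn on a tiny illustrative frame. The column names are assumptions carried over from the dataset description, not the project's real schema.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny stand-in frame: one duplicate row and one row with a missing value,
# so each cleaning step below actually removes something.
data = pd.DataFrame({
    "Rainfall":    [950.0, 1100.0, 2800.0, None, 950.0],
    "Temperature": [31.0, 29.5, 27.0, 30.0, 31.0],
    "SoilType":    ["Alluvial", "Clay", "Laterite", "Clay", "Alluvial"],
    "Crops":       ["Rice", "Sugarcane", "Coconut", "Rice", "Rice"],
})

data = data.drop_duplicates()                       # remove duplicates
data = data.dropna()                                # deal with missing values
data = pd.get_dummies(data, columns=["SoilType"])   # encode the categorical column

X = data.drop(columns=["Crops"])
y = data["Crops"]

# Shuffled split into training and evaluation sets; shuffling erases any
# ordering effects from how the data was collected.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, shuffle=True)
print(len(X_train), len(X_test))
```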

Model Selection:

A decision tree is a flowchart-like tree structure in which an internal node represents a
feature (or attribute), a branch represents a decision rule, and each leaf node represents the
outcome. The topmost node in a decision tree is known as the root node. The tree learns to
partition on the basis of attribute values, partitioning recursively in a manner called
recursive partitioning. This flowchart-like structure helps in decision making; its
visualization, like a flowchart diagram, easily mimics human-level thinking, which is why
decision trees are easy to understand and interpret.

The decision tree is a white-box type of ML algorithm. It shares its internal
decision-making logic, which is not available in black-box algorithms such as neural
networks, and its training time is faster than that of neural network algorithms. The time
complexity of decision trees is a function of the number of records and the number of
attributes in the given data. The decision tree is a distribution-free, non-parametric method
that does not depend on probability distribution assumptions.

Decision trees can handle high-dimensional data with good accuracy. The decision
rules are generally in the form of if-then-else statements; the deeper the tree, the more
complex the rules and the fitter the model.

Before we dive deep, let's get familiar with some of the terminologies:

 Instances: Refer to the vector of features or attributes that define the input space
 Attribute: A quantity describing an instance
 Concept: The function that maps input to output
 Target Concept: The function that we are trying to find, i.e., the actual answer
 Hypothesis Class: Set of all the possible functions
 Sample: A set of inputs paired with a label, which is the correct output
 Testing Set: Similar to the training set and is used to test the candidate concept and
determine its performance.
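A minimal sketch of the decision-tree workflow described above, using scikit-learn's `DecisionTreeClassifier`. Since the crop dataset itself is not included in the report, the standard iris dataset stands in for it:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Stand-in data: the iris samples play the role of instances, their four
# measurements the role of attributes, the species the target concept.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# A shallow tree keeps the if-then-else rules short and interpretable.
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)          # recursive partitioning on attribute values

acc = accuracy_score(y_test, tree.predict(X_test))
print(round(acc, 3))
```

The fitted tree can be inspected with `sklearn.tree.export_text(tree)`, which prints the learned if-then-else rules, illustrating the "white box" property noted above.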

Performance Analysis:

The performance was evaluated using metrics such as the Mean Square Error (MSE).
In the actual dataset, we chose only 8 features:

 States: total number of states in India
 Rainfall: rainfall in mm
 Ground Water: total ground water level
 Temperature: temperature in degrees Celsius
 Soil type: number of soil types
 Season: which season is suitable for crops
 Crops: types of crops
 Humidity: humidity in that area
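The Mean Square Error mentioned above can be computed by hand and cross-checked against scikit-learn. The yield values below are made-up illustrations, not figures from the project:

```python
from sklearn.metrics import mean_squared_error

y_true = [3.0, 5.0, 2.5, 7.0]   # illustrative observed yields
y_pred = [2.5, 5.0, 3.0, 8.0]   # illustrative predicted yields

# MSE = mean of the squared differences between observation and prediction.
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
print(mse)  # 0.375

# Cross-check against scikit-learn's implementation.
assert mse == mean_squared_error(y_true, y_pred)
```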

Accuracy Prediction:

We obtained an accuracy of 90.7% on the test set; this is the accuracy of the crop prediction model.
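Test-set accuracy of this kind is simply the fraction of correct predictions. The toy labels below illustrate the computation; the 90.7% figure itself comes from the report's own experiment, not from this sketch:

```python
from sklearn.metrics import accuracy_score

y_true = ["Rice", "Rice", "Sugarcane", "Coconut", "Rice"]  # illustrative labels
y_pred = ["Rice", "Sugarcane", "Sugarcane", "Coconut", "Rice"]

# Accuracy = correct predictions / total predictions.
correct = sum(t == p for t, p in zip(y_true, y_pred))
acc = correct / len(y_true)
print(acc)  # 0.8 (4 of 5 correct)

assert acc == accuracy_score(y_true, y_pred)
```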

5.3 PYTHON

 Python is an interpreted high-level programming language for general-purpose
programming. Created by Guido van Rossum and first released in 1991, Python has a
design philosophy that emphasizes code readability, notably using significant
whitespace. Python features a dynamic type system and automatic memory
management. It supports multiple programming paradigms, including object-oriented,
imperative, functional and procedural, and has a large and comprehensive standard
library.
 Python is interpreted: Python is processed at runtime by the interpreter. You do not
need to compile your program before executing it. This is similar to Perl and PHP.
 Python is interactive: you can sit at a Python prompt and interact with the
interpreter directly to write your programs.
 Python also acknowledges that speed of development is important. Readable and terse
code is part of this, and so is access to powerful constructs that avoid tedious
repetition of code. Maintainability also ties into this: it may be an all-but-useless
metric, but it does say something about how much code you have to scan, read and/or
understand to troubleshoot problems or tweak behaviours. This speed of development,
the ease with which a programmer of other languages can pick up basic Python skills,
and the huge standard library are key to another area where Python excels: all its
tools have been quick to implement, have saved a lot of time, and several of them
have later been patched and updated by people with no Python background, without
breaking.

5.4 MACHINE LEARNING

Before we look at the details of various machine learning methods, let us start by
looking at what machine learning is, and what it is not. Machine learning is often categorized
as a subfield of artificial intelligence, but that categorization can be misleading at first
brush. The study of machine learning certainly arose from research in this context, but in
the data science application of machine learning methods, it is more helpful to think of
machine learning as a means of building models of data. Fundamentally, machine learning
involves building mathematical models to help understand data. "Learning" enters the fray
when we give these models tuneable parameters that can be adapted to observed data; in this
way the program can be considered to be "learning" from the data. Once these models have
been fit to previously seen data, they can be used to predict and understand aspects of newly
observed data. The more philosophical question of the extent to which this type of
mathematical, model-based "learning" is similar to the "learning" exhibited by the human
brain is left to the reader. Understanding the problem setting in machine learning is
essential to using these tools effectively.

Machine learning is the most rapidly growing technology, and according to researchers we
are in the golden years of AI and ML. It is used to solve many real-world complex problems
that cannot be solved with a traditional approach.
Following are some real-world applications of ML:

 Emotion analysis Sentiment analysis.


 Error detection and prevention.
 Weather forecasting and prediction.
 Stock market analysis and forecasting.
 Speech synthesis.
 Speech recognition.
 Customer segmentation.
 Object recognition.
 Fraud detection.
 Fraud prevention.
 Recommendation of products to customer in online shopping.

5.5 TECHNIQUE USED OR ALGORITHM USED

Random Forest Algorithm

A random forest is a supervised machine learning algorithm that is constructed from decision
tree algorithms. The algorithm is applied in various industries, such as banking and
e-commerce, to predict behaviour and outcomes. This section provides an overview of the
random forest algorithm and how it works, presents the algorithm's features and how it is
employed in real-life applications, and points out the advantages and disadvantages of the
algorithm.

Features of a Random Forest Algorithm


 It’s more accurate than the decision tree algorithm.
 It provides an effective way of handling missing data.
 It can produce a reasonable prediction without hyper-parameter tuning.
 It solves the issue of overfitting in decision trees.
 In every random forest tree, a subset of features is selected randomly at the node’s
splitting point.
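The workflow described above can be sketched with scikit-learn's `RandomForestClassifier`. A synthetic classification problem stands in for the crop data, and the hyper-parameter values are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in data: 8 features, echoing the 8 columns of the
# crop dataset described earlier in this report.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Each tree is trained on a bootstrap sample of the rows; max_features="sqrt"
# selects a random subset of features at every node's splitting point,
# as noted in the feature list above.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=42)
forest.fit(X_train, y_train)

acc = accuracy_score(y_test, forest.predict(X_test))
print(round(acc, 3))
```

Averaging the votes of many such de-correlated trees is what mitigates the overfitting of a single decision tree, which is the advantage the list above points out.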

How random forest algorithm works


Understanding decision trees

Decision trees are the building blocks of a random forest algorithm. A decision tree is
a decision support technique that forms a tree-like structure. An overview of decision trees
will help us understand how random forest algorithms work.

A decision tree consists of three components: decision nodes, leaf nodes, and a root
node. A decision tree algorithm divides a training dataset into branches, which further
segregate into other branches. This sequence continues until a leaf node is attained. The leaf
node cannot be segregated further.
The nodes in the decision tree represent attributes that are used for predicting the
outcome. Decision nodes provide a link to the leaves. The following diagram shows the three
types of nodes in a decision tree.
Fig 5.4: Decision Tree Diagram
