Unit - 1, Notes
1. Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used
to identify objects, persons, places, digital images, etc. A popular use case of image
recognition and face detection is automatic friend tagging suggestion:
Facebook provides a feature of automatic friend tagging suggestions. Whenever we upload
a photo with our Facebook friends, we automatically get a tagging suggestion with names,
and the technology behind this is machine learning's face detection and recognition
algorithm.
It is based on the Facebook project named "DeepFace," which is responsible for face
recognition and person identification in pictures.
2. Speech Recognition
While using Google, we get an option of "Search by voice"; this comes under speech
recognition and is a popular application of machine learning.
Speech recognition is the process of converting voice instructions into text; it is also
known as "speech to text" or "computer speech recognition." At present, machine
learning algorithms are widely used in speech recognition applications. Google
Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow voice
instructions.
3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the
correct path with the shortest route and predicts the traffic conditions.
It predicts traffic conditions, such as whether traffic is clear, slow-moving, or heavily
congested, in two ways:
o the real-time location of vehicles from the Google Maps app and sensors;
o the average time taken on past days at the same time of day.
Everyone who uses Google Maps helps make the app better: it takes information from
users and sends it back to its database to improve performance.
4. Product recommendations:
Machine learning is widely used by various e-commerce and entertainment companies,
such as Amazon and Netflix, for recommending products to users. Whenever we search
for a product on Amazon, we start seeing advertisements for the same product while
surfing the internet in the same browser, and this is because of machine learning.
Google understands user interest using various machine learning algorithms and
suggests products matching the customer's interest.
Similarly, when we use Netflix, we get recommendations for series, movies, etc., and this
is also done with the help of machine learning.
5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars. Machine
learning plays a significant role in self-driving cars. Tesla, a well-known car
manufacturer, is working on self-driving cars, using machine learning to train its car
models to detect people and objects while driving.
6. Email Spam and Malware Filtering:
Whenever we receive a new email, it is filtered automatically as important, normal, or
spam. We always receive important mail in our inbox, marked with the important symbol,
and spam emails in our spam box, and the technology behind this is machine learning.
Below are some spam filters used by Gmail:
o Content filter
o Header filter
o General blacklist filter
o Rules-based filter
o Permission filter
Machine learning algorithms such as the Multi-Layer Perceptron, Decision Tree,
and Naïve Bayes classifier are used for email spam filtering and malware detection, as
in the sketch below.
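As an illustration of the Naïve Bayes approach, here is a minimal sketch using
scikit-learn; the four-email corpus and its labels are invented for the example.

```python
# A minimal Naive Bayes spam-filter sketch (toy corpus, invented labels).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "win a free prize now", "cheap loans click here",       # spam
    "meeting agenda for monday", "project report attached", # normal
]
labels = ["spam", "spam", "normal", "normal"]

# Bag-of-words counts feed a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)
print(model.predict(["free prize inside"]))  # expected: ['spam']
```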
7. Virtual Personal Assistant:
We have various virtual personal assistants, such as Google Assistant, Alexa, Cortana,
and Siri. As the name suggests, they help us find information using voice instructions.
These assistants can help us in various ways just through our voice instructions, such
as playing music, calling someone, opening an email, scheduling an appointment, etc.
Machine learning algorithms are an important part of these virtual assistants. They
record our voice instructions, send them to a server on the cloud, decode them using
ML algorithms, and act accordingly.
8. Online Fraud Detection:
Machine learning is making our online transactions safe and secure by detecting
fraudulent transactions. Whenever we perform an online transaction, there are various
ways a fraudulent transaction can take place, such as fake accounts, fake IDs, and
stealing money in the middle of a transaction. To detect this, a feed-forward neural
network helps by checking whether a transaction is genuine or fraudulent.
For each genuine transaction, the output is converted into hash values, and these
values become the input for the next round. Each genuine transaction follows a specific
pattern, and that pattern changes for a fraudulent transaction; the network detects the
change and makes our online transactions more secure.
9. Stock Market trading:
Machine learning is widely used in stock market trading. In the stock market, there is
always a risk of ups and downs in share prices, so machine learning's long short-term
memory (LSTM) neural network is used for predicting stock market trends.
10. Medical Diagnosis:
In medical science, machine learning is used for disease diagnosis. With it, medical
technology is growing very fast and is able to build 3D models that can predict the exact
position of lesions in the brain.
It helps in finding brain tumors and other brain-related diseases easily.
11. Automatic Language Translation:
Nowadays, if we visit a new place and are not aware of the language, it is not a problem
at all; machine learning helps us here too by converting text into languages we know.
Google's GNMT (Google Neural Machine Translation) provides this feature: a neural
machine translation system that translates text into a familiar language, which is called
automatic translation.
The technology behind automatic translation is a sequence-to-sequence learning
algorithm, which is used together with image recognition to translate text from one
language to another.
2. Product Recommendations
Product recommendation is one of the most popular and well-known applications of
machine learning. It is a standout feature of almost every e-commerce website today and
an advanced application of machine learning techniques.
Using machine learning and AI, websites track your behaviour based on your previous
purchases, search patterns, and cart history, and then make product recommendations.
3. Image Recognition
Image recognition, an approach for cataloging and detecting a feature or an object in a
digital image, is one of the most significant and notable machine learning and AI
techniques. It is adopted for further analysis, such as pattern recognition, face
detection, and face recognition.
4. Sentiment Analysis
Sentiment analysis is one of the essential applications of machine learning. It is a
real-time machine learning application that determines the emotion or opinion of a
speaker or writer. For instance, if someone has written a review or an email (or any
form of document), a sentiment analyzer will instantly find the actual thought and tone
of the text. Sentiment analysis can be applied to review-based websites,
decision-making applications, etc.
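A common way to build a simple sentiment analyzer is to train a classifier on labelled
example texts. The following is a minimal sketch using scikit-learn; the four reviews
and their labels are invented for illustration.

```python
# A minimal sentiment-analysis sketch: TF-IDF features plus logistic
# regression, trained on a toy, made-up set of labelled reviews.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = ["loved this product", "absolutely wonderful",
           "terrible quality", "waste of money"]
sentiment = ["positive", "positive", "negative", "negative"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(reviews, sentiment)
print(clf.predict(["wonderful, loved it"]))  # expected: ['positive']
```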
A. Supervised learning:
Supervised learning is a type of machine learning in which the algorithm learns from a
labelled dataset. Example: Consider the following data regarding patients entering a
clinic. The data consists of the gender and age of the patients, together with a label
indicating whether each patient is sick or healthy.
gender age label
M 48 sick
M 67 sick
F 53 healthy
M 49 sick
F 32 healthy
M 34 healthy
M 21 healthy
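Below is a minimal sketch of training a classifier on the labelled table above with
scikit-learn; encoding gender as 0/1 is an assumption of this illustration, not part
of the notes.

```python
# Supervised learning on the labelled patient table above.
from sklearn.tree import DecisionTreeClassifier

# columns: gender (M=0, F=1), age
X = [[0, 48], [0, 67], [1, 53], [0, 49], [1, 32], [0, 34], [0, 21]]
y = ["sick", "sick", "healthy", "sick", "healthy", "healthy", "healthy"]

clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[0, 50]]))  # label predicted for a 50-year-old male
```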
B. Unsupervised learning:
Unsupervised learning is a type of machine learning algorithm used to draw inferences
from datasets consisting of input data without labeled responses. In unsupervised
learning algorithms, classification or categorization is not included in the observations.
Example: Consider the following data regarding patients entering a clinic. The data
consists of the gender and age of the patients.
gender age
M 48
M 67
F 53
M 49
F 34
M 21
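With no labels available, a clustering algorithm can still group these records. Here is
a minimal k-means sketch on the unlabelled table above; the 0/1 gender encoding and
the choice of two clusters are assumptions of the illustration.

```python
# Unsupervised learning on the unlabelled patient data above.
from sklearn.cluster import KMeans

X = [[0, 48], [0, 67], [1, 53], [0, 49], [1, 34], [0, 21]]  # gender, age
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # cluster index assigned to each patient
```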
C. Reinforcement learning:
Reinforcement learning is the problem of getting an agent to act in the world so as to
maximize its rewards.
A learner is not told what actions to take, as in most forms of machine learning, but
instead must discover which actions yield the most reward by trying them. For example,
consider teaching a dog a new trick: we cannot tell it what to do or what not to do,
but we can reward or punish it when it does the right or wrong thing.
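To make the reward idea concrete, here is a toy Q-learning sketch: an agent on a
five-cell line learns to walk right toward a reward in the last cell. The environment,
rewards, and constants are all invented for the illustration.

```python
# Toy Q-learning: learn to reach the goal cell on a 1-D line.
import random

n_states, actions = 5, [-1, +1]          # move left / move right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, eps = 0.5, 0.9, 0.2        # learning rate, discount, exploration

for _ in range(500):                     # training episodes
    s = 0
    while s != n_states - 1:
        # epsilon-greedy action choice
        a = random.choice(actions) if random.random() < eps \
            else max(actions, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s2 == n_states - 1 else 0.0   # reward only at the goal
        # standard Q-learning update
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])
        s = s2

print([max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)])
# expected: [1, 1, 1, 1] -- the learned policy moves right everywhere
```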
D. Semi-supervised learning:
Here an incomplete training signal is given: a training set with some (often many) of the
target outputs missing. A special case of this principle is known as transduction, where
the entire set of problem instances is known at learning time, but part of the targets
are missing. Semi-supervised learning is an approach to machine learning that combines
a small amount of labelled data with a large amount of unlabelled data during training.
Semi-supervised learning falls between unsupervised learning and supervised learning.
These ML algorithms help to solve different business problems, such as regression,
classification, forecasting, clustering, and association.
Based on the methods and way of learning, machine learning is mainly divided into four
types:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning
1. Supervised Machine Learning
As its name suggests, supervised machine learning is based on supervision. In the
supervised learning technique, we train the machines using a "labelled" dataset, and
based on the training, the machine predicts the output. Here, the labelled data specify
that some of the inputs are already mapped to outputs. More precisely, we first train
the machine with the input and corresponding output, and then we ask the machine to
predict the output on a test dataset.
Let's understand supervised learning with an example. Suppose we have an input dataset
of cat and dog images. First, we provide training to the machine to understand the
images: the shape and size of the tails of cats and dogs, the shape of the eyes, colour,
height (dogs are taller, cats are smaller), etc. After completion of training, we input
a picture of a cat and ask the machine to identify the object and predict the output.
Now the machine is well trained, so it will check all the features of the object, such
as height, shape, colour, eyes, ears, tail, etc., and find that it's a cat. So it will
put it in the Cat category. This is how the machine identifies objects in supervised
learning.
The main goal of the supervised learning technique is to map the input variable(x)
with the output variable(y). Some real-world applications of supervised learning
are Risk Assessment, Fraud Detection, Spam filtering, etc.
Categories of Supervised Machine Learning
Supervised machine learning can be classified into two types of problems, which are given
below:
o Classification
o Regression
a) Classification
Classification algorithms are used to solve classification problems, in which the output
variable is categorical, such as "Yes" or "No", "Male" or "Female", "Red" or "Blue",
etc. Classification algorithms predict the categories present in the dataset. Some
real-world examples of classification are spam detection, email filtering, etc.
Some popular classification algorithms are given below:
o Random Forest Algorithm
o Decision Tree Algorithm
o Logistic Regression Algorithm
o Support Vector Machine Algorithm
b) Regression
Regression algorithms are used to solve regression problems, in which the output
variable is continuous and depends on the input variables. They are used to predict
continuous output variables, such as market trends, weather conditions, etc.
Some popular Regression algorithms are given below:
o Simple Linear Regression Algorithm
o Multivariate Regression Algorithm
o Decision Tree Algorithm
o Lasso Regression
Advantages and Disadvantages of Supervised Learning
Advantages:
o Since supervised learning works with labelled datasets, we can have an exact idea
about the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior
experience.
Disadvantages:
o These algorithms are not able to solve complex tasks.
o It may predict the wrong output if the test data is different from the training data.
o It requires lots of computational time to train the algorithm.
Applications of Supervised Learning
Some common applications of Supervised Learning are given below:
o Image Segmentation:
Supervised Learning algorithms are used in image segmentation. In this process,
image classification is performed on different image data with pre-defined labels.
o Medical Diagnosis:
Supervised algorithms are also used in the medical field for diagnosis purposes. This
is done by using medical images and past data labelled with disease conditions. With
such a process, the machine can identify a disease for new patients.
o Fraud Detection - Supervised Learning classification algorithms are used for
identifying fraud transactions, fraud customers, etc. It is done by using historic data
to identify the patterns that can lead to possible fraud.
o Spam detection - In spam detection & filtering, classification algorithms are used.
These algorithms classify an email as spam or not spam. The spam emails are sent
to the spam folder.
o Speech Recognition - Supervised learning algorithms are also used in speech
recognition. The algorithm is trained with voice data, and various identifications can
be done using the same, such as voice-activated passwords, voice commands, etc.
2. Unsupervised Machine Learning
Unsupervised learning is different from the Supervised learning technique; as its name
suggests, there is no need for supervision. It means, in unsupervised machine learning,
the machine is trained using the unlabeled dataset, and the machine predicts the output
without any supervision.
In unsupervised learning, the models are trained with the data that is neither classified
nor labelled, and the model acts on that data without any supervision.
The main aim of the unsupervised learning algorithm is to group or categorize the
unsorted dataset according to similarities, patterns, and differences. Machines are
instructed to find the hidden patterns in the input dataset.
Let's take an example to understand it more precisely: suppose there is a basket of
fruit images, and we input it into the machine learning model. The images are totally
unknown to the model, and the task of the machine is to find the patterns and categories
of the objects.
So the machine will discover its own patterns and differences, such as colour difference
and shape difference, and predict the output when it is tested with the test dataset.
Categories of Unsupervised Machine Learning
Unsupervised Learning can be further classified into two types, which are given below:
o Clustering
o Association
1) Clustering
The clustering technique is used when we want to find the inherent groups from the data.
It is a way to group the objects into a cluster such that the objects with the most
similarities remain in one group and have fewer or no similarities with the objects of other
groups. An example of the clustering algorithm is grouping the customers by their
purchasing behaviour.
Some of the popular clustering algorithms are given below:
o K-Means Clustering algorithm
o Mean-shift algorithm
o DBSCAN Algorithm
o Principal Component Analysis (strictly a dimensionality-reduction technique, often
used alongside clustering)
o Independent Component Analysis (likewise a signal-separation technique rather than a
clustering algorithm)
2) Association
Association rule learning is an unsupervised learning technique that finds interesting
relations among variables within a large dataset. The main aim of this learning
algorithm is to find the dependency of one data item on another data item and to map
the variables accordingly so that maximum profit can be generated. This algorithm is
mainly applied in market basket analysis, web usage mining, continuous production, etc.
Some popular association rule learning algorithms are the Apriori algorithm, Eclat,
and the FP-growth algorithm.
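At the heart of these algorithms is counting the support of itemsets. Here is a
minimal, pure-Python sketch of that counting step; the basket data is made up for
illustration, and real Apriori additionally prunes the candidate itemsets.

```python
# Support counting behind association-rule mining (toy baskets).
from itertools import combinations
from collections import Counter

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk", "butter"}]

counts = Counter()
for basket in baskets:
    for size in (1, 2):                      # itemsets of size 1 and 2
        for itemset in combinations(sorted(basket), size):
            counts[itemset] += 1

min_support = 2  # keep itemsets appearing in at least 2 baskets
frequent = {s: c for s, c in counts.items() if c >= min_support}
print(frequent)
```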
Prediction - Regression:
• Example: predict the price of a used car
• x: car attributes
• y: price
• y = g(x | θ), where g() is the model and θ are the parameters
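As a concrete instance of y = g(x | θ), here is a minimal linear-regression sketch; the
car attributes, prices, and feature choice are invented for the illustration.

```python
# y = g(x | theta) with g a linear model; theta = (intercept, coefficients).
from sklearn.linear_model import LinearRegression

# x: car attributes [age in years, mileage in thousands of km]
X = [[1, 15], [3, 45], [5, 80], [7, 120], [9, 160]]
y = [18000, 14000, 10500, 7500, 5000]   # price in dollars (made up)

g = LinearRegression().fit(X, y)
print(g.intercept_, g.coef_)             # the fitted parameters theta
print(g.predict([[4, 60]]))              # predicted price of a 4-year-old car
```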
Introduction
Shattering
Shattering is the ability of a model to classify a set of points perfectly. More
generally, the model can create a function that divides the points into two distinct
classes without overlap. It differs from simple classification because it considers all
possible combinations of labels on those points. Later in this section, we'll see this
concept in action while computing the VC dimension. In the context of shattering, we
simply define the VC dimension of a model as the size of the largest set of points that
the model can shatter.
The VC dimension of a classifier is defined by Vapnik and Chervonenkis to be the
cardinality (size) of the largest set of points that the classification algorithm can
shatter [1].
Shattering a set of points
A configuration of N points on the plane is just any placement of N points. In order to have
a VC dimension of at least N, a classifier must be able to shatter a single configuration
of N points. In order to shatter a configuration of points, the classifier must be able to, for
every possible assignment of positive and negative for the points, perfectly partition the
plane such that the positive points are separated from the negative points. For a
configuration of N points, there are 2^N possible assignments of positive or negative, so
the classifier must be able to properly separate the points in each of these.
As an example, the VC dimension of a linear classifier is at least 3, since it can
shatter a configuration of 3 non-collinear points: in each of the 2³ = 8 possible
assignments of positive and negative, the classifier is able to perfectly separate the
two classes.
Next, we show that the VC dimension of a linear classifier is lower than 4. Consider 4
points labelled in an XOR pattern, where opposite corners of a square share a label:
the classifier is unable to separate the positive and negative classes in this
assignment, and two lines would be necessary to separate the two classes. Strictly, we
need to prove that no configuration of 4 points can be shattered, but the same logic
applies to other configurations, so, for brevity's sake, this example is good enough.
Since we have now shown that the linear classifier’s VC dimension is at least 3, and lower
than 4, we can finally conclude that its VC dimension is exactly 3. Again, remember that
in order to have a VC dimension of N, the classifier must only shatter a single configuration
of N points — there will likely be many other configurations of N points that the classifier
cannot shatter.
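This argument can be checked mechanically. Below is a brute-force sketch that tries
every labelling of a point set with a linear SVM; the specific point coordinates and
the use of LinearSVC with a large C as a separability test are assumptions of the
illustration, not a formal proof.

```python
# Brute-force shattering check with a linear classifier.
from itertools import product
from sklearn.svm import LinearSVC

def shatters(points):
    for labels in product([0, 1], repeat=len(points)):
        if len(set(labels)) < 2:
            continue  # one-class labellings are trivially separable
        clf = LinearSVC(C=1e6, max_iter=100000).fit(points, labels)
        if clf.score(points, labels) < 1.0:
            return False  # some labelling cannot be separated
    return True

print(shatters([(0, 0), (1, 0), (0, 1)]))          # True: 3 points shattered
print(shatters([(0, 0), (1, 1), (1, 0), (0, 1)]))  # False: XOR labelling fails
```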
Applications of VC dimension
Now that you know what the VC dimension is, and how to find it, it is important to also
understand its practical implications. In most cases, the exact VC dimension of a
classifier is not so important. Rather, it is used to classify different types of
algorithms by their complexities; for example, the class of simple classifiers could include
basic shapes like lines, circles, or rectangles, whereas a class of complex classifiers could
include classifiers such as multilayer perceptrons, boosted trees, or other nonlinear
classifiers. The complexity of a classification algorithm, which is directly related to its VC
dimension, is related to the trade-off between bias and variance.
Consider the standard diagram of the effects of model complexity. Along the bottom,
each S_i represents a set of models that are similar in VC dimension, or complexity,
and the x-axis measures VC dimension as h. As complexity increases, you transition from
underfitting to overfitting; adding complexity is good up until a certain point, after
which you begin to overfit on the training data.
Another way of thinking about this is through bias and variance. A low complexity model
will have a high bias and low variance; while it has low expressive power leading to high
bias, it is also very simple, so it has very predictable performance leading to a low variance.
Conversely, a complex model will have a lower bias since it has more expressiveness, but
will have a higher variance as there are more parameters to tune based on the sample
training data. Generally, a model with a higher VC dimension will require more training
data to properly train, but will be able to identify more complex relationships in the data.
At some level of model complexity there will exist an ideal balance between bias and
variance, at which you are neither underfitting nor overfitting your data. In other
words, you should aim to choose a classifier with a level of complexity that is just
enough for your classification task; any less would lead to underfitting, and any more
would lead to overfitting.
Probably Approximately Correct (PAC):
• Cannot expect a learner to learn a concept exactly.
• Cannot always expect to learn a close approximation to the target concept
• Therefore, the only realistic expectation of a good learner is that with high
probability it will learn a close approximation to the target concept.
• In Probably Approximately Correct (PAC) learning, one requires that, given small
parameters ε and δ, with probability at least (1 − δ) a learner produces a hypothesis
with error at most ε.
• The reason we can hope for this is the consistent distribution assumption: the
training examples and future examples are drawn from the same fixed distribution.
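The standard sample-complexity bound for a consistent learner over a finite hypothesis
class H makes "probably approximately correct" quantitative; the sketch below simply
evaluates that known bound for example values of |H|, ε, and δ chosen for illustration.

```python
# PAC sample complexity for a consistent learner over finite H:
#   m >= (1/epsilon) * (ln|H| + ln(1/delta))
from math import log, ceil

H_size, epsilon, delta = 2**10, 0.05, 0.01   # example values
m = ceil((log(H_size) + log(1 / delta)) / epsilon)
print(m)  # examples sufficient for (epsilon, delta)-PAC learning
```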
PAC Learnability
A concept class C is PAC-learnable if there exists an algorithm that, for any ε, δ > 0
and any distribution over the examples, outputs with probability at least (1 − δ) a
hypothesis with error at most ε, using a number of training examples polynomial in 1/ε
and 1/δ.
Some examples of machine learning algorithms with low variance are Linear Regression,
Logistic Regression, and Linear Discriminant Analysis. Algorithms with high variance
include Decision Trees, Support Vector Machines, and K-Nearest Neighbours.
Ways to Reduce High Variance:
o Reduce the number of input features or parameters when the model is overfitted.
o Do not use an overly complex model.
o Increase the training data.
o Increase the regularization term (see the sketch after this list).
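Here is a minimal sketch of the regularization point: ridge regression shrinks the
coefficients of an over-parameterized polynomial fit, reducing variance. The data, the
degree-9 polynomial, and the alpha value are all invented for the illustration.

```python
# Regularization tames variance: ridge shrinks large coefficients.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 20)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 20)  # noisy samples

plain = make_pipeline(PolynomialFeatures(9), LinearRegression()).fit(X, y)
ridge = make_pipeline(PolynomialFeatures(9), Ridge(alpha=1e-2)).fit(X, y)

# The ridge model's coefficient norm is far smaller: lower variance.
print(np.linalg.norm(plain[-1].coef_), np.linalg.norm(ridge[-1].coef_))
```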
Different Combinations of Bias-Variance
There are four possible combinations of bias and variance:
1. Low-Bias, Low-Variance:
The combination of low bias and low variance shows an ideal machine learning
model. However, it is not possible practically.
2. Low-Bias, High-Variance: With low bias and high variance, model predictions are
inconsistent but accurate on average. This case occurs when the model learns with a
large number of parameters, which leads to overfitting.
3. High-Bias, Low-Variance: With high bias and low variance, predictions are consistent
but inaccurate on average. This case occurs when a model does not learn well from the
training dataset or uses very few parameters. It leads to underfitting problems in the
model.
4. High-Bias, High-Variance:
With high bias and high variance, predictions are inconsistent and also inaccurate
on average.
How to identify High Variance or High Bias?
High variance can be identified if the model has:
o a low training error and a high test error.
High bias can be identified if the model has:
o a high training error, with the test error almost the same as the training error.
Bias-Variance Trade-Off
For an accurate prediction, algorithms need low variance and low bias. But this is not
fully possible, because bias and variance are related to each other:
o If we decrease the variance, it will increase the bias.
o If we decrease the bias, it will increase the variance.
The bias-variance trade-off is a central issue in supervised learning. Ideally, we need
a model that accurately captures the regularities in the training data and
simultaneously generalizes well to unseen data. Unfortunately, doing both at once is
not possible: a high-variance algorithm may perform well on training data but may
overfit to noisy data, whereas a high-bias algorithm generates a much simpler model
that may not capture important regularities in the data. So we need to find a sweet
spot between bias and variance to make an optimal model.
Hence, the bias-variance trade-off is about finding the sweet spot that balances bias
and variance errors.
Techniques to reduce overfitting:
1. Increase training data.
2. Reduce model complexity.
3. Early stopping during the training phase (keep an eye on the loss over the training
period; as soon as the loss begins to increase, stop training; see the sketch after
this list).
4. Ridge regularization and Lasso regularization.
5. Use dropout for neural networks to tackle overfitting.
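As a minimal sketch of early stopping (item 3), scikit-learn's SGDClassifier can hold
out a validation fraction and stop once the validation score stops improving; the
synthetic dataset and the parameter values are illustrative assumptions.

```python
# Early stopping with a built-in validation split (synthetic data).
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=1000, random_state=0)
clf = SGDClassifier(early_stopping=True, validation_fraction=0.2,
                    n_iter_no_change=5, max_iter=1000, random_state=0)
clf.fit(X, y)
print(clf.n_iter_)  # epochs run before validation score stopped improving
```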
Good Fit in a Statistical Model: Ideally, the case when the model makes the predictions
with 0 error, is said to have a good fit on the data. This situation is achievable at a spot
between overfitting and underfitting. In order to understand it, we will have to look at the
performance of our model with the passage of time, while it is learning from the training
dataset.
With the passage of time, our model will keep on learning, and thus the error for the
model on the training and testing data will keep on decreasing. If it learns for too
long, the model will become more prone to overfitting due to the presence of noise and
less useful details. Hence the performance of our model will decrease. In order to get
a good fit, we will stop at a point just before the error starts increasing. At this
point, the model is said to perform well on the training dataset as well as on our
unseen testing dataset.
o Misclassification rate: Also termed the error rate, it defines how often the model
gives wrong predictions. It is calculated as the number of incorrect predictions
divided by the total number of predictions made by the classifier. With TP = true
positives, FP = false positives, FN = false negatives, and TN = true negatives, the
formula is:
Error rate = (FP + FN) / (TP + TN + FP + FN)
o Precision: Of all the instances the model predicted as positive, how many were
actually positive. It can be calculated using the formula:
Precision = TP / (TP + FP)
o Recall: Of all the actually positive instances, how many the model predicted
correctly. The recall should be as high as possible:
Recall = TP / (TP + FN)
o F-measure: If two models have low precision and high recall or vice versa, it is
difficult to compare them. For this purpose, we can use the F-score, which evaluates
recall and precision at the same time. The F-score is maximum when recall equals
precision:
F-measure = 2 * (Precision * Recall) / (Precision + Recall)
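The following sketch computes the four metrics above from a toy confusion matrix; the
TP, FP, FN, and TN counts are invented for the illustration.

```python
# Error rate, precision, recall, and F-measure from a toy confusion matrix.
TP, FP, FN, TN = 40, 10, 5, 45   # made-up counts

error_rate = (FP + FN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f_measure = 2 * precision * recall / (precision + recall)

print(error_rate, precision, recall, f_measure)
```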