EE2211 Introduction to Machine Learning
Lecture 2
Semester 1, 2021/2022
Li Haizhou ([email protected])
Acknowledgement:
EE2211 development team
(Thomas, Kar-Ann, Chen Khong, Helen, Robby and Haizhou)
Module I Contents
• What is Machine Learning and Types of Learning
• How Supervised Learning works
• Regression and Classification Tasks
• Induction versus Deduction Reasoning
• Types of data
• Data wrangling and cleaning
• Data integrity and visualization
• Causality and Simpson’s paradox
• Random variable, Bayes’ rule
• Parameter estimation
• Parameters vs. Hyperparameters
Ask the right questions if you are to find the right answers
[Figure: two problem-solving pyramids, Top-down and Bottom-up, each built from Domain Knowledge, Information (examples, test cases), and Data (what data to be used to solve the problem?)]
• The data may not contain the answer.
• "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data." (John Tukey)
[Figure: data as figures, records, and facts accumulated over time]
Ordinal data
• Ordinal data are data that have a fixed, small number
(< 100) of possible values, called levels, that are
ordered (or ranked).
Categorical/Qualitative vs. Numerical/Quantitative data (https://i.stack.imgur.com/J8Ged.jpg):
• Categorical / Qualitative
  – Nominal: no inherent order (e.g., gender, religion)
  – Ordinal: can be ordered (e.g., small/medium/large)
• Numerical / Quantitative
  – Interval: no natural zero (e.g., temperature in Celsius)
  – Ratio: includes a natural zero (e.g., temperature in Kelvin)
• Only the ratio scale has a true zero point (Nominal: No, Ordinal: No, Interval: No, Ratio: Yes).
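As a concrete illustration (not from the slides), here is a minimal pandas sketch of an ordinal feature with a small number of ordered levels; the T-shirt size example is a made-up assumption:

```python
import pandas as pd

# Hypothetical ordinal feature: T-shirt size with a natural order of levels
sizes = pd.Series(["small", "large", "medium", "small"])

# Declare the variable as an ordered categorical so that comparisons and
# sorting respect small < medium < large
ordered_sizes = pd.Categorical(
    sizes, categories=["small", "medium", "large"], ordered=True
)

print(ordered_sizes)        # values together with their ordered categories
print(ordered_sizes.codes)  # integer codes respecting the order: [0 2 1 0]
```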
• Scaling to a range
  – When the bounds or range of each independent dimension of the data is known, a common normalization technique is min-max scaling.
• Feature clipping
  – Caps feature values above (or below) a chosen threshold at that threshold, limiting the influence of extreme outliers.
https://developers.google.com/machine-learning/data-prep/transform/normalization
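A minimal NumPy sketch of both ideas (illustrative only, not the course's reference code); the clipping threshold of 3 standard deviations and the toy matrix X are assumptions:

```python
import numpy as np

def min_max_scale(X):
    """Scale each column of X to the [0, 1] range (min-max normalization)."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

def clip_features(X, n_std=3.0):
    """Cap values lying more than n_std standard deviations from the column mean."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    return np.clip(X, mu - n_std * sigma, mu + n_std * sigma)

X = np.array([[1.0,   200.0],
              [2.0,   400.0],
              [3.0, 10000.0]])   # second column contains an extreme value
print(min_max_scale(clip_features(X)))
```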
https://archive.ics.uci.edu/ml/datasets/iris
r = 5, 4, 3, 2, 1
– For example, one can assign the value '1' for male and '2' for female in the case of a gender attribute, or '1' for spam and '0' for non-spam as in Lecture 1 (see the sketch after this list).
– Replace the missing value with a value outside the normal range of values.
  • For example, if the normal range is [0, 1], then you can set the missing value to 2 or −1. The idea is that the learning algorithm will learn what is best to do when the feature has a value significantly different from regular values.
  • Alternatively, you can replace the missing value by a value in the middle of the range. For example, if the range for a feature is [−1, 1], you can set the missing value to 0. Here, the idea is that a value in the middle of the range will not significantly affect the prediction.
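A minimal pandas sketch of these two preprocessing steps (illustrative only, not from the slides); the column names, the toy data frame, and the chosen fill values are assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical data frame with a categorical column and a missing numeric value
df = pd.DataFrame({
    "gender": ["male", "female", "female", "male"],
    "score":  [0.2, np.nan, 0.8, 0.5],   # feature with normal range [0, 1]
})

# Encode the categorical attribute with integer codes, as in the example above
df["gender_code"] = df["gender"].map({"male": 1, "female": 2})

# Option 1: replace the missing value with a value outside the normal range
df["score_outside"] = df["score"].fillna(2.0)

# Option 2: replace the missing value with a value in the middle of the range
df["score_middle"] = df["score"].fillna(0.5)

print(df)
```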
Data Visualization
• https://towardsdatascience.com/data-visualization-with-mathplotlib-using-python-a7bfb4628ee3
Figure: Anscombe's quartet. Anscombe.svg by Schutz, derivative work (labels using subscripts) by Avenue, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=9838454
• These boxplots look very similar, but if you overlay the actual data points you can see that the underlying distributions are very different.
• If we redraw this pie chart as a bar chart, it is much easier to see that A is bigger than D.
• Without a logarithmic scale, 90% of the data are crowded into the lower left-hand corner of the figure.
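As an illustration of these points (not from the slides), here is a minimal matplotlib sketch: a boxplot with the raw points overlaid, and the same skewed scatter on linear versus logarithmic axes; the synthetic data are made up for demonstration:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Two groups with similar quartiles but different shapes (bimodal vs. uniform)
a = np.concatenate([rng.normal(2, 0.3, 100), rng.normal(8, 0.3, 100)])
b = rng.uniform(1, 9, 200)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Boxplots alone look similar; overlaying the raw points reveals the difference
axes[0].boxplot([a, b])
axes[0].set_xticks([1, 2], labels=["A", "B"])
for i, data in enumerate([a, b], start=1):
    axes[0].scatter(np.full_like(data, i) + rng.uniform(-0.1, 0.1, data.size),
                    data, s=5, alpha=0.4)
axes[0].set_title("Boxplot with points overlaid")

# Heavily skewed data: linear axes crowd most points into one corner
x = rng.lognormal(mean=0, sigma=2, size=500)
y = rng.lognormal(mean=0, sigma=2, size=500)
axes[1].scatter(x, y, s=5)
axes[1].set_title("Linear scale")

axes[2].scatter(x, y, s=5)
axes[2].set_xscale("log")
axes[2].set_yscale("log")
axes[2].set_title("Log scale")

plt.tight_layout()
plt.show()
```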