MachineLearning PDF
www.qunaieer.com
Lecture Format
• Explanation of the basic terms and concepts
• Terminology and most slides are in English; the explanation is in Arabic
• Detailed coverage of the fundamentals
• Quick pointers to well-known algorithms
• Simple practical examples
• A suggested learning plan
What is Machine Learning?
• A field that enables the computer to learn on its own instead of being programmed in detail
• Capturing the essence of data by building models, and making decisions and future predictions based on them
Regression
• Regression analysis is a statistical process for estimating the relationships among variables
• Used to predict continuous outcomes
Regression Examples
Linear Regression
[Figure: price (y) vs. square meters (x) with a fitted line]
Line equation: y = b + ax, where b is the intercept and a is the slope
Example: with w0 = 50, w1 = 1.8, and x = 500, the prediction is ŷ = 50 + 1.8 × 500 = 950
Linear Regression
How to quantify the error?
[Figure: fitted line over price vs. square-meter data]
Linear Regression
Residual Sum of Squares (RSS)
RSS(w0, w1) = Σ_{i=1}^{N} (ŷ_i − y_i)²
where ŷ_i = w0 + w1 x_i
Cost function
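The RSS cost above is easy to compute directly; a minimal NumPy sketch (the data points are made up so they lie exactly on the example line y = 50 + 1.8x):

```python
import numpy as np

# Toy data (illustrative): house sizes in square meters and prices
x = np.array([100.0, 200.0, 300.0])
y = np.array([230.0, 410.0, 590.0])

def rss(w0, w1, x, y):
    """Residual sum of squares for the line y_hat = w0 + w1 * x."""
    y_hat = w0 + w1 * x
    return np.sum((y_hat - y) ** 2)

print(rss(50.0, 1.8, x, y))  # 0.0: this line fits the toy data exactly
print(rss(0.0, 1.8, x, y))   # a worse line gives a larger cost
```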
Linear Regression
How to choose the best model? Choose the w0 and w1 that minimize RSS
[Figure: RSS surface as a function of w0 and w1]
Optimization
w_{t+1} = w_t − η ∇RSS(w_t)
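The update rule can be sketched for the two-parameter line. This is an illustration with an arbitrary learning rate and toy data, not production code:

```python
import numpy as np

# Toy data lying exactly on y = 50 + 1.8x (illustrative)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 50.0 + 1.8 * x

w0, w1 = 0.0, 0.0   # initial weights
eta = 0.01          # learning rate (must be chosen by hand)

for _ in range(20000):
    y_hat = w0 + w1 * x
    # Gradients of RSS with respect to w0 and w1
    grad_w0 = 2.0 * np.sum(y_hat - y)
    grad_w1 = 2.0 * np.sum((y_hat - y) * x)
    # The update rule: w <- w - eta * gradient
    w0 -= eta * grad_w0
    w1 -= eta * grad_w1

print(round(w0, 2), round(w1, 2))  # approaches 50, 1.8
```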
Linear Regression: Multiple features
• Example: for house pricing, in addition to size in square meters, we
can use city, location, number of rooms, number of bathrooms, etc
• The model/hypothesis becomes a weighted sum over all features, ŷ = wᵀx
• Analytical (normal-equation) solution: w = (XᵀX)⁻¹ Xᵀ y
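The normal equation can be evaluated with NumPy; a sketch, assuming a design matrix X whose first column is ones for the intercept:

```python
import numpy as np

# Toy data on y = 50 + 1.8x (illustrative)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 50.0 + 1.8 * x

# Design matrix with a bias column of ones
X = np.column_stack([np.ones_like(x), x])

# Normal equation: w = (X^T X)^{-1} X^T y
# (solve() is preferred over explicitly inverting X^T X)
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)  # ≈ [50.0, 1.8]
```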
Analytical vs. Gradient Descent
• Gradient descent: must select parameter 𝜂𝜂
• Analytical solution: no parameter selection
Classification Examples
Logistic Regression
• How to turn a regression problem into a classification one?
• y = 0 or 1
• Map values to the [0, 1] range using
g(x) = 1 / (1 + e^(−x))
Sigmoid/Logistic Function
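A one-line implementation of the sigmoid:

```python
import math

def sigmoid(x):
    """Logistic function g(x) = 1 / (1 + e^(-x)); squashes any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))    # 0.5
print(sigmoid(10))   # close to 1
print(sigmoid(-10))  # close to 0
```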
Logistic Regression
• Model (sigmoid/logistic function)
h_w(x) = g(wᵀx) = 1 / (1 + e^(−wᵀx))
• Interpretation (probability)
h_w(x) = p(y = 1 | x; w)
if h_w(x) ≥ 0.5 ⇒ y = 1
if h_w(x) < 0.5 ⇒ y = 0
Logistic Regression
[Figure: decision boundary separating two classes in the (x1, x2) plane]
Logistic Regression
• Cost function
Line - Plane - Hyperplane
Demo
• Scikit-learn library’s logistic regression
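The demo code itself is not in the slides; a minimal version along these lines (synthetic, well-separated data, default settings) might look like:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Two well-separated synthetic clusters (illustrative data)
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) + [0, 0], rng.randn(50, 2) + [4, 4]])
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0, 0], [4, 4]]))  # one point from each cluster center
print(clf.predict_proba([[2, 2]]))    # class probabilities near the boundary
```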
Other Classification Algorithms
Neural Networks
Decision Trees
[Figure: a decision tree splitting on Credit? (excellent / fair / poor) and loan term (3 year / 5 year), with Risky / Safe leaf nodes]
K Nearest Neighbors (KNN)
[Figure: a query point and its 5 nearest neighbors in the (x1, x2) plane]
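A quick scikit-learn illustration of KNN (synthetic data; k = 5 as in the figure):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two small synthetic groups (illustrative)
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2), rng.randn(20, 2) + [3, 3]])
y = np.array([0] * 20 + [1] * 20)

# Each prediction is a majority vote among the 5 nearest training points
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.predict([[0, 0], [3, 3]]))
```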
Clustering
Clustering
• Unsupervised learning
• Group similar items into clusters
• K-Means algorithm
Clustering Examples
K-Means
[Figure sequence: K-Means iterations in the (x1, x2) plane; centroids are placed, points are assigned to the nearest centroid, centroids are recomputed, and the process repeats until assignments stabilize]
K-Means Algorithm
• Select the number of clusters K (number of centroids μ1, …, μK)
• Given a dataset of size N
• for each iteration t:
  • for i = 1 to N:
    • c_i := the cluster whose centroid has the smallest Euclidean distance to sample x_i
  • for k = 1 to K:
    • μ_k := mean of the points assigned to cluster k
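The loop above can be sketched directly in NumPy. This is a bare-bones illustration: it initializes centroids naively from the first K points, while a real implementation would use random or k-means++ initialization, convergence checks, and empty-cluster handling:

```python
import numpy as np

def kmeans(X, K, iters=10):
    """Plain K-Means: assign each point to its nearest centroid, then recompute means."""
    centroids = X[:K].copy()  # naive init: the first K points
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for each sample
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        for k in range(K):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    return labels, centroids

# Two obvious clusters (illustrative)
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]])
labels, centroids = kmeans(X, K=2)
print(labels)  # [0 0 0 1 1 1]
```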
Demo
• Scikit-learn library’s k-means
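As with the earlier demo, the actual code is not in the slides; a minimal scikit-learn version might be:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious clusters (illustrative data)
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])

# n_init restarts the algorithm from several initializations and keeps the best
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # same label within each cluster
print(km.cluster_centers_)
```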
Machine Learning
[Diagram: Machine Learning branches into Supervised and Unsupervised learning]
• Probabilistic models
• Ensemble methods
• Reinforcement Learning
• Recommendation algorithms (e.g., Matrix Factorization)
• Deep Learning
Linear vs Non-linear
[Figures: a non-linear decision boundary in the (x1, x2) plane, and a non-linear curve of price vs. square meters]
Multi-layer Neural Networks
Support Vector Machines (kernel trick)
Kernel Trick: K(x1, x2) = [x1, x2, x1² + x2²]
• k-fold cross-validation
• If dataset is very small
• Leave-one-out
• Fine-tuning hyper-parameters
• Automated hyper-parameter selection
• Using validation set
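In scikit-learn, k-fold cross-validation is a single call; a small sketch on the built-in Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: train on 4 folds, validate on the held-out fold
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores)        # one accuracy score per fold
print(scores.mean())
```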
Performance Measures
• Depending on the problem
• Some of the well-known measures are:
• Classification measures
• Accuracy
• Confusion matrix and related measures
• Regression
• Mean Squared Error
• R2 metric
• Measuring clustering performance is not straightforward, and will not be discussed here
Performance Measures: Accuracy
Accuracy = correct predictions / all predictions
• If we have 100 people and one of them has cancer, what is the accuracy if we classify all of them as not having cancer? (99/100 = 99%)
• Accuracy is not a good measure for a heavily imbalanced class distribution
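The cancer example can be checked in a few lines:

```python
# The 100-person example: 99 healthy (0), 1 with cancer (1)
y_true = [0] * 99 + [1]
y_pred = [0] * 100  # classify everyone as healthy

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(accuracy)  # 0.99: looks excellent, yet the one cancer case is missed
```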
Performance Measures: Confusion matrix
[Table: confusion matrix of predicted class vs. actual class (Positive / Negative)]
Underfitting (High Bias)
[Figures: overly simple models, a straight-line fit on price (×1000) vs. square meters, and a linear boundary on (x1, x2) data]
Training vs. Testing Errors
• Accuracy on the training set is not representative of model performance
• We need to calculate the accuracy on the test set (new, unseen examples)
• The goal is a model that generalizes to unseen data
Bias and variance trade-off
[Figures: learning curves of training and validation error vs. number of training samples, for a high-bias model and a high-variance model; the optimum is low training and validation error]
Regularization
• To prevent overfitting
• Decrease the complexity of the model
• Example of regularized regression model (Ridge
Regression)
RSS(w0, w1) = Σ_{i=1}^{N} (ŷ_i − y_i)² + λ Σ_{j=1}^{k} w_j², k = number of weights
• λ is a very important hyper-parameter
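Scikit-learn's Ridge minimizes this penalized cost (its alpha parameter plays the role of λ); a brief sketch on noisy toy data:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Noisy line y ≈ 50 + 1.8x (illustrative data)
rng = np.random.RandomState(0)
x = np.linspace(0, 10, 30)
y = 50.0 + 1.8 * x + rng.randn(30)

X = x.reshape(-1, 1)
for alpha in [0.0, 1.0, 100.0]:  # alpha is λ; larger values shrink the weights
    model = Ridge(alpha=alpha).fit(X, y)
    print(alpha, model.coef_[0], model.intercept_)
```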
Debugging a Learning Algorithm
• From “Machine Learning” course on coursera.org, by Andrew Ng