Want to learn the basics of machine learning? Watch the reel below and follow us on our social media channels.
Want to learn the basics of machine learning? Watch the video below, and don't forget to follow us on social media to learn more.
Credits: Hasnaa OUADOUDI BELABZIOUI and Mohamed Taha AFIF
The k-nearest neighbors (KNN) algorithm is a non-parametric, supervised learning classifier, which uses proximity to make classifications or predictions about the grouping of an individual data point. It is one of the simplest and most popular classification and regression algorithms used in machine learning today.
While the KNN algorithm can be used for either regression or classification problems, it is typically used as a classification algorithm, working off the assumption that similar points can be found near one another.
For classification problems, a class label is assigned on the basis of a majority vote—i.e. the label that is most frequently represented around a given data point is used. While this is technically considered “plurality voting”, the term “majority vote” is more commonly used in the literature. The distinction between these terminologies is that “majority voting” technically requires a majority of greater than 50%, which primarily works when there are only two categories. When you have multiple classes (e.g. four categories), you don’t necessarily need 50% of the vote to make a conclusion about a class; you could assign a class label with a vote of greater than 25%.
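As a quick illustration, here is a minimal NumPy sketch of that plurality-vote rule; the toy data, the three classes, and the value of k are made up for the example and not taken from any particular source.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    """Classify x_new by the plurality vote of its k nearest training points."""
    # Euclidean distance from x_new to every training point
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest neighbors
    nearest = np.argsort(dists)[:k]
    # Plurality vote: the most frequent label among the neighbors wins,
    # even if it holds less than 50% of the votes
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

# Toy data: 2 features, 3 classes (values made up for illustration)
X_train = np.array([[1.0, 1.2], [0.9, 1.0], [5.0, 5.1], [5.2, 4.9], [9.0, 0.5], [8.8, 0.7]])
y_train = np.array([0, 0, 1, 1, 2, 2])
print(knn_predict(X_train, y_train, np.array([5.1, 5.0]), k=3))  # -> 1
```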
AI Lab Lead 🤖, Agentic Business Solutions, Podcast host & Speaker || Visionary, out of the box & holistic || AI, Data, Physics & Computational Neuroscience (PhD)
I am always telling my fellows that neighborhood search is the conceptual core of all #machinelearning. In essence, most if not all ML is about relationships and similarities between data points, and most advanced ML algorithms are just a proxy for #kNN due to limited compute power!
It is hard to think of ML that is not about search.
Think about it. Starting from #neighborhoodsearch you can construct so many applications. In ML we look for patterns. To find patterns we need to find similarities. What is close, what is far? That's why in the brain we have a dedicated region only for estimating distances, even in abstract spaces (#entorhinalcortex #mec #gridcells).
As an example, let's take #anomalydetection: what we want to do there is find significant differences from typical data. Are there data points in our training data close enough to the potential anomaly?
Or let's take #classification: To which cluster of points (=class) is the incoming data closest?
And what about #RAG: here we search for the closest neighbors in a document embedding space.
Take #transformers and #llm: there we are predicting the next token: which subsequent word has the highest probability? Which word is closest within the embedding space when using the attention-based correlation metrics that transformers implicitly use?
Take #clustering and #unsupervisedlearning: Here we define neighborhoods, regions with high density in the data, to be clusters.
At the core of so many algorithms you will find an argmin_j(distances_{i,j}) or some argsort_j(distances_{i,j}) for a fixed i. That is basically what k-nearest neighbors search gives you when it is too much effort to precompute the full distance matrix distances_{i,j}.
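To make this concrete, here is a small sketch of exactly that pattern, assuming a NumPy array of points and an invented query; only one row of distances_{i,j} (fixed i) is ever materialized instead of the full matrix.

```python
import numpy as np

def k_nearest(points, query, k=5):
    """Return the indices of the k points closest to the query."""
    # One row of the distance matrix: distances_{i,j} for a fixed i (the query)
    row = np.linalg.norm(points - query, axis=1)
    # argsort over j, truncated to the k smallest distances
    return np.argsort(row)[:k]

rng = np.random.default_rng(0)
points = rng.normal(size=(10_000, 8))   # made-up data: 10,000 points in 8 dimensions
print(k_nearest(points, points[0], k=5))
```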
The true challenges, and the core of understanding your problem, are to map your ML case to a neighborhood search, to have enough compute power to perform the search, and to use the right similarity metric for your search space.
That is, by the way, also why I find Euclidean distance matrix completion (#EDM) and #embedding super fascinating; both are closely related to #recommendationengines.
Take a swarm of agents, where each of them continuously tries to estimate its location in its space of operation (take our physical 3D space) based on sensory data. When it comes to "where am I within the crowd?", it may only be able to estimate its relative position with respect to its neighbors. So most entries in the relative distance matrix of one agent are empty. But from the distance information that its neighbors have, and their neighbors and so on, the overall distance matrix can be completed (in practice with techniques like locally linear embedding, #lle). This in turn allows every agent to understand where it is relative to the full swarm and what shape the swarm currently has. That is essential information for planning actions. If you transfer the same logic to landmarks instead of neighboring agents, it allows the agent to construct mental maps of space.
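As a rough sketch of that idea (not of any specific swarm system), here is what the last step could look like with scikit-learn's LocallyLinearEmbedding; the agent positions are made-up random points, and the embedding recovers a global 2-D layout purely from local neighborhood relations.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(42)
positions = rng.uniform(size=(200, 3))   # made-up 3-D positions of 200 agents

# Each agent is described only through its relation to its nearest neighbors;
# LLE stitches these local patches into one consistent global 2-D layout.
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
layout = lle.fit_transform(positions)
print(layout.shape)  # (200, 2)
```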
Machine Learning Writer | Author of Interpretable Machine Learning, Modeling Mindsets, and more | christophmolnar.com
kNN is often one of the first ML algorithms taught to beginners.
I get it, it's the most intuitive approach to learning from data.
But compared to other ML algorithms, kNN is such an outlier thanks to the lazy learning concept. I wonder if starting with kNN is a good choice.
Hello Connections,
Today is Day 5 of my learning challenge, and I'm diving into the foundational concepts of machine learning! Grasping these basics is essential for building a solid understanding and advancing in this field.
Topics Covered:
Supervised Learning: This type of learning involves training a model on a labeled dataset, which means the model learns from input-output pairs. The goal is to predict the output for new inputs. Common algorithms include Linear Regression, Logistic Regression, and Decision Trees. Real-world applications include spam detection, sentiment analysis, and predictive maintenance.
Unsupervised Learning: Unlike supervised learning, unsupervised learning deals with unlabeled data. The model tries to find hidden patterns and relationships in the data. Clustering and Association are common techniques, with applications in customer segmentation, anomaly detection, and market basket analysis.
Avoiding Overfitting: Overfitting occurs when a model learns the training data too well, capturing noise and outliers, which negatively impacts its performance on new data. Techniques to avoid overfitting include:
Regularization: Adding a penalty for more complex models to prevent them from fitting the noise in the training data.
Pruning (for Decision Trees): Removing parts of the tree that do not provide additional power to classify instances.
Early Stopping: Halting the training process before the model begins to overfit.
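A minimal scikit-learn sketch of these three ideas, with made-up hyperparameter values: alpha sets the regularization penalty, ccp_alpha prunes the fitted tree, and early_stopping halts training once the validation score stops improving.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)  # toy data

# Regularization: the alpha penalty discourages overly complex weight vectors
sgd_regularized = SGDClassifier(alpha=0.01).fit(X, y)

# Pruning: cost-complexity pruning removes branches that add little classification power
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.01).fit(X, y)

# Early stopping: hold out a validation fraction and stop once it no longer improves
early_stopped = SGDClassifier(early_stopping=True, validation_fraction=0.1,
                              n_iter_no_change=5).fit(X, y)
```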
This Machine Learning handout is very useful if one needs to learn from scratch.
👉👉👉👉 Link: https://lnkd.in/grSWJP_A #MachineLearning #supervisedlearning #unsupervisedlearning #mlbasics #regression
🌲 Unlocking the Power of Random Forest in Machine Learning 🌲
Yesterday we talked about decision trees in general; now let's talk about how to apply that. First, let's dive deep into the essence of ensemble learning with Random Forest—a method that harmonizes the strength of multiple decision trees to form a more accurate and robust predictive model. Unlike a single tree, Random Forest builds an entire ecosystem of decision trees, each trained on random subsets of data and features, ensuring diversity in perspectives and a comprehensive understanding.
Key Insights:
Method: Constructs numerous decision trees at training time, leveraging the mode of the classes (for classification) or the mean prediction (for regression) from the individual trees for the final output.
Libraries: Dive into implementation with sklearn.ensemble.RandomForestClassifier and sklearn.ensemble.RandomForestRegressor (a short sketch follows the list below).
Advantages:
Accuracy: By averaging multiple trees, it reduces the risk of overfitting, leading to more reliable predictions.
Versatility: Excellently handles both classification and regression tasks, making it a versatile tool in your ML arsenal.
Feature Importance: Automatically evaluates the significance of different attributes, guiding towards more insightful data understanding.
Challenges:
Complexity: More complex to interpret than a single decision tree.
Performance: Requires considerable computational resources, especially as the number of trees increases.
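To ground this, here is a minimal sketch using the scikit-learn classifier mentioned above; the hyperparameters are made-up defaults and the iris toy dataset stands in for real data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees, each grown on a bootstrap sample with random feature subsets;
# the final class is the majority vote across all trees.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

print("accuracy:", model.score(X_test, y_test))
print("feature importances:", model.feature_importances_)
```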
Random Forest stands as a testament to the principle that 'together we are stronger.' It's not just about growing trees but cultivating a forest that withstands the tests of variability and bias.
Let's embark on this journey through the forest of data, where ensemble learning paves the way to unlocking profound insights and predictions. #MachineLearning #RandomForest #DataScience #EnsembleLearning
Unsupervised Machine Learning & K-Means Clustering: A Must-Know for Beginners
Did you know that over 60% of beginners in machine learning neglect to learn about unsupervised machine learning and K-means clustering? Don't be one of them!
Unsupervised Machine Learning:
A type of machine learning where the algorithm learns patterns and relationships in data without human supervision
No labeled data is required, making it ideal for exploratory data analysis
K-Means Clustering:
A popular unsupervised algorithm used for clustering similar data points into groups
Helps identify patterns, customer segments, and anomalies in data
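Here is a minimal scikit-learn sketch of K-means on made-up two-dimensional data; the choice of three clusters is an assumption for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Made-up data: three blobs of 2-D points
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in ([0, 0], [3, 3], [0, 4])])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)   # one centroid per discovered group
print(kmeans.labels_[:10])       # cluster assignment of the first 10 points
```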
Why Learn Unsupervised Machine Learning & K-Means Clustering?
Enhance your data analysis skills
Identify hidden patterns and relationships in data
Improve decision-making with data-driven insights
Take Advantage of these Resources!
Learn about unsupervised machine learning and K-means clustering to stay ahead in the field. Share this with your network to spread the knowledge!
#MachineLearning #DataScience #UnsupervisedLearning #KMeansClustering #LinkedInLearning
#Supervised_Learning:-
In supervised learning, the model is trained on a labeled dataset, meaning each input comes with an associated output (or label).
#Goal: Make predictions on new data.
#Two_types:-
#Classification:-
Predicts categories (discrete output).
To assign data into predefined categories or classes.
The model learns from examples where the correct category is known, then uses this knowledge to classify new data.
#Example: Classify emails as "spam" or "not spam."
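A minimal sketch of that spam example, assuming scikit-learn and a tiny made-up set of labeled emails (1 = spam, 0 = not spam):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up training emails and labels (1 = spam, 0 = not spam)
emails = ["win a free prize now", "meeting at 10am tomorrow",
          "claim your free reward", "project report attached"]
labels = [1, 0, 1, 0]

clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(emails, labels)
print(clf.predict(["free prize waiting for you"]))  # expected: [1] (spam)
```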
______________
#Regression:-
Predicts continuous values.
To predict a continuous numerical value based on input data.
The model learns the relationship between input features and a continuous output value, then uses this to predict outcomes for new data.
#Example: Predict house prices based on features.
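A minimal sketch of that house-price example with made-up sizes and prices; the model fits a linear relationship and predicts a price for an unseen size.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: house size in square meters -> price in thousands
sizes = np.array([[50], [80], [120], [160], [200]])
prices = np.array([150, 240, 360, 480, 600])

model = LinearRegression().fit(sizes, prices)
print(model.predict([[100]]))  # predicted price for a 100 m^2 house
```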
__________________________
#Unsupervised_Learning:-
In unsupervised learning, the model is given an unlabeled dataset and must discover hidden patterns or structures in the data without explicit guidance on the correct output.
Used for clustering, association, and dimensionality reduction.
#Example: Grouping customers by purchase behavior.
#Algorithms: K-means, hierarchical clustering, PCA, etc.
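A small sketch of two of the algorithms listed above on made-up customer data: K-means groups the customers into segments, and PCA reduces the five features to two components.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
customers = rng.normal(size=(100, 5))    # made-up data: 100 customers, 5 features

segments = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(customers)
reduced = PCA(n_components=2).fit_transform(customers)   # 5 features -> 2 components
print(segments[:10], reduced.shape)
```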
_____________________________
#Reinforcement_Learning:-
Definition: In reinforcement learning, an agent learns how to perform a task by interacting with an environment and receiving feedback in the form of rewards or penalties.
Receives rewards or penalties based on actions.
#Example: Training a robot to navigate a maze.
#Algorithms: Q-learning, deep Q-networks (DQN), policy gradients, etc.
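A minimal tabular Q-learning sketch on a made-up one-dimensional "corridor" standing in for the maze; the agent starts at cell 0 and receives a reward of 1 only when it reaches the last cell.

```python
import numpy as np

n_states, n_actions = 5, 2             # cells 0..4; actions: 0 = left, 1 = right
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # made-up learning rate, discount, exploration
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

for _ in range(300):                   # episodes
    s = 0
    while s != n_states - 1:           # the last cell is the goal
        if rng.random() < epsilon:     # explore
            a = int(rng.integers(n_actions))
        else:                          # exploit, breaking ties randomly
            a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: move Q(s, a) toward reward + discounted best future value
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q)   # the "go right" column should dominate after training
```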
#SupervisedLearning #MachineLearning #Classification #Regression #DataScience #DataAnalysis #PredictiveModeling
PhD Researcher | Machine Learning & AI
Excellent work! But it seems to me that you mixed up the optimization algorithm, e.g. gradient descent, with backpropagation, whose role is to compute the gradients by propagating the errors back from the last layer to the first layer?