𝗠𝗮𝗰𝗵𝗶𝗻𝗲 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗔𝗹𝗴𝗼𝗿𝗶𝘁𝗵𝗺𝘀 𝗧𝗶𝗺𝗲 𝗖𝗼𝗺𝗽𝗹𝗲𝘅𝗶𝘁𝘆

𝗟𝗶𝗻𝗲𝗮𝗿 𝗥𝗲𝗴𝗿𝗲𝘀𝘀𝗶𝗼𝗻
→ Training Time Complexity: O(n * p)
→ Prediction Time Complexity: O(p)
🟢 Linear regression scales well with large datasets, where n is the number of data points and p is the number of features.

𝗟𝗼𝗴𝗶𝘀𝘁𝗶𝗰 𝗥𝗲𝗴𝗿𝗲𝘀𝘀𝗶𝗼𝗻
→ Training Time Complexity: O(n * p * i)
→ Prediction Time Complexity: O(p)
🟢 Involves iterative updates (i = iterations), making it slower to train than linear regression but still efficient for binary classification.

𝗞-𝗡𝗲𝗮𝗿𝗲𝘀𝘁 𝗡𝗲𝗶𝗴𝗵𝗯𝗼𝗿𝘀 (𝗞-𝗡𝗡)
→ Training Time Complexity: O(1)
→ Prediction Time Complexity: O(n * p)
🟠 Training is essentially instant (the data is just stored), but prediction time grows with dataset size n, since distances to the stored points must be computed.

𝗦𝘂𝗽𝗽𝗼𝗿𝘁 𝗩𝗲𝗰𝘁𝗼𝗿 𝗠𝗮𝗰𝗵𝗶𝗻𝗲𝘀 (𝗦𝗩𝗠)
→ Training Time Complexity: O(n^2 * p) (up to O(n^3) with kernels)
→ Prediction Time Complexity: O(s * p) (s = number of support vectors)
🔴 Computationally heavy to train, especially with kernels, but effective in high-dimensional spaces.

𝗗𝗲𝗰𝗶𝘀𝗶𝗼𝗻 𝗧𝗿𝗲𝗲𝘀
→ Training Time Complexity: O(n * p * log(n))
→ Prediction Time Complexity: O(log(n)) (the tree depth, for a reasonably balanced tree)
🟢 Efficient for both training and prediction, and well-suited for non-linear data.

𝗥𝗮𝗻𝗱𝗼𝗺 𝗙𝗼𝗿𝗲𝘀𝘁
→ Training Time Complexity: O(k * n * p * log(n)) (k = number of trees)
→ Prediction Time Complexity: O(k * log(n))
🟠 Generalizes better than a single decision tree by reducing overfitting, but both training and prediction cost grow with the number of trees.

𝗡𝗮𝗶𝘃𝗲 𝗕𝗮𝘆𝗲𝘀
→ Training Time Complexity: O(n * p)
→ Prediction Time Complexity: O(p)
🟢 Simple and very fast, both in training and prediction, making it well-suited to high-dimensional datasets.

𝗡𝗲𝘂𝗿𝗮𝗹 𝗡𝗲𝘁𝘄𝗼𝗿𝗸𝘀
→ Training Time Complexity: O(i * n * p) (i = iterations; here p is best read as the number of weights in the network)
→ Prediction Time Complexity: O(p)
🔴 Highly dependent on the architecture and number of layers, with significant training times.

→ Note: These complexities are a general guideline; actual performance varies with the data, the implementation, and the hardware.
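These bounds are easiest to internalize empirically. Below is a rough, illustrative sketch (assuming scikit-learn and NumPy are available; exact timings depend on hardware, and scikit-learn's k-NN uses tree-based search rather than a brute-force scan) that times prediction for linear regression against k-NN as the training-set size n grows:

```python
import time

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
p = 20                                   # number of features
X_test = rng.normal(size=(1_000, p))     # fixed query set

for n in (1_000, 10_000, 100_000):       # growing training-set sizes
    X, y = rng.normal(size=(n, p)), rng.normal(size=n)

    linear = LinearRegression().fit(X, y)               # training cost grows with n and p
    knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)  # "training" mostly just stores the data

    t0 = time.perf_counter()
    linear.predict(X_test)
    t_linear = time.perf_counter() - t0

    t0 = time.perf_counter()
    knn.predict(X_test)
    t_knn = time.perf_counter() - t0

    # Linear regression prediction is O(p) per sample, so it barely moves with n.
    # k-NN prediction must search the n stored points, so it grows with n
    # (sub-linearly here, since scikit-learn uses KD-/ball-trees rather than brute force).
    print(f"n={n:>7}  linear={t_linear:.4f}s  knn={t_knn:.4f}s")
```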
More Relevant Posts
-
🔍 𝐒𝐞𝐥𝐞𝐜𝐭𝐢𝐧𝐠 𝐭𝐡𝐞 𝐎𝐩𝐭𝐢𝐦𝐚𝐥 𝐑𝐞𝐭𝐫𝐢𝐞𝐯𝐚𝐥-𝐀𝐮𝐠𝐦𝐞𝐧𝐭𝐞𝐝 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐨𝐧 (𝐑𝐀𝐆) 𝐀𝐩𝐩𝐫𝐨𝐚𝐜𝐡: 𝐆𝐫𝐚𝐩𝐡 𝐑𝐀𝐆 𝐯𝐬. 𝐕𝐞𝐜𝐭𝐨𝐫 𝐑𝐀𝐆

In Retrieval-Augmented Generation (RAG), choosing between Graph RAG and Vector RAG depends on your specific application needs. Here's a guide to help you make the best decision:

📊 𝐃𝐚𝐭𝐚 𝐒𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞:
📝 Unstructured Text Predominance: If your data mainly consists of unstructured text, Vector RAG provides a robust foundation.
🔗 Highly Interconnected Data: When working with data that has well-defined relationships and interconnected entities, Graph RAG is advantageous.

🧩 𝐐𝐮𝐞𝐫𝐲 𝐂𝐨𝐦𝐩𝐥𝐞𝐱𝐢𝐭𝐲:
🔍 Simple Similarity-Based Queries: Vector RAG efficiently handles straightforward queries based on semantic similarity.
🧠 Multi-Hop Reasoning Demands: For complex queries requiring multi-step reasoning across connected entities, Graph RAG excels.

⚙️ 𝐈𝐦𝐩𝐥𝐞𝐦𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧 𝐂𝐨𝐧𝐬𝐢𝐝𝐞𝐫𝐚𝐭𝐢𝐨𝐧𝐬:
🚀 Rapid Deployment: Vector RAG is generally easier to implement, making it ideal for projects that need a quicker start.
🧬 Complex Knowledge Domains: In intricate knowledge domains, investing in a Graph RAG implementation can offer significant long-term benefits.

📈 𝐒𝐜𝐚𝐥𝐚𝐛𝐢𝐥𝐢𝐭𝐲:
🌐 Large-Scale Unstructured Data: Vector RAG excels in scalability, especially with large volumes of unstructured data.
🕸️ Intricate Relationship Networks: Scaling Graph RAG can become more challenging as the complexity of relationships within the data increases.
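To make the contrast concrete, here is a minimal, illustrative sketch rather than a production RAG stack: cosine-similarity lookup over embedded chunks stands in for Vector RAG, and a multi-hop walk over a small networkx graph stands in for Graph RAG. The chunk texts, the random stand-in "embeddings", and the graph edges are all made up for the example.

```python
import numpy as np
import networkx as nx

# --- Vector RAG: retrieve chunks by embedding similarity ---------------------
def cosine_top_k(query_vec, chunk_vecs, k=2):
    """Return indices of the k chunks most similar to the query embedding."""
    sims = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    return np.argsort(-sims)[:k]

rng = np.random.default_rng(0)
chunks = ["Acme acquired Beta in 2021.", "Beta builds battery packs.", "Acme is based in Oslo."]
chunk_vecs = rng.normal(size=(len(chunks), 8))        # stand-in for a real embedding model
query_vec = chunk_vecs[0] + 0.1 * rng.normal(size=8)  # query embedded close to chunk 0
print([chunks[i] for i in cosine_top_k(query_vec, chunk_vecs)])

# --- Graph RAG: retrieve facts by walking relationships ----------------------
kg = nx.DiGraph()
kg.add_edge("Acme", "Beta", relation="acquired")
kg.add_edge("Beta", "battery packs", relation="produces")

def multi_hop(graph, start, hops=2):
    """Collect (subject, relation, object) triples within `hops` of a seed entity."""
    triples, frontier = [], {start}
    for _ in range(hops):
        nxt = set()
        for node in frontier:
            for _, obj, data in graph.out_edges(node, data=True):
                triples.append((node, data["relation"], obj))
                nxt.add(obj)
        frontier = nxt
    return triples

# A multi-hop question ("What does the company Acme acquired produce?") needs both edges.
print(multi_hop(kg, "Acme"))
```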
-
𝐟𝐫𝐨𝐦 𝐃𝐚𝐭𝐚 𝐀𝐧𝐚𝐥𝐲𝐭𝐢𝐜𝐬 𝐭𝐨 𝐏𝐫𝐞𝐬𝐜𝐫𝐢𝐩𝐭𝐢𝐯𝐞/𝐎𝐩𝐭𝐢𝐦𝐢𝐳𝐚𝐭𝐢𝐨𝐧 𝐌𝐨𝐝𝐞𝐥
------ by Ali Papi

From an optimization or prescriptive analytics perspective, various approaches exist for modeling data and its patterns in optimization models. They can be broadly categorized as follows:

🔶 𝟙. 𝕊𝕥𝕠𝕔𝕙𝕒𝕤𝕥𝕚𝕔 ℙ𝕣𝕠𝕘𝕣𝕒𝕞𝕞𝕚𝕟𝕘 (𝕊ℙ)
This classical approach to modeling under uncertainty involves extracting the governing data pattern as a probability distribution function (PDF). The objective function's mean performance is then optimized with respect to that PDF. SP models can be used with various risk-neutral and risk-averse metrics, such as CVaR.

🔶 𝟚. ℂ𝕙𝕒𝕟𝕔𝕖-ℂ𝕠𝕟𝕤𝕥𝕣𝕒𝕚𝕟𝕖𝕕 ℙ𝕣𝕠𝕘𝕣𝕒𝕞𝕞𝕚𝕟𝕘 (ℂℂℙ)
Similar to SP, CCP first requires extracting the PDF. However, in CCP a confidence level for constraint satisfaction is defined. This is particularly useful when uncertain parameters appear in the constraints rather than in the objective function.

🔶 𝟛. ℝ𝕠𝕓𝕦𝕤𝕥 𝕆𝕡𝕥𝕚𝕞𝕚𝕫𝕒𝕥𝕚𝕠𝕟 𝔹𝕒𝕤𝕖𝕕 𝕠𝕟 ℂ𝕠𝕟𝕧𝕖𝕩 𝕊𝕖𝕥𝕤
This paradigm of optimization under uncertainty uses the available data to extract an uncertainty set (typically convex). The objective function's optimality (or constraint satisfaction) is then guaranteed in the worst-case scenario within this uncertainty set. Machine learning techniques such as PCA and FCM can be employed to extract these uncertainty sets and their corresponding tractable robust counterparts.

🔶 𝟜. 𝔻𝕚𝕤𝕥𝕣𝕚𝕓𝕦𝕥𝕚𝕠𝕟𝕒𝕝𝕝𝕪 ℝ𝕠𝕓𝕦𝕤𝕥 𝕆𝕡𝕥𝕚𝕞𝕚𝕫𝕒𝕥𝕚𝕠𝕟 (𝔻ℝ𝕆)
This approach emerges from the marriage of classical SP and robust optimization. Instead of a single distribution, DRO uses the available data to extract a family of plausible, closely related PDFs. SP is then employed to optimize the mean performance under the worst-case distribution in that family.

Note: These data modeling and pattern recognition approaches offer powerful tools for optimization under uncertainty, enabling informed, data-driven decision-making for complex, stochastic real-world problems. The most suitable approach depends on the problem characteristics, data availability, and computational constraints.

------
#Optimization #OperationsResearch #DataAnalytics #MachineLearning #DecisionMaking
------
OptimYar ✅
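In symbols, the four paradigms above differ mainly in what they assume is known about the uncertain parameter ξ. A compact (and deliberately simplified) side-by-side, writing f for the objective, g for a constraint, α for the confidence level, U for the uncertainty set, and 𝒫 for the ambiguity set of distributions:

```latex
% Stochastic Programming: optimize expected performance under a known distribution P
\min_{x}\; \mathbb{E}_{\xi \sim P}\,[\, f(x,\xi) \,]

% Chance-Constrained Programming: constraints must hold with probability at least 1 - \alpha
\min_{x}\; c^{\top}x \quad \text{s.t.} \quad \mathbb{P}_{\xi \sim P}\!\left( g(x,\xi) \le 0 \right) \ge 1-\alpha

% Robust Optimization: guard against the worst case inside an uncertainty set U
\min_{x}\; \max_{\xi \in U}\; f(x,\xi)

% Distributionally Robust Optimization: worst expected value over an ambiguity set of distributions
\min_{x}\; \max_{P \in \mathcal{P}}\; \mathbb{E}_{\xi \sim P}\,[\, f(x,\xi) \,]
```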
-
I don't rely on Accuracy in multiclass classification settings to measure model improvement 🧩

Consider probabilistic multiclass classification models. Using "Accuracy" as the signal for model improvement can be deceptive: it can mislead you into thinking you are not making any progress.

In other words, it is possible that we are actually making good progress in improving the model...
...but “Accuracy” is not reflecting that (YET).

The problem arises because Accuracy only checks whether the prediction is correct or not. During iterative model building, the model might not yet predict the true label with the highest probability...
...but it might be quite confident in placing the true label among its top "k" output probabilities.

Thus, a "top-k accuracy score" can be a much better indicator of whether my model improvement efforts are translating into meaningful gains in predictive performance.

For instance, if top-3 accuracy increases from 75% to 90%, it is clear that the improvement technique was effective:
- Earlier, the correct label was in the top 3 predictions only 75% of the time.
- Now, the correct label is in the top 3 predictions 90% of the time.

This lets you direct the engineering effort in the right direction.

Of course, this should ONLY be used to assess model improvement efforts, because true predictive power is still determined by traditional Accuracy.

As depicted in the image below:
- "Top-k Accuracy" may continue to increase during model iterations, reflecting genuine improvement.
- Accuracy, however, may stay the same across successive improvements. Nonetheless, we can be confident that the model is getting better and better.

For a more visual explanation, check out this issue: https://lnkd.in/dP_h8SFM.

--
👉 Get a Free Data Science PDF (530+ pages) by subscribing to my daily newsletter today: https://lnkd.in/gzfJWHmu
--
👉 Over to you: What are some other ways to assess model improvement efforts?
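For readers who want to try the top-k idea from the post above: scikit-learn ships a top_k_accuracy_score metric that can be tracked alongside plain accuracy between model iterations. The toy labels and probabilities below are invented purely to show the mechanics.

```python
import numpy as np
from sklearn.metrics import accuracy_score, top_k_accuracy_score

# Toy 4-class problem: true labels plus predicted class probabilities from some model.
y_true = np.array([0, 1, 2, 3, 1, 2])
proba = np.array([
    [0.50, 0.30, 0.15, 0.05],   # correct class ranked 1st
    [0.40, 0.35, 0.15, 0.10],   # correct class ranked 2nd
    [0.10, 0.45, 0.40, 0.05],   # correct class ranked 2nd
    [0.30, 0.30, 0.25, 0.15],   # correct class ranked 4th
    [0.20, 0.50, 0.20, 0.10],   # correct class ranked 1st
    [0.25, 0.30, 0.35, 0.10],   # correct class ranked 1st
])

y_pred = proba.argmax(axis=1)
# Plain accuracy only counts rank-1 hits: 3/6 = 0.50 here.
print("accuracy      :", accuracy_score(y_true, y_pred))
# Top-3 accuracy counts the true label anywhere in the top 3: 5/6 ≈ 0.83 here.
print("top-3 accuracy:", top_k_accuracy_score(y_true, proba, k=3, labels=np.arange(4)))
```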
-
Uncertainty pervades every stage of the #ML pipeline, including:
- Data preprocessing
- Feature engineering
- Model training
- Model prediction
- Model deployment

In data-driven ML methods, uncertainty arises from:
- The intrinsic ambiguity in the data
- Variations in sampling
- Flawed models
- Errors in model approximations

Even in rule-based or symbolic ML systems, the complexity of the market introduces uncertainty, affecting any conclusions drawn. Addressing uncertainty is not a theoretical idea but a practical requirement, as #trading systems must generate profit despite working with imperfect information.

For now, the most effective ways I have found to use total uncertainty are:
- Modeling it to switch off the system
- Initiating a process of optimization
- Using it as a metric of performance quality
- Filtering and asset selection

To understand this, a quick reminder of some concepts:
- Aleatoric (statistical) uncertainty: What will a random sample drawn from a probability distribution be?
- Epistemic (systematic) uncertainty: What is the relevant probability distribution?
- Total uncertainty: The sum of aleatoric and epistemic uncertainty when the two are independent

The basic protocol is: PnL → Conformal prediction → Switch-off (or optimization)

Some considerations:
- Although robust and stochastic optimization methods are often used to handle optimization under Bayesian uncertainty, they don't predict epistemic uncertainty with sufficient accuracy
- By integrating models that account for epistemic uncertainty, we can improve the optimization process by considering the gaps in information or data
- The typical conformal prediction (CP) method has to be adapted to fit this phase of the system's development
- The rule is simple: if PnL < lower interval: switch off; else: continue

Aleatoric uncertainty arises from the inherent randomness or noise in the data:
- This type of uncertainty can't be reduced by collecting more data
- Conformal prediction helps quantify it by constructing prediction intervals that account for the variability in the data
- To handle it with conformal prediction, two key ingredients matter:
  • Nonconformity measure: captures the deviations of closed trades from the PnL benchmark
  • Prediction intervals: for this application we only care about the lower interval

👇👇👇 Continued in the comments 👇👇👇
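To make the PnL → conformal prediction → switch-off protocol concrete, here is a minimal sketch assuming a rolling history of realized per-trade PnL and a per-trade benchmark expectation. The nonconformity score, the 10% miscoverage level, and the split-conformal quantile adjustment are illustrative choices, not the author's exact method.

```python
import numpy as np

def conformal_lower_bound(calib_pnl, benchmark, alpha=0.10):
    """One-sided split-conformal bound: how far below the benchmark can per-trade
    PnL plausibly fall, based on past deviations (nonconformity scores)?"""
    scores = benchmark - np.asarray(calib_pnl)          # positive score = underperformance
    n = len(scores)
    # Finite-sample-adjusted quantile used in split conformal prediction.
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    return benchmark - q

rng = np.random.default_rng(1)
calibration_pnl = rng.normal(loc=5.0, scale=20.0, size=250)   # past per-trade PnL (toy data)
benchmark_pnl = 5.0                                           # expected PnL per trade

lower = conformal_lower_bound(calibration_pnl, benchmark_pnl, alpha=0.10)

recent_pnl = -40.0            # realized PnL of the latest closed trade
if recent_pnl < lower:
    print(f"PnL {recent_pnl:.1f} < lower bound {lower:.1f}: switch off / re-optimize")
else:
    print(f"PnL {recent_pnl:.1f} within expected range (lower bound {lower:.1f}): continue")
```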
-
We often use PCA for dimensionality reduction. PCA is a linear transformation technique that captures the most prominent linear variations in the data. However, it may not be suitable for complex datasets with non-linear relationships between data points.

t-SNE (t-Distributed Stochastic Neighbor Embedding) is a non-linear technique specifically designed for visualizing high-dimensional data. It excels at preserving the local structure of the data in a lower-dimensional space, allowing us to see complex, non-linear relationships between data points.

Here are some advantages of t-SNE:
- Preserves local structure: It better represents how close data points are in the original high-dimensional space.
- Reveals complex patterns: By preserving local structure, it can reveal hidden patterns and relationships in the data.
- Interpretable visualizations: The resulting lower-dimensional representation is often visually interpretable, aiding exploratory data analysis.

However, t-SNE also has some limitations:
- It is computationally expensive, especially for large datasets.
- Its objective is non-convex, and global distances and densities in the embedding are not reliably meaningful.
- The results can be sensitive to hyperparameter settings (e.g., perplexity, learning rate).
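A quick way to see the difference is to project the same dataset with both methods. A rough sketch using scikit-learn's digits dataset (hyperparameters such as perplexity are left at illustrative values):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)          # 64-dimensional handwritten-digit features

X_pca = PCA(n_components=2).fit_transform(X)                                    # linear projection
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)   # non-linear embedding

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(X_pca[:, 0], X_pca[:, 1], c=y, s=5, cmap="tab10")
axes[0].set_title("PCA")
axes[1].scatter(X_tsne[:, 0], X_tsne[:, 1], c=y, s=5, cmap="tab10")
axes[1].set_title("t-SNE")
plt.tight_layout()
plt.show()
# t-SNE typically separates the digit classes into tighter, clearer clusters,
# while PCA keeps only the dominant linear directions of variance.
```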
-
🚀 The Time Complexity of the "Champion Node" Problem 🎯

In today's post, let's dive into the time complexity of code that identifies the Champion Node in a directed graph. The champion node is the one with no incoming edges, and it's an exciting problem in graph theory! Let's break down how the time complexity is derived and why it matters for understanding the performance of your algorithms. 📊

🔍 Problem Recap
The task is to find a node in a directed graph that has no incoming edges (in-degree of 0). This problem is widely applicable wherever we need to identify "sources" or "winners" in directed networks. Think of it like finding the "champion" of a competition based on head-to-head results! 🏆

🧠 Approach Breakdown
1. Calculate In-Degree: For each edge (u, v) in the graph, increase the in-degree count of node v. This captures the number of incoming edges for every node.
2. Identify Candidates: Nodes with in-degree == 0 are potential champions (no one points to them).
3. Return the Champion: If there is exactly one node with in-degree == 0, we have found our champion! Otherwise, there is no unique champion.

📈 Time Complexity
- Initializing the in_degree array takes O(V), where V is the number of nodes (vertices).
- Processing edges: traversing all E edges, with a constant-time update per edge, takes O(E).
- Identifying candidates: scanning the in_degree array for nodes with no incoming edges takes O(V).

Thus, the total time complexity is O(E + V), where E is the number of edges and V is the number of vertices. This keeps the solution efficient even for large graphs! 🚀

📊 Here's a visualization of the time complexity analysis for better understanding (see the attached image).

🧑💻 Key Takeaways:
- Scalability: The algorithm scales linearly with the number of edges and nodes, so it performs well on both small graphs and large networks.
- Understanding complexity: Breaking down time complexity is crucial when working with real-world data to ensure performance in production systems. 💡

In Summary: Graph problems like finding a champion node appear in many real-world applications such as social networks, recommendation systems, and directed acyclic graphs (DAGs). Understanding the time complexity ensures we build optimized and scalable systems. 🌱

For a detailed solution to this problem, check out the full code and explanation here: https://lnkd.in/dYrtFZjb

Feel free to comment below if you have any questions or insights on time complexity in graph algorithms! ⬇️ Let's connect and learn together! 🔗

#GraphTheory #TimeComplexity #AlgorithmDesign #Coding #Programming #DataStructures #MachineLearning #SoftwareEngineering
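For reference, here is a compact sketch of the approach described in the post (illustrative only; the exact function signature in the linked solution may differ):

```python
def find_champion(n, edges):
    """Return the unique node with in-degree 0 in a directed graph of n nodes,
    or -1 if no such unique node exists.  Runs in O(V + E) time."""
    in_degree = [0] * n                      # O(V) initialization
    for _, v in edges:                       # O(E): count incoming edges
        in_degree[v] += 1

    candidates = [node for node, deg in enumerate(in_degree) if deg == 0]  # O(V) scan
    return candidates[0] if len(candidates) == 1 else -1


print(find_champion(4, [[0, 1], [0, 2], [1, 3]]))   # 0 is the only node nobody points to
print(find_champion(3, [[2, 1]]))                   # nodes 0 and 2 both have in-degree 0 -> -1
```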
-
🌟 𝐃𝐚𝐲 𝟏𝟏: 𝐌𝐚𝐜𝐡𝐢𝐧𝐞 𝐋𝐞𝐚𝐫𝐧𝐢𝐧𝐠 - 𝐒𝐮𝐩𝐩𝐨𝐫𝐭 𝐕𝐞𝐜𝐭𝐨𝐫 𝐌𝐚𝐜𝐡𝐢𝐧𝐞 (𝐒𝐕𝐌) 🌟
📚 𝐌𝐨𝐝𝐮𝐥𝐞 𝟎𝟒: 𝐏𝐚𝐫𝐭 - 𝟔 𝐒𝐮𝐩𝐞𝐫𝐯𝐢𝐬𝐞𝐝 𝐌𝐚𝐜𝐡𝐢𝐧𝐞 𝐋𝐞𝐚𝐫𝐧𝐢𝐧𝐠 - 𝐒𝐮𝐩𝐩𝐨𝐫𝐭 𝐕𝐞𝐜𝐭𝐨𝐫 𝐌𝐚𝐜𝐡𝐢𝐧𝐞

🧠 𝗦𝘂𝗽𝗽𝗼𝗿𝘁 𝗩𝗲𝗰𝘁𝗼𝗿 𝗠𝗮𝗰𝗵𝗶𝗻𝗲 (𝗦𝗩𝗠)
SVM is used for both classification and regression problems, but it is mainly used for classification. The core idea is to plot each data item as a point in n-dimensional space, with each feature representing a coordinate, and to find the optimal hyperplane that separates the data into different classes.

🔍 𝗞𝗲𝘆 𝗖𝗼𝗻𝗰𝗲𝗽𝘁𝘀:
- 𝑯𝒚𝒑𝒆𝒓𝒑𝒍𝒂𝒏𝒆: The line or plane that separates the data into different classes.
- 𝑺𝒖𝒑𝒑𝒐𝒓𝒕 𝑽𝒆𝒄𝒕𝒐𝒓𝒔: The data points closest to the hyperplane, which define the boundary.
- 𝑴𝒂𝒓𝒈𝒊𝒏: The distance between the hyperplane and the nearest support vectors.

📊 𝗚𝗿𝗮𝗽𝗵𝗶𝗰𝗮𝗹 𝗥𝗲𝗽𝗿𝗲𝘀𝗲𝗻𝘁𝗮𝘁𝗶𝗼𝗻
In a 2D space with two features, the SVM finds a line (or a hyperplane in higher dimensions) that separates the data into classes.

𝗔𝗱𝘃𝗮𝗻𝘁𝗮𝗴𝗲𝘀:
- Capable of performing linear and nonlinear classification.
- Effective in high-dimensional spaces.
- Versatile for various types of data.

𝗔𝗽𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻𝘀:
- Classification of data into classes. 🗂️
- Regression tasks. 📈
- Outlier detection. 🔍

𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻 𝗢𝗯𝗷𝗲𝗰𝘁𝗶𝘃𝗲: Find the "maximum margin hyperplane," the one farthest from the closest points (support vectors) of the two classes.

🔍 𝐄𝐱𝐩𝐥𝐨𝐫𝐚𝐭𝐨𝐫𝐲 𝐃𝐚𝐭𝐚 𝐀𝐧𝐚𝐥𝐲𝐬𝐢𝐬 𝐚𝐧𝐝 𝐒𝐮𝐩𝐩𝐨𝐫𝐭 𝐕𝐞𝐜𝐭𝐨𝐫 𝐂𝐥𝐚𝐬𝐬𝐢𝐟𝐢𝐜𝐚𝐭𝐢𝐨𝐧 𝐨𝐧 𝐭𝐡𝐞 𝐅𝐫𝐚𝐦𝐢𝐧𝐠𝐡𝐚𝐦 𝐇𝐞𝐚𝐫𝐭 𝐒𝐭𝐮𝐝𝐲 𝐃𝐚𝐭𝐚𝐬𝐞𝐭 📊
𝟭. 𝗗𝗮𝘁𝗮 𝗟𝗼𝗮𝗱𝗶𝗻𝗴 𝗮𝗻𝗱 𝗜𝗻𝘀𝗽𝗲𝗰𝘁𝗶𝗼𝗻 📝
𝟮. 𝗗𝗮𝘁𝗮 𝗖𝗹𝗲𝗮𝗻𝗶𝗻𝗴 🧹
𝟯. 𝗗𝗮𝘁𝗮 𝗘𝘅𝗽𝗹𝗼𝗿𝗮𝘁𝗶𝗼𝗻 🔎
𝟰. 𝗣𝗿𝗲𝗽𝗮𝗿𝗶𝗻𝗴 𝗗𝗮𝘁𝗮 𝗳𝗼𝗿 𝗠𝗼𝗱𝗲𝗹𝗶𝗻𝗴 🛠️
𝟱. 𝗠𝗼𝗱𝗲𝗹 𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗮𝗻𝗱 𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻 📈
𝟲. 𝗖𝗼𝗻𝗳𝘂𝘀𝗶𝗼𝗻 𝗠𝗮𝘁𝗿𝗶𝘅 𝗩𝗶𝘀𝘂𝗮𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻 📊

𝐊𝐞𝐲 𝐓𝐚𝐤𝐞𝐚𝐰𝐚𝐲𝐬:
- SVMs find the optimal hyperplane for classification by maximizing the margin between classes. 🎯
- They can be used for both linear and non-linear classification tasks. 🔄
- Model performance can be evaluated using metrics like accuracy, the confusion matrix, and the classification report. 📊

#MachineLearning #SupportVectorMachine #SVM #DataScience #HeartDiseasePrediction #Classification #Day11 #SkillVertex
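A condensed sketch of steps 1–6 above. The file name framingham.csv and the TenYearCHD target column follow common public copies of the Framingham dataset and should be checked against your own copy; the model hyperparameters are illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report, ConfusionMatrixDisplay

# 1-2. Load and clean (file/column names assumed from common public copies of the dataset)
df = pd.read_csv("framingham.csv").dropna()

# 3-4. Split features/target and scale (SVMs are sensitive to feature scale)
X = df.drop(columns=["TenYearCHD"])
y = df["TenYearCHD"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler().fit(X_train)

# 5. Train an RBF-kernel SVM and evaluate
model = SVC(kernel="rbf", C=1.0, gamma="scale")
model.fit(scaler.transform(X_train), y_train)
y_pred = model.predict(scaler.transform(X_test))
print(classification_report(y_test, y_pred))

# 6. Confusion-matrix visualization
ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
plt.show()
```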