Mathieu Le Pajolec
Toulouse, Occitanie, France
1K followers
500+ connections
Licenses and certifications
Other similar profiles
- Pierre COURTOIS (Dunkerque and surrounding area)
- Laila LYOUSSFI, Methods project manager (Paris)
- Yoann Lemonnier (Angers)
- Yassine Oufkir, Industrial Engineer at CERDYS (Nantes and surrounding area)
- Julien Cabanne (Strasbourg)
- Oscar MARIEZ (Le Creusot)
- Rime BAHLAK, Business Engineer (Houston, TX)
- Mathieu Frigo (Paris and surrounding area)
- Anne-Laure Vialette, Engineer, techno-economic modeling (Vert-Saint-Denis)
- Maxime Bréchu (Compiègne)
- Alexandre Brasme, Methods engineer at RES France (Avignon)
- Abdennaceur Lahlou, SAP SD functional consultant (Lyon and surrounding area)
- Sébastien Vieugué, Workshop manager at DEVILLÉ ASC (Angers)
- Pierre Roblot, Methods Manager at Blancpain (Morbier)
- Thibault Monge-Cadet (Paris and surrounding area)
- Marie GRANDPIERRE (France)
- Quentin COMMUN (Dreux)
- M'hammed EL ALAMI (Paris)
- Firas Ben Hamed (Cergy-Pontoise)
- Ayoub Bendraka, Digitalization Project Manager @ Bouygues Travaux Publics (Toulouse)
Discover more posts
PyCon DE & PyData
⭐️ New video release 📺: "I achieved peak performance in Python, here's how..."

Watch Dishant Sethi as he reveals the secrets to achieving peak performance in Python and optimizing your code for speed and efficiency.
📺 Watch the video on YouTube: https://lnkd.in/e9pS54Hg

Dishant Sethi, founder of prodinit.com, a software consultancy, shared insights on optimizing Python code for peak performance. The talk focused on techniques to enhance speed and reduce resource consumption, with an emphasis on memory efficiency. Key points included optimizing function execution, rigorous testing, and performance enhancement. Common bottlenecks such as inefficient coding practices, memory leaks, and suboptimal algorithms were discussed, along with strategies for profiling code and achieving peak performance in data-driven applications, such as efficient DataFrame storage and looping techniques. Attendees at any experience level gained practical takeaways for improving the efficiency of their Python code.
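The talk itself isn't transcribed here, but a minimal sketch of the profile-first workflow it describes might look like this (the function names and the string-concatenation example are my own illustration, not from the video):

```python
import cProfile
import io
import pstats

def slow_concat(items):
    # O(n^2) in the worst case: repeated concatenation can copy the buffer each time
    out = ""
    for s in items:
        out += s
    return out

def fast_concat(items):
    # O(n): str.join sizes the result once
    return "".join(items)

data = [str(i) for i in range(10_000)]

# Profile both implementations to see where time goes before optimizing.
profiler = cProfile.Profile()
profiler.enable()
assert slow_concat(data) == fast_concat(data)
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("tottime").print_stats(3)
print(report.getvalue()[:300])  # top of the profile report
```

The point is the ordering: measure first with `cProfile`, then optimize the hot spot the report actually names.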
Venugopal Adep
🔍 How is Claude 3.5 Sonnet transforming handwriting analysis?

In the rapidly evolving field of artificial intelligence, Claude 3.5 Sonnet is setting new benchmarks in vision and analysis capabilities. Here's how this advanced AI model is revolutionizing handwriting analysis:

📝 Enhanced vision capabilities: Claude 3.5 Sonnet, with its cutting-edge vision capabilities, enables users to upload images of handwriting and receive detailed personality interpretations, showcasing a significant leap in visual data processing.

🚀 Speed and efficiency: In head-to-head tests with GPT-4, Claude 3.5 Sonnet has demonstrated superior speed, especially in tasks requiring high-level reasoning and multilingual capabilities, making it a preferred choice for developers and researchers.

💡 Cost-effectiveness: Despite its advanced features, Claude 3.5 Sonnet is surprisingly affordable. It's available for free on Claude's platform, with competitive pricing for high-volume usage, making it accessible to a broader audience.

🛠️ User-friendly integration: With an API that echoes the simplicity of the OpenAI API, transitioning to Anthropic's technology is nearly seamless for developers familiar with existing AI models.

🖥️ Real-time collaboration: The introduction of 'artifacts' in Claude lets users interact directly with AI-generated content like code snippets and text documents, enhancing the collaborative experience and integrating AI insights more fluidly into projects.

The application of Claude 3.5 Sonnet to handwriting analysis not only demonstrates the model's robust capabilities but also hints at potential future applications in industries requiring nuanced visual interpretation.

🤔 Could Claude 3.5 Sonnet be the key to unlocking new dimensions in your field of work? How might enhanced AI vision capabilities impact your industry?

#AI #MachineLearning #HandwritingAnalysis #Innovation #Claude35Sonnet
Jon Ripley
Going Beyond RAG: "Contextual Retrieval" with Anthropic

RAG is great for semantic search on a corpus of internal documents, but what happens when you need the search to include specific keywords? RAG tends to lose context when documents are chunked, obscuring the value of keywords. The article below from Anthropic takes RAG beyond simple embedding (with its contextual loss) to contextual retrieval: by prepending context to each chunk, you can reduce the number of failed retrievals by 49% (and, when combined with reranking, by 67%). https://lnkd.in/gUjdrXj5
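Anthropic's article describes generating a short "situating" sentence per chunk (via an LLM) and prepending it before embedding and keyword indexing. A schematic sketch of that prepending step, with the LLM call stubbed out (`situate_chunk` is a placeholder of mine, not Anthropic's API):

```python
def situate_chunk(document_title: str, chunk: str) -> str:
    # In the real pipeline this is an LLM call that explains where the
    # chunk sits in the document; here we stub it with the title.
    return f"This chunk is from the document '{document_title}'."

def contextualize(document_title: str, chunks: list[str]) -> list[str]:
    # Prepend the situating context so both semantic and keyword (BM25)
    # search see it at retrieval time.
    return [f"{situate_chunk(document_title, c)} {c}" for c in chunks]

chunks = [
    "Revenue grew 3% over the previous quarter.",
    "Operating costs fell 1%.",
]
indexed = contextualize("ACME Q2 2023 Report", chunks)
print(indexed[0])
```

The contextualized strings, not the raw chunks, are what get embedded and indexed, which is what keeps an otherwise ambiguous chunk ("revenue grew 3%... of what?") retrievable by keyword.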
Dr. Camilo Thorne
A nice library for data engineering (crunching tabular data) at scale - but with limited computational resources (e.g. RAM): #polars. Besides allowing for parallel processing on pandas-like #dataframes, it supports... lazy evaluation, to prevent OOM errors and optimize reading and processing of insanely huge CSV files on e.g. laptops: https://lnkd.in/ehTQs8eK
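polars' lazy scans build a query plan and stream the file instead of materializing it (roughly `pl.scan_csv(path).filter(...).collect()`; treat the exact API as an assumption). The principle can be shown with a stdlib generator pipeline:

```python
import csv
import io

# A CSV too big to load eagerly would come from disk; an in-memory
# sample keeps the sketch self-contained.
raw = "name,price\nwidget,3\ngadget,12\ngizmo,7\n"

def lazy_rows(fileobj):
    # Yield one parsed row at a time: constant memory, like a lazy scan.
    for row in csv.DictReader(fileobj):
        yield row

# Filters compose without reading anything yet (query-plan style).
expensive = (row["name"] for row in lazy_rows(io.StringIO(raw))
             if int(row["price"]) > 5)

# Only consuming the pipeline (polars' .collect()) triggers the work.
print(list(expensive))  # ['gadget', 'gizmo']
```

polars goes further by also reordering and fusing these steps (predicate pushdown), which is why it can filter a huge CSV on a laptop without ever holding it in RAM.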
Guillaume Barrois
The traditional approach to text classification relied on training a classifier on massive amounts of labeled data. The new way, with large language models, is to simply prompt the model, provide a few examples, and let it handle the classification for you! 🤔

Or does it? Not so fast… This new method might work well for standard cases like sentiment analysis, spam detection, or language identification. But let's look at an example we know well at Explain: assigning public tenders to the right industry. Consider tenders for catering and restaurant services: we need to distinguish between catering for one-off events, school meals, food supply...

You could try to tackle this with a detailed prompt. But now imagine doing it for every industry: that would mean crafting thousands of intricate clauses, each with its own few-shot examples, which quickly becomes unmanageable. The bottom line is that when you have numerous categories with subtle distinctions requiring industry expertise, the "new way" falls short. And in B2B, most use cases are like this.

🤖 However, we can still benefit from the capabilities of modern LLMs by using a hybrid approach: leveraging an LLM for an initial classification that is then manually reviewed to create a dataset for training a traditional classifier.

One thing is clear: while LLMs are powerful, they don't replace the need to build and encode industry expertise into your product.
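A minimal sketch of that hybrid loop, with the LLM call stubbed out and a deliberately tiny bag-of-words scorer standing in for the "traditional" classifier (all names and data are illustrative, not Explain's pipeline):

```python
from collections import Counter, defaultdict

def llm_draft_label(text: str) -> str:
    # Stand-in for the LLM first pass; in production this is a prompted
    # model whose output a human reviewer then accepts or corrects.
    return "school_meals" if "school" in text else "event_catering"

# Step 1: LLM drafts labels; humans review them to build a training set.
drafts = [(t, llm_draft_label(t)) for t in [
    "school canteen lunch service", "gala dinner catering",
    "school meal delivery", "wedding buffet"]]
reviewed = drafts  # in this toy run the reviewers accepted every draft

# Step 2: train a classical model on the reviewed dataset
# (here: per-class word counts, a crude stand-in for e.g. Naive Bayes).
word_counts = defaultdict(Counter)
for text, label in reviewed:
    word_counts[label].update(text.split())

def classify(text: str) -> str:
    scores = {label: sum(counts[w] for w in text.split())
              for label, counts in word_counts.items()}
    return max(scores, key=scores.get)

print(classify("school lunch tender"))  # school_meals
```

The LLM bootstraps the labeled data; the cheap, fast classical model is what actually runs at scale.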
Darshil Modi
Is GraphRAG worth the hype?

This weekend I explored the much-trending "GraphRAG" and analysed the feasibility of including it in an LLM production pipeline. While GraphRAG is a substantial contribution to better retrieval, I am still concerned about its inefficiency and cost implications.

- Implementing GraphRAG can be prohibitively expensive. For example, processing a single book of 32,000 words through an LLM to use GraphRAG's functionality can cost about $10. Scale this to thousands of documents, and the financial burden becomes unsustainable for most operations.
- Additionally, because information is stored in nodes, intricate details, specifically the reasoning part, can be lost, leading to incomplete data retrieval. This lack of detail can significantly impact applications where nuance and completeness are critical.
- Considering these factors, the tangible improvement in system performance is, in practice, often marginal: around 5-10%. This leads us to question whether the effort and resources required to implement GraphRAG could be better spent enhancing existing systems. Alternatives like refining embedding models or improving data curation strategies might offer similar benefits at a fraction of the cost and complexity.

Thoughts? Alok Abhishek Hamza Farooq Victor Calderon, Ph.D. Tiffany Teasley Vasanth Raghu Nair
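The $10-per-book figure implies a rough back-of-envelope model: GraphRAG makes many LLM passes over the text (entity extraction, community summaries), so the billed token volume is a multiple of the raw size. A sketch of that arithmetic (the token ratio, pass count, and per-token price are assumptions for illustration, not quoted rates):

```python
# Rough GraphRAG indexing cost estimate.
words = 32_000
tokens_per_word = 1.3        # assumption: typical English tokenization
passes = 8                   # assumption: extraction + summary passes
price_per_1k_tokens = 0.03   # assumption: illustrative GPT-4-class pricing

tokens = words * tokens_per_word * passes
cost = tokens / 1000 * price_per_1k_tokens
print(f"~${cost:.2f} per book")  # ~$9.98 with these assumptions
```

Whatever the exact rates, the multiplier structure is the point: anything that re-processes the whole corpus through a frontier LLM scales linearly in corpus size times pass count.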
Massimiliano Marchesiello
Why Most Cross-Validation Visualizations Are Wrong (And How to Fix Them)
https://ift.tt/3Im8iwz

MODEL VALIDATION & OPTIMIZATION

Stop using moving boxes to explain cross-validation!

You know those cross-validation diagrams in every data science tutorial? The ones showing boxes in different colors moving around to explain how we split data for training and testing? Like this one: [Image by author]

I've seen them too, one too many times. These diagrams are common; they've become the go-to way to explain cross-validation. But here's something interesting I noticed while looking at them as both a designer and data scientist.

When we look at a yellow box moving to different spots, our brain automatically sees it as one box moving around. It's just how our brains work: when we see something similar move to a new spot, we think it's the same thing. (This is actually why cartoons and animations work!)

You might think the animated version is better, but now you can't help following the blue box and start to forget that this should represent how cross-validation works. [Source: Wikipedia]

But here's the thing: in these diagrams, each box in a new position is supposed to show a different chunk of data. So while our brain naturally wants to track the boxes, we have to tell our brain, "No, no, that's not one box moving; they're different boxes!" It's like we're fighting against how our brain naturally works, just to understand what the diagram means.

Looking at this as someone who works with both design and data, I started thinking: maybe there's a better way? What if we could show cross-validation in a way that actually works with how our brain processes information?

[All visuals: author-created using Canva Pro. Optimized for mobile; may appear oversized on desktop.]

What's Cross-Validation Really About?

Cross-validation is about making sure machine learning models work well in the real world. Instead of testing a model once, we test it multiple times using different parts of our data. This helps us understand how the model will perform with new, unseen data.

Here's what happens:
- We take our data
- Divide it into groups
- Use some groups for training, others for testing
- Repeat this process with different groupings

The goal is to get a reliable understanding of our model's performance. That's the core idea: simple and practical.

(Note: We'll discuss different validation techniques and their applications in another article. For now, let's focus on understanding the basic concept and why current visualization methods need improvement.)

What's Wrong with Current Cross-Validation Diagrams?

Open up any machine learning tutorial, and you'll probably see these types of diagrams:
- Long boxes split into different sections
- Arrows showing parts moving around
- Different colors showing training and testing data
- Multiple versions of the same diagram side by side

Currently, this is similar to the first image you'll see if...
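The bulleted procedure maps directly onto index bookkeeping. A stdlib sketch of k-fold splitting (scikit-learn's `KFold` does the same with shuffling and more options):

```python
def k_fold_indices(n_samples: int, k: int):
    # Partition indices 0..n-1 into k folds; each fold is the test set
    # exactly once while the remaining folds form the training set.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples)
                 if i < start or i >= start + size]
        yield train, test
        start += size

for train, test in k_fold_indices(6, 3):
    print(test, "held out; train on", train)
```

Each `(train, test)` pair is a different partition of the same data, which is exactly the fact the moving-box diagrams obscure: nothing "moves", the folds are disjoint index sets.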
Ali Nemati
🚀 Mistral AI: Pixtral Large and Le Chat Platform

Introducing Pixtral Large: Mistral AI, the French AI startup, has launched Pixtral Large, a groundbreaking 124-billion parameter multimodal model designed to process both text and visual data.

- 🧠 Core architecture: Combines a 123-billion parameter text decoder with a 1-billion parameter vision encoder.
- 📖 Expanded context window: Supports up to 128K tokens, enabling it to handle 30 high-resolution images or a 300-page book.
- 🏆 Benchmark leader: Outperforms GPT-4o, Gemini 1.5 Pro, and Claude-3.5 Sonnet on key benchmarks like MathVista (69.4% accuracy), ChartQA, and DocVQA.

Le Chat Platform: Unlocking New Possibilities

Mistral's Le Chat platform integrates Pixtral Large, introducing innovative features for diverse applications:
- 🔍 Web search with citations: Real-time data retrieval with source transparency.
- 🖋️ Canvas: Tools for live document creation, collaborative editing, and version control.
- 📄 Advanced OCR: Efficiently processes PDFs, tables, and equations to extract insights.
- 🎨 Image generation: Powered by Flux Pro from Black Forest Labs.
- 🤖 Task agents: Automates tasks like summarization and invoice processing.

🌟 Performance Highlights

Pixtral Large sets new standards on benchmarks like ChartQA, DocVQA, and VQAv2, showcasing cutting-edge capabilities in document analysis and visual data interpretation. The model is available under the Mistral Research License for non-commercial use, with commercial licenses for enterprise adoption.

📢 Additional updates: Mistral AI also unveiled Mistral Large 2.1, a 123-billion parameter model optimized for general-purpose tasks, along with enhanced APIs to streamline developer integration.
Abdul Rauf
Hi, I just created a video about CNNs (Convolutional Neural Networks). In this video, I explain the basic concepts used in CNNs, like convolutions and pooling. I also show some code at the end that trains a model to recognize handwritten digits. Check it out here: https://lnkd.in/dTGB9Htu #CNN #ConvolutionalNeuralNetworks #pooling #convolutions
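The video's code isn't reproduced here, but the two operations it covers can be shown from scratch on a tiny image (pure Python, no framework; the edge kernel is my own illustrative choice):

```python
def conv2d(image, kernel):
    # Valid (no-padding) 2D convolution: slide the kernel over the image
    # and take the elementwise product-sum at each position.
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def max_pool2x2(fmap):
    # 2x2 max pooling, stride 2: keep the strongest activation in each
    # window, halving both spatial dimensions.
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

image = [[0, 0, 1, 1]] * 4       # dark left half, bright right half
edge_kernel = [[-1, 1]]          # responds where intensity rises left-to-right
fmap = conv2d(image, edge_kernel)
print(fmap)                      # peaks at the vertical edge
print(max_pool2x2(fmap))
```

A real CNN learns the kernel values during training; the mechanics of sliding and pooling are exactly these loops.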
Sanjay Kumar MBA,MS,PhD
GraphRAG: a new tool for complex data discovery

GraphRAG is a structured, hierarchical approach to Retrieval-Augmented Generation (RAG), as opposed to naive semantic-search approaches using plain text snippets. The GraphRAG process involves extracting a knowledge graph out of raw text, building a community hierarchy, generating summaries for these communities, and then leveraging these structures when performing RAG-based tasks.

#LLM #GenAI #Graph #RAG #graphrag #AI
Reference: https://lnkd.in/gYbw5yXG
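A toy sketch of the first two steps of that pipeline. In real GraphRAG an LLM extracts the triples and Leiden clustering finds communities; here the triples are hand-written and connected components stand in for communities (all entity names are illustrative):

```python
from collections import defaultdict

# Step 1: triples as an LLM would extract them from raw text.
triples = [("Marie Curie", "worked_at", "Sorbonne"),
           ("Pierre Curie", "worked_at", "Sorbonne"),
           ("GraphRAG", "published_by", "Microsoft")]

# Step 2: build the graph and find crude "communities".
adj = defaultdict(set)
for s, _, o in triples:
    adj[s].add(o)
    adj[o].add(s)

def components(graph):
    seen, comps = set(), []
    for node in graph:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(graph[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

# Step 3 (in GraphRAG): an LLM writes one summary per community, and
# queries retrieve over those summaries instead of raw snippets.
for comp in components(adj):
    print("community:", sorted(comp))
```

The hierarchy of community summaries is what lets GraphRAG answer corpus-level questions ("what are the main themes?") that snippet retrieval cannot.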
Wei Jie Ng
This approach can capture and leverage complex relationships and dependencies between entities, which might be missed by purely semantic-based methods. It is particularly useful in domains where relational information is critical, such as scientific research, legal documents, or interconnected datasets. Exciting times ahead for AI and data-driven decision-making! Can't wait to explore it!
Dra. Eva Andres Nuñez
Although I appreciate the various approaches to Theory of Mind (ToM), it is worth noting that there is still no unified mathematical theory that consolidates all the theories regarding decision-making in the brain. In my opinion, there is indeed consciousness in decision-making, especially due to the functions of the hippocampus, which processes at a high level what has been decided at the stimulus-response level, akin to what the hypothalamus does. The hippocampus plays a fundamental role in learning and orchestrates a large part of it. Additionally, decisions influence each other, which would require a certain level of consciousness. Furthermore, according to ToM, there is a social and feedback component inherent in decision-making. Lastly, memory is subjective and enriched by prior experience and lessons learned, which also necessitates consciousness.
Frédéric Branchaud-Charron
Baal, our Bayesian Active Learning library is working on a major version and we want to know more about you! If you use Baal for Active Learning, Uncertainty Estimation or Bayesian Deep Learning, we would **love** to talk to you! 😎 In more detail, we want to understand when our users use our library and how. You can take a spot in our Calendly:
Shekar Ramachandran
🚀 Introducing AutoGGUF: An Automated Graphical Interface for GGUF Model Quantization

🎉 AutoGGUF is a new graphical user interface (PyQt6) app developed in Python to streamline the quantization of GGUF models using the llama.cpp library.

🎳 Key features:
- Automated download & management: Seamlessly manage llama.cpp backends, including CUDA.
- Simple model selection & quantization: Easily choose and quantize models.
- Configurable parameters: Tailor your quantization settings.
- Resource monitoring: Monitor system resources during operations.
- Parallel tasks: Enjoy the benefits of threaded execution.
- Preset saving: Save your quantization presets for future use.
- iMatrix generation: Effortlessly generate iMatrix files.
- Extensive logging: Access detailed logs for better tracking and troubleshooting.

🍦 Compatibility & accessibility:
- Cross-platform: Compatible with multiple platforms.
- Open source: Licensed under Apache-2.0.
- Language support: Available in 28 languages.
- Executable releases: Windows and Ubuntu users can download the latest release executable built with PyInstaller for enhanced performance, while other platforms can run it from source.

🕸️ Simplified workflow: AutoGGUF removes the need for command-line operations, automates directory creation, and offers extensive customization options. It addresses common pain points in the quantization workflow, making it easier to work with GGUF models.

👁️ GitHub repository: https://lnkd.in/dqTzAShC

🧠 Known issues:
- Preset saving during quantization may cause UI thread crashes.
- Task deletion during processing: you must cancel the task first to avoid crashes.

🤖 Upcoming features:
- Custom command-line parameters
- Additional iMatrix generation parameters
- Perplexity testing
- Conversion of HF safetensors to GGUF
- Enhanced progress tracking

Below is a screenshot of the app to give you a glimpse:
Quentin Lhoest
How do you handle private data in ML datasets? Here is our latest exciting experiment at Hugging Face 🤗:

Many datasets contain undocumented private information, making them difficult for ML practitioners to use. We started to use #Presidio to detect the presence of private data in HF Datasets, in particular Personally Identifiable Information (#PII). We now show a Presidio report that estimates the amount of PII on dataset pages. This is useful, e.g., to help further filter datasets or to make more informed decisions before training a model.

More info on Presidio + examples in our latest blog post: https://lnkd.in/e5_mt7j7

We built this with the help of the community (special thanks to Omri Mendels for the help with Presidio, Sara Hooker for the feedback on multilingual datasets, and Margaret Mitchell and team for the help on the blog post!) and from our work with CNIL - Commission Nationale de l'Informatique et des Libertés and their guidance.
LlamaIndex
LlamaIndex + MLflow

RAG has a lot of parameters to tune, from chunking to indexing, and everything affects downstream answer accuracy. It's important to have a systematic approach to track and tune these parameters, and to define the right eval metrics/dataset for evaluating changes.

Check out Jino Rohit's video on experimenting with a LlamaIndex pipeline with MLflow for experiment tracking: https://lnkd.in/gyYXfyyb
Venugopal Adep
AI faces its 'Oppenheimer moment' during killer robot arms race

What is this news about?
🤖 The article discusses the urgent challenge of regulating artificially intelligent killing machines, termed the "Oppenheimer moment" for AI.
🌍 It highlights a major conference in Vienna where over 100 countries discussed the merger of AI with military technologies.
🔍 Jaan Tallinn and other experts express concerns over the rapid proliferation of autonomous weapons systems.

Why is this a matter of concern to me?
🌐 The use of AI in warfare could have profound ethical and safety implications for global security.
💡 Understanding the discussions and potential regulations can inform one's views on the future of AI and its ethical use.
🛡️ Awareness of these issues is crucial for anyone involved in AI, tech policy, or global security fields.

How can I make the best use of this information?
📢 Advocate for ethical standards and regulations in AI development and its military applications.
📚 Educate others about the potential risks and the importance of international cooperation in controlling AI weapons.
🤔 Reflect on how your work or research in AI or related fields can contribute to responsible AI use.

Where is this information coming from?
🗞️ Report by Bloomberg, based on the Vienna conference discussions and statements from key figures like Austrian Foreign Minister Alexander Schallenberg and Jaan Tallinn.
🎤 Insights from a wide range of global civilian, military, and technology officials concerned with AI's impact on warfare.

When was this published?
📅 Last updated on April 30, 2024, providing a timely reflection on the growing urgency to address AI in military applications.

🔗 Interested in the ethical implications of AI and how it shapes global security? Connect with me on LinkedIn to explore more about responsible AI use and international policies!
HIMANSHU NEGI✅
Exploring the Frontier of AI with Mamba Models: A Leap in Zero-Shot Classification and OOD Generalization

In a groundbreaking technical report, researchers have unveiled a new frontier in artificial intelligence through the development of Mamba-based models, which are setting new benchmarks in efficiency and robustness across a variety of challenging datasets.

Innovative approach: The study presents the first attempt to train a transferable Mamba model enhanced by contrastive language-image pretraining (CLIP). This integration enables the model to achieve remarkable understanding and contextual interpretation of images through natural language.

Impressive performance: Mamba models, though smaller in size, have shown remarkable capabilities. A Mamba model with just 67 million parameters performs on par with a much larger 307-million-parameter Vision Transformer (ViT) in zero-shot classification tasks. This stark comparison not only highlights the efficiency of Mamba models but also positions them as a viable solution for resource-constrained environments.

Extensive evaluation: The research involved a rigorous evaluation in which various sizes of Mamba models were tested across 26 zero-shot classification datasets and 16 out-of-distribution (OOD) datasets. The results demonstrate not only versatility but also superior performance, particularly in scenarios involving OOD image contrasts or high-pass filtering, showcasing robustness under diverse and challenging conditions.

Analytical insight: Despite their impressive performance, Hessian analysis shows that Mamba models have a sharper and more non-convex optimization landscape than ViT models. This makes them somewhat more challenging to train, indicating areas where future research could focus to improve training methodologies and model stability.

This pioneering work opens new avenues for applying state space models like Mamba in practical scenarios, extending beyond traditional image classification to more complex, real-world tasks where efficiency and robustness are critical. Stay tuned for more updates on how these developments might reshape the landscape of machine learning applications!

#AI #MachineLearning #DeepLearning #MambaModels #CLIP #ZeroShotClassification #OODGeneralization #TechInnovation #ArtificialIntelligence #ResearchUpdate

Source code: https://lnkd.in/dyJ338Sy
Source link: https://lnkd.in/dfRwQn6w
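CLIP-style zero-shot classification, as used in the report's evaluation, reduces to cosine similarity between one image embedding and text embeddings of the candidate labels. A schematic with made-up 3-d vectors (a real CLIP or Mamba-CLIP encoder would produce these from the image and from prompts like "a photo of a {label}"):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product over the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy stand-ins for encoder outputs.
image_embedding = [0.9, 0.1, 0.2]
label_embeddings = {
    "cat": [0.8, 0.2, 0.1],
    "airplane": [0.1, 0.9, 0.3],
}

scores = {label: cosine(image_embedding, emb)
          for label, emb in label_embeddings.items()}
prediction = max(scores, key=scores.get)
print(prediction)  # cat
```

No classifier head is trained on the target classes; "zero-shot" means the label set is supplied at inference time as text, which is why the encoder's quality (Mamba vs ViT) is the whole story.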