Mistral and Nvidia drop a new open-source LLM that's small but mighty

Mistral AI's latest model, NeMo, is here to shake things up. It's small but mighty, packing a 128k context window and some impressive multilingual skills. Oh, and it's open-source. Time to get coding!

What does this mean?

Mistral NeMo is a touch bigger (12B parameters) than its peers (Gemma 2 9B and Llama 3 8B), but Mistral thinks local machines can still run it and get actual work done, not just treat open-source as a play toy.

- It's got a massive 128k-token context window (that's a lot of room for chat)
- Performs like a champ on reasoning, knowledge, and coding tasks
- Speaks a bunch of languages fluently (not just English, folks)
- Uses a fancy new tokenizer called Tekken that's super efficient across languages and code
- Comes in both pre-trained and instruction-tuned flavors
- Licensed under Apache 2.0, so it's free for research and commercial use

Oh, and it's quantization-aware, meaning you can run it in FP8 without losing performance. Nerdy, but cool. It's available at the usual places like HuggingFace and Mistral's La Plateforme, as well as packaged as an NVIDIA NIM microservice.

Why should I care?

If you're into AI (and who isn't these days?), this is big news. Mistral NeMo brings near top-tier performance in a smaller, more efficient package. That means:

- Easier and cheaper to run for smaller companies and researchers
- Better multilingual support for global applications
- Potential for more diverse and creative AI applications thanks to its open-source nature

For developers, it's a drop-in replacement for Mistral 7B, so upgrading should be a breeze. And for the open-source AI community, it's another step towards democratizing powerful language models. Time to play with some new toys!

Read more here: https://lnkd.in/eMvVWbT7
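That 128k-token window is the headline feature, so here's a toy sketch of what budgeting it looks like when stuffing documents into a prompt. The 4-characters-per-token heuristic, the output reserve, and the helper names are illustrative assumptions, not Mistral's API:

```python
# Toy context-window budgeting for a long-context model like Mistral NeMo.
# The 128k limit is from the announcement; the chars-per-token estimate
# and helper names here are illustrative assumptions.

CONTEXT_WINDOW = 128_000          # Mistral NeMo's advertised context size
RESERVED_FOR_OUTPUT = 4_000       # leave room for the model's reply

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def pack_chunks(chunks: list[str]) -> list[str]:
    """Greedily pack document chunks until the input budget is spent."""
    budget = CONTEXT_WINDOW - RESERVED_FOR_OUTPUT
    packed, used = [], 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            break
        packed.append(chunk)
        used += cost
    return packed

docs = ["a" * 400_000, "b" * 100_000, "c" * 100_000]
print([len(c) for c in pack_chunks(docs)])
```

With these numbers the first chunk (~100k estimated tokens) fills most of the budget, so the greedy loop stops before the second one.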
Ben's Bites’ Post
More Relevant Posts
-
Use linear programming and LLMs to harness the power of AI agents for optimization. ➡️ https://nvda.ws/4dfHNrb The cuOpt AI agent is built from multiple #LLM agents and acts as a natural-language front end to cuOpt, letting you turn natural-language queries into code and an optimized plan seamlessly.
Building an AI Agent for Supply Chain Optimization with NVIDIA NIM and cuOpt | NVIDIA Technical Blog
developer.nvidia.com
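For readers new to the linear-programming half of that pairing, here's a tiny self-contained sketch of what the optimization step does. The toy product-mix problem and the brute-force vertex enumeration are stand-ins for cuOpt's actual solvers:

```python
# Toy linear program solved by enumerating vertices of the feasible region.
# cuOpt uses far more capable solvers; this only illustrates what an
# "optimized plan" is. The problem data below is made up.
from itertools import combinations

# maximize 3x + 2y  subject to  x + y <= 4,  x <= 2,  x >= 0,  y >= 0
constraints = [          # each row (a, b, c) means a*x + b*y <= c
    (1, 1, 4),
    (1, 0, 2),
    (-1, 0, 0),
    (0, -1, 0),
]

def intersect(c1, c2):
    """Solve a1*x + b1*y = r1, a2*x + b2*y = r2 (None if parallel)."""
    (a1, b1, r1), (a2, b2, r2) = c1, c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)

def feasible(pt, eps=1e-9):
    return all(a * pt[0] + b * pt[1] <= c + eps for a, b, c in constraints)

# An LP optimum always sits at a vertex, so check every feasible vertex.
vertices = [p for c1, c2 in combinations(constraints, 2)
            if (p := intersect(c1, c2)) and feasible(p)]
best = max(vertices, key=lambda p: 3 * p[0] + 2 * p[1])
print(best)
```

The agent's job in the cuOpt workflow is to translate a natural-language request into problem data like the `constraints` table above, then hand it to the solver.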
-
NVIDIA Releases Nemotron-4 340B Model Family & Synthetic Data Generation Pipeline

Models Available:
- Nemotron-4-340B-Base
- Nemotron-4-340B-Instruct
- Nemotron-4-340B-Reward

License:
- NVIDIA Open Model License Agreement

Applications:
- Generating synthetic data to train smaller / domain-specific language models
- Research studies & commercial applications

Open-Sourced:
- Models
- Pretraining code
- Alignment strategies
- Reward model training code
- Synthetic data generation pipeline
(Links in comments)

Key Features:

Deployment:
- Fits on a single DGX H100 with 8 GPUs in FP8 precision
- All models available on #HuggingFace

Training:
- 9 trillion tokens
- Sequence length of 4096

Architecture:
- Decoder-only Transformer
- Rotary Position Embeddings (RoPE)
- Grouped Query Attention (GQA)
- Squared ReLU activation
- No dropout & bias
- Untied embeddings

Alignment:
- Supervised Fine-Tuning (SFT)
- Preference Fine-Tuning (RLHF, DPO)

Performance Highlights:

Nemotron-4-340B-Base:
- Competitive with Llama-3 70B, Mixtral 8x22B, Qwen-2 72B
- Excels in commonsense reasoning (ARC-Challenge, MMLU, BigBench Hard)

Nemotron-4-340B-Instruct:
- Surpasses Llama-3 70B, Mixtral 8x22B, Qwen-2 72B in instruction following and chat capabilities

Nemotron-4-340B-Reward:
- Top accuracy on RewardBench (Allen AI, 2024)
- Surpasses proprietary models (GPT-4o-0513, Gemini 1.5 Pro-0514)

https://lnkd.in/g4mQ5SJM

#ai #artificialintelligence #research #llm #generativeai #nvidia #nlp #naturallanguageprocessing #nlproc
NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models
blogs.nvidia.com
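The "fits on a single DGX H100 with 8 GPUs in FP8" claim is easy to sanity-check with back-of-the-envelope arithmetic; the sketch below counts weights only and ignores KV cache, activations, and runtime overhead:

```python
# Back-of-the-envelope: do 340B parameters fit on 8 x 80 GB H100s in FP8?
# Weights only -- KV cache and activations would add to this.
PARAMS = 340e9
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}
H100_MEMORY_GB = 80
NUM_GPUS = 8

def weight_gb(precision: str) -> float:
    return PARAMS * BYTES_PER_PARAM[precision] / 1e9

total = H100_MEMORY_GB * NUM_GPUS           # 640 GB aggregate
print(weight_gb("fp16"), weight_gb("fp8"), total)
# FP16 weights (680 GB) overflow the node; FP8 (340 GB) leaves headroom.
```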
-
🚀 Exciting News in AI! 🚀

Mistral AI, in collaboration with NVIDIA, has unveiled the NeMo 12B model, boasting a massive context window of 128,000 tokens. This state-of-the-art model excels in reasoning, world knowledge, and coding accuracy. Here are some highlights:

- Multilingual Capabilities: Trained on over 100 languages, including English, French, German, and more.
- Advanced Tokenization: The new "Tekken" tokeniser offers 30% better compression for source code and major languages, outperforming other models in 85% of languages.
- Quantisation Awareness: FP8 inference is supported without sacrificing performance.
- Open-Source Availability: Pre-trained and instruction-tuned checkpoints are available under Apache 2.0, encouraging widespread adoption and research.

Ready to revolutionize your AI projects? Mistral NeMo is available on HuggingFace and NVIDIA's platforms for seamless integration.

Link: https://lnkd.in/gnC4RANM
Mistral NeMo
mistral.ai
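"30% better compression" in tokenizer terms means fewer tokens for the same text. A quick sketch of how such a comparison is measured (the token counts below are invented stand-ins, not real Tekken measurements):

```python
# Comparing two tokenizers by compression: fewer tokens per byte is better.
# The token counts here are made-up stand-ins, not real Tekken numbers.

def compression_ratio(num_bytes: int, num_tokens: int) -> float:
    """Bytes of text represented per token -- higher is better compression."""
    return num_bytes / num_tokens

def improvement(baseline_tokens: int, new_tokens: int) -> float:
    """Fractional reduction in token count for the same text."""
    return 1 - new_tokens / baseline_tokens

text_bytes = 10_000
baseline = compression_ratio(text_bytes, 2_600)   # hypothetical old tokenizer
tekken   = compression_ratio(text_bytes, 1_820)   # hypothetical Tekken count
print(round(improvement(2_600, 1_820), 2))        # 0.3 -> "30% better"
```

Better compression matters beyond storage: a denser tokenizer effectively stretches that 128k context window, since each token carries more text.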
-
It's fascinating to see the advancements in AI, particularly with NVIDIA's large language model (LLM) and its potential applications in generative AI, chatbots, and other natural language processing tasks. The model is a significant development in the field, offering capabilities for retrieval-augmented generation and for building sophisticated chatbots for various applications. With new software tools, developing and deploying AI models like this one is becoming easier, and its accessibility and ability to create new content from existing patterns make it an exciting prospect for AI enthusiasts and developers. NVIDIA's ongoing innovations, such as the robotics products announced at GTC 2024, further highlight the company's commitment to advancing AI technologies.

Try the new NVIDIA LLM here: https://lnkd.in/givMEiHR (I will look into implementing it in a bot in the future.)

Read more about it here: https://lnkd.in/g62KKWYW

#NVIDIAAI #LLM #GenerativeAI #Chatbots #AIInnovation
nvidia/Nemotron-4-340B-Base · Hugging Face
huggingface.co
-
Are you fascinated by the potential of large language models (LLMs) but unsure where to begin? This NVIDIA blog post offers a fantastic introduction to building your first LLM agent application! 🚀 In the post, you'll learn about the four key components of an LLM agent: agent core, memory module, agent tools, and planning module. You'll also discover helpful resources to get you started, including beginner-friendly tutorials and a recommended reading list. 📚 Whether you're aiming to build a question-answering agent, a multi-modal agent, or a swarm of agents, this post is a valuable resource to kickstart your LLM agent development journey. #LLM #AI #MachineLearning #NVIDIA https://lnkd.in/g7Tw7v4y
Building Your First LLM Agent Application | NVIDIA Technical Blog
developer.nvidia.com
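The four components the post names (agent core, memory module, agent tools, planning module) can be sketched as a minimal loop. The stub tool and the canned "planner" below are placeholders for what an LLM would do, not NVIDIA's implementation:

```python
# Minimal LLM-agent skeleton with the four parts the post describes:
# core (decision loop), memory, tools, and a (stubbed) planning module.

def calculator_tool(expression: str) -> str:
    """A trivially sandboxed 'tool' the agent can call."""
    allowed = set("0123456789+-*/(). ")
    if not set(expression) <= allowed:
        return "error: unsupported expression"
    return str(eval(expression))  # acceptable for this whitelisted toy input

class Agent:
    def __init__(self):
        self.memory: list[tuple[str, str]] = []   # (question, answer) pairs
        self.tools = {"calculator": calculator_tool}

    def plan(self, question: str) -> list[tuple[str, str]]:
        """Stub planner: a real agent would ask an LLM for these steps."""
        return [("calculator", question)]

    def run(self, question: str) -> str:
        for tool_name, tool_input in self.plan(question):   # agent core
            answer = self.tools[tool_name](tool_input)      # agent tools
        self.memory.append((question, answer))              # memory module
        return answer

agent = Agent()
print(agent.run("2 * (3 + 4)"))   # -> 14
```

Swapping the stub planner for an LLM call and growing the tool dictionary is essentially the path the blog post walks through.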
-
NVIDIA AI Introduces Nemotron-4 340B: A Family of Open Models that Developers can Use to Generate Synthetic Data for Training Large Language Models (LLMs)

NVIDIA has unveiled Nemotron-4 340B, a family of models designed to generate synthetic data for training large language models (LLMs) across various commercial applications. The release marks a significant advancement in generative AI, offering a suite of tools optimized for NVIDIA NeMo and NVIDIA TensorRT-LLM, including cutting-edge instruct and reward models. The initiative gives developers a cost-effective, scalable way to obtain high-quality training data, which is crucial for the performance and accuracy of custom LLMs.

Nemotron-4 340B includes three variants: Instruct, Reward, and Base, each tailored to a specific role in the data generation and refinement process.

✅ The Nemotron-4 340B Instruct model creates diverse synthetic data that mimics the characteristics of real-world data, improving the performance and robustness of custom LLMs across domains. It generates the initial data outputs, which can then be refined and improved upon.

✅ The Nemotron-4 340B Reward model filters and improves the quality of AI-generated data. It evaluates responses on helpfulness, correctness, coherence, complexity, and verbosity, ensuring the synthetic data is high quality and relevant to the application's needs.

✅ The Nemotron-4 340B Base model serves as the foundation for customization. Trained on 9 trillion tokens, it can be fine-tuned on proprietary data and various datasets to fit specific use cases. It supports extensive customization through the NeMo framework, including supervised fine-tuning and parameter-efficient methods like low-rank adaptation (LoRA).
Full read: https://lnkd.in/dfCWrGGD Technical report: https://lnkd.in/dr8rrqC3 Models: https://lnkd.in/dPqEH7yx
NVIDIA AI Introduces Nemotron-4 340B: A Family of Open Models that Developers can Use to Generate Synthetic Data for Training Large Language Models (LLMs)
https://www.marktechpost.com
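The Instruct/Reward division of labor boils down to a generate-score-filter loop. In this sketch both models are stubbed out with deterministic placeholders; only the loop structure reflects the pipeline described above:

```python
# Shape of the synthetic-data pipeline: an "instruct" model proposes
# responses, a "reward" model scores them, and only high scorers are kept.
# Both models are stubbed out here; the loop structure is the point.

def instruct_model(prompt: str) -> list[str]:
    """Stand-in for Nemotron-4-340B-Instruct: propose candidate responses."""
    return [f"{prompt} -> draft {i}" for i in range(4)]

def reward_model(response: str) -> float:
    """Stand-in for Nemotron-4-340B-Reward, which scores helpfulness,
    correctness, coherence, etc. Here: a fake score from the draft index."""
    return int(response[-1]) / 3.0

def synthesize(prompts: list[str], threshold: float = 0.5) -> list[str]:
    kept = []
    for prompt in prompts:
        for candidate in instruct_model(prompt):
            if reward_model(candidate) >= threshold:
                kept.append(candidate)
    return kept

data = synthesize(["Explain FP8", "Summarize RoPE"])
print(len(data))   # drafts 2 and 3 of each prompt pass the cutoff -> 4 kept
```

The filtered `data` is what would then be fed into fine-tuning a smaller or domain-specific model.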
-
In a groundbreaking move, NVIDIA has unveiled Nemotron-4 340B, a family of open models revolutionizing the generation of high-quality synthetic data for training large language models tailored to commercial applications. Boasting an impressive 340 billion parameters and trained on a vast 9 trillion tokens spanning diverse text, multilingual data, and programming languages, Nemotron-4 340B surpasses other open models and even rivals GPT-4 on various benchmarks.

The key strength of Nemotron-4 340B lies in its comprehensive pipeline for synthetic data generation, featuring base, instruct, and reward models. Noteworthy is the instruct model's supervised fine-tuning and preference optimization using human-annotated and synthetic data, while the reward model leads the RewardBench leaderboard for evaluating response quality across attributes like helpfulness and coherence.

By open-sourcing the entire synthetic data generation pipeline, NVIDIA is democratizing AI, making it more accessible to businesses of all sizes. This offering holds promise for various industries, including healthcare, finance, manufacturing, retail, and technology.

Setting Nemotron-4 340B apart from cloud provider offerings like Azure and AWS are its open and commercially friendly licensing, the ability to generate synthetic training data tailored to specific use cases, efficient inference optimized for NVIDIA's DGX H100 systems, and strong performance rivaling proprietary models.

With Nemotron-4 340B, NVIDIA is enabling enterprises to develop and deploy custom large language models without being tied to a cloud provider's ecosystem or pricing model. This innovative offering is set to fuel AI innovation, empowering businesses to unlock new possibilities and gain a competitive edge.
NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models
blogs.nvidia.com
-
NVIDIA AI Introduces NVILA: A Family of Open Visual Language Models (VLMs) Designed to Optimize both Efficiency and Accuracy

Visual language models (VLMs) have come a long way in integrating visual and textual data, yet they come with significant challenges. Many of today's VLMs demand substantial resources for training, fine-tuning, and deployment. For instance, training a 7-billion-parameter model can take over 400 GPU-days, which makes it inaccessible to many researchers. Fine-tuning is equally demanding, often requiring over 64 GB of GPU memory, far exceeding what consumer hardware can handle. Deploying these models in environments with limited computational resources, such as edge devices or robotics, is another hurdle. These limitations highlight the urgent need for VLMs that are not only powerful but also efficient and scalable.

To tackle these challenges, NVIDIA has introduced NVILA, a family of open VLMs designed with efficiency and accuracy in mind. Building on the VILA model, NVILA adopts a "scale-then-compress" approach: it increases spatial and temporal resolutions to preserve details in visual inputs and then compresses them into fewer, denser tokens. This combination allows NVILA to handle high-resolution images and long video sequences effectively.

NVILA's design optimizes every stage of the model lifecycle. It reduces training costs by 4.5×, cuts fine-tuning memory requirements by 3.4×, and improves inference speeds by 1.6 to 2.8× compared to other VLMs. Importantly, these gains do not come at the expense of accuracy: NVILA performs on par with or better than many models on benchmarks, excelling in visual question answering, video understanding, and document processing tasks. NVIDIA also plans to release NVILA's code and models, fostering greater accessibility and reproducibility.

Technical Details

At the heart of NVILA's efficiency is its "scale-then-compress" strategy. Spatial scaling increases image resolutions to dimensions like 896×896 pixels, compared to the usual 448×448. To mitigate the computational cost of scaling, NVILA uses token compression to retain essential information while reducing the number of tokens. For video inputs, the model processes more frames by applying temporal compression, balancing accuracy and computational efficiency.

NVILA incorporates further innovations to streamline training and fine-tuning. Techniques like FP8 mixed precision and dataset pruning accelerate training and lower memory usage. Adaptive learning rates and parameter-efficient fine-tuning ensure the model can handle domain-specific tasks without excessive resource demands. During deployment, NVILA uses advanced quantization (W8A8 for the vision tower, W4A16 for the language components) to speed up inference while maintaining performance.

Performance Highlights

NVILA's value lies in making advanced VLMs more accessible while addressing the need for efficient AI systems. Some key metrics include:

Training Efficiency: NVILA...
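There is simple token arithmetic behind "scale-then-compress": higher resolution multiplies the patch tokens, and spatial pooling merges them back down. The patch size and pooling factor below are illustrative choices, not NVILA's exact configuration:

```python
# Token arithmetic behind "scale-then-compress": scaling 448 -> 896
# quadruples the patch tokens; 2x2 spatial pooling merges them back 4x.
# Patch size 14 and the 2x2 merge are illustrative, not NVILA's config.

def num_patch_tokens(resolution: int, patch: int = 14) -> int:
    side = resolution // patch
    return side * side

def compress(tokens: int, merge: int = 2) -> int:
    """Merge each merge x merge block of visual tokens into one denser token."""
    return tokens // (merge * merge)

base   = num_patch_tokens(448)        # 32*32 = 1024 tokens
scaled = num_patch_tokens(896)        # 64*64 = 4096 tokens after scaling
final  = compress(scaled)             # back to 1024, but from a sharper image
print(base, scaled, final)
```

The payoff: the language model sees the same token count as before, but each token now summarizes a higher-resolution view of the image.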
-
🚀 Breaking Through Bottlenecks in LLM Inference with Mistral.rs!

As large language models (LLMs) continue to evolve, one challenge keeps growing: the need for speed. LLMs bring immense potential, but they often require substantial computational resources, impacting both cost and user experience, especially in time-sensitive scenarios.

Enter Mistral.rs, a new platform designed for faster, more accessible LLM inference without sacrificing accuracy. Here's why it's a game-changer:

🔹 Device Compatibility: Supports a variety of devices, from high-end GPUs to CPUs and even Apple silicon.
🔹 Quantization for Efficiency: Mistral.rs uses GGML and GPTQ techniques to reduce model size and boost speed while retaining high accuracy.
🔹 Memory Optimization: Features like continuous batching and PagedAttention handle large datasets more efficiently, minimizing out-of-memory issues.

On an A10 GPU, for example, Mistral-7b hits 86 tokens/second with 4_K_M quantization, a remarkable boost!

For developers and data engineers focused on deploying LLMs at scale, Mistral.rs offers a streamlined, cost-effective solution that bridges the gap between performance and practicality. The platform's API compatibility with OpenAI makes integration a breeze.

Curious to explore more on how Mistral.rs can transform real-world AI applications? Let's discuss!

#AI #LLM #DataEngineering #MachineLearning #Inference
Mistral.rs: A Fast LLM Inference Platform Supporting Inference on a Variety of Devices, Quantization, and Easy-to-Use Application with an Open-AI API Compatible HTTP Server and Python Bindings
https://www.marktechpost.com
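Because Mistral.rs exposes an OpenAI-compatible HTTP server, a client only has to build a standard chat-completions payload. The host/port and model name below are assumptions for a hypothetical local deployment, and the actual POST is left commented out so the sketch runs offline:

```python
# Building an OpenAI-style chat-completions request for a local Mistral.rs
# server. The endpoint URL and model name are illustrative assumptions.
import json

BASE_URL = "http://localhost:1234/v1"   # wherever your mistral.rs server listens

def chat_payload(prompt: str, model: str = "mistral-7b") -> dict:
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 256,
    }

body = json.dumps(chat_payload("What is PagedAttention?"))
print(body[:40])
# To send it (requires a running server):
#   import urllib.request
#   req = urllib.request.Request(f"{BASE_URL}/chat/completions", body.encode(),
#                                {"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read())
```

This is the same request shape any OpenAI-client library emits, which is why existing tooling can point at a Mistral.rs endpoint with only a base-URL change.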
-
Created by Mistral AI and NVIDIA, Mistral NeMo 12B (available as a NIM) is an advanced open language model for chatbots, multilingual tasks, coding, and summarization. #Mistral #NVIDIA #NeMo #NIM
Mistral AI and NVIDIA Unveil Mistral NeMo 12B, a Cutting-Edge Enterprise AI Model
blogs.nvidia.com