🚀 NVIDIA launches Llama-3.1-Nemotron-70B! https://lnkd.in/efahSbKW

📊 As of Oct 1, 2024, it's topping the charts:
• Arena Hard: 85.0
• AlpacaEval 2 LC: 57.6
• MT-Bench: 8.98

🔑 Key points:
• Outperforms other models across multiple benchmarks
• Longer responses (avg 2199.8 characters)
• Consistent performance (narrow confidence interval)

𝗛𝗲𝗿𝗲 𝗶𝘀 𝘀𝗼𝗺𝗲 𝗶𝗻𝗳𝗼𝗿𝗺𝗮𝘁𝗶𝗼𝗻 𝗮𝗯𝗼𝘂𝘁 𝘁𝗵𝗲 𝗘𝘃𝗮𝗹 𝗺𝗲𝘁𝗿𝗶𝗰𝘀 𝘂𝘀𝗲𝗱

𝑨𝒓𝒆𝒏𝒂-𝑯𝒂𝒓𝒅 is a challenging AI benchmark created from real user queries on Chatbot Arena. It uses the BenchBuilder pipeline to select 500 high-quality, diverse prompts that test language models on complex, real-world tasks across various domains. The process involves:
* Question source: real user queries from Chatbot Arena (initially 200,000)
* Question selection process:
  - The BenchBuilder pipeline filters and evaluates queries
  - An LLM judge (GPT-4-Turbo) scores questions on 7 key qualities:
    - Specificity
    - Domain knowledge
    - Complexity
    - Problem-solving
    - Creativity
    - Technical accuracy
    - Real-world relevance
  - Topic modeling clusters similar queries
  - High-quality clusters are selected
  - 500 challenging prompts are sampled for the final benchmark

Evaluation uses an LLM-as-judge approach, comparing each model's outputs to a baseline model's answers. This method provides a more comprehensive, updatable, and cost-effective evaluation than previous benchmarks, better separating top models and aligning well with human judgments.

Arena-Hard paper: https://lnkd.in/eh_hK7NK
GitHub: https://lnkd.in/e9qBbd3E

#NVIDIA #AI #LLM #TechInnovation
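To make the LLM-as-judge setup concrete, here is a minimal sketch of a pairwise comparison against a baseline answer. This is not the Arena-Hard implementation (see the GitHub link above for that): the judge prompt, the verdict format, and the `judge_pair` helper are illustrative assumptions, and it assumes an OpenAI-compatible client with an API key in the environment.

```python
# Minimal LLM-as-judge sketch (illustrative only, not the Arena-Hard code).
# Assumes the `openai` Python client; the prompt and verdict scheme are
# simplified assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are an impartial judge. Compare two answers to the same
user question and reply with exactly one verdict: A>>B, A>B, A=B, B>A, or B>>A.

[Question]
{question}

[Answer A (baseline)]
{baseline}

[Answer B (candidate)]
{candidate}
"""

def judge_pair(question: str, baseline: str, candidate: str,
               judge_model: str = "gpt-4-turbo") -> str:
    """Ask a judge model to compare a candidate answer against a baseline."""
    response = client.chat.completions.create(
        model=judge_model,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, baseline=baseline, candidate=candidate)}],
        temperature=0.0,
    )
    return response.choices[0].message.content.strip()

# Example usage with toy strings:
# verdict = judge_pair("Explain KV caching.", baseline_answer, candidate_answer)
```

In the real benchmark, these pairwise verdicts are aggregated across the 500 prompts into a win rate against the baseline model.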
-
Ever imagine an AI that could not only chat with you about Shakespeare’s sonnets but also critique the artistry in your vacation photos? Enter NVIDIA’s latest marvel, the NVLM-D-72B model, a veritable Swiss Army knife in the AI world, blending text and visuals with the ease of a seasoned bartender mixing your favorite cocktail 🍹.

NVIDIA’s brainchild, the NVLM-D-72B, isn’t just smart—it’s a smart aleck that can understand pictures and words together 📸📖. Picture this: an AI that can look at a doodle and not only tell you what it sees but also spin up a story about it! Trained on NVIDIA’s ultra-sophisticated Megatron-LM and cozying up with Hugging Face, this model is like having a supercomputer in your pocket, only it doesn’t weigh you down.

Whether you’re looking to spice up your meme game or develop an app that finally understands fashion advice from a picture, NVLM-D-72B is your go-to 🚀. So, if you’re ready to ride the wave of AI innovation, NVIDIA’s got your ticket to the future, and it’s first class! Dive into the details on their Hugging Face page, and let’s get this techno-party started! 🎉

For a deeper dive and more giggles, check out the full tech extravaganza at NVIDIA's Hugging Face space:

#AI #MachineLearning #DataScience #NVIDIA #Technology #Innovations #DeepLearning #AITransformation
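For anyone who wants to poke at it, here is a minimal loading sketch with Hugging Face transformers. The repo id `nvidia/NVLM-D-72B` is my assumption for the checkpoint referenced above, and the dtype/device settings are illustrative; the actual image-plus-text chat call is model-specific custom code, so follow the model card for that part.

```python
# Minimal sketch: pull NVLM-D-72B from Hugging Face (assumes the transformers
# and accelerate libraries, plus enough GPU memory for a ~72B-parameter model).
import torch
from transformers import AutoModel, AutoTokenizer

repo_id = "nvidia/NVLM-D-72B"  # assumed repo id; check NVIDIA's Hugging Face page

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,   # assumption: bf16 to roughly halve memory use
    trust_remote_code=True,       # the repo ships custom multimodal modeling code
    device_map="auto",            # shard across whatever GPUs are available
)

# The combined image+text chat interface is defined by the repo's custom code;
# see the model card on Hugging Face for the exact method and prompt format.
```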
-
NVIDIA has quietly released its new open-source model, Llama-3.1-Nemotron-70B, which is reported to surpass OpenAI's GPT-4o and Anthropic's Claude 3.5 on several key benchmarks with fewer parameters.

Key features of Llama-3.1-Nemotron-70B:

- Can accurately answer "How many r's are in strawberry?" 😅
- Performance: The model scores well on several alignment benchmarks, for example:
  - Arena Hard: 85.0
  - AlpacaEval 2 LC: 57.6
  - MT-Bench (GPT-4-Turbo judge): 8.98
  These scores show that the Nemotron model not only holds its own but beats larger counterparts that have more parameters.
- Architecture: The Nemotron model is built on the Llama 3.1 framework and uses the transformer architecture to generate coherent responses. With a relatively modest 70 billion parameters, it is efficient and capable at handling user queries.
- Open-source availability: The model, its corresponding reward model, and its training dataset are open sourced and available on Hugging Face, so there's plenty of scope to experiment with the model.
- RLHF (Reinforcement Learning from Human Feedback): The model was aligned with RLHF, using the REINFORCE algorithm to optimize for human preferences.

If you are interested in exploring or trying out this model, visit NVIDIA's official page.

#AI #NVIDIA #Llama #MachineLearning #OpenSource #Innovation
https://lnkd.in/gd-_p_rP
nvidia model by llama-3_1-nemotron-70b-reward | NVIDIA NIM
build.nvidia.com
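If you want to try the strawberry question yourself, here is a minimal transformers sketch for the instruct model. The repo id is the transformers-format checkpoint listed on Hugging Face; treat the exact id, dtype, and device settings as assumptions, and note that a 70B model realistically needs multiple large GPUs or a quantized build.

```python
# Minimal sketch: query Llama-3.1-Nemotron-70B-Instruct with transformers.
# Assumes transformers + accelerate and enough GPU memory for 70B weights.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",  # repo id as listed on Hugging Face
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "How many r's are in strawberry?"}]
out = pipe(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])
```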
-
🚀 Breakthrough in Audio Transcription Speed! 🎙️

Imagine transcribing a feature-length movie in less time than it takes to make a cup of coffee. That's now possible with 'insanely-fast-whisper', a game-changing GitHub project.

Key highlights:
• Transcribe 2.5 hours of audio in just 98 seconds
• Works locally on Mac or Nvidia GPUs
• Combines Whisper + Pyannote for rapid transcription and speaker segmentation

For the tech-savvy, here's a quick setup:
1. pip install insanely-fast-whisper
2. Run with your file and settings

This tool isn't just fast—it's revolutionizing how we process audio data. Think about the implications for:
• Journalists transcribing interviews
• Researchers analyzing focus groups
• Content creators captioning videos

What would you do with local and near-instant transcription? Share your ideas below! 👇
Covering the latest in AI R&D • ML-Engineer • MIT Lecturer • Building AlphaSignal, a newsletter read by 200,000+ AI engineers.
You can now transcribe 2.5 hours of audio in 98 seconds, locally.

A new implementation called insanely-fast-whisper is blowing up on GitHub. It works on Mac or Nvidia GPUs and uses the Whisper + Pyannote libraries to speed up transcription and speaker segmentation.

Here's how you can use it:

pip install insanely-fast-whisper

insanely-fast-whisper --file-name <FILE NAME or URL> --batch-size 2 --device-id mps --hf_token <HF TOKEN>

♻️ Repost this if you found it useful.

↓ Are you technical? Check out https://AlphaSignal.ai to get a daily summary of breakthrough models, repos and papers in AI. Read by 200,000+ devs.
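If you would rather call the same machinery from Python, the CLI is essentially a wrapper around chunked, batched Whisper inference with the transformers ASR pipeline; here is a rough equivalent. The chunk length and batch size are assumptions to tune for your hardware, and speaker diarization with Pyannote is a separate step not shown.

```python
# Rough Python equivalent of what the CLI wraps: chunked, batched Whisper
# inference via the transformers ASR pipeline. Values below are assumptions
# to tune for your hardware; Pyannote diarization is a separate step.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float16,
    device="cuda:0",              # use "mps" on Apple Silicon
)

result = asr(
    "meeting.mp3",                # path or URL to your audio file (hypothetical name)
    chunk_length_s=30,            # split long audio into 30-second chunks
    batch_size=8,                 # raise or lower to fit your GPU memory
    return_timestamps=True,
)
print(result["text"])
```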
-
This looks like it could be very useful. Thinking of the thousands of possible scenarios. Hopefully this can be automated, deployed, and scaled beyond local use. Repo: https://lnkd.in/ePmTCfsK
-
Is having a background in AI imperative to get into Generative AI? Maybe not.

Maybe not if you want to build a RAG pipeline, but most definitely yes if you want to fine-tune LLMs.

I consider myself lucky to have that background, so I was able to transition seamlessly into fine-tuning models. Honestly, though, most work you might do with LLMs involves building a RAG pipeline with prompt engineering. With AutoGen, it has become even easier to set up a RAG workflow.

So do you need a background in AI to work with LLMs?
- YES - if you need to fine-tune models
- NOT NECESSARILY - if you want to build a RAG pipeline

PS: If you have an NVIDIA RTX 4070 or better powering your laptop, you can just about fine-tune your own LLMs using PEFT, or use Google Colab to try it out.

#llm #rag
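For the fine-tuning side of this, here is a minimal PEFT/LoRA sketch of the kind that fits on a single consumer GPU. The base model id, target modules, and hyperparameters are illustrative assumptions, not a recipe.

```python
# Minimal PEFT/LoRA sketch: wrap a small causal LM with low-rank adapters so
# only a tiny fraction of parameters is trained. The model id and
# hyperparameters are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Llama-3.2-1B"   # assumption: any small causal LM works here

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor for the updates
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # typically well under 1% of all weights
```

From here you would pass the wrapped model to your usual Trainer or training loop; only the adapter weights are updated, which is what makes a laptop-class GPU viable.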
-
A couple of days ago, Nvidia released a new breed of Attention-SSM hybrids called Hymba. Since their approach is really groundbreaking, I compiled a somewhat exhaustive but beginner-friendly summary: https://lnkd.in/eAUggsnt

Long story short: they run Attention and Mamba in parallel at each layer, where Mamba serves as long-term memory and Attention (mostly sliding-window, except on the first, middle, and last layers) as short-term memory with perfect recall.

They also introduced meta tokens, which smooth out attention's softmax distribution to avoid attention sinks and at the same time bootstrap Mamba's internal state. To reduce memory requirements, they share the KV cache between two consecutive layers, bringing the cache's memory requirement down by 20x compared to a vanilla attention model.

To make it even better, their 1.5B model, trained on only 1.5T tokens, still achieves state of the art among sub-2B models, with nearly 3x higher throughput.

#AI #LLM #SSM #Attention #NVIDIA
Hymba, a new breed of SSM-Attention Hybrids
n1o.github.io
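To make the parallel-fusion idea concrete, here is a toy PyTorch sketch of one hybrid block: an attention branch and an SSM-style branch run on the same input, and their outputs are averaged before the residual connection. This is a conceptual illustration only, not NVIDIA's Hymba implementation: the "SSM" branch is a gated depthwise-convolution stand-in for Mamba, and meta tokens, sliding-window attention, and cross-layer KV-cache sharing are all omitted.

```python
# Toy sketch of a parallel attention + SSM-style hybrid block (conceptual only,
# not the Hymba implementation). The "SSM" branch is a gated depthwise-conv
# stand-in for Mamba; meta tokens, sliding windows, and KV sharing are omitted.
import torch
import torch.nn as nn

class ToyHybridBlock(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        # Short-term, high-resolution branch: standard multi-head attention.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Long-term "memory" branch: a causal depthwise conv + gate as a cheap
        # stand-in for a real SSM such as Mamba.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=4, padding=3, groups=d_model)
        self.gate = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)                                    # (batch, seq, d_model)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        conv_out = self.conv(h.transpose(1, 2))[..., : h.size(1)].transpose(1, 2)
        ssm_out = conv_out * torch.sigmoid(self.gate(h))    # gated long-range branch
        fused = 0.5 * (attn_out + ssm_out)                  # average the two branches
        return x + self.out(fused)                          # residual connection

x = torch.randn(2, 16, 256)
print(ToyHybridBlock()(x).shape)   # torch.Size([2, 16, 256])
```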
-
Nemotron-4 340B Models: A Game-Changer in AI Performance and Accessibility 🚀

In a remarkable breakthrough, NVIDIA has unveiled the Nemotron-4 340B model family, which is set to revolutionize the AI landscape. These cutting-edge models, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward, have demonstrated exceptional performance across a wide range of evaluation benchmarks.

What sets the Nemotron-4 340B models apart is their unique combination of state-of-the-art architecture, extensive pretraining data, and innovative alignment techniques.

💡 Key Highlights:
1. Nemotron-4-340B-Base boasts an impressive 9.4 billion embedding parameters and 331.6 billion non-embedding parameters, trained using the powerful NVIDIA Hopper architecture.
2. The models leverage a diverse pretraining data blend, encompassing English and multilingual natural language data, as well as source code from 43 programming languages.
3. Nemotron-4-340B-Reward plays a crucial role in the alignment process, serving as a judge for preference ranking and quality filtering, and achieving the highest accuracy on the RewardBench benchmark.

The implications of these models are far-reaching. Nemotron-4-340B-Base excels in commonsense reasoning tasks, while Nemotron-4-340B-Instruct surpasses other instruct models in instruction following and chat capabilities. 🧠💬

Moreover, the Nemotron-4 340B models showcase the effectiveness of synthetic data generation, with over 98% of the alignment data being synthetically generated. This opens up new possibilities for creating high-quality, domain-adaptive data. 📈

The release of these models under the NVIDIA Open Model License Agreement is a testament to NVIDIA's commitment to advancing AI accessibility and collaboration. Researchers and developers now have the opportunity to leverage these powerful models, distribute their work, and contribute to the AI community. 🤝

What are your thoughts on the potential impact of the Nemotron-4 340B models on the AI industry? How do you see these models being applied in various domains? Share your insights in the comments below! 💬

Let's embrace this exciting development and explore the limitless possibilities that the Nemotron-4 340B models bring to the table. Together, we can shape the future of AI! 🌟

#Nemotron4 #AIBreakthrough #NVIDIAHopper
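The generate-then-filter loop described above can be sketched in a few lines. Everything below is a conceptual outline only: `generate_candidates` and `score_with_reward_model` are hypothetical placeholders for calls to the Instruct and Reward models (hosted or local), and the score threshold is an arbitrary assumption.

```python
# Conceptual outline of synthetic-data generation with reward-model filtering.
# `generate_candidates` and `score_with_reward_model` are hypothetical
# placeholders for calls to Nemotron-4-340B-Instruct and -Reward; the score
# threshold is an arbitrary assumption.
from typing import Callable

def build_synthetic_dataset(
    prompts: list[str],
    generate_candidates: Callable[[str, int], list[str]],
    score_with_reward_model: Callable[[str, str], float],
    per_prompt: int = 4,
    min_score: float = 0.7,
) -> list[dict]:
    dataset = []
    for prompt in prompts:
        # 1) Draw several candidate responses from the instruct model.
        candidates = generate_candidates(prompt, per_prompt)
        # 2) Score each (prompt, response) pair with the reward model.
        scored = [(resp, score_with_reward_model(prompt, resp)) for resp in candidates]
        # 3) Keep only the responses the reward model rates highly.
        kept = [(resp, score) for resp, score in scored if score >= min_score]
        dataset.extend({"prompt": prompt, "response": r, "reward": s} for r, s in kept)
    return dataset
```

The filtered pairs then become training data for a smaller model, which is the use case NVIDIA highlights for this family.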
-
NVIDIA has released Nemotron-4 340B! This 340-billion-parameter #LLM family includes base, instruct, and reward models released under an open license agreement and positioned for generating synthetic data to train smaller language models. NVIDIA notes that 98%+ of the data used in model alignment was synthetically generated.

You can read all the details below!

📢 Announcement: https://lnkd.in/gx-VJQux
🤓 Technical Report: https://lnkd.in/gsxjXNCW
🤗 Model on Hugging Face: https://lnkd.in/gyU5gGGb
🔍 Analysis on /r/LocalLLaMa from Daniel Han: https://lnkd.in/gmix7qHb

#largelanguagemodels #generativeai
Leverage the Latest Open Models for Synthetic Data Generation with NVIDIA Nemotron-4 340B | NVIDIA Technical Blog
developer.nvidia.com