NVIDIA’s new open-source model, Llama-3.1-Nemotron-70B-Instruct, surpasses GPT-4o and Claude 3.5 Sonnet on several alignment benchmarks (Arena Hard, AlpacaEval 2 LC, MT-Bench) despite its comparatively small 70B parameter count. Key ingredients include RLHF (Reinforcement Learning from Human Feedback) with the REINFORCE algorithm, a custom reward model (Llama-3.1-Nemotron-70B-Reward) that scores response quality, and the HelpSteer2-Preference dataset of annotated prompts, which steers the model toward responses aligned with detailed human preferences. https://lnkd.in/gAjKNwbm
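The post names REINFORCE as the policy-gradient algorithm behind the RLHF step. This is not NVIDIA's training code, just a toy sketch of the REINFORCE idea on a 3-armed bandit, where a noisy reward stands in for the score a learned reward model would produce; every name and number here is illustrative.

import numpy as np

rng = np.random.default_rng(0)

# Toy REINFORCE: the "policy" is a softmax over three logits. In RLHF the same
# update is applied to a language model, with the reward coming from a learned
# reward model instead of this hard-coded bandit.
true_rewards = np.array([0.2, 0.5, 0.9])  # hidden from the policy
logits = np.zeros(3)
lr, baseline = 0.1, 0.0

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for step in range(2000):
    probs = softmax(logits)
    a = rng.choice(3, p=probs)                 # sample an "action" (a response)
    r = true_rewards[a] + rng.normal(0, 0.1)   # noisy reward signal
    baseline += 0.01 * (r - baseline)          # running-average baseline
    grad_log_prob = -probs
    grad_log_prob[a] += 1.0                    # gradient of log softmax(a) w.r.t. logits
    logits += lr * (r - baseline) * grad_log_prob   # REINFORCE update

print("learned policy:", softmax(logits).round(3))  # probability mass concentrates on arm 2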
-
Just catching up on the news: there's a new model from NVIDIA. Their Llama-3.1-Nemotron-70B-Instruct model has quietly emerged as a formidable competitor to GPT-4, outperforming it on key benchmarks. This breakthrough showcases NVIDIA's expanding prowess in AI software development, potentially reshaping the competitive landscape and offering businesses a powerful, cost-efficient alternative for AI applications. https://lnkd.in/eqjN34Mj
llama-3.1-nemotron-70b-instruct Model by NVIDIA | NVIDIA NIM
build.nvidia.com
-
Greetings to fellow GenAI enthusiasts! After testing the preview of NVIDIA's new LLM Nemotron-70B at https://lnkd.in/gvvhvwui, I was excited to find that it lets you customize key LLM parameters, which helped me tune the quality of responses to the task at hand. Today I'm sharing my insights on how to tailor LLM outputs for creativity, predictability, and diversity through parameter configuration.
1) Max number of tokens generated -> This may seem insignificant, but setting a low token limit for simple questions can save real resources, as AI models reportedly consume around three bottles of water for every 100 words generated.
2) Temperature -> This controls the randomness of the model output. Lower temperatures lead to more predictable answers and suit responses based on factual information, such as the birth date of a famous person, a physics law, or a true/false question. Higher temperatures suit creative tasks, such as writing, designing a logo or motto for a company, conversing with the AI as if it were a person, or asking for views on new government policies.
• Low (0-0.25): factual info (e.g., birth dates)
• Medium (0.25-0.75): balanced creativity & diversity (e.g., essays)
• High (0.75-1): highly imaginative & unpredictable outputs (e.g., artistic writing)
3) Top P -> This controls the diversity of the output. A higher top P value like 0.8 allows more candidate words to be considered during sampling, producing more diverse responses. Conversely, a lower top P value like 0.3 limits the choices and generates more focused responses.
• High (>0.5): richer language (e.g., creative writing, improving a speech)
• Low (<0.5): simpler responses geared towards easy comprehension (e.g., explaining complex topics, summarizing a research paper)
4) Presence and frequency penalty -> Parameters that discourage the model from repeating words already present in the generated response. Assigning a higher penalty value, such as 2.0, reduces the likelihood of repetition, which helps when you want to keep the text from over-fixating on a few words or phrases. The frequency penalty also counts how many times a word has already appeared, penalizing each additional repetition more, while the presence penalty only checks whether a word has appeared at all, not how often.
5) Seed -> Providing a new seed value generates a different response to the same prompt, even if the choice of topics or words stays similar. Conversely, fixing the seed lets you reproduce the same output for the same prompt and settings, which makes it much easier to track down and address issues such as bias or incorrect responses.
I hope this overview of LLM parameters helps you get more out of Generative AI. Feel free to share these tips with your connections. Happy prompting! #GenerativeAI #LLMs #TechTrends #AI #NLP #NVIDIA
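For reference, here is a minimal sketch of how these parameters appear in an API call. It assumes the build.nvidia.com preview exposes an OpenAI-compatible endpoint; the base URL, model id, and the NVIDIA_API_KEY environment variable are assumptions rather than details from the post, so check the NIM docs for the exact values.

import os
from openai import OpenAI

# Assumed OpenAI-compatible NIM endpoint and model id; verify against the NIM docs.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

response = client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-70b-instruct",
    messages=[{"role": "user", "content": "Explain top-p sampling in two sentences."}],
    max_tokens=150,        # 1) cap on generated tokens
    temperature=0.2,       # 2) low temperature -> predictable, factual tone
    top_p=0.7,             # 3) nucleus-sampling cutoff for candidate tokens
    presence_penalty=0.0,  # 4) penalize tokens that have appeared at all
    frequency_penalty=0.5, # 4) penalize tokens more each time they repeat
    seed=42,               # 5) fix the seed to make the output reproducible
)
print(response.choices[0].message.content)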
llama-3_1-nemotron-70b-instruct | NVIDIA NIM
build.nvidia.com
-
NVIDIA Shakes Up the LLM Landscape! 🔥 NVIDIA just dropped a game-changer: Llama-3.1-Nemotron-70B-Instruct, an open-source LLM that's outperforming giants like OpenAI GPT-4 and Anthropic Claude 3.5 Sonnet! 🤯
Key highlights:
• Built on Meta's Llama 3.1 70B, fine-tuned with NVIDIA's secret sauce
• Crushing benchmarks: Arena Hard (85.0), AlpacaEval 2 LC (57.6), GPT-4-Turbo MT-Bench (8.98)
• Beating the big players with just 70B parameters
• Fully open-sourced on Hugging Face - model, reward model, AND training data!
Why this matters: NVIDIA's not just dominating chips anymore - they're revolutionizing LLMs. Nemotron proves that lean, mean, open-source machines can compete with the heavyweights.
Try it yourself: https://lnkd.in/d-FisA4x
What do you think? Let's discuss it! 👇 #AI #MachineLearning #NVIDIA #OpenSource #TechInnovation
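Since the post points to the Hugging Face release, here is a minimal sketch of loading the model with the transformers library. The repo id below and the assumption that your machine has enough GPU memory for a 70B model are both things to verify on the model card.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repo id; check the model card before use.
model_id = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 70B weights still need on the order of 140 GB across GPUs
    device_map="auto",           # shard across available GPUs
)

messages = [{"role": "user", "content": "How many r's are in the word 'strawberry'?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))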
llama-3.1-nemotron-70b-instruct Model by NVIDIA | NVIDIA NIM
build.nvidia.com
-
llama-3.1-nemotron-70b-instruct
Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM-generated responses. It targets the HHH objective of RLHF: Helpfulness, Harmlessness & Honesty.
As per NVIDIA's claim, as of 1 Oct 2024 this model is #1 on all three automatic alignment benchmarks (verified tab for AlpacaEval 2 LC), edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet.
Note: per the documentation, this model has not yet been tuned for performance in specialized domains such as math.
Test inference with the new model via the link below. https://lnkd.in/egf3UpH2
llama-3.1-nemotron-70b-instruct Model by NVIDIA | NVIDIA NIM
build.nvidia.com
-
🚨 NEMOTRON-70B TAKES THE LEAD 🏎️
💥 NVIDIA has just dropped a bombshell in the AI world with its new Nemotron-70B, an open-source LLM that beats both OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet.
Built on Llama 3.1, this 70B-parameter model has smashed key alignment benchmarks like Arena Hard, AlpacaEval 2 LC, and MT-Bench, outperforming larger models with ease.
The move signals NVIDIA's push to dominate AI software as well as hardware, and open-source AI is the perfect vehicle for that.
Nemotron-70B is already setting the pace in the generative AI race.
Jensen Huang is cooking. 🔥 #AI #NVIDIA #LLM #Automation #AIAutomation
-
Perplexity uses NVIDIA NeMo to create an AI-powered search engine ✨ Learn how NeMo made it easier for them to:
✅ scale the fine-tuning of #LLMs
✅ create custom models for their online answer engine
✅ develop a new Sonar training model with a 20% improvement over the base model
Read the customer story to learn more 🔗 https://nvda.ws/3ZdHvNm
Perplexity Enhances Model Performance for AI-Powered Search Engines With NVIDIA NeMo
nvidia.com
-
BOOM! Get ready to blast through the barriers of LLM quantization with a bang! 💥 So, you've heard of those massive language models, right? They're like the big, beefy superheroes of the AI world, but boy, do they have a craving for GPUs! It's like they've got an insatiable appetite for computational power, leaving us mere mortals scrambling to keep up. But fear not, for FlattenQuant is here to save the day! 🦸♂️ This game-changing technique is the superhero sidekick we've been waiting for. It swoops in and flattens the outlier channels in tensor data, so the whole tensor can be quantized at a low bit width without blowing up the error. The result? Less GPU memory gobbling and lightning-fast computation; talk about a speed boost! And here's the kicker: it's all about the magic number 4. Just 4 bits handle a big chunk of the linear-layer workload (almost half!), with 8 bits covering the rest, letting FlattenQuant squeeze out maximum efficiency with minimal accuracy loss. 2024 is already bursting at the seams with breakthroughs, and FlattenQuant is leading the charge! Who says innovation has to slow down? With heroes like FlattenQuant on our side, the sky's the limit! 🚀 #Featurepreneur #LLM #AI #Coding
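To make the idea concrete, here is a small numpy sketch of per-tensor quantization plus a toy channel-flattening step in the spirit of the post. It illustrates the general technique only, not FlattenQuant's actual implementation; the threshold, shapes, and function names are made up for the example.

import numpy as np

def quantize_per_tensor(x, bits):
    # Symmetric per-tensor quantization: one scale shared by the whole tensor,
    # so a single outlier channel can dominate the scale and crush everything else.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.clip(np.round(x / scale), -qmax - 1, qmax), scale

def flatten_outlier_channels(x, w, threshold):
    # Toy version of the flattening idea: split any activation channel whose peak
    # exceeds `threshold` into k scaled-down copies and repeat the matching weight
    # row k times, so x @ w is unchanged while the per-tensor max shrinks enough
    # for a 4-bit scale to fit the data.
    new_cols, new_rows = [], []
    for c in range(x.shape[1]):
        k = max(1, int(np.ceil(np.abs(x[:, c]).max() / threshold)))
        new_cols.extend([x[:, c] / k] * k)
        new_rows.extend([w[c]] * k)
    return np.stack(new_cols, axis=1), np.stack(new_rows, axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
x[:, 3] *= 50.0                      # one outlier channel, as seen in real LLM activations
w = rng.normal(size=(16, 4))

x_flat, w_flat = flatten_outlier_channels(x, w, threshold=5.0)
q, scale = quantize_per_tensor(x_flat, bits=4)
approx = (q * scale) @ w_flat        # dequantize, then matmul
print("max abs error vs. x @ w:", np.abs(approx - x @ w).max())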