NVIDIA released their open model Nemotron-4 340B, which comes close to GPT-4 on some benchmarks. 98% of the data used to train its Instruct model is synthetic. The interesting thing is that they are not positioning it as a competitor to other open models like Llama 3. Instead, they are positioning it as a tool to help other developers train better models, or more of them, across different domains. https://lnkd.in/gH-HCqC7
-
What’s the interface for consumers? What differentiates their data and how it was filtered and used? Models are quickly becoming commonplace commodities. The frontier is integrated AI-powered systems built with continuous learning, data dignity, and bespoke UX.
NVIDIA just released an open-source LLM on par with GPT-4. It is completely open, including the training data and model weights. https://lnkd.in/gnAXeTPx NVLM: Open Frontier-Class Multimodal LLMs https://lnkd.in/g5QBaCvc https://lnkd.in/gQ5zvDEn #NVIDIA #megatron #NVLM #multimodal
Nvidia just dropped a bombshell: Its new AI model is open, massive, and ready to rival GPT-4 (venturebeat.com)
-
Hey #devs, check out this technical blog for a new way to use NVIDIA's Nemotron-4 340B model for Synthetic Data Generation (SDG)! https://lnkd.in/gJaU-ivG #developer #learning #llm #gpt #mlops #syntheticdata #data #datascience #llmops #ai #ml #nemotron #nemo
Leverage the Latest Open Models for Synthetic Data Generation with NVIDIA Nemotron-4 340B | NVIDIA Technical Blog (developer.nvidia.com)
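The blog describes a generate-then-filter loop: the Instruct model produces candidate responses and the Reward model scores them so only the best pairs become training data. Below is a minimal sketch of that pattern, assuming an OpenAI-compatible endpoint hosting the Instruct model; the endpoint URL is an assumption, and the scoring helper is a hypothetical stand-in for a real reward-model call, not the blog's exact pipeline.

```python
# Hedged sketch: a generate-then-filter synthetic-data loop in the spirit of
# the blog post. The base_url is an assumed OpenAI-compatible endpoint, and
# score_with_reward_model() is a placeholder for the Nemotron-4 340B Reward
# model, which the real pipeline uses to rate candidates.
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

def generate_candidates(prompt: str, n: int = 4) -> list[str]:
    """Sample several candidate responses for one synthetic-data prompt."""
    completions = [
        client.chat.completions.create(
            model="nvidia/nemotron-4-340b-instruct",
            messages=[{"role": "user", "content": prompt}],
            temperature=1.0,  # higher temperature -> more diverse candidates
        )
        for _ in range(n)
    ]
    return [c.choices[0].message.content for c in completions]

def score_with_reward_model(prompt: str, response: str) -> float:
    """Placeholder heuristic standing in for a real reward-model call;
    swap in the Nemotron-4 340B Reward model before doing actual SDG."""
    return -abs(len(response.split()) - 150)  # prefer mid-length answers

# Keep only the highest-scoring candidate per prompt as training data.
prompt = "Explain gradient checkpointing to a new ML engineer."
best = max(generate_candidates(prompt), key=lambda r: score_with_reward_model(prompt, r))
print(best)
```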
-
NVIDIA released a new Llama 3 fine-tune trained specifically for retrieval-augmented generation (RAG) tasks. It beats even GPT-4 on RAG-oriented benchmarks, and even the 8B-parameter model is almost as good as GPT-4. This is good news for basically everyone, since RAG is one of the most profitable applications of LLMs. https://lnkd.in/e728hc2T
nvidia/Llama3-ChatQA-1.5-8B · Hugging Face (huggingface.co)
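For anyone who wants to try the 8B model, a minimal sketch using plain transformers follows. The flat "System:/User:/Assistant:" prompt layout mirrors the convention on the model card, but verify the exact format there before relying on it.

```python
# Hedged sketch: querying the ChatQA RAG fine-tune with retrieved context.
# Standard transformers usage; the plain-text System:/User:/Assistant: prompt
# convention follows the model card, but double-check it there.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama3-ChatQA-1.5-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

context = "Retrieved passage: NVIDIA's ChatQA-1.5 models are tuned for RAG."
question = "What are the ChatQA-1.5 models tuned for?"
prompt = (
    "System: You are a helpful assistant. Answer using only the given context.\n\n"
    f"{context}\n\n"
    f"User: {question}\n\nAssistant:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```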
-
Nvidia unveils NVLM 1.0: a new family of open-source multimodal language models

🔍 Key highlights:
- Open-source models with performance comparable to GPT-4 and Claude 3.5
- Model weights available under the *Creative Commons Attribution Non Commercial 4.0 International* license (commercial use not allowed)
- Hybrid architecture that processes a variety of inputs: text, images, memes, math problem-solving, etc.
- Outperforms Llama 3.2 90B on several benchmarks despite having fewer parameters

💻 Technical highlights:
- Flagship model: NVLM-D-72B
- Outstanding results on benchmarks such as MathVista (65.2), OCRBench (852), and VQA
- Significant improvement in text-only performance after multimodal training (+4.3 points on average)
- Supports large-scale deployments with multi-GPU distributed loading

Nvidia continues to push the boundaries of AI by combining vision and language with this next-generation model.
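On that last point, here is a minimal loading sketch. It assumes the checkpoint ships custom modeling code (hence trust_remote_code) and that accelerate's automatic layer sharding is acceptable; the model card may recommend a hand-tuned device map instead.

```python
# Hedged sketch: sharding a large multimodal checkpoint like NVLM-D-72B
# across several GPUs. device_map="auto" lets accelerate place layers on all
# visible devices; a custom device map (per the model card) may work better.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "nvidia/NVLM-D-72B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",        # shard the 72B weights across available GPUs
    trust_remote_code=True,   # NVLM uses custom model classes
    low_cpu_mem_usage=True,   # stream weights instead of materializing on CPU
)
```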
-
Read my latest blog on how to determine the amount of GPU memory required to run Large Language Models (LLMs). #artificialintelligence #llms #nvidia #gpu #gpumachine #generativeai
How can you determine the amount of GPU memory required to run Large Language Models (LLMs)? (medium.com)
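The blog's exact method isn't reproduced here, but a widely used rule of thumb is: weight memory = parameter count × bytes per parameter, plus roughly 20% overhead for the KV cache, activations, and framework buffers. A quick sketch:

```python
# Hedged sketch: a common rule-of-thumb estimate (not necessarily the blog's
# exact method). Weight memory = parameters x bytes per parameter, plus ~20%
# overhead for KV cache, activations, and framework buffers.
def estimate_gpu_memory_gb(params_b: float, bytes_per_param: float = 2.0,
                           overhead: float = 1.2) -> float:
    """params_b: parameter count in billions.
    bytes_per_param: 2 for FP16/BF16, 1 for INT8, 0.5 for 4-bit quantization."""
    return params_b * bytes_per_param * overhead

# A 70B model served in FP16 needs roughly 70 * 2 * 1.2 = 168 GB, i.e. at
# least two 80 GB GPUs; 4-bit quantization brings it down to around 42 GB.
print(estimate_gpu_memory_gb(70))       # ~168.0
print(estimate_gpu_memory_gb(70, 0.5))  # ~42.0
```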
-
𝐍𝐕𝐈𝐃𝐈𝐀 𝐨𝐮𝐭𝐩𝐞𝐫𝐟𝐨𝐫𝐦𝐬 𝐆𝐏𝐓-𝟒𝐨 𝐚𝐧𝐝 𝐒𝐨𝐧𝐧𝐞𝐭 𝟑.𝟓

NVIDIA has released a fine-tuned version of Llama 3.1 70B that outperforms both OpenAI's GPT-4o and Anthropic's Sonnet 3.5 on multiple benchmarks. The model was trained using Reinforcement Learning from Human Feedback (RLHF), with Llama-3.1-Nemotron-70B-Reward and HelpSteer2-Preference prompts on a Llama-3.1-70B-Instruct model as the initial policy.

The models are now available worldwide under the Llama 3.1 license on Hugging Face, and according to the model card, Llama-3.1-Nemotron-70B-Instruct is "#1 on all three automatic alignment benchmarks, edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet" as of October 1st.

🔗 Hugging Face repository: https://lnkd.in/dqexeux6
🔗 Paper: https://lnkd.in/db7YfEDP
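A minimal usage sketch follows, assuming the -HF checkpoint follows the standard Llama 3.1 chat template (which is the point of the -HF variant); note that a 70B model in BF16 needs multiple GPUs, hence device_map="auto".

```python
# Hedged sketch: querying the -HF checkpoint with plain transformers,
# assuming it uses the standard Llama 3.1 chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=64)
# Decode only the generated continuation, skipping the prompt tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```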
-
The Predibase Inference Engine's GPU autoscaling offers a practical way to reduce deployment costs for serving small language models. Always-on GPU setups incur expenses whether they’re needed or not, while autoscaling adjusts resources based on demand. For example, an enterprise workload that would cost over $213,000/year with an always-on deployment can drop to under $155,000/year with autoscaling—saving nearly 30% with no impact on performance. (And both options are still more affordable than running fine-tuned GPT-4o-mini.) Autoscaling ensures you get the performance you need when you need it, without paying for idle infrastructure. Learn more about how our Inference Engine can streamline your SLM deployments: https://pbase.ai/3YqHu8x
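A quick sanity check of the quoted savings, using the numbers from the post:

```python
# Verifying the post's cost comparison (figures taken from the post itself).
always_on_cost = 213_000   # $/year, always-on GPU deployment
autoscaled_cost = 155_000  # $/year, with GPU autoscaling
savings = (always_on_cost - autoscaled_cost) / always_on_cost
print(f"{savings:.1%}")    # ~27.2%, i.e. "nearly 30%"
```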
-
NVIDIA just released Nemotron-4, and it outperformed GPT-4.

Key highlights:
✅ The big unlock is synthetic data generation, which mimics real-world data and allows for training and fine-tuning LLMs
🔓 Nemotron-4 democratizes LLM training, enabling organizations to rapidly develop custom language models tailored to their specific domains and use cases, without the high costs and challenges of acquiring extensive datasets

Read the announcement below
#NVIDIA #SyntheticData #LLMs #GenerativeAI https://lnkd.in/eRVavfD7
NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models (blogs.nvidia.com)
-
Here are some non-obvious reasons why Small Language Models could be more useful than you thought:

1. We can get better performance using test-time compute scaling: to match the performance of a larger model, you can use a smaller model with more test-time compute. The total amount of compute could be the same, but you don't need fancy GPUs with large amounts of memory. Perhaps your MacBook could do wonders one day. Noam Brown, for example, found that for Poker AI you have to scale the model MUCH bigger to match the performance of a small model given a longer (30-second) test-time compute budget. The corollary should be obvious, and it doesn't hold just for Poker AI. The title of https://lnkd.in/gwuzzGVv ("Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters") is self-explanatory. A minimal best-of-N sketch of this idea follows after this list.

2. Smaller models could be better than larger models at generating data for fine-tuning. This is explored in https://lnkd.in/g9FCSJ3G, where the authors see higher coverage and diversity in the data generated by smaller models, which could be helpful.
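Here is that best-of-N sketch: the simplest form of test-time compute scaling, where a small model samples n candidates and a verifier picks the best. generate() and score() are illustrative stand-ins, not any particular library's API; the papers above cover more sophisticated strategies such as process reward models and search.

```python
# Hedged sketch: best-of-N sampling, one simple form of test-time compute
# scaling. generate() and score() are illustrative stand-ins for a small LM
# and a verifier/reward model.
import random

def generate(prompt: str) -> str:
    """Stand-in for sampling one answer from a small language model."""
    return f"candidate answer #{random.randint(0, 9999)} to: {prompt}"

def score(prompt: str, answer: str) -> float:
    """Stand-in for a verifier or reward model rating an answer."""
    return random.random()

def best_of_n(prompt: str, n: int) -> str:
    # Spending more inference compute = larger n; no bigger model needed.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda a: score(prompt, a))

print(best_of_n("Prove that the sum of two even numbers is even.", n=16))
```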
-
🚀 Excited to share my latest project for the Generative AI Agents Developer Contest by NVIDIA and LangChain! 🌟

Introducing NVIDIA LangChain DocsAgent: a RAG agent you can ask questions about retrieved API and framework documentation. This tool is powered by NVIDIA's cutting-edge AI models and the LangChain🦜️🔗 framework, and is deployed with LangServe🦜️🏓 and LangSmith🦜🛠️ for chain-trace monitoring. 📚🤖

🔍 Key features:
- Natural-language querying of documentation
- Efficient document retrieval and ranking
- Customizable database creation with retrieved data

🔧 Built with:
- NVIDIA NIM API (nv-rerank-qa-mistral-4b): re-ranks retrieved documents based on relevance to the question.
- NVIDIA NIM API (nvidia/embed-qa-4): generates high-quality embeddings for documentation chunks for use in similarity searches.
- NVIDIA NIM API (meta/llama3-70b): provides the LLM that generates coherent responses.
- langchain_nvidia_ai_endpoints: integrates NVIDIA AI models with LangChain for embedding generation, document ranking, and LLM interactions.
- LangChain: core framework for building the RAG agent, facilitating smooth integration between document retrieval, prompt crafting, and response generation.
- LangSmith: provides advanced monitoring and analytics for tracing and debugging interactions within the RAG agent.
- LangServe: deploys and manages the agent as a web service, handling user queries and interacting with the Chroma database.

GitHub repo: https://lnkd.in/gd7fjrKp
🎥 Check out this demo video to see the DocsAgent in action!
#NVIDIADevContest #LangChain NVIDIA AI
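For a feel of how these pieces compose, here is a hedged sketch of the core retrieve → re-rank → generate chain using the components named above. Model ids are copied from the post (the chat model id may need an -instruct suffix in practice), and class signatures in langchain_nvidia_ai_endpoints can vary between versions, so treat this as a sketch rather than the project's exact code.

```python
# Hedged sketch of a DocsAgent-style RAG chain: Chroma retrieval, NVIDIA
# re-ranking, and an NVIDIA-hosted Llama 3 model for generation.
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings, NVIDIARerank
from langchain_community.vectorstores import Chroma
from langchain.retrievers import ContextualCompressionRetriever
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

embeddings = NVIDIAEmbeddings(model="nvidia/embed-qa-4")  # doc-chunk embeddings
vectorstore = Chroma(collection_name="docs", embedding_function=embeddings)
retriever = ContextualCompressionRetriever(
    base_compressor=NVIDIARerank(model="nv-rerank-qa-mistral-4b"),  # re-rank hits
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 20}),
)
llm = ChatNVIDIA(model="meta/llama3-70b-instruct")

prompt = ChatPromptTemplate.from_template(
    "Answer from the documentation below.\n\n{context}\n\nQuestion: {question}"
)

def answer(question: str) -> str:
    # Retrieve broadly, re-rank down to the most relevant chunks, then generate.
    docs = retriever.invoke(question)
    context = "\n\n".join(d.page_content for d in docs)
    chain = prompt | llm | StrOutputParser()
    return chain.invoke({"context": context, "question": question})
```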