Yann LeCun’s Post

Yann LeCun

🥁 Llama3 is out 🥁 8B and 70B models available today. 8k context length. Trained with 15 trillion tokens on a custom-built 24k GPU cluster. Great performance on various benchmarks, with Llama3-8B doing better than Llama2-70B in some cases. More versions are coming over the next few months. https://lnkd.in/dNZx72FJ

Cesar Garcia

Independent researcher at La Hora Maker

7mo

I am quite happy about the release of Llama 3. The only thing I hoped would change (and would make Llama 3 much more useful) is this clause of the license: "You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Meta Llama 3 or derivative works thereof)." So I think we should stick to the Mistral series to generate any kind of open-sourced synthetic datasets.

Karan Sachdeva

IBM AWS Global Strategic Alliance Leader for AI and Data @ IBM

7mo

Just explained the Llama 3 value prop to a customer: imagine Llama3 as a highly skilled librarian who has read an enormous number of books (15 trillion tokens' worth, to be exact!). This librarian comes in two versions: one juggles 8 billion parameters (the simpler version) and the other 70 billion (the more complex version), letting them pull from a vast amount of knowledge to answer questions. The librarian also has a working memory that can retain a context of up to 8,000 tokens at once, helping them remember and connect details over a long conversation.
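To put rough numbers behind that librarian's working memory, here is a quick illustrative sketch (not official code; the Hub model ID is an assumption and the repo is gated behind Meta's license) that counts the tokens in a prompt and compares the count to the roughly 8,192-token context window:

```python
# Quick illustration of the ~8k-token "memory": the tokenizer turns text into
# tokens, and the model can only attend to about 8,192 of them at once.
# Assumed model ID; downloading it requires accepting Meta's license on the Hub.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

text = "Summarize this chapter for me. " * 2000   # a deliberately long prompt
n_tokens = len(tokenizer(text)["input_ids"])

print(f"{n_tokens} tokens vs. an 8,192-token context window")
print("fits" if n_tokens <= 8192 else "needs truncation or chunking")
```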

Thank you Yann. Why isn't there a comparison with GPT-4? I can bet this is going to make new waves in the open-source community, and Hugging Face is soon going to be flooded with its fine-tuned versions. 😃

Does anyone know how to offload the model downloaded from Hugging Face? It's too heavy for my computational resources.
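In case it helps, below is a minimal sketch of one common way to do this with Hugging Face transformers and accelerate: device_map="auto" splits the weights across GPU, CPU RAM, and disk. The model ID, memory budgets, and offload folder are assumptions to adjust for your own hardware, and the gated repo requires accepting Meta's license first.

```python
# Minimal offloading sketch (pip install transformers accelerate).
# Assumed model ID and memory limits; tune them to your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,                 # halves memory vs. float32
    device_map="auto",                         # let accelerate place layers on GPU/CPU
    max_memory={0: "10GiB", "cpu": "30GiB"},   # assumed per-device budgets
    offload_folder="offload",                  # spill whatever doesn't fit to disk
)

prompt = "Explain what a context window is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If even the offloaded fp16 weights are too large, 4-bit quantization via bitsandbytes (a BitsAndBytesConfig with load_in_4bit=True passed as quantization_config) is another option worth trying.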

Tarek Khalil (TK)

I help sales teams succeed using #WhatsApp 🚀📊📈

7mo

Groq supporting it soon? 🙏🙏🙏

Dean Horak

Sr Software Engineer at Leidos, certified AI/Machine Learning developer

7mo

I'm a bit disappointed that it is only trained on data up to December 2022 and does not have internet access. That is quite stale compared to the leading LLMs.

Anshuman (Ansh) Pandey

Co-founder and CEO | Driving GenAI adoption at Enterprises

7mo

Sire, you are a legend! 🫡 P.S. Llama 3 is live on Tune AI

Dmitriy Kalyada

🚀 Founder | 🧠 @ Allessent <-> AI Ecosystem Architect | 👗💻 Digital Fashion Disruptor

7mo

🏎️ The Llama family is a key GenAI driver. Since the first Llama family models, it has also served as a symbol of responsible AI in the industry, being an open-source platform. 🦙🏆 Llama3 has raised the bar for research and development in the LLM-based product domain, pushing efficiency to new heights and setting a new standard for the field.

Konstantine Arkoudas

Chief AI Officer & Architect

7mo

Congratulations, the numbers are remarkable. If I'm reading the benchmarks correctly, the 70B version outperforms GPT-4 on MATH and GPQA. However, a context size of 8K is underwhelming in April 2024. Looking forward to upcoming releases! 

Manish Sainani 🤫

Founder, Engineer & CEO @ hushh 🤫, ex-Google Director of Product - Core Developer & ML, GPU TPU AI/ML Infra, Board Member, Tech Advisor & Investor, ex-Microsoft AI, Splunk AI/ML, E&Y / Capgemini, UBS, ING, Adobe, Sun

7mo

Artificial Intelligence (AI) can significantly enhance data accessibility by automating the classification, cataloging, and quality control of data. This automation streamlines the process of making data easily accessible and usable, reducing errors and improving efficiency. For example, at hushh.ai, AI could automate the organization and retrieval of user data, ensuring it's both accessible and valuable with consent, aligning with your mission to make user-data universally accessible.
