𝐇𝐮𝐧𝐲𝐮𝐚𝐧-𝐋𝐚𝐫𝐠𝐞: 𝐀𝐧 𝐎𝐩𝐞𝐧-𝐒𝐨𝐮𝐫𝐜𝐞 𝐌𝐨𝐄 𝐌𝐨𝐝𝐞𝐥 𝐰𝐢𝐭𝐡 52 𝐁𝐢𝐥𝐥𝐢𝐨𝐧 𝐀𝐜𝐭𝐢𝐯𝐚𝐭𝐞𝐝 𝐏𝐚𝐫𝐚𝐦𝐞𝐭𝐞𝐫𝐬 𝐛𝐲 𝐓𝐞𝐧𝐜𝐞𝐧𝐭
Tencent's Hunyuan-Large (389B total parameters, 52B activated) is an open MoE model with a 256K context length that performs strongly across multiple benchmarks.
🏆 Outperforms Llama 3.1-70B and competes closely with the much larger Llama 3.1-405B, achieving 88.4% on MMLU, 92.9% on CommonsenseQA, and 71.4% on HumanEval.
💡 Incorporates KV cache compression, expert-specific learning rate scaling, and a mixed expert routing strategy; the KV cache compression alone yields nearly 95% cache savings, improving inference efficiency.
📊 Trained on 7 trillion tokens, including 1.5 trillion synthetic tokens.
Abs: https://lnkd.in/gbm4juZP
HF: https://lnkd.in/gqWRai8w
#AI #GenAI #LLM #MLLM #VLM #MOE
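To make the "mixed expert routing" idea concrete, here is a minimal sketch of an MoE layer with one always-on shared expert plus top-k routed specialized experts. This is a generic illustration of the technique, not Tencent's actual Hunyuan-Large code; all layer sizes and names are invented for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedMoE(nn.Module):
    """One shared (always-on) expert plus top-k routed specialized experts."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=1):
        super().__init__()
        ffn = lambda: nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.shared = ffn()                                  # shared expert, used by every token
        self.experts = nn.ModuleList([ffn() for _ in range(n_experts)])
        self.router = nn.Linear(d_model, n_experts)          # scores each token against each expert
        self.top_k = top_k

    def forward(self, x):                                    # x: (batch, seq, d_model)
        gates = F.softmax(self.router(x), dim=-1)            # routing probabilities
        weights, idx = gates.topk(self.top_k, dim=-1)        # top-k specialized experts per token
        out = self.shared(x)                                 # shared expert is always active
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1)      # tokens routed to expert e
                out = out + mask * weights[..., k:k + 1] * expert(x)
        return out

y = MixedMoE()(torch.randn(2, 16, 512))                      # toy forward pass: output is (2, 16, 512)
```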
-
🚀 Sepp Hochreiter's xLSTM architecture pushes LSTMs to compete with today's top Transformers.
✅ Highlights:
👉 Exponential gating: lets the model revise its stored memories dynamically, addressing a long-standing LSTM limitation.
👉 Two variants: sLSTM (scalar memory with memory mixing) and mLSTM (matrix memory, fully parallelizable).
✅ Performance:
👉 Trained on 300B tokens at model sizes from 125M to 1.3B parameters.
👉 Excels in long-context tasks with low perplexities.
xLSTM is set to revolutionize AI! 🌟 GitHub link in comments 👇 Follow Maxime Jabarian for more AI news! 🔔 #AI #LSTM #xLSTM #MachineLearning #TechInnovation
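As a rough illustration of what exponential gating means in practice, here is a toy, single-unit sLSTM-style update following the cell-state, normalizer, and stabilizer recurrences described in the xLSTM paper. The scalar weights, initialization, and parametrization are simplified placeholders; the official implementation is the repo linked in the comments.

```python
import numpy as np

def slstm_step(x, h, c, n, m, W):
    """One sLSTM-style step for a single hidden unit (all quantities are scalars)."""
    z = np.tanh(W["z"] * x + W["rz"] * h)                    # cell input
    i_log = W["i"] * x + W["ri"] * h                         # log of exponential input gate
    f_log = W["f"] * x + W["rf"] * h                         # log of exponential forget gate
    o = 1.0 / (1.0 + np.exp(-(W["o"] * x + W["ro"] * h)))    # sigmoid output gate
    m_new = max(f_log + m, i_log)                            # stabilizer keeps exp() in range
    i_gate = np.exp(i_log - m_new)
    f_gate = np.exp(f_log + m - m_new)
    c_new = f_gate * c + i_gate * z                          # cell state
    n_new = f_gate * n + i_gate                              # normalizer state
    h_new = o * (c_new / n_new)                              # normalized, gated hidden state
    return h_new, c_new, n_new, m_new

W = {k: 0.5 for k in ["z", "rz", "i", "ri", "f", "rf", "o", "ro"]}  # toy weights
h = c = n = m = 0.0
for x in [0.1, -0.3, 0.7]:
    h, c, n, m = slstm_step(x, h, c, n, m, W)
print(h)
```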
-
GPT-4o Mini, announced today, is impressively affordable. With an MMLU score of 82% (as reported by TechCrunch), it outperforms other smaller models like Gemini 1.5 Flash (79%) and Claude 3 Haiku (75%). Even more exciting, it will be available at a lower price than these models, with a reported cost of $0.15 per million input tokens and $0.60 per million output tokens. Its large 128k context window makes it particularly appealing for long context use cases, such as large document retrieval-augmented generation (RAG). https://lnkd.in/gTVYbxYF
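A quick back-of-the-envelope script using the prices quoted above shows how cheap a long-context RAG workload becomes; the request volume and token counts below are made-up assumptions purely to illustrate the arithmetic.

```python
# Prices quoted above for GPT-4o Mini
INPUT_PRICE = 0.15 / 1_000_000    # USD per input token
OUTPUT_PRICE = 0.60 / 1_000_000   # USD per output token

# Assumed workload (illustrative only)
requests = 10_000        # requests per day
input_tokens = 20_000    # retrieved context + question per request
output_tokens = 500      # generated answer per request

daily_cost = requests * (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE)
print(f"~${daily_cost:,.2f} per day")   # ~$33.00 per day under these assumptions
```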
-
#DeepSeek-AI released a new #LLM, #DeepSeek-V2
👉 A powerful #LLM with efficient training and inference
👉 Pretrained on a large, high-quality dataset (8.1 trillion tokens)
👉 Improves on #DeepSeek-67B
👉 Further fine-tuned using #SFT and #RL
👉 Strong performance while activating only 21B parameters per token
👉 Economical training compared to previous models (saves 42.5% of training costs)
👉 Efficient inference thanks to innovations like Multi-head Latent Attention (MLA), which sharply reduces the KV cache memory footprint
👉 Supports long context lengths (up to 128K tokens)
👉 Paper: https://lnkd.in/e7SU7BFs
👉 Model: https://lnkd.in/e-hyDn48.
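For intuition, here is a rough sketch of the core idea behind Multi-head Latent Attention: cache a small per-token latent vector and up-project it into keys and values at attention time, instead of caching full per-head K/V. The dimensions are invented, and the real method also includes a decoupled RoPE branch and absorbs the up-projections into the attention computation, which this toy version omits; see the linked paper for the actual formulation.

```python
import torch
import torch.nn as nn

d_model, d_latent, n_heads, d_head = 1024, 128, 8, 64       # invented sizes

down_kv = nn.Linear(d_model, d_latent, bias=False)          # compress each token to a latent
up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)    # latent -> per-head keys
up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)    # latent -> per-head values

h = torch.randn(1, 16, d_model)                 # hidden states for 16 cached tokens
latent_cache = down_kv(h)                       # only this (1, 16, 128) tensor is cached
k = up_k(latent_cache).view(1, 16, n_heads, d_head)   # reconstructed at attention time
v = up_v(latent_cache).view(1, 16, n_heads, d_head)

full_cache = 2 * 16 * n_heads * d_head          # floats cached by standard multi-head K/V
mla_cache = 16 * d_latent                       # floats cached by the latent scheme
print(f"cache reduced by {1 - mla_cache / full_cache:.0%}")   # 88% with these toy sizes
```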
-
What is BasedAI? It's a versatile architecture focused on preserving privacy and efficiency while executing complex computations. The current focus is on LLMs, aiming to solve many of the issues we see in AI today. Within this model, described in detail in the whitepaper, is the ability to use tokenomics to reward all participants: Brain owners (100k Pepecoins), miners, validators, and users. To me, this is the future of decentralized technology with a focus on privacy and open-source architecture. Check out the #BasedAi whitepaper below; I'll have more videos and content coming soon, translating the geek-speak around this innovative approach to the future of LLMs and more! https://lnkd.in/dUGDep_k
-
Gemini Pro 1.5 is now in limited preview.
- 128k-1M token context (scaling to 10M)
- Mixture of Experts architecture
- Multimodality: text, image, video
- Performance equal to Gemini Ultra 1.0
I guess the 800-pound gorilla has started dancing. https://lnkd.in/gen2xyfQ
-
Processing in memory (PIM) is gaining traction as an effective method to boost performance and reduce power consumption in deep learning applications. SK hynix's #PIM solution, #AiM, adds processing power right on the memory chip, allowing data to be handled immediately. Expanding the #AI community's familiarity with AiM is key, so SK hynix rolled out the #AiMX platform to make it easy for anyone to try and assess AiM. This paper introduces the AiMX architecture and software stack, designed to fully optimize AiM solutions on Linux-based systems. If you're curious about how the AiMX platform manages and executes various deep learning workloads efficiently, dive into the details in the paper below! #DeepLearning #DL #LLM
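For context on why PIM helps here: single-batch LLM decoding is dominated by matrix-vector products that perform very little arithmetic per byte of weights read, so it is bound by memory bandwidth rather than compute, which is exactly the bottleneck processing-in-memory attacks. The numbers below are generic assumptions for illustration, not SK hynix AiM/AiMX figures.

```python
# Assumed figures (illustrative only)
params = 7e9                   # 7B-parameter model
bytes_per_param = 2            # FP16 weights
bandwidth = 1.0e12             # 1 TB/s of DRAM bandwidth

flops_per_token = 2 * params            # one multiply-add per weight per generated token
bytes_moved = params * bytes_per_param  # every weight must be read once per token

intensity = flops_per_token / bytes_moved
print(f"arithmetic intensity: {intensity:.1f} FLOP/byte")   # ~1 FLOP/byte -> memory-bound
print(f"bandwidth-limited decode rate: ~{bandwidth / bytes_moved:.0f} tokens/s")
```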
-
Interesting guide. It will likely change quickly, but it gives a useful picture of where and how the various models can be used, their effective costs, and so on.
Enterprises will start using LLMs based on use-case-specific needs; one size doesn't fit all.
For latency-focused use cases, Mixtral 8x7B on Groq is the best option as of now. Consider Mistral, Claude 3 Haiku, and Command Light for throughput- and cost-focused use cases, and Mistral Large, Command R+, and Claude as knowledge specialists. The faster, cheaper, and more knowledgeable LLM will win the use case.
As a Gen AI CoE, consider creating a marketplace of LLMs segmented by latency, throughput, and knowledge so users can choose an LLM based on their needs. For LLMOps, create a champion-challenger observability platform to evaluate, debug, and monitor the champion LLM against challenger LLMs.
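As a minimal sketch of that marketplace idea, a Gen AI CoE could expose a simple routing table keyed by each use case's declared priority. The model names and labels below are illustrative placeholders, not vendor recommendations.

```python
# Illustrative placeholder model names, not vendor recommendations
ROUTES = {
    "latency": "mixtral-8x7b-on-groq",   # fastest generation for latency-critical flows
    "cost": "claude-3-haiku",            # cheap, high-throughput tier
    "quality": "claude-3-opus",          # knowledge-specialist tier
}

def pick_model(priority: str) -> str:
    """Return the model tier for a use case's declared priority."""
    return ROUTES.get(priority, ROUTES["cost"])   # default to the cheap tier

print(pick_model("latency"))   # -> mixtral-8x7b-on-groq
```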
-
#OpenAI Sora
Diffusion models like Sora begin with a video that looks like static noise and work through many steps to progressively remove that noise. Sora can generate full videos all at once or extend generated videos to make them longer. The difficult task of keeping a subject consistent even when it momentarily leaves the frame is addressed by giving the model foresight over many frames at once.
Sora employs a transformer architecture, just like GPT models, for better scaling performance. It builds on earlier work in the GPT and DALL·E models, making use of DALL·E 3's recaptioning approach, which creates highly detailed captions for the visual training data. As a result, the model can follow the user's written instructions in the generated video more faithfully.
Prompt: A Chinese Lunar New Year celebration video with a Chinese dragon.
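To illustrate the "start from noise and remove it over many steps" idea generically, here is a minimal DDPM-style ancestral sampling loop. It is a toy sketch of diffusion sampling in general, not Sora itself, whose model, schedules, and video representation are not public; the stand-in denoiser below is a placeholder.

```python
import torch

def ddpm_sample(eps_model, shape, steps=50):
    """Minimal DDPM ancestral sampling loop with a linear beta schedule."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                                   # start from pure noise
    for t in reversed(range(steps)):
        eps = eps_model(x, t)                                # model's estimate of the noise in x_t
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t])         # step toward the cleaner x_{t-1}
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)   # re-inject sampling noise
    return x

# Placeholder denoiser; a real system uses a trained network
# (Sora reportedly uses a transformer over spacetime patches of compressed video).
video = ddpm_sample(lambda x, t: torch.zeros_like(x), shape=(16, 3, 32, 32))
```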
-
TLDR: Google today announced new LLMs - Gemma - and improvements to the current Gemini line. Main updates 👇🏼
➡️ Gemini 1.5 Pro - a new version of the Gemini LLM with a series of quality improvements; according to the release notes, these include better translation, coding, and reasoning.
➡️ Gemini 1.5 Flash - a smaller version of Gemini optimized for narrower or high-frequency tasks where fast response time matters most.
➡️ Both Gemini 1.5 Pro and Gemini 1.5 Flash come with a one-million-token context window. This powerful feature enables them to process a wide range of inputs, including text, images, audio, and video, making them versatile LLMs.
➡️ Gemma - a family of state-of-the-art open models based on the Gemini work. It currently comes in two sizes, Gemma 2B and Gemma 7B, each released with pre-trained and instruction-tuned variants.
More details are available in the release post: https://lnkd.in/dW5qJZit
Gemma model release notes: https://lnkd.in/dp72jCbj
#llm #datascience #genai #deeplearning #machinelearning
Gemini 1.5 Pro updates, 1.5 Flash debut and 2 new Gemma models
blog.google
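Since the Gemma weights are open, a quick way to try them is via Hugging Face transformers, sketched below. This assumes the transformers, torch, and accelerate packages are installed, enough GPU memory is available, and the Gemma license has been accepted on the Hub; the checkpoint id used here is the instruction-tuned 7B variant, google/gemma-7b-it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-7b-it"                    # instruction-tuned 7B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Summarize the Gemini 1.5 announcement in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```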
-
Mistral has released Mixtral 8x22B, a new sparse mixture-of-experts model with eight 22B experts (roughly 141B total parameters, of which about 39B are active per token), which marks a significant milestone for open-source artificial intelligence. This open model, available for download via torrent, is expected to outperform Mistral AI's previous Mixtral 8x7B, which had already surpassed competitors like Llama 2 70B in various benchmarks. For scale, GPT-3.5 had 175B parameters; Mixtral 8x22B is in the same ballpark in total size while activating far fewer parameters per token. It also has a context length of 65,000 tokens, which makes it very useful for RAG use cases. Link: https://lnkd.in/g596xhBG #llms #genai #generativeai #capgemini #llm #opensourceai #capgeminiindia #ai #artificialintelligence #software #mistral #mistral8x22B
v2ray/Mixtral-8x22B-v0.1 · Hugging Face
huggingface.co
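For the RAG use case mentioned above, a rough token-budget check against the 65K window might look like the sketch below. The 4-characters-per-token heuristic and the chunk sizes are assumptions for illustration; a real pipeline should count tokens with the model's own tokenizer.

```python
CONTEXT_WINDOW = 65_000
CHARS_PER_TOKEN = 4                       # rough heuristic for English text

def fits(chunks, question, reserve_for_answer=1_000):
    """Estimate whether retrieved chunks + question fit in the context window."""
    prompt_chars = sum(len(c) for c in chunks) + len(question)
    est_tokens = prompt_chars // CHARS_PER_TOKEN
    return est_tokens + reserve_for_answer <= CONTEXT_WINDOW, est_tokens

ok, est = fits(["lorem ipsum " * 5_000] * 3, "What does the contract say about renewal?")
print(ok, est)    # True, ~45,000 estimated prompt tokens
```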