𝐇𝐮𝐧𝐲𝐮𝐚𝐧-𝐋𝐚𝐫𝐠𝐞: 𝐀𝐧 𝐎𝐩𝐞𝐧-𝐒𝐨𝐮𝐫𝐜𝐞 𝐌𝐨𝐄 𝐌𝐨𝐝𝐞𝐥 𝐰𝐢𝐭𝐡 52 𝐁𝐢𝐥𝐥𝐢𝐨𝐧 𝐀𝐜𝐭𝐢𝐯𝐚𝐭𝐞𝐝 𝐏𝐚𝐫𝐚𝐦𝐞𝐭𝐞𝐫𝐬 𝐛𝐲 𝐓𝐞𝐧𝐜𝐞𝐧𝐭
Tencent's Hunyuan-Large (389B total parameters, 52B activated) is an open MoE model with a 256K context length that performs strongly across multiple benchmarks.
🏆 Outperforms Llama 3.1-70B and competes closely with the much larger Llama 3.1-405B, achieving 88.4% on MMLU, 92.9% on CommonsenseQA, and 71.4% on HumanEval.
💡 Incorporates KV cache compression, expert-specific learning rate scaling, and a mixed expert routing strategy; the KV cache compression alone yields nearly 95% cache savings, improving inference efficiency.
📊 Trained on 7 trillion tokens, including 1.5 trillion synthetic tokens.
Abs: https://lnkd.in/gbm4juZP
HF: https://lnkd.in/gqWRai8w
#AI #GenAI #LLM #MLLM #VLM #MOE
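To make the "mixed expert routing" idea concrete, here is a minimal sketch of an MoE layer with one always-on shared expert plus top-k routed specialized experts. This is a generic illustration of the technique, not Tencent's actual Hunyuan-Large code; all layer sizes and names are invented for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedMoE(nn.Module):
    """One shared (always-on) expert plus top-k routed specialized experts."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=1):
        super().__init__()
        ffn = lambda: nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.shared = ffn()                                  # shared expert, used by every token
        self.experts = nn.ModuleList([ffn() for _ in range(n_experts)])
        self.router = nn.Linear(d_model, n_experts)          # scores each token against each expert
        self.top_k = top_k

    def forward(self, x):                                    # x: (batch, seq, d_model)
        gates = F.softmax(self.router(x), dim=-1)            # routing probabilities
        weights, idx = gates.topk(self.top_k, dim=-1)        # top-k specialized experts per token
        out = self.shared(x)                                 # shared expert is always active
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1)      # tokens routed to expert e
                out = out + mask * weights[..., k:k + 1] * expert(x)
        return out

y = MixedMoE()(torch.randn(2, 16, 512))                      # toy forward pass: output is (2, 16, 512)
```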
-
🚀 Sepp Hochreiter's xLSTM architecture pushes LSTMs to compete with today's top Transformers.
✅ Highlights:
👉 Exponential gating: lets the model revise its stored memories dynamically, addressing a long-standing LSTM limitation.
👉 Two variants: sLSTM (scalar memory with memory mixing) and mLSTM (matrix memory, fully parallelizable).
✅ Performance:
👉 Trained on 300B tokens at model sizes from 125M to 1.3B parameters.
👉 Excels in long-context tasks with low perplexities.
xLSTM is set to revolutionize AI! 🌟 GitHub link in comments 👇 Follow Maxime Jabarian for more AI news! 🔔 #AI #LSTM #xLSTM #MachineLearning #TechInnovation
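As a rough illustration of what exponential gating means in practice, here is a toy, single-unit sLSTM-style update following the cell-state, normalizer, and stabilizer recurrences described in the xLSTM paper. The scalar weights, initialization, and parametrization are simplified placeholders; the official implementation is the repo linked in the comments.

```python
import numpy as np

def slstm_step(x, h, c, n, m, W):
    """One sLSTM-style step for a single hidden unit (all quantities are scalars)."""
    z = np.tanh(W["z"] * x + W["rz"] * h)                    # cell input
    i_log = W["i"] * x + W["ri"] * h                         # log of exponential input gate
    f_log = W["f"] * x + W["rf"] * h                         # log of exponential forget gate
    o = 1.0 / (1.0 + np.exp(-(W["o"] * x + W["ro"] * h)))    # sigmoid output gate
    m_new = max(f_log + m, i_log)                            # stabilizer keeps exp() in range
    i_gate = np.exp(i_log - m_new)
    f_gate = np.exp(f_log + m - m_new)
    c_new = f_gate * c + i_gate * z                          # cell state
    n_new = f_gate * n + i_gate                              # normalizer state
    h_new = o * (c_new / n_new)                              # normalized, gated hidden state
    return h_new, c_new, n_new, m_new

W = {k: 0.5 for k in ["z", "rz", "i", "ri", "f", "rf", "o", "ro"]}  # toy weights
h = c = n = m = 0.0
for x in [0.1, -0.3, 0.7]:
    h, c, n, m = slstm_step(x, h, c, n, m, W)
print(h)
```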
-
GPT-4o Mini, announced today, is impressively affordable. With an MMLU score of 82% (as reported by TechCrunch), it outperforms other smaller models like Gemini 1.5 Flash (79%) and Claude 3 Haiku (75%). Even more exciting, it will be available at a lower price than these models, with a reported cost of $0.15 per million input tokens and $0.60 per million output tokens. Its large 128k context window makes it particularly appealing for long context use cases, such as large document retrieval-augmented generation (RAG). https://lnkd.in/gTVYbxYF
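A quick back-of-the-envelope script using the prices quoted above shows how cheap a long-context RAG workload becomes; the request volume and token counts below are made-up assumptions purely to illustrate the arithmetic.

```python
# Prices quoted above for GPT-4o Mini
INPUT_PRICE = 0.15 / 1_000_000    # USD per input token
OUTPUT_PRICE = 0.60 / 1_000_000   # USD per output token

# Assumed workload (illustrative only)
requests = 10_000        # requests per day
input_tokens = 20_000    # retrieved context + question per request
output_tokens = 500      # generated answer per request

daily_cost = requests * (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE)
print(f"~${daily_cost:,.2f} per day")   # ~$33.00 per day under these assumptions
```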
-
#DeepSeek-AI released a new #LLM, #DeepSeek-V2
👉 A powerful #LLM with efficient training and inference
👉 Pretrained on a large, high-quality dataset (8.1 trillion tokens)
👉 Improves on #DeepSeek-67B
👉 Further fine-tuned using #SFT and #RL
👉 Strong performance while activating only 21B parameters per token
👉 Economical training compared to previous models (saves 42.5% of training costs)
👉 Efficient inference thanks to innovations like Multi-head Latent Attention (MLA), which sharply reduces the KV cache memory footprint
👉 Supports long context lengths (up to 128K tokens)
👉 Paper: https://lnkd.in/e7SU7BFs
👉 Model: https://lnkd.in/e-hyDn48.
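For intuition, here is a rough sketch of the core idea behind Multi-head Latent Attention: cache a small per-token latent vector and up-project it into keys and values at attention time, instead of caching full per-head K/V. The dimensions are invented, and the real method also includes a decoupled RoPE branch and absorbs the up-projections into the attention computation, which this toy version omits; see the linked paper for the actual formulation.

```python
import torch
import torch.nn as nn

d_model, d_latent, n_heads, d_head = 1024, 128, 8, 64       # invented sizes

down_kv = nn.Linear(d_model, d_latent, bias=False)          # compress each token to a latent
up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)    # latent -> per-head keys
up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)    # latent -> per-head values

h = torch.randn(1, 16, d_model)                 # hidden states for 16 cached tokens
latent_cache = down_kv(h)                       # only this (1, 16, 128) tensor is cached
k = up_k(latent_cache).view(1, 16, n_heads, d_head)   # reconstructed at attention time
v = up_v(latent_cache).view(1, 16, n_heads, d_head)

full_cache = 2 * 16 * n_heads * d_head          # floats cached by standard multi-head K/V
mla_cache = 16 * d_latent                       # floats cached by the latent scheme
print(f"cache reduced by {1 - mla_cache / full_cache:.0%}")   # 88% with these toy sizes
```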
-
What is BasedAI? It's a versatile architecture focused on preserving privacy and efficiency while executing complex computations. The current focus is on LLMs, aiming to solve many of the issues we see in AI today. Within this model, described in detail in the whitepaper, is the ability to use tokenomics to reward all participants: Brain owners (100k Pepecoins), miners, validators, and users. To me, this is the future of decentralized technology with a focus on privacy and open-source architecture. Check out the #BasedAi whitepaper below; I'll have more videos and content coming soon, translating the geek-speak around this innovative approach to the future of LLMs and more! https://lnkd.in/dUGDep_k
-
Gemini Pro 1.5 is now in limited preview.
- 128k-1M token context (scaling to 10M)
- Mixture of Experts architecture
- Multimodality: text, image, video
- Performance equal to Gemini Ultra 1.0
I guess the 800-pound gorilla has started dancing. https://lnkd.in/gen2xyfQ
-
Processing in memory (PIM) is gaining traction as an effective method to boost performance and reduce power consumption in deep learning applications. SK hynix's #PIM solution, #AiM, adds processing power right on the memory chip, allowing data to be handled immediately. Expanding the #AI community's familiarity with AiM is key, so SK hynix rolled out the #AiMX platform to make it easy for anyone to try and assess AiM. This paper introduces the AiMX architecture and software stack, designed to fully optimize AiM solutions on Linux-based systems. If you're curious about how the AiMX platform manages and executes various deep learning workloads efficiently, dive into the details in the paper below! #DeepLearning #DL #LLM
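For context on why PIM helps here: single-batch LLM decoding is dominated by matrix-vector products that perform very little arithmetic per byte of weights read, so it is bound by memory bandwidth rather than compute, which is exactly the bottleneck processing-in-memory attacks. The numbers below are generic assumptions for illustration, not SK hynix AiM/AiMX figures.

```python
# Assumed figures (illustrative only)
params = 7e9                   # 7B-parameter model
bytes_per_param = 2            # FP16 weights
bandwidth = 1.0e12             # 1 TB/s of DRAM bandwidth

flops_per_token = 2 * params            # one multiply-add per weight per generated token
bytes_moved = params * bytes_per_param  # every weight must be read once per token

intensity = flops_per_token / bytes_moved
print(f"arithmetic intensity: {intensity:.1f} FLOP/byte")   # ~1 FLOP/byte -> memory-bound
print(f"bandwidth-limited decode rate: ~{bandwidth / bytes_moved:.0f} tokens/s")
```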
-
Interesting guide. It will likely change quickly, but it gives a useful picture of where and how the various models can be used, their effective costs, and so on.
Enterprises will start using LLMs based on use-case-specific needs; one size doesn't fit all.
For latency-focused use cases, Mixtral 8x7B on Groq is the best option as of now. Consider Mistral, Claude 3 Haiku, and Command Light for throughput- and cost-focused use cases, and Mistral Large, Command R+, and Claude as knowledge specialists. The faster, cheaper, and more knowledgeable LLM will win the use case.
As a Gen AI CoE, consider creating a marketplace of LLMs segmented by latency, throughput, and knowledge so users can choose an LLM based on their needs. For LLMOps, create a champion-challenger observability platform to evaluate, debug, and monitor the champion LLM against challenger LLMs.
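As a minimal sketch of that marketplace idea, a Gen AI CoE could expose a simple routing table keyed by each use case's declared priority. The model names and labels below are illustrative placeholders, not vendor recommendations.

```python
# Illustrative placeholder model names, not vendor recommendations
ROUTES = {
    "latency": "mixtral-8x7b-on-groq",   # fastest generation for latency-critical flows
    "cost": "claude-3-haiku",            # cheap, high-throughput tier
    "quality": "claude-3-opus",          # knowledge-specialist tier
}

def pick_model(priority: str) -> str:
    """Return the model tier for a use case's declared priority."""
    return ROUTES.get(priority, ROUTES["cost"])   # default to the cheap tier

print(pick_model("latency"))   # -> mixtral-8x7b-on-groq
```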
-
#OpenAI Sora
Diffusion models like Sora begin with a video that looks like static noise and work through many steps to progressively remove that noise. Sora can generate full videos all at once or extend generated videos to make them longer. The difficult task of keeping a subject consistent even when it momentarily leaves the frame is addressed by giving the model foresight over many frames at once.
Sora employs a transformer architecture, just like GPT models, for better scaling performance. It builds on earlier work in the GPT and DALL·E models, making use of DALL·E 3's recaptioning approach, which creates highly detailed captions for the visual training data. As a result, the model can follow the user's written instructions in the generated video more faithfully.
Prompt: A Chinese Lunar New Year celebration video with a Chinese dragon.
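To illustrate the "start from noise and remove it over many steps" idea generically, here is a minimal DDPM-style ancestral sampling loop. It is a toy sketch of diffusion sampling in general, not Sora itself, whose model, schedules, and video representation are not public; the stand-in denoiser below is a placeholder.

```python
import torch

def ddpm_sample(eps_model, shape, steps=50):
    """Minimal DDPM ancestral sampling loop with a linear beta schedule."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                                   # start from pure noise
    for t in reversed(range(steps)):
        eps = eps_model(x, t)                                # model's estimate of the noise in x_t
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t])         # step toward the cleaner x_{t-1}
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)   # re-inject sampling noise
    return x

# Placeholder denoiser; a real system uses a trained network
# (Sora reportedly uses a transformer over spacetime patches of compressed video).
video = ddpm_sample(lambda x, t: torch.zeros_like(x), shape=(16, 3, 32, 32))
```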
-
TLDR: Google today announced new LLMs - Gemma - and improvements to the current Gemini line. Main updates 👇🏼
➡️ Gemini 1.5 Pro - a new version of the Gemini LLM with a series of quality improvements; according to the release notes, these include better translation, coding, and reasoning.
➡️ Gemini 1.5 Flash - a smaller version of Gemini optimized for narrower or high-frequency tasks where fast response time matters most.
➡️ Both Gemini 1.5 Pro and Gemini 1.5 Flash come with a one-million-token context window. This powerful feature enables them to process a wide range of inputs, including text, images, audio, and video, making them versatile LLMs.
➡️ Gemma - a family of state-of-the-art open models based on the Gemini work. It currently comes in two sizes, Gemma 2B and Gemma 7B, each released with pre-trained and instruction-tuned variants.
More details are available in the release post: https://lnkd.in/dW5qJZit
Gemma model release notes: https://lnkd.in/dp72jCbj
#llm #datascience #genai #deeplearning #machinelearning
Gemini 1.5 Pro updates, 1.5 Flash debut and 2 new Gemma models
blog.google
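Since the Gemma weights are open, a quick way to try them is via Hugging Face transformers, sketched below. This assumes the transformers, torch, and accelerate packages are installed, enough GPU memory is available, and the Gemma license has been accepted on the Hub; the checkpoint id used here is the instruction-tuned 7B variant, google/gemma-7b-it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-7b-it"                    # instruction-tuned 7B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Summarize the Gemini 1.5 announcement in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```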
-
Mistral has released Mixtral 8x22B, a new sparse mixture-of-experts model with eight 22B experts (roughly 141B total parameters, of which about 39B are active per token), which marks a significant milestone for open-source artificial intelligence. This open model, available for download via torrent, is expected to outperform Mistral AI's previous Mixtral 8x7B, which had already surpassed competitors like Llama 2 70B in various benchmarks. For scale, GPT-3.5 had 175B parameters; Mixtral 8x22B is in the same ballpark in total size while activating far fewer parameters per token. It also has a context length of 65,000 tokens, which makes it very useful for RAG use cases. Link: https://lnkd.in/g596xhBG #llms #genai #generativeai #capgemini #llm #opensourceai #capgeminiindia #ai #artificialintelligence #software #mistral #mistral8x22B
v2ray/Mixtral-8x22B-v0.1 · Hugging Face
huggingface.co
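For the RAG use case mentioned above, a rough token-budget check against the 65K window might look like the sketch below. The 4-characters-per-token heuristic and the chunk sizes are assumptions for illustration; a real pipeline should count tokens with the model's own tokenizer.

```python
CONTEXT_WINDOW = 65_000
CHARS_PER_TOKEN = 4                       # rough heuristic for English text

def fits(chunks, question, reserve_for_answer=1_000):
    """Estimate whether retrieved chunks + question fit in the context window."""
    prompt_chars = sum(len(c) for c in chunks) + len(question)
    est_tokens = prompt_chars // CHARS_PER_TOKEN
    return est_tokens + reserve_for_answer <= CONTEXT_WINDOW, est_tokens

ok, est = fits(["lorem ipsum " * 5_000] * 3, "What does the contract say about renewal?")
print(ok, est)    # True, ~45,000 estimated prompt tokens
```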