Mark Kovarski’s Post

Responsible AI | Co-Founder | CTO | Enterprise | Automation

Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters, by Tencent

Tencent's Hunyuan-Large (389B total parameters, 52B activated) outperforms Llama 3.1-70B and competes closely with Llama 3.1-405B across multiple benchmarks. It features a 256K context length and a mixture-of-experts (MoE) architecture.

🏆 Outperforms Llama 3.1-70B and competes closely with Llama 3.1-405B, achieving 88.4% on MMLU, 92.9% on CommonsenseQA, and 71.4% on HumanEval.

💡 Incorporates KV cache compression, expert-specific learning rate scaling, and a mixed expert routing strategy; the KV cache compression yields nearly 95% cache savings, improving inference efficiency (rough sketches at the end of this post).

📊 Trained on 7 trillion tokens, including 1.5 trillion synthetic tokens.

Abs: https://lnkd.in/gbm4juZP
HF: https://lnkd.in/gqWRai8w

#AI #GenAI #LLM #MLLM #VLM #MOE
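For intuition on where a ~95% KV cache saving could come from, here is a back-of-envelope sketch. It assumes the compression combines grouped-query attention (many query heads sharing a few KV heads) with KV sharing across adjacent layers; the head counts and sharing factor are illustrative assumptions, not figures from the post.

```python
# Back-of-envelope KV-cache arithmetic (all numbers are illustrative assumptions):
# - grouped-query attention: 80 query heads served by 8 KV heads -> 10x smaller cache
# - cross-layer sharing: adjacent layer pairs reuse one KV cache  -> 2x smaller again
n_query_heads = 80
n_kv_heads = 8
layers_sharing_kv = 2

saving = 1 - (n_kv_heads / n_query_heads) / layers_sharing_kv
print(f"KV cache saving: {saving:.0%}")  # -> 95%
```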
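Expert-specific learning rate scaling reflects that a routed expert only sees a fraction of the tokens a dense layer would, so its effective batch size is smaller. A minimal sketch, assuming a square-root batch-size scaling rule and top-1 routing over 16 experts; both the rule and the numbers are assumptions for illustration, not taken from the paper.

```python
import math

def expert_lr(base_lr: float, n_experts: int = 16, top_k: int = 1) -> float:
    """Hypothetical expert-specific LR: an expert sees roughly top_k/n_experts
    of the tokens, so under a sqrt batch-size scaling rule its learning rate
    is scaled down by sqrt of that fraction."""
    token_fraction = top_k / n_experts
    return base_lr * math.sqrt(token_fraction)

# e.g. base_lr=3e-4 with 16 experts and top-1 routing -> 7.5e-5 per expert
print(expert_lr(3e-4))
```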
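The "mixed expert routing" idea pairs a shared expert that processes every token with specialized experts selected per token by a learned router. A minimal PyTorch sketch, assuming one shared expert, 16 specialized experts, and top-1 routing; the module name, layer sizes, and expert count are assumptions for illustration, not Hunyuan-Large's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedExpertLayer(nn.Module):
    """One always-on shared expert plus top-1 routed specialized experts."""

    def __init__(self, d_model=512, d_ff=2048, n_experts=16):
        super().__init__()
        def ffn():
            return nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                 nn.Linear(d_ff, d_model))
        self.shared = ffn()
        self.experts = nn.ModuleList([ffn() for _ in range(n_experts)])
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):                          # x: (n_tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)  # routing distribution per token
        top_p, top_i = probs.max(dim=-1)           # top-1 expert id and its weight
        routed = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):  # dispatch tokens to their expert
            mask = top_i == e
            if mask.any():
                routed[mask] = top_p[mask].unsqueeze(-1) * expert(x[mask])
        return self.shared(x) + routed             # shared expert sees every token

# usage: y = MixedExpertLayer()(torch.randn(10, 512))
```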
