That's right! It's a huge week for small language models (SLMs). A few new SLMs on my radar:

1) Mistral NeMo
Highlights:
- Introduced by Mistral + NVIDIA
- Apache 2.0 license
- Outperforms Gemma 2 9B and Llama 3 8B
- Multilingual capabilities
- Efficient tokenizer (Tekken)

2) GPT-4o mini
Highlight: "15 cents per million input tokens, 60 cents per million output tokens, MMLU of 82%, and fast."

3) SmolLM
Highlight: "SmolLM models: 135M, 360M, and 1.7B parameters; Smollm-Corpus curated from Cosmopedia v2, FineWeb-Edu, and Stack-Edu-Python."

4) Mathstral and Codestral Mamba
Highlights:
- Mathstral achieves 56.6% on MATH and 63.47% on MMLU.
- Codestral Mamba was tested on in-context retrieval up to 256k tokens and proves quite efficient thanks to the Mamba architecture.

5) H2O Danube3
Highlight: "After final tuning, they show strong performance on academic, chat, and fine-tuning benchmarks. H2O-Danube3 is efficient enough to run on modern smartphones, allowing local and fast processing on mobile devices."

I'll be doing a YT video on this, including capabilities and interesting ways to apply SLMs. Stay tuned! https://lnkd.in/e2WS8ksJ

Any others?
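Since part of the appeal of SLMs is that they run locally, here is a minimal sketch of poking at one of the smaller checkpoints with the Hugging Face transformers pipeline API. The repo id "HuggingFaceTB/SmolLM-135M" is my assumption of the published SmolLM-135M checkpoint name; swap in whichever of the models above your hardware can handle.

```python
# Minimal local test of a small language model with transformers.
# Assumption: "HuggingFaceTB/SmolLM-135M" is the published SmolLM-135M repo id.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="HuggingFaceTB/SmolLM-135M",  # ~135M params, comfortable on CPU
)

prompt = "Small language models are useful because"
output = generator(prompt, max_new_tokens=40, do_sample=False)
print(output[0]["generated_text"])
```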
The DCLM models from Apple were published on HF - https://huggingface.co/apple/DCLM-7B
I guess my main interest is how number one compares with number two, technically speaking.
Interesting!
I'm pretty sure that Mistral NeMo beat *Llama 3 8B*, not the 2nd iteration of Llama.