Morris Wong’s Post

Small Language Models (not LLMs) are underrated: Ever since ChatGPT, we've been hooked on the idea of large language models as the ultimate source of intelligence. However, after working with LLMs for a bit over a year now, I've realized that while LLMs are great, small language models are the hidden gem. Here is why I think they are underrated:

- Fast: Because they are smaller, they run faster than large language models, so you get responses more quickly.
- Offline: Their small size means they can run entirely on a device, with no internet connection needed, so you can use them even when you are offline.
- Domain specific: Smaller models are also easier to train for specific tasks, like on-device translation.

One of the most notable recent small-language-model debuts is Apple Intelligence, where a small ~3B-parameter on-device model (GPT-3, by comparison, has 175B) is the first model the user interacts with, with requests gradually routed to Apple's server models and then to ChatGPT.

Come try one out and feel it for yourself on HuggingChat. They host a small model, Phi-3-mini-4k-instruct, that you can try now: https://buff.ly/4bwnwN1. We might see David vs Goliath stories again in this space!
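
If you'd rather run a small model on your own machine than through HuggingChat, here is a minimal sketch using the Hugging Face transformers library. The model id comes from the public Phi-3 model card; the prompt, generation settings, and hardware flags are illustrative assumptions, so adjust them for your setup.

```python
# Minimal sketch: running Phi-3-mini-4k-instruct locally with Hugging Face
# transformers, following the public model card. Flags below are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3-mini-4k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # keep the checkpoint's native precision
    device_map="auto",       # place layers on GPU/CPU automatically (needs `accelerate`)
    trust_remote_code=True,  # needed on older transformers releases for Phi-3
)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {"role": "user", "content": "Translate 'good morning' into French."},
]

# Once the weights are cached locally, this runs entirely on-device,
# with no internet connection required.
output = pipe(messages, max_new_tokens=64, do_sample=False, return_full_text=False)
print(output[0]["generated_text"])
```

After the first download, the checkpoint is cached locally, which is what makes the fast, offline, on-device behavior described above practical.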
