Our therapy-matching chatbot would consume far more resources if we didn't optimise its performance. Feel free to read our thoughts on NLP inference optimisation, covering both high-level approaches and detailed methods.
Startups getting into AI *may* not have a $700k-per-day budget, as OpenAI is reported to have for ChatGPT. So how can you keep your AI budget - and emissions - under control? According to Zofia Smoleń, founder of Polish startup MindMatch.pl, you should:

🤖 Clearly define your needs, which may well be met by a (free) open-source model via Hugging Face

🤖 Store all models, classes and functions in a way that allows you to iterate quickly, without needing to dig deep into code

🤖 Fit the model to your purpose: it can't talk about subject X if it hasn't been trained on it

🤖 Run it on the right machine: GPUs may not always be the right option; CPUs can often run inference tasks just as well

🤖 Familiarize yourself with quantization, pruning, distillation and more, to make the most of your models.

This and more insights and technical tips in this guest blogpost. Enjoy 😃!
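To give a flavour of one of the techniques above, here is a minimal sketch of post-training dynamic quantization with PyTorch. The toy model and layer sizes are illustrative assumptions, not taken from MindMatch's actual chatbot; the point is simply that `Linear` weights can be stored as int8 instead of float32, shrinking the model and often speeding up CPU inference.

```python
import torch
import torch.nn as nn

# Toy stand-in for a real NLP model head (sizes are illustrative).
model = nn.Sequential(
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

# Dynamic quantization: Linear weights become int8; activations are
# quantized on the fly at inference time. Runs on CPU.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 256)
out = quantized(x)  # same interface as the original model
```

The quantized model is a drop-in replacement for the original, which is what makes this an attractive first optimisation to try before heavier techniques like pruning or distillation.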