Davit Baghdasaryan’s Post

There are hundreds of AI models out there 🚀 Every vendor claims theirs is the best 💪 How do we know whom to trust? 👉 Benchmarks! But AI benchmarks today are like the Wild West.

In this interview, Dylan Fox, Founder/CEO of AssemblyAI, and I do a deep dive into what AssemblyAI is up to and how the industry should approach benchmarks. Here’s what stood out to me most 👇

1) AssemblyAI processes terabytes of audio data every single day: podcasts, meetings, phone calls, radio, TV broadcasts, etc.
2) Over 100K developers use the API, resulting in 30M AI inference calls per day.
3) Costs have been going down every 6 months thanks to economies of scale and model optimizations.
4) There is a ton of interest in streaming use cases (voice bots, agent assist, closed captions) as well as non-streaming ones.
5) Since non-streaming models can work bi-directionally (using both earlier and later audio as context), they will always produce higher quality. The majority of users submit non-streaming tasks.
6) They recently launched their newest model, Universal-1.
7) Universal-1 can do both streaming and non-streaming. It was trained on 12.5M hours of voice data and reaches 90-93% accuracy in English and 90-92% in French, Spanish, and German.
8) Today’s AI benchmarks are the Wild West. The industry must use independent third parties for benchmarks, and the benchmarking data must be closed source so that companies cannot game the system.
9) Average WER (word error rate) is not a good metric, as it’s not representative of real-world user needs.
10) WER doesn’t capture quality on rare words, alphanumerics, proper nouns, emails, formatting, or context. But these are super important for Speech AI workflows (e.g. summaries).
11) What users care about is not WER but the fluency of the output.
12) AssemblyAI is doing a lot of human evaluations of its models.
13) They used Google TPU v5 to train Universal-1.
14) They will always work to make speech-to-text (STT) models better, faster, and cheaper. The STT market will grow faster once the models improve, as new use cases unlock.
15) In 18-24 months, models will be much more accurate.
16) Currently, AssemblyAI is highly focused on STT and Speech Understanding. TTS and translation will come over time, but not soon.

Dylan, thanks for your time and insights 🙏 Full interview here 👉 https://lnkd.in/dbSVzc6G
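To make points 9 and 10 concrete, here is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The function name and example sentences are hypothetical illustrations, not AssemblyAI’s evaluation code, and real benchmarks also normalize casing, punctuation, and numbers before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words (Levenshtein distance on words).
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


reference = "please email the invoice to dylan at assemblyai dot com"
dropped_filler = "please email invoice to dylan at assemblyai dot com"   # drops "the"
wrong_name = "please email the invoice to dillon at assemblyai dot com"  # mishears the name

print(word_error_rate(reference, dropped_filler))  # 0.1
print(word_error_rate(reference, wrong_name))      # 0.1
```

Both transcripts score the same 0.1 WER, yet only the second one breaks the email address. Every word counts equally in WER, so an error on a proper noun or an alphanumeric string is invisible in the average even though it matters most to the user.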

Ruben Lusinyants

Managing Director Morgan Stanley - Head of Private Alternatives Technology. ARPA Institute - Co-Chairman of the Board

7mo

Very helpful, Thanks!
