Davit Baghdasaryan’s Post

Name: What AssemblyAI is up to and why it matters | Davit Baghdasaryan posted on the topic | LinkedIn
Uploaded: 2024-05-02T14:08:47.450Z
Duration: 1 min 47 s
Channel: Davit Baghdasaryan

Davit Baghdasaryan

7mo Edited

There are hundreds of AI models out there 🚀 Every vendor claims theirs is the best 💪 How do we know who to trust? 👉 Benchmarks! But AI benchmarks are like the Wild West today.

In this interview, Dylan Fox, Founder/CEO of AssemblyAI, and I do a deep dive into what AssemblyAI is up to and how the industry should do benchmarks. Here’s what stood out to me most 👇

1) AssemblyAI is processing terabytes of audio data every single day: podcasts, meetings, phone calls, radio, TV, broadcast, etc.
2) Over 100K developers use the API, resulting in 30M AI inference calls per day.
3) Every 6 months the cost has been going down due to economies of scale and model optimizations.
4) There is a ton of interest in streaming use cases (voice bots, agent assist, closed captions) as well as non-streaming.
5) Since non-streaming models can work bi-directionally, they will always produce higher quality. The majority of users submit non-streaming tasks.
6) They recently launched their newest model, called Universal-1.
7) Universal-1 can do both streaming and non-streaming. It was trained on 12.5M hours of voice data. It reaches 90-93% accuracy in English and 90-92% in French, Spanish, and German.
8) Today’s AI benchmarks are the Wild West. The industry must use independent 3rd parties for benchmarks. The benchmarking data must be closed source so that companies cannot game the system.
9) Average WER is not a good metric, as it’s not representative of real-world user needs.
10) WER doesn’t capture quality on rare words, alphanumerics, proper nouns, emails, formatting, or context. But these are super important for Speech AI workflows (e.g. summaries).
11) What users care about is not WER but fluency of output.
12) AssemblyAI is doing a lot of human evaluations of models.
13) They used Google TPUv5 for training Universal-1.
14) They will always work to make STT models better, faster, and cheaper. The STT market will grow faster once the models improve. New use cases will unlock.
15) In 18-24 months, models will be much more accurate.
16) Currently, AssemblyAI is highly focused on STT and Speech Understanding. TTS and Translation will come over time, but not soon.

Dylan, thanks for your time and insights 🙏 Full interview here 👉 https://lnkd.in/dbSVzc6G
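
To make points 9 and 10 concrete, here is a minimal sketch of how WER is usually computed: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sentences are made up for illustration, and the whitespace tokenization and case-sensitive matching are simplifying assumptions, not AssemblyAI's evaluation setup.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, invented for this example.
reference = "email dylan at assemblyai dot com by friday"
mangled_names = "email dillon at assembly dot com by friday"
harmless_filler = "please email dylan at assemblyai dot com friday"

print(word_error_rate(reference, mangled_names))    # two substituted words
print(word_error_rate(reference, harmless_filler))  # one inserted, one dropped
```

Both outputs score the same WER, even though the first one loses the name and the email address while the second stays usable. That is the gap the post describes: a single averaged number treats every word as equally important.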

1 Comment

Ruben Lusinyants

Managing Director Morgan Stanley - Head of Private Alternatives Technology. ARPA Institute - Co-Chairman of the Board

7mo

Very helpful, Thanks!


More Relevant Posts

AssemblyAI

30,557 followers
7mo
🎙 Our CEO Dylan Fox sat down with Davit Baghdasaryan to chat about all-things AssemblyAI and the importance of having industry benchmarks that users can trust. 👀 Watch the full conversation here: https://lnkd.in/dNYcU8WH

Paul Colon

Strategic Technical Account & Customer Success Manager | Growth-driven Solutions | Expert in Networks, Systems & Customer-Centricity | Empowering Clients through Exceptional Service & Collaboration
7mo
🚀 Venturing through the AI landscape is like exploring an ever-shifting digital maze! 🧩 But fear not, in our CEO's recent interview with Dylan Fox of AssemblyAI, they dug deep into how to separate the gems from the gravel. Here's the scoop: 👇 #AI #AssemblyAI #Benchmarks 📊 #Krisp

Pedro Sanders

Telephony and Voice AI @ Fonoster
4mo
4 surprisingly unique uses for bi-directional streams: (Going beyond Voice AI.)

Bi-directional streams are changing the game. Here are some exciting use cases:

1. Accent softening:
↳ Great for smooth communication across regions.
↳ Company: Tomato.ai

2. Emotion cancelling:
↳ Ideal for dealing with challenging customers.
↳ Company: ???

3. Two-way translation:
↳ Excellent for B2B and consumer markets alike.
↳ Company: Sindarin - Conversational Speech AI

4. Text in / Audio out:
↳ Perfect for accessible communications.
↳ Company: Nagish

What are other places where you are seeing streams?

James Brooks

Innovation Principal | Customer Experience | Generative AI | Customer Contact | AI Transformation | Design Thinking | Author
9mo
POV: You’re a large, established corporation with mature business processes, looking to invest in generative AI. Where do you begin?

📺 A couple of interesting and real use cases from the BBC across three key themes: maximising the value of existing content, new audience experiences, and efficiency.

➡️ Reformatting existing content in a way that widens its appeal, e.g. taking a live sport radio commentary and transcribing it rapidly to text for BBC Sport’s live pages.
➡️ A new BBC Assistant, with the potential for chatbots to provide interactive and tailored learning on BBC Bitesize.
➡️ GenAI can help teams find content within our programmes through things like better labelling, which will help them create new forms of content more quickly, e.g. a clip or collection of certain moments within a programme or programmes.

#GenAI #LLM #AI #Automation https://lnkd.in/eXBtp8bW

An update on the BBC’s plans for Generative AI (Gen AI) and how we plan to use AI tools responsibly

bbc.com

Matthew Haltom

Creative Technologist, Sr. Experience Designer - Generative AI @Adobe, Spa business owner
7mo Edited
Ever found yourself wondering how to best interact with a Large Language Model (LLM)? Prompt libraries offer a bridge to understanding and leveraging AI capabilities effectively.

A prompt library serves as a curated collection of user queries designed to elicit specific responses from an AI, making it an invaluable tool for newcomers to familiarize themselves with the breadth of AI functionalities. Prompt libraries demystify the process of engaging with LLMs, providing a foundational understanding and structure for people to use these technologies. Furthermore, if implemented with the right context and intent, they are a driver for engagement. LLMs can suggest follow-up questions or actions, and these can be presented by filtering a prompt library based on use cases and intent.

However, it’s important to recognize that prompt libraries are but stepping stones. As we advance, the focus shifts towards developing AI systems tailored to specific use cases and user needs, and creating more contextual interactions. We’ll slowly transition from a one-size-fits-all approach to bespoke conversational experiences with embedded micro-UIs that will become the norm for on-demand content. Suggested content and actions will only become more tailored to use cases, and rightly so.

In essence, prompt libraries play a critical role in bridging the knowledge gap of today with the future of tomorrow. Let’s embrace these developments, looking forward to a future where AI enhances our digital engagements in ever more personalized and meaningful ways.

Arcee AI

7,177 followers
2w Edited
Here at Arcee AI, we have spent the last year refining small language models (SLMs) to solve real-world challenges. Our tools have helped businesses train custom models that are powerful, efficient, and aligned with their needs. But we noticed something important: many businesses aren’t just looking for better models—they need solutions that work at the product level. They want to automate tasks, simplify workflows, and create incredible experiences for their customers. That’s why we built 𝗔𝗿𝗰𝗲𝗲 𝗢𝗿𝗰𝗵𝗲𝘀𝘁𝗿𝗮. Arcee Orchestra is a platform designed to make AI practical and actionable. It’s intuitive, adaptable, and built for businesses that want to get things done—faster and smarter. And that's not all we're bringing you today. We're also launching the public beta of the 𝗔𝗿𝗰𝗲𝗲 𝗠𝗼𝗱𝗲𝗹 𝗘𝗻𝗴𝗶𝗻𝗲, our hosted inference service—it gives you direct access to the suite of SLMs that power Arcee Orchestra. You can get started with Model Engine via the link in the comments below. Read about our journey from creating the SLM category to now leveraging SLMs to their full potential to create agentic AI workflows with Arcee Orchestra. 𝗔𝗻𝗱 𝘁𝗼 𝗯𝗲 𝗮𝗺𝗼𝗻𝗴 𝘁𝗵𝗲 𝗳𝗶𝗿𝘀𝘁 𝘁𝗼 𝗴𝗲𝘁 𝗮 𝗱𝗲𝗺𝗼 𝗼𝗳 𝗔𝗿𝗰𝗲𝗲 𝗢𝗿𝗰𝗵𝗲𝘀𝘁𝗿𝗮, 𝗰𝗼𝗺𝗲 𝘀𝗲𝗲 𝘂𝘀 𝗮𝘁 𝗿𝗲:𝗜𝗻𝘃𝗲𝗻𝘁–𝘄𝗲’𝗿𝗲 𝗮𝘁 𝗕𝗼𝗼𝘁𝗵 𝟭𝟰𝟬𝟲!! https://lnkd.in/evk26zGt #LLMs #EnterpriseAI #reInvent2024

Arcee AI, From Small Language Model Pioneer, to Pioneering SLM-Powered Agentic AI Workflows

blog.arcee.ai

Davis Stone

Strategic Partnerships and GTM @ Arcee.ai ☁ Ex-AWS GenAI | Former Founder
2w
🚀🚨 PRODUCT LAUNCH ALERT🚨🚀 Here at Arcee AI, we have spent the last year refining small language models (SLMs) to solve real-world challenges. Our tools have helped businesses train custom models that are powerful, efficient, and aligned with their needs. But we noticed something important: many businesses aren’t just looking for better models—they need solutions that work at the product level. They want to automate tasks, simplify workflows, and create incredible experiences for their customers. That’s why we built 𝗔𝗿𝗰𝗲𝗲 𝗢𝗿𝗰𝗵𝗲𝘀𝘁𝗿𝗮, which is launching in beta today. Arcee Orchestra is a platform designed to make AI practical and actionable. It’s intuitive, adaptable, and built for businesses that want to get things done—faster and smarter. And that's not all we're bringing you today. We're also launching the public beta of the 𝗔𝗿𝗰𝗲𝗲 𝗠𝗼𝗱𝗲𝗹 𝗘𝗻𝗴𝗶𝗻𝗲, our hosted inference service—it gives you direct access to the suite of SLMs that power Arcee Orchestra. You can get started with Model Engine via the link in the comments below. Read about our journey from creating the SLM category to now leveraging SLMs to their full potential to create agentic AI workflows with Arcee Orchestra. 𝗔𝗻𝗱 𝘁𝗼 𝗯𝗲 𝗮𝗺𝗼𝗻𝗴 𝘁𝗵𝗲 𝗳𝗶𝗿𝘀𝘁 𝘁𝗼 𝗴𝗲𝘁 𝗮 𝗱𝗲𝗺𝗼 𝗼𝗳 𝗔𝗿𝗰𝗲𝗲 𝗢𝗿𝗰𝗵𝗲𝘀𝘁𝗿𝗮, 𝗰𝗼𝗺𝗲 𝘀𝗲𝗲 𝘂𝘀 𝗮𝘁 𝗿𝗲:𝗜𝗻𝘃𝗲𝗻𝘁–𝘄𝗲’𝗿𝗲 𝗮𝘁 𝗕𝗼𝗼𝘁𝗵 𝟭𝟰𝟬𝟲!! https://lnkd.in/eQ-EWN_r

Arcee AI, From Small Language Model Pioneer, to Pioneering SLM-Powered Agentic AI Workflows

blog.arcee.ai

Hugo Carreira

Helping companies to manage IT Strategy, AI Strategy, IT M&A Program
5mo
#AI, Embracing the AI future of fleet management

AI and natural language interfaces can improve how businesses operate, but many fleet companies still struggle using the data they already have. Even though AI language models are spreading fast for consumers, enterprises and fleets specifically are not taking full advantage of them.

Data holds the keys to unlocking new efficiencies, reducing costs, enhancing sustainability and delivering better customer experiences at every level of the organization – from the C-suite to drivers on the road. So why are large, asset-heavy fleets failing to leverage available data, leaving critical business questions unanswered? This is a massive, missed opportunity.

Some of the difficulty stems from the fact that many questions require technical resources, including teams of data analysts and developers, to answer. Data is siloed, inaccessible and opaque. But thanks to recent developments in large language models (LLM) and AI, that gap can be bridged.

This is the essence of the LLM revolution. Not just the use of natural language (which is impressive enough) but moving beyond siloed data and specialized AI tools to a holistic solution that provides universal access to insights and actions across the entire fleet operation. This allows front-line employees to analyze their data and ask the questions affecting them the most, democratizing access to information and empowering innovations through local champions.

No more running custom queries or compiling pivot tables from fragmented reports. Just simple questions and answers around drivers, vehicles, routes or tasks – all in one place. Questions such as:

* Which of my sites has the most expensive cleaning costs per vehicle?
* What was the most common “out of service” reason this week?
* Are my drivers starting their routes on time?
* What correlates to reservations that get the highest NPS scores?
* What was the average waiting time for charging-stop tasks this week?
* Who are the drivers with the highest number of parking violations in the last year?

#AI, #RB
Weng Honn Kan

Content and Marketing Strategist | Tech Start-up | SocMed Enthusiast
4mo
Is AI becoming more like humans, or are we becoming more like AI? I had an awesome session with Ashvin Praveen, where he shared how his AI startup helps content creators write like it’s them, with their own tone and voice. It made me ponder: Is AI emulating human creativity well, or are we adapting to think more like AI? This isn’t just a techie question—it’s about our daily lives. Think about creativity. We’re used to associating it with humans. When we paint, write, or compose, we believe it’s a deeply personal process. But AI can now write poems, create art, and compose music. Sounds cool, right? But here’s the twist—AI does this by learning patterns from us, then mimicking them. On the flip side, how are we adapting? Look at us typing on our devices, using predictive text. We’ve gotten used to recommendations from algorithms—whether it’s Netflix suggesting a binge-worthy series or Spotify curating our next favorite song. We’re relying on AI to make choices for us. - 𝐂𝐨𝐦𝐦𝐮𝐧𝐢𝐜𝐚𝐭𝐢𝐨𝐧: We’ve started using abbreviations and emojis because that's what our devices suggest. - 𝐃𝐞𝐜𝐢𝐬𝐢𝐨𝐧-𝐌𝐚𝐤𝐢𝐧𝐠: We’re influenced by recommendations from AI, affecting our shopping, watching, and even dating choices. - 𝐖𝐨𝐫𝐤 𝐄𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐜𝐲: AI tools help us schedule meetings, reply to emails, and even draft documents. So, are we becoming more like AI, methodically efficient and data-driven, or is AI becoming more like us, creative and expressive? It’s a bit of both. AI learns from us to be more human-like, while we adapt to AI's efficiency and convenience. In the end, it’s not about choosing sides but understanding the harmonious balance. Can we embrace this evolution without losing our human touch? That’s the exciting challenge we face. 
The full podcast will be at: 1) https://lnkd.in/g3K2fktt 2) https://lnkd.in/gi6WPN95 3) https://lnkd.in/gRSHpJFY 4) https://lnkd.in/gGrpG5bZ Above article is generated by Cleve.Ai ~pretty "real" right? Test it yourself here https://www.cleve.ai/ #wenghonn #aimarketing
Carl P.

Operations Manager / R&D / Systems @MiSolutions Group
8mo
The Pulse on AI - CSPARNELL

Welcome to Friday Factual! Today we're covering these intriguing topics in the AI world:

1. Generative AI's Ascension: The AI landscape in 2023 has seen a significant rise in generative AI technologies, moving them towards mainstream adoption. Companies that have embraced these technologies early are finding themselves at a competitive advantage, innovating in areas like content creation and business process automation [[❞]](https://lnkd.in/g6Y5hFz2).
2. AI in Filmmaking: AI is revolutionizing the film industry, from generative AI tools in big studios like Paramount and Disney to startups like Runway, which hosts an AI film festival with a hefty prize. This technology is altering everything from special effects to actor dubbing [[❞]](https://lnkd.in/guViF6U3).
3. Regulatory Landscape: As AI technologies, especially generative ones, evolve, they increasingly face legal challenges. Intellectual property disputes and ethical concerns are prompting more robust regulatory frameworks, affecting companies across the spectrum from startups to tech giants [[❞]](https://lnkd.in/gAhCu4Vm).
4. AI's Impact on Employment: McKinsey's 2023 survey highlights that AI is increasingly influencing business functions like product development and risk management. However, there's a growing need for companies to address the workforce disruptions caused by AI, emphasizing a shift in talent needs and roles [[❞]](https://lnkd.in/gs4jicKf).
5. AI for Good: Google’s AI for Social Good initiative demonstrates how AI is being leveraged to address societal challenges. Their projects range from environmental efforts to educational tools, showcasing AI's potential to contribute positively beyond commercial applications [[❞]](https://ai.google/).

Top 5 AI Tools of the Week:

1. [ChatGPT](https://lnkd.in/giNczqP7): Advanced conversational model.
2. [DeepL](https://lnkd.in/gsDggjt8): Superior language translation service.
3. [Cresta](https://www.cresta.com): AI for real-time sales and customer service enhancement.
4. [Jasper](https://www.jasper.ai): Content generation and marketing AI.
5. [Synthesia](https://www.synthesia.io): AI video generation platform.

#AIRevolution #TechTrends #FutureOfWork #DigitalTransformation #SocialGoodAI