Lior Sinclair’s Post

Name: Lior Sinclair on LinkedIn: You can now transcribe 2.5 hours of audio in 98 seconds, locally. A new… | 71 comments
Uploaded: 2024-09-16T15:01:59.699Z
Duration: 32 s
Channel: Lior Sinclair

Lior Sinclair

Covering the latest in AI R&D • ML-Engineer • MIT Lecturer • Building AlphaSignal, a newsletter read by 200,000+ AI engineers.

3mo

You can now transcribe 2.5 hours of audio in 98 seconds, locally. A new implementation called insanely-fast-whisper is blowing up on GitHub. It works on Mac or Nvidia GPUs and uses Whisper + the Pyannote library to speed up transcription and speaker segmentation. Here's how you can use it:

pip install insanely-fast-whisper
insanely-fast-whisper --file-name <FILE NAME or URL> --batch-size 2 --device-id mps --hf_token <HF TOKEN>

♻️ Repost this if you found it useful. ↓ Are you technical? Check out https://AlphaSignal.ai to get a daily summary of breakthrough models, repos and papers in AI. Read by 200,000+ devs.
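
For anyone who wants to script the command from the post, here is a minimal Python sketch that assembles the same invocation as an argument list. The flag spellings are copied from the post as-is and may differ between releases, so check the project's README before relying on them. The file name `interview.mp3` and the helper name `build_command` are made up for illustration.

```python
import shlex


def build_command(file_name, hf_token, batch_size=2, device_id="mps"):
    """Return the CLI invocation quoted in the post as an argument list."""
    return [
        "insanely-fast-whisper",
        "--file-name", file_name,
        "--batch-size", str(batch_size),
        "--device-id", device_id,
        # Flag spelled exactly as in the post; verify against the README.
        "--hf_token", hf_token,
    ]


# Hypothetical input file and placeholder token, for illustration only.
cmd = build_command("interview.mp3", "<HF TOKEN>")

# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

Once the tool is installed, the list could be handed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell-quoting problems with file names that contain spaces.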

71 Comments

Sahar Mor

I help researchers and builders make sense of AI | ex-Stripe | aitidbits.ai | Angel Investor

3mo

For those looking to run Whisper locally in one click, Mozilla openly released Whisperfile a few weeks ago: a high-performance, local tool for audio transcription and translation using OpenAI's Whisper model. https://huggingface.co/Mozilla/whisperfile

18 Reactions

Cohorte

3mo

This is amazing! 🚀 Transcribing 2.5 hours of audio in just 98 seconds locally is a huge leap forward. Loving how insanely-fast-whisper leverages Whisper and Pyannote for such efficiency. Can't wait to try it out on my projects! Who else is excited to boost their transcription workflows? 🔥 #AI #Transcription #MachineLearning #TechInnovation

7 Reactions

Isham Rashik

🤖 Machine Learning Engineer 🦾 Generative AI 🧠 Natural Language Processing 💻 Prompt Engineering 👨💻 Computer Vision 👁️ Data Science 📊 Community Builder & Mentor 👨🏫 On a drive to change the world 🚀

3mo

I used this 8 months ago

2 Reactions

Simon Sobisch

Senior Backend Developer; GNU Maintainer and Project Lead of the GnuCOBOL compiler

3mo

Somehow people tend to forget how easy it is to add your sources... I personally find that more useful than reposting. https://github.com/Vaibhavs10/insanely-fast-whisper with the info from above plus more documentation and example calls.

7 Reactions

Adebolajo Sunday

AI/ML ENGINEER (COMPUTER VISION, NLP and System Integration)

3mo

I benchmarked both faster-whisper and insanely-fast-whisper on an RTX 3070. faster-whisper still performs better. Check this benchmark as well: https://medium.com/@GenerationAI/streaming-with-faster-whisper-vs-insanely-fast-whisper-9ecfa4792fd7

1 Reaction

Antonio Bray

Founder, President, Chief Visionary Officer and Chief Technology Officer at AudioOne, Inc

3mo

Looks very useful. I am thinking of the 1000s of use cases now, especially if this could be automated.

🦋 Jochen Schultz

Let machines take over!

3mo

Just cut the audio into chunks (look out for low amplitude) and go for parallel processing. It would surprise me if you can't transcribe it in under a second. Just add more hardware.
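
The chunk-and-parallelize idea in this comment can be sketched roughly as below. This is only an illustration, assuming the audio is already decoded into a list of float samples. `transcribe_chunk` is a hypothetical stand-in for a real speech-to-text call, and the frame size, threshold and minimum chunk length are arbitrary toy values.

```python
from concurrent.futures import ThreadPoolExecutor


def frame_energies(samples, frame_size):
    """Mean absolute amplitude of each consecutive frame."""
    energies = []
    for i in range(0, len(samples), frame_size):
        frame = samples[i:i + frame_size]
        energies.append(sum(abs(s) for s in frame) / len(frame))
    return energies


def split_on_quiet(samples, frame_size=4, threshold=0.05, min_chunk=8):
    """Cut at the start of quiet frames, keeping chunks >= min_chunk samples."""
    chunks, start = [], 0
    for idx, energy in enumerate(frame_energies(samples, frame_size)):
        cut = idx * frame_size
        if energy < threshold and cut - start >= min_chunk:
            chunks.append(samples[start:cut])
            start = cut
    chunks.append(samples[start:])  # whatever remains after the last cut
    return chunks


def transcribe_chunk(chunk):
    # Hypothetical placeholder: a real version would call a speech model here.
    return f"<{len(chunk)} samples>"


def transcribe_parallel(samples, workers=4):
    """Split on low-amplitude frames, then process the chunks concurrently."""
    chunks = split_on_quiet(samples)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in chunk order, so the transcript stays ordered.
        return list(pool.map(transcribe_chunk, chunks))


# Toy signal: loud, quiet, loud, quiet, loud.
signal = [0.5] * 8 + [0.0] * 4 + [0.5] * 8 + [0.0] * 4 + [0.5] * 4
print(transcribe_parallel(signal))
```

Cutting only where the signal is quiet is meant to avoid splitting in the middle of a word. Whether more hardware gets a long recording under a second depends on the per-chunk model cost, which this sketch does not measure.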

1 Reaction

DataInsta

3mo

that sounds like a game changer! super cool tech for quick audio transcriptions. how do you see it impacting your work? Lior Sinclair

1 Reaction

Swapnil Gupta

3mo

Can you recommend a good text-to-speech model that produces good, human-like audio?

Salama B.

3mo

Whether you're working on research, podcasting, or interviews, tools like this are driving accessibility and productivity through rapid, high-quality transcriptions. AI innovation at its best!


More Relevant Posts

⭐Patrice Séjalon

CTO | Transforming risk communication & insurance training with AI | Innovator | Metaverse | Web3 | Founder | ex Société Générale, Crédit Agricole, BNP | Zurich, Silicon Valley
3mo
🚀 Breakthrough in Audio Transcription Speed! 🎙️

Imagine transcribing a feature-length movie in less time than it takes to make a cup of coffee. That's now possible with 'insanely-fast-whisper', a game-changing GitHub project.

Key highlights:
• Transcribe 2.5 hours of audio in just 98 seconds
• Works locally on Mac or Nvidia GPUs
• Combines Whisper + Pyannote for rapid transcription and speaker segmentation

For the tech-savvy, here's a quick setup:
1. pip install insanely-fast-whisper
2. Run with your file and settings

This tool isn't just fast: it's revolutionizing how we process audio data. Think about the implications for:
• Journalists transcribing interviews
• Researchers analyzing focus groups
• Content creators captioning videos

What would you do with local and near-instant transcription? Share your ideas below! 👇

Antonio Bray

Founder, President, Chief Visionary Officer and Chief Technology Officer at AudioOne, Inc
3mo Edited
This looks like it could be very useful. Thinking of the 1000s of scenarios. Hopefully this can be automated, deployed and scaled beyond local. Repo: https://lnkd.in/ePmTCfsK

David K.

🚀 LLMs & NLP Innovator | AI & Big Data Engineering Leader | Python Back-end Expert | 15+ Years in Tech | Speaker & Mentor
2mo
pip install insanely-fast-whisper
insanely-fast-whisper --file-name <FILE NAME or URL> --batch-size 2 --device-id mps --hf_token <HF TOKEN>


2 Comments
Sayed Raheel Hussain

ML Engineer | AI Researcher | Generative AI & LLMs | Computer Vision & Data Science
2mo Edited
🚀 NVIDIA launches Llama-3.1-Nemotron-70B! https://lnkd.in/efahSbKW

📊 As of Oct 1, 2024, it's topping the charts:
• Arena Hard: 85.0
• AlpacaEval 2 LC: 57.6
• MT-Bench: 8.98

🔑 Key points:
• Outperforms other models across multiple benchmarks
• Longer responses (avg 2199.8 characters)
• Consistent performance (narrow confidence interval)

Here is some information about the eval metrics used.

Arena-Hard is a challenging AI benchmark created from real user queries on Chatbot Arena. It uses the BenchBuilder pipeline to select 500 high-quality, diverse prompts that test language models on complex, real-world tasks across various domains. The process involves:
* Question source: real user queries from Chatbot Arena (initially 200,000)
* Question selection process:
  - BenchBuilder pipeline filters and evaluates queries
  - AI (GPT-4-Turbo) scores questions on 7 key qualities: specificity, domain knowledge, complexity, problem-solving, creativity, technical accuracy, real-world relevance
  - Topic modeling clusters similar queries
  - High-quality clusters are selected
  - 500 challenging prompts are sampled for the final benchmark

Evaluation uses an LLM-as-judge approach, comparing model outputs to a baseline. This method provides a more comprehensive, updatable, and cost-effective evaluation than previous benchmarks, better separating top models and aligning well with human judgments.

Arena-Hard paper: https://lnkd.in/eh_hK7NK
GitHub: https://lnkd.in/e9qBbd3E

#NVIDIA #AI #LLM #TechInnovation
Sai Ruthvik

Sharing my observations | Startup Enthusiast | ML enthusiast | Data Science from IIT Madras |
1mo
NVIDIA has quietly released its new open-source model, Llama-3.1-Nemotron-70B, which is reported to surpass OpenAI's GPT-4o and Anthropic's Claude 3.5 on many crucial benchmarks with relatively fewer parameters.

Key features of Llama-3.1-Nemotron-70B:
- Can answer "How many r's are in strawberry?" accurately 😅
- Performance: The model has performed well on several alignment benchmarks, for example:
  - Arena Hard: 85.0
  - AlpacaEval 2 LC: 57.6
  - GPT-4-Turbo MT-Bench: 8.98
  These scores show that the Nemotron model outperforms larger counterparts that have more parameters.
- Architecture: The Nemotron model is built on the Llama 3.1 framework and uses transformers to generate coherent responses. It has a relatively modest 70 billion parameters but is quite efficient and capable in processing user inquiries.
- Open source availability: The model, its corresponding reward model and the training dataset are open-sourced and available on Hugging Face, so there is scope for us to experiment with the model.
- RLHF (Reinforcement Learning from Human Feedback): The model was trained with RLHF techniques that relied on the REINFORCE algorithm to optimize performance based on human preference.

If you are interested in exploring or trying out this model further, visit the official NVIDIA website.

#AI #NVIDIA #Llama #MachineLearning #OpenSource #Innovation https://lnkd.in/gd-_p_rP

nvidia model by llama-3_1-nemotron-70b-reward | NVIDIA NIM

build.nvidia.com

1 Comment
Ritesh Shergill

Senior Data and Systems Architect | Gen AI | AI and Software Architecture Consultations | Career Guidance | Ex Vice President at JP Morgan Chase | Startup Mentor | Angel Investor | Author
9mo
Is having a background in AI imperative to get into Generative AI? Maybe not.

Maybe not if you want to build a RAG pipeline. But most definitely yes if you want to fine-tune LLMs. I consider myself lucky to have that background, so I was able to transition seamlessly into fine-tuning models, but honestly, most work you might do with LLMs would involve building a RAG pipeline with prompt engineering. With Autogen, it has become even easier to set up a RAG workflow.

So do you need a background in AI to work with LLMs?
- YES, if you need to fine-tune models
- NOT NECESSARILY, if you want to build a RAG pipeline

PS: If you have an NVIDIA GPU >= 4070 powering your laptop, you can just about fine-tune your own LLMs using PEFT. Or use Google Colab to try it out.

#llm #rag

2 Comments
Mujtaba Abdul Haque

Software Developer| MSCS Graduate from Seattle University | AWS Certified Solutions Architect
2mo Edited
Ever imagine an AI that could not only chat with you about Shakespeare’s sonnets but also critique the artistry in your vacation photos? Enter NVIDIA’s latest marvel, the NVLM-D-72B model, a veritable Swiss Army knife in the AI world, blending text and visuals with the ease of a seasoned bartender mixing your favorite cocktail 🍹. NVIDIA’s brainchild, the NVLM-D-72B, isn’t just smart—it’s a smart aleck that can understand pictures and words together 📸📖. Picture this: an AI that can look at a doodle and not only tell you what it sees but also spin up a story about it! Trained on NVIDIA’s ultra-sophisticated Megatron-LM and cozying up with Hugging Face, this model is like having a supercomputer in your pocket, only it doesn’t weigh you down. Whether you’re looking to spice up your meme game or develop an app that finally understands fashion advice from a picture, NVLM-D-72B is your go-to 🚀. So, if you’re ready to ride the wave of AI innovation, NVIDIA’s got your ticket to the future, and it’s first class! Dive into the details on their Hugging Face page, and let’s get this techno-party started! 🎉 For a deeper dive and more giggles, check out the full tech extravaganza at NVIDIA's Hugging Face space: #AI #MachineLearning #DataScience #NVIDIA #Technology #Innovations #DeepLearning #AITransformation

nvidia/NVLM-D-72B · Hugging Face

huggingface.co

1 Comment
Prince Jain

Senior Data Scientist
4mo Edited
The biggest problem in AI today is getting GPUs for LLMs; the formula below is handy for this:

🚀 **Understanding GPU Memory Requirements for AI Models** 🚀

The formula is simple: **M** = (P*4B)*1.2/(32/Q)

Where:
- **M**: GPU memory required (in GB)
- **P**: Number of parameters in the model (e.g., a 7B model has 7 billion parameters)
- **4B**: Bytes used per parameter (4 bytes)
- **32**: Total bits in 4 bytes
- **Q**: Bits used to load the model (e.g., 16 bits, 8 bits, or 4 bits)
- **1.2**: Accounts for a 20% overhead for additional memory usage

🔍 **Example**: For a model with 7B parameters loaded with 16-bit precision, you can estimate the GPU memory required using this formula. It's an essential consideration when planning your AI infrastructure! 💡

#AI #MachineLearning #DeepLearning #GPUMemory #ModelOptimization
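
To make the 7B example concrete, the formula can be written as a small Python function. This is a sketch of the rule of thumb exactly as stated in the post, assuming P is given in billions of parameters and 1 GB is taken as 10^9 bytes; the function name is made up for illustration.

```python
def estimate_gpu_memory_gb(params_billions, load_bits, overhead=1.2):
    """M = (P * 4 bytes) * overhead / (32 / Q), with P in billions so M is in GB."""
    bytes_per_param = 4  # 32-bit baseline, as in the post's formula
    return params_billions * bytes_per_param * overhead / (32 / load_bits)


# The post's example model (7B parameters) at the three precisions it mentions.
for bits in (16, 8, 4):
    print(f"7B model at {bits}-bit: {estimate_gpu_memory_gb(7, bits):.1f} GB")
```

For the 16-bit case this works out to 7 × 4 × 1.2 / 2, or about 16.8 GB. Halving the bit width halves the estimate. The 20% overhead is the post's own rough allowance and not a measured figure.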
David Pereira

Europe & Latam Lead - Data & AI
2mo
NVIDIA outperforms GPT4o and Sonnet 3.5

NVIDIA has released a fine-tuned version of Llama 3.1 70B that outperforms both OpenAI's GPT4o and Anthropic's Sonnet 3.5 on multiple benchmarks. This model was trained using Reinforcement Learning from Human Feedback (RLHF), Llama-3.1-Nemotron-70B-Reward and HelpSteer2-Preference prompts on a Llama-3.1-70B-Instruct model as initial policy.

NVIDIA models are now available worldwide under a Llama 3.1 Open Source license on Hugging Face, and according to its page there, the Llama-3.1-Nemotron-70B-Reward-HF model is "#1 on all three automatic alignment benchmarks, edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet." as of October 1st.

🔗 Hugging Face repository: https://lnkd.in/dqexeux6
🔗 Paper: https://lnkd.in/db7YfEDP
1 Comment
Druce Vertes
9mo Edited
AI Reading for Thursday March 14, 2024!

Cerebras shows giant AI training chip, betting on bifurcation of GPU market into separate training and inference chips. - SiliconANGLE https://lnkd.in/e7kDyANc
Nvidia to talk about new H200 chips with more memory, upcoming Blackwell architecture at GTC conference next week - Spiceworks
OpenAI briefly posted hints at GPT 4.5 coming late summer 2024. - Android Authority
A profile of Scott Wu, creator of Devin - Analytics India Magazine
Black Nazis and Vikings are back, this time via Adobe Firefly, and the outrage machine snowflakes need a safe space from AI - Mail Online
Demo of Figure Humanoid Android now with OpenAI - YouTube
Watch Anymal robot dog run an obstacle course, climb rubble.
Kurzweil and Musk talk about the singularity, and when we'll have more intelligence and computational capacity in data centers than all human brains - The Independent
Testing an AI's moral compass by giving it 50,000 trolley problems - Ars Technica
Making AI-generated YouTube video content spam targeted at kids for fun and profit - WIRED
Satya Nadella says Google should have been 'default winner' in AI - Business Insider
Softbank eyes stake in Mistral
NYT responds to OpenAI: no 'hacking' here, just OpenAI ignoring our paywalls and IP rights - Ars Technica
Akaso night-vision binoculars turn night into day with AI. For all your peeping-Tom adventures j/k! - Yanko Design - Modern Industrial Design News
A copilot for your bike to record and warn you about your surroundings and possible close calls - Ars Technica
Fabric adds LLM intelligence to your command line, lets you complete many daily tasks through a prompt library.

AI Reading for Thursday March 14, 2024

skynetandchill.com

148,420 followers

919 Posts
