🧐 Machine Learning Case Study: Apple building transcripts for Podcasts
Apple recently updated the Podcasts app with transcripts.
You can now read along with a podcast episode as you listen.
Of course, at Apple's scale, there was no way to do this by hand.
ML to the rescue!
The transcripts are high quality, and better yet, Apple released a blog post detailing how they got there.
Instead of using the common academic benchmark WER (word error rate), they used HEWER (human evaluation word error rate).
The former counts every error (including fillers like "ah" and "umm", or "yeah" vs "yes"), whereas the latter focuses on errors that make the transcript hard to read or change its meaning.
Transcripts will often have a high WER but a much lower HEWER (the transcript is still perfectly readable).
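For reference, WER is just word-level edit distance (substitutions, insertions and deletions) divided by the number of words in the reference transcript. Here's a minimal Python sketch, with made-up example sentences, showing why filler words inflate WER even when readability is fine:

```python
# Minimal WER sketch: word-level edit distance / number of reference words.
# The example transcripts below are made up for illustration only.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # DP table for word-level Levenshtein distance
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Dropping "um" and writing "yes" for "yeah" counts as 2 errors -> WER ~0.33,
# even though the transcript reads just fine.
print(word_error_rate("um yeah I think it works", "yes I think it works"))
```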
The takeaway?
This is a perfect example of how default academic benchmarks (while useful) are not always the best fit for your production use case.
When a new model comes out with state-of-the-art results on academic benchmarks it can be tempting to adopt it immediately.
However, the correct approach is:
Develop a test set for your use case -> experiment, experiment, experiment (with existing and new models)!
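In code terms, that workflow is roughly the loop below. This is only a sketch: the test set, candidate models and scoring function are hypothetical placeholders, not anything from Apple's or Nutrify's actual pipelines.

```python
# A minimal sketch of the "fixed test set, many experiments" loop.
# All names below are hypothetical placeholders for your own data, models and metric.
from typing import Callable, Dict, List, Tuple

def run_experiments(
    test_set: List[Tuple[str, str]],              # (input, expected_output) pairs you curated
    candidates: Dict[str, Callable[[str], str]],  # model name -> predict function
    score: Callable[[str, str], float],           # e.g. a WER/HEWER-style metric for your use case
) -> Dict[str, float]:
    """Score every candidate model on the same custom test set."""
    results = {}
    for name, predict in candidates.items():
        scores = [score(expected, predict(x)) for x, expected in test_set]
        results[name] = sum(scores) / len(scores)
    return results

# Keep the test set fixed, swap in each new model as it's released,
# and only adopt it if it actually wins on *your* data (and latency budget).
```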
For example, working on Nutrify, I've found that a fine-tuned, 2-year-old pretrained computer vision model outperforms almost all recent models on our custom dataset once you account for the performance/latency tradeoff.
However, I know this may change at any given moment.
Hence, experiment, experiment, experiment!
---
Links in top comment.
#ai #machinelearning #apple #podcasts
Find and follow our podcast here:
Spotify: https://open.spotify.com/show/4l0X1hc8niGdoTxrKLGvp6
Apple Podcasts: https://podcasts.apple.com/us/podcast/builders-by-proxify/id1741714317
YouTube: https://www.youtube.com/@proxify.developers