Navigating AI in Audio
Beginning this month, the voice of Reid Hoffman, founding host of Masters of Scale, will occasionally be digitally rendered using AI.
Yikes, right?
That was our gut reaction, too. Then we took a look at the podcasting landscape and saw what others were doing. We talked to experts in the field of audio and AI voice cloning. We identified the existing guidelines for disclosure and began to imagine what our own protocol might look like. At some point, we stopped feeling so weird and started feeling excited for the future.
It should be noted that what we’re discussing here is different from Reid’s AI avatar that was unveiled earlier this month. Masters of Scale is not looking for a GPT-powered replacement for Reid’s brain. We’re just looking for a new way to bring his voice to our show.
Now, let’s talk about why.
Picture, if you will, Silicon Valley legend Reid Hoffman sitting in the overheated closet of a hotel room, reading the same sentence over and over, stumbling on the same tricky phrase, wearing out his vocal cords, while a producer sits on the other end of the line preparing to sift through a tremendous amount of tape in order to finalize the episode on time. Add to that the logistical headache of making sure someone as busy and well-traveled as Reid always has ample time in his schedule, access to a quiet room, and a high-quality microphone on hand wherever he might be in a given week.
It’s not glamorous, but this is often what making a podcast looks and sounds like. Voiceover can be a laborious, time-consuming process that takes away from what Reid really loves about making the show.
What does he really love? Building relationships with guests and hearing their stories. Learning something new and unexpected. And, above all, being the founder of Masters of Scale — the Big Ideas person who connects us to exciting thought leaders, steers the intellectual heart of the show, and selflessly shares the mic with new voices.
We want Reid to keep doing what he enjoys and what he’s good at. And we want listeners to feel like they’re still getting the show they know and love. The show that spotlights insightful, thought-provoking conversations between Masters of Scale.
We think AI presents an innovative and productive solution to our problem. AI-powered vocal cloning technology:
Reduces the amount of time Reid spends doing things that are repetitive, monotonous, and non-creative.
Frees up our producers to be more efficient and focused on tasks that really improve the show.
Lets all parties involved spend more time doing the things they enjoy, achieving their full potential.
This is the promise of AI as we see it. That’s how we talk about it on our show. We aim to practice what we preach.
For us, that means continuing to keep humans at the center of Masters of Scale. Human writers crafting scripts based on Reid’s business theories. Human producers cutting tape and doing quality control. Human hosts interviewing human guests telling amazingly human stories of scale.
By occasionally using a digital Reid soundalike (aka Synthetic Voice Reid or Reid-ish) to fill in narrative gaps, smooth over bits of badly recorded audio, and highlight specific learnings at key moments, we’re able to deliver on our promise to you, which is to share regularly released, well-produced lessons of scale from respected leaders. It’s a solution that we, and hopefully you, feel comfortable with.
This week, we’re offering listeners an exclusive look at our process in a behind-the-scenes episode of Masters of Scale. You’ll hear our producers create a replica of Reid’s voice with the help of Ukrainian start-up Respeecher and fine-tune the technology to meet our production standards. You’ll also hear WaitWhat CEO and Masters of Scale host Jeff Berman talk with Reid about how he feels about employing his audio replica. (Spoiler alert: He’s into it.)
But we recognize this is a big deal. Not just for us, but for podcasting as a whole.
The moment we start using a computer to simulate a human being’s voice we open ourselves up to important questions around ethics, disclosure, and the listener’s right to know what kind of content they’re being served.
Currently, there’s no industry standard for how to navigate these tricky ethical questions. Most people would agree that manipulative audio deepfakes are morally wrong. What about the myriad other uses of AI that aren’t malicious but still give that initial gut feeling of…weirdness? This is uncharted territory that we hope to chart.
We can safely assume conversations about how, when, and why to use these tools are taking place in media companies around the world. We think listeners should be included in those conversations. After all, it’s going in your ears, right?
For a more in-depth look at our approach to using AI in audio, the standards for disclosure we plan to adhere to, and our call for the adoption of a Listener’s Bill of Rights, click here.
As we said, we’re excited for the future. There’s a lot of potential in these tools, even if some of them inspire an uneasy initial reaction. As we dive headlong into the world of simulated human voices, we plan to keep our ears open to the feedback, thoughts, and criticism coming from the voices that matter most: Yours.
Leave a comment below or email us at [email protected] to let us know your thoughts. We promise a human will be reading them.