What Does the HarperCollins/Microsoft AI Contract Mean For The Publishing Industry?
This week, HarperCollins announced that it would begin licensing the rights to its nonfiction backlist to a mystery company's new AI model. A day later, the mystery company was revealed to be Microsoft, though the AI model in question is apparently not its current iteration, Copilot.
What the conglomerate plans to do with the AI in question has not been disclosed. A source cited in Bloomberg's coverage says the end goal is not to begin publishing AI-generated books, but when given an opportunity to expand on the question, Microsoft declined to comment.
For those of us keeping an eye on what is happening in the AI-verse, this latest deal is not surprising. In May of this year, NewsCorp (parent company of HarperCollins) struck a deal with OpenAI to access current and archived content from NewsCorp's major news and information publications, including The Wall Street Journal, New York Post, The Times, and many others.
I can't find any information on whether the journalists whose work is scraped in this deal are getting any compensation, or whether the projected value of $250 million over five years is exclusively going to benefit NewsCorp and its new partner.
In this respect, the HarperCollins/Microsoft partnership is more promising:
Authors of the nonfiction books in question are being invited to license rights individually, rather than the publisher granting blanket permission
Authors and publishers both get compensation per book, to the tune of $5K, split 50/50
The $2,500 per title that authors receive is not counted against existing advances
This approach is much better than the unregulated IP piracy currently standard in AI training protocols, so even the most skeptical self-proclaimed AI Luddites should be viewing this as a positive move. However, many valid concerns remain, especially as they relate to AI's continuing and expanding impact on the publishing industry.
Who Gets The Money, And How Much?
Many authors are already expressing discomfort with the HarperCollins/Microsoft contract, which apportions equal compensation to both the author and the publisher for the AI training rights. After all, it's the author's creation... so shouldn't they get the lion's share of the money from licensing contracts?
As a hybrid and professional publishing advocate, I have long been of the opinion that authors concerned primarily with making money off their IP and/or retaining as many rights to their IP as possible should approach traditional publishing options with caution. This latest development only adds fuel to that fire.
Traditional publishers are first and foremost IP traders, and their business model functions off the purchasing of authors' IP and the selling of that IP in the form of books and other rights. In exchange for this, they produce, manufacture, warehouse, distribute, and market those books at their own expense.
To keep this business model viable, traditional publishers need to be making back a return on investment at minimum, and ideally a profit sufficient to send the next acquired author an advance on their IP. Whether you agree with the ethics of this model or not, it comes down to some pretty straightforward arithmetic: what a book earns (X) minus what it cost to acquire and produce (Y) has to leave a positive margin (Z).
In the case of the HarperCollins/Microsoft contract, there probably wasn't anything in the original acquisition agreements discussing AI training rights. In the absence of that language, what is the best course of action?
There is a good case to be made that, in the absence of explicit language, the author retains full rights to the IP. There is also a reasonable argument that the rights belong to the publisher: Microsoft wants access to the published book, a form of the IP that HarperCollins owns by virtue of having published it. Not to mention that each acquisition agreement will differ slightly depending on how well the author's representation haggled for or against certain inclusions.
I suspect that in order to minimize back-and-forth and avoid the lift required to parse every agreement in their backlist to customize a contract specific to each author, HarperCollins agreed to a 50/50 split, knowing full well that some authors would push back.
And some authors are pushing back. The Authors Guild published an overview of the AI training contract, and stated their opinion on the ethics of 50/50 compensation:
We believe that a 50-50 split for a mere AI training license gives far too much to the publisher. These rights belong to the author as they are not book or excerpt rights; it is the authors’ expression that produces value in AI licensing. Even when the publisher is serving as the licensor on behalf of its authors, the authors should receive most of the revenue, minus only the equivalent of an agent’s fee, plus what is needed to compensate the publisher for additional labor or rights, such as creating the files that are licensed and providing metadata—and that is to be negotiated between the publisher and the author or their agent.
HarperCollins author Daniel Kibblesmith took to Bluesky to state that the $2,500 he'd been offered was insufficient and the contract he'd been sent was "abominable." When asked how much money he'd actually accept for his IP in this case, he responded:
I’d probably do it for a billion dollars. I’d do it for an amount of money that wouldn’t require me to work anymore, since that’s the end goal of this technology.
My prediction is that in the very near future, AI licensing rights are going to make their way into standard publishing contracts across the industry, and literary agents are going to have to add it to the list of considerations to look out for.
Perhaps pessimistically, I suspect that the 50/50 split we're seeing in this groundbreaking contract is the most generous offer authors are going to get from the traditional publishing industry. Moving forward, publishers are going to expect to be able to sell IP to AI companies right along with translation and film rights, and will take the lion's share of the cut.
I also predict this will be another area where hybrid and/or professional publishing will be able to rise to the occasion and offer authors a more lucrative alternative, continuing their tradition of above-industry-standard royalty agreements and/or full IP retention. That is, as long as their primary distributor, Amazon, doesn't build AI licensing into Kindle Direct Publishing's TOS (similarly to the approach Twitter/X took this November, thereby taking the decision out of users' hands entirely).
What Does The "Three-Year" Term Mean?
The HarperCollins/Microsoft AI contract has a term of three years. Absent reviewing the actual contract, I have questions about what this actually entails and how transparent Microsoft is being about their plans for this new reservoir of content.
Let's imagine this contract has come to an end. At this point, the author and publisher have been paid, and the content has been circulated, amalgamated, re-synthesized, and re-used for three years. What happens next? How do you un-read a book? Do you ask the AI program to forget the information it was using, or scrub through the model to delete it? Perhaps the plan is to simply cut off access to the source material, but would such a step fundamentally make any difference if the content of the book had already been integrated into the model for years?
It's all well and good to hypothesize, but the truth is it's common knowledge in the tech world that nobody really knows how AI works, not even the people who create it, study it, or make money off it. As Will Douglas Heaven writes in the MIT Technology Review:
The largest models, and large language models in particular, seem to behave in ways textbook math says they shouldn’t. This highlights a remarkable fact about deep learning, the fundamental technology behind today’s AI boom: for all its runaway success, nobody knows exactly how—or why—it works.
Anyone at Microsoft who claims to know exactly how to stop an AI from accessing content it has been synthesizing for years is pulling that claim out of thin air. So really, the only things that seem likely to expire at the end of the contract are the guardrails Microsoft set up to sell it as equitable to authors and publishers:
a maximum of 200 consecutive words of source material in any AI output
a limit of 5% of a book's text used at any one time
an agreement not to scrape pirated content from illegal sources
Once the three-year contract is over, do these promises stand? Or is the three-year term simply a grace period, a courtesy, before the AI is free to use as much of the books' content as it likes, from any source, without paying anything extra?
These are not attempted gotchas, these are genuine questions that I would personally want answers to before agreeing to anything (as an author or as a publisher). The fact that an explanation of how this would work is absent from any coverage I've seen so far is concerning: either it wasn't thought about, or someone is obfuscating intentions past the three-year mark.
What Will Be Done With The Content?
Microsoft declining to comment on whether or not the as-yet-unnamed AI model will be put to work producing AI-generated books is not promising. Not that the idea of AI-generated books is new. Currently, Amazon, Meta, and other content-sharing platforms require you to label AI-generated content as such, and some, like Kindle Direct Publishing, have explicit bans on publishing fully AI-generated material (although you can use "AI-assisted" material).
But that's not stopping people who really want to use generative AI to bypass the work of creating and get straight to the accolades. This is a bell that freelance ghostwriters, editors, and other content creatives have been ringing for a while now. Josh Bernoff published a piece this week about how AI is "screwing up" his editing ecosystem that's well worth a read. The Authors Guild has also been publishing information about how AI scrapes content illegally (and has for some time).
In my experience as an operations expert for professional publishing startups, I've seen many an author client show up with a manuscript partially or wholly created using ChatGPT prompts in order to speed along the publication process. My experience in academic publishing is also fraught with this issue: my journal editors are increasingly forced to grind the peer review process to a halt and run manuscripts suspected of plagiaristic or erroneous AI content through time- and resource-consuming research integrity protocols.
This is an ongoing conversation and one that will be happening for a long time. Why the HarperCollins/Microsoft contract warrants a closer look is that it's the first time one of the Big Five traditional publishers has picked a side. This agreement is precedent-setting, and whatever follows from it will be too.
On the one hand, I am pleasantly surprised by the decision to jump on new technology. Traditional publishing has historically been a bit of a dinosaur, and this bold step in a new direction is worlds different from the rending of clothes that occurred during the rise of the ebook in 2012. Embracing new technological solutions to get books published and into readers' hands is a good thing.
On the other hand, of course, authors are now being asked by the gatekeepers of literary culture to sign contracts that will make their own dreams of writing and publishing books that much more fraught. Why go through the hassle of negotiating with literary agents about royalties and advances and rights if you can just generate proprietary content at the press of a few buttons? Why have an acquisitions department at all, watching the market and sorting through slush piles to find the next bestseller, when you can just ask an AI model what type of books are trending... and write a perfect match in a few hours?
From a purely business standpoint, at least in the short term, it's really a no-brainer. Imagine how lean you could get. Thanks to the authors who came before, no more authors are needed in the publication of books. And if, in the meantime, authors decline to sign contracts that give AI access to their published IP, well then, they don't get the privilege of being one of the last humans to have their book published.
Where Do We Go From Here?
I'm being melodramatic to serve a larger point: there is a lot we don't know, and given that the bastions of traditionalism and literary gatekeeping are no longer reliably bastioning or gatekeeping the way they once did, we need to reorient. It's about more than who gets the lion's share of the compensation or what happens after the contract term ends.
If books are just another product, I understand the business case for the HarperCollins/Microsoft partnership and everything that comes from it. If books are more than products, if the end-to-end human element is something at all worth preserving, this is a moment to sit back and think about where we go from here.
This is especially a moment for hybrid and professional publishers to take stock. We've been the rakish daredevils, the scrappy younger siblings, the ones ready to adopt any and all technology to get more books published faster and with more flexibility. That's been the party line, and we've seen our industries grow exponentially as a result. "Disruption" meant eschewing traditionalism and gatekeeping in favor of going fast and doing new shit.
Now the traditional publishing houses are noticing and jumping on the bandwagon. And frankly, if IP is king, they have a huge advantage: a hundred years of amassed intellectual wealth and the full weight of brand identities that (for now, at least) are synonymous with quality, exclusivity, and authority.
So how are we going to differentiate ourselves in the future? For starters, we can double down on the good work we're already doing in offering authors IP retention (or at least higher-than-average royalties). That by itself will be a selling point in just a few years. Simply staying the course will pay dividends.
But there's a more philosophical element to consider as well. What does "disruption" look like in a fully disrupted industry? In a world of AI-generated books, "disruption" may look like slowing down, getting deeper and more intimately in touch with our innate creativity, and being the place that people can come to publish and read books written by humans, for humans.