In the context of The New York Times lawsuit against OpenAI and Microsoft, a central debate emerging right now is whether AI companies should be allowed to train large language models like ChatGPT on copyrighted news articles without payment or permission.
So, for this week's episode, Jeff Jarvis joined me to discuss copyright and whether AI companies should be allowed to use news media's copyrighted content to train their models.
Jarvis is a veteran journalist and professor who recently testified to the US Senate Judiciary Subcommittee on Privacy, Technology, and Law on AI and the future of journalism. He has been the director of the Tow-Knight Center for Entrepreneurial Journalism at the Craig Newmark Graduate School of Journalism at the City University of New York. He is the author of six books, most recently "The Gutenberg Parenthesis: The Age of Print and its Lessons for the Age of the Internet." He co-hosts the podcasts "This Week in Google" and "AI Inside."
Some news organizations argue that large language models undermine their business models and intellectual property. But Jarvis thinks the way an AI model is trained is not fundamentally different from what human journalists already do when they learn from others' reporting. Still, he sees potential for news companies to create revenue by licensing access to news databases.
We also discuss whether an obsessive focus on monetizing "content" misses the deeper value journalists provide to society: trust, authority, and service. With AI set to commoditize content creation, Jarvis believes news organizations must concentrate on these unique strengths.
I found the historical context around copyright in the US really fascinating, too. Did you know newspapers were not covered by early copyright legislation in the US, partly because governments wanted information to spread freely? It's funny how the script has flipped today.
Rebecca Tushnet, the Frank Stanton Professor of the First Amendment at Harvard Law School, said in an interview with the Harvard Gazette: "If you care about the potential for lost jobs, we need to look to labor law and unfair competition law. Copyright is not going to help you with that." That got me thinking: is there a different set of laws and policies for AI and publishing that we should focus on instead?
Jarvis and I discussed where legal liability should reside when AI systems distribute misinformation or infringe copyright. Is it with the model builder, the application developer, or the end user? And what does society gain from AI broadening access to information, weighed against concerns about the replication of bias?
Stay tuned for the second part, where we discuss how journalism business models may be affected by the rise of generative AI.